BLEU and BLEURT: notes on evaluation for text generation

2020. 10. 7. 19:34ㆍreading

BLEU and BLEURT

1. BLEU (Bilingual Evaluation Understudy, 2002)

“The closer a machine translation is to a professional human translation, the better it is.”

The score ranges from 0 to 1.

- BP: Brevity Penalty

- p_n: modified n-gram precision

- w_n: positive weights summing to 1 (baseline: uniform, w_n = 1/N)

- N: maximum n-gram order (baseline: N=4)
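
Written out, the definitions above combine as in the original BLEU paper (Papineni et al., 2002). Here c and r are the candidate length and the effective reference length described in section 1.2:

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

With the baseline w_n = 1/N and N = 4, the exponent is simply the mean of the four log precisions.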

1.1. modified n-gram precision

- unigram precision: 7/7

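- (number of candidate tokens that also appear in a reference) / (total number of candidate tokens)

- modified unigram precision: 2/7 (each candidate word's count is clipped to its maximum count in any single reference)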

- combining the modified n-gram precisions

- precision drops sharply as n grows (roughly exponential decay) → a linear average is unsuitable

- average the logarithms with uniform weights (i.e., a geometric mean)
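
The 7/7 versus 2/7 contrast above can be reproduced with a small sketch. The sentences are the well-known example from the BLEU paper (a candidate of seven "the" tokens against two references) and are assumed here, since the notes do not spell them out. The function names are illustrative, not from any library.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def naive_precision(candidate, references, n):
    """Fraction of candidate n-grams that occur anywhere in any reference."""
    cand = ngrams(candidate, n)
    ref_grams = {g for ref in references for g in ngrams(ref, n)}
    return sum(g in ref_grams for g in cand) / len(cand) if cand else 0.0


def modified_precision(candidate, references, n):
    """Clip each candidate n-gram count by its max count in a single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())


def geometric_mean(precisions):
    """Uniform-weight average of log precisions, mapped back with exp."""
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; the combined score is 0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))


# Assumed example sentences (lowercased, whitespace-tokenized).
candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]

print(naive_precision(candidate, references, 1))     # 7/7
print(modified_precision(candidate, references, 1))  # 2/7
```

"the" occurs at most twice in a single reference, so only two of the seven candidate tokens are credited. `geometric_mean` shows why the log average is used: a single zero or very small p_n pulls the combined score down far more than a linear average would.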

1.2. brevity penalty

- naive recall: would rank candidate 1 above candidate 2 as the better translation

- candidates longer than the reference are already penalized by the modified n-gram precision

- so only candidates shorter than the reference need an extra penalty

r: sum of the best match lengths for each candidate sentence in the corpus (the effective reference length)

c: total length of the candidate corpus

* best match length: closest reference sentence length

- len(candidate) = 13

- len(ref.1) = 12, len(ref.2) = 15, len(ref.3) = 17

→ best match length = 12
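
A minimal sketch of the length computation above, using the BP definition from the BLEU paper (1 if c > r, otherwise exp(1 - r/c)). The function names are illustrative, and breaking ties toward the shorter reference is an assumption, since the notes do not say how ties are resolved.

```python
import math


def best_match_length(candidate_len, reference_lens):
    """Reference length closest to the candidate length.

    Ties go to the shorter reference (an assumption for this sketch).
    """
    return min(reference_lens, key=lambda r: (abs(r - candidate_len), r))


def brevity_penalty(c, r):
    """BP = 1 if c > r, else exp(1 - r/c); c and r are corpus-level totals."""
    if c == 0:
        return 0.0  # an empty candidate corpus gets no credit
    if c > r:
        return 1.0
    return math.exp(1 - r / c)


# The example above: one candidate of length 13, references of 12, 15, 17.
r = best_match_length(13, [12, 15, 17])
print(r)                       # 12
print(brevity_penalty(13, r))  # 1.0, because the candidate is longer than r
```

For a whole corpus, r would be the sum of `best_match_length` over all candidate sentences and c the sum of candidate lengths, so the penalty is applied once per corpus rather than per sentence.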

2. BLEURT (Bilingual Evaluation Understudy with Representations from Transformers, 2020)

* x: reference sentence

* x-bar: prediction (candidate) sentence

* y-bar: BERT's prediction of the human rating

Pre-Training on Synthetic Data

1) mask-filling w/ BERT

- insert mask

- fill the mask w/ language model

2) backtranslation

- changes the surface form of a sentence while preserving its meaning

- (e.g.) translate English into French, then translate that French sentence back into English

3) dropping words

- randomly drop words
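
Of the three perturbations, word dropping is the easiest to illustrate without a model. The sketch below is only one plausible reading of "randomly drop words": the per-token drop probability, the `drop_words` name, and the rule of keeping at least one token are all assumptions, not details taken from the BLEURT paper.

```python
import random


def drop_words(tokens, drop_prob=0.15, rng=None):
    """Return a copy of tokens with each token dropped independently.

    drop_prob and the keep-at-least-one rule are assumptions for this sketch.
    """
    if rng is None:
        rng = random.Random()
    kept = [t for t in tokens if rng.random() >= drop_prob]
    # Avoid returning an empty sentence: fall back to the first token.
    return kept or tokens[:1]


reference = "the cat sat on the mat".split()
perturbed = drop_words(reference, drop_prob=0.3, rng=random.Random(0))
# (reference, perturbed) would then form one synthetic sentence pair.
print(" ".join(perturbed))
```

Passing a seeded `random.Random` keeps the perturbation reproducible, which helps when regenerating the same synthetic pairs.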

9 pre-training tasks

- BLEU cannot take the meaning of a sentence into account

- BLEURT can take meaning into account

References

https://www.aclweb.org/anthology/P02-1040/

https://ai.googleblog.com/2020/05/evaluating-natural-language-generation.html

https://arxiv.org/abs/2004.04696

