Evaluation(2)
-
Micro- and Macro-F1 scores
1. Micro-Averaged Metrics
2. Macro-Averaged Metrics
2020.10.07 -
BLEU and BLEURT: a summary of evaluation for text generation
BLEU and BLEURT
1. BLEU (Bilingual Evaluation Understudy, 2002)
“The closer a machine translation is to a professional human translation, the better it is.”
The score lies between 0 and 1.
- BP: brevity penalty
- p_n: modified n-gram precision
- w_n: positive weights (baseline: 1/N)
- N: maximum n-gram order (baseline: N=4)
1.1. Modified n-gram precision
- unigram precision: 7/7 (the number of candidate tokens that also appear in the reference) ..
2020.10.07
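The BLEU components listed in the excerpt above (BP, p_n, w_n, N) can be sketched in a few lines of Python. This is a minimal illustration rather than the post's own code: it assumes whitespace-tokenized input, uniform weights w_n = 1/N, and no smoothing, and the function names (ngrams, modified_precision, brevity_penalty, bleu) are hypothetical. The sample sentences are the well-known "the the the ..." example from the original BLEU paper, where plain unigram precision is 7/7 but the modified (clipped) precision is 2/7.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Modified n-gram precision p_n: each candidate n-gram count is
    clipped by its maximum count in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(c, r):
    """BP = 1 if c > r, else exp(1 - r/c), for candidate length c
    and effective reference length r."""
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU = BP * exp(sum_n w_n * log p_n) with w_n = 1/N.
    Assumption: no smoothing, so any p_n == 0 gives a score of 0."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Effective reference length: closest to c, ties broken by the shorter one.
    r = min((len(ref) for ref in references), key=lambda l: (abs(l - c), l))
    return brevity_penalty(c, r) * math.exp(log_avg)


candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]
print(modified_precision(candidate, references, 1))  # 2/7, not 7/7
```

Clipping is what keeps a degenerate candidate from scoring well: every "the" appears in a reference, but no reference contains it more than twice, so only two of the seven tokens count.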