Transformers(3)
Longformer: The Long-Document Transformer (Paper Summary)
Longformer: The Long-Document Transformer

               | Longformer                                       | Transformer
  complexity   | O(n), scales linearly                            | O(n^2), scales quadratically
  attention    | local windowed attention + global attention (d)  | self-attention (a)
  max length   | 4,096                                            | 512

[ Attention Pattern ]
1) Sliding Window
  • fixed-size window attention for local context
  • complexity: O(n × w)
    • n: input sequence length
    • w: fixed window size (can vary per layer)..
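As a rough illustration of the sliding-window pattern (not the Longformer reference implementation; the function name and the values n=16, w=4 are my own), the sketch below builds the banded mask behind the O(n × w) cost:

```python
import numpy as np

def sliding_window_mask(n: int, w: int) -> np.ndarray:
    """Boolean mask where True means token i may attend to token j.

    Each token only sees the w/2 neighbors on either side, so every row
    has at most w + 1 True entries and the total attention work grows as
    O(n * w) rather than the O(n^2) of full self-attention.
    """
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= w // 2

mask = sliding_window_mask(n=16, w=4)
print(mask.sum(axis=1))  # per-token attention count stays around w + 1, independent of n
```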
2020.10.07
GPT Summary
1. GPT (Generative Pre-Training)
  • goal: learn a universal representation
  • generative pre-training (unlabeled text) + discriminative fine-tuning (labeled text)
  1.1. Unsupervised pre-training
  1.2. Supervised fine-tuning
2. GPT-2
  • difference from BERT

                | GPT-2                                                  | BERT
    Direction   | uni-directional, auto-regression (mask future tokens)  | bi-directional
    Tokenizer   | BPE (Byte-Pair Encoding)                               | WordPiece Tokenizer
    Fine-..
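A minimal toy sketch (my own example, not from the post) of the "mask future tokens" idea that separates GPT-style auto-regression from BERT-style bi-directional attention:

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    """GPT-style mask: token i may only attend to positions j <= i,
    so future tokens stay hidden during auto-regressive training."""
    return np.tril(np.ones((n, n), dtype=bool))

def bidirectional_mask(n: int) -> np.ndarray:
    """BERT-style mask: every token attends to every other token."""
    return np.ones((n, n), dtype=bool)

print(causal_mask(4).astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```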
2020.10.07
BLEU and BLEURT: Evaluation for Text Generation (Summary)
BLEU and BLEURT
1. BLEU (Bilingual Evaluation Understudy, 2002)
  "The closer a machine translation is to a professional human translation, the better it is."
  • score between 0 and 1
    - BP: Brevity Penalty
    - p_n: modified n-gram precision
    - w_n: positive weights (baseline: 1/N)
    - N: maximum n-gram length (baseline: N=4)
  1.1. modified n-gram precision
    - unigram precision: 7/7
    - (the number of candidate tokens that also appear in the reference) ..
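These variables combine as BLEU = BP · exp(Σ_{n=1}^{N} w_n log p_n). The sketch below computes the modified n-gram precision p_n by clipping each candidate n-gram count to its count in the reference; the function name and the example sentences are made up for illustration:

```python
from collections import Counter

def modified_ngram_precision(candidate, reference, n=1):
    """Clip each candidate n-gram count to its count in the reference,
    then divide by the total number of candidate n-grams."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

# toy example: the repeated "the" is clipped to its count in the reference
cand = "the the the cat".split()
ref = "the cat is on the mat".split()
print(modified_ngram_precision(cand, ref, n=1))  # (2 + 1) / 4 = 0.75
```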
2020.10.07