Posts tagged 'NLP' (Page 11)

Training with huggingface 🤗 after Resampling Imbalanced Text Data

Imbalanced data needs to be resampled before a model can learn from it properly. Resampling falls into two broad categories: (1) undersampling and (2) oversampling. For example, if there are 1,234 examples with label 0 and 5,678 examples with label 1, then label 0 makes up 17.85% of the data and label 1 makes up 82.15%, so the data is imbalanced. (1) Undersampling equalizes the class sizes to match the smaller class, label 0: both label 0 and label 1 end up with 1,234 examples. (2) Oversampling equalizes the class sizes to match the larger class, label 1. l..

2021.06.30
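As a rough illustration of the two resampling strategies described in the excerpt above, here is a minimal sketch in plain Python. The function names (`undersample`, `oversample`, `group_by_label`) and the toy data are assumptions made for this example and are not taken from the post; random sampling with replacement is only one simple way to oversample.

```python
import random
from collections import Counter, defaultdict


def group_by_label(examples):
    """Group (text, label) pairs by their label."""
    groups = defaultdict(list)
    for text, label in examples:
        groups[label].append((text, label))
    return groups


def undersample(examples, seed=0):
    """Randomly drop examples so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(examples)
    target = min(len(group) for group in groups.values())
    resampled = []
    for group in groups.values():
        resampled.extend(rng.sample(group, target))  # sample without replacement
    rng.shuffle(resampled)
    return resampled


def oversample(examples, seed=0):
    """Randomly duplicate examples so every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(examples)
    target = max(len(group) for group in groups.values())
    resampled = []
    for group in groups.values():
        resampled.extend(group)
        # Draw the missing examples with replacement from the same class.
        resampled.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(resampled)
    return resampled


# Toy data with the same class sizes as the example in the excerpt.
data = [(f"text {i}", 0) for i in range(1234)] + [(f"text {i}", 1) for i in range(5678)]
under = undersample(data)
over = oversample(data)
print(Counter(label for _, label in under))
print(Counter(label for _, label in over))
```

The resampled list of (text, label) pairs could then be tokenized and passed to whatever training loop is in use. Resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.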

Longformer: The Long-Document Transformer Paper Summary

Longformer: The Long-Document Transformer. Longformer vs. Transformer: complexity O(n) (scales linearly) vs. O(n^2) (scales quadratically); attention: local windowed attention + global attention (d) vs. self-attention (a); max length: 4,096 vs. 512. [ Attention Pattern ] 1) Sliding Window • fixed-size window attention for local context • complexity: O(n × w) • n: input sequence length • w: fixed window size (can vary per layer..

2020.10.07
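To make the O(n × w) figure for sliding window attention in the excerpt above concrete, here is a small sketch that builds the attention mask and counts the attended pairs. It assumes each token attends to w/2 tokens on either side plus itself; the function names are hypothetical, and this only illustrates the pattern, not Longformer's actual implementation.

```python
def sliding_window_mask(n, w):
    """Return an n x n boolean mask: token i may attend to token j
    only when j lies within w // 2 positions of i."""
    half = w // 2
    return [[abs(i - j) <= half for j in range(n)] for i in range(n)]


def count_attended(mask):
    """Count the (query, key) pairs that are allowed to attend."""
    return sum(sum(row) for row in mask)


n, w = 16, 4
mask = sliding_window_mask(n, w)
print("windowed pairs:", count_attended(mask))
print("full attention pairs:", n * n)
```

Each row of the mask holds at most w + 1 allowed positions, so the number of attended pairs is bounded by n × (w + 1) and grows linearly in n for a fixed w, whereas full self-attention always uses n × n pairs.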

GPT Summary

1. GPT (Generative Pre-Training) • goal: learn a universal representation • generative pre-training (unlabeled text) + discriminative fine-tuning (labeled text) 1.1. Unsupervised pre-training 1.2. Supervised fine-tuning 2. GPT-2 • difference from BERT (GPT-2 vs. BERT): Direction: uni-directional (auto-regression, mask future tokens) vs. bi-directional; Tokenizer: BPE (Byte-Pair Encoding) vs. WordPiece Tokenizer; Fine-..

2020.10.07

BLEU and BLEURT: Evaluation for Text Generation Summary

BLEU and BLEURT 1. BLEU (Bilingual Evaluation Understudy, 2002) “The closer a machine translation is to a professional human translation, the better it is.” A score between 0 and 1: BLEU = BP · exp(Σ_{n=1..N} w_n · log p_n) - BP: Brevity Penalty - p_n: modified n-gram precision - w_n: positive weights (baseline: 1/N) - N: maximum n-gram order (baseline: N=4) 1.1. modified n-gram precision - unigram precision: 7/7 - (number of candidate tokens that also appear in a reference) ..

2020.10.07
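The difference between plain and modified n-gram precision mentioned in the excerpt above can be sketched as follows. The sentences are the well-known example from the BLEU paper, which may or may not be the one used in the post; the function names are assumptions for this illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clip each candidate n-gram count by its maximum count in any
    single reference, then divide by the total candidate n-gram count."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
print(modified_precision(candidate, references, 1))
```

Every candidate token appears in a reference, so the plain unigram precision is 7/7. Because "the" occurs at most twice in any single reference, its count is clipped to 2, and the modified unigram precision drops to 2/7.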

StructBERT: Incorporating Language Structures into Pretraining for Deep Language Understanding Paper Summary

StructBERT: incorporated language structures into pre-training (in practice this is word order rather than language structure) 1) word-level ordering 2) sentence-level ordering 1) word-level ordering : mask some tokens as in the original BERT, then pick 3 unmasked tokens (a trigram) and shuffle their order * using 4 tokens made little difference in performance, so 3 was chosen with robustness in mind → final hidden state of the masked tokens → softmax classifier → predict the original token → final hidden state of the shuffled tokens → softmax clas..

2020.10.07
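The word-level ordering step in the excerpt above (pick a trigram of unmasked tokens and shuffle it) can be sketched roughly like this. The function name, the choice of a contiguous span, and the toy sentence are assumptions for illustration; the sketch shows only the input corruption, not the prediction heads.

```python
import random


def shuffle_unmasked_trigram(tokens, masked_positions, seed=0, k=3):
    """Pick one contiguous span of k unmasked tokens and shuffle it.

    Returns the corrupted token list and the start index of the span,
    or (copy of tokens, None) when no fully unmasked span exists.
    """
    rng = random.Random(seed)
    masked = set(masked_positions)
    # Candidate spans are those that contain no masked position.
    starts = [
        i for i in range(len(tokens) - k + 1)
        if not any(j in masked for j in range(i, i + k))
    ]
    if not starts:
        return list(tokens), None
    start = rng.choice(starts)
    span = tokens[start:start + k]  # slicing copies, so the input is untouched
    rng.shuffle(span)
    corrupted = list(tokens)
    corrupted[start:start + k] = span
    return corrupted, start


tokens = ["the", "[MASK]", "sat", "on", "the", "mat", "today"]
corrupted, start = shuffle_unmasked_trigram(tokens, masked_positions=[1])
print(corrupted, start)
```

During pre-training the model would then be asked to recover the original order of the tokens in the shuffled span. Note that a random shuffle can occasionally return the original order, which a real implementation might want to resample.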

The Illustrated BERT: Translation and Summary

http://jalammar.github.io/illustrated-bert/ The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) BERT (..

2020.02.11

codlingual
