1. GPT (Generative Pre-Training)
• Goal: learn a universal representation that transfers to a wide range of tasks with little adaptation
• Approach: generative pre-training on unlabeled text + discriminative fine-tuning on labeled, task-specific text
1.1. Unsupervised pre-training
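The pre-training stage maximizes a standard language-modeling objective over the unlabeled token corpus (notation as in the GPT paper; k is the context window, Θ the model parameters):

L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)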
1.2. Supervised fine-tuning
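Given a labeled dataset C of input tokens x^1, ..., x^m with label y, fine-tuning maximizes the supervised objective; the paper also keeps language modeling as an auxiliary objective with weight λ:

L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)

L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})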
2. GPT-2
• difference from BERT
|  | GPT-2 | BERT |
|---|---|---|
| Direction | uni-directional, auto-regressive* (masks future tokens) | bi-directional |
| Tokenizer | BPE (Byte-Pair Encoding)* | WordPiece |
| Fine-tuning | not used (zero-shot*) | used |
| Transformer | Decoder | Encoder |
* Auto-regression: after each token is produced, that token is appended to the input sequence, and the extended sequence becomes the model's input at the next step (see the decoding-loop sketch after these notes)
* Tokenizer comparison: https://lovit.github.io/nlp/2018/04/02/wpm/ (a BPE sketch also follows these notes)
* Zero-shot: the model is not trained on any data specific to these tasks; it is only evaluated on them as a final test
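A minimal sketch of this decoding loop with Hugging Face GPT-2 (assuming a transformers version whose forward pass returns an output with a .logits field; the model.generate call used later wraps this same loop):

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

input_ids = tokenizer.encode("Are you there?", return_tensors='pt')
with torch.no_grad():
    for _ in range(10):
        logits = model(input_ids).logits    # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()    # greedy: pick the most likely next token
        # the produced token is appended; the extended sequence is the next input
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(input_ids[0]))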
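To see the byte-level BPE from the table above in action (the word is just an illustration; rare words split into several subword pieces, frequent words stay whole):

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
print(tokenizer.tokenize("unbelievability"))  # rare word -> multiple BPE subword units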
• Translation (and other tasks) without an encoder: the decoder-only LM simply continues a prompt. A basic greedy-generation example:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')  # byte-level BPE tokenizer
model = GPT2LMHeadModel.from_pretrained('gpt2')    # decoder-only Transformer LM

# encode the prompt, then greedily decode until max_length tokens (prompt included)
input_ids = tokenizer.encode("Are you there?", return_tensors='pt')
greedy_output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
"""
[OUTPUT]
Are you there?
I'm here to help you.
"""
References
• GPT original paper: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
• GPT-2 original paper
• The Illustrated GPT-2 (Visualizing Transformer Language Models): http://jalammar.github.io/illustrated-gpt2/
• GPT-2 OpenAI blog: https://openai.com/blog/better-language-models/
• Text generation code: https://huggingface.co/blog/how-to-generate
• GPT-3 original paper: https://arxiv.org/abs/2005.14165
• GPT-3 blog post
• GPT-3 paper explained (YouTube): https://www.youtube.com/watch?v=p24JUVgDkQk and https://www.youtube.com/watch?v=SY5PvZrJhLE