GPT Summary
1. GPT (Generative Pre-Training)
• goal: learn a universal representation
• generative pre-training (unlabeled text) + discriminative fine-tuning (labeled text)
1.1. Unsupervised pre-training
1.2. Supervised fine-tuning
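The two stages above correspond to the following objectives (a sketch in the original paper's notation: U is the unlabeled corpus, C the labeled dataset, k the context window size, Θ the model parameters):

L_1(U) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)   (pre-training: predict each token from its left context)
L_2(C) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)              (fine-tuning: predict the label y from the input tokens)
L_3(C) = L_2(C) + \lambda \cdot L_1(C)                              (combined fine-tuning objective; the LM loss stays as an auxiliary term)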
2. GPT-2
• difference from BERT
|             | GPT-2                                                  | BERT                |
|-------------|--------------------------------------------------------|---------------------|
| Direction   | uni-directional (auto-regressive; masks future tokens) | bi-directional      |
| Tokenizer   | BPE (Byte-Pair Encoding)                               | WordPiece Tokenizer |
| Fine-Tuning | X (zero-shot)                                          | O                   |
| Transformer | Decoder                                                | Encoder             |
* auto-regression: after each token is produced, it is appended to the input sequence, and that new sequence becomes the model's input at the next step (a minimal sketch follows these notes)
* Tokenizer comparison: https://lovit.github.io/nlp/2018/04/02/wpm/
* zero-shot: the model is not trained on any data specific to these tasks; it is only evaluated on them as a final test
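A minimal sketch of this auto-regressive loop, using the Hugging Face GPT-2 model with manual greedy decoding (the prompt and the 10-step budget are arbitrary choices for illustration):

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

input_ids = tokenizer.encode("Are you there?", return_tensors='pt')

with torch.no_grad():
    for _ in range(10):
        logits = model(input_ids).logits                          # (batch, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # greedy: most likely next token
        input_ids = torch.cat([input_ids, next_id], dim=-1)       # append it; the longer sequence is the next input

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))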
• translation w/o encoder
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# load the pre-trained GPT-2 tokenizer and decoder-only LM
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# encode the prompt and generate up to 50 tokens with greedy decoding (the default)
input_ids = tokenizer.encode("Are you there?", return_tensors='pt')
greedy_output = model.generate(input_ids, max_length=50)

print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
"""
[OUTPUT]
Are you there?
I'm here to help you.
"""
References
GPT original paper
https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
GPT-2 original paper
The Illustrated GPT-2 (Visualizing Transformer Language Models)
http://jalammar.github.io/illustrated-gpt2/
GPT-2 OpenAI blog
https://openai.com/blog/better-language-models/
Text generation code
https://huggingface.co/blog/how-to-generate
GPT-3 original paper
https://arxiv.org/abs/2005.14165
GPT-3 blog post
GPT-3 paper explained (youtube)
https://www.youtube.com/watch?v=p24JUVgDkQk
https://www.youtube.com/watch?v=SY5PvZrJhLE