GPT Summary

2020. 10. 7. 19:43 | reading


1. GPT (Generative Pre-Training)

• goal: learn a universal representation

• generative pre-training (unlabeled text) + discriminative fine-tuning (labeled text)

 

1.1. Unsupervised pre-training
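
As a reminder of what this stage optimizes, the GPT paper (first entry in the references) states the pre-training objective as a standard left-to-right language-modeling likelihood over an unlabeled token corpus. In the notation below, U = {u_1, ..., u_n} is the corpus, k is the context window size, and Θ are the model parameters:

```latex
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)
```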

 

 

1.2. Supervised fine-tuning
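
For the fine-tuning stage, the GPT paper feeds each labeled input sequence x^1, ..., x^m through the pre-trained model, takes the final transformer block's activation for the last token (h_l^m), and adds a linear output layer W_y. Training on the labeled dataset C maximizes L_2, optionally with the language-modeling loss L_1 as an auxiliary term weighted by λ:

```latex
\begin{aligned}
P(y \mid x^1, \ldots, x^m) &= \mathrm{softmax}(h_l^m W_y) \\
L_2(\mathcal{C}) &= \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m) \\
L_3(\mathcal{C}) &= L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
\end{aligned}
```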

 

 

 

 

2. GPT-2

 

• difference from BERT

 

              GPT-2                                                   BERT
Direction     uni-directional; auto-regressive; masks future tokens   bi-directional
Tokenizer     BPE (Byte-Pair Encoding)                                WordPiece
Fine-tuning   No (zero-shot)                                          Yes
Transformer   Decoder                                                 Encoder
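
To make the Direction entry of the comparison concrete, here is a minimal sketch of the two attention patterns in plain Python. The function name `attention_mask` and the 0/1 list-of-lists representation are illustrative assumptions, not taken from either model's code; real implementations apply the mask to attention scores inside the transformer.

```python
def attention_mask(seq_len, causal):
    """Return a seq_len x seq_len matrix of 0/1 flags.

    Entry [i][j] is 1 if position i is allowed to attend to position j.
    (Hypothetical helper for illustration only.)
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Uni-directional (GPT-2 style): each position sees itself and earlier positions only.
gpt2_style = attention_mask(4, causal=True)

# Bi-directional (BERT style): every position sees every other position.
bert_style = attention_mask(4, causal=False)

for row in gpt2_style:
    print(row)
```

The causal matrix is lower-triangular, which is what "mask future tokens" amounts to: row i has zeros for every column j > i.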

 

* auto-regression: after each token is produced, it is appended to the input sequence, and the extended sequence becomes the model's input at the next step

* Tokenizer comparison: https://lovit.github.io/nlp/2018/04/02/wpm/

* zero-shot: the model is not trained on any task-specific data; it is only evaluated on those tasks as a final test
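
The auto-regression note above can be sketched as a simple loop. This is a toy illustration, not GPT-2 itself: `NEXT_TOKEN` and `toy_next_token` are hypothetical stand-ins for a real language model, and they look only at the last token, whereas a real model conditions on the whole sequence.

```python
# Hypothetical lookup table standing in for a trained model's predictions.
NEXT_TOKEN = {"<s>": "are", "are": "you", "you": "there", "there": "?", "?": "</s>"}

def toy_next_token(tokens):
    """Stand-in for a language model: predicts from the last token only."""
    return NEXT_TOKEN.get(tokens[-1], "</s>")

def generate(prompt, max_length=10):
    """Greedy auto-regressive generation until end-of-sequence or max_length."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        nxt = toy_next_token(tokens)
        if nxt == "</s>":
            break
        # The produced token is appended, so it is part of the next step's input.
        tokens.append(nxt)
    return tokens

print(generate(["<s>", "are"]))
```

Each pass through the loop feeds the model everything generated so far, which is why generation cost grows with output length.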

 

 

 

• translation w/o encoder
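
The GPT-2 paper gets translations out of a decoder-only model by conditioning it on a context of example pairs in the form "english sentence = french sentence", ending with "english sentence =", and taking the first generated sentence as the translation. Below is a rough sketch of that prompt construction. The helper names, the newline separator between pairs, and the example sentences are assumptions made for illustration; the model call itself is left out.

```python
def build_translation_prompt(example_pairs, source_sentence):
    """Format example pairs plus the sentence to translate as one text sequence.

    Hypothetical helper; separating pairs with newlines is an assumption.
    """
    lines = [f"{en} = {fr}" for en, fr in example_pairs]
    lines.append(f"{source_sentence} =")
    return "\n".join(lines)

def extract_translation(generated_text, prompt):
    """Take the first line of the continuation that follows the prompt."""
    continuation = generated_text[len(prompt):]
    return continuation.strip().split("\n")[0].strip()

prompt = build_translation_prompt([("good morning", "bonjour")], "thank you")
print(prompt)
```

Because source and target sit in one sequence, the decoder's left-to-right attention over the prompt plays the role an encoder would have in a conventional translation model.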

 

 

 

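import torch  # PyTorch backend, required for return_tensors='pt'
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the pre-trained GPT-2 tokenizer (BPE) and the model with its language-modeling head
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Encode the prompt into a PyTorch tensor of token ids
input_ids = tokenizer.encode("Are you there?", return_tensors='pt')
# Greedy decoding: take the most likely token at each step, up to 50 tokens in total
greedy_output = model.generate(input_ids, max_length=50)
# Convert the generated token ids back to text
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))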

"""
[OUTPUT]
Are you there?
I'm here to help you.
"""

 

 

 

 

References

GPT original paper

https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

GPT-2 original paper

https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

The Illustrated GPT-2 (Visualizing Transformer Language Models)

http://jalammar.github.io/illustrated-gpt2/

GPT-2 OpenAI blog

https://openai.com/blog/better-language-models/

Text generation code

https://huggingface.co/blog/how-to-generate

 

GPT-3 original paper

https://arxiv.org/abs/2005.14165

GPT-3 blog post

https://thenextweb.com/neural/2020/07/23/openais-new-gpt-3-language-explained-in-under-3-minutes-syndication/

GPT-3 paper explained (YouTube)

https://www.youtube.com/watch?v=p24JUVgDkQk

https://www.youtube.com/watch?v=SY5PvZrJhLE

 
