1. GPT (Generative Pre-Training)
• Goal: learn a universal representation that transfers to a wide range of tasks with little adaptation
• Approach: generative pre-training on unlabeled text + discriminative fine-tuning on labeled, task-specific text
1.1. Unsupervised pre-training
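The pre-training stage maximizes a standard language-modeling objective over the unlabeled token corpus (notation as in the GPT paper; k is the context window, Θ the model parameters):

L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)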
1.2. Supervised fine-tuning
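Given a labeled dataset C of input tokens x^1, ..., x^m with label y, fine-tuning maximizes the supervised objective; the paper also keeps language modeling as an auxiliary objective with weight λ:

L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)

L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})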
2. GPT-2
• difference from BERT
|  | GPT-2 | BERT |
|---|---|---|
| Direction | uni-directional, auto-regressive* (masks future tokens) | bi-directional |
| Tokenizer | BPE (Byte-Pair Encoding)* | WordPiece |
| Fine-tuning | not used (zero-shot*) | used |
| Transformer | Decoder | Encoder |
* Auto-regression: after each token is produced, that token is appended to the input sequence, and the extended sequence becomes the model's input at the next step (see the decoding-loop sketch after these notes)
* Tokenizer comparison: https://lovit.github.io/nlp/2018/04/02/wpm/ (a BPE sketch also follows these notes)
* Zero-shot: the model is not trained on any data specific to these tasks; it is only evaluated on them as a final test
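A minimal sketch of this decoding loop with Hugging Face GPT-2 (assuming a transformers version whose forward pass returns an output with a .logits field; the model.generate call used later wraps this same loop):

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

input_ids = tokenizer.encode("Are you there?", return_tensors='pt')
with torch.no_grad():
    for _ in range(10):
        logits = model(input_ids).logits    # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()    # greedy: pick the most likely next token
        # the produced token is appended; the extended sequence is the next input
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(input_ids[0]))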
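To see the byte-level BPE from the table above in action (the word is just an illustration; rare words split into several subword pieces, frequent words stay whole):

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
print(tokenizer.tokenize("unbelievability"))  # rare word -> multiple BPE subword units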
• Translation (and other tasks) without an encoder: the decoder-only LM simply continues a prompt. A basic greedy-generation example:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')  # byte-level BPE tokenizer
model = GPT2LMHeadModel.from_pretrained('gpt2')    # decoder-only Transformer LM

# encode the prompt, then greedily decode until max_length tokens (prompt included)
input_ids = tokenizer.encode("Are you there?", return_tensors='pt')
greedy_output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
"""
[OUTPUT]
Are you there?
I'm here to help you.
"""
References
• GPT original paper: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
• GPT-2 original paper
• The Illustrated GPT-2 (Visualizing Transformer Language Models): http://jalammar.github.io/illustrated-gpt2/
• GPT-2 OpenAI blog: https://openai.com/blog/better-language-models/
• Text generation code: https://huggingface.co/blog/how-to-generate
• GPT-3 original paper: https://arxiv.org/abs/2005.14165
• GPT-3 blog post
• GPT-3 paper explained (YouTube): https://www.youtube.com/watch?v=p24JUVgDkQk and https://www.youtube.com/watch?v=SY5PvZrJhLE