Attention(3)
-
Longformer: The Long-Document Transformer (paper summary)
Longformer: The Long-Document Transformer. Longformer vs. Transformer: complexity — O(n), scales linearly vs. O(n^2), scales quadratically; attention — local windowed attention + global attention (d) vs. full self-attention (a); max length — 4,096 vs. 512. [ Attention Pattern ] 1) Sliding Window • fixed-size window attention for local context • complexity: O(n × w) • n: input sequence length • w: fixed window size (can vary per layer..
2020.10.07 -
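The sliding-window pattern in the excerpt above can be sketched as a boolean attention mask. This is a minimal illustration, not the paper's implementation: it assumes a symmetric window of `w` positions on each side of every query, so the number of attended key/query pairs grows as O(n × w) rather than O(n²).

```python
import numpy as np

def sliding_window_mask(n, w):
    """Boolean mask for Longformer-style sliding window attention.

    Entry (i, j) is True iff query position i may attend to key
    position j, i.e. |i - j| <= w. The True entries per row are at
    most 2w + 1, so total attended pairs scale as O(n * w).
    """
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= w

mask = sliding_window_mask(n=8, w=2)
print(mask.sum())  # 34 attended pairs, far fewer than the full 8 * 8 = 64
```

In the full Longformer pattern a few designated positions (e.g. `[CLS]`) additionally get global attention on top of this local mask; that part is omitted here for brevity.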
The Illustrated Transformer (translation and summary)
https://jalammar.github.io/illustrated-transformer/ The Illustrated Transformer (jalammar.github.io) 1) Encoder - only the first (bottom-most) Encoder receives word embed..
2020.02.10 -
Attention Model (translation and summary)
Sources: 1) Neural Machine Translation by Jointly Learning to Align and Translate 2) Attention: Illustrated Attention 3) Attention and Memory in Deep Learning and NLP. Problems with the classic Encoder-Decoder RNN/LSTM model: no matter how long the input sentence is, it must be compressed into a single fixed-length vector, and the Decoder receives only the Encoder's last hidden state → for very long sentences, much of the input is forgotten. How these problems are solved: no fixed-length vector; the input sentence is instead represented as multiple vectors..
2020.02.10
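The fix described in the excerpt above — letting the decoder look at all encoder hidden states instead of only the last one — can be sketched with plain dot-product attention. This is a minimal illustration under assumed shapes, not the additive scoring used in the cited Bahdanau et al. paper: the decoder state scores every encoder state, a softmax turns the scores into weights, and the context vector is the weighted sum.

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Minimal dot-product attention sketch.

    decoder_state:  (d,)        current decoder hidden state
    encoder_states: (seq, d)    one hidden state per input token
    Returns the (d,) context vector: a softmax-weighted sum of
    ALL encoder states, not just the last one.
    """
    scores = encoder_states @ decoder_state      # (seq,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over input positions
    return weights @ encoder_states              # weighted sum -> context

rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 4))   # 5 encoder states of dimension 4
dec = rng.standard_normal(4)
ctx = attention_context(dec, enc)
print(ctx.shape)  # (4,)
```

Because the weights are a softmax, the context is always a convex combination of the encoder states; with a zero decoder state the scores tie and the context reduces to their plain mean.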