Longformer: The Long-Document Transformer (Paper Summary)
[ Longformer vs. Transformer ]
• complexity: Longformer is O(n) and scales linearly; the Transformer's self-attention is O(n^2) and scales quadratically
• attention: Longformer uses local windowed attention + global attention (pattern (d) in the paper's Figure 2); the Transformer uses full self-attention (pattern (a))
• max length: Longformer handles sequences of 4,096 tokens vs. the Transformer's 512

[ Attention Pattern ]
1) Sliding Window
• fixed-size window attention for local context (see the sketch after this list)
• complexity: O(n × w)
  - n: input sequence length
  - w: fixed window size (can vary per layer)
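To make the O(n × w) cost concrete, here is a minimal NumPy sketch of sliding-window attention. The function name `sliding_window_attention` and the per-token Python loop are illustrative assumptions, not the paper's banded-matrix CUDA kernel:

```python
import numpy as np

def sliding_window_attention(Q, K, V, w):
    """Sliding-window self-attention: each token attends only to the
    w/2 tokens on either side, so the score computation is O(n * w)
    instead of the full O(n^2). Illustrative sketch, not the paper's
    optimized kernel."""
    n, d = Q.shape
    half = w // 2
    out = np.zeros_like(V)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # local window bounds
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)          # at most w+1 scores, not n
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                          # softmax over the window only
        out[i] = weights @ V[lo:hi]
    return out

# toy usage: n = 8 tokens, d = 4 dims, window w = 4
rng = np.random.default_rng(0)
n, d, w = 8, 4, 4
Q, K, V = rng.standard_normal((3, n, d))
print(sliding_window_attention(Q, K, V, w).shape)  # (8, 4)
```

Each token computes scores against at most w neighbors rather than all n tokens, which is where the linear scaling comes from; stacking such layers grows the receptive field the way stacked convolutions do.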