Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features Paper • 1703.02507 • Published Mar 7, 2017
DoGE: Domain Reweighting with Generalization Estimation Paper • 2310.15393 • Published Oct 23, 2023
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging Paper • 2402.02622 • Published Feb 4, 2024
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention Paper • 2306.01160 • Published Jun 1, 2023