Collections
Discover the best community collections!
Collections including paper arxiv:2307.09288

- Attention Is All You Need
  Paper • 1706.03762 • Published • 77
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 19
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 17
- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 19
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2
- Mistral 7B
  Paper • 2310.06825 • Published • 51
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 243
- OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
  Paper • 2309.11235 • Published • 15
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 416
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 373
- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 151
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 4
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 200
- Qwen Technical Report
  Paper • 2309.16609 • Published • 37
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 10
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 167
- Qwen2-Audio Technical Report
  Paper • 2407.10759 • Published • 60
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 624
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 373
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 257
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 258
- black-forest-labs/FLUX.1-dev
  Text-to-Image • Updated • 1.59M • 11.2k
- openai/whisper-large-v3-turbo
  Automatic Speech Recognition • 0.8B • Updated • 3.42M • 2.55k
- meta-llama/Llama-3.2-11B-Vision-Instruct
  Image-Text-to-Text • 11B • Updated • 815k • 1.5k
- deepseek-ai/DeepSeek-V2.5
  Text Generation • 236B • Updated • 2.94k • 724