Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2203.02155

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 77
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 19
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 9
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 17

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 254
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 87
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 16
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 14

Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 21

LLM Post Training

Instruction Tuning for Large Language Models: A Survey

Paper • 2308.10792 • Published Aug 21, 2023 • 1
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Paper • 2403.14608 • Published Mar 21, 2024
Efficient Large Language Models: A Survey

Paper • 2312.03863 • Published Dec 6, 2023 • 4
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 32

LLM-Alignment Papers

Concrete Problems in AI Safety

Paper • 1606.06565 • Published Jun 21, 2016 • 1
The Off-Switch Game

Paper • 1611.08219 • Published Nov 24, 2016 • 1
Learning to summarize from human feedback

Paper • 2009.01325 • Published Sep 2, 2020 • 4
Truthful AI: Developing and governing AI that does not lie

Paper • 2110.06674 • Published Oct 13, 2021 • 1

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Paper • 2006.03654 • Published Jun 5, 2020 • 3
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 19
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 9
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 16

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 19
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Papers (I want) To Read

A list of papers on my reading list.

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Paper • 2304.09842 • Published Apr 19, 2023 • 2
ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 27
Gorilla: Large Language Model Connected with Massive APIs

Paper • 2305.15334 • Published May 24, 2023 • 5
Reflexion: Language Agents with Verbal Reinforcement Learning

Paper • 2303.11366 • Published Mar 20, 2023 • 5

LLM Tech Report

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 373
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 151
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Paper • 2409.12122 • Published Sep 18, 2024 • 4
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 200

Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 243
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 21
nlpaueb/legal-bert-base-uncased

Fill-Mask • Updated Apr 28, 2022 • 5.27M • • 263

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 77
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 19
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 9
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 17

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Paper • 2006.03654 • Published Jun 5, 2020 • 3
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 19
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 9
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 16

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 254
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 87
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 16
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 14

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 19
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 21

Papers (I want) To Read

A list of papers on my reading list.

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Paper • 2304.09842 • Published Apr 19, 2023 • 2
ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 27
Gorilla: Large Language Model Connected with Massive APIs

Paper • 2305.15334 • Published May 24, 2023 • 5
Reflexion: Language Agents with Verbal Reinforcement Learning

Paper • 2303.11366 • Published Mar 20, 2023 • 5

LLM Post Training

Instruction Tuning for Large Language Models: A Survey

Paper • 2308.10792 • Published Aug 21, 2023 • 1
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Paper • 2403.14608 • Published Mar 21, 2024
Efficient Large Language Models: A Survey

Paper • 2312.03863 • Published Dec 6, 2023 • 4
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 32

LLM Tech Report

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 373
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 151
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Paper • 2409.12122 • Published Sep 18, 2024 • 4
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 200

LLM-Alignment Papers

Concrete Problems in AI Safety

Paper • 1606.06565 • Published Jun 21, 2016 • 1
The Off-Switch Game

Paper • 1611.08219 • Published Nov 24, 2016 • 1
Learning to summarize from human feedback

Paper • 2009.01325 • Published Sep 2, 2020 • 4
Truthful AI: Developing and governing AI that does not lie

Paper • 2110.06674 • Published Oct 13, 2021 • 1

Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 243
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 21
nlpaueb/legal-bert-base-uncased

Fill-Mask • Updated Apr 28, 2022 • 5.27M • • 263

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs