LLMs - a krypticmouse Collection

LLMs

updated Sep 25, 2023

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 84
Baichuan 2: Open Large-scale Language Models

Paper • 2309.10305 • Published Sep 19, 2023 • 22
Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 40
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 67
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Paper • 2309.12307 • Published Sep 21, 2023 • 90
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

Paper • 2309.11998 • Published Sep 21, 2023 • 26
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Paper • 2309.12284 • Published Sep 21, 2023 • 19
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

Paper • 2309.11568 • Published Sep 20, 2023 • 11
Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 40
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

Paper • 2309.09958 • Published Sep 18, 2023 • 20
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

Paper • 2309.09506 • Published Sep 18, 2023 • 15
Cure the headache of Transformers via Collinear Constrained Attention

Paper • 2309.08646 • Published Sep 15, 2023 • 14
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

Paper • 2309.08963 • Published Sep 16, 2023 • 11
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Paper • 2309.06497 • Published Sep 12, 2023 • 7
Sparse Autoencoders Find Highly Interpretable Features in Language Models

Paper • 2309.08600 • Published Sep 15, 2023 • 15
Agents: An Open-source Framework for Autonomous Language Agents

Paper • 2309.07870 • Published Sep 14, 2023 • 43
Ambiguity-Aware In-Context Learning with Large Language Models

Paper • 2309.07900 • Published Sep 14, 2023 • 5
Large Language Models for Compiler Optimization

Paper • 2309.07062 • Published Sep 11, 2023 • 25
Statistical Rejection Sampling Improves Preference Optimization

Paper • 2309.06657 • Published Sep 13, 2023 • 15
Efficient Memory Management for Large Language Model Serving with PagedAttention

Paper • 2309.06180 • Published Sep 12, 2023 • 38
Large Language Model for Science: A Study on P vs. NP

Paper • 2309.05689 • Published Sep 11, 2023 • 22
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Paper • 2309.08532 • Published Sep 15, 2023 • 54
Augmenting text for spoken language understanding with Large Language Models

Paper • 2309.09390 • Published Sep 17, 2023 • 2
Investigating Answerability of LLMs for Long-Form Question Answering

Paper • 2309.08210 • Published Sep 15, 2023 • 15
Replacing softmax with ReLU in Vision Transformers

Paper • 2309.08586 • Published Sep 15, 2023 • 19
Uncovering mesa-optimization algorithms in Transformers

Paper • 2309.05858 • Published Sep 11, 2023 • 14
Neurons in Large Language Models: Dead, N-gram, Positional

Paper • 2309.04827 • Published Sep 9, 2023 • 18
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

Paper • 2309.04564 • Published Sep 8, 2023 • 17
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Paper • 2309.05516 • Published Sep 11, 2023 • 11
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting

Paper • 2309.04269 • Published Sep 8, 2023 • 34
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 36
One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 34
Efficient RLHF: Reducing the Memory Usage of PPO

Paper • 2309.00754 • Published Sep 1, 2023 • 16
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 53
YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 80
RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models

Paper • 2308.07922 • Published Aug 15, 2023 • 19
CausalLM is not optimal for in-context learning

Paper • 2308.06912 • Published Aug 14, 2023 • 19
Self-Alignment with Instruction Backtranslation

Paper • 2308.06259 • Published Aug 11, 2023 • 43
Shepherd: A Critic for Language Model Generation

Paper • 2308.04592 • Published Aug 8, 2023 • 33
Accelerating LLM Inference with Staged Speculative Decoding

Paper • 2308.04623 • Published Aug 8, 2023 • 26
Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 82