krypticmouse's Collections: LLMs
Language Modeling Is Compression
Paper
• 2309.10668
• Published
• 84
Baichuan 2: Open Large-scale Language Models
Paper
• 2309.10305
• Published
• 22
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper
• 2309.11495
• Published
• 40
LMDX: Language Model-based Document Information Extraction and Localization
Paper
• 2309.10952
• Published
• 67
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper
• 2309.12307
• Published
• 90
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Paper
• 2309.11998
• Published
• 26
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper
• 2309.12284
• Published
• 19
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Paper
• 2309.11568
• Published
• 11
Contrastive Decoding Improves Reasoning in Large Language Models
Paper
• 2309.09117
• Published
• 40
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Paper
• 2309.09958
• Published
• 20
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models
Paper
• 2309.09506
• Published
• 15
Cure the headache of Transformers via Collinear Constrained Attention
Paper
• 2309.08646
• Published
• 14
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
Paper
• 2309.08963
• Published
• 11
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Paper
• 2309.06497
• Published
• 7
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Paper
• 2309.08600
• Published
• 15
Agents: An Open-source Framework for Autonomous Language Agents
Paper
• 2309.07870
• Published
• 43
Ambiguity-Aware In-Context Learning with Large Language Models
Paper
• 2309.07900
• Published
• 5
Large Language Models for Compiler Optimization
Paper
• 2309.07062
• Published
• 25
Statistical Rejection Sampling Improves Preference Optimization
Paper
• 2309.06657
• Published
• 15
Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper
• 2309.06180
• Published
• 38
Large Language Model for Science: A Study on P vs. NP
Paper
• 2309.05689
• Published
• 22
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper
• 2309.08532
• Published
• 54
Augmenting text for spoken language understanding with Large Language Models
Paper
• 2309.09390
• Published
• 2
Investigating Answerability of LLMs for Long-Form Question Answering
Paper
• 2309.08210
• Published
• 15
Replacing softmax with ReLU in Vision Transformers
Paper
• 2309.08586
• Published
• 19
Uncovering mesa-optimization algorithms in Transformers
Paper
• 2309.05858
• Published
• 14
Neurons in Large Language Models: Dead, N-gram, Positional
Paper
• 2309.04827
• Published
• 18
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
Paper
• 2309.04564
• Published
• 17
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper
• 2309.05516
• Published
• 11
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Paper
• 2309.04269
• Published
• 34
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper
• 2309.03883
• Published
• 36
One Wide Feedforward is All You Need
Paper
• 2309.01826
• Published
• 34
Efficient RLHF: Reducing the Memory Usage of PPO
Paper
• 2309.00754
• Published
• 16
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper
• 2309.00267
• Published
• 53
YaRN: Efficient Context Window Extension of Large Language Models
Paper
• 2309.00071
• Published
• 80
RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
Paper
• 2308.07922
• Published
• 19
CausalLM is not optimal for in-context learning
Paper
• 2308.06912
• Published
• 19
Self-Alignment with Instruction Backtranslation
Paper
• 2308.06259
• Published
• 43
Shepherd: A Critic for Language Model Generation
Paper
• 2308.04592
• Published
• 33
Accelerating LLM Inference with Staged Speculative Decoding
Paper
• 2308.04623
• Published
• 26
Adapting Large Language Models via Reading Comprehension
Paper
• 2309.09530
• Published
• 82