DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search • 2408.08152 • Published • 61
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition • 2402.15220 • Published • 20
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • 2402.19427 • Published • 56
Note: similar to https://huggingface.co/papers/2402.18668
Simple linear attention language models balance the recall-throughput tradeoff • 2402.18668 • Published • 20
Linear Transformers are Versatile In-Context Learners • 2402.14180 • Published • 7
Scaling Laws for Fine-Grained Mixture of Experts • 2402.07871 • Published • 13
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models • 2402.07033 • Published • 19
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization • 2401.18079 • Published • 8
Note: somewhat similar to https://arxiv.org/pdf/2402.02750.pdf
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback • 2402.01391 • Published • 43
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models • 2402.01739 • Published • 28
Note: QMoE - https://arxiv.org/pdf/2310.16795.pdf
SliceGPT: Compress Large Language Models by Deleting Rows and Columns • 2401.15024 • Published • 73
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models • 2402.03300 • Published • 141
Repeat After Me: Transformers are Better than State Space Models at Copying • 2402.01032 • Published • 24
LongAlign: A Recipe for Long Context Alignment of Large Language Models • 2401.18058 • Published • 24
Can Large Language Models Understand Context? • 2402.00858 • Published • 24
WARM: On the Benefits of Weight Averaged Reward Models • 2401.12187 • Published • 19
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text • 2401.12070 • Published • 45
Zero Bubble Pipeline Parallelism • 2401.10241 • Published • 25
Self-Rewarding Language Models • 2401.10020 • Published • 152
Specialized Language Models with Cheap Inference from Limited Domain Data • 2402.01093 • Published • 47
ReFT: Reasoning with Reinforced Fine-Tuning • 2401.08967 • Published • 31
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models • 2401.06951 • Published • 26
Tuning Language Models by Proxy • 2401.08565 • Published • 22
Extending LLMs' Context Window with 100 Samples • 2401.07004 • Published • 16
Secrets of RLHF in Large Language Models Part II: Reward Modeling • 2401.06080 • Published • 28
Efficient LLM inference solution on Intel GPU • 2401.05391 • Published • 11
The Impact of Reasoning Step Length on Large Language Models • 2401.04925 • Published • 18
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models • 2401.04658 • Published • 27
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM • 2401.02994 • Published • 52
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon • 2401.03462 • Published • 28
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models • 2401.01335 • Published • 68
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning • 2401.01325 • Published • 27
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • 2312.15166 • Published • 61
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU • 2312.12456 • Published • 45
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty • 2401.15077 • Published • 20
OLMo: Accelerating the Science of Language Models • 2402.00838 • Published • 85
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research • 2402.00159 • Published • 65
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks • 2402.04248 • Published • 32
LiPO: Listwise Preference Optimization through Learning-to-Rank • 2402.01878 • Published • 20
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs • 2402.04291 • Published • 50
Direct Language Model Alignment from Online AI Feedback • 2402.04792 • Published • 35
Hydragen: High-Throughput LLM Inference with Shared Prefixes • 2402.05099 • Published • 20
Model Editing with Canonical Examples • 2402.06155 • Published • 13
SubGen: Token Generation in Sublinear Time and Memory • 2402.06082 • Published • 11
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning • 2402.06332 • Published • 19
ODIN: Disentangled Reward Mitigates Hacking in RLHF • 2402.07319 • Published • 14
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts • 2402.07625 • Published • 16
Suppressing Pink Elephants with Direct Principle Feedback • 2402.07896 • Published • 11
Buffer Overflow in Mixture of Experts • 2402.05526 • Published • 9
Speculative Streaming: Fast LLM Inference without Auxiliary Models • 2402.11131 • Published • 42
Linear Transformers with Learnable Kernel Functions are Better In-Context Models • 2402.10644 • Published • 81
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration • 2402.11550 • Published • 19
BitDelta: Your Fine-Tune May Only Be Worth One Bit • 2402.10193 • Published • 21
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens • 2402.13753 • Published • 116
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • 2402.17764 • Published • 627
FuseChat: Knowledge Fusion of Chat Models • 2402.16107 • Published • 39
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs • 2402.15627 • Published • 36
Do Large Language Models Latently Perform Multi-Hop Reasoning? • 2402.16837 • Published • 29
Orca-Math: Unlocking the potential of SLMs in Grade School Math • 2402.14830 • Published • 24
GPTVQ: The Blessing of Dimensionality for LLM Quantization • 2402.15319 • Published • 22
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping • 2402.14083 • Published • 47
TinyLLaVA: A Framework of Small-scale Large Multimodal Models • 2402.14289 • Published • 20
OneBit: Towards Extremely Low-bit Large Language Models • 2402.11295 • Published • 24
AtP*: An efficient and scalable method for localizing LLM behaviour to components • 2403.00745 • Published • 14
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM • 2403.07816 • Published • 44
MoAI: Mixture of All Intelligence for Large Language and Vision Models • 2403.07508 • Published • 77
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU • 2403.06504 • Published • 56
ReALM: Reference Resolution As Language Modeling • 2403.20329 • Published • 22
sDPO: Don't Use Your Data All at Once • 2403.19270 • Published • 41
Long-form factuality in large language models • 2403.18802 • Published • 26
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models • 2403.13372 • Published • 179
Evolutionary Optimization of Model Merging Recipes • 2403.13187 • Published • 58
PERL: Parameter Efficient Reinforcement Learning from Human Feedback • 2403.10704 • Published • 60
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • 2404.08801 • Published • 66
Pre-training Small Base LMs with Fewer Tokens • 2404.08634 • Published • 36
Dataset Reset Policy Optimization for RLHF • 2404.08495 • Published • 9
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework • 2405.11143 • Published • 41
Bootstrapping Language Models with DPO Implicit Rewards • 2406.09760 • Published • 41
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model • 2408.11039 • Published • 63
Learning to Reason under Off-Policy Guidance • 2504.14945 • Published • 88