Papers

sugatoray 's Collections

OpenEnvs

Papers + RL/Reasoning

Marimo

RLMs (Reasoning Language Models)

Books And Notes

Reasoning Datasets

SmolAgents Tools (Spaces)

LLM Training Datasets

TFM: TimeSeries Foundation Models

Papers-Benchmarks

LLMs-EmbeddingModels

LLMs + Mamba

LLM + Datasets : Finance

updated about 6 hours ago

Large Language Model (LLM) and NLP related papers.

Upvote

LoRA+: Efficient Low Rank Adaptation of Large Models

Paper • 2402.12354 • Published Feb 19, 2024 • 7
The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20, 2024 • 24
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20, 2024 • 15
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

Paper • 2311.07463 • Published Nov 13, 2023 • 15
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Paper • 2305.07185 • Published May 12, 2023 • 10
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Paper • 2309.14717 • Published Sep 26, 2023 • 46
PEFTDebias : Capturing debiasing information using PEFTs

Paper • 2312.00434 • Published Dec 1, 2023 • 1
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers

Paper • 2402.01911 • Published Feb 2, 2024 • 2
Empirical Study of PEFT techniques for Winter Wheat Segmentation

Paper • 2310.01825 • Published Oct 3, 2023 • 2
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 60
L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ

Paper • 2402.04902 • Published Feb 7, 2024 • 5
Self-Instruct: Aligning Language Model with Self Generated Instructions

Paper • 2212.10560 • Published Dec 20, 2022 • 9
Efficient Training of Language Models to Fill in the Middle

Paper • 2207.14255 • Published Jul 28, 2022 • 1
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23, 2024 • 72
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Paper • 2402.17193 • Published Feb 27, 2024 • 26
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 628
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model

Paper • 2402.17412 • Published Feb 27, 2024 • 23
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Paper • 2402.16840 • Published Feb 26, 2024 • 25
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models

Paper • 2402.10524 • Published Feb 16, 2024 • 23
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models

Paper • 2402.10986 • Published Feb 16, 2024 • 82
StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 156
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 53
Training-Free Long-Context Scaling of Large Language Models

Paper • 2402.17463 • Published Feb 27, 2024 • 24
Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5, 2024 • 98
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs

Paper • 2403.02775 • Published Mar 5, 2024 • 13
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Paper • 2402.10176 • Published Feb 15, 2024 • 38
OneBit: Towards Extremely Low-bit Large Language Models

Paper • 2402.11295 • Published Feb 17, 2024 • 24
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7, 2024 • 65
Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11, 2024 • 91
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Paper • 2403.06504 • Published Mar 11, 2024 • 56
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Paper • 2403.06764 • Published Mar 11, 2024 • 27
V3D: Video Diffusion Models are Effective 3D Generators

Paper • 2403.06738 • Published Mar 11, 2024 • 30
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 64
MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12, 2024 • 77
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14, 2024 • 26
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 129
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14, 2024 • 79
ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12, 2024 • 72
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20, 2024 • 183
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19, 2024 • 58
Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model

Paper • 2206.14371 • Published Jun 29, 2022 • 3
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28, 2024 • 41
Model Stock: All we need is just a few fine-tuned models

Paper • 2403.19522 • Published Mar 28, 2024 • 14
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 108
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2, 2024 • 37
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

Paper • 2403.04797 • Published Mar 5, 2024 • 1
ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 101
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Paper • 2403.07816 • Published Mar 12, 2024 • 45
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Paper • 2402.01739 • Published Jan 29, 2024 • 28
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30, 2024 • 42
Latxa: An Open Language Model and Evaluation Suite for Basque

Paper • 2403.20266 • Published Mar 29, 2024 • 4
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Paper • 2404.05961 • Published Apr 9, 2024 • 66
More Agents Is All You Need

Paper • 2402.05120 • Published Feb 3, 2024 • 57
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Paper • 2404.00456 • Published Mar 30, 2024 • 4
Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15, 2024 • 90
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)

Paper • 2404.00579 • Published Mar 31, 2024 • 2
RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 17
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4, 2024 • 62
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications

Paper • 2404.13506 • Published Apr 21, 2024 • 1
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2, 2024 • 124
WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published May 2, 2024 • 65
Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30, 2024 • 80
Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Paper • 2405.04324 • Published May 7, 2024 • 26
Automating the Enterprise with Foundation Models

Paper • 2405.03710 • Published May 3, 2024 • 1
Stylus: Automatic Adapter Selection for Diffusion Models

Paper • 2404.18928 • Published Apr 29, 2024 • 15
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published May 14, 2024 • 34
tinyBenchmarks: evaluating LLMs with fewer examples

Paper • 2402.14992 • Published Feb 22, 2024 • 17
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Paper • 2405.09220 • Published May 15, 2024 • 27
Zero-Shot Tokenizer Transfer

Paper • 2405.07883 • Published May 13, 2024 • 5
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23, 2024 • 45
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20, 2024 • 41
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
Spectrum: Targeted Training on Signal to Noise Ratio

Paper • 2406.06623 • Published Jun 7, 2024 • 16
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Paper • 2406.08587 • Published Jun 12, 2024 • 16
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Paper • 2401.02994 • Published Jan 4, 2024 • 52
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Paper • 2406.07394 • Published Jun 11, 2024 • 29
The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published Jun 6, 2024 • 68
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks

Paper • 2406.12925 • Published Jun 14, 2024 • 25
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Paper • 2406.12034 • Published Jun 17, 2024 • 16
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 145
LiveMind: Low-latency Large Language Models with Simultaneous Inference

Paper • 2406.14319 • Published Jun 20, 2024 • 15
Extreme Compression of Large Language Models via Additive Quantization

Paper • 2401.06118 • Published Jan 11, 2024 • 14
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

Paper • 2312.08935 • Published Dec 14, 2023 • 4
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Paper • 2406.14546 • Published Jun 20, 2024 • 3
HyperZcdotZcdotW Operator Connects Slow-Fast Networks for Full Context Interaction

Paper • 2401.17948 • Published Jan 31, 2024 • 4
Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Paper • 2405.20233 • Published May 30, 2024 • 7
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 36
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12, 2024 • 140
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16, 2024 • 44
Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Paper • 2407.10960 • Published Jul 15, 2024 • 13
LAMBDA: A Large Model Based Data Agent

Paper • 2407.17535 • Published Jul 24, 2024 • 37
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON

Paper • 2407.15734 • Published Jul 22, 2024 • 1
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 35
Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows

Paper • 2406.16218 • Published Jun 23, 2024 • 3
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 102
Probabilistic Programming with Programmable Variational Inference

Paper • 2406.15742 • Published Jun 22, 2024 • 2
Running

114

Vision Papers

💻

114

All paper summaries read by Merve
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 72
Improving Retrieval Augmented Language Model with Self-Reasoning

Paper • 2407.19813 • Published Jul 29, 2024 • 6
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Paper • 2408.06195 • Published Aug 12, 2024 • 73
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability

Paper • 2408.07852 • Published Aug 14, 2024 • 16
Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144
Scaling Up Diffusion and Flow-based XGBoost Models

Paper • 2408.16046 • Published Aug 28, 2024 • 10
Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2, 2024 • 46
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28, 2024 • 38
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

Paper • 2409.06135 • Published Sep 10, 2024 • 16
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

Paper • 2409.05152 • Published Sep 8, 2024 • 32
Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 107
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 140
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6, 2024 • 63
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance

Paper • 2405.06682 • Published May 5, 2024 • 3
Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Paper • 2406.06592 • Published Jun 5, 2024 • 29
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

Paper • 2409.12957 • Published Sep 19, 2024 • 21
Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published Sep 20, 2024 • 45
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?

Paper • 2409.15277 • Published Sep 23, 2024 • 38
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 121
The Perfect Blend: Redefining RLHF with Mixture of Judges

Paper • 2409.20370 • Published Sep 30, 2024 • 7

Note HF TRL PR for the GCPO paper implementation: https://github.com/huggingface/trl/pull/2155
Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published Oct 1, 2024 • 151
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Paper • 2410.10563 • Published Oct 14, 2024 • 37
Benchmarking Agentic Workflow Generation

Paper • 2410.07869 • Published Oct 10, 2024 • 29
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 191
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published Nov 5, 2024 • 71
BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published Nov 7, 2024 • 69
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Paper • 2405.15071 • Published May 23, 2024 • 42
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

Paper • 2407.12327 • Published Jul 17, 2024 • 79
SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published Oct 31, 2024 • 24
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 127
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7, 2024 • 51
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Paper • 2411.04952 • Published Nov 7, 2024 • 29
LLM-KT: A Versatile Framework for Knowledge Transfer from Large Language Models to Collaborative Filtering

Paper • 2411.00556 • Published Nov 1, 2024 • 1
Direct Preference Optimization Using Sparse Feature-Level Constraints

Paper • 2411.07618 • Published Nov 12, 2024 • 17
LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 129
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Paper • 2411.11504 • Published Nov 18, 2024 • 24
Continuous Speculative Decoding for Autoregressive Image Generation

Paper • 2411.11925 • Published Nov 18, 2024 • 16
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87
Stream of Search (SoS): Learning to Search in Language

Paper • 2404.03683 • Published Apr 1, 2024 • 30
Does Prompt Formatting Have Any Impact on LLM Performance?

Paper • 2411.10541 • Published Nov 15, 2024 • 1
Understanding LLM Embeddings for Regression

Paper • 2411.14708 • Published Nov 22, 2024 • 2
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Paper • 2412.02592 • Published Dec 3, 2024 • 24
Personalized Multimodal Large Language Models: A Survey

Paper • 2412.02142 • Published Dec 3, 2024 • 13
VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 118
Evaluating Language Models as Synthetic Data Generators

Paper • 2412.03679 • Published Dec 4, 2024 • 47
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published Dec 5, 2024 • 71
TrAct: Making First-layer Pre-Activations Trainable

Paper • 2410.23970 • Published Oct 31, 2024 • 1
Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 94
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation

Paper • 2411.04997 • Published Nov 7, 2024 • 39
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published Dec 10, 2024 • 30
Generative Powers of Ten

Paper • 2312.02149 • Published Dec 4, 2023 • 8
Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 123
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers

Paper • 2412.12276 • Published Dec 16, 2024 • 15
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

Paper • 2412.13018 • Published Dec 17, 2024 • 41
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 163
How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 53
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture

Paper • 2412.11834 • Published Dec 16, 2024 • 8
Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 93
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Paper • 2412.13194 • Published Dec 17, 2024 • 12
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

Paper • 2412.12094 • Published Dec 16, 2024 • 11
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design

Paper • 2412.14590 • Published Dec 19, 2024 • 15
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

Paper • 2412.15797 • Published Dec 20, 2024 • 18
GUI Agents: A Survey

Paper • 2412.13501 • Published Dec 18, 2024 • 30
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published Dec 16, 2024 • 59
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 247
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

Paper • 2305.13062 • Published May 22, 2023 • 2
Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published Jan 8, 2025 • 95
Proactive Conversational Agents with Inner Thoughts

Paper • 2501.00383 • Published Dec 31, 2024 • 1
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Paper • 2501.01028 • Published Jan 2, 2025 • 19
Agentless: Demystifying LLM-based Software Engineering Agents

Paper • 2407.01489 • Published Jul 1, 2024 • 65
Transformer^2: Self-adaptive LLMs

Paper • 2501.06252 • Published Jan 9, 2025 • 55
Titans: Learning to Memorize at Test Time

Paper • 2501.00663 • Published Dec 31, 2024 • 31
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback

Paper • 2412.03578 • Published Nov 18, 2024 • 1
MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation

Paper • 2501.06713 • Published Jan 12, 2025 • 4
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

Paper • 2501.13928 • Published Jan 23, 2025 • 17
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Paper • 2501.17703 • Published Jan 29, 2025 • 59
TIGER-Lab/Qwen2.5-Math-7B-CFT

Text Generation • 8B • Updated Feb 2, 2025 • 47 • 8
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

Paper • 2501.18511 • Published Jan 30, 2025 • 20
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 448
s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31, 2025 • 125

Note Code: https://github.com/simplescaling/s1
Preference Leakage: A Contamination Problem in LLM-as-a-judge

Paper • 2502.01534 • Published Feb 3, 2025 • 40
Explaining Large Language Models Decisions Using Shapley Values

Paper • 2404.01332 • Published Mar 29, 2024 • 1
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 258
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles

Paper • 2502.01081 • Published Feb 3, 2025 • 13
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

Paper • 2502.03275 • Published Feb 5, 2025 • 18
Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published Feb 5, 2025 • 58
On Teacher Hacking in Language Model Distillation

Paper • 2502.02671 • Published Feb 4, 2025 • 18
Matryoshka Quantization

Paper • 2502.06786 • Published Feb 10, 2025 • 32
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!

Paper • 2502.07374 • Published Feb 11, 2025 • 40
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024 • 72
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 61
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging

Paper • 2502.05664 • Published Feb 8, 2025 • 24
DPO-Shift: Shifting the Distribution of Direct Preference Optimization

Paper • 2502.07599 • Published Feb 11, 2025 • 15
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Paper • 2502.09604 • Published Feb 13, 2025 • 37
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Paper • 2502.09560 • Published Feb 13, 2025 • 35
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14, 2025 • 127
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs

Paper • 2502.10454 • Published Feb 12, 2025 • 7
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10, 2025 • 152
Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published Feb 18, 2025 • 86
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

Paper • 2502.10248 • Published Feb 14, 2025 • 57
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?

Paper • 2502.12215 • Published Feb 17, 2025 • 16
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study

Paper • 2502.02481 • Published Feb 4, 2025 • 16
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

Paper • 2502.12853 • Published Feb 18, 2025 • 29
S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published Feb 20, 2025 • 63
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20, 2025 • 100
Indiana Jones: There Are Always Some Useful Ancient Relics

Paper • 2501.18628 • Published Jan 27, 2025 • 1
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Paper • 2502.16894 • Published Feb 24, 2025 • 33
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25, 2025 • 75
Levels of AGI for Operationalizing Progress on the Path to AGI

Paper • 2311.02462 • Published Nov 4, 2023 • 36
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures

Paper • 2502.05078 • Published Feb 7, 2025 • 3
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published Mar 3, 2025 • 90
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition

Paper • 2503.00735 • Published Mar 2, 2025 • 23
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

Paper • 2503.01183 • Published Mar 3, 2025 • 29
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6, 2025 • 113
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

Paper • 2503.01935 • Published Mar 3, 2025 • 30
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Paper • 2503.05592 • Published Mar 7, 2025 • 27
MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published Feb 19, 2025 • 48
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning

Paper • 2503.05379 • Published Mar 7, 2025 • 38
A Survey on Post-training of Large Language Models

Paper • 2503.06072 • Published Mar 8, 2025 • 11
LettuceDetect: A Hallucination Detection Framework for RAG Applications

Paper • 2502.17125 • Published Feb 24, 2025 • 13
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published Mar 10, 2025 • 48
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs

Paper • 2503.02003 • Published Mar 3, 2025 • 48
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12, 2025 • 77
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172
2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 22
Gemini Embedding: Generalizable Embeddings from Gemini

Paper • 2503.07891 • Published Mar 10, 2025 • 46
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 156
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
Manify: A Python Library for Learning Non-Euclidean Representations

Paper • 2503.09576 • Published Mar 12, 2025 • 1
Tokenize Image as a Set

Paper • 2503.16425 • Published Mar 20, 2025 • 16
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Paper • 2503.16057 • Published Mar 20, 2025 • 14
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning

Paper • 2503.16252 • Published Mar 20, 2025 • 32
SynCity: Training-Free Generation of 3D Worlds

Paper • 2503.16420 • Published Mar 20, 2025 • 27
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Paper • 2503.16219 • Published Mar 20, 2025 • 52
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

Paper • 2405.15306 • Published May 24, 2024 • 10
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20, 2025 • 96
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

Paper • 2503.13358 • Published Mar 17, 2025 • 95
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Paper • 2503.16418 • Published Mar 20, 2025 • 36
Defeating Prompt Injections by Design

Paper • 2503.18813 • Published Mar 24, 2025 • 25
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27, 2025 • 62
Video-R1: Reinforcing Video Reasoning in MLLMs

Paper • 2503.21776 • Published Mar 27, 2025 • 79
CLS-RL: Image Classification with Rule-Based Reinforcement Learning

Paper • 2503.16188 • Published Mar 20, 2025 • 13
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation

Paper • 2503.21729 • Published Mar 27, 2025 • 29
InSTA: Towards Internet-Scale Training For Agents

Paper • 2502.06776 • Published Feb 10, 2025 • 9
Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Paper • 2503.21460 • Published Mar 27, 2025 • 83
YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2, 2025 • 23
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Paper • 2504.00595 • Published Apr 1, 2025 • 37
Reinforcement Learning: An Overview

Paper • 2412.05265 • Published Dec 6, 2024 • 8
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Paper • 2504.03601 • Published Apr 4, 2025 • 18
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published Apr 7, 2025 • 26
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31, 2025 • 305
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11, 2025 • 33
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Paper • 2504.08736 • Published Apr 11, 2025 • 46
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14, 2025 • 15
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 308
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance

Paper • 2504.08716 • Published Apr 11, 2025 • 9
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published Apr 15, 2025 • 20
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Paper • 2504.05303 • Published Apr 7, 2025 • 5
OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21, 2025 • 35
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

Paper • 2504.18415 • Published Apr 25, 2025 • 50
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models

Paper • 2504.15716 • Published Apr 22, 2025 • 12
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

Paper • 2504.17789 • Published Apr 24, 2025 • 23
WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Paper • 2504.21776 • Published Apr 30, 2025 • 59
DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1, 2025 • 54
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Paper • 2504.19413 • Published Apr 28, 2025 • 52
Towards Effective Extraction and Evaluation of Factual Claims

Paper • 2502.10855 • Published Feb 15, 2025 • 3
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency

Paper • 2505.01658 • Published May 3, 2025 • 40
Benchmarking LLMs' Swarm intelligence

Paper • 2505.04364 • Published May 7, 2025 • 20
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Paper • 2505.07608 • Published May 12, 2025 • 83
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14, 2025 • 76
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

Paper • 2505.12504 • Published May 18, 2025 • 24
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

Paper • 2505.11896 • Published May 17, 2025 • 58
Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20, 2025 • 76
Reward Reasoning Model

Paper • 2505.14674 • Published May 20, 2025 • 37
MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21, 2025 • 98
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

Paper • 2505.15656 • Published May 21, 2025 • 15
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

Paper • 2505.17399 • Published May 23, 2025 • 14
Asymptotics of Language Model Alignment

Paper • 2404.01730 • Published Apr 2, 2024 • 1
RL with KL penalties is better viewed as Bayesian inference

Paper • 2205.11275 • Published May 23, 2022 • 1
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization

Paper • 2505.19000 • Published May 25, 2025 • 42
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection

Paper • 2505.20289 • Published May 26, 2025 • 10
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Paper • 2505.15778 • Published May 21, 2025 • 19
ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Paper • 2505.23762 • Published May 29, 2025 • 45
Table-R1: Inference-Time Scaling for Table Reasoning

Paper • 2505.23621 • Published May 29, 2025 • 93
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Paper • 2506.07530 • Published Jun 9, 2025 • 20
Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning

Paper • 2506.09501 • Published Jun 11, 2025 • 20
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Paper • 2506.09250 • Published Jun 10, 2025 • 27
Magistral

Paper • 2506.10910 • Published Jun 12, 2025 • 68
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16, 2025 • 94
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Paper • 2506.14761 • Published Jun 17, 2025 • 17
Data Efficacy for Language Model Training

Paper • 2506.21545 • Published Jun 26, 2025 • 11
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1, 2025 • 79
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Paper • 2507.01925 • Published Jul 2, 2025 • 39
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Paper • 2507.02813 • Published Jul 3, 2025 • 60
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers

Paper • 2507.02694 • Published Jul 3, 2025 • 19
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Paper • 2507.01955 • Published Jul 2, 2025 • 36
SingLoRA: Low Rank Adaptation Using a Single Matrix

Paper • 2507.05566 • Published Jul 8, 2025 • 116
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Paper • 2507.05687 • Published Jul 8, 2025 • 31
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14, 2025 • 90
Empirical evidence of Large Language Model's influence on human spoken communication

Paper • 2409.01754 • Published Sep 3, 2024 • 1
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Paper • 2507.09477 • Published Jul 13, 2025 • 88
KV Cache Steering for Inducing Reasoning in Small Language Models

Paper • 2507.08799 • Published Jul 11, 2025 • 40
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

Paper • 2507.12142 • Published Jul 16, 2025 • 36
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search

Paper • 2507.15245 • Published Jul 21, 2025 • 11
Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published Jul 28, 2025 • 32
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Paper • 2507.23726 • Published Jul 31, 2025 • 115
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

Paper • 2506.20639 • Published Jun 25, 2025 • 31
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8, 2025 • 211
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Paper • 2508.14460 • Published Aug 20, 2025 • 85
Clio: Privacy-Preserving Insights into Real-World AI Use

Paper • 2412.13678 • Published Dec 18, 2024 • 1
AWorld: Orchestrating the Training Recipe for Agentic AI

Paper • 2508.20404 • Published Aug 28, 2025 • 38
Hermes 4 Technical Report

Paper • 2508.18255 • Published Aug 25, 2025 • 49
Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents

Paper • 2509.06917 • Published Sep 8, 2025 • 44
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Paper • 2509.12201 • Published Sep 15, 2025 • 107
ARE: Scaling Up Agent Environments and Evaluations

Paper • 2509.17158 • Published Sep 21, 2025 • 36
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 153
Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6, 2025 • 120
Training-Free Group Relative Policy Optimization

Paper • 2510.08191 • Published Oct 9, 2025 • 46
AutoPR: Let's Automate Your Academic Promotion!

Paper • 2510.09558 • Published Oct 10, 2025 • 53
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16, 2025 • 124
BitNet Distillation

Paper • 2510.13998 • Published Oct 15, 2025 • 59
Dr.LLM: Dynamic Layer Routing in LLMs

Paper • 2510.12773 • Published Oct 14, 2025 • 32
The Art of Scaling Reinforcement Learning Compute for LLMs

Paper • 2510.13786 • Published Oct 15, 2025 • 33
The Free Transformer

Paper • 2510.17558 • Published Oct 20, 2025 • 6
Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published Oct 30, 2025 • 132
HunyuanOCR Technical Report

Paper • 2511.19575 • Published Nov 24, 2025 • 22
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review

Paper • 2510.08867 • Published Oct 9, 2025 • 6
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Paper • 2512.15603 • Published Dec 17, 2025 • 69
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Paper • 2512.17351 • Published Dec 19, 2025 • 28
TimeBill: Time-Budgeted Inference for Large Language Models

Paper • 2512.21859 • Published Dec 26, 2025 • 25
Benchmark^2: Systematic Evaluation of LLM Benchmarks

Paper • 2601.03986 • Published Jan 7 • 34
AI-Researcher: Autonomous Scientific Innovation

Paper • 2505.18705 • Published May 24, 2025 • 1
AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure

Paper • 2504.03648 • Published Feb 22, 2025 • 1
Qwen3-TTS Technical Report

Paper • 2601.15621 • Published Jan 22 • 75
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Paper • 2601.18778 • Published Jan 26 • 42
Accelerating Scientific Research with Gemini: Case Studies and Common Techniques

Paper • 2602.03837 • Published Feb 3 • 5
AttentionRetriever: Attention Layers are Secretly Long Document Retrievers

Paper • 2602.12278 • Published Feb 12 • 2
jina-embeddings-v5-text: Task-Targeted Embedding Distillation

Paper • 2602.15547 • Published Feb 17 • 26
Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 130
Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought

Paper • 2510.24941 • Published Oct 28, 2025 • 4
Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned

Paper • 2603.05344 • Published Mar 5 • 7
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

Paper • 2603.16448 • Published Mar 17 • 58
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2512.20848 • Published Dec 23, 2025 • 42
NVIDIA Nemotron 3: Efficient and Open Intelligence

Paper • 2512.20856 • Published Dec 24, 2025 • 43
FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

Paper • 2603.24943 • Published 28 days ago • 12
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

Paper • 2604.14228 • Published 9 days ago • 23
PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing

Paper • 2604.05018 • Published 17 days ago • 2

Upvote

Collection guide
Browse collections

Papers

Vision Papers