RelayGen: Intra-Generation Model Switching for Efficient Reasoning (arXiv:2602.06454)
LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents (arXiv:2602.01053)
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection (arXiv:2602.03216)
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models (arXiv:2509.17428, published Sep 22, 2025)