Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets Paper • 2602.22207 • Published 9 days ago • 39
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI 15 days ago • 478
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published Feb 26, 2025 • 22
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs Paper • 2504.04030 • Published Apr 5, 2025 • 3
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling 22 days ago • 47
Open Coding Agents Specialization Collection Ai2 Open Coding Agents - Django, Sphinx, Sympy Data • 6 items • Updated 23 days ago • 2
Multilingual PII & De-Identification Collection Multilingual models for extracting PII entities and de-identifying clinical text, with support for HIPAA and GDPR compliance. • 140 items • Updated 4 days ago • 21
Article compar:IA Ranking: From User Votes to a Participatory Model Ranking Nov 3, 2025 • 7
compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data Paper • 2602.06669 • Published 28 days ago • 7
Article Community Evals: Because we're done trusting black-box leaderboards over the community about 1 month ago • 85
Instruction Pretrained Experiments Collection Experiments associated with the paper 'Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study' • 3 items • Updated Dec 11, 2025 • 1
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 109
MMFineReason Collection High-quality STEM reasoning dataset for Multimodal LLM post-training. • 8 items • Updated 4 days ago • 22