
Aleksei Dorkin PRO

adorkin


AI & ML interests

Computational Linguistics

Recent Activity

liked a model about 3 hours ago

ilsp/Llama-Krikri-8B-Instruct

updated a collection 1 day ago

Code SFT Datasets

updated a collection 3 days ago

Code SFT Datasets


upvoted a paper 3 days ago

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Paper • 2602.22207 • Published 9 days ago • 39

upvoted an article 12 days ago

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

15 days ago • 478

upvoted a paper 14 days ago

Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs

Paper • 2502.19413 • Published Feb 26, 2025 • 22

upvoted a collection 17 days ago

code_rlcef

8 items • Updated Jun 1, 2025 • 2

upvoted a paper 17 days ago

OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs

Paper • 2504.04030 • Published Apr 5, 2025 • 3

upvoted an article 21 days ago

Article

LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling

22 days ago • 47

upvoted a collection 21 days ago

UltraData

Ultra Scale, Ultra Quality, Ultra Coverage • 9 items • Updated 25 days ago • 77

upvoted a paper 23 days ago

EuroLLM-22B: Technical Report

Paper • 2602.05879 • Published 29 days ago • 3

upvoted a collection 23 days ago

Open Coding Agents Specialization

Ai2 Open Coding Agents - Django, Sphinx, Sympy Data • 6 items • Updated 23 days ago • 2

upvoted a changelog 23 days ago

Hugging Face Changelog

Scoped Full-text Search

24 days ago • 78

upvoted a collection 23 days ago

Multilingual PII & De-Identification

Multilingual models for extracting PII entities and de-identifying clinical text, with support for HIPAA and GDPR compliance. • 140 items • Updated 4 days ago • 21

upvoted an article 24 days ago

Article

Classement compar:IA : des votes des utilisateurs au classement participatif des modèles (compar:IA ranking: from user votes to a participatory ranking of models)

Nov 3, 2025 • 7

upvoted a paper 24 days ago

compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data

Paper • 2602.06669 • Published 28 days ago • 7

upvoted 2 articles 28 days ago

Article

Community Evals: Because we're done trusting black-box leaderboards over the community

about 1 month ago • 85

Article

🇵🇭 FilBench - Can LLMs Understand and Generate Filipino?

Aug 12, 2025 • 23

upvoted a changelog 28 days ago

Hugging Face Changelog

Community Evals and Benchmark Repositories

29 days ago • 67

upvoted a collection 29 days ago

Instruction Pretrained Experiments

Experiments associated with the paper 'Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study' • 3 items • Updated Dec 11, 2025 • 1

upvoted a paper about 1 month ago

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper • 2601.22975 • Published Jan 30 • 109

upvoted 2 collections about 1 month ago

Open Coding Agents

13 items • Updated 1 day ago • 49

MMFineReason

High-quality STEM reasoning dataset for Multimodal LLM post-training. • 8 items • Updated 4 days ago • 22