Data
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Paper
• 2401.12474
• Published
• 36
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper
• 2403.12968
• Published
• 25
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
Paper
• 2310.00746
• Published
• 1
LESS: Selecting Influential Data for Targeted Instruction Tuning
Paper
• 2402.04333
• Published
• 3
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Paper
• 2306.01116
• Published
• 43
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Paper
• 2406.17557
• Published
• 100
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper
• 2406.20094
• Published
• 104
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper
• 2406.08464
• Published
• 71
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Paper
• 2407.18129
• Published
• 12
Meltemi: The first open Large Language Model for Greek
Paper
• 2407.20743
• Published
• 68
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Paper
• 2407.19594
• Published
• 21
Paper
• 2408.05366
• Published
• 14
Synth-Empathy: Towards High-Quality Synthetic Empathy Data
Paper
• 2407.21669
• Published
DiaSynth -- Synthetic Dialogue Generation Framework
Paper
• 2409.19020
• Published
• 20
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models
Paper
• 2410.18505
• Published
• 11
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
Paper
• 2404.02823
• Published
• 3
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper
• 2501.04519
• Published
• 288