Building MoulSot: How We Curated 1,500 Hours of Moroccan Darija Speech, Selected the Best 80 Hours for Transcription, and Fine-Tuned Qwen3-ASR on Top of It Article • about 10 hours ago
Atlaset Dataset for Moroccan Darija: From Data Collection, Analysis, to Model Trainings Article • Mar 6, 2025
MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies Paper • 2502.00894 • Published Feb 2, 2025