Our Hugging Face–ready nano translation dataset covers a diverse set of languages, including Turkish, English, German, French, Spanish, Italian, Portuguese, Dutch, Russian, Ukrainian, Polish, Czech, Slovak, Hungarian, Romanian, Bulgarian, Greek, Arabic, Persian, Hebrew, Hindi, Bengali, Urdu, Tamil, Telugu, Kannada, Malayalam, Chinese, Japanese, Korean, Indonesian, Malay, Thai, Vietnamese, and several Nordic and Baltic languages. It consists of approximately 600 lines of synthetically generated sentence pairs on a wide range of everyday topics, which makes it lightweight yet versatile for experimenting with, prototyping, and benchmarking multilingual translation models. Its compact size allows quick training iterations and easy integration into low-resource or edge-based NLP workflows, while still providing enough linguistic variety to test generalization across multiple language families. To use it: pthinc/BCE-Prettybird-Nano-OWL-v0.1
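For a quick start, here is a minimal sketch of pulling the data and carving out a small evaluation set. It assumes the repo id above is a dataset repo that loads with the `datasets` library and exposes a `train` split; neither is confirmed by this post, so check the dataset card first. The helper names `load_rows` and `holdout_split` are made up for illustration.

```python
import random

DATASET_ID = "pthinc/BCE-Prettybird-Nano-OWL-v0.1"  # repo id taken from the post


def load_rows(dataset_id=DATASET_ID, split="train"):
    """Download the dataset from the Hugging Face Hub and return its rows as dicts.

    Assumption: the repo loads via `datasets.load_dataset` and has a "train"
    split. Adjust `split` (or pass a config name) if the dataset card says otherwise.
    """
    # Imported lazily so the rest of the module works without `datasets` installed.
    from datasets import load_dataset

    return list(load_dataset(dataset_id, split=split))


def holdout_split(rows, eval_fraction=0.1, seed=0):
    """Shuffle rows deterministically and split off a small evaluation set.

    Returns (train_rows, eval_rows). At least one row always goes to eval.
    """
    rows = list(rows)  # copy so the caller's sequence is not mutated
    random.Random(seed).shuffle(rows)
    n_eval = max(1, int(len(rows) * eval_fraction))
    return rows[n_eval:], rows[:n_eval]
```

Typical use would be `train_rows, eval_rows = holdout_split(load_rows())`. With roughly 600 rows, the default fraction leaves about 60 pairs for evaluation, which is small, so treat any scores as a sanity check and not as a benchmark result.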