jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition Paper • 2605.08384 • Published 15 days ago • 10
jina-embeddings-v5-omni Collection Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each. • 27 items • Updated 10 days ago • 36
Article DenseOn with the LateOn: Open State-of-the-Art Single and Multi-Vector Models lightonai • Apr 21 • 38
DenseOn & LateOn Collection A collection of open state-of-the-art single and multi-vector models • 7 items • Updated about 1 month ago • 10
CodeScout Collection RL-trained code search agents (1.7B, 4B, 14B) that outperform 2–18× larger models using only a Unix terminal. 📄 arxiv.org/abs/2603.17829 • 12 items • Updated Mar 19 • 8
ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models Paper • 2602.16609 • Published Feb 18 • 7
artificial-hivemind Collection This collection contains datasets for the Artificial Hiveminds paper. • 4 items • Updated May 16, 2025 • 16
LightOnOCR-2 🦉 Collection LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated Apr 7 • 24
Sarashina2.2 Collection Large Language Models developed by SB Intuitions. Pretrained and instruction-tuned models are available in three sizes: 0.5B, 1B, and 3B. • 6 items • Updated Mar 5, 2025 • 10
Article Introducing RTEB: A New Standard for Retrieval Evaluation +4 fzliu, KennethEnevoldsen, Samoed, isaacchung, tomaarsen, fzoll • Oct 1, 2025 • 144
Article Welcome EmbeddingGemma, Google's new efficient embedding model +4 tomaarsen, Xenova, alvarobartt, ariG23498, pcuenq, sergiopaniego • Sep 4, 2025 • 274
Article Training and Finetuning Sparse Embedding Models with Sentence Transformers tomaarsen, arthurbresnu • Jul 1, 2025 • 138
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents Paper • 2504.13128 • Published Apr 17, 2025 • 7
AceMath Collection We are releasing math instruction models, math reward models, general instruction models, all training datasets, and a math reward benchmark. • 11 items • Updated 3 days ago • 18