BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment Paper • 2603.23883 • Published 25 days ago
AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference Paper • 2603.22053 • Published 26 days ago
AlignBench: Benchmarking Fine-Grained Image-Text Alignment with Synthetic Image-Caption Pairs Paper • 2511.20515 • Published Nov 25, 2025
AgroBench: Vision-Language Model Benchmark in Agriculture Paper • 2507.20519 • Published Jul 28, 2025
Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention Paper • 2509.09116 • Published Sep 11, 2025
Photo-Realistic Monocular Gaze Redirection Using Generative Adversarial Networks Paper • 1903.12530 • Published Mar 29, 2019
ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation Paper • 2007.15837 • Published Jul 31, 2020
UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training Paper • 2502.02307 • Published Feb 4, 2025
ActionVOS: Actions as Prompts for Video Object Segmentation Paper • 2407.07402 • Published Jul 10, 2024
LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing Paper • 2508.03144 • Published Aug 5, 2025