Vision Transformers
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Paper
• 2309.04354
• Published
Vision Transformers Need Registers
Paper
• 2309.16588
• Published
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper
• 2309.16414
• Published
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Paper
• 2309.16534
• Published
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper
• 2201.12086
• Published
FiT: Flexible Vision Transformer for Diffusion Model
Paper
• 2402.12376
• Published
Subobject-level Image Tokenization
Paper
• 2402.14327
• Published
Scalable Diffusion Models with Transformers
Paper
• 2212.09748
• Published
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Paper
• 2408.04840
• Published
Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM
Paper
• 2408.07246
• Published