Submitted by akhaliq 34 Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory · 4 authors
Submitted by akhaliq 25 Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning · 8 authors 194
Submitted by akhaliq 23 Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding · 45 authors 4.29k 2
Submitted by akhaliq 18 Understanding the performance gap between online and offline alignment algorithms · 11 authors
Submitted by akhaliq 17 Compositional Text-to-Image Generation with Dense Blob Representations · 6 authors 1
Submitted by akhaliq 15 No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding · 5 authors
Submitted by akhaliq 12 SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models · 14 authors