LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model Paper • 2603.01068 • Published 5 days ago • 19
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 68
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published 3 days ago • 79
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published 22 days ago • 20
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models Paper • 2602.13191 • Published 21 days ago • 30
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published 25 days ago • 50
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning Paper • 2602.12099 • Published 22 days ago • 57
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery Paper • 2601.19325 • Published Jan 27 • 79
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published Jan 15 • 29
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset Paper • 2601.10305 • Published Jan 15 • 36
Transformers v5: Simple model definitions powering the AI ecosystem Article • Published Dec 1, 2025 • 304
SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead Paper • 2512.00903 • Published Nov 30, 2025 • 7
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 186
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 93
Cambrian-S-Data Collection • Data used during Cambrian-S's 4-stage training • 4 items • Updated 7 days ago • 5