Zijie Xin

xxayt

https://xxayt.github.io/

xxayt

AI & ML interests

multi-modal learning, AIGC

Recent Activity

upvoted a paper 13 days ago

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos

Paper • 2603.14145 • Published 25 days ago • 14

authored a paper 14 days ago

SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval

Paper • 2603.08224 • Published about 1 month ago • 1

upvoted a paper 14 days ago

SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval

Paper • 2603.08224 • Published about 1 month ago • 1

upvoted a paper 4 months ago

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Paper • 2511.14582 • Published Nov 18, 2025 • 19

upvoted a collection 4 months ago

Qwen3-Omni

Collection

6 items • Updated Dec 31, 2025 • 195

authored a paper 6 months ago

Multi-Object Sketch Animation by Scene Decomposition and Motion Planning

Paper • 2503.19351 • Published Mar 25, 2025 • 1

upvoted a paper 6 months ago

Multi-Object Sketch Animation by Scene Decomposition and Motion Planning

Paper • 2503.19351 • Published Mar 25, 2025 • 1

updated a collection 7 months ago

MGSV

Collection

[ICCV 2025] Music Grounding by Short Video • 3 items • Updated Sep 9, 2025 • 1

upvoted a collection 7 months ago

MGSV

Collection

[ICCV 2025] Music Grounding by Short Video • 3 items • Updated Sep 9, 2025 • 1

liked a dataset 7 months ago

NVEagle/VideoITG-40K

Updated Aug 8, 2025 • 81 • 3

liked a model 7 months ago

OpenGVLab/InternVL_2_5_HiCo_R16

Video-Text-to-Text • 8B • Updated Feb 13, 2025 • 205 • 6

upvoted a paper 8 months ago

Music Grounding by Short Video

Paper • 2408.16990 • Published Aug 30, 2024 • 2

authored a paper 8 months ago

Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search

Paper • 2508.02340 • Published Aug 4, 2025

updated a dataset 8 months ago

xxayt/MGSV-EC

Updated Aug 5, 2025 • 53.2k • 22 • 2

upvoted 2 papers 8 months ago

VideoDeepResearch: Long Video Understanding With Agentic Tool Using

Paper • 2506.10821 • Published Jun 12, 2025 • 19

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

Paper • 2410.19702 • Published Oct 25, 2024 • 1

New activity in xxayt/MGSV-EC 12 months ago

[bot] Conversion to Parquet

#2 opened about 1 year ago by parquet-converter

liked a dataset 12 months ago

TheEighthDay/SeekWorld

Updated Apr 20, 2025 • 129 • 6