Gen AI Diffusion
Animate-X: Universal Character Image Animation with Enhanced Motion
Representation
Paper
• 2410.10306
• Published
• 56
ReCapture: Generative Video Camera Controls for User-Provided Videos
using Masked Video Fine-Tuning
Paper
• 2411.05003
• Published
• 71
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for
Image-to-Video Generation
Paper
• 2411.04709
• Published
• 26
IterComp: Iterative Composition-Aware Feedback Learning from Model
Gallery for Text-to-Image Generation
Paper
• 2410.07171
• Published
• 43
Story-Adapter: A Training-free Iterative Framework for Long Story
Visualization
Paper
• 2410.06244
• Published
• 20
How Far is Video Generation from World Model: A Physical Law Perspective
Paper
• 2411.02385
• Published
• 34
Training-free Regional Prompting for Diffusion Transformers
Paper
• 2411.02395
• Published
• 25
AutoVFX: Physically Realistic Video Editing from Natural Language
Instructions
Paper
• 2411.02394
• Published
• 16
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse
Autoencoders
Paper
• 2410.22366
• Published
• 84
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Paper
• 2410.10812
• Published
• 18
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise
Motion Control
Paper
• 2410.13830
• Published
• 26
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion
Models
Paper
• 2411.05007
• Published
• 24
Add-it: Training-Free Object Insertion in Images With Pretrained
Diffusion Models
Paper
• 2411.07232
• Published
• 68
OmniEdit: Building Image Editing Generalist Models Through Specialist
Supervision
Paper
• 2411.07199
• Published
• 50
MagicQuill: An Intelligent Interactive Image Editing System
Paper
• 2411.09703
• Published
• 80
AnimateAnything: Consistent and Controllable Animation for Video
Generation
Paper
• 2411.10836
• Published
• 24
Stylecodes: Encoding Stylistic Information For Image Generation
Paper
• 2411.12811
• Published
• 12
VideoRepair: Improving Text-to-Video Generation via Misalignment
Evaluation and Localized Refinement
Paper
• 2411.15115
• Published
• 10
Style-Friendly SNR Sampler for Style-Driven Generation
Paper
• 2411.14793
• Published
• 39
OminiControl: Minimal and Universal Control for Diffusion Transformer
Paper
• 2411.15098
• Published
• 61
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Paper
• 2412.01169
• Published
• 13
SNOOPI: Supercharged One-step Diffusion Distillation with Proper
Guidance
Paper
• 2412.02687
• Published
• 113
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic
Adversarial Training
Paper
• 2412.02030
• Published
• 19
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with
Mixture of Score Guidance
Paper
• 2412.05355
• Published
• 8
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution
Image Synthesis
Paper
• 2412.04431
• Published
• 17
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for
Customized Manga Generation
Paper
• 2412.07589
• Published
• 48
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion
Models
Paper
• 2412.07674
• Published
• 20
UniReal: Universal Image Generation and Editing via Learning Real-world
Dynamics
Paper
• 2412.07774
• Published
• 30
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style
Conditioned Image Generation
Paper
• 2412.05148
• Published
• 12
ObjCtrl-2.5D: Training-free Object Control with Camera Poses
Paper
• 2412.07721
• Published
• 9
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper
• 2412.07744
• Published
• 20
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow
Models
Paper
• 2412.08629
• Published
• 13
DisPose: Disentangling Pose Guidance for Controllable Human Image
Animation
Paper
• 2412.09349
• Published
• 8
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex
Image-Text Models with Structural Annotations
Paper
• 2412.08580
• Published
• 45
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via
Multimodal LLM
Paper
• 2412.09618
• Published
• 21
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Paper
• 2412.09622
• Published
• 8
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free
Scale Fusion
Paper
• 2412.09626
• Published
• 21
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper
• 2412.15213
• Published
• 28
BrushEdit: All-In-One Image Inpainting and Editing
Paper
• 2412.10316
• Published
• 36
Paper
• 2412.18653
• Published
• 86
From Elements to Design: A Layered Approach for Automatic Graphic Design
Composition
Paper
• 2412.19712
• Published
• 15
VideoMaker: Zero-shot Customized Video Generation with the Inherent
Force of Video Diffusion Models
Paper
• 2412.19645
• Published
• 13
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion
Control
Paper
• 2501.01427
• Published
• 53
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video
Restoration
Paper
• 2501.01320
• Published
• 12
ConceptMaster: Multi-Concept Video Customization on Diffusion
Transformer Models Without Test-Time Tuning
Paper
• 2501.04698
• Published
• 15
Diffusion Adversarial Post-Training for One-Step Video Generation
Paper
• 2501.08316
• Published
• 36
3DIS-FLUX: simple and efficient multi-instance generation with DiT
rendering
Paper
• 2501.05131
• Published
• 37
AnyStory: Towards Unified Single and Multiple Subject Personalization in
Text-to-Image Generation
Paper
• 2501.09503
• Published
• 14
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising
Steps
Paper
• 2501.09732
• Published
• 72
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute
in Linear Diffusion Transformer
Paper
• 2501.18427
• Published
• 24
MatAnyone: Stable Video Matting with Consistent Memory Propagation
Paper
• 2501.14677
• Published
• 34
Analyze Feature Flow to Enhance Interpretation and Steering in Language
Models
Paper
• 2502.03032
• Published
• 60
DynVFX: Augmenting Real Videos with Dynamic Content
Paper
• 2502.03621
• Published
• 31
Generating Multi-Image Synthetic Data for Text-to-Image Customization
Paper
• 2502.01720
• Published
• 8
Magic 1-For-1: Generating One Minute Video Clips within One Minute
Paper
• 2502.07701
• Published
• 36
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human
Animation Models
Paper
• 2502.01061
• Published
• 223
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
in Diffusion Models
Paper
• 2502.10458
• Published
• 38
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
Paper
• 2502.14397
• Published
• 41
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Paper
• 2502.18411
• Published
• 74
KV-Edit: Training-Free Image Editing for Precise Background Preservation
Paper
• 2502.17363
• Published
• 37
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
Paper
• 2502.18461
• Published
• 17
What's in a Latent? Leveraging Diffusion Latent Space for Domain
Generalization
Paper
• 2503.06698
• Published
• 4
EasyControl: Adding Efficient and Flexible Control for Diffusion
Transformer
Paper
• 2503.07027
• Published
• 29
YuE: Scaling Open Foundation Models for Long-Form Music Generation
Paper
• 2503.08638
• Published
• 72
OmniPaint: Mastering Object-Oriented Editing via Disentangled
Insertion-Removal Inpainting
Paper
• 2503.08677
• Published
• 29
Personalize Anything for Free with Diffusion Transformer
Paper
• 2503.12590
• Published
• 44
Efficient Personalization of Quantized Diffusion Model without
Backpropagation
Paper
• 2503.14868
• Published
• 20
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Paper
• 2503.14478
• Published
• 48
Video-T1: Test-Time Scaling for Video Generation
Paper
• 2503.18942
• Published
• 90
Judge Anything: MLLM as a Judge Across Any Modality
Paper
• 2503.17489
• Published
• 23
Training-free Diffusion Acceleration with Bottleneck Sampling
Paper
• 2503.18940
• Published
• 12
BizGen: Advancing Article-level Visual Text Rendering for Infographics
Generation
Paper
• 2503.20672
• Published
• 14
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Paper
• 2503.22194
• Published
• 25
Zero4D: Training-Free 4D Video Generation From Single Video Using
Off-the-Shelf Video Diffusion Model
Paper
• 2503.22622
• Published
• 18
ChatAnyone: Stylized Real-time Portrait Video Generation with
Hierarchical Motion Diffusion Model
Paper
• 2503.21144
• Published
• 27
Optimal Stepsize for Diffusion Sampling
Paper
• 2503.21774
• Published
• 13
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual
Editing
Paper
• 2504.02826
• Published
• 68
FreSca: Unveiling the Scaling Space in Diffusion Models
Paper
• 2504.02154
• Published
• 18
One-Minute Video Generation with Test-Time Training
Paper
• 2504.05298
• Published
• 110
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
Autoregressive Image Generation
Paper
• 2504.08736
• Published
• 46
Alchemist: Turning Public Text-to-Image Data into Generative Gold
Paper
• 2505.19297
• Published
• 84
OmniConsistency: Learning Style-Agnostic Consistency from Paired
Stylization Data
Paper
• 2505.18445
• Published
• 63
Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion
Models
Paper
• 2506.00996
• Published
• 40
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image
Synthesis
Paper
• 2506.06276
• Published
• 26
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Paper
• 2506.07986
• Published
• 19
Multiverse: Your Language Models Secretly Decide How to Parallelize and
Merge Generation
Paper
• 2506.09991
• Published
• 55
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper
• 2506.09113
• Published
• 107
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video
Diffusion
Paper
• 2506.08009
• Published
• 30
OmniGen2: Exploration to Advanced Multimodal Generation
Paper
• 2506.18871
• Published
• 78
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed
View Memory
Paper
• 2506.18903
• Published
• 22
Audit & Repair: An Agentic Framework for Consistent Story Visualization
in Text-to-Image Diffusion Models
Paper
• 2506.18900
• Published
• 3
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo
Retouching Agent
Paper
• 2506.17612
• Published
• 64
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based
Diffusion Sampling
Paper
• 2506.20452
• Published
• 19
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with
Hybrid History Condition
Paper
• 2506.17201
• Published
• 57
FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
Paper
• 2506.21272
• Published
• 9
FilMaster: Bridging Cinematic Principles and Generative AI for Automated
Film Generation
Paper
• 2506.18899
• Published
• 6
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion
Models
Paper
• 2506.19851
• Published
• 60
FantasyPortrait: Enhancing Multi-Character Portrait Animation with
Expression-Augmented Diffusion Transformers
Paper
• 2507.12956
• Published
• 25
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
Paper
• 2508.03694
• Published
• 52
Matrix-3D: Omnidirectional Explorable 3D World Generation
Paper
• 2508.08086
• Published
• 76
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video
Generation
Paper
• 2508.07901
• Published
• 40
Story2Board: A Training-Free Approach for Expressive Storyboard
Generation
Paper
• 2508.09983
• Published
• 70
ToonComposer: Streamlining Cartoon Production with Generative
Post-Keyframing
Paper
• 2508.10881
• Published
• 52
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
Paper
• 2512.10881
• Published
• 30