audio
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper
• 2402.17485
• Published
• 194
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Paper
• 2403.10493
• Published
• 18
Paper
• 2404.13358
• Published
• 14
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Paper
• 2406.02430
• Published
• 38
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Paper
• 2406.03344
• Published
• 22
VideoTetris: Towards Compositional Text-to-Video Generation
Paper
• 2406.04277
• Published
• 25
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Paper
• 2406.18009
• Published
• 22
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Paper
• 2407.02869
• Published
• 21
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Paper
• 2407.04051
• Published
• 40
Video-to-Audio Generation with Hidden Alignment
Paper
• 2407.07464
• Published
• 17
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Paper
• 2407.10387
• Published
• 8
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
Paper
• 2407.15060
• Published
• 9
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Paper
• 2408.04708
• Published
• 8
Presto! Distilling Steps and Layers for Accelerating Music Generation
Paper
• 2410.05167
• Published
• 18