DreamID-V

Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

* Equal contribution, Corresponding author

Tsinghua University | Intelligent Creation Lab, Bytedance

Research Paper GitHub

Demo Video

Video Face Swapping Results

Method

InsertPipe
We pre-train an Identity-Anchored Video Synthesizer and combine it with the image face-swapping model DreamID to construct Bidirectional Quadruplet Pair data.
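To make the data-construction idea concrete, here is a minimal, hypothetical sketch of how bidirectional quadruplet pairs could be assembled. The `swap_face` callable stands in for applying DreamID (or the pre-trained video synthesizer); the `Quadruplet` fields and the helper name are illustrative assumptions, not the authors' actual API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Quadruplet:
    source_video: str      # video containing the original identity
    target_identity: str   # reference image of the identity to inject
    swapped_video: str     # video with the target identity's face
    ground_truth: str      # supervision target for training


def build_bidirectional_pairs(
    video_a: str,
    id_a: str,
    id_b: str,
    swap_face: Callable[[str, str], str],  # stand-in for DreamID / the synthesizer
) -> Tuple[Quadruplet, Quadruplet]:
    """Form a forward pair A->B and a backward pair B->A.

    Forward: swapping id_B into video_A yields a synthetic target.
    Backward: swapping id_A back into the swapped video should
    reconstruct video_A, giving a pair with a *real* ground-truth video.
    """
    video_ab = swap_face(video_a, id_b)
    forward = Quadruplet(video_a, id_b, video_ab, video_ab)
    backward = Quadruplet(video_ab, id_a, video_a, video_a)
    return forward, backward
```

The backward pair is what makes the scheme "bidirectional": it anchors training on an unmodified real video as ground truth.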


InsertPipe
We design customized injection mechanisms for Spatio-Temporal Context, Structural Guidance, and Identity Information, respectively. To fully exploit the Bidirectional Quadruplet Pair data, we devise a three-stage training strategy: Synthetic Training, Real Augmentation Training, and Identity-Coherence Reinforcement Learning.
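The three-stage schedule above can be sketched as a simple staged training driver. The stage names come from the text; the loop structure and the per-stage data sources are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical three-stage schedule; stage names follow the text,
# data-source labels are assumptions for illustration only.
STAGES = [
    ("synthetic_training", "synthetic_quadruplet_pairs"),
    ("real_augmentation_training", "real_videos_plus_synthetic_pairs"),
    ("identity_coherence_rl", "rollouts_scored_by_identity_reward"),
]


def run_training(train_stage):
    """Run each stage in order, handing its data source to `train_stage`."""
    completed = []
    for stage_name, data_source in STAGES:
        train_stage(stage_name, data_source)
        completed.append(stage_name)
    return completed
```

The ordering matters: synthetic pairs give dense supervision first, real videos then close the domain gap, and the final reinforcement stage optimizes identity coherence directly.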

Additional Vertical Video Demos

DreamID-V achieves high-fidelity face swapping across challenging scenarios, including hair occlusion, complex lighting, diverse ethnicities, and significant face-shape variations.



Ethical Considerations

The reference images and videos used in these demos are sourced from the public domain or generated by models, and are intended solely to demonstrate the capabilities of this research. If you have any concerns, please contact us (guo-x24@mails.tsinghua.edu.cn) and we will remove the content promptly.