VideoTimeTravel : High-Fidelity Face Re-Aging
Diffusion Models For Production Video

Anonymous Author(s)

VideoTimeTravel is the first diffusion-based video face re-aging model designed for production-level use.

Abstract

Machine learning-based face re-aging automatically alters age-related attributes toward a target age, sharply reducing the manual labor required of experienced artists. Variants of StyleGAN and diffusion models have shown promising results, but such approaches often suffer low fidelity on in-the-wild samples or remain confined to the image domain. Consequently, video-level techniques have lagged behind, even though video inference is imperative in practice. To this end, we introduce a diffusion-based re-aging model, marking the first attempt to apply a diffusion scheme to the video face re-aging task. Our optimization-based denoising approach generates faithful re-aging results under diverse conditions, overcoming the shortcomings of GANs and VAEs. Concretely, to improve global semantic coherence, we propose joint null-text optimization, in which a single embedding is learned over keyframes to cover the entire scene. In addition, we leverage a delta map, which significantly amplifies image fidelity in a residual manner. Thanks to the nature of the delta, our system lifts 2D diffusion models to video editing by neglecting age-irrelevant regions and propagating age-relevant pixels to adjacent frames through estimated optical flow, without costly computation. VideoTimeTravel ensures high fidelity, achieving unprecedented generalization on in-the-wild cases such as occlusions and accessories, while also satisfying industrial demands such as movie trailers and CGI.
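The joint null-text optimization described in the abstract can be illustrated with a toy problem. In this sketch a linear map stands in for the frozen denoiser's response to the null-text embedding, and a single shared embedding is optimized jointly over all keyframes rather than per frame; all names, dimensions, and the linear model are hypothetical simplifications, not the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: each keyframe k has a target latent t[k], and a linear
# map A[k] plays the role of the frozen denoiser's response to the
# shared null-text embedding e.
K, D_LAT, D_EMB = 3, 8, 4                  # keyframes, latent dim, embedding dim
A = rng.normal(size=(K, D_LAT, D_EMB))
e_true = rng.normal(size=D_EMB)            # ground-truth embedding (for the toy)
t = np.einsum("kij,j->ki", A, e_true) + 0.01 * rng.normal(size=(K, D_LAT))

def joint_loss(e):
    # Joint objective: sum of per-keyframe reconstruction losses,
    # all tied to the single shared embedding e.
    return sum(float(np.sum((A[k] @ e - t[k]) ** 2)) for k in range(K))

e = np.zeros(D_EMB)                        # one embedding shared by all keyframes
lr = 0.002
for _ in range(5000):                      # plain gradient descent on the joint loss
    grad = sum(2.0 * A[k].T @ (A[k] @ e - t[k]) for k in range(K))
    e -= lr * grad
```

Because every keyframe contributes to one loss, the learned embedding covers the whole scene instead of overfitting a single frame.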


How It Works



Conceptual behavior of the delta operation


Instead of generating every pixel of an image, our system concentrates on generating the delta map and exploits it for temporal consistency and high fidelity. In addition, by jointly optimizing a null-text embedding on keyframes, global semantic consistency is further enhanced.
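A minimal sketch of the residual delta edit: the output frame is the source plus a delta restricted to age-relevant pixels, so everything outside the mask is preserved exactly. Function names, shapes, and values are illustrative, not the actual implementation.

```python
import numpy as np

def apply_delta(source, delta, age_mask):
    """Compose a re-aged frame as source + delta, restricted to
    age-relevant pixels; pixels outside the mask stay untouched."""
    edited = source + delta * age_mask      # residual edit
    return np.clip(edited, 0.0, 1.0)        # keep a valid pixel range

# 4x4 gray frame; the delta darkens a 2x2 "age-relevant" patch only.
frame = np.full((4, 4), 0.5)
delta = np.full((4, 4), -0.2)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
out = apply_delta(frame, delta, mask)
```

Because the edit is purely additive, fidelity outside the masked region is guaranteed by construction rather than learned.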


Method Overview


Overview of our VideoTimeTravel


We introduce a delta propagation strategy, which allows us to lift 2D diffusion to video diffusion without costly computing power.
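The propagation step can be sketched as backward warping of a keyframe delta map into an adjacent frame via an estimated optical flow field. This toy uses nearest-neighbour sampling and a hand-made flow; the function name, flow convention, and sampling scheme are assumptions for illustration only.

```python
import numpy as np

def propagate_delta(delta, flow):
    """Backward-warp a keyframe delta map to an adjacent frame.

    flow[y, x] = (dy, dx) points from the adjacent frame back to the
    keyframe; sampling is nearest-neighbour for simplicity."""
    h, w = delta.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return delta[src_y, src_x]

# A delta concentrated on one pixel, and a constant flow that says the
# face moved one pixel to the right between frames.
delta = np.zeros((4, 4))
delta[2, 1] = 1.0
flow = np.zeros((4, 4, 2))
flow[..., 1] = -1.0                 # each output pixel looks one pixel left
warped = propagate_delta(delta, flow)
```

Reusing the keyframe delta this way means adjacent frames need no diffusion sampling of their own, which is where the compute saving comes from.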


Production-Level Test

Additional videos will be released soon!

Source: Sophie Marceau | Estimated Age: 45 → Target Age: 21 (Young Direction)
Source: Jack Black, A Minecraft Movie (2025) | Estimated Age: 50 → Target Age: 25 (Young Direction)
Source: Lee Jung-jae, Squid Game 1 (2021) | Estimated Age: 37 → Target Age: 45 (Old Direction)
Source: Lee Jung-jae, Squid Game 3 (2025) | Estimated Age: 36 → Target Age: 25 (Young Direction)
Source: Sophie Marceau | Estimated Age: 37 → Target Age: 18 (Young Direction)
Source: Joey Wong, A Chinese Ghost Story (倩女幽魂, 1987) | Estimated Age: 25 → Target Age: 15 (Young Direction)
Source: Alita: Battle Angel (2019) | Estimated Age: 22 → Target Age: 35 (Old Direction)
Source: Song Kang-ho, Memories of Murder (2003) | Estimated Age: 31 → Target Age: 46 (Old Direction)
Source: Scarlett Johansson, The Avengers (2012) | Estimated Age: 22 → Target Age: 15 (Young Direction)
Source: Sophie Marceau, La Boum (1980) | Estimated Age: 16 → Target Age: 35 (Old Direction)
Source: Sharon Stone, Basic Instinct (1992) | Estimated Age: 27 → Target Age: 35 (Old Direction)
Source: Harris Dickinson, The King's Man (2021) | Estimated Age: 33 → Target Age: 45 (Old Direction)
Source: Tom Hardy, Legend (2015) | Estimated Age: 41 → Target Age: 25 (Young Direction)
Source: Robert De Niro, Taxi Driver (1976) | Estimated Age: 31 → Target Age: 45 (Old Direction)

Our system delivers convincing results on production-level videos.



Long-Duration Test


Source: VFHQ - DSIshKfsJmE | Estimated Age: 38 → Target Age: 25 (Young Direction)
Source: VFHQ - H6nGiKvDQAY | Estimated Age: 39 → Target Age: 25 (Young Direction)
Source: VFHQ - OPF8webjmPg | Estimated Age: 44 → Target Age: 25 (Young Direction)

Our method also performs well on long-duration videos (>100 frames).



Qualitative Comparison

Prompt-based General Video Editing Methods

Age transitions: 48→30, 30→60, 53→30, 42→60, 51→60, 33→60

Compared methods: Original, Ours, RAVE (CVPR 2024), VidToMe (CVPR 2024), BIVDiff (CVPR 2024), TokenFlow (ICLR 2024), Rerender-A-Video (SIGGRAPH Asia 2023), FateZero (ICCV 2023), Text2Video-Zero (ICCV 2023), Pix2Video (ICCV 2023)

Our method outperforms existing video editing models in terms of re-aging performance, fidelity, and temporal consistency.



Qualitative Comparison

Attribute-based Video Face Editing Methods

Age transitions: 45→30, 32→60, 44→60, 43→60, 40→60

Compared methods: Original, Ours, STIT (SIGGRAPH Asia 2022), StyleGANEX (ICCV 2023), DiffusionVAE (CVPR 2023), VIVE3D (CVPR 2023)

In contrast to prior video face editing methods, our method is not constrained to any particular distribution such as FFHQ.