Intrinsic Temporal Regularization for High-Resolution Human Video Synthesis

Lingbo Yang1,2   Zhanning Gao2   Peiran Ren2   Siwei Ma1   Wen Gao1

1Peking University  2Alibaba DAMO Academy 


Temporal consistency is crucial for extending image processing pipelines to the video domain, and it is often enforced with a flow-based warping error over adjacent frames. Yet for human video synthesis, such a scheme is less reliable due to the misalignment between source and target videos as well as the difficulty of accurate flow estimation. In this paper, we propose an effective intrinsic temporal regularization scheme to mitigate these issues, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation via temporal loss modulation. This creates a shortcut for back-propagating temporal loss gradients directly to the front-end motion estimator, thus improving training stability and temporal coherence in output videos. We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful “INTERnet” capable of generating 512 × 512 resolution human action videos with temporally coherent, realistic visual details. Extensive experiments demonstrate the superiority of the proposed INTERnet over several competitive baselines.
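
To make the idea of temporal loss modulation concrete, here is a minimal sketch of a confidence-weighted warping loss. It is an illustration under stated assumptions, not the loss used in INTERnet: the function names (`warp_nearest`, `modulated_temporal_loss`), the L1 error, the nearest-neighbour warp, the `-log(confidence)` penalty, and the `reg_weight` value are all hypothetical choices. It uses plain Python on small grayscale frames to stay dependency-free, and it only computes the forward loss value; the gradient shortcut described above would require an autodiff framework.

```python
import math


def warp_nearest(prev_frame, flow):
    """Backward-warp a grayscale frame (list of rows) with a per-pixel flow.

    flow[y][x] = (dx, dy) points from pixel (x, y) in the current frame to its
    assumed source location in the previous frame. Nearest-neighbour sampling
    with border clamping is a simplification made for this sketch.
    """
    h, w = len(prev_frame), len(prev_frame[0])
    warped = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            row.append(prev_frame[sy][sx])
        warped.append(row)
    return warped


def modulated_temporal_loss(cur_frame, prev_frame, flow, confidence,
                            reg_weight=0.1, eps=1e-6):
    """Confidence-weighted L1 warping error plus a penalty on low confidence.

    confidence[y][x] in (0, 1] down-weights pixels whose flow is unreliable.
    The -log(confidence) term (an assumption of this sketch) discourages the
    trivial solution of setting the confidence to zero everywhere.
    """
    warped = warp_nearest(prev_frame, flow)
    h, w = len(cur_frame), len(cur_frame[0])
    data_term = 0.0
    reg_term = 0.0
    for y in range(h):
        for x in range(w):
            err = abs(cur_frame[y][x] - warped[y][x])
            c = confidence[y][x]
            data_term += c * err
            reg_term += -math.log(c + eps)
    return (data_term + reg_weight * reg_term) / (h * w)


if __name__ == "__main__":
    prev = [[0.0, 1.0], [2.0, 3.0]]
    cur = [[1.0, 2.0], [3.0, 4.0]]
    zero_flow = [[(0.0, 0.0)] * 2 for _ in range(2)]
    full_conf = [[1.0, 1.0], [1.0, 1.0]]
    half_conf = [[0.5, 0.5], [0.5, 0.5]]
    print(modulated_temporal_loss(cur, prev, zero_flow, full_conf))
    print(modulated_temporal_loss(cur, prev, zero_flow, half_conf))
```

In this sketch, lowering the confidence reduces the contribution of the warping error but raises the penalty term, so the two terms trade off per pixel.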


If you use our code or data, please cite:

  title={Intrinsic Temporal Regularization for High-Resolution Human Video Synthesis},
  author={Lingbo Yang and Zhanning Gao and Peiran Ren and Siwei Ma and Wen Gao},
  journal={CVPR to appear},

Acknowledgements: page template comes from Po-Han Huang.