Papers in 2025

Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis
Chen Zhao, Xuan Wang, Tong Zhang, Saqib Javed, Mathieu Salzmann
Project |
- We observe that 3D Gaussian Splatting (3DGS) excels at novel view synthesis (NVS) but overfits when the training views are sparse. We therefore propose Self-Ensembling Gaussian Splatting (SE-GS), which uses an uncertainty-aware perturbation strategy to train a main model alongside perturbed models and minimizes the discrepancies between them, forming a robust ensemble for novel-view generation.
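A minimal sketch of the self-ensembling idea described above, assuming a differentiable `render(params, camera)` function and a flat tensor of Gaussian parameters; the function names, perturbation scale, and loss form are illustrative, not the authors' implementation.

```python
import torch

def perturb(params: torch.Tensor, uncertainty: torch.Tensor, scale: float = 0.01) -> torch.Tensor:
    """Add uncertainty-weighted Gaussian noise to the main model's parameters."""
    return params + scale * uncertainty * torch.randn_like(params)

def self_ensembling_loss(render, params, uncertainty, cameras, num_perturbed: int = 3):
    """Penalize discrepancies between renderings of the main and perturbed models."""
    loss = 0.0
    for cam in cameras:
        main_img = render(params, cam)
        for _ in range(num_perturbed):
            pert_img = render(perturb(params, uncertainty), cam)
            loss = loss + ((main_img - pert_img) ** 2).mean()
    return loss / (len(cameras) * num_perturbed)
```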

Yuan Sun, Xuan Wang, Cong Wang, WeiLi Zhang, Yanbo Fan, Yu Guo, Fei Wang
- 3D Gaussian-based head avatar modeling performs well when sufficient data are available, but prior-based methods suffer in rendering quality because the identity-shared representation has limited expressive power. We address this by jointly reconstructing and registering prior-based and prior-free 3D Gaussians, which are then merged and post-processed into a complete avatar; experiments show improved quality and support for high-resolution rendering.

Yudong Jin, Sida Peng, Xuan Wang, Tao Xie, Zhen Xu, Yifan Yang, Yujun Shen, Hujun Bao, Xiaowei Zhou
Project |
- This paper addresses high-fidelity novel-view synthesis from sparse-view human videos. While 4D diffusion models can compensate for limited observations, they lack spatio-temporal consistency, so we propose a sliding iterative denoising process over a latent grid (encoding image, camera, and human poses) that alternates spatial and temporal denoising, enabling sufficient information flow across the grid while keeping GPU memory affordable.
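A rough sketch of alternating spatial-temporal denoising over a sliding window of a (views x frames) latent grid, assuming hypothetical `spatial_denoiser` and `temporal_denoiser` modules; it illustrates the general idea rather than the paper's exact schedule.

```python
import torch

def sliding_denoise(latents, spatial_denoiser, temporal_denoiser,
                    timesteps, window: int = 8, stride: int = 4):
    """latents: (V, T, C, H, W) grid over V views and T frames."""
    V, T = latents.shape[:2]
    for t_step in timesteps:                                  # iterative denoising
        for start in range(0, max(T - window, 0) + 1, stride):
            win = latents[:, start:start + window].clone()    # sliding temporal window
            # Spatial pass: denoise across views, one frame at a time.
            for f in range(win.shape[1]):
                win[:, f] = spatial_denoiser(win[:, f], t_step)
            # Temporal pass: denoise across frames, one view at a time.
            for v in range(V):
                win[v] = temporal_denoiser(win[v], t_step)
            latents[:, start:start + window] = win            # write the window back
    return latents
```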

DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads
Xiaoxi Liang, Yanbo Fan, Qiya Yang, Xuan Wang, Wei Gao, Ge Li
- In this work, we investigate generating high-fidelity, audio-driven 3D Gaussian talking heads from monocular videos, presenting DGTalker, a real-time, high-fidelity, 3D-aware framework that uses Gaussian generative priors and latent space navigation to alleviate the lack of 3D information and the resulting overfitting. We propose a disentangled latent space navigation framework for precise lip and expression control, together with masked cross-view supervision for robust learning. Extensive experiments show that DGTalker outperforms state-of-the-art methods in visual quality, motion accuracy, and controllability.
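An illustrative sketch (not the released DGTalker code) of navigating a generative latent space with disentangled lip and expression directions; `audio_encoder`, `expr_encoder`, and `gaussian_generator` are assumed modules, and the additive navigation rule is a simplification.

```python
import torch
import torch.nn as nn

class DisentangledNavigator(nn.Module):
    def __init__(self, audio_encoder, expr_encoder, gaussian_generator, latent_dim: int = 512):
        super().__init__()
        self.audio_encoder = audio_encoder      # audio features -> latent
        self.expr_encoder = expr_encoder        # expression features -> latent
        self.generator = gaussian_generator     # latent -> 3D Gaussian head
        self.to_lip = nn.Linear(latent_dim, latent_dim)
        self.to_expr = nn.Linear(latent_dim, latent_dim)

    def forward(self, identity_code, audio_feat, expr_feat):
        lip_offset = self.to_lip(self.audio_encoder(audio_feat))    # drives the mouth region
        expr_offset = self.to_expr(self.expr_encoder(expr_feat))    # drives the remaining expression
        latent = identity_code + lip_offset + expr_offset           # navigate the generative latent space
        return self.generator(latent)                               # renderable Gaussian head
```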
AvatarArtist: Open-Domain 4D Avatarization
Hongyu Liu, Xuan Wang†, Ziyu Wan, Yue Ma, Jingye Chen, Yanbo Fan, Yujun Shen, Yibing Song, Qifeng Chen†
Project |
- This work focuses on open-domain 4D avatarization, which aims to create a 4D avatar from a portrait image in an arbitrary style. Extensive experiments suggest that our model, termed AvatarArtist, produces high-quality 4D avatars and is robust to a wide range of source image domains.

HERA: Hybrid Explicit Representation for Ultra-Realistic Head Avatars
Hongrui Cai*, Yuting Xiao*, Xuan Wang†, Jiafei Li, Yudong Guo, Yanbo Fan, Shenghua Gao, Juyong Zhang†
- We present a hybrid explicit representation to combine the strengths of different geometric primitives, which adaptively models rich texture on smooth surfaces as well as complex geometric structures simultaneously.
- To avoid artifacts created by facet-crossing Gaussian splats, we design a stable depth sorting strategy based on the rasterization results of the mesh and 3DGS (see the sketch after this list).
- We incorporate the proposed hybrid explicit representation into 3D head avatar modeling, which renders higher-fidelity images in real time.
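A minimal sketch of depth-aware compositing between a rasterized mesh and Gaussian splats, in the spirit of the depth sorting strategy above; the per-pixel inputs are assumed outputs of the two rasterizers, and the blending rule is a simplification of the paper's strategy.

```python
import torch

def composite(mesh_rgb, mesh_depth, splat_rgb, splat_depth, splat_alpha):
    """mesh_rgb/splat_rgb: (H, W, 3); mesh_depth/splat_depth/splat_alpha: (H, W), same camera."""
    splat_in_front = (splat_depth < mesh_depth).float().unsqueeze(-1)
    # Splats in front of the mesh surface are alpha-blended over it; splats
    # behind it are discarded, suppressing facet-crossing artifacts.
    alpha = splat_alpha.unsqueeze(-1) * splat_in_front
    return alpha * splat_rgb + (1.0 - alpha) * mesh_rgb
```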

3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations
Yating Wang, Xuan Wang, Ran Yi, Yanbo Fan, Jichen Hu, Jingcheng Zhu, Lizhuang Ma
Project |
- We propose a novel 3D head avatar modeling method that takes both dynamic texture modeling and spatiotemporal efficiency into account.
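A toy sketch of a compact tensorial (low-rank) representation for dynamic appearance, assuming per-Gaussian features factorize into spatial factors modulated by expression-dependent weights; the factor shapes and lookup scheme are illustrative, not the paper's design.

```python
import torch
import torch.nn as nn

class TensorialDynamicTexture(nn.Module):
    def __init__(self, num_gaussians: int, expr_dim: int, rank: int = 16, feat_dim: int = 32):
        super().__init__()
        self.spatial_factor = nn.Parameter(0.01 * torch.randn(num_gaussians, rank))
        self.expr_factor = nn.Linear(expr_dim, rank)      # expression code -> rank weights
        self.basis = nn.Parameter(0.01 * torch.randn(rank, feat_dim))

    def forward(self, expr_code: torch.Tensor) -> torch.Tensor:
        # Low-rank combination: per-Gaussian spatial factors are modulated by
        # expression-dependent weights, then projected onto a shared basis.
        weights = self.spatial_factor * self.expr_factor(expr_code)   # (N, rank)
        return weights @ self.basis                                    # (N, feat_dim) dynamic features
```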

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations
Ziqiao Peng, Yanbo Fan, Haoyu Wu, Xuan Wang, Hongyan Liu, Jun He, Zhaoxin Fan
Project |
- We introduce DualTalk, a novel unified framework that integrates the dynamic behaviors of speakers and listeners to simulate realistic and coherent dialogue interactions.

Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling
Yinuo Wang, Yanbo Fan, Xuan Wang, Yu Guo, Fei Wang
- In this work, we propose a novel listening head generation framework that enables both highly expressive head motions and photorealistic rendering.