Papers in 2025

ICCV 2025 (Oral)

Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis

Chen Zhao, Xuan Wang, Tong Zhang, Saqib Javed, Mathieu Salzmann

Project

  • We observe that 3D Gaussian Splatting (3DGS) excels at novel view synthesis (NVS) but overfits when only sparse views are available, so we propose Self-Ensembling Gaussian Splatting (SE-GS), which uses an uncertainty-aware perturbation strategy to train a main model alongside perturbed models. Minimizing the discrepancies between these models yields a robust ensemble for novel-view generation.
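The self-ensembling objective can be illustrated with a small, hypothetical PyTorch sketch (not the authors' code): `render` and `perturb` are stand-ins for the 3DGS rasterizer and the paper's uncertainty-aware perturbation, and the loss weighting and number of perturbed models are assumptions.

```python
import torch

def render(params, view):
    # Placeholder differentiable "renderer"; stands in for 3DGS rasterization.
    return torch.tanh(params @ view)

def perturb(params, sigma):
    # The paper's uncertainty-aware perturbation, simplified here to Gaussian noise.
    return params + sigma * torch.randn_like(params)

main = torch.randn(64, 16, requires_grad=True)   # stand-in for the main Gaussian model
optimizer = torch.optim.Adam([main], lr=1e-3)

for step in range(100):
    view = torch.randn(16, 8)                    # a sampled training (pseudo-)view
    target = torch.zeros(64, 8)                  # ground-truth pixels for that view
    rendered = render(main, view)
    photometric = (rendered - target).abs().mean()

    # Self-ensembling term: the main model should agree with perturbed copies of itself
    # on the same view, which discourages overfitting to the sparse training views.
    consistency = torch.stack([
        (rendered - render(perturb(main.detach(), 0.05), view)).abs().mean()
        for _ in range(3)                        # number of perturbed models (assumed)
    ]).mean()

    loss = photometric + 0.1 * consistency       # weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```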
ICCV 2025

Fine-Grained 3D Gaussian Head Avatars Modeling from Static Captures via Joint Reconstruction and Registration

Yuan Sun, Xuan Wang, Cong Wang, WeiLi Zhang, Yanbo Fan, Yu Guo, Fei Wang

  • 3D Gaussian-based head avatar modeling performs well given sufficient data, but prior-based methods fall short in rendering quality because the identity-shared representation has limited capacity. We address this through joint reconstruction and registration of prior-based and prior-free 3D Gaussians, which are then merged and post-processed into a complete avatar; experiments show improved rendering quality and support for high-resolution output.
ICCV 2025

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Yudong Jin, Sida Peng, Xuan Wang, Tao Xie, Zhen Xu, Yifan Yang, Yujun Shen, Hujun Bao, Xiaowei Zhou

Project

  • This paper addresses high-fidelity novel-view synthesis from sparse-view human videos. While 4D diffusion models can compensate for the limited observations, they often lack spatio-temporal consistency, so we propose a sliding iterative denoising process over a latent grid in which each latent encodes the image, camera pose, and human pose. The process alternates denoising along the spatial and temporal dimensions, enabling sufficient information flow across the grid while keeping GPU memory affordable.
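A hedged toy sketch of one reading of this sliding, alternating schedule (not the Diffuman4D implementation): `denoise` stands in for the diffusion model, and the grid sizes, window size, and stride are made-up values.

```python
import torch

V, T, C = 8, 16, 4                  # views, frames, latent channels (toy sizes)
latents = torch.randn(V, T, C)      # latent grid; per the paper, each entry also
                                    # encodes the camera pose and human pose

def denoise(window, step):
    # Placeholder for one diffusion denoising update on a sub-grid of latents.
    return window - 0.1 * window / (step + 1)

def sliding_windows(length, size, stride):
    for start in range(0, max(length - size, 0) + 1, stride):
        yield slice(start, start + size)

for step in range(10):
    if step % 2 == 0:
        # Spatial pass: for each sliding window of frames, denoise all views jointly.
        for t in sliding_windows(T, size=4, stride=2):
            latents[:, t] = denoise(latents[:, t], step)
    else:
        # Temporal pass: for each sliding window of views, denoise all frames jointly.
        for v in sliding_windows(V, size=4, stride=2):
            latents[v, :] = denoise(latents[v, :], step)
```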
ICCV 2025

DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads

Xiaoxi Liang, Yanbo Fan, Qiya Yang, Xuan Wang, Wei Gao, Ge Li

  • In this work, we investigate generating high-fidelity, audio-driven 3D Gaussian talking heads from monocular videos, presenting DGTalker, a real-time, high-fidelity, 3D-aware framework that leverages Gaussian generative priors and latent space navigation to alleviate the lack of 3D information and mitigate overfitting. We propose a disentangled latent space navigation framework for precise lip and expression control, together with masked cross-view supervision for robust learning. Extensive experiments show that DGTalker outperforms state-of-the-art methods in visual quality, motion accuracy, and controllability.
CVPR 2025

AvatarArtist: Open-Domain 4D Avatarization

Hongyu Liu, Xuan Wang$^\dagger$, Ziyu Wan, Yue Ma, Jingye Chen, Yanbo Fan, Yujun Shen, Yibing Song, Qifeng Chen$^\dagger$

Project

  • This work focuses on open-domain 4D avatarization, with the purpose of creating a 4D avatar from a portrait image in an arbitrary style. Extensive experiments suggest that our model, termed AvatarArtist, is capable of producing high-quality 4D avatars with strong robustness to various source image domains.
CVPR 2025

HERA: Hybrid Explicit Representation for Ultra-Realistic Head Avatars

Hongrui Cai$^\star$, Yuting Xiao$^\star$, Xuan Wang$^\dagger$, Jiafei Li, Yudong Guo, Yanbo Fan, Shenghua Gao, Juyong Zhang$^\dagger$

  • We present a hybrid explicit representation to combine the strengths of different geometric primitives, which adaptively models rich texture on smooth surfaces as well as complex geometric structures simultaneously.
  • To avoid artifacts created by facet-crossing Gaussian splats, we design a stable depth sorting strategy based on the rasterization results of the mesh and 3DGS (see the sketch after this list).
  • We incorporate the proposed hybrid explicit representation into modeling 3D head avatars, which renders higher-fidelity images in real time.
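A hypothetical per-pixel illustration of combining a rasterized mesh depth with front-to-back composited Gaussian splats, as the second bullet describes (an assumption-laden sketch, not HERA's actual sorting): `composite_pixel` and all inputs are invented for illustration, and the mesh is treated as opaque.

```python
import numpy as np

def composite_pixel(gauss_depths, gauss_colors, gauss_alphas, mesh_depth, mesh_color):
    order = np.argsort(gauss_depths)              # front-to-back ordering of the splats
    color, transmittance = np.zeros(3), 1.0
    mesh_done = False
    for i in order:
        if not mesh_done and mesh_depth < gauss_depths[i]:
            # The rasterized mesh surface lies in front of this splat: composite it
            # here and terminate the ray, so splats behind the surface are occluded.
            color += transmittance * mesh_color
            transmittance = 0.0
            mesh_done = True
            break
        color += transmittance * gauss_alphas[i] * gauss_colors[i]
        transmittance *= 1.0 - gauss_alphas[i]
    if not mesh_done:
        color += transmittance * mesh_color
    return color

# Toy usage: one splat in front of the mesh surface (depth 1.0) and one behind it.
print(composite_pixel(
    gauss_depths=np.array([0.5, 2.0]),
    gauss_colors=np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    gauss_alphas=np.array([0.4, 0.8]),
    mesh_depth=1.0,
    mesh_color=np.array([0.0, 0.0, 1.0]),
))
```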
CVPR 2025

3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations

Yating Wang, Xuan Wang, Ran Yi, Yanbo Fan, Jichen Hu, Jingcheng Zhu, Lizhuang Ma

Project

  • We propose a novel 3D head avatar modeling method that takes into account both dynamic texture modeling and spatiotemporal efficiency.
CVPR 2025

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations

Ziqiao Peng, Yanbo Fan, Haoyu Wu, Xuan Wang, Hongyan Liu, Jun He, Zhaoxin Fan

Project

  • We introduce DualTalk, a novel unified framework that integrates the dynamic behaviors of speakers and listeners to simulate realistic and coherent dialogue interactions.
CVPR 2025 (Highlight)

Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling

Yinuo Wang, Yanbo Fan, Xuan Wang, Yu Guo, Fei Wang

Project

  • In this work, we propose a novel listening head generation framework that enables both highly expressive head motions and photorealistic rendering.