MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

Hongsheng Wang1,2, Xiang Cai2, Xi Sun2, Jinhong Yue2, Zhanyun Tang2,
Shengyu Zhang†1, Feng Lin2 and Fei Wu1


1 Zhejiang University, China      2 Zhejiang Lab, China

MOSS reconstructs 3D clothed humans with detailed joints and fine clothing folds. The right image shows that MOSS surpasses previous works in visual quality on MonoCap (LPIPS* = LPIPS × 1000); larger circles denote higher FPS.

Abstract

Single-view clothed human reconstruction is central to virtual reality applications, especially those involving intricate human motions, and achieving realistic clothing deformation remains a notable challenge. Current methods often overlook the influence of motion on surface deformation, producing surfaces that lack the constraints imposed by global motion. To overcome these limitations, we introduce Motion-based 3D Clothed Human Synthesis (MOSS), a framework that uses kinematic information to achieve motion-aware Gaussian split on the human surface. The framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates the matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. Building on KGAS, UID addresses local occlusions in the single-view setting: it identifies significant surfaces, and geometric reconstruction is then performed to compensate for their deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve on Human NeRF and Gaussian Splatting by 33.94% and 16.75% in LPIPS*, respectively.
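To make the LPIPS* figures concrete, here is a minimal sketch of how a relative improvement of this kind is usually computed, assuming the common convention that lower LPIPS is better and that the improvement is measured relative to the baseline score. The function names and the example values are hypothetical and are not taken from the paper's tables.

```python
def lpips_star(lpips: float) -> float:
    """Scale a raw LPIPS score by 1000, matching LPIPS* = LPIPS x 1000."""
    return lpips * 1000.0


def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of a lower-is-better metric relative to a baseline.

    Assumption: improvement is defined as (baseline - ours) / baseline * 100.
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - ours) / baseline * 100.0


# Hypothetical LPIPS* values, chosen only to illustrate the arithmetic.
example_baseline = 40.0
example_ours = 30.0
example_gain = relative_improvement(example_baseline, example_ours)
```

Because the ratio is scale-invariant, the percentage is the same whether it is computed on LPIPS or on LPIPS*.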

Pipeline


MOSS framework. MOSS rotates and scales the Gaussians using the matrix-Fisher distribution. The T-pose is converted to the target pose, and the surface folds are then refined.
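For readers unfamiliar with the matrix-Fisher distribution, the sketch below evaluates its standard unnormalized log-density over rotations, log p(R; F) = tr(FᵀR) + const. It illustrates the textbook definition only and is not the authors' implementation. The helper names are hypothetical, and the parameter F = κ·R₀ is an assumed special case in which R₀ is the mode and κ sets how sharply the density concentrates around it.

```python
import math


def rotation_z(theta):
    """3x3 rotation matrix about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def scale(matrix, k):
    """Multiply every entry of a 3x3 matrix by the scalar k."""
    return [[k * x for x in row] for row in matrix]


def log_density_unnormalized(F, R):
    """tr(F^T R), computed as the sum of elementwise products of F and R."""
    return sum(F[i][j] * R[i][j] for i in range(3) for j in range(3))


# Assumed special case: F = kappa * R0, so R0 is the most likely rotation.
kappa = 10.0
mode = rotation_z(0.3)
F = scale(mode, kappa)

at_mode = log_density_unnormalized(F, mode)              # kappa * 3
off_mode = log_density_unnormalized(F, rotation_z(0.8))  # kappa * (1 + 2 cos 0.5)
```

In this special case the log-density depends only on the angle between R and R₀, and a larger κ widens the gap between the mode and nearby rotations, which is the sense in which the distribution carries both a preferred rotation and a concentration.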

Comparison with SOTA


Sequences: GT, Ours, GauHuman