¹The University of Texas at Austin, ²Zhejiang University
(1) We present HSMR (Human Skeleton and Mesh Recovery), the first end-to-end approach to recover SKEL parameters from a single image.
(2) We show how to create a dataset with pseudo ground truth for training models that target human body models other than SMPL.
(3) We demonstrate that HSMR is robust to extreme poses and viewpoints and provides biomechanically accurate human pose estimates, while still matching the performance of the most closely related state-of-the-art method, which regresses SMPL parameters.
(4) We reveal the limitations of previous methods that regress SMPL parameters, showing that they tend to predict unnatural rotations for the body joints, which leads to biomechanically inaccurate results.
In this paper, we introduce a method for reconstructing humans in 3D from a single image using a biomechanically accurate skeleton model.
To achieve this, we train a transformer that takes an image as input and estimates the parameters of the model.
Due to the lack of training data for this task, we build a pipeline to generate pseudo ground truth data and implement a training procedure that iteratively refines these pseudo labels for improved accuracy.
Compared to state-of-the-art methods in 3D human pose estimation, our model achieves competitive performance on standard benchmarks, while it significantly outperforms them in settings with extreme 3D poses and viewpoints.
This result highlights the benefits of using a biomechanical skeleton with realistic degrees of freedom for robust pose estimation.
Additionally, we show that previous models frequently violate joint angle limits, leading to unnatural rotations.
In contrast, our approach leverages biomechanically plausible degrees of freedom, leading to more realistic joint rotation estimates.
We validate our approach across multiple human pose estimation benchmarks.
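The point about violated joint-angle limits can be made concrete with a small check. The following minimal Python sketch counts how many estimated joint angles fall outside a per-joint range and by how much. The joint names and ranges are hypothetical placeholders chosen for illustration; they are not SKEL's actual limits or the evaluation protocol used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointLimit:
    """Allowed range, in degrees, for one rotational degree of freedom."""
    name: str
    lo: float
    hi: float


# Hypothetical limits for illustration only; not SKEL's actual values.
LIMITS = [
    JointLimit("knee_flexion", 0.0, 150.0),
    JointLimit("elbow_flexion", 0.0, 145.0),
    JointLimit("wrist_flexion", -70.0, 80.0),
]


def find_violations(angles_deg, limits=LIMITS, tol_deg=0.0):
    """Return (name, angle, excess) for every angle outside its range.

    `angles_deg` maps a DoF name to its estimated angle in degrees.
    `excess` is how far, in degrees, the angle lies beyond the violated bound.
    """
    violations = []
    for lim in limits:
        a = angles_deg[lim.name]
        if a < lim.lo - tol_deg:
            violations.append((lim.name, a, lim.lo - a))
        elif a > lim.hi + tol_deg:
            violations.append((lim.name, a, a - lim.hi))
    return violations


if __name__ == "__main__":
    estimate = {"knee_flexion": -12.0, "elbow_flexion": 90.0, "wrist_flexion": 95.0}
    for name, angle, excess in find_violations(estimate):
        print(f"{name}: {angle:.1f} deg exceeds its limit by {excess:.1f} deg")
```

A model whose knee is a single hinge-like degree of freedom can keep such a range by construction (for example, by clamping), whereas a 3-DoF axis-angle joint would first need to be converted into anatomical angles before a check like this applies.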
Overview of our HSMR approach. A key design choice of HSMR is the adoption of the SKEL parametric body model, which is built on a biomechanically accurate skeletal model. We use a transformer-based architecture that takes a single image of a person as input and estimates the pose parameters q and shape parameters β of the SKEL model. During training, we iteratively update the pseudo ground truth used to supervise our model, aiming to improve its quality. To do so, we optimize the HSMR estimate to align with the ground-truth 2D keypoints (SKELify). The optimized parameters are then used as supervision targets in subsequent training iterations.
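The SKELify step fits a model estimate to 2D keypoints and feeds the result back as supervision. The following minimal Python sketch illustrates that loop on a hypothetical toy model: a planar two-segment leg with two joint angles instead of the full SKEL parameters, an identity camera instead of a perspective projection, and finite-difference gradient descent instead of whatever optimizer the authors use. The small prior that keeps the fit near the network estimate, the joint bounds enforced by projection, and the rule of replacing a stored label only when the new fit lowers the 2D error are all assumptions for illustration, not details confirmed here.

```python
import math

# Hypothetical toy "skeleton": a planar two-segment leg (hip -> knee -> ankle).
# The parameters q are two joint angles in radians; segment lengths are fixed.
THIGH, SHIN = 0.45, 0.42
# Assumed per-joint bounds (hip, knee), enforced by projection after each step.
BOUNDS = [(-math.pi / 2, math.pi / 2), (0.0, 2.6)]


def keypoints_2d(q):
    """Forward kinematics with an identity projection: knee and ankle (x, y)."""
    hip, knee = q
    kx, ky = THIGH * math.sin(hip), -THIGH * math.cos(hip)
    ax = kx + SHIN * math.sin(hip + knee)
    ay = ky - SHIN * math.cos(hip + knee)
    return [(kx, ky), (ax, ay)]


def reproj_error(q, target):
    """Sum of squared 2D distances between predicted and target keypoints."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(keypoints_2d(q), target))


def fit(q_init, target, prior_w=1e-3, lr=0.5, steps=500, eps=1e-6):
    """Refine q_init so its keypoints match `target` (a SKELify-style fit)."""
    def loss(q):
        prior = sum((a - b) ** 2 for a, b in zip(q, q_init))
        return reproj_error(q, target) + prior_w * prior

    q = list(q_init)
    for _ in range(steps):
        base = loss(q)
        grad = []
        for i in range(len(q)):  # forward-difference estimate of dloss/dq_i
            qp = list(q)
            qp[i] += eps
            grad.append((loss(qp) - base) / eps)
        # Gradient step, then project each angle back into its bounds.
        q = [min(max(a - lr * g, lo), hi)
             for a, g, (lo, hi) in zip(q, grad, BOUNDS)]
    return q


def update_pseudo_label(current, prediction, target):
    """Refit the prediction; keep it only if it beats the stored label."""
    refined = fit(prediction, target)
    if current is None or reproj_error(refined, target) < reproj_error(current, target):
        return refined
    return current


if __name__ == "__main__":
    target = keypoints_2d([0.3, 0.9])  # stands in for annotated 2D keypoints
    network_guess = [0.0, 0.4]         # stands in for the regressor's output
    label = update_pseudo_label(None, network_guess, target)
    print([round(a, 3) for a in label], reproj_error(label, target))
```

Running the script moves the guess close to the angles that generated the target keypoints, with a small offset introduced by the prior. Projecting onto the bounds after every step is one simple way to keep the fit inside each degree of freedom's allowed range; a full pipeline would more likely use automatic differentiation and a perspective camera model.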
@inproceedings{xia2025hsmr,
title={Reconstructing Humans with a Biomechanically Accurate Skeleton},
author={Xia, Yan and Zhou, Xiaowei and Vouga, Etienne and Huang, Qixing and Pavlakos, Georgios},
booktitle={CVPR},
year={2025},
}