SMPLOlympics: Sports Environments for Physically Simulated Humanoids



Abstract

We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for many years, there is also a wealth of existing knowledge on the preferred strategies for achieving better performance. To leverage human demonstrations from videos and motion capture, we design our humanoid to be compatible with the widely used SMPL and SMPL-X human models from the vision and graphics community. We provide a suite of individual sports environments, including golf, javelin throw, high jump, long jump, and hurdling, as well as competitive sports, including both 1v1 and 2v2 games such as table tennis, tennis, fencing, boxing, soccer, and basketball. Our analysis of these diverse tasks shows that combining strong motion priors with simple reward engineering can produce human-like behavior across a variety of sports. By providing a unified sports benchmark and baseline implementations of state and reward designs, we hope to help both the control and animation communities achieve performant and human-like behaviors.



  1. Sports Environments
  2. Data from Videos
  3. Baseline Comparisons


Sports Environments

In this section, we present a collage of the policies trained in our sports environments using our preliminary reward designs. The fencing and boxing results use our competitive self-play training.
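Competitive self-play of the kind mentioned above is commonly implemented by training against a pool of past policy snapshots. The sketch below is a generic, minimal illustration of that idea; the class and method names are hypothetical and do not correspond to the SMPLOlympics codebase.

```python
import random

class SelfPlayPool:
    """Generic self-play opponent pool (illustrative only, not the
    SMPLOlympics API). The learner periodically snapshots its policy into
    the pool and trains against opponents sampled from past snapshots,
    which stabilizes competitive training."""

    def __init__(self, max_size=10):
        self.snapshots = []
        self.max_size = max_size

    def add_snapshot(self, policy_params):
        self.snapshots.append(policy_params)
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)  # drop the oldest snapshot

    def sample_opponent(self, latest_prob=0.5):
        # Mix the latest policy with older snapshots to avoid cycling,
        # where the learner only beats its most recent self.
        if random.random() < latest_prob:
            return self.snapshots[-1]
        return random.choice(self.snapshots)
```

In practice the sampling distribution (uniform, prioritized by win rate, etc.) is a design choice; the key point is that each agent faces a moving but diverse set of opponents.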

Table Tennis
Tennis
Boxing
Fencing
Penalty Kick
Free Throw
Soccer 1v1
Soccer 2v2
Javelin
Golf
High Jump
Long Jump
Hurdle

Data from Videos

Our SMPL-based humanoid enables us to directly use poses estimated from videos as human demonstration data. Here, we provide sample visualizations of the motion data extracted from videos using our pipeline of pose estimation followed by simulation-based refinement. The extracted motion is physically plausible and captures the distinctive motion style of each sport.

Soccer
Tennis
Golf
Boxing

Ablations on Using Motion Imitation for Physically Plausible Refinement

In this section, we ablate the importance of using a motion imitator (PHC) to refine poses when acquiring data from videos. We test two motion demonstration sequences, one with refinement and one without, and train the PULSE+AMP model using each sequence as a prior. The results show that the quality of the demonstration data matters: our refinement step leads to a better motion prior.
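To see why simulation-based refinement helps: raw video pose estimates often contain artifacts such as ground penetration and frame-to-frame jitter, and tracking them with a physically simulated humanoid removes these by construction. The toy function below is a deliberately simplified stand-in for that idea (it is not the PHC imitator): it cleans a 1D height trajectory by clamping penetration and limiting per-frame jumps.

```python
def refine_toy(heights, ground=0.0, max_delta=0.05):
    """Toy stand-in for physics-based motion refinement (NOT the actual
    PHC imitator). Removes two artifacts that a simulated tracker
    eliminates by construction: ground penetration and per-frame jitter."""
    out = [max(heights[0], ground)]
    for h in heights[1:]:
        h = max(h, ground)  # a simulated body cannot penetrate the ground
        prev = out[-1]
        # a simulated body cannot teleport, so bound frame-to-frame change
        h = min(max(h, prev - max_delta), prev + max_delta)
        out.append(h)
    return out
```

The real refinement step tracks full-body poses in a physics simulator, but the effect illustrated here, projecting noisy estimates onto physically feasible trajectories, is the same.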

W/o Refinement
W/ Refinement

Baseline Comparisons

In this section, we provide visual comparisons of baseline methods (PPO-only/AMP/PULSE). For sports with accompanying human demonstration data from videos, we also provide PULSE+AMP as a baseline.
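Several of the failure modes described in these comparisons stem from the tension between the task reward and AMP's discriminator (style) reward. As a reference point, here is a minimal sketch of one common way such rewards are combined; the equal weights and the GAIL-style -log(1 - D) form are illustrative assumptions, not the exact formulation used in our environments.

```python
import math

def style_reward(disc_logit):
    """One common AMP/GAIL-style reward: r = -log(1 - D), where D is the
    discriminator's probability that a state transition came from the
    reference motion data. (Illustrative; an LSGAN form is also common.)"""
    d = 1.0 / (1.0 + math.exp(-disc_logit))  # sigmoid -> probability
    return -math.log(max(1.0 - d, 1e-8))     # clamp for numerical safety

def combined_reward(r_task, disc_logit, w_task=0.5, w_style=0.5):
    # The weighted sum is what creates the failure modes seen in the
    # comparisons: a policy can maximize the style term by standing still
    # (looking human-like) while ignoring the task term, or vice versa.
    return w_task * r_task + w_style * style_reward(disc_logit)
```

When the reference data lacks the target skill (e.g. no high jump motions in AMASS), the style term actively penalizes task progress, which is why AMP alone can collapse to standing still.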

High Jump

For the high jump, PPO without any motion prior yields an inhuman jumping motion. AMP, due to the task difficulty and the absence of high jump motions in AMASS, ignores the task reward, optimizes only the discriminator reward, and stands still. Using motor skills learned from AMASS, PULSE surprisingly discovers the Fosbury Flop technique for the high jump.

PPO
AMP
PULSE

Long Jump

For the long jump, PPO without any motion prior leads to inhuman motion. AMP, due to the task difficulty and the absence of long jump motions in AMASS, ignores the task reward, optimizes only the discriminator reward, and stands still. Using motor skills learned from AMASS, PULSE performs the long jump with human-like motion.

PPO
AMP
PULSE

Hurdling

For hurdling, PPO without any motion prior yields inhuman motion. AMP, due to the task difficulty and the absence of hurdling motions in AMASS, ignores the task reward, optimizes only the discriminator reward, and stands still. Using motor skills learned from AMASS, PULSE clears the hurdles with human-like motion.

PPO
AMP
PULSE

Javelin

For the javelin throw, PPO without any motion prior yields inhuman motion. AMP, even when provided with throwing motions from videos, prioritizes the discriminator reward (producing a hand-swinging motion) and fails to throw. Using motor skills learned from AMASS, PULSE throws with human-like motion and even learns to jump to gain momentum.

PPO
AMP
PULSE

Golf

For golf, PPO without any motion prior yields inhuman motion (striking the ball with its pelvis). AMP, even with the motion data from videos, optimizes only the task reward and ignores the discriminator reward (another possible failure mode). PULSE can strike the golf ball using human-like motion, but PULSE+AMP produces more golf-like motion due to the style guidance from human demonstrations.

PPO
AMP
PULSE
PULSE+AMP

Tennis

For tennis, PPO without any motion prior yields an inhuman swinging motion. AMP uses tennis-like motion when not hitting the ball, but inhuman behavior surfaces when it tries to hit the ball; this is another symptom of the disagreement between the task and discriminator rewards. PULSE and PULSE+AMP can both hit the ball using human-like motion.

PPO
AMP
PULSE
PULSE+AMP

Table Tennis

For table tennis, PPO without any motion prior results in an inhuman swinging motion. AMP uses table-tennis-like motion when not hitting the ball but resorts to inhuman behavior when trying to hit it. PULSE can hit the ball using human-like motion, while PULSE+AMP produces more table-tennis-like motion due to the style guidance from human demonstrations.

PPO
AMP
PULSE
PULSE+AMP

Free Throw

For the free throw, both PPO without any motion prior and AMP fail to learn a proper free throw motion. PULSE and PULSE+AMP both achieve a high free throw success rate using human-like motion.

PPO
AMP
PULSE
PULSE+AMP

Penalty Kick

For penalty kicks, both PPO without any motion prior and AMP fail to learn a proper kicking motion. This is due to the difficulty of learning human-object interaction from scratch. The reward design also plays a role: PPO exploits the player-to-ball reward instead of learning a kicking motion. PULSE and PULSE+AMP can both learn to push the ball; however, PULSE learns to kick with human-like motion, while PULSE+AMP suffers from the conflict between the style and task rewards.

PPO
AMP
PULSE
PULSE+AMP