SMPLOlympics: Sports Environments for Physically Simulated Humanoids



Abstract

We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for many years, there is also a wealth of existing knowledge on the preferred strategies for achieving better performance. To leverage human demonstrations from videos and motion capture, we design our humanoid to be compatible with the widely used SMPL and SMPL-X human models from the vision and graphics community. We provide a suite of individual sports environments, including golf, javelin throw, high jump, long jump, and hurdling, as well as competitive sports, including both 1v1 and 2v2 games such as table tennis, tennis, fencing, boxing, soccer, and basketball. Our analysis of these diverse tasks shows that combining strong motion priors with simple reward engineering can produce human-like behavior across a variety of sports. By providing a unified sports benchmark and baseline implementations of state and reward designs, we hope to help both the control and animation communities achieve performant and human-like behaviors.



  1. Sports Environments
  2. Data from Videos
  3. Baseline Comparisons


Sports Environments

In this section, we present a collage of the policies trained in our sports environments using our preliminary reward designs. The fencing and boxing results use our competitive self-play training.
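Competitive self-play of the kind mentioned above is commonly implemented by training against a pool of past policy snapshots. The sketch below is a generic, minimal illustration of that idea; the class and method names are hypothetical and do not correspond to the SMPLOlympics codebase.

```python
import random

class SelfPlayPool:
    """Generic self-play opponent pool (illustrative only, not the
    SMPLOlympics API). The learner periodically snapshots its policy into
    the pool and trains against opponents sampled from past snapshots,
    which stabilizes competitive training."""

    def __init__(self, max_size=10):
        self.snapshots = []
        self.max_size = max_size

    def add_snapshot(self, policy_params):
        self.snapshots.append(policy_params)
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)  # drop the oldest snapshot

    def sample_opponent(self, latest_prob=0.5):
        # Mix the latest policy with older snapshots to avoid cycling,
        # where the learner only beats its most recent self.
        if random.random() < latest_prob:
            return self.snapshots[-1]
        return random.choice(self.snapshots)
```

In practice the sampling distribution (uniform, prioritized by win rate, etc.) is a design choice; the key point is that each agent faces a moving but diverse set of opponents.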

Table Tennis
Tennis
Boxing
Fencing
Penalty Kick
Free Throw
Soccer 1v1
Soccer 2v2
Javelin
Golf
High Jump
Long Jump
Hurdle

Data from Videos

Our SMPL-based humanoid enables us to directly use poses estimated from videos as human demonstration data. Here, we provide sample visualizations of the motion data extracted from videos using our pipeline of pose estimation followed by simulation-based refinement. The extracted motion is physically plausible and captures the distinctive motion style of each sport.

Soccer
Tennis
Golf
Boxing

Ablations on Using Motion Imitation for Physically Plausible Refinement

In this section, we ablate the importance of using a motion imitator (PHC) to refine poses when acquiring data from videos. We test two motion demonstration sequences, one with refinement and one without, and train the PULSE+AMP model using each sequence as a prior. The results show that the quality of the demonstration data matters: our refinement step leads to a better motion prior.
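To see why simulation-based refinement helps: raw video pose estimates often contain artifacts such as ground penetration and frame-to-frame jitter, and tracking them with a physically simulated humanoid removes these by construction. The toy function below is a deliberately simplified stand-in for that idea (it is not the PHC imitator): it cleans a 1D height trajectory by clamping penetration and limiting per-frame jumps.

```python
def refine_toy(heights, ground=0.0, max_delta=0.05):
    """Toy stand-in for physics-based motion refinement (NOT the actual
    PHC imitator). Removes two artifacts that a simulated tracker
    eliminates by construction: ground penetration and per-frame jitter."""
    out = [max(heights[0], ground)]
    for h in heights[1:]:
        h = max(h, ground)  # a simulated body cannot penetrate the ground
        prev = out[-1]
        # a simulated body cannot teleport, so bound frame-to-frame change
        h = min(max(h, prev - max_delta), prev + max_delta)
        out.append(h)
    return out
```

The real refinement step tracks full-body poses in a physics simulator, but the effect illustrated here, projecting noisy estimates onto physically feasible trajectories, is the same.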

W/o Refinement
W/ Refinement

Baseline Comparisons

In this section, we provide visual comparisons of baseline methods (PPO-only/AMP/PULSE). For sports with accompanying human demonstration data from videos, we also provide PULSE+AMP as a baseline.
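Several of the failure modes described in these comparisons stem from the tension between the task reward and AMP's discriminator (style) reward. As a reference point, here is a minimal sketch of one common way such rewards are combined; the equal weights and the GAIL-style -log(1 - D) form are illustrative assumptions, not the exact formulation used in our environments.

```python
import math

def style_reward(disc_logit):
    """One common AMP/GAIL-style reward: r = -log(1 - D), where D is the
    discriminator's probability that a state transition came from the
    reference motion data. (Illustrative; an LSGAN form is also common.)"""
    d = 1.0 / (1.0 + math.exp(-disc_logit))  # sigmoid -> probability
    return -math.log(max(1.0 - d, 1e-8))     # clamp for numerical safety

def combined_reward(r_task, disc_logit, w_task=0.5, w_style=0.5):
    # The weighted sum is what creates the failure modes seen in the
    # comparisons: a policy can maximize the style term by standing still
    # (looking human-like) while ignoring the task term, or vice versa.
    return w_task * r_task + w_style * style_reward(disc_logit)
```

When the reference data lacks the target skill (e.g. no high jump motions in AMASS), the style term actively penalizes task progress, which is why AMP alone can collapse to standing still.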

High Jump

For the high jump, PPO without any motion prior yields an inhuman jumping motion. AMP, due to the task difficulty and the absence of high jump motions in AMASS, ignores the task reward, optimizes only the discriminator reward, and stands still. Using motor skills learned from AMASS, PULSE surprisingly discovers the Fosbury Flop technique for the high jump.

PPO
AMP
PULSE

Long Jump

For the long jump, PPO without any motion prior leads to inhuman motion. AMP, due to the task difficulty and the absence of long jump motions in AMASS, ignores the task reward, optimizes only the discriminator reward, and stands still. Using motor skills learned from AMASS, PULSE performs the long jump with human-like motion.

PPO
AMP
PULSE

Hurdling

For hurdling, PPO without any motion prior yields inhuman motion. AMP, due to the task difficulty and the absence of hurdling motions in AMASS, ignores the task reward, optimizes only the discriminator reward, and stands still. Using motor skills learned from AMASS, PULSE clears the hurdles with human-like motion.

PPO
AMP
PULSE

Javelin

For the javelin throw, PPO without any motion prior yields inhuman motion. AMP, even when provided with throwing motions from videos, prioritizes the discriminator reward (producing a hand-swinging motion) and fails to throw. Using motor skills learned from AMASS, PULSE throws with human-like motion and even learns to jump to gain momentum.

PPO
AMP
PULSE

Golf

For golf, PPO without any motion prior yields inhuman motion (striking the ball with its pelvis). AMP, even with the motion data from videos, optimizes only the task reward and ignores the discriminator reward (another possible failure mode). PULSE can strike the golf ball using human-like motion, but PULSE+AMP produces more golf-like motion due to the style guidance from human demonstrations.

PPO
AMP
PULSE
PULSE+AMP

Tennis

For tennis, PPO without any motion prior yields an inhuman swinging motion. AMP uses tennis-like motion when not hitting the ball, but inhuman behavior surfaces when it tries to hit the ball; this is another symptom of the disagreement between the task and discriminator rewards. PULSE and PULSE+AMP can both hit the ball using human-like motion.

PPO
AMP
PULSE
PULSE+AMP

Table Tennis

For table tennis, PPO without any motion prior results in an inhuman swinging motion. AMP uses table-tennis-like motion when not hitting the ball but resorts to inhuman behavior when trying to hit it. PULSE can hit the ball using human-like motion, while PULSE+AMP produces more table-tennis-like motion due to the style guidance from human demonstrations.

PPO
AMP
PULSE
PULSE+AMP

Free Throw

For the free throw, both PPO without any motion prior and AMP fail to learn a proper free throw motion. PULSE and PULSE+AMP both achieve a high free throw success rate using human-like motion.

PPO
AMP
PULSE
PULSE+AMP

Penalty Kick

For penalty kicks, both PPO without any motion prior and AMP fail to learn a proper kicking motion. This is due to the difficulty of learning human-object interaction from scratch. The reward design also plays a role: PPO exploits the player-to-ball reward instead of learning a kicking motion. PULSE and PULSE+AMP can both learn to push the ball; however, PULSE learns to kick with human-like motion, while PULSE+AMP suffers from the conflict between the style and task rewards.

PPO
AMP
PULSE
PULSE+AMP