Unity Crawler Environment using Proximal Policy Optimization (PPO)


Date

May 12, 2019

Contributor

Ashutosh Tiwari

Categories

RL

Project Heads-up

The Crawler environment features a creature with 4 arms and 4 forearms.
Agent Reward Function (independent):

  • +0.03 times body velocity in the goal direction.
  • +0.01 times body direction alignment with goal direction.
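As a rough illustration of the two reward terms above, here is a minimal sketch in Python. It assumes that "body velocity in the goal direction" means the dot product of the velocity with the unit goal vector, and that "alignment" means the dot product of the unit body-forward vector with the unit goal vector. The function name `crawler_reward` and its arguments are hypothetical and are not taken from the Unity environment's source.

```python
import numpy as np

def crawler_reward(body_velocity, body_forward, goal_direction):
    """Sketch of the per-step reward described above (assumed interpretation)."""
    # Normalize the goal direction so the projections below are scale-free.
    goal = np.asarray(goal_direction, dtype=float)
    goal = goal / np.linalg.norm(goal)

    # Assumption: velocity "in the goal direction" is the projection onto goal.
    velocity_term = 0.03 * float(np.dot(np.asarray(body_velocity, dtype=float), goal))

    # Assumption: alignment is the cosine between body forward and goal.
    forward = np.asarray(body_forward, dtype=float)
    forward = forward / np.linalg.norm(forward)
    alignment_term = 0.01 * float(np.dot(forward, goal))

    return velocity_term + alignment_term

# Example: moving at 2 units/s straight toward the goal while facing it.
example_reward = crawler_reward([2.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
```

Under this interpretation, an agent that moves away from the goal or faces the wrong way receives a negative contribution from the corresponding term.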

Goal: Each agent must move its body in the goal direction without falling.

The environment comes in two variants:

  • CrawlerStaticTarget: the goal direction is always forward.
  • CrawlerDynamicTarget: the goal direction is randomized.

The observation space consists of 117 variables corresponding to the position, rotation, velocity, and angular velocity of each limb, plus the acceleration and angular acceleration of the body.

The vector action space is continuous with size 20, corresponding to target rotations for the joints.

The version of the environment used in this project contains 12 identical agents, each with its own copy of the environment.
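To make the dimensions concrete, the sketch below builds placeholder arrays with the batch shapes implied by the numbers above: 12 agents, 117 observations each, and 20 continuous actions each. The constant names and the `random_actions` helper are hypothetical, and clipping actions to [-1, 1] is an assumption about the expected action range, not something stated in this write-up.

```python
import numpy as np

NUM_AGENTS = 12    # identical agents running in parallel
OBS_SIZE = 117     # observation variables per agent
ACTION_SIZE = 20   # continuous target rotations per agent

def random_actions(rng, num_agents=NUM_AGENTS, action_size=ACTION_SIZE):
    """Sample one random continuous action vector per agent."""
    raw = rng.standard_normal((num_agents, action_size))
    # Assumption: the environment expects actions in [-1, 1].
    return np.clip(raw, -1.0, 1.0)

rng = np.random.default_rng(0)
states = np.zeros((NUM_AGENTS, OBS_SIZE))  # placeholder for a batch of observations
actions = random_actions(rng)              # one action vector per agent
```

A policy network for this setup would therefore map a (12, 117) batch of observations to a (12, 20) batch of actions at every step.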

Note: For details of PPO, please see the summary of the PPO paper here
