Unity Crawler Environment using Proximal Policy Optimization (PPO)

Project Heads-up

The Crawler environment features a creature with 4 arms and 4 forearms.
Agent Reward Function (independent):

+0.03 times body velocity in the goal direction.
+0.01 times body direction alignment with goal direction.
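As a rough illustration, the two reward terms above could be computed per step as follows. This is a minimal sketch, not the environment's actual implementation: the function name `crawler_step_reward`, its arguments, and the use of dot products against a unit goal vector (for both "velocity in the goal direction" and "alignment") are assumptions made for this example.

```python
import math


def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(x / norm for x in v)


def crawler_step_reward(body_velocity, body_forward, goal_direction):
    """Hypothetical per-step reward built from the two terms listed above.

    Assumption: "velocity in the goal direction" is the projection of the
    body velocity onto the unit goal vector, and "alignment" is the cosine
    between the body's forward vector and the goal vector.
    """
    goal = _normalize(goal_direction)
    forward = _normalize(body_forward)
    velocity_term = 0.03 * _dot(body_velocity, goal)   # +0.03 * velocity toward goal
    alignment_term = 0.01 * _dot(forward, goal)        # +0.01 * facing alignment
    return velocity_term + alignment_term


# Moving at 2 units/s straight toward the goal while facing it.
reward = crawler_step_reward((2.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

Under these assumptions, an agent that moves or faces away from the goal receives a negative contribution from the corresponding term.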

Goal: Each agent must move its body in the goal direction without falling.

CrawlerStaticTarget: Goal direction is always forward.
CrawlerDynamicTarget: Goal direction is randomized.

Observation space: 117 variables corresponding to the position, rotation, velocity, and angular velocity of each limb, plus the acceleration and angular acceleration of the body.

Vector action space: continuous, size 20, corresponding to target rotations for the joints.

The version of the environment used in this project contains 12 identical agents, each with its own copy of the environment.
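To make the sizes concrete, the sketch below builds one batch of placeholder observations and random actions for the 12 agents and checks their shapes. It does not talk to Unity; the helper names (`random_actions`, `check_batch`) are hypothetical, and the action range of [-1, 1] is an assumption that the description above does not state.

```python
import random

NUM_AGENTS = 12    # identical agents in this version of the environment
OBS_SIZE = 117     # observation variables per agent
ACTION_SIZE = 20   # continuous action values per agent (joint target rotations)


def random_actions(rng, low=-1.0, high=1.0):
    """One random action vector per agent.

    Assumption: actions are bounded to [low, high] = [-1, 1].
    """
    return [[rng.uniform(low, high) for _ in range(ACTION_SIZE)]
            for _ in range(NUM_AGENTS)]


def check_batch(observations, actions):
    """Raise ValueError if the batch does not match the expected sizes."""
    if len(observations) != NUM_AGENTS or len(actions) != NUM_AGENTS:
        raise ValueError("expected one observation and one action per agent")
    if any(len(obs) != OBS_SIZE for obs in observations):
        raise ValueError(f"each observation must have {OBS_SIZE} values")
    if any(len(act) != ACTION_SIZE for act in actions):
        raise ValueError(f"each action must have {ACTION_SIZE} values")
    return True


# Placeholder observations (all zeros) stand in for real environment output.
observations = [[0.0] * OBS_SIZE for _ in range(NUM_AGENTS)]
actions = random_actions(random.Random(0))
batch_ok = check_batch(observations, actions)
```

A policy network for this setup would therefore map a 12 x 117 observation batch to a 12 x 20 action batch on every step.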

Note: For details of PPO, please see the summary of the PPO paper here.
