Project Heads-up
The REINFORCE algorithm is based on finding a local maximum of a function using a procedure known as gradient ascent. This class implements a simple Convolutional Neural Network (CNN) model containing only 2 fully-connected layers. In this CNN model, the function reinforce() approximates the return (the discounted sum of all rewards). The environment is solved in 791 episodes!
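As a rough illustration of the return mentioned above, here is a minimal sketch of computing the discounted sum of rewards for one episode. The helper name discounted_returns and the discount factor gamma are assumptions made for this example; they are not taken from the project's code, and reinforce() may compute the return differently.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Hypothetical helper: G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...

    Returns one discounted return per time step of the episode.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three steps with reward 1 each; gamma chosen only to keep the numbers simple.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

In a typical REINFORCE update, each step's log-probability of the chosen action is weighted by a return like this one, and gradient ascent is applied to the weighted sum.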