This is REINFORCE using a recurrent network (GRU) instead of a plain feedforward multilayer perceptron (MLP).
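
To make the difference concrete, here is a minimal sketch of the forward side of REINFORCE with a GRU, in plain Python. It is not the code from this project: `GRUPolicy`, `run_episode`, `reinforce_loss`, the toy observations, and the reward function are all hypothetical names made up for illustration, and it assumes the GRU is used as the policy network. The GRU cell is one common variant with biases omitted. Gradients are left out; in practice an autodiff framework would backpropagate the loss through time.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUPolicy:
    """Hypothetical policy: observation -> GRU cell -> softmax over actions."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Input (W) and recurrent (U) weights for the update (z), reset (r)
        # and candidate (n) parts of the cell. Biases omitted for brevity.
        self.W = {g: rand_matrix(hidden_dim, obs_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.W_out = rand_matrix(n_actions, hidden_dim, rng)

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, obs, h):
        """One time step: returns action probabilities and the new hidden state."""
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], obs), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], obs), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], obs), matvec(self.U["n"], rh))]
        h_new = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        # Softmax over action logits (shifted by the max for numerical stability).
        logits = matvec(self.W_out, h_new)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps], h_new


def run_episode(policy, observations, reward_fn, rng):
    h = policy.initial_state()  # memory is reset at the start of each episode
    log_probs, rewards = [], []
    for obs in observations:
        # Unlike an MLP, the policy also receives h, which summarizes past steps.
        probs, h = policy.step(obs, h)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        log_probs.append(math.log(probs[action]))
        rewards.append(reward_fn(obs, action))
    return log_probs, rewards


def discounted_returns(rewards, gamma=0.99):
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    # REINFORCE objective as a loss: -sum_t G_t * log pi(a_t | history up to t)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Toy usage: three observations, reward 1 whenever action 0 is chosen.
rng = random.Random(0)
policy = GRUPolicy(obs_dim=2, hidden_dim=4, n_actions=2)
observations = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
log_probs, rewards = run_episode(
    policy, observations, lambda obs, a: 1.0 if a == 0 else 0.0, rng
)
loss = reinforce_loss(log_probs, discounted_returns(rewards))
```

The only structural change from an MLP policy is the hidden state `h` threaded through `run_episode`: the same observation can produce different action probabilities depending on what came before it in the episode.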