This is A2C (Advantage Actor-Critic) using a recurrent network (GRU) in place of the plain feedforward Multi-Layer Perceptron (MLP).
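To make the difference from a feedforward policy concrete, here is a minimal NumPy sketch of a recurrent actor-critic forward pass. It is an illustration only, not this repository's code: the class name `GRUActorCritic`, the layer sizes, the weight initialization, and the single shared GRU feeding both heads are all assumptions. The point it shows is that the policy and value heads read the GRU hidden state `h`, which is carried from step to step, rather than reading the raw observation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUActorCritic:
    """Hypothetical recurrent actor-critic (forward pass only, no training)."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1  # assumed init scale, chosen only for the demo
        # One weight set per GRU gate: update (z), reset (r), candidate (n).
        self.W = {g: rng.normal(0, scale, (hidden_dim, obs_dim)) for g in "zrn"}
        self.U = {g: rng.normal(0, scale, (hidden_dim, hidden_dim)) for g in "zrn"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrn"}
        # Actor and critic heads both read the hidden state.
        self.W_pi = rng.normal(0, scale, (n_actions, hidden_dim))
        self.W_v = rng.normal(0, scale, (1, hidden_dim))
        self.hidden_dim = hidden_dim

    def initial_state(self):
        # Zero hidden state, used at the start of each episode.
        return np.zeros(self.hidden_dim)

    def step(self, obs, h):
        # Standard GRU cell update (reset gate applied to h before U_n).
        z = sigmoid(self.W["z"] @ obs + self.U["z"] @ h + self.b["z"])
        r = sigmoid(self.W["r"] @ obs + self.U["r"] @ h + self.b["r"])
        n = np.tanh(self.W["n"] @ obs + self.U["n"] @ (r * h) + self.b["n"])
        h_new = (1.0 - z) * n + z * h
        # Actor head: softmax over action logits.
        logits = self.W_pi @ h_new
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Critic head: scalar state-value estimate.
        value = (self.W_v @ h_new).item()
        return probs, value, h_new


# Usage: roll the hidden state forward over a short dummy trajectory.
model = GRUActorCritic(obs_dim=4, hidden_dim=8, n_actions=2)
h = model.initial_state()
for obs in np.ones((3, 4)):
    probs, value, h = model.step(obs, h)
```

In a rollout, `h` would typically be reset with `initial_state()` whenever an episode ends, so that memory does not leak across episodes. An MLP version would drop `h` entirely and compute the heads from `obs` alone.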