Understanding PPO and Implementations in PyTorch