Policy Gradient, TRPO, PPO
Temporal difference learning
Key concepts of reinforcement learning
Gradient Descent and Its Variants