Policy gradient method

Policy gradient methods are a class of reinforcement learning algorithms.

Policy gradient methods are a sub-class of policy optimization methods. Unlike value-based methods which learn a value function to derive a policy, policy optimization methods directly learn a policy function $\pi$ that selects actions without consulting a value function. For policy gradient to apply, the policy function $\pi _{\theta }$ is parameterized by a differentiable parameter $\theta$ .