Policy gradient method

Policy gradient methods are a class of reinforcement learning algorithms.

Policy gradient methods are a sub-class of policy optimization methods. Unlike value-based methods which learn a value function to derive a policy, policy optimization methods directly learn a policy function that selects actions without consulting a value function. For policy gradient to apply, the policy function is parameterized by a differentiable parameter .


© MMXXIII Rich X Search. We shall prevail. All rights reserved. Rich X Search