Proximal policy optimization (PPO) is a reinforcement learning (RL) algorithm for training an intelligent agent. Specifically, it is a policy gradient method, often used for deep RL when the policy network is very large.
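
As an illustration of what distinguishes PPO from a plain policy gradient update, the sketch below computes the clipped surrogate objective that PPO is commonly described as maximizing. It is a minimal, hedged example rather than a reference implementation: the function name `ppo_clip_objective`, the plain-list inputs, and the default `epsilon=0.2` (a commonly used value) are assumptions made for this example, and a real training loop would compute the probability ratios and advantage estimates from a policy network and collected trajectories.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    epsilon:    clipping range (illustrative default, not prescribed by the text)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

Taking the minimum of the two terms means that moving the new policy far from the old one yields no extra credit in the objective, which is what keeps each update "proximal".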