Q-learning is a reinforcement learning algorithm that trains an agent to assign values to its possible actions based on its current state, without requiring a model of the environment (model-free). It can handle problems with stochastic transitions and rewards without requiring adaptations.[1]
For example, in a grid maze, an agent learns to reach an exit worth 10 points. At a junction, Q-learning might assign a higher value to moving right than left if right gets to the exit faster, improving this choice by trying both directions over time.
For any finite Markov decision process, Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over all successive steps, starting from the current state.[2] Given infinite exploration time and a partly random exploration policy, it identifies an optimal action-selection policy for any such process.[2]
"Q" refers to the function that the algorithm computes: the expected reward—that is, the quality—of an action taken in a given state.[3]