Explain what happens in reinforcement learning if the agent always chooses the action that maximizes the Q-value. Suggest two ways to force the agent to explore.
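In this case the agent acts purely greedily. It never tries actions whose current Q-value estimates are lower, even though some of them may actually be better, so those estimates are never corrected. The agent can therefore get stuck in a suboptimal policy, because it does not explore enough to find the best action in each state.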
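A minimal sketch of this failure, assuming a hypothetical two-armed bandit (a single-state problem, so each Q-value is simply the estimate for one action) with deterministic rewards 1 and 5, Q-values starting at 0, and sample-average updates. The names `greedy_action` and `run_greedy` are illustrative, not from any library.

```python
from typing import List


def greedy_action(q: List[float]) -> int:
    """Return the index of the highest Q-value (max() keeps the first, i.e. lowest, index on ties)."""
    return max(range(len(q)), key=lambda a: q[a])


def run_greedy(rewards: List[float], steps: int, initial_q: float = 0.0) -> List[int]:
    """Run a purely greedy agent on a deterministic bandit and count how often each arm is chosen."""
    q = [initial_q] * len(rewards)
    counts = [0] * len(rewards)
    for _ in range(steps):
        a = greedy_action(q)
        counts[a] += 1
        # Incremental sample-average update: Q <- Q + (r - Q) / n
        q[a] += (rewards[a] - q[a]) / counts[a]
    return counts


if __name__ == "__main__":
    # Arm 1 pays 5 and arm 0 pays 1, but arm 0 wins the initial tie, its estimate rises to 1,
    # and arm 1 (still estimated at 0) is never tried.
    print(run_greedy([1.0, 5.0], steps=100))  # [100, 0]
```

The better arm is never selected, because nothing ever gives the agent a reason to try it.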
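Two ways to force the agent to explore are:
1) Set the initial Q-values high (optimistic initialization). Every untried action then looks better than the ones already tried, so even a greedy agent tries it. Each visit pulls that action's estimate down toward its true value, and the agent moves on to the next action that still looks good.
2) Make it pick a random action occasionally (ε-greedy). With a small probability ε it chooses an action uniformly at random; otherwise it chooses the action with the highest Q-value. Every action keeps being tried, so its estimate keeps improving.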
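The two strategies can be sketched on the same kind of hypothetical two-armed bandit (deterministic rewards 1 and 5, with arm 0 winning ties). This is an illustrative sketch, not a reference implementation: `choose_action` and `run_bandit` are made-up names, and the constant step size `alpha = 0.1` and the specific values of `initial_q` and `epsilon` are assumptions.

```python
import random
from typing import List


def choose_action(q: List[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice: a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))  # explore: any action, uniformly at random
    return max(range(len(q)), key=lambda a: q[a])  # exploit: highest Q (ties -> lowest index)


def run_bandit(rewards: List[float], steps: int, *, initial_q: float,
               epsilon: float, alpha: float = 0.1, seed: int = 0) -> List[int]:
    """Count how often each arm is chosen on a deterministic bandit (hypothetical setup)."""
    rng = random.Random(seed)
    q = [initial_q] * len(rewards)
    counts = [0] * len(rewards)
    for _ in range(steps):
        a = choose_action(q, epsilon, rng)
        counts[a] += 1
        q[a] += alpha * (rewards[a] - q[a])  # constant step-size update toward the observed reward
    return counts


if __name__ == "__main__":
    rewards = [1.0, 5.0]  # arm 1 is better, but arm 0 is tried first
    # 1) Optimistic initial values: purely greedy (epsilon = 0), but every Q-value starts at 10.
    print("optimistic:", run_bandit(rewards, 1000, initial_q=10.0, epsilon=0.0))
    # 2) Epsilon-greedy: Q-values start at 0, but about 10% of actions are random.
    print("epsilon-greedy:", run_bandit(rewards, 1000, initial_q=0.0, epsilon=0.1))
```

In both runs the agent ends up choosing arm 1 almost every time, unlike the purely greedy agent with Q-values starting at 0. The constant step size is a deliberate choice here: with sample averages the optimistic starting value is replaced entirely after the first visit to an action. Note also that optimistic initialization only drives exploration early on, while ε-greedy keeps exploring at rate ε for as long as it runs.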