Question

In: Computer Science

Is there a systematic way to determine which action-value-based learning method (Q-learning or SARSA) is the better choice and can achieve better results? Explain.

Solutions

Expert Solution

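SARSA and Q-learning are both reinforcement learning algorithms that work in a similar way. The major difference is that SARSA is on-policy while Q-learning is off-policy: SARSA learns the value of the policy it is actually following (exploration included), whereas Q-learning learns the value of the greedy policy regardless of how the agent behaves.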

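The update rules for SARSA and Q-learning, written in their standard form with step size $\alpha$ and discount factor $\gamma$, are:

SARSA:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$

Q-learning:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$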

In both SARSA and Q-learning, the agent executes a single action at each step, generated by its behavior policy. The difference lies in the update. Q-learning updates its estimate using the maximum estimated value over all possible next actions, regardless of which action is actually taken next. SARSA updates its estimate using the value of the action that is actually taken next.
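The two updates described above can be sketched for a tabular Q-function as follows. This is a minimal illustration, not a full agent: the function names, the NumPy table layout `Q[state, action]`, and the default values of `alpha` and `gamma` are assumptions chosen for the example.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the best-valued action in s_next."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Toy table with 2 states and 2 actions (values are made up for illustration).
Q_sarsa = np.array([[0.0, 0.0], [1.0, 5.0]])
Q_qlearn = Q_sarsa.copy()

# Same transition for both methods; the agent happens to pick action 0 next,
# which is not the greedy action in state 1.
sarsa_update(Q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=1.0)
q_learning_update(Q_qlearn, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=1.0)

print(Q_sarsa[0, 0])   # uses Q[1, 0] = 1.0 in the target
print(Q_qlearn[0, 0])  # uses max(Q[1]) = 5.0 in the target
```

The only difference between the two functions is the bootstrap term, which is why the methods behave identically whenever the agent acts greedily.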

There is a way to decide which action-value-based learning method (Q-learning or SARSA) is the better choice and will achieve better results: compare the two on the properties below, then match those properties to the requirements of the task.

1. Q-learning directly learns the optimal policy, while SARSA learns a near-optimal policy while exploring. To learn an optimal policy with SARSA, we need to decide on a schedule for decaying ε in the ε-greedy action choice.

2. If there is a large negative reward close to the optimal path, Q-learning will tend to trigger that reward while exploring, whereas SARSA will avoid the dangerous optimal path and only slowly learn to use it as the exploration parameters are reduced.

3. Q-learning has higher per-sample variance than SARSA and may have trouble converging as a result. This tends to become a problem when a neural network is trained via Q-learning.
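Point 1 mentions decaying ε in the ε-greedy action choice. One possible way to do that is an exponential schedule with a floor, sketched below. The schedule shape, the function names, and the parameter values (`eps_start`, `eps_min`, `decay`) are assumptions for illustration; other schedules, such as 1/k decay, are equally valid choices.

```python
import numpy as np

def epsilon_at(episode, eps_start=1.0, eps_min=0.01, decay=0.995):
    """Exponentially decayed exploration rate, never below eps_min (hypothetical schedule)."""
    return max(eps_min, eps_start * decay ** episode)

def epsilon_greedy(Q, state, epsilon, rng):
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))

rng = np.random.default_rng(0)
Q = np.array([[0.0, 2.0, 1.0]])  # one state, three actions (made-up values)

# With epsilon = 0 the choice is purely greedy.
greedy_action = epsilon_greedy(Q, 0, epsilon=0.0, rng=rng)

# Exploration shrinks as training progresses.
for episode in (0, 100, 1000):
    print(episode, epsilon_at(episode))
```

As ε shrinks, the policy SARSA follows approaches the greedy policy, so its learned values approach those Q-learning targets.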

  • If our goal is to train an optimal agent in simulation, or in a low-cost and fast-iterating environment, then Q-learning is a good choice because of the first point above (Q-learning directly learns the optimal policy).
  • If our agent learns online, and we care about the rewards earned while learning, then SARSA may be a better choice.
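One practical way to apply the comparison above is to run both methods on the task, or on a small stand-in for it, and look at the reward earned during training. The sketch below does this on a 4x12 cliff-walking grid modeled on the well-known textbook example, where the shortest path runs along a row of -100 cells. The grid layout, the hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`), and all function names are assumptions for this example, not part of the original answer.

```python
import numpy as np

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left

def step(state, action):
    """Return (next_state, reward, done); moves off the grid leave the agent in place."""
    r = min(max(state[0] + MOVES[action][0], 0), ROWS - 1)
    c = min(max(state[1] + MOVES[action][1], 0), COLS - 1)
    if r == 3 and 1 <= c <= 10:  # fell off the cliff: big penalty, back to start
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL

def choose(Q, state, epsilon, rng):
    """Epsilon-greedy action selection."""
    if rng.random() < epsilon:
        return int(rng.integers(4))
    return int(np.argmax(Q[state]))

def train(method, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1,
          seed=0, max_steps=10_000):
    """Train with 'sarsa' or 'q_learning'; return the Q-table and per-episode returns."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((ROWS, COLS, 4))
    returns = np.zeros(episodes)
    for ep in range(episodes):
        s = START
        a = choose(Q, s, epsilon, rng)
        for _ in range(max_steps):
            s2, reward, done = step(s, a)
            # The next action is drawn before the update for both methods;
            # only SARSA uses it in its target.
            a2 = choose(Q, s2, epsilon, rng)
            if method == "sarsa":
                bootstrap = Q[s2][a2]
            else:
                bootstrap = np.max(Q[s2])
            target = reward + (0.0 if done else gamma * bootstrap)
            Q[s][a] += alpha * (target - Q[s][a])
            returns[ep] += reward
            s, a = s2, a2
            if done:
                break
    return Q, returns

_, sarsa_returns = train("sarsa")
_, q_returns = train("q_learning")
print("SARSA      mean return, last 100 episodes:", sarsa_returns[-100:].mean())
print("Q-learning mean return, last 100 episodes:", q_returns[-100:].mean())
```

With a fixed ε, Q-learning's greedy path hugs the cliff, so random exploratory steps are expected to cost it more reward during training than SARSA's safer route, which matches point 2 above. Exact numbers depend on the seed and hyperparameters, so averaging over several seeds gives a fairer comparison.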

