Question

The transition model

Consider the 3 × 3 world shown in Figure 17.14(a). The transition model is the same as in the 4 × 3 world of Figure 17.1: 80% of the time the agent goes in the direction it selects; the rest of the time it moves at right angles to the intended direction.

Implement value iteration for this world for each value of r below. Use discounted rewards with a discount factor of 0.99. Show the policy obtained in each case. Explain intuitively why the value of r leads to each policy.

a. r = 100

b. r = −3

c. r = 0

d. r = +3
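A minimal sketch of value iteration for this world follows. Figure 17.14(a) is not reproduced on this page, so the layout is an assumption: squares are addressed as (column, row), the reward r sits in square (1, 3), a terminal goal worth +10 sits in square (3, 3), every other square has reward −1, and a move into a wall leaves the agent where it is. The names `value_iteration`, `greedy_policy`, and the constants are illustrative only.

```python
GAMMA = 0.99           # discount factor from the exercise
GOAL = (3, 3)          # ASSUMED terminal square (upper right)
GOAL_REWARD = 10.0     # ASSUMED goal reward
STEP_REWARD = -1.0     # reward in all other non-goal squares
R_SQUARE = (1, 3)      # square whose reward is r (upper left)

MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}
STATES = [(x, y) for x in range(1, 4) for y in range(1, 4)]


def step(state, direction):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    dx, dy = MOVES[direction]
    nxt = (state[0] + dx, state[1] + dy)
    return nxt if nxt in STATES else state


def transitions(state, action):
    """0.8 for the chosen direction, 0.1 for each perpendicular direction."""
    side_a, side_b = SIDEWAYS[action]
    return [(0.8, step(state, action)),
            (0.1, step(state, side_a)),
            (0.1, step(state, side_b))]


def reward(state, r):
    if state == GOAL:
        return GOAL_REWARD
    return r if state == R_SQUARE else STEP_REWARD


def expected_utility(U, state, action):
    return sum(p * U[s2] for p, s2 in transitions(state, action))


def value_iteration(r, gamma=GAMMA, eps=1e-4, max_sweeps=100_000):
    """Repeat the Bellman update U(s) = R(s) + gamma * max_a E[U(s')]."""
    U = {s: 0.0 for s in STATES}
    for _ in range(max_sweeps):
        new_U = {}
        for s in STATES:
            if s == GOAL:
                new_U[s] = GOAL_REWARD  # terminal: no future reward
            else:
                best = max(expected_utility(U, s, a) for a in MOVES)
                new_U[s] = reward(s, r) + gamma * best
        delta = max(abs(new_U[s] - U[s]) for s in STATES)
        U = new_U
        if delta < eps * (1 - gamma) / gamma:
            break
    return U


def greedy_policy(U):
    """Pick, in each non-goal square, the action with the best expected utility."""
    return {s: max(MOVES, key=lambda a: expected_utility(U, s, a))
            for s in STATES if s != GOAL}


if __name__ == "__main__":
    for r in (100, -100, -3, 0, 3):
        policy = greedy_policy(value_iteration(r))
        print(f"r = {r}")
        for y in (3, 2, 1):  # print the top row first
            print("  ".join(f"{policy.get((x, y), 'GOAL'):5}" for x in (1, 2, 3)))
```

Running the script prints one 3 × 3 policy grid per value of r; r = −100 is included alongside the four values in the exercise because the solution below discusses it.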

 

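Figure 17.1: (a) the 4 × 3 world, with a START square and a +1 terminal square; (b) the transition model, with probability 0.8 for the intended direction and 0.1 for each direction at right angles to it.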

Solutions

Expert Solution

a. 

r = 100.

 

See the comments for part d. This should have been r = −100 to illustrate an alternative behavior:

 

Here, the agent tries to reach the goal quickly while avoiding square (1, 3) as far as possible. Note that in square (1, 2) the agent chooses Down rather than Right: moving Right carries a chance of "accidentally" slipping into square (1, 3), and the penalty for entering that square is so great that the agent prefers to rule it out entirely.
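The comparison behind this choice can be written out explicitly. As a sketch, assuming (column, row) coordinates and that a move into a wall leaves the agent in place, the expected successor utilities of the two candidate actions in square (1, 2) are:

```latex
\begin{align*}
Q_{\text{Right}} &= 0.8\,U(2,2) + 0.1\,U(1,3) + 0.1\,U(1,1) \\
Q_{\text{Down}}  &= 0.8\,U(1,1) + 0.1\,U(1,2) + 0.1\,U(2,2)
\end{align*}
```

Only Right puts any probability on square (1, 3). With r = −100, U(1, 3) is so negative that its 0.1 weight outweighs the gain from moving toward the goal, so Down has the higher value.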

 

b. 

r = −3.

 

Here, the agent again tries to reach the goal as fast as possible while attempting to avoid square (1, 3), but the penalty for that square is not so great that the agent will avoid it at all costs. Thus, the agent chooses to move Right in square (1, 2) to get closer to the goal, even though this will occasionally result in a transition to square (1, 3).

c. 

r = 0

 

Here, the agent again tries to reach the goal as fast as possible, but will do so via a path that includes square (1, 3) if possible. This is because square (1, 3) does not incur the reward of −1 that every other non-goal state does, so reaching the goal via a path through that square can yield a slightly greater total reward than another path of equal length that does not pass through (1, 3).

 

d. 

r = 3.


r = 100.
