Question


For the environment shown in Figure 17.1, find all the threshold values for R(s) such that the optimal policy changes when the threshold is crossed. You will need a way to calculate the optimal policy and its value for fixed R(s).

 

Figure 17.1

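(a) The grid-world environment, showing the START square. (b) The transition model: the agent moves in the intended direction with probability 0.8 and at right angles to it with probability 0.1 each.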
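The hint asks for a way to calculate the optimal policy and its value for a fixed R(s). The following is a minimal value-iteration sketch of that step. It assumes the standard layout of Figure 17.1, which is not spelled out in the text above: a 4x3 grid, a wall at (2,2), the +1 terminal at (4,3) and the –1 terminal at (4,2), with (column, row) coordinates counted from the bottom-left START square. It also assumes no discounting and r < 0, so that every optimal policy eventually terminates. The names `solve`, `outcomes`, `q_value` and the action letters are hypothetical.

```python
# Hypothetical sketch: value iteration for the 4x3 grid world of Figure 17.1,
# assuming the standard layout and gamma = 1 (no discounting).
from typing import Dict, List, Tuple

State = Tuple[int, int]  # (column, row); (1, 1) is the bottom-left START square

COLS, ROWS = 4, 3
WALL: State = (2, 2)
TERMINALS: Dict[State, float] = {(4, 3): 1.0, (4, 2): -1.0}
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
PERPENDICULAR = {"U": "LR", "D": "LR", "L": "UD", "R": "UD"}
STATES: List[State] = [
    (c, r) for c in range(1, COLS + 1) for r in range(1, ROWS + 1) if (c, r) != WALL
]


def move(s: State, a: str) -> State:
    """Deterministic move; bumping into the wall or the grid edge leaves s unchanged."""
    nxt = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
    if nxt == WALL or not (1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS):
        return s
    return nxt


def outcomes(s: State, a: str) -> List[Tuple[float, State]]:
    """0.8 for the intended direction, 0.1 for each perpendicular direction."""
    side1, side2 = PERPENDICULAR[a]
    return [(0.8, move(s, a)), (0.1, move(s, side1)), (0.1, move(s, side2))]


def q_value(U: Dict[State, float], s: State, a: str) -> float:
    """Expected utility of the successor state after taking action a in s."""
    return sum(p * U[t] for p, t in outcomes(s, a))


def solve(r: float, tol: float = 1e-10, max_iter: int = 100_000):
    """Return (policy, utilities) for per-step reward r in the non-terminal states."""
    if r >= 0:
        raise ValueError("this sketch assumes r < 0 so that value iteration converges")
    U = {s: TERMINALS.get(s, 0.0) for s in STATES}
    for _ in range(max_iter):
        # Terminal utilities stay fixed at their rewards; others get a Bellman update.
        new_U = {
            s: TERMINALS[s] if s in TERMINALS else r + max(q_value(U, s, a) for a in MOVES)
            for s in STATES
        }
        delta = max(abs(new_U[s] - U[s]) for s in STATES)
        U = new_U
        if delta < tol:
            break
    # Greedy policy; ties are broken by the order of MOVES.
    policy = {
        s: max(MOVES, key=lambda a: q_value(U, s, a)) for s in STATES if s not in TERMINALS
    }
    return policy, U


policy, U = solve(-0.04)
print(policy[(1, 1)], round(U[(1, 1)], 3))
```

To compare policies across different values of r, something like `tuple(sorted(policy.items()))` gives a hashable key for each policy.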

Solutions


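A brute-force approach is to compute the optimal policy on a grid of step-cost values and then:

  • For any two adjacent grid values whose optimal policies differ, run a binary search on the step cost to pinpoint the threshold value.
  • Convince yourself that you have not missed any policies, either by using too coarse an increment in the step cost (0.02) or by stopping the scan too soon (1.0).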
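The brute-force steps above can be sketched as follows. `policy_at` is a hypothetical stand-in for a real MDP solver; it only has to return a hashable description of the optimal policy for a given R(s). The toy oracle's thresholds are made up purely for illustration and are not the answer to the exercise.

```python
# Hypothetical sketch of the brute-force approach. `policy_at(r)` stands in for a
# real MDP solver and must return a hashable description of the optimal policy.
from typing import Callable, Hashable, List

PolicyOracle = Callable[[float], Hashable]


def bisect_threshold(policy_at: PolicyOracle, lo: float, hi: float,
                     tol: float = 1e-6) -> float:
    """Binary search for a point in (lo, hi] where the optimal policy changes."""
    p_lo = policy_at(lo)
    if p_lo == policy_at(hi):
        raise ValueError("both endpoints have the same optimal policy")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if policy_at(mid) == p_lo:
            lo = mid  # a change still lies in (mid, hi]
        else:
            hi = mid  # a change lies in (lo, mid]
    return (lo + hi) / 2


def scan_thresholds(policy_at: PolicyOracle, start: float, stop: float,
                    step: float = 0.02, tol: float = 1e-6) -> List[float]:
    """Sample R(s) from start to stop and bisect each adjacent pair whose policies differ.

    Several changes inside a single grid cell are reported as one, which is why
    the step size and the scan range both need checking.
    """
    n = int(round((stop - start) / step))
    grid = [start + i * step for i in range(n + 1)]
    policies = [policy_at(r) for r in grid]
    return [
        bisect_threshold(policy_at, a, b, tol)
        for a, b, pa, pb in zip(grid, grid[1:], policies, policies[1:])
        if pa != pb
    ]


# Toy stand-in with made-up thresholds, for illustration only.
def toy_policy(r: float) -> int:
    return sum(r > t for t in (-1.537, -0.4321, -0.0873))


print(scan_thresholds(toy_policy, -2.0, 0.0))
```

A practical way to follow the second bullet is to rerun the scan with a smaller `step` and a wider range, then check that the list of thresholds does not change.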

 

One useful observation in this context is that the expected total reward of any fixed policy is linear in r, the per-step reward for the empty states. For a policy that reaches a terminal state with probability 1, this reward equals r times the expected number of steps spent in non-terminal states, plus the expected terminal reward. Imagine plotting the total reward of a policy as a function of r: the result is a straight line. Now draw the lines for all possible policies. The reward of the optimal policy as a function of r is just the maximum of all these lines, so it is a piecewise linear, convex function of r.
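The linearity and convexity claims can be written out explicitly. The symbols here are introduced for illustration and do not come from the original solution. Assume no discounting and r < 0, so any policy that never terminates has value −∞ and can be ignored. For a fixed policy π started in state s, let N^π(s) be the expected number of non-terminal steps and T^π(s) the expected terminal reward (+1 or −1). The third line shows that if π is optimal at r₁ and at r₂, it is also optimal at every r = λr₁ + (1 − λ)r₂ with 0 ≤ λ ≤ 1. The first inequality is convexity, and the equalities follow from the optimality of π at both points and from linearity.

```latex
\begin{aligned}
U^{\pi}_{r}(s) &= r\,N^{\pi}(s) + T^{\pi}(s), \\
U^{*}_{r}(s) &= \max_{\pi} \bigl( r\,N^{\pi}(s) + T^{\pi}(s) \bigr), \\
U^{*}_{r}(s) &\le \lambda\,U^{*}_{r_1}(s) + (1-\lambda)\,U^{*}_{r_2}(s)
  = \lambda\,U^{\pi}_{r_1}(s) + (1-\lambda)\,U^{\pi}_{r_2}(s)
  = U^{\pi}_{r}(s) \le U^{*}_{r}(s).
\end{aligned}
```

The slope of each line is the nonnegative step count N^π(s). Along a convex piecewise-linear function, the slopes increase with r, so the optimal policies for more negative r take fewer expected steps.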

Hence there is a very efficient way to find all the optimal policy regions:

  • For any two consecutive values of r that have different optimal policies, find the optimal policy for the midpoint. Once two consecutive values of r give the same policy, convexity guarantees that the whole interval between them is covered by that policy.
  • Repeat this until two points are known for each distinct optimal policy.
  • Suppose (ra1, va1) and (ra2, va2) are points for policy a, and (rb1, vb1) and (rb2, vb2) are the next two points, for policy b. Clearly, we can draw straight lines through these pairs of points and find their intersection. This does not mean, however, that there is no other optimal policy for the intervening region. We can determine this by calculating the optimal policy for the intersection point. If we get a different policy, we continue the process.
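The three steps above can be sketched as one refinement loop. This is a hedged sketch, not the author's code. The hypothetical `oracle(r)` is assumed to return the optimal policy at r together with its value, for example the start-state utility from a solver. A toy "max of lines" oracle with made-up policies stands in for the grid world. Two choices here are assumptions rather than part of the original procedure. Each policy's line passes through its two most widely separated samples, which limits the near-parallel instability. An interval narrower than `min_gap` is simply reported at its midpoint.

```python
# Hypothetical sketch of the midpoint / line-intersection procedure.
# `oracle(r)` is assumed to return (policy_id, value) for the optimal policy at r;
# a toy max-of-lines oracle stands in for the real grid world below.
from typing import Callable, Dict, Hashable, List, Tuple

Oracle = Callable[[float], Tuple[Hashable, float]]


def find_thresholds(oracle: Oracle, lo: float, hi: float, tol: float = 1e-9,
                    min_gap: float = 1e-9, max_steps: int = 10_000) -> List[float]:
    samples: Dict[float, Hashable] = {}                      # r -> optimal policy
    points: Dict[Hashable, List[Tuple[float, float]]] = {}   # policy -> [(r, value)]

    def sample(r: float) -> None:
        pol, val = oracle(r)
        samples[r] = pol
        points.setdefault(pol, []).append((r, val))

    def line(pol: Hashable) -> Tuple[float, float]:
        """Slope and intercept through the two most widely separated samples of pol."""
        (r1, v1), (r2, v2) = min(points[pol]), max(points[pol])
        slope = (v2 - v1) / (r2 - r1)
        return slope, v1 - slope * r1

    sample(lo)
    sample(hi)
    thresholds: List[float] = []
    pending = [(lo, hi)]  # intervals between consecutive sampled values of r
    steps = 0
    while pending:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("too many refinement steps")
        x, y = pending.pop()
        p, q = samples[x], samples[y]
        if p == q:
            continue  # by convexity, p is optimal on all of [x, y]
        if len(points[p]) >= 2 and len(points[q]) >= 2:
            (sp, bp), (sq, bq) = line(p), line(q)
            r_star = (bq - bp) / (sp - sq) if sp != sq else None
            if r_star is not None and x < r_star < y:
                _, val = oracle(r_star)
                if val <= sp * r_star + bp + tol:
                    thresholds.append(r_star)  # nothing beats p and q at their crossing
                else:
                    sample(r_star)  # a different policy is optimal at the crossing
                    pending += [(x, r_star), (r_star, y)]
                continue
        if y - x < min_gap:
            thresholds.append((x + y) / 2)  # interval too small to refine further
            continue
        mid = (x + y) / 2  # not enough points for both lines yet: bisect
        sample(mid)
        pending += [(x, mid), (mid, y)]
    return sorted(thresholds)


def make_toy_oracle(lines: Dict[str, Tuple[float, float]]) -> Oracle:
    """Toy oracle: each made-up 'policy' is a (slope, intercept) line; take the max."""
    def oracle(r: float) -> Tuple[Hashable, float]:
        name = max(lines, key=lambda k: lines[k][0] * r + lines[k][1])
        return name, lines[name][0] * r + lines[name][1]
    return oracle


toy = make_toy_oracle({"A": (1.0, 0.0), "B": (2.0, 1.05), "C": (3.0, 2.0)})
print(find_thresholds(toy, -4.0, 1.0))
```

In this toy, the lines for A and C cross at r = -1. Policy B is optimal at that crossing, so the intersection check exposes it, as the third bullet describes. The loop then locates both of B's boundaries.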

 

The policies and boundaries derived from this procedure are shown in Figure S17.1. The figure shows that there are nine distinct optimal policies! Notice that as r becomes more negative, the agent becomes more willing to dive straight into the –1 terminal state rather than face the cost of the detour to the +1 state.

 

The somewhat ugly code is as follows. Notice that because the lines for neighboring policies are very nearly parallel, numerical instability is a serious problem.
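The following short derivation shows why nearly parallel lines cause trouble; it is an illustration, not part of the original code. Write the lines of two neighboring policies a and b as v = m_a r + c_a and v = m_b r + c_b. The symbols m and c are introduced here for illustration. The threshold is the root of their difference d(r). An error ε in the computed values shifts that root by ε divided by the slope difference:

```latex
d(r) = (m_a - m_b)\,r + (c_a - c_b), \qquad
r^{*} = \frac{c_b - c_a}{m_a - m_b}, \qquad
d(r) + \varepsilon = 0 \;\Rightarrow\; r = r^{*} - \frac{\varepsilon}{m_a - m_b}.
```

For example, if the slopes differ by 0.01, a value error of 10^{-8} moves the computed threshold by about 10^{-6}.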

 

Figure S17.1

 


 

