Question

In: Computer Science


Optimality of A*: Assume that we are running A* search on a problem and we find path p to a goal. In the lecture and text, we proved mathematically that, if we are using an admissible heuristic, there is no other path that is shorter than p. In other words, if path p' is still on the frontier, and p'' is a path to the goal that extends p', the cost of path p will always be less than (or equal to) the cost of p''. This is called proving the optimality of A*.

Explain why we cannot prove the optimality of A* if the heuristic is not admissible. Be as specific and detailed as possible, referring to the proof in your answer.

Solutions

Expert Solution

Definition 1: A search heuristic h(n) is an estimate of the cost of the optimal (cheapest) path from node n to a goal node.

· Think of h(n) as only using readily obtainable (easy to compute) information about a node.

· h can be extended to paths: h(⟨n₀, …, nₖ⟩) = h(nₖ)

Definition 2: A search heuristic h(n) is admissible if it never overestimates the actual cost of the cheapest path from a node to the goal.

• A* search takes into account both

  1. the cost of the path to a node, c(p), and
  2. the heuristic value of that path, h(p) (the heuristic value of its end node).

• Let f(p) = c(p) + h(p).

• f(p) is an estimate of the cost of a path from the start to a goal that goes via p.

A* always chooses the path on the frontier with the lowest estimated cost from the start to a goal node, constrained to go via that path.
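To make the selection rule concrete, here is a minimal A* sketch in Python (the interface is an assumption for illustration: `neighbors(n)` yields `(successor, arc_cost)` pairs and `h(n)` returns the heuristic value of node n):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Minimal A* (tree search): always expand the frontier path
    with the lowest f(p) = c(p) + h(p)."""
    # Frontier entries: (f-value, cost so far, path as a list of nodes).
    frontier = [(h(start), 0, [start])]
    while frontier:
        f, cost, path = heapq.heappop(frontier)  # lowest f(p) first
        node = path[-1]
        if node == goal:          # goal test on expansion, not generation
            return path, cost
        for succ, arc_cost in neighbors(node):
            new_cost = cost + arc_cost
            heapq.heappush(frontier,
                           (new_cost + h(succ), new_cost, path + [succ]))
    return None                   # frontier exhausted: no path exists
```

Note that the goal test is applied when a path is popped from the frontier, not when it is generated; this detail is what the optimality proof below relies on.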

Admissibility of A*

A* is complete (finds a solution, if one exists) and optimal (finds the optimal path to a goal) if:

• the branching factor is finite

• arc costs are > 0

• h(n) is admissible, i.e., an underestimate of the cost of the shortest path from n to a goal node.

• This property of A* is called the admissibility of A*.

A* admissible: complete

It halts (does not get caught in cycles) because:

• Let fmin be the cost of the optimal solution path s (unknown but finite if there exists a solution)

• Each sub-path p of s has cost f(p) ≤ fmin, due to admissibility (exercise: prove this at home)

• Let cmin > 0 be the minimal cost of any arc

• All paths with more than fmin / cmin arcs have cost > fmin (e.g., if fmin = 10 and cmin = 2, any path with more than 5 arcs already costs more than 10)

• A* expands the path on the frontier with minimal f(p)

• There is always a prefix of s on the frontier, so A* only expands paths p with f(p) ≤ fmin

• Terminates when expanding s

A* admissible: optimal

Let p* be the optimal solution path, with cost c*.

• Let p′ be a suboptimal solution path on the frontier, that is, c(p′) > c*.

• At any point before A* returns a solution, some prefix p of p* is on the frontier.

• By admissibility, f(p) = c(p) + h(p) ≤ c*, since h(p) underestimates the true cost of completing p along p*.

• Since p′ ends at a goal, h(p′) = 0 and f(p′) = c(p′) > c* ≥ f(p).

• A* therefore selects p before p′; repeating the argument, every prefix of p* (and eventually p* itself) is expanded before p′ can be returned, so the first solution A* returns has cost c*.

• This is exactly where the proof needs admissibility: if h could overestimate, the inequality f(p) ≤ c* could fail for some prefix p of p*, f(p) could exceed f(p′), and A* could return the suboptimal p′ first. Without that inequality, the optimality argument does not go through.
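To see the failure concretely, here is a hypothetical three-node instance (node names, costs, and heuristic values are invented for illustration) run through the `a_star` sketch above. The optimal path s → a → g costs 2, but the heuristic overestimates at a:

```python
# Hypothetical example: the optimal path s -> a -> g costs 2.
graph = {'s': [('a', 1), ('g', 3)], 'a': [('g', 1)], 'g': []}

h_bad = {'s': 0, 'a': 5, 'g': 0}   # inadmissible: h(a) = 5 > h*(a) = 1
print(a_star('s', 'g', lambda n: graph[n], lambda n: h_bad[n]))
# (['s', 'g'], 3) -- the suboptimal direct path is returned first,
# because f(<s,a>) = 1 + 5 = 6 exceeds f(<s,g>) = 3 + 0 = 3.

h_ok = {'s': 0, 'a': 1, 'g': 0}    # admissible: h <= h* everywhere
print(a_star('s', 'g', lambda n: graph[n], lambda n: h_ok[n]))
# (['s', 'a', 'g'], 2) -- with f(<s,a>) = 2 < 3, A* finds the optimum.
```

This is precisely the step of the proof that breaks: with h_bad, the prefix ⟨s, a⟩ of the optimal path no longer satisfies f(p) ≤ c*, so nothing forces A* to expand it before returning the more expensive solution.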

Analysis of A*

In fact, we can prove something even stronger about A* (when its heuristic is admissible):

• A* is optimally efficient among the algorithms that extend the search path from the initial state.

• It finds the goal with the minimum number of node expansions.

A* is Optimally Efficient

No other optimal algorithm is guaranteed to expand fewer nodes than A*

• This is because any algorithm that does not expand every node with f(n) < f* risks missing the optimal solution

Effect of Search Heuristic

A search heuristic that is a better approximation of the actual cost reduces the number of nodes expanded by A*.

• Example: 8-puzzle

– h1: number of tiles that are out of place (from the relaxation where tiles can move anywhere)

– h2: sum of the number of squares that separate each tile from its correct position, i.e., Manhattan distance (from the relaxation where tiles can move to any adjacent square)

• Average number of paths expanded (d = depth of the solution; IDS = iterative deepening search, see next lecture):

– d = 12: IDS = 3,644,035 paths; A* (h1) = 227 paths; A* (h2) = 73 paths

– d = 24: IDS = too many paths; A* (h1) = 39,135 paths; A* (h2) = 1,641 paths
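For reference, a small sketch of the two heuristics in Python, assuming a state is a 9-tuple listing the tiles row by row with 0 standing for the blank (the goal layout below is an assumption for the example):

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # assumed goal configuration

def h1(state):
    """Number of tiles out of place (the blank is not counted)."""
    return sum(1 for tile, goal in zip(state, GOAL)
               if tile != 0 and tile != goal)

def h2(state):
    """Sum of Manhattan distances of each tile from its goal square."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        g = GOAL.index(tile)                   # goal position of this tile
        total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return total
```

Since every misplaced tile is at least one square from its goal, h1(s) ≤ h2(s) ≤ h*(s) for every state s, which is why h2 prunes more of the search.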


Also, A* (when run as a graph search, where already-expanded states are not revisited) is only optimal if two conditions are met:

  1. The heuristic is admissible: it never overestimates the cost to the goal.
  2. The heuristic is consistent (monotonic): for every arc from a node n to a successor n′ with cost c(n, n′), h(n) ≤ c(n, n′) + h(n′).

You can prove optimality by assuming the opposite and expanding the implications.

Assume that the path given by A* is not optimal even though the heuristic is admissible and monotonic, and think about what that means in terms of implications; you will soon reach a contradiction, and thus the original assumption is reduced to absurdity.

The main idea of the proof is that when A* finds a path, it has found a path whose estimate is lower than the estimate of any other possible path. Since the estimates are optimistic, the other paths can be safely ignored: none of them can turn out to be shorter. If the heuristic may overestimate, the estimates are no longer optimistic, so a path with a high estimate might in fact be shorter and cannot be safely ignored; this is why the proof fails without admissibility.


Optimality (finding a shortest path): Provided A∗ is given an admissible heuristic, it will always find a shortest path, because the “optimistic” heuristic never allows it to skip over a possibly shorter path option when expanding nodes.

Optimality (number of node expansions): A∗ is also optimal in the number of node expansions versus other algorithms given the same heuristic information (and A∗ clearly beats algorithms that lack heuristic information).

Admissible Heuristic: A heuristic function h(n) is said to be admissible on (G, Γ) iff h(n) ≤ h*(n) for every n ∈ G.

Consistent Heuristic: A heuristic function h(n) is said to be consistent (or monotone) on G iff for any pair of nodes n′ and n, the triangle inequality holds: h(n′) ≤ k(n′, n) + h(n), where k(n′, n) is the cost of the cheapest path from n′ to n.
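A small sketch of this check in Python over an explicit arc list (the `(n_prime, n, cost)` triple format and dict-based h are assumptions for illustration; checking each arc suffices, since consistency over arcs extends to cheapest paths by induction):

```python
def is_consistent(arcs, h):
    """Check the triangle inequality h(n') <= cost(n', n) + h(n)
    on every arc (n', n, cost)."""
    return all(h[n_prime] <= cost + h[n] for n_prime, n, cost in arcs)

# Example: consistent on this tiny graph...
print(is_consistent([('s', 'a', 1), ('a', 'g', 1)],
                    {'s': 2, 'a': 1, 'g': 0}))   # True
# ...but raising h(s) above cost(s, a) + h(a) breaks it.
print(is_consistent([('s', 'a', 1), ('a', 'g', 1)],
                    {'s': 3, 'a': 1, 'g': 0}))   # False
```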

Surely Expanded: Nodes that must be expanded to reach a goal node (regardless of the tie-breaking rule or algorithm used)

Admissible Algorithm: An algorithm that, given a problem space where h(n) ≤ h*(n) for every n ∈ G, will return least-cost solutions.

Dominance: Algorithm A dominates some other algorithm B if the set of nodes that A expands is a subset of the nodes expanded by B (not just fewer nodes)

Strictly Dominate: A strictly dominates B iff A dominates B and B does not dominate A

Optimal (strong definition): Algorithm A is optimal over a class 𝒜 of algorithms iff A dominates every member of 𝒜.

Weakly Optimal: No member of 𝒜 strictly dominates A.
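These set-based definitions of dominance translate directly into code; a sketch in Python (assuming each algorithm's run is summarized by the set of nodes it expanded):

```python
def dominates(expanded_a, expanded_b):
    """A dominates B iff A's expanded nodes are a subset of B's."""
    return set(expanded_a) <= set(expanded_b)

def strictly_dominates(expanded_a, expanded_b):
    """A strictly dominates B iff A dominates B but not vice versa."""
    return (dominates(expanded_a, expanded_b)
            and not dominates(expanded_b, expanded_a))
```

Note that dominance requires a subset relation, not merely a smaller count: an algorithm that expands 5 nodes does not dominate one that expands 6 different ones.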

To determine the optimality of A∗ over other algorithms with the same heuristic information, we group these algorithms into classes (since A∗ is already a class of algorithms, each with a different tie-breaking rule):

Aad: Algorithms that return least-cost solutions when given an admissible problem space (though not necessarily an admissible heuristic to run on that problem space)

Abf : Subclass of Aad that accepts any path-dependent evaluation function, but operates in a best-first manner

Agc: Subclass of Aad that will return optimal solutions whenever A∗ does, but need not be run with a heuristic obeying h ≤ h*

A∗ is optimal relative to a class 𝒜 in the following senses:

0-Optimal: All of A∗’s tie-breaking rules dominate all members of 𝒜

1-Optimal: Some tie-breaking rule of A∗ expands a subset of the nodes expanded by every member of 𝒜

2-Optimal: No member of 𝒜 expands a proper subset of the nodes expanded by any of A∗’s tie-breaking rules

3-Optimal: No tie-breaking rule of A∗ (or A∗∗) is strictly dominated by some member of 𝒜

Strongest → Weakest

0-Optimal → 3-Optimal

Theorem & proof

If any admissible algorithm B does not expand a node which is surely expanded by A∗ in some problem instance where h is admissible and non-pathological, then in that very problem instance B must expand a node which is avoided by every tie-breaking rule of A∗.

No algorithm (A∗ or otherwise) is 1-optimal over those guaranteed to find an optimal solution when given h ≤ h*.

A∗∗ differs from A∗ in that it relies not only on the g + h value of node n, but also considers all the g + h values along the path from s to n. The maximum is then used as a criterion for node selection.

A∗∗ Evaluation Function

f′(P_{s→n}) = max{ g_{P_{s→n}}(n′) + h(n′) | n′ ∈ P_{s→n} }
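A sketch of this evaluation in Python (representing the path as a list of `(node, g_value)` pairs is an assumption for illustration):

```python
def f_a_star_star(path, h):
    """A** evaluation: the maximum g(n') + h(n') over all
    nodes n' on the path from s to n."""
    return max(g + h[node] for node, g in path)

# With an admissible h, every g(n') + h(n') along the path underestimates
# the cost of reaching a goal via that path, so the maximum is a tighter,
# still-admissible estimate than the endpoint's f-value alone.
print(f_a_star_star([('s', 0), ('a', 1), ('g', 2)],
                    {'s': 2, 'a': 1, 'g': 0}))   # max(2, 2, 2) = 2
```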

A∗∗ is 3-optimal over all algorithms relative to IAD.

The type-3 optimality of A∗∗ over Aad demonstrates that those “smart” algorithms that prevent A∗ from achieving optimality are not smart after all, but simply lucky; each takes advantage of the peculiarity of the graph for which it was contrived, and none can maintain this superiority over all problem instances. If it wins on one graph, there must be another where it is beaten, and by the very same opponent, A∗∗. It is in this sense that A∗∗ is 3-optimal; it exhibits a universal robustness against all its challengers.

A∗∗ strictly dominates A∗: (a) For every tie-breaking rule of A∗ and for every problem instance I ∈ IAD, there exists a tie-breaking rule for A∗∗ that expands a subset of the nodes expanded by A∗. (b) Moreover, there exists a problem instance and a tie-breaking rule for A∗∗ that expands a proper subset of the nodes expanded by any tie-breaking rule of A∗.

A∗∗ is admissible over IAD.

Algorithm A∗∗ will terminate with an optimal solution in every problem instance in which h ≤ h*.

Because of the pathological cases in IAD, A∗ cannot be 3-optimal. This is why A∗∗ exists: it can dominate every tie-breaking rule of A∗, and A∗∗ is 3-optimal in the most general case.

A∗∗ is admissible, and in non-pathological cases A∗∗ expands the same set of nodes as does A∗, namely, the surely expanded nodes in N^{C*}_{g+h}. In pathological cases, however, there exist tie-breaking rules in A∗∗ that strictly dominate every tie-breaking rule in A∗. This immediately precludes A∗ from being 3-optimal relative to IAD and nominates A∗∗ for that title.

A∗∗ is not a BF∗ algorithm, since it uses one function f′ for ordering nodes for expansion and a different function g for redirecting pointers. Had we allowed A∗∗ to use f′ for both purposes, it would not be admissible relative to IAD, since f′ is not order preserving.

