Optimality of A*. Assume that we are running A* search on a problem and we find a path p to a goal. In the lecture and text, we proved mathematically that, if we are using an admissible heuristic, there is no other path to the goal that is shorter than p. In other words, if a path p' is still on the frontier, and p'' is a path to the goal that extends p', the cost of p will always be less than (or equal to) the cost of p''. This is called proving the optimality of A*.
Explain why we cannot prove the optimality of A* if the heuristic is not admissible. Be as specific and detailed as possible, referring to the proof in your answer.
Definition 1: A search heuristic h(n) is an estimate of the cost of the optimal (cheapest) path from node n to a goal node.
· Think of h(n) as only using readily obtainable (easy to compute) information about a node.
· h can be extended to paths: h(〈n0,…,nk〉)=h(nk)
Definition 2: A search heuristic h(n) is admissible if it never overestimates the actual cost of the cheapest path from a node to the goal.
• A* search takes into account both the cost of the path from the start to a node and the heuristic estimate of the cost from that node to a goal.
• Let f(p) = c(p) + h(p).
• f(p) is an estimate of the cost of a path from the start to a goal via p.
A* always chooses the path on the frontier with the lowest estimated distance from the start to a goal node constrained to go via that path.
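To make this selection rule concrete, here is a minimal sketch of A* in Python (not from the lecture; the graph interface, with neighbors(n) yielding (node, arc cost) pairs, is an illustrative assumption). The frontier holds whole paths ordered by f(p) = c(p) + h(p), and the path with the lowest f-value is expanded first.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Minimal A* sketch: `neighbors(n)` yields (next_node, arc_cost) pairs,
    `h(n)` is the heuristic estimate of the cost from n to the goal."""
    # Frontier holds (f-value, cost-so-far, path); start with the trivial path.
    frontier = [(h(start), 0, [start])]
    while frontier:
        f, cost, path = heapq.heappop(frontier)   # path with lowest f = c + h
        node = path[-1]
        if node == goal:                          # goal test when the path is expanded
            return path, cost
        for nxt, arc_cost in neighbors(node):
            new_cost = cost + arc_cost
            heapq.heappush(frontier, (new_cost + h(nxt), new_cost, [*path, nxt]))
    return None, float("inf")                     # no path to the goal exists
```

Note that the goal test is applied when a path is removed from the frontier, not when it is added; with an admissible heuristic this is what guarantees the first goal path popped is optimal.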
Admissibility of A*
A* is complete (finds a solution, if one exists) and optimal (finds the optimal path to a goal) if:
• the branching factor is finite
• arc costs are > 0
• h(n) is admissible -> an underestimate of the cost of the cheapest path from n to a goal node.
• This property of A* is called the admissibility of A*
A* admissible: complete
It halts (does not get caught in cycles) because:
• Let fmin be the cost of the optimal solution path s (unknown but finite if there exists a solution)
• Each sub-path p of s has cost f(p) ≤ fmin
• Due to admissibility (exercise: prove this at home)
• Let cmin > 0 be the minimal cost of any arc
• All paths with more than fmin / cmin arcs have cost > fmin
• A* expands the path on the frontier with minimal f(p)
• There is always a prefix of s on the frontier, so A* only expands paths p with f(p) ≤ fmin
• Terminates when expanding s
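• For example, if the cheapest arc costs cmin = 1 and the optimal solution costs fmin = 10, then any path with more than 10 arcs already costs more than 10 and is never expanded; with a finite branching factor there are only finitely many paths of at most 10 arcs, so A* must reach s after finitely many expansions.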
A* admissible: optimal
Let p* be the optimal solution path, with cost c*.
• Let p' be a suboptimal solution path, that is, c(p') > c*.
• There is always a prefix p of p* on the frontier, and by admissibility f(p) = c(p) + h(p) ≤ c* < c(p') = f(p').
• So A* expands prefixes of p* before it ever selects p', and therefore it returns an optimal path rather than p'.
• This is exactly the step that fails without admissibility: if h overestimates, a prefix of p* can have f(p) > c(p'), so A* may select the suboptimal goal path p' first.
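To see how this argument breaks when the heuristic overestimates, here is a toy example (the graph and heuristic values below are my own illustration, reusing the a_star sketch from above): with an admissible heuristic A* returns the optimal path of cost 2, but after inflating h at one node it pops the direct goal path of cost 3 first.

```python
# Toy graph: S -> A -> G costs 1 + 1 = 2 (optimal), S -> G directly costs 3.
graph = {"S": [("A", 1), ("G", 3)], "A": [("G", 1)], "G": []}
neighbors = lambda n: graph[n]

h_admissible = {"S": 2, "A": 1, "G": 0}      # never overestimates the true cost to G
h_inflated   = {"S": 2, "A": 5, "G": 0}      # overestimates at A (5 > true cost 1)

# Using the a_star sketch from above:
print(a_star("S", "G", neighbors, h_admissible.get))  # (['S', 'A', 'G'], 2) - optimal
print(a_star("S", "G", neighbors, h_inflated.get))    # (['S', 'G'], 3) - suboptimal
```

With the inflated heuristic, the prefix S-A of the optimal path has f = 1 + 5 = 6, which is worse than the suboptimal goal path's f = 3, so A* pops the goal path first and the proof's key inequality no longer holds.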
Analysis of A*
In fact, we can prove something even stronger about A* (when it is admissible):
• A* is optimally efficient among the algorithms that extend the search path from the initial state.
• It finds the goal with the minimum # of expansions
A* is Optimally Efficient
No other optimal algorithm is guaranteed to expand fewer nodes than A*
• This is because any algorithm that does not expand every node with f(n) < f* risks missing the optimal solution
Effect of Search Heuristic
A search heuristic that is a better approximation of the actual cost reduces the number of nodes expanded by A*.
• Example: 8-puzzle (see the heuristic sketch after the table below)
– relaxation: tiles can move anywhere -> h1: number of tiles that are out of place
– relaxation: tiles can move to any adjacent square -> h2: sum of the number of squares that separate each tile from its correct position
• Average number of paths expanded (d = depth of the solution; IDS = iterative deepening search, see next lecture):
– d = 12: IDS = 3,644,035 paths; A* (h1) = 227 paths; A* (h2) = 73 paths
– d = 24: IDS = too many paths; A* (h1) = 39,135 paths; A* (h2) = 1,641 paths
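A minimal sketch of the two heuristics (the tuple encoding of states and the goal layout are my own assumptions, with 0 for the blank): h1 counts the misplaced tiles, h2 sums the Manhattan distances; since every misplaced tile is at least one square from its goal, h2 ≥ h1, which is why h2 prunes more.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 is the blank; states are 9-tuples read row by row

def h1(state):
    """Number of tiles (excluding the blank) that are out of place."""
    return sum(1 for i, tile in enumerate(state) if tile != 0 and tile != GOAL[i])

def h2(state):
    """Sum of Manhattan distances of each tile from its goal square."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        goal_i = GOAL.index(tile)
        total += abs(i // 3 - goal_i // 3) + abs(i % 3 - goal_i % 3)
    return total

state = (1, 2, 3, 4, 0, 6, 7, 5, 8)   # tiles 5 and 8 are displaced by one square each
print(h1(state), h2(state))            # 2 2  -- and in general h2 >= h1
```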
You can prove optimality by assuming the opposite and following the implications.
Assume that the path given by A* is not optimal even though the heuristic is admissible (and monotonic), and think about what that means; you will soon reach a contradiction, and thus the original assumption is absurd.
The main idea of the proof is that when A* finds a path, it has found a path whose estimate is lower than the estimate of any other possible path. Since the estimates are optimistic, the other paths can safely be ignored.
A* is optimal in two senses:
Optimality (finding a shortest path): Provided A∗ is given an admissible heuristic, it will always find a shortest path because the “optimistic” heuristic will never allow it to skip over a possible shorter path option when expanding nodes
Optimality (number of node expansions): Specifically, A∗ is optimal in the number of node expansions compared with other algorithms that have the same heuristic information (and A∗ is clearly more efficient than algorithms that lack heuristic information)
Admissible Heuristic: A heuristic function h(n) is said to be admissible on (G, Γ) iff h(n) ≤ h*(n) for every n ∈ G
Consistent Heuristic: A heuristic function h(n) is said to be consistent (or monotone) on G iff for any pair of nodes n′ and n the triangle inequality holds: h(n′) ≤ k(n′, n) + h(n), where k(n′, n) is the cost of the cheapest path from n′ to n
Surely Expanded: Nodes that must be expanded to reach a goal node (regardless of the tie-breaking rule or algorithm used)
Admissible Algorithm: An algorithm that, given a problem space where h(n) ≤ h*(n) for every n ∈ G, will return least-cost solutions
Dominance: Algorithm A dominates some other algorithm B if the set of nodes that A expands is a subset of the nodes expanded by B (not just fewer nodes)
Strictly Dominate: A strictly dominates B iff A dominates B and B does not dominate A
Optimal (strong definition): Algorithm A is optimal over a class A of algorithms iff A dominates every member of A
Weakly Optimal: No member of A strictly dominates A
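As a sketch of how the first two definitions can be checked on a small weighted graph (the graph, the heuristic values, and the use of Dijkstra on reversed arcs to obtain h* are illustrative assumptions): the code computes h*(n) for every node, tests h(n) ≤ h*(n), and tests the triangle inequality arc by arc, which is sufficient because it then extends to all pairs by induction along cheapest paths.

```python
import heapq

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)], "B": [("G", 1)], "G": []}
h = {"S": 4, "A": 3, "B": 1, "G": 0}   # candidate heuristic (illustrative values)

def true_cost_to_goal(graph, goal):
    """h*(n): cheapest cost from each node to `goal`, via Dijkstra on reversed arcs."""
    reverse = {n: [] for n in graph}
    for n, arcs in graph.items():
        for m, c in arcs:
            reverse[m].append((n, c))
    dist, frontier = {goal: 0}, [(0, goal)]
    while frontier:
        d, n = heapq.heappop(frontier)
        if d > dist.get(n, float("inf")):
            continue
        for m, c in reverse[n]:
            if d + c < dist.get(m, float("inf")):
                dist[m] = d + c
                heapq.heappush(frontier, (d + c, m))
    return dist

h_star = true_cost_to_goal(graph, "G")
admissible = all(h[n] <= h_star.get(n, float("inf")) for n in graph)
# Checking the triangle inequality arc by arc is enough: it then holds for all pairs.
consistent = all(h[n] <= c + h[m] for n in graph for m, c in graph[n])
print(admissible, consistent)          # True True for these illustrative values
```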
To determine the optimality of A∗ over other algorithms with the same heuristic information, we group these algorithms into classes (since A∗ is already a class of algorithms, each with a different tie-breaking rule):
Aad: Algorithms that return least-cost solutions when given an admissible problem space (though not necessarily an admissible heuristic to run on that problem space)
Abf : Subclass of Aad that accepts any path-dependent evaluation function, but operates in a best-first manner
Agc: Subclass of Aad that will return optimal solutions whenever A∗ does, but may be given heuristics that do not obey h ≤ h∗
A∗ is optimal relative to A in the following senses:
0-Optimal: All A∗ ’s tie-breaking rules dominate all members of A
1-Optimal: Some tie-breaking rule of A∗ expands a subset of the nodes expanded by every member of A
2-Optimal: No member of A expands a proper subset of the nodes expanded by some tie-breaking rule of A∗
3-Optimal: No tie-breaking rule in A∗ (or A∗∗) is strictly dominated by some member of A
Strongest → Weakest
0-Optimal → 3-Optimal
Theorem & proof
If any admissible algorithm B does not expand a node which is surely expanded by A∗ in some problem instance where h is admissible and non-pathological, then in that very problem instance B must expand a node which is avoided by every tie-breaking-rule in A∗ .
No algorithm (A∗ or otherwise) is 1-optimal over those guaranteed to find an optimal solution when given h ≤ h ∗ .
A∗∗ differs from A∗ in that it relies not only on the g + h value of node n, but also considers all the g + h values along the path from s to n. The maximum is then used as a criterion for node selection.
A∗∗ Evaluation Function
f′(Ps−n) = max{ gPs−n(n′) + h(n′) | n′ ∈ Ps−n }
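A small sketch of this evaluation (the dictionaries of g and h values along the path are illustrative assumptions): instead of scoring a path by g + h at its tip, A∗∗ takes the maximum of g(n′) + h(n′) over every node n′ on the path, so a high estimate encountered earlier on the path is not forgotten.

```python
def f_prime(path, g, h):
    """A** path evaluation: max of g(n') + h(n') over all nodes n' on the path."""
    return max(g[n] + h[n] for n in path)

# Illustrative path s -> a -> n with accumulated costs g and heuristic values h.
path = ["s", "a", "n"]
g = {"s": 0, "a": 2, "n": 5}
h = {"s": 6, "a": 5, "n": 1}
print(f_prime(path, g, h))   # max(6, 7, 6) = 7, larger than g(n) + h(n) = 6 at the tip
```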
A∗∗ is 3-optimal over all algorithms relative to IAD.
The type-3 optimality of A∗∗ over Aad demonstrates that those “smart” algorithms that prevent A∗ from achieving optimality are not smart after all, but simply lucky; each takes advantage of the peculiarity of the graph for which it was contrived, and none can maintain this superiority over all problem instances. If it wins on one graph, there must be another where it is beaten, and by the very same opponent, A∗∗. It is in this sense that A∗∗ is 3-optimal; it exhibits a universal robustness against all its challengers.
A∗∗ strictly dominates A∗
(a) For every tie-breaking rule of A∗ and for every problem instance I ∈ IAD, there exists a tie-breaking rule for A∗∗ that expands a subset of the nodes expanded by A∗.
(b) Moreover, there exists a problem instance and a tie-breaking rule for A∗∗ that expands a proper subset of the nodes that are expanded by any tie-breaking rule of A∗.
A∗∗ is admissible over IAD.
Algorithm A∗∗ will terminate with an optimal solution in every problem instance in which h ≤ h∗.
Because of the pathological cases in IAD, A∗ cannot be 3-optimal. A∗∗ exists due to this fact, because it can dominate all instances of A∗ . A∗∗ is 3-optimal in the most general case.
A∗∗ is admissible and in non-pathological cases A∗∗ expands the same set of nodes as does A∗, namely, the surely expanded nodes in N^{C∗}_{g+h}. In pathological cases, however, there exist tie-breaking rules in A∗∗ that strictly dominate every tie-breaking rule in A∗. This immediately precludes A∗ from being 3-optimal relative to IAD and nominates A∗∗ for that title.
A∗∗ is not a BF∗ algorithm since it uses one function f′ for ordering nodes for expansion and a different function g for redirecting pointers. Had we allowed A∗∗ to use f′ for both purposes, it would not be admissible relative to IAD, since f′ is not order preserving.