In: Statistics and Probability
Explain why the number of nonterminals that can pop from an LL(1) parse stack is not bounded by a grammar-specific constant.
LL(1) parsing principle:-
Parse from 1) left-to-right (as always anyway), do a 2) left-most derivation
and resolve the “which-right-hand-side” non-determinism by
1. looking 1 symbol ahead.
Explanation
• two flavors for LL(1) parsing here (both are top-down parsers)
– recursive descent
– table-based LL(1) parser
• predictive parsers
If one wants to be very precise: it’s recursive descent with one look-ahead and
without back-tracking. It’s the single most common case for recursive descent
parsers. Longer look-aheads are possible, but less common. Technically, even
allowing back-tracking can be done using recursive descent as principle (even
if not done in practice).
Sample expr grammar again
factors and terms
exp → term exp′
exp′ → addop term exp′
∣
addop → + ∣ −
term → factor term′
term′ → mulop factor term′
∣
mulop → ∗
factor → ( exp ) ∣ n
(4.6)
Look-ahead of 1: straightforward, but not trivial
• look-ahead of 1:
– not much of a look-ahead, anyhow
– just the “current token”
⇒ read the next token, and, based on that, decide
• but: what if there’s no more symbols?
⇒ read the next token if there is, and decide based on the token or else the
fact that there’s none left6
Example: 2 productions for non-terminal factor
factor → ( exp ) ∣ number
6Sometimes “special terminal” $ used to mark the end (as mentioned).
Remark
that situation is trivial, but that’s not all to LL(1) . . .
Recursive descent: general set-up
1. global variable, say tok, representing the “current token” (or pointer to
current token)
2. parser has a way to advance that to the next token (if there’s one)
Idea
For each non-terminal nonterm, write one procedure which:
• succeeds, if starting at the current token position, the “rest” of the token
stream starts with a syntactically correct word of terminals representing
nonterm
• fail otherwise
• ignored (for right now): when doing the above successfully, build the AST
for the accepted nonterminal.
Recursive descent
method factor for nonterminal factor
f i n a l int LPAREN=1,RPAREN=2,NUMBER=3,
PLUS=4,MINUS=5,TIMES=6;
void facto r ( ) {
switch ( tok ) {
case LPAREN: e a t (LPAREN) ; expr ( ) ; e a t (RPAREN) ;
case NUMBER: e a t (NUMBER) ;
}
}
Recursive descent
qtype token = LPAREN | RPAREN | NUMBER
| PLUS | MINUS | TIMES
l e t f a c t o r ( ) = (∗ f u n c t i o n f o r f a c t o r s ∗)
match ! tok with
LPAREN −> e a t (LPAREN) ; expr ( ) ; e a t (RPAREN)
| NUMBER −> e a t (NUMBER)
Slightly more complex
• previous 2 rules for factor: situation not always as immediate as that
LL(1) principle (again)
given a non-terminal, the next token must determine the choice of right-hand
side7
First
⇒ definition of the First set
Lemma 4.4.1 (LL(1) (without nullable symbols)). A reduced context-
free grammar without nullable non-terminals is an LL(1)-grammar iff for
all non-terminals A and for all pairs of productions A → α1 and A → α2
with α1 =/ α2:
First1(α1) ∩ First1(α2) = ∅ .
Common problematic situation
• often: common left factors problematic
if -stmt → if ( exp ) stmt
∣ if ( exp ) stmt else stmt
• requires a look-ahead of (at least) 2
• ⇒ try to rearrange the grammar
1. Extended BNF ([6] suggests that)
if -stmt → if ( exp ) stmt[else stmt]
1. left-factoring:
7
It must be the next token/terminal in the sense of First, but it need not be a token directly
mentioned on the right-hand sides of the corresponding rules.
if -stmt → if ( exp ) stmt else−part
else−part → ∣ else stmt
Recursive descent for left-factored if -stmt
procedure i fstmt ( )
begin
match ( " i f " ) ;
match ( " ( " ) ;
exp ( ) ;
match ( " ) " ) ;
stmt ( ) ;
i f token = " e l s e "
then match ( " e l s e " ) ;
stmt ( )
end
end ;
Left recursion is a no-go
factors and terms
exp → exp addop term ∣ term
addop → + ∣ −
term → term mulop factor ∣ factor
mulop → ∗
factor → ( exp ) ∣ number
(4.7)
Left recursion explanation
• consider treatment of exp: First(exp)?
– whatever is in First(term), is in First(exp)
8
– even if only one (left-recursive) production ⇒ infinite recursion.
Left-recursion
Left-recursive grammar never works for recursive descent.
8And it would not help to look-ahead more than 1 token either.