A COMPARISON BY EXAMPLE OF LR(1) WITH SLR(1) PARSERS
----------------------------------------------------
Robert Heckendorn
University of Idaho
Let's look in detail at the example from page 215 in the book:
-> |
-> identifier
-> ` :=
`` -> `` [ ] | identifier
-> `` | number
A stand-in for the grammar above is:
S -> id | V = E
V -> id
E -> V | n
where the capital letters are nonterminals.
Let's create an SLR(1) parser for this grammar.
This is easy. We use the LR(0) items and build
a NFA then DFA and code from that as follows:
consider the following augmented and split out grammar:
0) S'-> S // augmented grammar
1) S -> id
2) S -> V = E
3) V -> id
4) E -> V
5) E -> n
That leads to these LR(0) items:
0) S' -> . S
1) S' -> S .
2) S -> . id
3) S -> id .
4) S -> . V = E
5) S -> V . = E
6) S -> V = . E
7) S -> V = E .
8) V -> . id
9) V -> id .
10) E -> . V
11) E -> V .
12) E -> . n
13) E -> n .
we can now group these together into a NFA. Here are the transitions
between the 14 states above. Note that dot before a terminal yeild a
transition on a terminal and a dot before a nonterminal leads to an
epsilon transition for each possible substitution for the nonterminal
as well as a substitution for on each completed nonterminal. Progress
through the NFA goes from start symbol to a "complete item".
0 -> 1 on S
2 -> 3 on id
4 -> 5 on V
5 -> 6 on =
6 -> 7 on E
8 -> 9 on id
10 -> 1 on 1 V
12 -> 1 on 3 n
0 -> 2 on \epsilon
0 -> 4 on \epsilon
4 -> 8 on \epsilon
6 -> 1 on 0 \epsilon
6 -> 1 on 2 \epsilon
10 -> 8 on \epsilon
We now convert the NFA to a DFA by starting with the
start symbol:
STATES TRANSITIONS
0) S' -> . S 0->1 on S
0) S -> . id 0->2 on V
0) S -> .V = E 0->3 on id
0) V -> . id
1) S' -> S .
2) S -> V . = E 2->4 on =
3) S -> id .
3) V -> id .
4) S -> V = . E 4->5 on E
4) E -> . V 4->3 on id
4) E -> . n 4->6 on n
4) V -> . id 4->7 on V
5) S -> V = E .
6) E -> n .
7) E -> V .
We can now write the code for the parser.
1) for every noncomplete item (dot in the middle) we expect one
of the symbols following the dot to be the next token. That
is the lookahead that will tell us what state to go to next.
2) for every complete item (dot at the end) we expect to
simply pop off the matching tokens on the stack. Look at
the state and element from the lhs and goto that new state.
For example in state 5 above we will pop the three tokens V = E
from the stack and look at the revealed new-state on the top
of the stack. Then take the lhs of the production in state 5
which is S and the new-state and get the next state.
This works for most things above but something is funny with state 3.
We pop off the id giving us the new state but what is the lhs we use?!
Is it S or V?? We don't know which reduce to use. This is a
reduce-reduce error. (Compare this with the dangling else which gives
us a shift-reduce.)
Let's try a different approach. Let's include the next token in
the state structure and see if that helps. It might because
we are including this from the beginning.
consider the grammar again:
0) S'-> S $ // augmented grammar with end-of-input token
1) S -> id
2) S -> V = E
3) V -> id
4) E -> V
5) E -> n
Now let's create LR(1) items (see page 217).
LR(1) items are LR(0) items with a single lookahead token tag.
THis token will be the next token we scan in after the complete
pattern is matched. That is, for a production A -> .B the
LR(1) items are [A -> . B, z] where z is in the follow(A).
and guide the parse.
We tag each production depending on where the dot is.
1) If we have a production with a dot before an X
[a -> b . X c, z]
where X is any symbol then we preserve the lookahead token z
[a -> b . X c, z] transitions to [a -> b X . c, z] on X
X can be a terminal or nonterminal.
2) If we have
[a -> b . X c, z]
where X is a nonterminal we also have an \epsilon transition
[a -> b . X c, z] transitions to [ X -> Y, z' ]
for every z' in the First(cz) that is the first set of
the string of token c concatenated to the token z. This
is First(c) if c can never be empty.
Note: for every
[a -> b . X , z] transitions to [ X -> Y, z ] exactly as a special case
of rule 2.
We can construct the LR(1) items given the above rules and
the firstOfList function we described in class:
0) S' -> . S,$
1) S' -> S .,$
2) S -> . id,$
3) S -> id .,$
4) S -> .V = E,$
5) S -> V . = E,$
6) S -> V = . E,$
7) S -> V = E .,$
8) V -> . id,$
9) V -> . id,=
10) V -> id .,=
11) V -> id .,$
12) E -> . V,$
13) E -> V .,$
14) E -> . n,$
15) E -> n .,$
the transitions lead us to this DFA (see page 223)
STATES TRANSITIONS
0) S' -> . S,$ 0->1 on S
0) S -> . id,$ 0->2 on V
0) S -> .V = E,$ 0->3 on id
0) V -> . id,=
1) S' -> S .,$
2) S -> id .,$
2) V -> id .,=
3) S -> V . = E,$ 3->4 on =
4) S -> V = . E,$ 4->5 on E
4) E -> . V,$ 4->8 on id
4) E -> . n,$ 4->7 on n
4) V -> . id,$ 4->6 on V
5) S -> V = E .,$
6) E -> V .,$
7) E -> n .,$
8) V -> id .,$
Note that it is very clear what to do at every state and
when there is a reduce-reduce conflict from the SLR we
get a two choices of productions that are decided by the
lookahead char. So the LR parser saved the day.
Why this works is that in the case of reducing an id to an S the S
must be followed by a $ while a V must be followed by an =. These two
sets do not overlap. In an LR(1) parser a reduce-reduce conflict must
now look something more like the following in the same set:
k) S -> id .,$
k) S -> id .,=
k) V -> id .,*
k) V -> id .,=
When the lookahead is = we can't decide between the two productions.
The important thing to see is that what can follow a V
fol(V) = {=, $}
and
fol{S} = {$}
overlap as you can see below:
Here are the S cases:
2) S -> . id,$
3) S -> id .,$
4) S -> .V = E,$
5) S -> V . = E,$
6) S -> V = . E,$
7) S -> V = E .,$
Here are the V cases:
8) V -> . id,$
9) V -> . id,=
10) V -> id .,=
11) V -> id .,$
In an SLR grammar the reductions 3 and 10 occur in the same
node (node 2) and which reduction to use would be
dictated by the rule in the SLR algorithm:
"if state contains a complete items A ::= X., B::=Y., C::=Z. then
chose A if input \isin Follow(A), B if input \isin Follow(B), etc"
Follow(S) = {$}
Follow(V) = {=, $}
This is a reduce/reduce error since the follow sets are not disjoint.
In an LR(1) grammar we start with three different nodes in the NFA
[S -> id., $]
[V -> id., $]
[V -> id., =]
and in the DFA become two nodes. see page 223 for complete example:
node 2:
[S -> id., $]
[V -> id., =]
node 8:
[V -> id., $]
In node 2 the lookahead symbol tells us which reduction to use.
Let's look at this from a different view. Where can
an id occur?
S -> id | V = E
V -> id
E -> V | n
it can occur in three places. A statement can be:
id $
id = E $
V = id $
Look at figure 5.8:
id $ state 2 with lookahead $
id = E $ state 2 with lookahead =
V = id $ state 8
distinguishing between
id $
V = id $
used to be a problem since in an SLR grammar
state 4 used to go to state 2 on an id.
Now there is a new separate state 8 for id.
This is for processing the id at the end of
an assignment.
This would lump
[S -> id., $]
[V -> id., $]
[V -> id., =]
together and cause a confusion about what to do. An extra state
allows us to record that we are at the end of an assignment so you
must want to do V -> id and not S -> id.
`