Parser Error Examples

Dr. Robert Heckendorn
University of Idaho

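Many of the examples here show a "dump" of the internal state of the parser. This is done by setting the bison variable yydebug=1. The parser must also be generated with tracing enabled, for example with bison's -t option or the %debug directive.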


This example uses right recursion, so no stmt is reduced until the end of input is reached. Every a waits on the parser stack until then, so if your list of STMTs is long, this grammar could cause a stack overflow.

loop2.y

%token XX YY ZZ
// delay print.
// possible stack overflow.
// teaches us to avoid right recursion in a bottom up parser
%%
stmt  : a stmt  {printf("> STMT\n"); }
      | a       {printf("> the a STMT\n"); }
      ;
a     : XX {printf("> XX\n"); }
      | YY {printf("> YY\n"); }
      | ZZ {printf("> ZZ\n"); }
      ;
%%

OUTPUT WITH yydebug=1

INPUT: XX YY ZZ

Starting parse
Entering state 0
Reading a token: Next token is 257 (XX)
Shifting token 257 (XX), Entering state 1
Reducing via rule 3 (line 34), XX  -> a
> XX
state stack now 0
Entering state 4
Reading a token: Next token is 258 (YY)
Shifting token 258 (YY), Entering state 2
Reducing via rule 4 (line 35), YY  -> a
> YY
state stack now 0 4
Entering state 4
Reading a token: Next token is 259 (ZZ)
Shifting token 259 (ZZ), Entering state 3
Reducing via rule 5 (line 36), ZZ  -> a
> ZZ 
state stack now 0 4 4 
Entering state 4
Reading a token: Now at end of input.
Reducing via rule 2 (line 32), a  -> stmt        <-- first time we generate a stmt!
> the a STMT
state stack now 0 4 4
Entering state 5
Reducing via rule 1 (line 31), a stmt  -> stmt
> STMT 
state stack now 0 4
Entering state 5
Reducing via rule 1 (line 31), a stmt  -> stmt
> STMT
state stack now 0
Entering state 6
Now at end of input.
Shifting token 0 ($), Entering state 7
Now at end of input.
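
The delayed reductions in the dump above can be reproduced with a small hand simulation. This is only a sketch: it hard-codes the shift/reduce order for this one grammar rather than using bison's tables, and the function name parse_right_recursive is made up for illustration.

```python
# Hand simulation of the right-recursive grammar
#   stmt : a stmt | a ;    a : XX | YY | ZZ ;
# Not bison's table-driven algorithm; the action order is hard-coded.

def parse_right_recursive(tokens):
    stack = []      # grammar symbols currently on the parse stack
    actions = []    # lines the grammar's printf actions would emit
    max_depth = 0

    for tok in tokens:
        stack.append(tok)                      # shift the token
        max_depth = max(max_depth, len(stack))
        stack[-1] = "a"                        # reduce XX|YY|ZZ -> a
        actions.append("> " + tok)

    # Only at end of input can the last 'a' become a stmt.
    if stack:
        stack[-1] = "stmt"                     # reduce a -> stmt
        actions.append("> the a STMT")
    # Unwind: each waiting 'a' combines with the stmt to its right.
    while len(stack) >= 2:
        stack[-2:] = ["stmt"]                  # reduce a stmt -> stmt
        actions.append("> STMT")

    return actions, max_depth


actions, depth = parse_right_recursive(["XX", "YY", "ZZ"])
print("\n".join(actions))
print("max stack depth:", depth)
```

In this sketch the maximum stack depth equals the number of tokens, which is the growth that makes a long right-recursive list risky.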


loop.y

This is the correct way to build a list: use left recursion, provided the order of evaluation of the subparts doesn't matter. This reduces to stmt as soon as possible.

 
%token XX YY ZZ
%%
stmt  : stmt a  {printf("> STMT\n"); }
      | a       {printf("> the a STMT\n"); }
      ;
a     : XX {printf("> XX\n"); }
      | YY {printf("> YY\n"); }
      | ZZ {printf("> ZZ\n"); } 
      ;
%% 

OUTPUT WITH yydebug=1

 
Starting parse
Entering state 0
Reading a token: Next token is 257 (XX)
Shifting token 257 (XX), Entering state 1
Reducing via rule 3 (line 31), XX  -> a
> XX 
state stack now 0
Entering state 5
Reducing via rule 2 (line 29), a  -> stmt
> the a STMT
state stack now 0
Entering state 4
Reading a token: Next token is 258 (YY)
Shifting token 258 (YY), Entering state 2
Reducing via rule 4 (line 32), YY  -> a 
> YY 
state stack now 0 4
Entering state 6
Reducing via rule 1 (line 28), stmt a  -> stmt
> STMT
state stack now 0
Entering state 4
Reading a token: Next token is 259 (ZZ)
Shifting token 259 (ZZ), Entering state 3
Reducing via rule 5 (line 33), ZZ  -> a
> ZZ 
state stack now 0 4
Entering state 6
Reducing via rule 1 (line 28), stmt a  -> stmt
> STMT
state stack now 0
Entering state 4
Reading a token: Now at end of input.
Shifting token 0 ($), Entering state 7
Now at end of input.
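
For comparison, here is the same kind of hand simulation for the left-recursive grammar. Again this is a sketch with a hard-coded action order, not bison's algorithm, and parse_left_recursive is a hypothetical name.

```python
# Hand simulation of the left-recursive grammar
#   stmt : stmt a | a ;    a : XX | YY | ZZ ;

def parse_left_recursive(tokens):
    stack = []
    actions = []
    max_depth = 0

    for tok in tokens:
        stack.append(tok)                      # shift the token
        max_depth = max(max_depth, len(stack))
        stack[-1] = "a"                        # reduce XX|YY|ZZ -> a
        actions.append("> " + tok)
        if stack == ["a"]:
            stack[:] = ["stmt"]                # reduce a -> stmt
            actions.append("> the a STMT")
        else:                                  # stack is ["stmt", "a"]
            stack[-2:] = ["stmt"]              # reduce stmt a -> stmt
            actions.append("> STMT")

    return actions, max_depth


actions, depth = parse_left_recursive(["XX", "YY", "ZZ"])
print("\n".join(actions))
print("max stack depth:", depth)
```

Each a is folded into the stmt immediately, so the simulated stack never holds more than two symbols no matter how long the input is.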


x-lr2.y

%token ID TYPE
%%
func  : TYPE ID '(' params ')'
      | TYPE ID '(' ')'
      ; 

params : param
       | params ',' param        <--- comma could mean switch to new param
       ;

param : TYPE list_of_ids
      ;

list_of_ids : ID
            | list_of_ids ',' ID <--- comma could mean get id
            ;
%%

check it for input:
type id(type id, type id)
               ^
        problem is right here.  Can't tell if type or id after ,

Possible parse trees (left-most derivations) could be the following. But remember this is a bottom-up parser, so these just indicate where the two different bottom-up parses are "headed".

params
params ',' param
param ',' param
TYPE list_of_ids ',' TYPE list_of_ids
TYPE ID ',' TYPE list_of_ids [reduce first and then comma]
   HERE LOOKING AT ',' ON INPUT: reduce:  TYPE list_of_ids .  -> param
or:
params
param
TYPE list_of_ids
TYPE list_of_ids ',' ID
TYPE ID ',' ID [shift comma on stack here]
   HERE:  shift: list_of_ids . ',' ID  ->  list_of_ids ',' . ID

Let's see what is in the output file x-lr2.output. It says:
 
State 11 conflicts: 1 shift/reduce 
Let's go to state 11. Here we see two LR(0) items (dot-thingies). Production 5 is a reduce and production 7 is a shift. There is nothing wrong with that as long as the lookahead tokens that trigger the two actions don't overlap. Unfortunately, the listing says the parser will shift if it sees a ',' and will also reduce if it sees a ','. Bad news: it can't decide what to do.
 
state 11

    5 param: TYPE list_of_ids .
    7 list_of_ids: list_of_ids . ',' ID

    ','  shift, and go to state 14

    ','       [reduce using rule 5 (param)]
    $default  reduce using rule 5 (param) 
Oddly enough, this problem could be solved if it weren't for the comma. The comma hides the fact that one of the routes of parsing requires a TYPE and the other requires an ID! But the parser only gets to look one token ahead.

NOTE: the grammar is NOT ambiguous! It is just not parseable with an LR(1) parser. The problem is with the grammar, not with what we want in the language.
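
To make the one-token limit concrete, the sketch below compares what the parser could see at the decision point with one and with two tokens of lookahead. The helper lookahead, the constant DECISION, and the two token lists are hypothetical, chosen only to mirror the parameter lists discussed above.

```python
def lookahead(tokens, pos, k):
    """Return the next k tokens starting at index pos."""
    return tuple(tokens[pos:pos + k])

# Two parameter lists as token streams.
new_param = ["TYPE", "ID", ",", "TYPE", "ID"]   # comma starts a new param
more_ids = ["TYPE", "ID", ",", "ID"]            # comma continues the id list

DECISION = 2  # index just after "TYPE ID", where shift or reduce is chosen

for k in (1, 2):
    a = lookahead(new_param, DECISION, k)
    b = lookahead(more_ids, DECISION, k)
    verdict = "distinguishable" if a != b else "identical"
    print(k, a, b, verdict)
```

With k = 1 both inputs show only the comma, so no LR(1) table can tell them apart at that point. With k = 2 the token after the comma would settle it.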

Bison by default chooses to shift. That is the wrong answer in this case. Consider the input TYPE ID ( TYPE ID , TYPE ID ):

 
Starting parse
Entering state 0
Reading a token: Next token is 258 (TYPE)
Shifting token 258 (TYPE), Entering state 1
Reading a token: Next token is 257 (ID)
Shifting token 257 (ID), Entering state 2
Reading a token: Next token is 40 ('(')
Shifting token 40 ('('), Entering state 3
Reading a token: Next token is 258 (TYPE)
Shifting token 258 (TYPE), Entering state 4
Reading a token: Next token is 257 (ID)
Shifting token 257 (ID), Entering state 8
Reducing via rule 6 (line 39), ID  -> list_of_ids
state stack now 0 1 2 3 4
Entering state 9
Reading a token: Next token is 44 (',')
Shifting token 44 (','), Entering state 12
Reading a token: Next token is 258 (TYPE)
ERROR lineno(1):parse error, expecting `ID'.  I got: type  <--- misidentified as list of ids!
Error: state stack now 0 1 2 3 4 9
Error: state stack now 0 1 2 3 4 
Error: state stack now 0 1 2 3 
Error: state stack now 0 1 2
Error: state stack now 0 1
Error: state stack now 0


x-lr2-2.y

This rewrite of the grammar fixes the problem. It puts the decision about what follows the comma in a single state, where the next token (TYPE or ID) is the lookahead.

%token ID TYPE
%%
func  : TYPE ID '(' params ')'
      | TYPE ID '(' ')'
      ;

params : TYPE ID
       | params ',' TYPE ID   <- if type
       | params ',' ID        <- if id
       ;
%%

Starting parse
Entering state 0
Reading a token: Next token is 258 (TYPE)
Shifting token 258 (TYPE), Entering state 1
Reading a token: Next token is 257 (ID)
Shifting token 257 (ID), Entering state 2
Reading a token: Next token is 40 ('(')
Shifting token 40 ('('), Entering state 3
Reading a token: Next token is 258 (TYPE)
Shifting token 258 (TYPE), Entering state 4
Reading a token: Next token is 257 (ID)
Shifting token 257 (ID), Entering state 7
Reducing via rule 3 (line 32), TYPE ID  -> params
state stack now 0 1 2 3
Entering state 6
Reading a token: Next token is 44 (',')
Shifting token 44 (','), Entering state 9
Reading a token: Next token is 258 (TYPE)
Shifting token 258 (TYPE), Entering state 11    <---- right here it knows to shift TYPE
Reading a token: Next token is 257 (ID)
Shifting token 257 (ID), Entering state 12
Reducing via rule 4 (line 33), params ',' TYPE ID  -> params
state stack now 0 1 2 3
Entering state 6
Reading a token: Next token is 41 (')')
Shifting token 41 (')'), Entering state 8
Reducing via rule 1 (line 28), TYPE ID '(' params ')'  -> func
state stack now 0
Entering state 13
Reading a token: Now at end of input.
Shifting token 0 ($), Entering state 14
Now at end of input.

This creates a state 11 with a clear decision based on the look ahead:

 
state 11

    4 params: params ',' . TYPE ID
    5       | params ',' . ID

    ID    shift, and go to state 12
    TYPE  shift, and go to state 13
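
The same decision can be written out by hand. The sketch below is a small loop-based recognizer for the rewritten params rule, not an LR parser and not what bison generates. The function name parse_params and the rule strings it returns are made up for illustration. The point is that after consuming the comma, the very next token picks the rule.

```python
def parse_params(tokens):
    """Recognize  params : TYPE ID | params ',' TYPE ID | params ',' ID
    and return the rules applied, in order. Raises SyntaxError on bad input."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != kind:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, got {found}")
        pos += 1

    rules = []
    expect("TYPE")
    expect("ID")
    rules.append("params: TYPE ID")

    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1                                   # consume the comma
        # One token of lookahead now decides which rule applies.
        if pos < len(tokens) and tokens[pos] == "TYPE":
            expect("TYPE")
            expect("ID")
            rules.append("params: params ',' TYPE ID")
        else:
            expect("ID")
            rules.append("params: params ',' ID")

    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]}")
    return rules


print(parse_params(["TYPE", "ID", ",", "TYPE", "ID"]))
print(parse_params(["TYPE", "ID", ",", "ID"]))
```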

Note: a second way to fix this is to use a different character, for example ';', to separate one param (a type and its ids) from the next, keeping ',' between ids of the same type. That is, change the syntax of the language to make it easier to parse. This might also make it easier for someone to read, and it might improve error recovery by signalling the intent of having a new type followed by a list of ids.

%token ID TYPE
%%
func  : TYPE ID '(' params ')'
      | TYPE ID '(' ')'
      ;

params : param
       | params ';' param        <--- separate with not a comma
       ;

param : TYPE list_of_ids
      ;

list_of_ids : ID
            | list_of_ids ',' ID <--- comma means id is next
            ;
%%


x-reducereduce.y

This has a reduce/reduce error in it

%token XX
%%
stmt : a
     | b
     ;

a : XX ;

b : XX ;

%%

Here is the conflict:

 
state 1

    a  ->  XX .   (rule 3)
    b  ->  XX .   (rule 4)

    $           reduce using rule 3 (a)
    $           [reduce using rule 4 (b)]
    $default    reduce using rule 3 (a) 
This is, of course, unfixable as written because this grammar is ambiguous: the input XX can be derived as stmt -> a -> XX or as stmt -> b -> XX. No grammar rewrite that keeps two routes to the same sentence will solve this. The solution is to remove one of the possible trees.
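
The ambiguity can be checked by brute force. The sketch below counts the distinct parse trees for an input under this grammar. It is a naive recursive counter, not how bison detects conflicts, and it assumes a grammar with no empty productions and no cycles of unit productions. GRAMMAR, count_trees and count_seq are hypothetical names.

```python
GRAMMAR = {
    "stmt": [("a",), ("b",)],
    "a": [("XX",)],
    "b": [("XX",)],
}

def count_trees(symbol, tokens):
    """Count the parse trees that derive `tokens` (a tuple) from `symbol`."""
    if symbol not in GRAMMAR:                      # terminal symbol
        return 1 if tokens == (symbol,) else 0
    return sum(count_seq(rhs, tokens) for rhs in GRAMMAR[symbol])

def count_seq(rhs, tokens):
    """Count the ways the symbols in `rhs` can jointly derive `tokens`."""
    if not rhs:
        return 1 if not tokens else 0
    if len(rhs) == 1:
        return count_trees(rhs[0], tokens)
    total = 0
    # Every symbol derives at least one token, so leave room for the rest.
    for cut in range(1, len(tokens) - len(rhs) + 2):
        left = count_trees(rhs[0], tokens[:cut])
        if left:
            total += left * count_seq(rhs[1:], tokens[cut:])
    return total


print(count_trees("stmt", ("XX",)))   # more than 1 means ambiguous
```

A count above one for any input means the grammar is ambiguous. Here the single token XX already has two trees.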

x-reducereduce2.y

We have changed the language in this example by adding clues to how to parse the XX. This does NOT have a reduce/reduce error in it, because YY and ZZ are lookaheads that tell the parser which reduce to take.

%token XX YY ZZ
%%
stmt : a YY 
     | b ZZ
     ;

a : XX ;

b : XX ;

%%

This can be seen here:
Grammar
rule 1    stmt -> a YY
rule 2    stmt -> b ZZ
rule 3    a -> XX
rule 4    b -> XX
     .
     . 
     .
state 1

    a  ->  XX .   (rule 3)
    b  ->  XX .   (rule 4)

    ZZ          reduce using rule 4 (b)
    $default    reduce using rule 3 (a)


x-reducereduce3.y

This is a case where the reduce/reduce error has returned because the parser only looks one token ahead. This is similar to the param parser with the x-lr2.y grammar.

%token WW XX YY ZZ
%% 
stmt : a YY WW 
     | b YY ZZ
     ;

a : XX ;

b : XX ;

%%


x-reducereduce4.y

This is the same language as the last, but the grammar is fixed so it is LR(1). This was done by migrating the token that hides the deciding lookahead into the a and b productions.

 
%token WW XX YY ZZ
%%
stmt : a WW
     | b ZZ
     ; 

a : XX YY ;

b : XX YY ;

%%
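
The last three grammars can be compared with a small check. In each of them a and b have identical right-hand sides, so both reduces are candidates in the same state, and the parser can only choose if the tokens that may follow a and b differ. The sketch below collects those tokens in a simplified way: it only looks at terminals that directly follow a nonterminal in some rule, which is adequate for these tiny grammars but is not a full FOLLOW or LALR lookahead computation. All names in it are hypothetical.

```python
TERMINALS = {"WW", "XX", "YY", "ZZ"}

GRAMMARS = {
    "x-reducereduce2": [("stmt", ["a", "YY"]), ("stmt", ["b", "ZZ"]),
                        ("a", ["XX"]), ("b", ["XX"])],
    "x-reducereduce3": [("stmt", ["a", "YY", "WW"]), ("stmt", ["b", "YY", "ZZ"]),
                        ("a", ["XX"]), ("b", ["XX"])],
    "x-reducereduce4": [("stmt", ["a", "WW"]), ("stmt", ["b", "ZZ"]),
                        ("a", ["XX", "YY"]), ("b", ["XX", "YY"])],
}

def following_terminals(rules):
    """Map each nonterminal to the terminals that directly follow it."""
    follow = {}
    for _lhs, rhs in rules:
        for sym, nxt in zip(rhs, rhs[1:]):
            if sym not in TERMINALS and nxt in TERMINALS:
                follow.setdefault(sym, set()).add(nxt)
    return follow

def has_reduce_reduce_conflict(rules):
    """True if the same lookahead token could trigger both reduces."""
    follow = following_terminals(rules)
    return bool(follow["a"] & follow["b"])


for name, rules in GRAMMARS.items():
    print(name, "conflict" if has_reduce_reduce_conflict(rules) else "ok")
```

Only x-reducereduce3 is flagged: there both a and b are followed by YY, while the other two grammars give a and b different following tokens.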


x-shiftreduce.y

Here is a shift/reduce problem in an ambiguous grammar. It is the familiar operator ambiguity: nothing specifies the associativity of the XX operator.

%token XX YY ZZ
%%
stmt : stmt XX stmt
     | ZZ
     ; 
%%
Here is the colliding state:
state 5

    1 stmt: stmt . XX stmt
    1     | stmt XX stmt .
 
    XX  shift, and go to state 4 

    XX        [reduce using rule 1 (stmt)]
    $default  reduce using rule 1 (stmt)
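
The two ways of resolving this state lead to two different trees. The sketch below hand-simulates a shift-reduce parse of this grammar with the conflict resolved either way. It assumes well-formed input (ZZ tokens separated by XX) and represents a reduced stmt as a nested tuple. The function name parse and the prefer parameter are made up for illustration.

```python
def parse(tokens, prefer):
    """Shift-reduce parse of  stmt : stmt XX stmt | ZZ ;
    `prefer` is "shift" or "reduce" and is consulted only when the stack
    ends in  stmt XX stmt  and the lookahead is XX (the conflicted state).
    A bare "ZZ" on the stack doubles as a leaf stmt."""
    stack = []
    for tok in tokens + ["$"]:                 # "$" marks end of input
        # tok is the lookahead; reduce while the stack ends in stmt XX stmt.
        while len(stack) >= 3 and stack[-2] == "XX":
            if tok == "XX" and prefer == "shift":
                break                          # resolve the conflict as a shift
            right = stack.pop()
            stack.pop()                        # discard the XX operator
            left = stack.pop()
            stack.append((left, right))        # reduce stmt XX stmt -> stmt
        if tok != "$":
            stack.append(tok)                  # shift
    return stack[0]


tokens = ["ZZ", "XX", "ZZ", "XX", "ZZ"]
print("reduce:", parse(tokens, "reduce"))
print("shift: ", parse(tokens, "shift"))
```

Preferring reduce nests the tree to the left, (ZZ XX ZZ) XX ZZ, while preferring shift, bison's default, nests it to the right, ZZ XX (ZZ XX ZZ).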


if.y

This gives a shift/reduce error, but accidentally does what we want if you want each ELSE to bind to the nearest IF. DON'T DO THIS! This is only an educational example. It works because bison by default selects a shift over a reduce, which is the right answer in this case.

%token IF XX YY THEN ELSE
%%
stmt : IF XX THEN stmt
     | IF XX THEN stmt ELSE stmt
     | YY
     ;
%%
given this input:
if xx then if xx then yy else yy

Starting parse
Entering state 0
Reading a token: Next token is 257 (IF)
Shifting token 257 (IF), Entering state 1 
Reading a token: Next token is 258 (XX) 
Shifting token 258 (XX), Entering state 3
Reading a token: Next token is 260 (THEN)
Shifting token 260 (THEN), Entering state 4
Reading a token: Next token is 257 (IF)
Shifting token 257 (IF), Entering state 1
Reading a token: Next token is 258 (XX)
Shifting token 258 (XX), Entering state 3
Reading a token: Next token is 260 (THEN) 
Shifting token 260 (THEN), Entering state 4
Reading a token: Next token is 259 (YY)
Shifting token 259 (YY), Entering state 2
Reducing via rule 3 (line 30), YY  -> stmt
state stack now 0 1 3 4 1 3 4
Entering state 5
Reading a token: Next token is 261 (ELSE)
Shifting token 261 (ELSE), Entering state 6
Reading a token: Next token is 259 (YY)
Shifting token 259 (YY), Entering state 2
Reducing via rule 3 (line 30), YY  -> stmt 
state stack now 0 1 3 4 1 3 4 5 6
Entering state 7
Reducing via rule 2 (line 29), IF XX THEN stmt ELSE stmt  -> stmt 
state stack now 0 1 3 4
Entering state 5
Reading a token: Now at end of input.
Reducing via rule 1 (line 28), IF XX THEN stmt  -> stmt
state stack now 0 
Entering state 8
Now at end of input.
Shifting token 0 ($), Entering state 9
Now at end of input.
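
The tree that this dump builds can be seen more directly with a hand-written parser. The sketch below is recursive descent, not the LR parser bison generates. It consumes an ELSE as soon as one is available, which mirrors choosing shift over reduce. The function name parse_stmt and the tuple shape of the tree are assumptions made for illustration.

```python
def parse_stmt(tokens, pos=0):
    """Parse  stmt : IF XX THEN stmt | IF XX THEN stmt ELSE stmt | YY
    Returns (tree, next_pos). Raises SyntaxError on bad input."""
    if pos < len(tokens) and tokens[pos] == "YY":
        return "YY", pos + 1
    if tokens[pos:pos + 3] != ["IF", "XX", "THEN"]:
        raise SyntaxError(f"expected YY or IF XX THEN at token {pos}")
    then_part, pos = parse_stmt(tokens, pos + 3)
    # Greedy choice: take the ELSE now, like shifting instead of reducing.
    if pos < len(tokens) and tokens[pos] == "ELSE":
        else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", then_part, "else", else_part), pos
    return ("if", then_part), pos


tokens = "IF XX THEN IF XX THEN YY ELSE YY".split()
tree, end = parse_stmt(tokens)
print(tree)
```

The ELSE ends up inside the inner if, matching the dump, where rule 2 (with ELSE) is reduced for the inner statement before rule 1 is reduced for the outer one.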

Another shift reduce error

 
stmt : XX stmt YY 
     | XX stmt
     | ZZ 
     ; 
This is the dangling else problem disguised. What is the follow set for stmt? When should we shift and when should we reduce?


Robert Heckendorn. Last updated: Oct 5, 2010 6:29