A Bison Primer

Bison is a parser generator: it reads a grammar description and generates C or C++ compatible source code for a parser. It is essentially the same program as yacc, with a few changes and extensions.

Like a Flex program, a Bison program has three sections: definitions, rules, and user subroutines. The sections are separated by a line consisting of a pair of percent signs (%%) starting in the first column. The form is roughly:

definitions 
%%
rules
%%
user subroutine section

The Definitions Section

In the definitions section you can have a region of text that is copied verbatim to the output. This region usually contains includes, global variable declarations, and prototypes of functions defined in the user subroutine section. The copied code is enclosed between %{ and %}. Warning: the closing delimiter is %}, NOT }%.

%{
code to copy into the final program
%}

After this come token declarations, nonterminal type declarations, precedence/associativity rules, and other options.

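Tokens are declared along with the type of value information they return. The name in angle brackets is the name of a member of the %union declaration, which lists every kind of value the parser handles.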

%token <dvalue> NUMBER
%token <varindex> NAME
These lines declare that token NUMBER carries a value of type <dvalue> and token NAME carries a value of type <varindex>.

Nonterminal types must also be declared as in:

%type <dvalue> expression
%type <dvalue> term
%type <dvalue> varornum

These lines declare expression, term, and varornum to return type <dvalue>.
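
The names in angle brackets refer to members of a union of all value types, which Bison builds from a %union declaration in the definitions section. Here is a minimal C sketch of the idea. It assumes dvalue is a double and varindex is an int (the declarations above do not show their types), and the names SemanticValue, number_value, and name_value are made up for illustration:

```c
/* Hypothetical stand-in for the union Bison generates from
       %union { double dvalue; int varindex; }
   The member types are assumptions. */
typedef union {
    double dvalue;   /* NUMBER, expression, term, varornum */
    int    varindex; /* NAME */
} SemanticValue;

/* What a scanner action for NUMBER does in effect: fill in dvalue. */
SemanticValue number_value(double d)
{
    SemanticValue v;
    v.dvalue = d;
    return v;
}

/* What a scanner action for NAME does in effect: fill in varindex. */
SemanticValue name_value(int index)
{
    SemanticValue v;
    v.varindex = index;
    return v;
}
```

In a real scanner the assignment goes to the matching member of the global yylval, for example yylval.dvalue = atof(yytext); just before returning NUMBER.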

A list of equal-precedence operators can be given, preceded by %left if the operators are left associative or %right if they are right associative. The declarations are ordered from lowest precedence to highest. For example, these four lines in this order:

%right implies
%left or xor
%left and
%nonassoc not
mean that not, declared last, has the highest precedence. Next comes and, which is left associative. or and xor have equal precedence, lower than and, and are also left associative. Finally, implies has the lowest precedence and is right associative. %nonassoc means that the not operator does not have any associativity rule.
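
Associativity and precedence decide how an expression written without parentheses is grouped, and the grouping can change the result. Here is a small C sketch, assuming the tokens stand for the usual boolean operators on 0/1 values (the function names are made up for illustration):

```c
/* a implies b, for 0/1 truth values */
int implies(int a, int b) { return !a || b; }

/* %right implies: "a implies b implies c" groups as a implies (b implies c) */
int implies_right(int a, int b, int c) { return implies(a, implies(b, c)); }

/* The grouping %left would have produced instead: (a implies b) implies c */
int implies_left(int a, int b, int c) { return implies(implies(a, b), c); }

/* and binds tighter than or: "a or b and c" groups as a or (b and c) */
int or_and(int a, int b, int c) { return a || (b && c); }
```

With all three operands false, implies_right gives 1 while implies_left gives 0, so the choice between %right and %left is not just cosmetic.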

The Rules Section

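The rules section consists of a collection of productions in a BNF-like format. You can attach a block of C code (an action) to each production, and in fact after any symbol within a production. Inside an action, $$ is the value of the left-hand side and $1, $2, $3, and so on are the values of the right-hand-side symbols in order. For example: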
expression: expression '+' term     { $$ = $1 + $3; } 
          | term                    { $$ = $1; } 
          ;
Also see the program example to follow.
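
Conceptually the parser keeps the values of the symbols it has seen on a stack, and an action reads the top few slots and replaces them with one result. Here is a rough C sketch of what the first action above does, using a hypothetical ValueStack type (Bison's real stack is internal to the generated parser):

```c
#include <stddef.h>

#define STACK_MAX 16

/* Hypothetical value stack, one slot per grammar symbol. */
typedef struct {
    double vals[STACK_MAX];
    size_t top;              /* number of slots in use */
} ValueStack;

/* Push a value; returns 0 on success, -1 if the stack is full. */
int push(ValueStack *s, double v)
{
    if (s->top >= STACK_MAX) return -1;
    s->vals[s->top++] = v;
    return 0;
}

/* Reduce by  expression: expression '+' term  { $$ = $1 + $3; }
   Returns 0 on success, -1 if fewer than three values are stacked. */
int reduce_add(ValueStack *s)
{
    double v1, v3;
    if (s->top < 3) return -1;
    v1 = s->vals[s->top - 3];   /* $1: expression */
    v3 = s->vals[s->top - 1];   /* $3: term; $2 is '+' and is unused */
    s->top -= 3;                /* pop the three right-hand-side slots */
    return push(s, v1 + v3);    /* push $$ for the new expression */
}
```

Pushing 2, a placeholder for '+', and 3, then calling reduce_add, leaves the single value 5 on the stack.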

The User Subroutine Section

Simply put any other routines you want here, including routines that call functions generated by Bison or Flex, such as a main that calls yyparse. See below.

Redirecting Input

Input for the scanner is read from the variable yyin, which is declared as
extern FILE *yyin;
By setting this variable to a file pointer that you have opened, you can set the source of input. However, it is tricky to change the source of input once you have started reading by calling yyparse(). This is because the scanner buffers its input. So to do an include you have to swap the scanner's input buffers (Flex provides yy_create_buffer and yy_switch_to_buffer for this). If you ever want to do that, check the web for this topic.
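
Here is a minimal C sketch of pointing the scanner at a file before parsing. In a real build yyin comes from the Flex output and yyparse from the Bison output; here both are replaced by stand-ins so the sketch compiles by itself, and parse_file is a hypothetical helper name:

```c
#include <stdio.h>

/* Stand-ins so this sketch compiles alone. In a real program, delete
   these two definitions and declare instead:
       extern FILE *yyin;
       int yyparse(void);                                            */
FILE *yyin;
int yyparse(void)
{
    int c;
    while ((c = getc(yyin)) != EOF) {
        /* a real parser would be calling yylex() here */
    }
    return 0;
}

/* Parse the named file. Returns the result of yyparse(), or -1 if the
   file cannot be opened. yyin must be set before yyparse() is called. */
int parse_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    int status;
    if (fp == NULL) {
        perror(path);
        return -1;
    }
    yyin = fp;
    status = yyparse();
    fclose(fp);
    return status;
}
```

A main would typically call parse_file(argv[1]) when a file name is given and leave yyin alone otherwise, since the scanner reads standard input by default.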

Example

Here is a simple numeric calculator program that uses Flex to build a scanner and Bison to process the syntax. The calculations happen as a by-product of the syntax analysis. For more complex programs this is generally not possible, and the actions typically build a data structure such as a syntax tree to be processed later.

The bison source code is calc.y
The flex source code is calc.l
A makefile that you can use is called makefile. You can invoke it with make.

Here is a Bourne shell script for our Sun machines that compiles a program using both Bison and Flex, in either C or C++:

#!/bin/sh -x
bison -v -t -d $1.y                # create $1.tab.c and $1.tab.h
flex $1.l                          # create lex.yy.c
# gcc -g lex.yy.c $1.tab.c -lfl -lm -o $1  # create calc using C
g++ -DCPLUSPLUS -g $1.tab.c lex.yy.c -lfl -lm -o $1  # create calc using C++
For Bison, the -v option creates a .output file that contains a verbose description of the generated parser tables, including states and conflicts. This is extremely useful in debugging shift/reduce and reduce/reduce errors. The -t option compiles in the debugging features, so that if the variable yydebug is set to 1, debugging information is printed showing every step of the parse. In order to use the yydebug variable you need to declare it with extern int yydebug; in your Bison file. The -d option creates the mandatory .h file of token definitions that is used by the Flex file.

For Flex, the -d option (not used in the script above) enables the debugging capability of the scanner. If the debugging flag, the variable yy_flex_debug, is turned on in the C code, the scanner prints debugging output as it runs.

You can choose either the gcc compiler or the g++ compiler. If you use g++ you need to define the macro CPLUSPLUS, as in the script above. In the example source, defining this macro declares yylex and includes the string.h file for the benefit of C++. The macro can be defined on the command line as above, or the declarations can be made by hand.

Note the inclusion of the Flex library with -lfl and the optional math library with -lm. If this script is called dobison, then for files calc.y and calc.l you would run dobison calc and the executable file calc would be created.

What are those shift/reduce and reduce/reduce errors?

Bison generates what is known as a shift/reduce parser. While analyzing the input, the parser can either reduce the symbols on top of its stack using some production, or shift the next token onto the stack and continue looking. When Bison can't tell from the grammar whether to shift or reduce, you get a shift/reduce error (Bison calls it a conflict) and it makes a choice for you: by default it shifts. This generally happens as a result of an ambiguity in your grammar. If you get a reduce/reduce error, Bison can't tell which of two productions you meant. It picks the production that appears first in the grammar and moves on.

If you get these errors, look to resolve the ambiguities in your grammar.
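
To make the shift and reduce moves concrete, here is a hand-written C sketch of a shift/reduce loop for the grammar expression: expression '+' term | term, with term simplified to a single digit. It is not how Bison works internally (Bison drives its moves from generated tables and a lookahead token), and all names are made up for illustration:

```c
#include <stddef.h>

#define STACK_MAX 64

enum Sym { SYM_DIGIT, SYM_PLUS, SYM_TERM, SYM_EXPR };

typedef struct { enum Sym sym; double val; } Entry;

/* Try one reduction on top of the stack; return 1 if one was applied. */
static int reduce_once(Entry *st, size_t *n)
{
    if (*n >= 1 && st[*n - 1].sym == SYM_DIGIT) {
        st[*n - 1].sym = SYM_TERM;            /* term: DIGIT */
        return 1;
    }
    if (*n >= 3 && st[*n - 3].sym == SYM_EXPR &&
        st[*n - 2].sym == SYM_PLUS && st[*n - 1].sym == SYM_TERM) {
        st[*n - 3].val += st[*n - 1].val;     /* $$ = $1 + $3 */
        *n -= 2;                              /* three symbols become one */
        return 1;
    }
    if (*n >= 1 && st[*n - 1].sym == SYM_TERM) {
        st[*n - 1].sym = SYM_EXPR;            /* expression: term */
        return 1;
    }
    return 0;
}

/* Parse a string such as "1+2+3". Returns 0 and stores the sum in
   *result on success, or -1 on a syntax error. */
int parse_sum(const char *s, double *result)
{
    Entry st[STACK_MAX];
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (n >= STACK_MAX) return -1;
        /* shift: push the next token onto the stack */
        if (*s >= '0' && *s <= '9') {
            st[n].sym = SYM_DIGIT;
            st[n].val = *s - '0';
        } else if (*s == '+') {
            st[n].sym = SYM_PLUS;
            st[n].val = 0;
        } else {
            return -1;
        }
        n++;
        /* reduce: apply productions until none matches the stack top */
        while (reduce_once(st, &n)) {
        }
    }
    if (n == 1 && st[0].sym == SYM_EXPR) {
        *result = st[0].val;
        return 0;
    }
    return -1;
}
```

Here the order of the tests in reduce_once settles which production wins when more than one could apply. In a Bison grammar that decision has to follow from the grammar itself, which is why an ambiguous grammar produces the errors described above.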

A makefile for Bison/Flex

In the following makefile, remember that each command line (the indented line under a target) must begin with a single tab and not a run of blanks. That keeps the file portable to all versions of make.
# define the name of the thing to be built:
BIN  = cb1
CC   = g++
# CFLAGS = -g 
# use the following with C++ if file ext is .cc 
# CCFLAGS = -DCPLUSPLUS -g  
# use the following with C++ if file ext is .c
CFLAGS = -DCPLUSPLUS -g  

SRCS = $(BIN).y $(BIN).l
OBJS = lex.yy.o $(BIN).tab.o
LIBS = -lfl -lm 

$(BIN): $(OBJS)
	$(CC) $(CCFLAGS) $(OBJS) $(LIBS) -o $(BIN)

$(BIN).tab.h $(BIN).tab.c: $(BIN).y
	bison -v -t -d $(BIN).y

lex.yy.c: $(BIN).l $(BIN).tab.h
	flex $(BIN).l
 
all:
	touch $(SRCS)
	$(MAKE)

clean:
	rm -f $(OBJS) $(BIN) lex.yy.c $(BIN).tab.h $(BIN).tab.c $(BIN).tar

tar:
	tar -cvf $(BIN).tar $(SRCS) makefile
 

Further Reading

The Bison Manual
Robert Heckendorn. Last updated: Sep 4, 2007 15:13