The Problem

This homework consists of these tasks:

  • Tweak-in some improvements to the interface.
  • Use Flex and Bison to build a parser for the C- language.
  • While doing the parsing you will construct the syntax tree. This is not the annotated syntax tree but just the initial abstract syntax tree.
  • You will write the procedure printTree which will print out the tree exactly as I have in the examples I give below and in the test data. Almost the same format is not good enough. The more I have to work to show your output is the same the fewer points you get. Your main program will contain:

    yyparse();         // parse and build the tree in the global var syntaxTree.
    if (printTreeFlag) printTree(stdout, syntaxTree);

    The first line calls the parser which will store the tree in the global called syntaxTree which is defined:

Do not be fooled. This is a nontrivial homework. Do not put off this assignment. It is complicated and detailed.

These tasks are described further below.

Improving the interface

When done with this assignment you will have created code that will recognize legal C- programs and generate the first pass at the abstract syntax tree.

The parser will be named c- just like last time. It will read and process a stream of tokens from a filename given as an argument to the c- command OR from standard input if the filename argument is not present.

I recommend using the getopt routine to handle UNIX command line arguments in a uniform and standard way and make your life easier.

Be sure to use ourgetopt. It is more resilient than some. And be sure to include the getopt code in your tar!

Your program will now also take (using ourgetopt) the -d option as an argument. The -d option turns on the yydebug flag by setting it to 1. For example: c- -d sort.c- should run the c- compiler on the program sort.c- and give details of the parsing that is going on. While c- sort.c- should simply run the c- compiler.

The abstract syntax tree will be printed if the -p is given. So c- -p sort.c- will print the abstract syntax tree (AST) for the assignment and c- sort.c- will not print the AST.

The Parser

For the parsing part of the assignment replace your assignment 1 Bison grammar with the grammar to parse C- code. A good approach is to initially forget about the syntax tree part of the assignment. If you get the right grammar into your compiler it will successfully parse any C- program. A program that simply recognizes whether a program is legal or not is called a recognizer. If you build your Bison grammar directly from the grammar supplied, you will find that you have the dangling else problem. There are several ways to fix this problem. I have discussed one in class. See the handout on the dangling else problem.

To be successful, you need to get a clean compile of your grammar in Bison with no shift-reduce or reduce-reduce errors. Those indicate that your grammar is ambiguous. Bison will make a decision for you and go ahead and generate code making a selection of its choice e.g. reduce before shift. But you must decide for yourself and fix the grammar to be unambiguous.

Coding restriction: Do not attempt to fix dangling else or any other grammar issue with associativity declarations such as %left, %right, or %assoc. Do not fix any other problem with your grammar by using the %expect feature of Bison. This causes Bison to ignore some number of parsing errors and me to deduct points from your assignment. Really, you have the knowledge now to do this with out this "feature". IMPORTANT: I expect your parser to compile without any parser errors.

Now that your recognizer is working, let's look at the syntax tree I want you to produce. As we will discuss in class, the tree is an abbreviated portion of the parse tree containing the parts we are interested in. Here is a sample TreeNode that I used based on the book. You can use what you want. I will be giving examples in class based on the one in the book.:

typedef struct treeNode
    // connectivity in the tree
    struct treeNode *child[MAXCHILDREN];   // children of the node
    struct treeNode *sibling;              // siblings for the node

    // what kind of node
    int lineno;                            // linenum relevant to this node
    NodeKind nodekind;                     // type of node
    union                                  // subtype of type
	DeclKind decl;                     // used when DeclK
	StmtKind stmt;                     // used when StmtK
	ExpKind exp;                       // used when ExpK
    } kind;
    // extra properties about the node depending on type of the node
    union                                  // relevant data to type -> attr
        OpKind op;                         // type of token (same as in bison)
	int value;                         // used when an integer constant or boolean
        unsigned char cvalue               // used when a character
	char *string;                      // used when a string constant
	char *name;                        // used when IdK
    } attr;                                 
    ExpType expType;		           // used when ExpK for type checking
    bool isArray;                          // is this an array
    bool isStatic;                         // is staticly allocated?

    // even more semantic stuff will go here in later assignments.
} TreeNode;

This design is stolen straight from the book. This way you can use the one in the book as an example to work from. Ours has to have extra features and node types. We will discuss this in detail in class. Feel free to use a class rather than a struct in this case if you like. Or feel free to use the modern struct rather than using typedefs. You can actually use any data structure you like as long as the output matches mine and it builds a tree of the same shape as mine.

To encode the program as a tree you need to make the right nodes at the right steps in the parsing. When you need to make a node you will use routines you write similar to the newStmtNode function in util.c for the Tiny language in the book. These nodes will be passed up the tree as pointers and assembled as in the tiny example in the book. This means that many of the nonterminals will be of type pointer to a node. Coding restriction: Do not use YYSTYPE as used in the book. This subverts the type checking features that are there to help you. I will take off for using it. USE UNION. Here is an example of what could be in your union. It is up to you what you want to put in your union. This is influenced by the types you want for your nonterminals. Mine is a bit simpler for example.:

%union { 
    ExpType type;
    TokenData *tokenData; 
    TreeNode *tree;

The Parser and printTree

So what should the tree look like for a given program? This is essentially described the the Bison code. I think the best way to describe this is in class and by example.

To understand the examples you must understand the output format of the printTree function. The printTree function prints the the important information contained in the node pointed to by the second argument. It then applies the printTree function to all the nonnull children and prints them out numbered starting with child[0] being numbered "Child 0" etc. It then recursively follows the sibling pointer if it is nonnull and applies the printTree function to that. The first sibling found is numbered 0. Reading the syntax tree printed for sample input programs shows you what to do in each case. For example, given this program:

bool a;
int b[100];

int max(int x, y)
    int z, zz;
    char c;

    if x>y then z=x;
    else z=y;

    while 666>665 do break;
    b[41] = max(42, 43) + 44*55 + b[99];


you should get the following output from your c-.

Var a of type bool [line: 1]
Sibling: 0  Var b is array of type int [line: 2]
Sibling: 1  Func max returns type int [line: 4]
!   Child: 0  Param x of type int [line: 4]
!   Sibling: 0  Param y of type int [line: 4]
!   Child: 1  Compound [line: 5]
!   !   Child: 0  Var z of type int [line: 6]
!   !   Sibling: 0  Var zz of type int [line: 6]
!   !   Sibling: 1  Var c of type char [line: 7]
!   !   Child: 1  If [line: 9]
!   !   !   Child: 0  Op: > [line: 9]
!   !   !   !   Child: 0  Id: x [line: 9]
!   !   !   !   Child: 1  Id: y [line: 9]
!   !   !   Child: 1  Assign: = [line: 9]
!   !   !   !   Child: 0  Id: z [line: 9]
!   !   !   !   Child: 1  Id: x [line: 9]
!   !   !   Child: 2  Assign: = [line: 10]
!   !   !   !   Child: 0  Id: z [line: 10]
!   !   !   !   Child: 1  Id: y [line: 10]
!   !   Sibling: 0  While [line: 12]
!   !   !   Child: 0  Op: > [line: 12]
!   !   !   !   Child: 0  Const: 666 [line: 12]
!   !   !   !   Child: 1  Const: 665 [line: 12]
!   !   !   Child: 1  Break [line: 12]
!   !   Sibling: 1  Assign: = [line: 14]
!   !   !   Child: 0  Op: [ [line: 14]
!   !   !   !   Child: 0  Id: b [line: 14]
!   !   !   !   Child: 1  Const: 41 [line: 14]
!   !   !   Child: 1  Op: + [line: 14]
!   !   !   !   Child: 0  Op: + [line: 14]
!   !   !   !   !   Child: 0  Call: max [line: 14]
!   !   !   !   !   !   Child: 0  Const: 42 [line: 14]
!   !   !   !   !   !   Sibling: 0  Const: 43 [line: 14]
!   !   !   !   !   Child: 1  Op: * [line: 14]
!   !   !   !   !   !   Child: 0  Const: 44 [line: 14]
!   !   !   !   !   !   Child: 1  Const: 55 [line: 14]
!   !   !   !   Child: 1  Op: [ [line: 14]
!   !   !   !   !   Child: 0  Id: b [line: 14]
!   !   !   !   !   Child: 1  Const: 99 [line: 14]
!   !   Sibling: 2  Return [line: 16]

The tree as a graphic:

In the cases where there is an absent optional expression or statement the corresponding child pointer is set to NULL (i.e. 0). For example compound statements might not have any declarations so child[0] pointer would be set to NULL. Return optionally takes an expression. If there isn't an expression then the Child[0] pointer is NULL. The while statement might not have a body: for example while (searching()); in which case child[1] is NULL. The default for unneeded children and siblings is always the NULL pointer. This means your code has to check for NULL pointers and alter its behavior.

HINT: The yacc code in the book is a good example of how to connect the nodes you create. The node create code is a good model for how to create nodes and print a tree. Use your notes from class on how to put the rest of it together. The calculator example we did in class works great for a example of how to organize the interaction between flex and bison parts.

Sizes of arrays, locations in memory, etc will all be handled in latter assignments.

I will only give you legal C- code in this assignment.


Homework will be submitted as an uncompressed tar file that contains no subdirectories. The tar file is submitted to the class submission page. You can submit as many times as you like. The LAST file you submit BEFORE the deadline will be the one graded. Absolutely, no late papers. For all submissions you will receive email at your uidaho address showing how your file performed on the pre-grade tests. The grading program will use more extensive tests, so thoroughly test your program with inputs of your own. Your code should compile and run with runtime errors such as seg faults. If it doesn't, it is considered nearly ungradable.