The Problem

This homework consists of these tasks:

  • Tweak-in some improvements to the interface.
  • Use Flex and Bison to build a parser for the C- language.
  • While doing the parsing your program will construct the abstract syntax tree (AST) using a TreeNode data type. This is not the full annotated syntax tree yet, but just the initial abstract syntax tree.
  • You will write the procedure printTree which will print out the tree exactly as I have in the examples I give below and in the test data. Almost the same format is not good enough. The more I have to work to show your output is the same the fewer points you get. You need to fill the tree nodes with enough information to get the tree printed correctly.

    Your main program in your .y file will contain:

    yyparse();         // parse and build the tree in the global var syntaxTree.
    if (printTreeFlag) printTree(stdout, syntaxTree);  // printTreeFlag is set by -p option
    

    The first line calls the parser which will store the tree in the global called syntaxTree which is defined:

    TreeNode *syntaxTree;
    
Do not be fooled. This is a nontrivial homework. Do not put off this assignment. It is complicated and detailed.

Improving the interface

When done with this assignment you will have created code that will recognize legal C- programs and generate the first pass at the abstract syntax tree.

The parser will be named c- just like last time. It will read and process a stream of tokens from a filename given as an argument to the c- command OR from standard input if the filename argument is not present.

The abstract syntax tree will be printed if the -p is given. So, for example, c- -p sort.c- will print the abstract syntax tree (AST) for the assignment and c- sort.c- will not print the AST. The -p option which will set the printTreeFlag mentioned above. This is the first of several options we will be adding to the compiler interface.

I recommend using the getopt routine to handle UNIX command line arguments in a uniform and standard way and make your life easier.

Be sure to use ourgetopt. It is more resilient than some. And be sure to include the getopt code in your tar! Hint: you can raid the example for the code you need.

Your program will now also take (using ourgetopt) the -d option as an argument. The -d option turns on the yydebug flag by setting it to 1. For example: c- -d sort.c- should run the c- compiler on the program sort.c- and give details of the parsing that is going on. While c- sort.c- should simply run the c- compiler. Hint: be sure to declare yydebug to be "extern int".

The Parser

For the parsing part of the assignment replace your assignment 1 Bison grammar with the grammar to parse C- code. A good approach is to initially forget about the syntax tree part of the assignment. If you get the right grammar into your compiler it will successfully parse any legal C- program. A program that simply recognizes whether a program is legal or not is called a recognizer. If you build your Bison grammar directly from the grammar supplied, you will find that you have the dangling else problem as indicated by a shift/reduce error when you do the bison compile. There are several ways to fix this problem. I have discussed one in class. See the section on dangling else in the grammar tutorial or in the handout on the dangling else problem.

To be successful, you need to get a clean compile of your grammar in Bison with no shift-reduce or reduce-reduce errors!! The errors indicate that your grammar is ambiguous. Bison will make a decision for you and go ahead and generate code making the selection of its choice e.g. reduce before shift. But you must decide for yourself and fix the grammar to be unambiguous.

Coding restriction: Do not attempt to fix dangling else or any other grammar issue with associativity declarations such as %left, %right, or %assoc. Do not fix any other problem with your grammar by using the %expect feature of Bison. This causes Bison to ignore some number of parsing errors and me to deduct points from your assignment. Really, you have the knowledge now to do this with out this "feature". IMPORTANT: I expect your parser to compile without any parser errors.

Now that your recognizer is working, let's look at the syntax tree I want you to produce. As we discussed in class, the tree is kind of like a distilled portion of the parse tree containing the parts we are interested in. Here is a sample TreeNode that I used based on the book. You can use what you want. I will be giving examples in class based on the one in the book.:

typedef struct treeNode
{
    // connectivity in the tree
    struct treeNode *child[MAXCHILDREN];   // children of the node
    struct treeNode *sibling;              // siblings for the node

    // what kind of node
    int lineno;                            // linenum relevant to this node
    NodeKind nodekind;                     // type of this node
    union                                  // subtype of type
    {
	DeclKind decl;                     // used when DeclK
	StmtKind stmt;                     // used when StmtK
	ExpKind exp;                       // used when ExpK
    } subkind;
    
    // extra properties about the node depending on type of the node
    union                                  // relevant data to type -> attr
    {
        OpKind op;                         // type of token (same as in bison)
	int value;                         // used when an integer constant or boolean
        unsigned char cvalue               // used when a character
	char *string;                      // used when a string constant
	char *name;                        // used when IdK
    } attr;                                 
    ExpType expType;		           // used when ExpK for type checking
    bool isArray;                          // is this an array
    bool isStatic;                         // is staticly allocated?

    // even more semantic stuff will go here in later assignments.
} TreeNode;

These are special types used in the tree node above. The nodes in the AST are each of a NodeKind and a subkind like in the book. Your tree traversal will use this information.

// the exact type of the token or node involved.  These are divided into
// various "kinds" in the enums that follow

// Kinds of Operators
// these are the token numbers for the operators same as in flex
typedef int OpKind;  

// Kinds of Statements
//typedef enum {DeclK, StmtK, ExpK} NodeKind;
enum NodeKind {DeclK, StmtK, ExpK};

// Subkinds of Declarations
enum DeclKind {VarK, FuncK, ParamK};

// Subkinds of Statements
enum  StmtKind {NullK, IfK, WhileK, ForK, CompoundK, ReturnK, BreakK, RangeK};

// Subkinds of Expressions
enum ExpKind {OpK, ConstantK, IdK, AssignK, InitK, CallK};

// ExpType is used for type checking (Void means no type or value, UndefinedType means undefined)
enum ExpType {Void, Integer, Boolean, Char, CharInt, Equal, UndefinedType};

// What kind of scoping is used?  (decided during typing)
enum VarKind {None, Local, Global, Parameter, LocalStatic};

This design is stolen straight from the book. This way you can use the one in the Tiny compiler in the book as an example to work from. Ours has to have extra features and node types. We will discuss this in detail in class. Feel free to use a C++ class rather than a struct in this case if you like. Or feel free to use the modern struct rather than using typedefs. I do. You can actually use any data structure you like as long as the output matches mine as long as it your own design and it builds a tree of the same shape as mine since success will be measured by creating the same tree. You will need to traverse the tree to do all the remaining assignments.

To encode the program as a tree you need to make the right nodes at the right steps in the parsing. When you need to make a node you will use routines you write similar to the newStmtNode function in util.c for the Tiny language in the book. These nodes will be passed up the tree as pointers and assembled to form a tree as in the tiny example in the book. The code I will suggest below is a bit simpler to use than the book and easier to read. This means that many of the nonterminals will be of type pointer to a node! Coding restriction: Do not use YYSTYPE as used in the book. This subverts the type checking features that are there to help you. I will take off for using it. USE THE UNION Luke! Here is an example of what could be in your union. It is up to you what you want to put in your union. This is influenced by the types you want for your nonterminals. Mine is a bit simpler for example.:

%union { 
    ExpType type;            // for passing types.  typespec to pass up a type in decl like int or bool
    TokenData *tokenData;    // for terminals.  token data comes from yylex() in the $ variables
    TreeNode *tree;          // for nonterminals.   these tree nodes as you build the tree
}

These are prototypes for constructing the different kinds of treenode. By different kinds I mean different sets of default values in the nodes for convenience. Note that this is how children are added to a node in almost all cases (see the oddity of the initialization of a var). Note the use of default values for the children make this easy and the action code in the Bison simple.

TreeNode *newDeclNode(DeclKind kind,
                      ExpType type,
                      TokenData *token=NULL,
                      TreeNode *c0=NULL,
                      TreeNode *c1=NULL,
                      TreeNode *c2=NULL);  // save TokenData block!!
TreeNode *newStmtNode(StmtKind kind,
                      TokenData *token,
                      TreeNode *c0=NULL,
                      TreeNode *c1=NULL,
                      TreeNode *c2=NULL);
TreeNode *newExpNode(ExpKind kind,
                     TokenData *token,
                     TreeNode *c0=NULL,
                     TreeNode *c1=NULL,
                     TreeNode *c2=NULL);

The Parser and printTree

So what should the tree look like for a given program? This is essentially described the the Bison code. I think the best way to describe this is in class and by example.

To understand the examples you must understand the output format of the printTree function. The printTree function prints the the important information contained in the node. It then applies the printTree function to all the nonnull children and prints them out numbered starting with child[0] being numbered "Child 0" etc. It then it applies printTree to the sibling pointer if it is nonnull. The first sibling found is numbered 1. Reading the syntax tree printed for sample input programs shows you what to do in each case. I will supply you with a crazy number of examples. For example, given this program:

 
// C-S21
int main()
{
    int x;
    int y[5];
    bool b;

    b = true or false and 11 + 22 * 33 / 44 > 55;

    if 666>777 then {
        x = 111;
    }
    else y[3] = x+222;

    while x>999 do {
        x = 333;
        if x<9 then break;
        x = 444;
        break;
   }

   for z = 1 to 8 do if z>x then x = z;
}

you should get the following output from your c-. This is the output I will compare against in the tests!

Func: main returns type int [line: 2]
.   Child: 1  Compound [line: 3]
.   .   Child: 0  Var: x of type int [line: 4]
.   .   Sibling: 1  Var: y is array of type int [line: 5]
.   .   Sibling: 2  Var: b of type bool [line: 6]
.   .   Child: 1  Assign: = [line: 8]
.   .   .   Child: 0  Id: b [line: 8]
.   .   .   Child: 1  Op: OR [line: 8]
.   .   .   .   Child: 0  Const of type bool: true [line: 8]
.   .   .   .   Child: 1  Op: AND [line: 8]
.   .   .   .   .   Child: 0  Const of type bool: false [line: 8]
.   .   .   .   .   Child: 1  Op: > [line: 8]
.   .   .   .   .   .   Child: 0  Op: + [line: 8]
.   .   .   .   .   .   .   Child: 0  Const of type int: 11 [line: 8]
.   .   .   .   .   .   .   Child: 1  Op: / [line: 8]
.   .   .   .   .   .   .   .   Child: 0  Op: * [line: 8]
.   .   .   .   .   .   .   .   .   Child: 0  Const of type int: 22 [line: 8]
.   .   .   .   .   .   .   .   .   Child: 1  Const of type int: 33 [line: 8]
.   .   .   .   .   .   .   .   Child: 1  Const of type int: 44 [line: 8]
.   .   .   .   .   .   Child: 1  Const of type int: 55 [line: 8]
.   .   Sibling: 1  If [line: 10]
.   .   .   Child: 0  Op: > [line: 10]
.   .   .   .   Child: 0  Const of type int: 666 [line: 10]
.   .   .   .   Child: 1  Const of type int: 777 [line: 10]
.   .   .   Child: 1  Compound [line: 10]
.   .   .   .   Child: 1  Assign: = [line: 11]
.   .   .   .   .   Child: 0  Id: x [line: 11]
.   .   .   .   .   Child: 1  Const of type int: 111 [line: 11]
.   .   .   Child: 2  Assign: = [line: 13]
.   .   .   .   Child: 0  Op: [ [line: 13]
.   .   .   .   .   Child: 0  Id: y [line: 13]
.   .   .   .   .   Child: 1  Const of type int: 3 [line: 13]
.   .   .   .   Child: 1  Op: + [line: 13]
.   .   .   .   .   Child: 0  Id: x [line: 13]
.   .   .   .   .   Child: 1  Const of type int: 222 [line: 13]
.   .   Sibling: 2  While [line: 15]
.   .   .   Child: 0  Op: > [line: 15]
.   .   .   .   Child: 0  Id: x [line: 15]
.   .   .   .   Child: 1  Const of type int: 999 [line: 15]
.   .   .   Child: 1  Compound [line: 15]
.   .   .   .   Child: 1  Assign: = [line: 16]
.   .   .   .   .   Child: 0  Id: x [line: 16]
.   .   .   .   .   Child: 1  Const of type int: 333 [line: 16]
.   .   .   .   Sibling: 1  If [line: 17]
.   .   .   .   .   Child: 0  Op: < [line: 17]
.   .   .   .   .   .   Child: 0  Id: x [line: 17]
.   .   .   .   .   .   Child: 1  Const of type int: 9 [line: 17]
.   .   .   .   .   Child: 1  Break [line: 17]
.   .   .   .   Sibling: 2  Assign: = [line: 18]
.   .   .   .   .   Child: 0  Id: x [line: 18]
.   .   .   .   .   Child: 1  Const of type int: 444 [line: 18]
.   .   .   .   Sibling: 3  Break [line: 19]
.   .   Sibling: 3  For [line: 22]
.   .   .   Child: 0  Var: z of type int [line: 22]
.   .   .   Child: 1  Range [line: 22]
.   .   .   .   Child: 0  Const of type int: 1 [line: 22]
.   .   .   .   Child: 1  Const of type int: 8 [line: 22]
.   .   .   Child: 2  If [line: 22]
.   .   .   .   Child: 0  Op: > [line: 22]
.   .   .   .   .   Child: 0  Id: z [line: 22]
.   .   .   .   .   Child: 1  Id: x [line: 22]
.   .   .   .   Child: 1  Assign: = [line: 22]
.   .   .   .   .   Child: 0  Id: x [line: 22]
.   .   .   .   .   Child: 1  Id: z [line: 22]

In the cases where there is an absent optional expression or statement the corresponding child pointer is set to NULL. For example compound statements might not have any declarations so child[0] pointer would be set to NULL. Return optionally takes an expression. If there isn't an expression then the Child[0] pointer is NULL. The while statement might not have a body: for example while (searching()); in which case child[1] is NULL. The default for unneeded children and siblings is always the NULL pointer. This means your code has to check for NULL pointers and alter its behavior. Use defensive programming and you will save time! Never use a pointer without checking it is non-null. (You can use safe pointer objects if you like. Maybe overkill?)

HINT: The yacc code in the book is a good example of how to connect the nodes you create. The node create code is a good model for how to create nodes and print a tree. Use your notes from class on how to put the rest of it together. The calculator example we did in class works great for a example of how to organize the interaction between flex and bison parts.

IMPORTANT: For this assignment, I will only give you legal C- code. Error handling is coming later.

Here are two useful pieces of code for connecting siblings and passing type information down a sibling list. This also shows the use of some of the types.

// add a TreeNode to a list of siblings.
// Adding a NULL to the list is probably a programming error!
TreeNode *addSibling(TreeNode *t, TreeNode *s)
{
    if (s==NULL) {
        printf("ERROR(SYSTEM): never add a NULL to a sibling list.\n");
        exit(1);
    }
    if (t!=NULL) { 
        TreeNode *tmp;

        tmp = t;
        while (tmp->sibling!=NULL) tmp = tmp->sibling;
        tmp->sibling = s; 
        return t;
    }
    return s;
}


// pass the static and type attribute down the sibling list
void setType(TreeNode *t, ExpType type, bool isStatic)
{
    while (t) {
        t->expType = type;
        t->isStatic = isStatic;

        t = t->sibling;
    }
}

Some small miscellany

A couple of little extra things to be done for this assignment.
  1. Now that we have a parser running we can tell the difference between a unary minus and a binary minus. In the parser rename the unary minus to a CHSIGN token and rename the unary asterisk to the SIZEOF token.

  2. I have simplified the C- grammar in one small place. I removed the case where iterRange is allowed to be simply a simpleExp. See the new grammar sheet. This solves some annoying things and will save us time in the long run.

Submission

Homework will be submitted as an uncompressed tar file that contains no subdirectories. The tar file is submitted to the class submission page. You can submit as many times as you like. The LAST file you submit BEFORE the deadline will be the one graded. Absolutely, no late papers. For all submissions you will receive email at your uidaho address showing how your file performed on the pre-grade tests. The grading program will use more extensive tests, so thoroughly test your program with inputs of your own. Your code should compile and run with runtime errors such as seg faults. If it doesn't, it is considered nearly ungradable.