Note that this assignment needs to be done quickly so we can work on the final code gen phase assignments! I highly recommend you start work right away.

In this assignment you will compute the scope of each variable (varKind) and what the address and size is for each variable and string constant. This assignment probably requires about 30 to 50 lines of carefully thought out code. Do not try to do it at the last minute because it is way too tricky for that.

You will need to make modifications to the semantic analysis section and the tree print routine, and maybe the parser depending on what you have already done. These changes will get your code ready for code generation. The text below combined with what you learned in class (see the battle plan in the whiteboards or all the details below) will tell you what you will need to do to get this to work. To do this you will need to start by allowing for your tree node definition to have components to save the size, type of reference, and offset location. You may already have done this. You will hijack the tree traversal in your semantic analysis section to let you use the SymbolTable and scope following you are doing to help decide where each variable and string constant belongs. See the example output to see what needs to have values stored. You will need two global values in your compiler code: a local frame offset called foffset, and a global frame offset called goffset. These need to be decremented appropriately. foffset will the offset from Register 1 for location of the next variable in the frame of the current function (a parameter or local variable). goffset is similarly designed for the global offset for globally referenced items (a global or static variable or an array constant like a string).

The results will be that the -M option on your compiler will now print the location and scope type for each variable and some extra size information at compound nodes. Example below.

The Parser

If you haven't already, you need to record the size of every declared variable at parse time to be very convenient. For arrays use the array size + 1 as the size since the size is stored with the array in our implementation except when it is a parameter. For all other items use a size of 1. Also, you need to be sure the variable declaration nodes are marked as being static or not.

The Semantic Analysis Changes

  1. You will need to augment the TreeNode struct or class to handle the offset in local variable space or global space for variable nodes and the size of the thing to be allocated. Also you need to be sure your are recording whether it is a local, local static, parameter, or global. How to do this was discussed in class. (Feel free to ask if you want me to go over any or all of it again.)

    If you remember, you will maintain pointers for the place in global memory and the place in local (frame) memory in two offset variables in the semantic section. You will decrement (the stack and frame grow down to lower memory addresses) them as you allocate space for each variable either in the local frame or in the global space. You will reset the local memory pointer foffset when you enter each function. Don't forget to leave room for the return frame pointer and return address at the beginning of each activation record (frame). The result will be that every declaration that requires space will store the offset of that space and the size in the TreeNode associated with allocating the space. e.g. the VarK node. A "reference type" variable in your tree node will help when deciding how to allocate space and how to reference it at code generation time. So the reference type in our node will have one of these values: Local, Static, Parameter, Global. This, will of course, be used to tell you what offset register to use: 0 or 1 at code generation time as explained when we discussed the C- memory architecture. Remember the address of an array points to the first element in the array. "Before" the first element (next higher addressed spot) is the size of the array. So for an array: int x[10] the address of the array is location of x[0]. The size of the array is effectively stored at x[-1] and the total size of the space allocated is 11. Indexing will be done at run time by subtracting the index from the address of x[0].

    Constants don't normally require that space be allocated in data memory for them. They are often just loaded from instruction memory as in the LDC instruction. However, in the case of constant arrays, which currently we have only defined constant char arrays, you will need to allocate space. This is done by allocating them as arrays in the global space. They are simply global char arrays and so their address is the address of the first character with the size stored in the location before the first element.

    The offset and size that is used for variables goes unused for all the rest of the nodes accept for maybe the function declaration and perhaps, in one implementation, the compound nodes in the final assignment. The offset location in function declaration nodes will be used later for storing the starting instruction address of the function so that it can be looked up in the symbol table and we can jump to the various functions. It can be initialized to zero. The function symbol is, of course, global.

  2. The symbol table is already used to point to the TreeNodes so you can now find the offset, size, and reference type of every reference to a variable (ID node) when you arrive at them and look them up in the symbol table. You can now copy that information for use in those referencing nodes.

  3. When you leave the semantic analysis section the symbol table will have been reduced back down to just the global scope. If you freed the symbol table before... don't now... you should pass symbol table of globals along with the end of the global space pointer (goffset) to the code generation section as described below! This will be used for doing such things as initializing global and static variables and setting up the frame stack.

    It is assumed that the foffset pointer will be saved when entering a new scope and restored on exit so the memory can be reclaimed as explained in class. This was called overlaying memory or parallel compound statements. Be sure your code does this. You might start by not doing it and then later add the few lines that makes that work. WARNING: If you don't get this right a lot of your memory locations will be wrong.

The code will be tested by comparing the results of doing c- -M which will print an augmented tree as seen below:

  1 int g;                // loc G 0
  2 char h[10]:"dogs";    // loc G -2
  3 
  4 beachDay(int p, q[])  // loc L -2, -3
  5 {
  6     int a:17;         // loc L -4
  7     bool b[10];       // loc L -6
  8     static int s;     // loc G -12
  9     static int t[10]; // loc G -14
 10     static char u[10]:"corgi";  // loc G -25
 11     int z;            // loc L -16
 12     a;
 13     b;
 14     s;
 15     t;
 16     u;
 17     666;
 18     'x';
 19     "purple";
 20     true;
 21 }
 22 
 23 main() {
 24    int a;           // loc L -2
 25    {
 26    int b, c, d;     // loc L -3, -4, -5
 27    }
 28    {
 29    int e, f;        // loc L -3, -4
 30    }
 31    for a = 1 to 10 do a;   // loc L -3
 32    
 33 }
 34 // next free space in globals is -35

The output of c- -M loctest2.c-

WARNING(13): Variable 'b' may be uninitialized when used here.
WARNING(4): The parameter 'p' seems not to be used.
WARNING(4): The parameter 'q' seems not to be used.
WARNING(11): The variable 'z' seems not to be used.
WARNING(26): The variable 'b' seems not to be used.
WARNING(26): The variable 'c' seems not to be used.
WARNING(26): The variable 'd' seems not to be used.
WARNING(29): The variable 'e' seems not to be used.
WARNING(29): The variable 'f' seems not to be used.
WARNING(24): The variable 'a' seems not to be used.
WARNING(4): The function 'beachDay' seems not to be used.
WARNING(1): The variable 'g' seems not to be used.
WARNING(2): The variable 'h' seems not to be used.
Var: g  type int [mem: Global loc: 0 size: 1] [line: 1]
Sibling: 1  Var: h  array of type char [mem: Global loc: -7 size: 11] [line: 2]
.   Child: 0  Const "dogs" [mem: Global loc: -2 size: 5] of array of type char [line: 2]
Sibling: 2  Func: beachDay returns type void [mem: Global loc: 0 size: -4] [line: 4]
.   Child: 0  Parm: p  type int [mem: Parameter loc: -2 size: 1] [line: 4]
.   Sibling: 1  Parm: q  array of type int [mem: Parameter loc: -3 size: 1] [line: 4]
.   Child: 1  Compound [mem: None loc: 0 size: -17] [line: 5]
.   .   Child: 0  Var: a  type int [mem: Local loc: -4 size: 1] [line: 6]
.   .   .   Child: 0  Const 17 of type int [line: 6]
.   .   Sibling: 1  Var: b  array of type bool [mem: Local loc: -6 size: 11] [line: 7]
.   .   Sibling: 2  Var: s  static type int [mem: LocalStatic loc: -17 size: 1] [line: 8]
.   .   Sibling: 3  Var: t  static array of type int [mem: LocalStatic loc: -19 size: 11] [line: 9]
.   .   Sibling: 4  Var: u  static array of type char [mem: LocalStatic loc: -36 size: 11] [line: 10]
.   .   .   Child: 0  Const "corgi" [mem: Global loc: -30 size: 6] of array of type char [line: 10]
.   .   Sibling: 5  Var: z  type int [mem: Local loc: -16 size: 1] [line: 11]
.   .   Child: 1  Id: a [mem: Local loc: -4 size: 1] of type int [line: 12]
.   .   Sibling: 1  Id: b [mem: Local loc: -6 size: 11] of array of type bool [line: 13]
.   .   Sibling: 2  Id: s [mem: LocalStatic loc: -17 size: 1] of static type int [line: 14]
.   .   Sibling: 3  Id: t [mem: LocalStatic loc: -19 size: 11] of static array of type int [line: 15]
.   .   Sibling: 4  Id: u [mem: LocalStatic loc: -36 size: 11] of static array of type char [line: 16]
.   .   Sibling: 5  Const 666 of type int [line: 17]
.   .   Sibling: 6  Const 'x' of type char [line: 18]
.   .   Sibling: 7  Const "purple" [mem: Global loc: -47 size: 7] of array of type char [line: 19]
.   .   Sibling: 8  Const true of type bool [line: 20]
Sibling: 3  Func: main returns type void [mem: Global loc: 0 size: -2] [line: 23]
.   Child: 1  Compound [mem: None loc: 0 size: -3] [line: 23]
.   .   Child: 0  Var: a  type int [mem: Local loc: -2 size: 1] [line: 24]
.   .   Child: 1  Compound [mem: None loc: 0 size: -6] [line: 25]
.   .   .   Child: 0  Var: b  type int [mem: Local loc: -3 size: 1] [line: 26]
.   .   .   Sibling: 1  Var: c  type int [mem: Local loc: -4 size: 1] [line: 26]
.   .   .   Sibling: 2  Var: d  type int [mem: Local loc: -5 size: 1] [line: 26]
.   .   Sibling: 1  Compound [mem: None loc: 0 size: -5] [line: 28]
.   .   .   Child: 0  Var: e  type int [mem: Local loc: -3 size: 1] [line: 29]
.   .   .   Sibling: 1  Var: f  type int [mem: Local loc: -4 size: 1] [line: 29]
.   .   Sibling: 2  For [mem: None loc: 0 size: -4] [line: 31]
.   .   .   Child: 0  Var: a  type int [mem: Local loc: -3 size: 1] [line: 31]
.   .   .   Child: 1  Range [line: 31]
.   .   .   .   Child: 0  Const 1 of type int [line: 31]
.   .   .   .   Child: 1  Const 10 of type int [line: 31]
.   .   .   Child: 2  Id: a [mem: Local loc: -3 size: 1] of type int [line: 31]
Offset for end of global space: -53
Number of warnings: 13
Number of errors: 0