The output of your program will first be processed by an automatic comparison program before being examined by a human being, so please follow the formatting instructions and examples very carefully. The results your program produces will need to look exactly like the target. Do not embellish with extra titles or other text such as "run complete" or "CS445 output" or even an extra space. I will take off points for this. (This is realistic. Most companies run test suites on their products, and breaking the test suites is not looked upon well in industry.) The testing facility of the submit script will help you get this annoying detail right. Thanks for your patience.

The Problem

Use a combination of Flex and Bison code as instructed in class to build and test a scanner for the C- language defined on Jan 19 or later. The scanner will be named c- (note the lowercase), because c- will ultimately be the compiler for the language C-. It will read and process a stream of characters representing tokens from a file. The filename may be given as an argument to the c- command OR the input can come from standard input if the filename argument is not present. This means the call to c- on the C- code in file filename.c- may be given as:

c- {filename.c-}

or

 
cat filename.c- | c- 

or

c- < filename.c-

To get this to work requires that you be able to optionally read a file off the command line. Do this in the main function. You will need to define arguments to main (as you remember from early classes):

int main(int argc, char *argv[])

and use the variable yyin. In order to access the variable yyin, it needs to be declared extern in the bison code where main is:

extern FILE *yyin;

Using fopen() you can get a FILE * for the file named on the command line. This is the same type as the familiar stdin and stdout, so yyin can point at either the opened file or stdin.
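As a minimal sketch of the file-or-stdin choice described above: the helper name select_input is hypothetical, and yyin is defined locally only so the fragment compiles by itself. In your real parser.y it would be the extern declaration shown earlier, since flex defines the variable.

```c
#include <stdio.h>

/* Stand-in for flex's variable. In the real parser.y write
   `extern FILE *yyin;` instead of defining it here. */
FILE *yyin;

/* Point yyin at the file named in argv[1], or at stdin when no filename
   argument is present. Returns 0 on success and 1 if the file cannot be
   opened. select_input is a hypothetical helper name. */
int select_input(int argc, char *argv[])
{
    if (argc > 1) {
        FILE *fp = fopen(argv[1], "r");   /* open the command-line file for reading */
        if (fp == NULL) {
            return 1;                     /* how to report this is left to the caller */
        }
        yyin = fp;
    } else {
        yyin = stdin;                     /* covers both `cat f | c-` and `c- < f` */
    }
    return 0;
}
```

main would call something like select_input(argc, argv) before starting the parse. Both the pipe form and the redirect form arrive on stdin, so they need no separate handling.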

Your program will produce a stream of tokens and its output is as described below. Pretesting your answer well in advance of the due date will help assure compliance.

Your program will be constructed using both flex and Bison to run on the department linux machines. The machine cs-445.cs.uidaho.edu is available for class use using your UI credentials. IMPORTANT: That is where the grading will occur and it must compile and run there!

The Flex Part

Build a Flex scanner that returns a token class for each token in the C- grammar. For numbers it should also "return" a numerical value and the string the user typed. For ids it should also return a string. It should treat the Booleans "true" and "false" as keywords but internally treat them as a BOOLCONST with values 0 and 1 for false and true respectively. The scanner should ignore comments and whitespace and not return a token for them.

Your compiler will generate an error if there is an illegal character in the input. That occurs when none of the token patterns match at the current location in the input stream. See the example output for the exact wording of the error message. No token will be returned in the case of this error and scanning will continue.

Your compiler will generate a warning if a character constant is given with more than one character, e.g. 'dogs'. If this happens the first character will be used, the remaining characters ignored, and a warning issued.

Your compiler will generate an error if a character constant is given with no characters in it, i.e. ''. No token will be returned and the empty constant is ignored.
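The two character-constant rules above could be sketched roughly as follows. The enum and function names are hypothetical, and the sketch assumes the matched text includes both single quotes. It deliberately ignores backslash escapes: if the C- definition has them, they need their own case. It also prints nothing, because the exact wording of the messages comes from the example output.

```c
#include <string.h>

/* Hypothetical result codes for the three cases in the text. */
enum CharStatus { CHAR_OK, CHAR_TOO_LONG, CHAR_EMPTY };

/* text is the matched lexeme including both quotes, e.g. "'a'" or "'dogs'".
   On CHAR_OK and CHAR_TOO_LONG the first character is stored in *value.
   On CHAR_EMPTY *value is left untouched and no token should be returned.
   Assumption: backslash escapes are NOT handled in this sketch. */
enum CharStatus classify_char_const(const char *text, char *value)
{
    size_t len = strlen(text);
    size_t body = (len >= 2) ? len - 2 : 0;   /* characters between the quotes */

    if (body == 0) {
        return CHAR_EMPTY;                    /* '' : error, token ignored */
    }
    *value = text[1];                         /* first character is always the one used */
    return (body == 1) ? CHAR_OK : CHAR_TOO_LONG;   /* 'dogs' : warning */
}
```

The caller would issue the warning or error and decide whether to return a token based on the status.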

HINT: The scanner should keep track of the line number of each token and return it with the token and a string and/or numeric representation of the token, if appropriate, in a struct or class instance. This can be done by using yylval to contain a pointer to a struct or class instance you create and want to return.

CODING RESTRICTIONS: DO NOT USE YYSTYPE!!!!!! I will take off points. Really. This is bad programming practice and is done in some of the code in the book. flex/Bison provides %union to make this association possible and that provides type checking so we should use that.

Note that in C-, like C and C++, newline is not an element of the grammar and is merely whitespace. This was not true for the calculator program we reviewed in class.

The Bison Part

Build a Bison parser as instructed in class that accepts any stream of legal tokens from the scanner. You will have to come up with the simple grammar for this. This is a grammar for just a stream of any legal tokens. It is NOT the grammar for C-! The Bison part prints out the line number, the token type, and any extra information returned by the scanner. See example output to see what it should look like. Again, this first program will not recognize C-. It will only recognize C- tokens. One of the goals of this assignment is to get the basic build and communication between flex and Bison up and running. SUPER-HINT: Here is a template for the Bison file.

Test data

Use the test data provided above to decide the exact format for test output.

Note that IDs may not be what you think they are. They are not exactly like in C++.

Character constants may be the null character and so print in a way that can't be seen if you use a %c format. That is OK. I will be checking for an actual null character to be output with the %c. Be sure to use %c for chars for this assignment!
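To see why %c matters here, this small sketch writes a character with %c and reports how many bytes went out. The function name print_char_value is hypothetical. It does not show the required output format, only the behavior of %c with a null character.

```c
#include <stdio.h>

/* Write one character value with %c. fprintf returns the number of bytes
   written, which is 1 even when c is the null character. */
int print_char_value(FILE *out, char c)
{
    return fprintf(out, "%c", c);
}
```

By contrast, printing a one-character string that holds the null character with %s would write nothing at all, so the comparison program would see a missing byte.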

Note that the class of any single special character token is printed as the character itself. The class of any multicharacter token is printed as an all uppercase string.

Further test data will be available as soon as the submit script is available.

Build and Test

Your tar file will have at least:

  • a file parser.l that contains the flex code
  • a file parser.y that contains the bison code
  • a file scanType.h that contains the declaration of either a struct or class that is used to pass your token information back from the scanner. This file will be included in the right place in BOTH the .l and .y files.
  • a makefile (note the all lowercase) that I will execute to build your c-.
Here is an example of what the whole scanType.h file might look like:

#ifndef _SCANTYPE_H_
#define _SCANTYPE_H_
// 
//  SCANNER TOKENDATA
// 
struct TokenData {
    int  tokenclass;        // token class
    int  linenum;           // line where found
    char *tokenstr;         // what string was actually read
    char cvalue;            // any character value
    int  nvalue;            // any numeric value or Boolean value
    char *svalue;           // any string value e.g. an id
};
#endif
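As a rough sketch of how the scanner might fill in this struct before handing it back through yylval: make_token, copy_string and bool_value are hypothetical helper names, and the struct is repeated only so the fragment compiles by itself. The text is copied because flex reuses the yytext buffer on the next match.

```c
#include <stdlib.h>
#include <string.h>

/* Same declaration as the scanType.h example, repeated so this sketch
   compiles on its own. In your code, include scanType.h instead. */
struct TokenData {
    int  tokenclass;
    int  linenum;
    char *tokenstr;
    char cvalue;
    int  nvalue;
    char *svalue;
};

/* Copy a string into freshly allocated memory. */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL) strcpy(p, s);
    return p;
}

/* Build a token record holding the class, the line number, and a private
   copy of the text that was read. Other fields start out zero/NULL and
   are set by the caller when they apply. Returns NULL if allocation fails. */
struct TokenData *make_token(int tokenclass, int linenum, const char *text)
{
    struct TokenData *t = calloc(1, sizeof *t);
    if (t == NULL) return NULL;
    t->tokenclass = tokenclass;
    t->linenum = linenum;
    t->tokenstr = copy_string(text);
    t->svalue = NULL;
    return t;
}

/* Value for a BOOLCONST: 1 for "true", 0 for "false". */
int bool_value(const char *text)
{
    return strcmp(text, "true") == 0;
}
```

In a flex action, the pointer returned by make_token would be stored in whichever %union member you declare for it, and fields such as nvalue would then be set from the matched text.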

I will then run several files containing tokens through your c- and compare the results. I will do multiple runs, both by piping data into c- AND by giving the filename as an argument. Make sure your code can handle both cases.

Submission

Homework will be submitted as an uncompressed tar file that contains no subdirectories. The tar file is submitted to the class submission page. You can submit as many times as you like. The LAST file you submit BEFORE the deadline will be the one graded. Absolutely no late papers. For all submissions you will receive email at your uidaho address showing how your file performed on the pre-grade tests. The grading program will use more extensive tests, so thoroughly test your program with inputs of your own. Your code should compile and run without runtime errors such as seg faults. If it doesn't, it is considered nearly ungradable.

If you have tests you really think are important or just cool please send them to me and I will consider adding them to the test suite.