CS212 - Assignment 4

The output of your program will first be processed by an automatic comparison program before being examined by a human being, so please follow the formatting instructions and examples very carefully. The results your program produces need to look exactly like the target! Do not embellish with extra titles, blanks, tabs, decimals after integers, or other text. No extras such as "run complete" or "CS212 output" or even an extra space. The formats must be identical. I will take off points for this. (This is realistic. Most companies run test suites on their products, and breaking the test suites is not looked upon kindly in industry. Often the output of one program is also the input to another.) The testing facility of the submit script will help you get this annoying detail right. I hope to make it easy to get it right. Thanks for your patience.

IMPORTANT: be sure the first line in your Python file is:

#!/usr/bin/env python3

The Problem

The assignment is to write a command line tool called joincol.py. This tool is really useful for blending together two files of data in which the data is stored in columns and the blending is controlled by a matching column. For example, blending a file of names and health statistics with a file of names and athletic performance.

The tool takes a list of filenames and the "matching column" number for each file. For example:

joincol.py english.txt 1 spanish.txt 2
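The arguments arrive as a flat list that alternates filename, column number. A minimal sketch of grouping them into pairs follows; the helper name pair_arguments is hypothetical and not part of the assignment, and in the real tool the list would come from the command line rather than a literal.

```python
def pair_arguments(args):
    """Group a flat argument list into (filename, column) pairs.

    The column number is kept 1-based, exactly as typed on the command line.
    """
    if len(args) % 2 != 0:
        raise ValueError("expected filename/column pairs")
    return [(args[i], int(args[i + 1])) for i in range(0, len(args), 2)]


# The example command line from the text, written out as a list.
pairs = pair_arguments(["english.txt", "1", "spanish.txt", "2"])
print(pairs)   # [('english.txt', 1), ('spanish.txt', 2)]
```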

Each line in each file is broken into columns using the split() method. NOTE: The first column is column 1 and not 0, so be sure to add or subtract 1 in the right places. The files are combined on the columns mentioned. For example, suppose the file english.txt above was:

1 one
2 two
3 three
4 four
5 five

and the file spanish.txt was:

cinco 5 v
uno 1 i
dos 2 ii
tres 3 iii
quatro 4 iv
sies 6 vi

Column 1 is the "matching column" in english.txt and column 2 is the matching column in spanish.txt. The output matches up the matching columns in each file and prints first the match value, then the remaining columns of english.txt (not including the matching column), followed by the remaining columns of spanish.txt (not including the matching column). The result for joincol.py english.txt 1 spanish.txt 2 would be:

1 one uno i
3 three tres iii
4 four quatro iv
6 - sies vi
2 two dos ii
5 five cinco v

Notice the first column is the matching column and it appears only once. The order of the lines depends only on the order in which a for statement happens to yield the elements, which is not reliable, so I will sort the output before comparison. There is exactly one blank between columns.
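As an illustration of the 1-based column rule and the single-blank format, here is a small sketch that splits one line, pulls out the matching column, and rebuilds a row. The function name split_on_column is made up for this example.

```python
def split_on_column(line, column):
    """Return (match value, remaining fields) for a 1-based column number."""
    fields = line.split()   # split() with no arguments collapses runs of whitespace
    index = column - 1      # the assignment counts from 1, Python lists count from 0
    return fields[index], fields[:index] + fields[index + 1:]


key, rest = split_on_column("uno 1 i", 2)
row = " ".join([key] + rest)   # join() puts exactly one blank between columns
print(row)   # 1 uno i
```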

Hints

I am not looking to use the argument parser on this assignment because it isn't a great example. If I were building the tool for my own use, I would use the argument parser to get a good usage message.
The line: args = sys.argv[1:] might be helpful.
I made two dictionaries. One is indexed by file name and gives the "matching column" number for each file. The other is also indexed by file name and gives, for each file, a dictionary of the remainders of the lines (not including the matching column) indexed by the "matching column" contents. Yes, this is a dictionary of dictionaries! That is, the value of each key/value pair is a dictionary of the lines in the file whose name is the key. This means that from the file name I can get a dictionary, and in that dictionary I can look up a line of that file by the text in its matching column. Pretty slick.
Just print out the matching column contents and the line from each file that has that value in its matching column. If no line in a file has that value in its matching column, then print a dash for that file. The contents from each file are separated from the others by a single blank.
Making a set of matching column values would be super useful!!! You can then use a for statement on that.
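
The hints above can be sketched roughly as follows. This is one possible shape and not the required design: the names build_tables and join_rows are hypothetical, the file contents are given as in-memory lists instead of being read from disk, blank lines are skipped as an assumption, and edge cases such as a line holding only the matching column are not handled.

```python
def build_tables(file_lines, match_columns):
    """Build {filename: {match value: rest of line}}, a dictionary of dictionaries."""
    tables = {}
    for name, lines in file_lines.items():
        index = match_columns[name] - 1   # 1-based column to 0-based index
        table = {}
        for line in lines:
            fields = line.split()
            if not fields:
                continue                  # assumption: ignore blank lines
            table[fields[index]] = " ".join(fields[:index] + fields[index + 1:])
        tables[name] = table
    return tables


def join_rows(tables):
    """Return one output row per distinct match value, in no particular order."""
    keys = set()
    for table in tables.values():
        keys.update(table)                # adds the match values (the dict keys)
    rows = []
    for key in keys:
        # A file with no line for this match value contributes a dash.
        parts = [key] + [table.get(key, "-") for table in tables.values()]
        rows.append(" ".join(parts))
    return rows


file_lines = {
    "english.txt": ["1 one", "2 two", "3 three", "4 four", "5 five"],
    "spanish.txt": ["cinco 5 v", "uno 1 i", "dos 2 ii", "tres 3 iii",
                    "quatro 4 iv", "sies 6 vi"],
}
match_columns = {"english.txt": 1, "spanish.txt": 2}

rows = sorted(join_rows(build_tables(file_lines, match_columns)))
for row in rows:
    print(row)
```

Sorting here is only for a stable display; the set itself has no dependable order, which is why the graded output is sorted before comparison.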

Submission

Homework will be submitted as an uncompressed tar file that contains no subdirectories. The tar file is submitted to the class submission page. You can submit as many times as you like. The LAST file you submit BEFORE the deadline will be the one graded. Absolutely no late papers. For all submissions you will receive email at your uidaho address showing how your file performed on the pre-grade tests. The grading program will use more extensive tests, so thoroughly test your program with inputs of your own. Your code should compile and run without runtime errors such as seg faults. If it doesn't, it is considered nearly ungradable.

If you have tests you really think are important or just cool, please send them to me and I will consider adding them to the test suite.