In this assignment we will write the simple getcol tool in Python 3. It extracts a subset of the columns from a file whose data is organized in columns. It is a lot like the tool we built in class, but it has more options and features so you can practice parsing and processing options in Python 3.

Here is the help message printed by the default -h option:

Usage: getcol [options] {col{:col{:col}}}*

Options:
  -h, --help            show this help message and exit
  -i STR, --insep=STR   input separator
  -o STR, --outsep=STR  output separator
  -l STR, --lastsep=STR
                        last output separator
  -s                    strip each input column of whitespace
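The help message above has the layout produced by Python's standard optparse module, including the wrapped description for the long -l option. One way to get it is the parser below. It is offered as a sketch, not the required approach. The function name build_parser is hypothetical. The defaults for --outsep (a single space) and --lastsep (the empty string) are assumptions, because the assignment does not state them.

```python
# Sketch: getcol option parsing with the standard optparse module.
# build_parser is a hypothetical helper name. The defaults for outsep and
# lastsep are ASSUMPTIONS; the assignment does not specify them.
from optparse import OptionParser


def build_parser():
    # optparse adds -h/--help automatically and prefixes the usage with "Usage: ".
    parser = OptionParser(prog="getcol",
                          usage="%prog [options] {col{:col{:col}}}*")
    # insep=None means "split on whitespace" when later passed to str.split.
    parser.add_option("-i", "--insep", dest="insep", metavar="STR",
                      default=None, help="input separator")
    parser.add_option("-o", "--outsep", dest="outsep", metavar="STR",
                      default=" ", help="output separator")
    parser.add_option("-l", "--lastsep", dest="lastsep", metavar="STR",
                      default="", help="last output separator")
    parser.add_option("-s", dest="strip", action="store_true",
                      default=False,
                      help="strip each input column of whitespace")
    return parser


if __name__ == "__main__":
    options, args = build_parser().parse_args()
    print(options, args)
```

parse_args returns an (options, args) pair. args holds the column specifiers as plain strings, so for getcol -s 1:3 2 it is ["1:3", "2"].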

  • The input separator string is what divides the columns. If none is specified, columns are separated by whitespace (see the split function).
  • The output separator string is what separates columns on output. It is only printed between two columns that are actually printed. If a column that is to be printed is missing from the input line, then it is, of course, not printed, and neither is its separator string (see the test data in the sidebar). In Python 3 the print function has an "end=" option, which is very useful to know about. You saw it briefly in class, where it was used to prevent a newline from being printed.
  • The last separator string is what is printed after the list of columns. It is the last thing on the line before the newline.
  • If the strip flag is set, each column is stripped of whitespace at the front and back. There is a simple string method for this (strip).
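The separator and strip rules above can be combined into two small helpers. This is a sketch. The names select_columns and format_columns are hypothetical. It assumes that a column number outside 1..N on an N-column line counts as missing. It also assumes that the trailing newline is removed before a custom input separator is applied.

```python
# Sketch of per-line processing for getcol (hypothetical helper names).
# ASSUMPTIONS: a column outside 1..len(fields) is treated as missing and
# skipped; the newline is removed before splitting on a custom separator.


def select_columns(line, columns, insep=None, strip=False):
    """Return the requested columns (numbered from 1) that exist on this line."""
    # str.split(None) splits on runs of whitespace and drops empty strings;
    # str.split(sep) splits on the exact separator string.
    fields = line.rstrip("\n").split(insep)
    selected = []
    for col in columns:
        if 1 <= col <= len(fields):
            field = fields[col - 1]
            selected.append(field.strip() if strip else field)
    return selected


def format_columns(fields, outsep, lastsep):
    """Join columns with outsep and append lastsep (no trailing newline)."""
    # Missing columns were never selected, so no extra separator appears.
    return outsep.join(fields) + lastsep


if __name__ == "__main__":
    cols = select_columns("a, b ,c\n", [3, 1, 9, 2], insep=",", strip=True)
    # print supplies the newline, so lastsep is the last thing before it.
    print(format_columns(cols, "|", ";"))
```

Printing the returned string with a plain print() adds the newline after lastsep. An equivalent approach that uses the end= option is print(outsep.join(fields), end=lastsep + "\n").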

As many columns as desired can be specified on the command line in a list of columns. Columns are numbered starting at 1. Duplicate column numbers are allowed! The columns are printed in the order they are specified in the column list.

Furthermore, each column specifier in the column list may contain zero, one, or two separating colons, with no whitespace between. So a column specifier may be num, num:num, or num:num:num. These mean:

  • num means that column is to be printed.
  • num1:num2 means starting at column num1 and going to num2. For example, 1:4 means columns 1, 2, 3, 4.
  • num1:num2:num3 means starting at column num1 and going to num2 in steps of num3. For example, 1:8:3 means columns 1, 4, 7, and 4:1:-1 means 4, 3, 2, 1.

An example column specification might be 2 2 10:11 8:12:2 10, which would print columns 2, 2, 10, 11, 8, 10, 12, 10. Consider implementing this feature with the range function. Note that num2 is included in the columns, while range stops before its stop value.
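A specifier can be expanded with range by splitting it on colons and converting each piece to an int. Because num2 is included (1:4 means 1 through 4) but range stops before its stop argument, the sketch moves the stop value one step further in the direction of the step. The names expand_spec and expand_all are hypothetical. Two behaviors are assumptions that simply follow range: 4:1 (default step 1) selects nothing, and a zero step is rejected.

```python
# Sketch of expanding getcol column specifiers with range().
# ASSUMPTIONS: num1:num2 uses a default step of 1, so "4:1" yields nothing
# (as range does); a zero step raises ValueError.


def expand_spec(spec):
    """Expand 'n', 'a:b', or 'a:b:s' into a list of column numbers."""
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 1:
        return parts
    if len(parts) == 2:
        start, stop, step = parts[0], parts[1], 1
    elif len(parts) == 3:
        start, stop, step = parts
    else:
        raise ValueError("too many colons in column spec: " + spec)
    if step == 0:
        raise ValueError("step must not be zero: " + spec)
    # range() excludes its stop value but the spec's end is inclusive,
    # so push the stop one past num2 in the direction of the step.
    return list(range(start, stop + (1 if step > 0 else -1), step))


def expand_all(specs):
    """Expand every specifier in order; duplicates are kept."""
    columns = []
    for spec in specs:
        columns.extend(expand_spec(spec))
    return columns


if __name__ == "__main__":
    print(expand_all(["2", "2", "10:11", "8:12:2", "10"]))
```

When run on the example specification above, it prints [2, 2, 10, 11, 8, 10, 12, 10].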

Testing

To test your code and better understand the required behavior, use the tar/zip file in the sidebar. It contains test scripts and a makefile for this assignment.

Submission

Homework is submitted as an uncompressed tar file through the homework submission page linked from the main class page. No makefile is needed for a Python program. Your program must be named getcol, with no .py extension. To have it run under Python 3, put:

#!/usr/bin/env python3
as the first line of your program. This runs whichever python3 is installed in the local environment (FYI: it is Python 3.4).

You can submit as many times as you like. The LAST file you submit BEFORE the deadline will be the one graded. After each submission you will receive an email with automated feedback on unpacking, compiling, and running your code, and possibly on other things that can be autotested. I will read the results of the runs and the reports you submit.

Have fun.