The Task

First, develop the measure of clustering quality given on page 130 in Linear Discriminant Analysis. It should be the value SB/SW, where SW is the within-cluster distance measure given there and SB is the between-cluster measure.
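
As a rough illustration of the ratio, here is a minimal sketch in Python. The language is an assumption, since the assignment does not name one, and the exact formulas for SW and SB are the ones on page 130, not these. This sketch assumes the common scatter definitions: SW is the sum of squared distances from each point to its own cluster centroid, and SB is the sum over clusters of the cluster size times the squared distance from the cluster centroid to the overall mean. The function name cluster_quality and its arguments are hypothetical.

```python
def cluster_quality(points, labels, centroids):
    """Return SB / SW under the ASSUMED scatter definitions described above.

    points    -- list of equal-length numeric tuples
    labels    -- labels[i] is the index of the cluster that points[i] belongs to
    centroids -- list of cluster centers, one tuple per cluster
    """
    dim = len(points[0])
    n = len(points)
    # Mean of all points, used as the reference for the between-cluster term.
    overall_mean = [sum(p[d] for p in points) / n for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # SW (assumed): squared distance of every point to its own centroid.
    sw = sum(sq_dist(p, centroids[k]) for p, k in zip(points, labels))

    # SB (assumed): size-weighted squared distance of each centroid
    # to the overall mean.
    counts = [0] * len(centroids)
    for k in labels:
        counts[k] += 1
    sb = sum(c * sq_dist(m, overall_mean) for c, m in zip(counts, centroids))

    # Guard against division by zero when every point sits on its centroid.
    return sb / sw if sw > 0 else float("inf")
```

With these definitions, tight and well-separated clusters give a large SB and a small SW, so a larger ratio is read as a better clustering. If the definitions on page 130 differ, only the two sums need to change.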

Now modify your k-means program from assignment 4 to run the k-means algorithm in a loop. Each time the k-means algorithm is run, compute the cluster quality. Each time the cluster quality improves, remember the quality and the cluster points.

When the cluster quality has not changed for 10 runs, stop and report the best set of cluster points and the quality.
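
The loop and stopping rule can be sketched as follows, again in Python as an assumed language. The sketch assumes that "improves" means the quality value gets larger, and that the 10 runs are counted consecutively since the last improvement. The names best_of_restarts and run_once are hypothetical: run_once stands in for one full run of your assignment 4 k-means (with a fresh random start) followed by the quality computation, returning a (quality, cluster_points) pair.

```python
def best_of_restarts(run_once, patience=10):
    """Call run_once repeatedly and keep the best result seen so far.

    run_once -- hypothetical callable; one k-means run that returns
                (quality, cluster_points)
    patience -- stop after this many consecutive runs with no improvement
    """
    best_quality = None
    best_points = None
    runs_without_improvement = 0

    while runs_without_improvement < patience:
        quality, points = run_once()
        if best_quality is None or quality > best_quality:
            # New best: remember it and restart the count.
            best_quality, best_points = quality, points
            runs_without_improvement = 0
        else:
            runs_without_improvement += 1

    return best_quality, best_points
```

Passing the single run in as a callable keeps the restart logic separate from the k-means code itself, so the existing program needs little change beyond returning its final cluster points.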


Tar up the kmeans code with a makefile that builds the program kmeans, which reads from stdin as described above. Remember that it has to run on the test machine, which is like wormulon. Submit the homework as an uncompressed tar file on the homework submission page linked from the class web page. You can submit as many times as you like. The LAST file you submit BEFORE the deadline will be the one graded. For each submission you will receive email at your mail address giving you some automated feedback on the unpacking, compiling, and running of your code, and possibly on some other things that can be autotested.

Have fun.