The Task

Write code to perform k-means analysis. The first input is the number of expected groups, called k. The second input is the number of times the center points of the clusters (groups) are recomputed. This is followed by a matrix of input points, one per row, in the standard input format: first the number of rows, then the number of columns, then the data. Note that the input points may be in high-dimensional space, not just two dimensions.
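The handout does not fix an implementation language, so the sketches here use Python for brevity. This one reads the input described above, assuming every number is a whitespace-separated token in the order k, iteration count, row count, column count, then the matrix values row by row. `parse_input` is a hypothetical helper name, not part of the assignment.

```python
def parse_input(text):
    """Parse k, the iteration count, and a rows x cols matrix of points.

    Assumes all values are whitespace-separated tokens (a sketch, not the
    required format)."""
    tokens = text.split()
    k = int(tokens[0])
    iterations = int(tokens[1])
    rows, cols = int(tokens[2]), int(tokens[3])
    values = [float(t) for t in tokens[4:4 + rows * cols]]
    if len(values) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(values)}")
    # One point per row; each point has `cols` coordinates (any dimension).
    points = [values[r * cols:(r + 1) * cols] for r in range(rows)]
    return k, iterations, points
```

In a full program you would call `parse_input(sys.stdin.read())`. Slicing by `cols` keeps the code independent of the dimension of the points.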

Your program should then run the k-means algorithm to determine the k center points, one at the center of each group of input points. It should begin by surveying the input and initializing k random points such that each coordinate in a given dimension lies between the min and max of the input points in that dimension. Part of the algorithm is to collect all the input points that are closest to a given center point into a group and average that group to find a new location for the center point. (See the algorithm.) It is possible that some center point ends up with no points in its group. In that case, reinitialize that center point randomly, as was done at the beginning.
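A minimal sketch of the initialization and of one regroup/average step, including the reinitialization of empty groups. It assumes "closest" means Euclidean distance (the handout does not name a metric); comparing squared distances picks the same nearest center without a square root. The function names are hypothetical.

```python
import random

def bounds(points):
    """Per-dimension min and max over all input points."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    return lo, hi

def random_center(lo, hi, rng):
    """A random point whose coordinate in dimension d lies in [lo[d], hi[d]]."""
    return [rng.uniform(a, b) for a, b in zip(lo, hi)]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def regroup_and_average(points, centers, lo, hi, rng):
    """Assign each point to its nearest center, then move each center to the
    mean of its group. A center whose group is empty is reinitialized randomly."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(p, centers[i]))
        groups[nearest].append(p)
    new_centers = []
    for group in groups:
        if group:
            dims = len(group[0])
            new_centers.append([sum(p[d] for p in group) / len(group)
                                for d in range(dims)])
        else:
            new_centers.append(random_center(lo, hi, rng))
    return new_centers
```

The initial centers are just `[random_center(lo, hi, rng) for _ in range(k)]`. Passing a seeded `random.Random` makes runs reproducible while debugging.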

Run the regroup/average part of the algorithm the specified number of times to try to converge toward the center points of the clusters. Print the center points at the beginning and after each averaging step that creates new center points. Try to match the form of the expected output, but the order of the points is not important.
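The overall loop can be sketched as follows. It records the initial centers plus the centers after each pass, so `iterations` passes yield `iterations + 1` sets to print. The printed labels and the four-decimal format are hypothetical placeholders; match them to the expected output you are given.

```python
import random

def kmeans(points, k, iterations, rng):
    """Return [initial centers, centers after pass 1, ..., after pass `iterations`]."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]

    def random_center():
        # Each coordinate lies between that dimension's min and max.
        return [rng.uniform(lo[d], hi[d]) for d in range(dims)]

    centers = [random_center() for _ in range(k)]
    history = [centers]
    for _ in range(iterations):
        # Regroup: each point joins the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            dist = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dist.index(min(dist))].append(p)
        # Average: move each center to its group's mean; reseed empty groups.
        centers = [
            [sum(p[d] for p in g) / len(g) for d in range(dims)] if g else random_center()
            for g in groups
        ]
        history.append(centers)
    return history

def format_centers(centers):
    # Hypothetical output format: one center per line, 4 decimal places.
    return "\n".join(" ".join(f"{x:.4f}" for x in c) for c in centers)

def print_history(history):
    for step, centers in enumerate(history):
        print("initial centers:" if step == 0 else f"after pass {step}:")
        print(format_centers(centers))
```

Calling `print_history(kmeans(points, k, iterations, random.Random()))` prints every set of centers. Because only the set matters, not the order, any center ordering is acceptable.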

Submission

Tar up the kmeans code together with a makefile that builds a program named kmeans, which reads from stdin as described above. Remember that it has to run on the test machine, which is similar to wormulon. Submit your homework as an uncompressed tar file through the homework submission page linked from the class web page. You can submit as many times as you like; the LAST file you submit BEFORE the deadline is the one that will be graded. For every submission you will receive email at your uidaho.edu address with automated feedback on unpacking, compiling, and running your code, and possibly on other things that can be autotested.

Have fun.