In: Computer Science
Task:
Write a program that implements several sorting algorithms and use it to demonstrate the comparative performance of the algorithms for a variety of datasets.
Background
The skeleton program sorting.cpp contains a main function for testing the operation of several sort algorithms over various data sizes and dataset organisations.
The program understands the following arguments:
-i insertion sort
-s selection sort (default)
-q quicksort
-a (already) sorted dataset
-v reverse-sorted dataset
-r random dataset (default)
-n no sorting
x generate a dataset of size x (default 10000)
Level 1: Basic sorts
Implement the selectionSort and insertionSort functions. Note that you can base your code on the sample code used in lectures, although you will need to modify it from passing the data using an array and two indexes to passing it using two pointers. The program will check that the final list is sorted correctly
Level 2: Quicksort
Implement the quickSort function. The real work of quicksort happens in the partition operation, so it is probably best to define a separate function to do the partitioning
Level 3: Profiling
Gather data on the number of times that each algorithm compares and swaps items. A simple strategy is to use variables to count the comparisons and swaps. The skeleton code declares comparisons and swaps variables for just this purpose. To make use of them, modify your code to increment each variable where appropriate and then uncomment the output statements to display their final values.
When you have profiling working, run the program and record the number of compares and swaps for each combination of algorithm (-i, -s, and -q) and dataset organisation (-a, -v, and -r). Use dataset sizes of 1000, 2000, 5000, 10,000, 50000, and 100000. In the worst case, execution should not take much longer than 30 seconds.
For each run, compute a normalised execution count by dividing the actual count by the dataset size. For example, if quicksort with a random dataset containing 10000 elements performed 161619 comparisons, the normalised count would be 161619/10000 or about 16.2.
Finally, plot the comparison results (for the options -ia, -iv, -ir, -sa, -sv, -sr, -qa, -qv, and -qr) as separate lines on a single set of axes with dataset size on the horizontal axis and execution count on the vertical axis. Repeat for swaps. Choose scales so that the full range of values can be displayed and label the lines so you know which one is which. You may find that using a log-log plot shows the values over the full range better
Level 4: Improving quicksort
A simple implementation of quicksort is likely to perform badly for certain datasets. For example, the implementation used as an example in lectures performs very poorly if the dataset is in reverse-sorted order.
The poor performance can be avoided by choosing a better pivot for the partitioning step. Common strategies include selecting a random element or using the median of the first, middle, and last elements. In any case, the chosen pivot is swapped with the last element before proceeding to do the partition as before.
Read about and implement one of the improvements to quicksort and show (by adding its profile data to the graphs of Level 3) how much improvement you have achieved. Include a brief description of the method you implemented to improve quicksort’s performance.
#include <cstdlib> #include <iostream> #include <getopt.h>
using namespace std; long compares; // for counting compares long swaps; // for counting swaps bool counted_less(int n1, int n2) { ++compares; return n1 < n2; } void counted_swap(int &n1, int &n2) { ++swaps; int t = n1; n1 = n2; n2 = t; } void selectionSort(int *start, int *stop) { // add code here } void insertionSort(int *start, int *stop) { // add code here } void quickSort(int *start, int *stop) { // add code here } void merge(int *start, int *mid, int *stop) { int temp[stop - start]; int *i = start; int *j = mid; int *k = temp; while (i < mid || j < stop) { if (i == mid) { counted_swap(*k++, *j++); } else if (j == stop) { counted_swap(*k++, *i++); } else if (counted_less(*j, *i)) { counted_swap(*k++, *j++); } else { counted_swap(*k++, *i++); // bias to left for stability } } for (i = start, k = temp; i != stop; ++i, ++k) { counted_swap(*i, *k); } } void mergeSort(int *start, int *stop) { if (stop - start <= 1) return; int *mid = start + (stop - start) / 2; mergeSort(start, mid); mergeSort(mid, stop); merge(start, mid, stop); } int main(int argc, char **argv) { string algorithm = "selection sort"; string dataset = "random"; int size = 1000; for (int c; (c = getopt(argc, argv, "ravmqsi")) != -1;) { switch (c) { case 'r': dataset = "random"; break; case 'a': dataset = "already sorted"; break; case 'v': dataset = "reversed"; break; case 'm': algorithm = "merge sort"; break; case 'q': algorithm = "quicksort"; break; case 's': algorithm = "selection sort"; break; case 'i': algorithm = "insertion sort"; break; } } argc -= optind; argv += optind; if (argc > 0) size = atoi(argv[0]); int *data = new int[size]; if (dataset == "already sorted") { for (int i = 0; i < size; ++i) data[i] = i; } else if (dataset == "reversed") { for (int i = 0; i < size; ++i) data[i] = size - i - 1; } else if (dataset == "random") { for (int i = 0; i < size; ++i) data[i] = rand() % size; } compares = 0; swaps = 0; if (algorithm == "merge sort") { mergeSort(data, data + size); } else if (algorithm == "quicksort") { quickSort(data, data + size); } else if (algorithm == "selection sort") { selectionSort(data, data + size); } else if (algorithm == "insertion sort") { insertionSort(data, data + size); } for (int i = 1; i < size; ++i) { if (data[i] < data[i - 1]) { cout << "Oops!" << endl; exit(1); } } cout << "Algorithm: " << algorithm << endl; cout << "Dataset: " << dataset << endl; cout << "Size: " << size << endl; // uncomment for checkpoints 3 and 4 // cout << "Compares: " << compares << endl; // cout << "Swaps: " << swaps << endl; }