In: Computer Science
For this assignment, write a parallel program in C++ using OpenMP for vector addition. Assume A, B, C are three vectors of equal length. The program will add the corresponding elements of vectors A and B and will store the sum in the corresponding elements in vector C (in other words C[i] = A[i] + B[i]). Every thread should execute an approximately equal number of loop iterations.
As an input vector A, initialize its size to 10,000 and elements from 1 to 10,000.
So, A[0] = 1, A[1] = 2, A[2] = 3, … , A[9999] = 10000.
Input vector B will be initialized to the same size with opposite inputs.
So, B[0] = 10000, B[1] = 9999, B[2] = 9998, … , B[9999] = 1
Using the above input vectors A and B, create output Vector C which will be computed as
C[ i ] = A[ i ] + B[ i ];
You should check whether your output vector value is 10001 in every C[ i ].
First, start with 2 threads (each thread adding 5,000 vectors), and then do with 4, and 8 threads. Remember sometimes your vector size can not be divided equally by the number of threads. You need to slightly modify the pseudo code to handle the situation accordingly. (Hint: If you have p threads, first (p - 1) threads should have the same input size and the last thread will take care of whatever the remainder portion.) Check the running time from each experiment and compare the result. Report your findings from this project in a separate paragraph.
Your output should show a team of treads does evenly distributed work, but a big vector size might cause an issue in output. You can create mini version of original vector in much smaller size of 100 (A[0] = 1, A[1] = 2, A[2] = 3, … , A[99] = 100) and run with 6 threads once and take a snapshot of your output. And run with original size with 2, 4, and 8 threads to compare running times.
Pseudocode for Assignment
mystart = myid*n/p; // starting index for the individual thread
myend = mystart+n/p; // ending index for the individual thread
for (i = mystart; i < myend; i++) // each thread computes local sum
do vector addition // and later all local sums combined
#include <stdlib.h> //malloc and free
#include <stdio.h> //printf
#include <omp.h> //OpenMP
// Very small values for this simple illustrative example
#define ARRAY_SIZE 8 //Size of arrays whose elements will be added
together.
#define NUM_THREADS 4 //Number of threads to use for vector
addition.
/*
* Classic vector addition using openMP default data
decomposition.
*
* Compile using gcc like this:
* gcc -o va-omp-simple VA-OMP-simple.c
-fopenmp
*
* Execute:
* ./va-omp-simple
*/
int main (int argc, char *argv[])
{
// elements of arrays a and b will be added
// and placed in array c
int * a;
int * b;
int * c;
int n = ARRAY_SIZE; // number of array elements
int n_per_thread; // elements per thread
int total_threads = NUM_THREADS; // number of threads
to use
int i; // loop index
// allocate spce for the arrays
a = (int *) malloc(sizeof(int)*n);
b = (int *) malloc(sizeof(int)*n);
c = (int *) malloc(sizeof(int)*n);
// initialize arrays a and b with consecutive integer
values
// as a simple example
for(i=0; i<n; i++) {
a[i] = i;
}
for(i=0; i<n; i++) {
b[i] = i;
}
// Additional work to set the number of threads.
// We hard-code to 4 for illustration purposes
only.
omp_set_num_threads(total_threads);
// determine how many elements each process will work
on
n_per_thread = n/total_threads;
// Compute the vector addition
// Here is where the 4 threads are specifically
'forked' to
// execute in parallel. This is directed by the pragma
and
// thread forking is compiled into the resulting
exacutable.
// Here we use a 'static schedule' so each thread
works on
// a 2-element chunk of the original 8-element
arrays.
#pragma omp parallel for shared(a, b, c) private(i)
schedule(static, n_per_thread)
for(i=0; i<n; i++) {
c[i] = a[i]+b[i];
// Which thread am I? Show who
works on what for this samll example
printf("Thread %d works on
element%d\n", omp_get_thread_num(), i);
}
// Check for correctness (only plausible for small
vector size)
// A test we would eventually leave out
printf("i\ta[i]\t+\tb[i]\t=\tc[i]\n");
for(i=0; i<n; i++) {
printf("%d\t%d\t\t%d\t\t%d\n", i,
a[i], b[i], c[i]);
}
// clean up memory
free(a); free(b); free(c);
return 0;
}