Question

Create a Java program to find the mean and standard deviation of a large set of data by breaking it down into smaller sets and using threads to process each smaller data set. The simulation will use only 3 threads, and each data set will contain at most 100 integers. Create two classes: a HadoopSim class for the thread tasks and a HadoopDriver class for the main routine.

(Then the main routine must gather up the results of each thread and compute the overall average and standard deviation. This exercise will NOT require the use of the synchronized keyword, as each thread will be working only on its own integer array in memory, i.e., there is no shared data.)

How to calculate the mean and standard deviation:

1. Compute mean - add up all of the numbers and divide that sum by the number of numbers (N)

2. Compute standard deviation as follows:

a. Subtract the mean from each number in the set and square the differences;

b. Sum up all of the squares of differences;

c. Divide this sum by the number of numbers minus 1 (N - 1); and,

d. Take the (positive) square root of the result.

Note that the population formula for σ divides the sum of squares by N, which is appropriate when computing the standard deviation of a whole population. Here, use the formula for the standard deviation of a sample instead, which divides by N - 1 as in step c above.
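The steps above can be sketched in Java as follows. This is a minimal, single-threaded illustration and not the assignment solution; the class name SampleStats, its method names, and the data in main are all made up for the example.

```java
// Minimal sketch (hypothetical names): mean and sample standard deviation of an int array.
class SampleStats {
    // Step 1: add up all of the numbers and divide by N (the cast avoids int division).
    static double mean(int[] data) {
        long sum = 0;
        for (int value : data) {
            sum += value;
        }
        return (double) sum / data.length;
    }

    // Step 2: sum the squared differences from the mean, divide by N - 1, take the square root.
    static double sampleStdDev(int[] data) {
        if (data.length < 2) {
            throw new IllegalArgumentException("need at least two numbers");
        }
        double m = mean(data);
        double sumDiffsSq = 0.0;
        for (int value : data) {
            double diff = value - m;
            sumDiffsSq += diff * diff;
        }
        return Math.sqrt(sumDiffsSq / (data.length - 1));
    }

    public static void main(String[] args) {
        int[] data = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up example data
        System.out.printf("mean = %.2f, sample sd = %.4f%n", mean(data), sampleStdDev(data));
    }
}
```

For this made-up data the mean is 5 and the sum of squared differences is 32, so the sample standard deviation is the square root of 32/7, roughly 2.14.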

This is the code of the HadoopSim class:

The constructor for this class is passed a file name (the file contains one of the smaller sets of integers). It opens the file for reading, which may throw FileNotFoundException; if that happens, the constructor should print an error message and exit the application with System.exit(). It then reads the contents of the file into an array of ints (a Scanner is recommended for reading the file), counting the numbers as it goes. The array has a size of 100, and not all of it may be used.

import java.util.Scanner;

public class HadoopSim implements Runnable
{
private final int SIZE = 100;
private int [] arrayData = new int [SIZE];
private int count = 0;
private int sum = 0;
private double sumDiffsSq = 0.0;
private String fileName;
private Scanner scan;
private double newMean = 0.0;

//Constructor to help read the file using Scanner
public HadoopSim(String filename)
{
Scanner scan = new Scanner(System.in);
}

//count is set when reading the file
public int getCount()
{
}

//sum is set in a thread by the run() method
public int getSum()
{
}

//call after all threads have been completed
public void setNewMean(double m)
{
}

//method to compute each task's sum of differences squared using the new mean
public void setSumDiffSqs()
{

}

//returns the sum of differences squared for the data in each task's array
public double getSumDiffSqs()
{
}

}
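The file-reading step that the constructor description calls for might look like the sketch below. It is only an illustration under stated assumptions: IntFileReader and readInts are hypothetical names that are not part of the skeleton, and the default file name in main is just a placeholder. It uses hasNextInt() so that reading stops cleanly at the end of the data or when the buffer is full.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Minimal sketch (hypothetical helper, not part of the assignment skeleton).
class IntFileReader {
    // Reads up to buffer.length integers from fileName into buffer
    // and returns how many numbers were read.
    static int readInts(String fileName, int[] buffer) throws FileNotFoundException {
        int count = 0;
        // try-with-resources closes the Scanner (and the file) automatically
        try (Scanner scan = new Scanner(new File(fileName))) {
            while (count < buffer.length && scan.hasNextInt()) {
                buffer[count] = scan.nextInt();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] buffer = new int[100];
        String fileName = args.length > 0 ? args[0] : "sort1.txt"; // placeholder default
        try {
            int count = readInts(fileName, buffer);
            System.out.println("count = " + count);
        } catch (FileNotFoundException e) {
            System.out.println("Error: file not found: " + fileName);
            System.exit(1);
        }
    }
}
```

In the assignment itself this logic belongs inside the HadoopSim constructor, with the error handling done there rather than in a caller.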

Following the instantiation of a HadoopSim object, the data file has been read into memory (in the int array), and the number of numbers has been calculated and stored in the count field. This class also has a method, void run(), that loops through the array of ints and computes its sum, storing the result in the sum instance variable. Thus, the run method simply adds up all of the numbers in the array, but for the purposes of this simulation that is the labor-intensive part of the work. See the documentation for Java's Thread class.

The HadoopDriver, with a main method, is responsible for:

1) prompting the user for the names of 3 files;

2) instantiating 3 HadoopSim objects;

3) instantiating 3 threads, passing each a different task;

4) starting all 3 threads using the Thread method start() (if IllegalThreadStateException is thrown, display an error message and call System.exit());

5) waiting for the 3 threads to finish their run methods, using the Thread join() method (if InterruptedException is thrown, display an error message and call System.exit());

6) adding up the total counts by calling the getCount() method of each task (see above);

7) adding up the total sums by calling the getSum() method of each task (see above);

8) computing the overall mean by dividing the result of step 7) by the result of step 6) (be careful here, as you are dividing two ints and want a double!); this mean is needed by each task in order to compute the sum of the differences squared;

9) setting the new overall mean in each task by calling the task's setNewMean(double m) method (see above);

10) getting the sum of the differences squared from each task by calling the task's getSumDiffSqs() method (see above); and then

11) computing and displaying the overall sample mean and standard deviation as described above.
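The start/join part of the list above (steps 3 to 8) can be sketched as follows. This is a simplified illustration that works on in-memory arrays instead of files; JoinSketch, SumTask, and overallMean are hypothetical names, and the data in main is copied from the sample files shown later in this question. Thread.start() throws IllegalThreadStateException only if a thread is started twice, and Thread.join() can throw the checked InterruptedException.

```java
// Minimal sketch (hypothetical names): each thread sums its own array,
// and the caller joins the threads before combining the results.
class JoinSketch {
    // Hypothetical task: sums one private array in run().
    static class SumTask implements Runnable {
        private final int[] data;
        private int sum = 0;

        SumTask(int[] data) {
            this.data = data;
        }

        @Override
        public void run() {
            int total = 0;
            for (int value : data) {
                total += value;
            }
            sum = total;
        }

        int getSum() { return sum; }
        int getCount() { return data.length; }
    }

    static double overallMean(int[][] dataSets) throws InterruptedException {
        SumTask[] tasks = new SumTask[dataSets.length];
        Thread[] threads = new Thread[dataSets.length];
        for (int i = 0; i < dataSets.length; i++) {
            tasks[i] = new SumTask(dataSets[i]);
            threads[i] = new Thread(tasks[i]);
            threads[i].start();
        }
        int totalSum = 0;
        int totalCount = 0;
        for (int i = 0; i < threads.length; i++) {
            threads[i].join(); // results are safe to read only after join() returns
            totalSum += tasks[i].getSum();
            totalCount += tasks[i].getCount();
        }
        // Cast before dividing; totalSum / totalCount alone would be int division.
        return (double) totalSum / totalCount;
    }

    public static void main(String[] args) throws InterruptedException {
        int[][] dataSets = {
            {100, 90, 75, 80},
            {99, 92, 60, 75, 70},
            {99, 92, 60, 75, 70, 50}
        };
        System.out.printf("overall mean = %f%n", overallMean(dataSets));
    }
}
```

Reading getSum() before join() returns would race with the worker thread, which is why the gathering loop joins each thread first. No synchronized keyword is needed because each task touches only its own array.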

Sample output shown below

In summary, once a HadoopSim object has been instantiated and run as a thread, the count and sum are correctly set for the data in its array of integers. Note that the HadoopSim constructor handles opening the file, reading the data into memory (the int array), and setting the count field. Further note that the computation-intensive work is done in the run() method (i.e., in a separate thread). Lastly, note that you will be heavily penalized if your solution does not observe this division of labor. The main routine will gather up the results of each thread and produce the overall mean and standard deviation of the larger set (i.e., the union of the 3 smaller sets). This exercise's architecture is consistent with the Hadoop technique: different threads or computers do the intensive computation on several smaller data sets, and the main routine receives their results in order to calculate the overall results.

Sample output, using the following data files:

sort1.txt:

100 90 75 80

sort2.txt:

99 92 60 75 70

sort3.txt:

99 92 60 75 70 50

Your output (with a couple of debug statements thrown in) might be:

Enter the name of the first file

sort1.txt

Enter the name of the second file

sort2.txt

Enter the name of the third file

sort3.txt

task1 returns count = 4, sum = 345

task2 returns count = 5, sum = 396

task3 returns count = 6, sum = 446

Driver newMean = 79.133333

Driver sumDiffsSq: 571.337778 1026.822222 1875.573333

Totals sum = 1187, count = 15, sumDiffsSq = 3473.733333

Average = 79.13

Standard Deviation = 15.75
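As a check on the sample output above, the totals can be combined by hand using the sample (N - 1) formula. Here x̄ is the overall mean and s is the sample standard deviation; all values are taken from the debug lines shown.

```latex
\[
\bar{x} = \frac{345 + 396 + 446}{4 + 5 + 6} = \frac{1187}{15} \approx 79.13
\]
\[
s = \sqrt{\frac{571.337778 + 1026.822222 + 1875.573333}{15 - 1}}
  = \sqrt{\frac{3473.733333}{14}} \approx 15.75
\]
```

Dividing by N = 15 instead of N - 1 = 14 would give about 15.22, so a result of 15.22 indicates that the population formula was used by mistake.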

Solutions

Expert Solution

Here is a solution to the question above:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class HadoopDriver {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        String[] ordinals = {"first", "second", "third"};
        HadoopSim[] tasks = new HadoopSim[3];
        Thread[] threads = new Thread[3];

        // Steps 1-3: prompt for each file name, build a task, and wrap it in a thread
        for (int i = 0; i < tasks.length; i++) {
            System.out.println("Enter the name of the " + ordinals[i] + " file");
            tasks[i] = new HadoopSim(in.nextLine().trim());
            threads[i] = new Thread(tasks[i]);
        }

        // Step 4: start all threads
        try {
            for (Thread t : threads) {
                t.start();
            }
        } catch (IllegalThreadStateException e) {
            System.out.println("Error: could not start a thread");
            System.exit(1);
        }

        // Step 5: wait for every thread to finish before reading any results
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            System.out.println("Error: interrupted while waiting for the threads");
            System.exit(1);
        }

        // Steps 6-7: add up the counts and sums
        int totalCount = 0;
        int totalSum = 0;
        for (int i = 0; i < tasks.length; i++) {
            System.out.printf("task%d returns count = %d, sum = %d%n",
                    i + 1, tasks[i].getCount(), tasks[i].getSum());
            totalCount += tasks[i].getCount();
            totalSum += tasks[i].getSum();
        }
        if (totalCount < 2) {
            System.out.println("Error: need at least two numbers in total");
            System.exit(1);
        }

        // Step 8: overall mean (cast so that this is not int division)
        double newMean = (double) totalSum / totalCount;
        System.out.printf("Driver newMean = %f%n", newMean);

        // Steps 9-10: give each task the overall mean and collect its sum of squared differences
        double totalSumDiffsSq = 0.0;
        StringBuilder line = new StringBuilder("Driver sumDiffsSq:");
        for (HadoopSim task : tasks) {
            task.setNewMean(newMean);
            task.setSumDiffSqs();
            line.append(String.format(" %f", task.getSumDiffSqs()));
            totalSumDiffsSq += task.getSumDiffSqs();
        }
        System.out.println(line);
        System.out.printf("Totals sum = %d, count = %d, sumDiffsSq = %f%n",
                totalSum, totalCount, totalSumDiffsSq);

        // Step 11: sample standard deviation divides by N - 1
        System.out.printf("Average = %.2f%n", newMean);
        System.out.printf("Standard Deviation = %.2f%n",
                Math.sqrt(totalSumDiffsSq / (totalCount - 1)));
    }
}

class HadoopSim implements Runnable
{
    private final int SIZE = 100;
    private int [] arrayData = new int [SIZE];
    private int count = 0;
    private int sum = 0;
    private double sumDiffsSq = 0.0;
    private String fileName;
    private double newMean = 0.0;

    //Constructor: reads up to SIZE integers from the file and sets count
    public HadoopSim(String filename)
    {
        fileName = filename;
        try (Scanner scan = new Scanner(new File(fileName))) {
            while (count < SIZE && scan.hasNextInt()) {
                arrayData[count] = scan.nextInt();
                count++;
            }
        } catch (FileNotFoundException e) {
            System.out.println("Error: file not found: " + fileName);
            System.exit(1);
        }
    }

    //runs in its own thread: adds up the numbers read by the constructor
    @Override
    public void run()
    {
        int total = 0;
        for (int i = 0; i < count; i++)
        {
            total += arrayData[i];
        }
        sum = total;
    }

    //count is set when reading the file
    public int getCount()
    {
        return count;
    }

    //sum is set in a thread by the run() method
    public int getSum()
    {
        return sum;
    }

    //call after all threads have been completed
    public void setNewMean(double m)
    {
        newMean = m;
    }

    //method to compute each task's sum of differences squared using the new mean
    public void setSumDiffSqs()
    {
        double x = 0.0;
        //loop over the count values actually read, so that zeros in the data are not skipped
        for (int i = 0; i < count; i++)
        {
            double diff = arrayData[i] - newMean;
            x += diff * diff;
        }
        sumDiffsSq = x;
    }

    //returns the sum of differences squared for the data in each task's array
    public double getSumDiffSqs()
    {
        return sumDiffsSq;
    }

}

Run the program from the directory that contains the .txt files, or enter full paths at the prompts.

Test case:

Input (put this data in the three files):

sort1.txt:

100 90 75 80

sort2.txt:

99 92 60 75 70

sort3.txt:

99 92 60 75 70 50

Output (the file names after each prompt are typed by the user):

Enter the name of the first file
sort1.txt
Enter the name of the second file
sort2.txt
Enter the name of the third file
sort3.txt
task1 returns count = 4, sum = 345
task2 returns count = 5, sum = 396
task3 returns count = 6, sum = 446
Driver newMean = 79.133333
Driver sumDiffsSq: 571.337778 1026.822222 1875.573333
Totals sum = 1187, count = 15, sumDiffsSq = 3473.733333
Average = 79.13
Standard Deviation = 15.75

venereology answered 2 years ago
