Question

In: Statistics and Probability

You have n numbers and you want to find a number x (out of the n)...

You have n numbers and you want to find a number x (out of the n) such that x is larger than the median. You can create an algorithim that takes time O(nlogn): sort the n numbers and then report any number that is larger than the element in position n2 of the sorted array. You can also create an algo in O(n) time, by finding the median in linear time and then doing a linear scan to find a number larger than the median.

Now I need to design a procedure that solves the same problem in time O(k), where k < n. The procedure will be probabilistic, i.e., it will succeed in finding the number larger than the median with probability that depends on k. FInd this dependency.

Solutions

Expert Solution

Given an array of size n, find all elements in array that appear more than n/k times

A simple method is to pick all elements one by one. For every picked element, count its occurrences by traversing the array, if count becomes more than n/k, then print the element. Time Complexity of this method would be O(n2).

A better solution is to use sorting. First, sort all elements using a O(nLogn) algorithm. Once the array is sorted, we can find all required elements in a linear scan of array. So overall time complexity of this method is O(nLogn) + O(n) which is O(nLogn).

Following is an interesting O(nk) solution:
We can solve the above problem in O(nk) time using O(k-1) extra space. Note that there can never be more than k-1 elements in output (Why?). There are mainly three steps in this algorithm.

1) Create a temporary array of size (k-1) to store elements and their counts (The output elements are going to be among these k-1 elements). Following is structure of temporary array elements.

struct eleCount { int element; int count; }; struct eleCount temp[]; 

This step takes O(k) time.

2) Traverse through the input array and update temp[] (add/remove an element or increase/decrease count) for every traversed element. The array temp[] stores potential (k-1) candidates at every step. This step takes O(nk) time.

3) Iterate through final (k-1) potential candidates (stored in temp[]). or every element, check if it actually has count more than n/k. This step takes O(nk) time.

The main step is step 2, how to maintain (k-1) potential candidates at every point? The steps used in step 2 are like famous game: Tetris. We treat each number as a piece in Tetris, which falls down in our temporary array temp[]. Our task is to try to keep the same number stacked on the same column (count in temporary array is incremented).

Consider k = 4, n = 9 Given array: 3 1 2 2 2 1 4 3 3 i = 0 3 _ _ temp[] has one element, 3 with count 1 i = 1 3 1 _ temp[] has two elements, 3 and 1 with counts 1 and 1 respectively i = 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 1 respectively. i = 3 - - 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively. i = 4 - - 2 - - 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 3 respectively. i = 5 - - 2 - 1 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 2 and 3 respectively. 

Now the question arises, what to do when temp[] is full and we see a new element – we remove the bottom row from stacks of elements, i.e., we decrease count of every element by 1 in temp[]. We ignore the current element.

i = 6 - - 2 - 1 2 temp[] has two elements, 1 and 2 with counts as 1 and 2 respectively. i = 7 - 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively. i = 8 3 - 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 2, 1 and 2 respectively.

Finally, we have at most k-1 numbers in temp[]. The elements in temp are {3, 1, 2}. Note that the counts in temp[] are useless now, the counts were needed only in step 2. Now we need to check whether the actual counts of elements in temp[] are more than n/k (9/4) or not. The elements 3 and 2 have counts more than 9/4. So we print 3 and 2.

Note that the algorithm doesn’t miss any output element. There can be two possibilities, many occurrences are together or spread across the array. If occurrences are together, then count will be high and won’t become 0. If occurrences are spread, then the element would come again in temp[]. Following is C++ implementation of above algorithm.

// A C++ program to print elements with count more than n/k

#include<iostream>

using namespace std;

// A structure to store an element and its current count

struct eleCount

{

    int e; // Element

    int c; // Count

};

// Prints elements with more than n/k occurrences in arr[] of

// size n. If there are no such elements, then it prints nothing.

void moreThanNdK(int arr[], int n, int k)

{

    // k must be greater than 1 to get some output

    if (k < 2)

       return;

    /* Step 1: Create a temporary array (contains element

       and count) of size k-1. Initialize count of all

       elements as 0 */

    struct eleCount temp[k-1];

    for (int i=0; i<k-1; i++)

        temp[i].c = 0;

    /* Step 2: Process all elements of input array */

    for (int i = 0; i < n; i++)

    {

        int j;

        /* If arr[i] is already present in

           the element count array, then increment its count */

        for (j=0; j<k-1; j++)

        {

            if (temp[j].e == arr[i])

            {

                 temp[j].c += 1;

                 break;

            }

        }

        /* If arr[i] is not present in temp[] */

        if (j == k-1)

        {

            int l;

             

            /* If there is position available in temp[], then place

              arr[i] in the first available position and set count as 1*/

            for (l=0; l<k-1; l++)

            {

                if (temp[l].c == 0)

                {

                    temp[l].e = arr[i];

                    temp[l].c = 1;

                    break;

                }

            }

            /* If all the position in the temp[] are filled, then

               decrease count of every element by 1 */

            if (l == k-1)

                for (l=0; l<k; l++)

                    temp[l].c -= 1;

        }

    }

    /*Step 3: Check actual counts of potential candidates in temp[]*/

    for (int i=0; i<k-1; i++)

    {

        // Calculate actual count of elements

        int ac = 0; // actual count

        for (int j=0; j<n; j++)

            if (arr[j] == temp[i].e)

                ac++;

        // If actual count is more than n/k, then print it

        if (ac > n/k)

           cout << "Number:" << temp[i].e

                << " Count:" << ac << endl;

    }

}

/* Driver program to test above function */

int main()

{

    cout << "First Test\n";

    int arr1[] = {4, 5, 6, 7, 8, 4, 4};

    int size = sizeof(arr1)/sizeof(arr1[0]);

    int k = 3;

    moreThanNdK(arr1, size, k);

    cout << "\nSecond Test\n";

    int arr2[] = {4, 2, 2, 7};

    size = sizeof(arr2)/sizeof(arr2[0]);

    k = 3;

    moreThanNdK(arr2, size, k);

    cout << "\nThird Test\n";

    int arr3[] = {2, 7, 2};

    size = sizeof(arr3)/sizeof(arr3[0]);

    k = 2;

    moreThanNdK(arr3, size, k);

    cout << "\nFourth Test\n";

    int arr4[] = {2, 3, 3, 2};

    size = sizeof(arr4)/sizeof(arr4[0]);

    k = 3;

    moreThanNdK(arr4, size, k);

    return 0;

}


Related Solutions

QUESTION ONE You want to find out the performance of county XYZ and you have been...
QUESTION ONE You want to find out the performance of county XYZ and you have been given the following   secondary data in tabular form of the country’s Gross Domestic Product (GDP in K’m), broken down into three different categories, for three separate years. CATEGORY 2016 2017 2018 Public Expenditure 3 5 15 Mining 12 15 30 Other 18 25 55 Total GDP 33 45 100 Present, illustrate, interpret and analyse the information using a) A Pictogram A bar chart A...
algorithm binarySearch input bottom, top: a number    a: array a[0] to a[n-1] of numbers    x: a...
algorithm binarySearch input bottom, top: a number    a: array a[0] to a[n-1] of numbers    x: a number output result: true if x in a false otherwise Sideeffect NA Plan if (top < bottom) result := false else // get the middle of the array (refining) middle := (top - bottom)/2 + bottom (refining by +1 or -1 or integer division …) // example what is midpoint between 6 and 8 // (8-6)/2 = 1 we need to add 6 to...
A student must answer 7 out of 10 questions on an exam. Find the number n...
A student must answer 7 out of 10 questions on an exam. Find the number n of choices (a) if there are no restrictions, (b) if the student must answer the first two questions, (c) the first or second question but not both. Explain C)
Write a C++ program to read N numbers. Find sum, product, and average of N numbers
Write a C++ program to read N numbers. Find sum, product, and average of N numbers
a-Assume you have a stack of integers. You want to organize it such that all numbers...
a-Assume you have a stack of integers. You want to organize it such that all numbers that are smaller than -100 go to the bottom, numbers between -100 and +100 in the middle and numbers larger than 100 in the top. Write a Java code that uses no more than three additional Stacks to solve the problem. NB: You do not need to write the code for Stacks, you are using a Stack from the library with a name SStack...
A triangular number is the sum of the n natural numbers from 1 to n. For...
A triangular number is the sum of the n natural numbers from 1 to n. For example: The triangular number for 3 is 1 + 2 + 3 = 6 The triangular number for 7 is 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28 Write a program segment (using a loop), that calculates and then prints the integer n and its triangular number.
You want to find out the percentage of community college students that own a pet. You...
You want to find out the percentage of community college students that own a pet. You decide to email five students in one of your classes and ask them if they own any pets. (a) What is the population in this example? (b) What is the sample? (c) What is the parameter of interest? (d) Suppose three of the five students tell you they own a pet, while the other two say they do not. i. Is this numerical or...
You have to use the numbers that is given to figure out what the question is...
You have to use the numbers that is given to figure out what the question is asking. 1. You are the Nutrition and Food Services Director and your Chief Financial Officer (CFO) has requested that you evaluate the inventory within the department. Specifically, the CFO wishes to know if the facility is effectively managing the inventory.    To accomplish this task, you will evaluate the inventory turnover from the previous quarter. You have determined the following information: Inventory value at the...
1. As an investor of a company, you want to find out your ROI provided by...
1. As an investor of a company, you want to find out your ROI provided by the company at the end of the current year.   Which of the following formulas would you use? Group of answer choices Net Income divided by the Average Asset of the company. Net Income divided by the Sales amount of the company. Net Income divided by the Equity amount of the company's ending Balance Sheet. Net Income divided by the average amount of the Total...
Let f(x)=x • 3^x a) Find formula for f^(n) •(x) for natural n (the n order...
Let f(x)=x • 3^x a) Find formula for f^(n) •(x) for natural n (the n order derivative). b) Write the Taylor series generated by f(x) in 0.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT