Question

Write pseudo-code to solve the following problem using MapReduce and explain how it works.

Each line in the file lists a user ID, the ID of the movie the user watched, the rating the user gave the movie, and the timestamp. For example, line 1 indicates that the user’s ID is 196, the movie ID is 242, the user gave this movie a rating of 3, and the timestamp is 881250949. Given the file, find the top similar movies for each movie. Hint: Similarity is defined as how similarly two movies are rated by the same user, across all users who have watched both movies.

196 242 3 881250949

186 302 3 891717742

22 377 1 878887116

244 51 2 880606923

166 346 1 886397596

298 474 4 884182806

Solutions

Expert Solution

Solution:
        To solve this problem we can chain two MapReduce jobs:
        the output of the first job is given as input to the second job.

        Step-1 The input data is given to the first job, and the output
               of this job becomes the input of the second job.

        Step-2 The second job waits for the first job to finish, takes its
               output as input, and produces the final output.
MapReduce Job-1 : Collects, for each user, all the items that user rated.
MapReduce Job-2 : Computes the similarity between each pair of items using a correlation formula.
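
To make the chaining concrete, here is a minimal in-memory sketch in Python. It is not a real Hadoop driver: run_job and run_chain are hypothetical helper names, and the second job here is only a toy (it counts how many users rated each item) so the hand-off between jobs is easy to see.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Run one MapReduce job in memory: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # "shuffle": group values by key
    output = []
    for out_key in sorted(groups):
        output.extend(reducer(out_key, groups[out_key]))
    return output

def run_chain(records, jobs):
    """Feed the output of each job into the next one."""
    for mapper, reducer in jobs:
        records = run_job(records, mapper, reducer)
    return records

# Job 1: group (item_id, rating) pairs by user.
def map1(offset, line):
    user_id, item_id, rating, _timestamp = line.split()
    yield user_id, (item_id, int(rating))

def reduce1(user_id, values):
    yield user_id, list(values)

# Toy job 2 (assumption, for illustration only): count raters per item.
def map2(user_id, postings):
    for item_id, _rating in postings:
        yield item_id, 1

def reduce2(item_id, counts):
    yield item_id, sum(counts)

lines = [(0, "1 10 3 0"), (1, "1 20 4 0"), (2, "2 10 5 0")]
result = run_chain(lines, [(map1, reduce1), (map2, reduce2)])
```

The second job never sees the raw file, only whatever the first job emitted, which is the whole idea of chaining.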

MAPREDUCE JOB 1:

Algorithm-1
        Map-1's job is to emit the user_id as key and the (item_id, rating) pair as value.
        
        Input:-
                key- line offset
                value- row of input file containing (user_id, item_id, rating, timestamp)
        Output:- 
                Key- user_id
                Value- (item_id, rating) pair
        Require: Input dataset containing user_id, item_id, rating and timestamp fields

        Procedure:
                user_id, item_id, rating, timestamp = line.split()
                key = user_id
                value = (item_id, rating)
                emit(key, value)
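
The Map-1 step could look roughly like this in Python. This is a sketch only: the function name map_user_ratings is made up here, and it assumes each line holds four whitespace-separated fields as in the sample data, with the timestamp simply ignored.

```python
def map_user_ratings(offset, line):
    """Map-1 sketch: emit (user_id, (item_id, rating)) for one input line."""
    fields = line.split()  # splits on tabs or spaces
    if len(fields) != 4:
        return  # skip blank or malformed lines
    user_id, item_id, rating, _timestamp = fields
    yield user_id, (item_id, float(rating))

emitted = list(map_user_ratings(0, "196 242 3 881250949"))
```

Converting the rating to a number in the mapper means the later steps can add and multiply ratings without parsing them again.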


Algorithm-2
        Reducer-1's job is to emit, for each user, one row containing that user's "postings" ((item_id, rating) pairs).
        Input:-
                key- user_id
                value- sequence of (item_id, rating) pairs
        Output:-
                Key- user_id
                Value- (item_count, item_sum, list of all (item_id, rating) postings of the user)
        Procedure:
                item_count = 0
                item_sum = 0
                final = []
                for item_id, rating in values
                {
                        item_count += 1
                        item_sum += rating
                        final.append((item_id, rating))
                }
                Key = user_id
                Value = (item_count, item_sum, final)
                Emit(key, value)
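
As a rough Python sketch of Reducer-1 (the function name reduce_user_postings is an assumption, and ratings are assumed to arrive already converted to numbers), the key point is that the row is emitted once per user, after all of that user's ratings have been collected:

```python
def reduce_user_postings(user_id, values):
    """Reducer-1 sketch: collect all (item_id, rating) pairs of one user."""
    item_count = 0
    item_sum = 0.0
    postings = []
    for item_id, rating in values:
        item_count += 1
        item_sum += rating
        postings.append((item_id, rating))
    # Emit a single row per user once every rating has been seen.
    yield user_id, (item_count, item_sum, postings)

row = list(reduce_user_postings("196", [("242", 3.0), ("302", 4.0)]))
```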


MAPREDUCE JOB 2:

Algorithm-3
        Map-2: Drops the user from the key entirely; instead it emits each pair of items rated by the same user as the key.
        Job-2: Requires the output of Job-1 as its input.
        Input:-
                key- user_id
                value- row containing all the postings of the user: (item_count, item_sum, list of (item_id, rating))
        Output:-
                Key- (item_id, item_id)
                Value- (rating, rating)
        Procedure:
                item_count, item_sum, ratings = values
                // sort by item_id so that every user emits a given pair in the same order
                for item1, item2 in combinations(sorted(ratings), 2)
                {
                        key = (item1[0], item2[0])
                        value = (item1[1], item2[1])
                        Emit(key, value)
                }
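
A possible Python version of Map-2 is sketched below, using itertools.combinations from the standard library. The function name map_item_pairs is hypothetical. The postings are sorted first, an assumption added here so that two users who rated the same two movies always produce the same key rather than (a, b) for one and (b, a) for the other.

```python
from itertools import combinations

def map_item_pairs(user_id, value):
    """Map-2 sketch: emit ((item_a, item_b), (rating_a, rating_b)) per user."""
    _item_count, _item_sum, postings = value
    # Sorting by item_id gives every pair a canonical order across users.
    for (item_a, rating_a), (item_b, rating_b) in combinations(sorted(postings), 2):
        yield (item_a, item_b), (rating_a, rating_b)

pairs = list(map_item_pairs("196", (3, 9.0, [("302", 4.0), ("242", 3.0), ("51", 2.0)])))
```

Note that a user with k ratings produces k(k-1)/2 pairs, so this step is where most of the intermediate data is generated.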


Algorithm-4
        Reduce-2: Sums the components of each co-rating pair across all users who rated both item x and
                          item y, then calculates the correlation similarity.
        Input:-
                key- (item_id, item_id)
                value- sequence of rating pairs (rating, rating)
        Output:-
                Key- (item_id, item_id)
                Value- (similarity, n)
        Procedure
                sum_xx, sum_xy, sum_yy, sum_x, sum_y, n = (0.0, 0.0, 0.0, 0.0, 0.0, 0)
                item_pair, co_ratings = pair_key, lines
                item_xname, item_yname = item_pair
                for item_x, item_y in co_ratings:
                {
                        sum_xx += item_x * item_x
                        sum_yy += item_y * item_y
                        sum_xy += item_x * item_y
                        sum_x += item_x
                        sum_y += item_y
                        n += 1
                }
                similarity = normalized_correlation(n, sum_xy, sum_x, sum_y, sum_xx, sum_yy)
                Key = (item_xname, item_yname)
                Value = (similarity, n)
                Emit(key, value)

        The top similar movies for a given movie are then the pairs containing that movie,
        ordered from highest to lowest similarity.
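
The answer does not spell out normalized_correlation, so the Python sketch below assumes it is the Pearson correlation computed from the running sums. The names reduce_pair_similarity and top_similar are made up for this example, and top_similar is a simple post-processing helper (not part of the two jobs) that reads off the k best neighbours of one movie.

```python
from math import sqrt

def normalized_correlation(n, sum_xy, sum_x, sum_y, sum_xx, sum_yy):
    """Pearson correlation from running sums (assumed definition)."""
    numerator = n * sum_xy - sum_x * sum_y
    var_x = max(0.0, n * sum_xx - sum_x * sum_x)  # clamp rounding noise
    var_y = max(0.0, n * sum_yy - sum_y * sum_y)
    denominator = sqrt(var_x) * sqrt(var_y)
    return numerator / denominator if denominator else 0.0

def reduce_pair_similarity(item_pair, co_ratings):
    """Reduce-2 sketch: emit (item_pair, (similarity, n)) once per pair."""
    sum_xx = sum_xy = sum_yy = sum_x = sum_y = 0.0
    n = 0
    for x, y in co_ratings:
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y
        sum_x += x
        sum_y += y
        n += 1
    similarity = normalized_correlation(n, sum_xy, sum_x, sum_y, sum_xx, sum_yy)
    yield item_pair, (similarity, n)

def top_similar(pair_similarities, item_id, k=10):
    """Post-processing sketch: the k most similar items to item_id."""
    neighbours = []
    for (item_a, item_b), (similarity, n) in pair_similarities:
        if item_id == item_a:
            neighbours.append((item_b, similarity, n))
        elif item_id == item_b:
            neighbours.append((item_a, similarity, n))
    return sorted(neighbours, key=lambda t: t[1], reverse=True)[:k]

perfect = list(reduce_pair_similarity(("1", "2"), [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]))
```

Keeping n in the output is useful because a high similarity based on only two or three common raters is much less trustworthy than one based on many.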
