Question

Write pseudo-code to solve the following problem using MapReduce and explain how it works.

Each line in the file lists a user ID, the ID of the movie the user watched, the rating the user gave the movie, and a timestamp. For example, line 1 indicates that the user’s ID is 196, the movie ID is 242, the user gave this movie a rating of 3, and the timestamp is 881250949. Given the file, find the top similar movies for each movie. Hint: Similarity is defined as how similarly two movies are rated by the same user, across all users who have watched both movies.

196 242 3 881250949

186 302 3 891717742

22 377 1 878887116

244 51 2 880606923

166 346 1 886397596

298 474 4 884182806

Solutions

Expert Solution

Solution:
        This problem can be solved by chaining two MapReduce jobs:
        the output of the first job is given as the input to the second job.
        
        Step-1 The data file is given as input to the first job, and the
               output of this step becomes the input to the second step.

        Step-2 The second job waits for the first job to finish, takes its
               output as input, and produces the final output.

MapReduce Job-1: collects all the items each user has rated.
MapReduce Job-2: computes the similarity between each pair of items using the correlation formula.
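
The chaining described above can be tried out in memory before moving to a real cluster. The following is only a minimal sketch: `run_job`, `map1`, `reduce1`, `map2` and `reduce2` are hypothetical names, the shuffle phase is simulated with a dictionary, and the second job here simply counts co-rating users per item pair as a stand-in for the similarity calculation.

```python
from collections import defaultdict
from itertools import combinations

def run_job(records, mapper, reducer):
    """Simulate one MapReduce job: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    for out_key in sorted(groups):
        output.extend(reducer(out_key, groups[out_key]))
    return output

# Job 1: group (item, rating) pairs by user.
def map1(offset, line):
    user_id, item_id, rating, _timestamp = line.split()
    yield user_id, (item_id, int(rating))

def reduce1(user_id, postings):
    yield user_id, sorted(postings)

# Job 2 (simplified stand-in): count users who rated both items of a pair.
def map2(user_id, postings):
    for (item_a, _), (item_b, _) in combinations(postings, 2):
        yield (item_a, item_b), 1

def reduce2(pair, ones):
    yield pair, sum(ones)

# Made-up sample rows in the "user item rating timestamp" layout.
lines = ["1 10 5 0", "1 20 3 0", "2 10 4 0", "2 20 2 0", "2 30 1 0"]

job1_out = run_job(enumerate(lines), map1, reduce1)  # first job
job2_out = run_job(job1_out, map2, reduce2)          # second job reads job 1's output
print(job2_out)
```

The only coupling between the two jobs is that the second call receives the list returned by the first, which mirrors the wait-then-consume behaviour of Step-2.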

MAPREDUCE JOB 1:

Algorithm-1
        Map-1's job is to emit the user_id as the key and the (item_id, rating) pair as the value.
        
        Input:-
                key- line offset
                value- one row of the input file (user_id, item_id, rating, timestamp)
        Output:- 
                Key- user_id
                Value- (item_id, rating) pair
        Require: Input dataset containing user_id, item_id, rating and timestamp fields

        Procedure:
                # split on whitespace; the timestamp is not needed
                user_id, item_id, rating, timestamp = line.split()
                key = user_id
                value = (item_id, rating)
                emit(key, value)
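
For illustration, the first mapper could look roughly like this in Python. This is a sketch, not a framework-specific implementation: `map_ratings` is a hypothetical name, the row is assumed to hold four whitespace-separated fields (user, movie, rating, timestamp) as in the sample data, and `yield` stands in for `emit`.

```python
def map_ratings(offset, line):
    """Map-1: parse one input row and emit (user_id, (item_id, rating))."""
    fields = line.split()          # splits on tabs or spaces
    if len(fields) != 4:
        return                     # assumption: silently skip malformed rows
    user_id, item_id, rating, _timestamp = fields
    yield user_id, (item_id, float(rating))

sample = list(map_ratings(0, "196 242 3 881250949"))
print(sample)
```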


Algorithm-2
        Reducer-1's job is to emit, for each user, one row containing that user's "postings" (item_id, rating pairs).
        Input:-
                key- user_id
                value- sequence of (item_id, rating) pairs
        Output:-
                Key- user_id
                Value- (item_count, item_sum, list of all (item_id, rating) postings of the user)
        Procedure:
                item_count = 0
                item_sum = 0
                final = []
                for item_id, rating in values
                {
                        item_count += 1
                        item_sum += rating
                        final.append((item_id, rating))
                }
                # emit once per user, after all postings have been collected
                key = user_id
                value = (item_count, item_sum, final)
                emit(key, value)
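
A possible Python rendering of the first reducer is sketched below. `reduce_user` is a hypothetical name, and the sketch assumes the row should be emitted once per user, after every (item_id, rating) pair for that user has been seen.

```python
def reduce_user(user_id, values):
    """Reducer-1: collect all (item_id, rating) postings of one user."""
    item_count = 0
    item_sum = 0.0
    postings = []
    for item_id, rating in values:
        item_count += 1
        item_sum += rating
        postings.append((item_id, rating))
    # One output row per user, emitted after the loop has finished.
    yield user_id, (item_count, item_sum, postings)

result = list(reduce_user("196", [("242", 3.0), ("302", 4.0)]))
print(result)
```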


MAPREDUCE JOB 2:

Algorithm-3
        Map-2: Drops the user from the key entirely; instead, it emits each pair of items the user rated as the key.
        Require: Output of Job-1 as input to Job-2.
        Input:-
                key- user_id
                value- row containing all the postings of the user: (item_count, item_sum, list of (item_id, rating))
        Output:-
                Key- (item_id, item_id)
                Value- (rating, rating)
        Procedure:
                item_count, item_sum, ratings = value
                # sort by item_id so that every user emits a given pair in the same order
                for item1, item2 in combinations(sorted(ratings), 2)
                {
                        key = (item1[0], item2[0])
                        value = (item1[1], item2[1])
                        emit(key, value)
                }
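
The pair-generation step can be sketched with `itertools.combinations` from the Python standard library. `map_pairs` is a hypothetical name. Sorting the postings before pairing is an assumption added here, so that two users who rated the same two movies produce the same key regardless of the order in which their ratings arrived.

```python
from itertools import combinations

def map_pairs(user_id, value):
    """Map-2: emit ((item_a, item_b), (rating_a, rating_b)) for one user."""
    _item_count, _item_sum, postings = value
    # Assumption: sort so that each item pair always appears in one canonical order.
    for (item_a, rating_a), (item_b, rating_b) in combinations(sorted(postings), 2):
        yield (item_a, item_b), (rating_a, rating_b)

pairs = list(map_pairs("196", (3, 9.0, [("302", 4.0), ("242", 3.0), ("51", 2.0)])))
print(pairs)
```

Because the item IDs are strings in this sketch, the ordering is lexicographic ("51" sorts after "302"); any consistent ordering works, since it only has to agree across users.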


Algorithm-4
        Reduce-2: Sum the components of each co-rating pair across all users who rated both item x and
                          item y, then calculate the correlation similarity.
        Input:-
                key- (item_id, item_id)
                value- sequence of rating pairs (rating, rating)
        Output:-
                Key- (item_id, item_id)
                Value- (similarity, n)
        Procedure:
                sum_xx, sum_xy, sum_yy, sum_x, sum_y, n = (0.0, 0.0, 0.0, 0.0, 0.0, 0)
                item_pair, co_ratings = pair_key, lines
                item_xname, item_yname = item_pair
                for item_x, item_y in co_ratings
                {
                        sum_xx += item_x * item_x
                        sum_yy += item_y * item_y
                        sum_xy += item_x * item_y
                        sum_x += item_x
                        sum_y += item_y
                        n += 1
                }
                # compute the similarity once, after all co-ratings have been summed
                similarity = normalized_correlation(n, sum_xy, sum_x, sum_y, sum_xx, sum_yy)
                key = (item_xname, item_yname)
                value = (similarity, n)
                emit(key, value)
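
The pseudo-code does not define `normalized_correlation`. The sketch below assumes it is the Pearson correlation coefficient computed from the running sums, and that a pair with no variance in either movie's ratings gets a similarity of 0.0; both are assumptions, and `reduce_similarity` is a hypothetical name.

```python
from math import sqrt

def normalized_correlation(n, sum_xy, sum_x, sum_y, sum_xx, sum_yy):
    """Pearson correlation from running sums (assumed definition)."""
    numerator = n * sum_xy - sum_x * sum_y
    var_x = max(0.0, n * sum_xx - sum_x * sum_x)
    var_y = max(0.0, n * sum_yy - sum_y * sum_y)
    denominator = sqrt(var_x * var_y)
    # Assumption: report 0.0 when the correlation is undefined.
    return numerator / denominator if denominator else 0.0

def reduce_similarity(pair_key, co_ratings):
    """Reduce-2: accumulate the sums for one item pair, then emit once."""
    sum_xx = sum_xy = sum_yy = sum_x = sum_y = 0.0
    n = 0
    for x, y in co_ratings:
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y
        sum_x += x
        sum_y += y
        n += 1
    similarity = normalized_correlation(n, sum_xy, sum_x, sum_y, sum_xx, sum_yy)
    yield pair_key, (similarity, n)

same = list(reduce_similarity(("242", "302"), [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]))
opposite = list(reduce_similarity(("242", "51"), [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]))
print(same, opposite)
```

To get the top similar movies for each movie, the emitted pairs would still have to be grouped by each item and sorted by similarity; that ranking step is not covered by the four algorithms above.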
