Question

All along we have been discussing processing similar kinds of data in a MapReduce job; in other words, lots of data of the same kind. But sometimes you have to process two different kinds of data sets together, join them, and produce a joined output. This is called a join. We take joins for granted in an RDBMS, but they take effort to implement in MapReduce. Research the various join techniques/algorithms available for MapReduce. In your opinion, what are the pros and cons of the various MapReduce join options? Please put it in your own words rather than paraphrasing sentences from websites.

Solutions

Expert Solution

Google developed MapReduce to process enormous amounts of unstructured or semi-structured data. The MapReduce programming framework is used to perform distributed, parallel processing of huge data sets in a distributed environment. Map and Reduce are the two distinct tasks of a MapReduce program. First, in the map phase, the input data is read and key-value pairs are produced from it. These key-value pairs are then fed into the reduce task, which aggregates them into a smaller set of values, producing the final output. Thus, the reduce step always runs after the map step has finished. This model makes it very easy to scale data processing over many computing nodes.

The three stages of a MapReduce program are as follows:

  1. Map Stage
  2. Shuffle Stage
  3. Reduce Stage
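The map, shuffle, and reduce flow described above can be illustrated with a small in-memory word-count sketch. It only simulates the three stages on a single machine; the function names (map_words, shuffle, reduce_counts, run_job) are hypothetical and are not part of Hadoop's actual Java API.

```python
from collections import defaultdict
from typing import Iterable, Iterator


# Map stage: read one input record (a line of text) and emit (key, value) pairs.
def map_words(line: str) -> Iterator[tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


# Shuffle stage: group every emitted value by its key and sort the keys,
# as the framework does before handing data to the reducers.
def shuffle(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, list[int]]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


# Reduce stage: aggregate all values for one key into a smaller result.
def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def run_job(lines: list[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_job(docs))
```

Note that the reducer never sees raw input lines, only the grouped output of the shuffle, which is why the reduce stage can only start once the map output is available.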

ADVANTAGES / PROS:

1. Scalability

Hadoop is a highly scalable platform, largely because of its ability to store and distribute huge data sets across many servers. These servers are inexpensive commodity machines that can work in parallel, and the processing power of the system can be increased simply by adding more servers. Traditional relational database management systems (RDBMS) were not able to scale in this way to process enormous data sets.
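As a rough illustration of how the work per server shrinks as servers are added, the sketch below assigns each key to one of N servers by hashing it. It is loosely modeled on Hadoop's default HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; here zlib.crc32 stands in for Java's hashCode(), and the helper names partition and load_per_partition are hypothetical.

```python
import zlib
from collections import Counter


def partition(key: str, num_partitions: int) -> int:
    # Deterministic hash: Python's built-in hash() of a str is randomized per process,
    # so crc32 is used instead. crc32 returns a non-negative int in Python 3.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def load_per_partition(keys: list[str], num_partitions: int) -> list[int]:
    # Count how many records each server (partition) would receive.
    counts = Counter(partition(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(10_000)]
    for servers in (2, 4, 8):
        loads = load_per_partition(keys, servers)
        print(servers, "servers -> most records on one server:", max(loads))
```

Because every record with the same key lands on the same partition, each server can process its share independently, and doubling the number of servers roughly halves the records each one handles.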

2. Flexibility

The Hadoop MapReduce programming model offers the flexibility to process structured or unstructured data, so business organizations can use it to work on many different kinds of data. As a result, they can generate business value from data that is meaningful and useful for analysis, regardless of whether the data source is social media, clickstreams, email, and so on. Hadoop supports many languages used for data processing. In addition, Hadoop MapReduce enables many applications, such as marketing analysis, recommendation systems, data warehousing, and fraud detection.

3. Security and Authentication

If an outsider gains access to all of an organization's data and can manipulate many petabytes of it, they can do a great deal of harm to the organization's business operations. The MapReduce programming model addresses this risk by working with HDFS and HBase, which provide strong security by allowing only approved users to operate on the data stored in the system.

4. Speed

The Hadoop Distributed File System (HDFS) is a key component of Hadoop; it essentially implements a mapping system that locates data in a cluster. MapReduce, the tool used for data processing, runs on the same servers where the data is stored, which allows faster processing. As a result, Hadoop MapReduce processes large volumes of unstructured or semi-structured data in less time.

5. A simple programming model

MapReduce is based on a very simple programming model that lets programmers develop MapReduce programs that handle many tasks with ease and efficiency. MapReduce programs are typically written in Java, a very popular language that is easy to learn, so it is straightforward for people to learn Java programming and design a data processing model that meets their business needs.

6. Availability and resilience

The Hadoop MapReduce programming model processes data by sending it to an individual node and also forwarding copies of the same data to other nodes in the cluster. Consequently, if a particular node fails, a copy of its data is still available on other nodes and can be used whenever it is needed, which ensures the availability of the data.
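The replication idea can be sketched as follows, assuming a replication factor of 3 (the HDFS default for dfs.replication). The Cluster class and its round-robin placement are hypothetical simplifications; real HDFS uses a rack-aware placement policy, and the NameNode keeps track of where each replica lives.

```python
REPLICATION_FACTOR = 3  # assumed; matches the HDFS default for dfs.replication


class Cluster:
    """Hypothetical toy model of a cluster that keeps several copies of every block."""

    def __init__(self, nodes: list[str]) -> None:
        self.nodes = nodes
        self.alive = set(nodes)
        self.placement: dict[str, list[str]] = {}  # block id -> nodes holding a replica

    def store(self, block_id: str) -> list[str]:
        # Round-robin placement on distinct nodes (not HDFS's rack-aware policy).
        start = len(self.placement) % len(self.nodes)
        copies = min(REPLICATION_FACTOR, len(self.nodes))
        replicas = [self.nodes[(start + i) % len(self.nodes)] for i in range(copies)]
        self.placement[block_id] = replicas
        return replicas

    def fail(self, node: str) -> None:
        self.alive.discard(node)

    def read_from(self, block_id: str) -> str:
        # Serve the read from the first replica whose node is still alive.
        for node in self.placement[block_id]:
            if node in self.alive:
                return node
        raise RuntimeError(f"all replicas of {block_id} are unavailable")


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2", "node3", "node4"])
    print(cluster.store("block-0"))
    cluster.fail("node1")
    print(cluster.read_from("block-0"))
```

Here block-0 is stored on node1, node2, and node3; after node1 fails, the read is served from node2, which holds another copy of the same block. Only when every node holding a replica is down does the data become unavailable.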

Many organizations around the globe use MapReduce, such as Facebook, Yahoo, and so on.

DISADVANTAGES / CONS:

1. Data processing is slow in MapReduce

Although MapReduce can process huge data sets, it also takes more time to perform tasks. In MapReduce, data is processed and distributed across the cluster (for example, map output is written to local disk and then shuffled over the network to the reducers), which increases processing time and reduces speed.

2. Limited to batch processing only

Hadoop MapReduce supports batch processing only; it does not process streaming data, which results in slower performance.

3. Using MapReduce is not easy

MapReduce developers have to hand-code every single operation, which makes it hard to work with; a join, for example, is a single clause in SQL but takes considerable effort to implement in MapReduce. MapReduce also has no interactive mode, which makes it even more difficult to use.
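To illustrate how much has to be written by hand, here is a minimal in-memory sketch of the two most common MapReduce join strategies, a reduce-side join and a map-side (broadcast or replicated) join. The customer and order records are hypothetical sample data, and the stages are simulated in plain Python rather than written against Hadoop's Java API.

```python
from collections import defaultdict

# Hypothetical sample data: a small "customers" data set and a larger "orders" data set.
CUSTOMERS = [("c1", "Alice"), ("c2", "Bob")]  # (customer_id, name)
ORDERS = [("o1", "c1", 30), ("o2", "c1", 12), ("o3", "c2", 7), ("o4", "c3", 5)]  # (order_id, customer_id, amount)


def reduce_side_join(
    customers: list[tuple[str, str]], orders: list[tuple[str, str, int]]
) -> list[tuple[str, str, str, int]]:
    """Inner join performed in the reduce stage; both inputs go through the shuffle."""
    # Map stage: emit (join key, (source tag, record)) for records from both data sets.
    tagged = [(cid, ("C", name)) for cid, name in customers]
    tagged += [(cid, ("O", (oid, amount))) for oid, cid, amount in orders]

    # Shuffle stage: bring together every tagged record that shares a join key.
    groups = defaultdict(list)
    for key, value in tagged:
        groups[key].append(value)

    # Reduce stage: separate the records by tag and pair each customer with each order.
    joined = []
    for cid in sorted(groups):
        names = [v for tag, v in groups[cid] if tag == "C"]
        cust_orders = [v for tag, v in groups[cid] if tag == "O"]
        for name in names:
            for oid, amount in cust_orders:
                joined.append((cid, name, oid, amount))
    return joined


def map_side_join(
    customers: list[tuple[str, str]], orders: list[tuple[str, str, int]]
) -> list[tuple[str, str, str, int]]:
    """Inner join performed entirely in the map stage (broadcast / replicated join)."""
    # Every mapper loads the small data set into an in-memory lookup table,
    lookup = dict(customers)
    # then streams its split of the large data set; no shuffle or reduce stage is needed.
    return [(cid, lookup[cid], oid, amount) for oid, cid, amount in orders if cid in lookup]


if __name__ == "__main__":
    print(reduce_side_join(CUSTOMERS, ORDERS))
    print(map_side_join(CUSTOMERS, ORDERS))
```

Both functions return the same inner join here (order o4 is dropped because customer c3 does not exist). The reduce-side version works even when both data sets are large, but every record is shuffled across the network. The map-side version skips the shuffle and reduce stages entirely, but only works when one data set is small enough to fit in each mapper's memory (in Hadoop it is typically shipped to the mappers through the distributed cache). The sketch also assumes customer IDs are unique in the small data set.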

4. Caching is limited in MapReduce

One of the main drawbacks of MapReduce is that it cannot cache intermediate data in memory for later use, which reduces performance.

5. Latency

Map and Reduce take a considerable amount of time to convert the data into another data set and break it down into key-value pairs, which in turn increases latency.

