In: Computer Science
Explain the Hadoop architecture in detail. Give an example of Hadoop MapReduce using real data.
Hadoop is a distributed framework for storing and processing big data. It uses HDFS (Hadoop Distributed File System) for storage and the MapReduce framework for processing. Hadoop has had three major versions to date: Hadoop 1, 2, and 3.
Hadoop 1.0 Architecture: Hadoop 1.0 uses HDFS as its file system and MapReduce as its processing framework. It has the following components:
1. Job Tracker
2. Task Tracker
3. Name Node
4. Data Nodes
5. Secondary Namenode
Here, a master-slave architecture is followed, where the master is the NameNode and the slaves are the DataNodes. The NameNode keeps track of the data using file system metadata, mapping each file to its blocks and their locations on the DataNodes. For processing, the JobTracker assigns tasks to TaskTrackers running on each data node. The Secondary NameNode periodically merges the NameNode's edit log into its namespace image (checkpointing); it is not a hot standby, but its checkpoint can be used for recovery if the NameNode fails. To track the health of the nodes, a heartbeat (a periodic signal) is exchanged frequently. Rack awareness is used while assigning tasks so that data locality is preserved, making the architecture more efficient.
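The rack-aware, locality-preserving placement described above can be sketched in a few lines. This is a toy illustration, not the real NameNode logic; the block table and node names are made up:

```python
# Hypothetical block-to-replica table: block id -> list of (DataNode, rack)
block_locations = {
    "blk_001": [("dn1", "rack-A"), ("dn3", "rack-B"), ("dn4", "rack-B")],
}

def pick_node(block_id, requesting_rack):
    """Prefer a replica on the same rack as the requester (data locality)."""
    replicas = block_locations[block_id]
    for node, rack in replicas:
        if rack == requesting_rack:
            return node          # rack-local read: no cross-rack traffic
    return replicas[0][0]        # otherwise fall back to any replica

print(pick_node("blk_001", "rack-B"))  # dn3 (first rack-B replica)
print(pick_node("blk_001", "rack-C"))  # dn1 (no local replica exists)
```

The same preference order (same node, same rack, any node) is what makes scheduling map tasks near their data cheap.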
Hadoop 2.0 Architecture: Hadoop 2.0 also uses HDFS as its file system, but introduces YARN (Yet Another Resource Negotiator) for resource management, with MapReduce running on top of it. It has the following components:
1. Resource Manager
2. Application Master
3. Containers
4. Node Manager
It is also a master-slave architecture, but with different components. The Resource Manager takes care of resource scheduling and availability across the cluster. The Application Master negotiates with the Resource Manager for resources and distributes work to the Node Managers running on the data nodes. Containers are the units of resources (memory and CPU) allocated on a node, inside which tasks run. Hadoop 2 is superior to Hadoop 1 because YARN decouples resource management from MapReduce, so other processing engines can share the same cluster.
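The negotiation between the Application Master and the Resource Manager can be sketched as a toy allocator. This is only an illustration of the idea, assuming a container is just a memory reservation on a Node Manager; the names are invented and this is not the real YARN API:

```python
# Free memory (MB) per NodeManager, as the ResourceManager sees it
node_free_mb = {"nm1": 4096, "nm2": 2048}

def allocate_container(mem_mb):
    """ResourceManager side: grant a container on any node with capacity."""
    for node, free in node_free_mb.items():
        if free >= mem_mb:
            node_free_mb[node] = free - mem_mb
            return (node, mem_mb)   # the "container": a node + reserved memory
    return None                     # request waits until resources free up

# The ApplicationMaster negotiates two 1.5 GB containers for its tasks:
print(allocate_container(1536))  # ('nm1', 1536)
print(allocate_container(1536))  # ('nm1', 1536)
print(allocate_container(4096))  # None - no node has 4 GB free any more
```

Real YARN schedulers (Capacity, Fair) are far more sophisticated, but the shape of the exchange, request resources and receive containers on specific nodes, is the same.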
How does MapReduce work?
Data is divided into blocks. A block is the unit of storage: the default block size is 64 MB in Hadoop 1 and 128 MB in Hadoop 2.
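The split arithmetic is simple, as a quick sketch (assuming the Hadoop 2 default of 128 MB; the actual block size is configurable):

```python
BLOCK_SIZE_MB = 128  # Hadoop 2 default; configurable per cluster/file

def split_into_blocks(file_size_mb):
    """Return the sizes of the HDFS blocks a file would be stored as."""
    full, remainder = divmod(file_size_mb, BLOCK_SIZE_MB)
    blocks = [BLOCK_SIZE_MB] * full
    if remainder:
        blocks.append(remainder)  # the final block may be smaller
    return blocks

print(split_into_blocks(300))  # [128, 128, 44]
```

So a 300 MB file occupies three blocks, and the last one only uses the space it needs.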
These blocks are then assigned to task trackers on various nodes; this distribution is the map phase. The map tasks perform their transformations, after which the shuffle-and-sort phase begins: intermediate records are sorted by key, which requires shuffling data between nodes so that all values for a given key end up together. Once these operations are complete, the reduce phase collects the grouped data from the various nodes and produces the output.
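The three phases above can be sketched in memory with the classic word-count job. Real Hadoop runs each phase across many nodes; here each phase is just a plain function, so this is a conceptual sketch rather than actual Hadoop code:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input split
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_sort(pairs):
    # Shuffle/sort: bring all values for the same key together, sorted by key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(grouped):
    # Reduce: aggregate the value list for each key
    return {key: sum(values) for key, values in grouped}

data = ["big data big", "data tools"]
print(reduce_phase(shuffle_sort(map_phase(data))))
# {'big': 2, 'data': 2, 'tools': 1}
```

Each function maps directly onto a phase: `map_phase` runs where the data lives, `shuffle_sort` is the cross-node exchange, and `reduce_phase` writes the final result.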
A real-life example of the MapReduce framework:
For example, suppose our production query calculates the maximum salary per department, considering only active departments. This involves:
- Grouping on department
- A filter on the active flag (the HAVING clause in the query)
- A WHERE clause for other predicates
- The aggregation function MAX on the result set
So these are the steps taken in the MapReduce phases:
1. The input splitter splits the data into blocks, and the blocks are assigned to data nodes based on the number-of-mappers property. This is where the map phase runs.
2. Once the data is mapped, the WHERE-clause predicates are applied (filtering can happen map-side); during the shuffle-and-sort phase, the records are grouped by department along with the other operations.
3. In the reduce phase, the MAX aggregation is performed to find the maximum value per department, and the output is finally written to HDFS.
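The query above can be sketched as a mapper and a reducer over a small in-memory dataset. The rows, column layout, and function names are made up for illustration, in the same spirit as a Hadoop Streaming job:

```python
from collections import defaultdict

# Hypothetical rows: (department, active_flag, salary)
rows = [
    ("sales", True,  70000),
    ("sales", True,  85000),
    ("hr",    True,  60000),
    ("hr",    False, 90000),   # inactive: filtered out in the map phase
    ("it",    True,  95000),
]

def mapper(row):
    dept, active, salary = row
    if active:                   # filter predicates run map-side
        yield (dept, salary)     # key = department (the GROUP BY key)

def reducer(dept, salaries):
    return (dept, max(salaries)) # MAX aggregation per group

# Shuffle/sort: group the mapped pairs by department
grouped = defaultdict(list)
for row in rows:
    for dept, salary in mapper(row):
        grouped[dept].append(salary)

result = dict(reducer(d, s) for d, s in sorted(grouped.items()))
print(result)  # {'hr': 60000, 'it': 95000, 'sales': 85000}
```

Note that the inactive HR row with the 90000 salary never reaches the reducer, so HR's maximum is computed only over its active rows, exactly as the query intends.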