Question

Explain the Hadoop architecture in detail. Give an example of a Hadoop MapReduce job using real data.

Solutions

Hadoop is a distributed framework for storing and processing big data on clusters of commodity machines. It uses HDFS (Hadoop Distributed File System) for storage and the MapReduce framework for processing. Hadoop has had three major versions to date: Hadoop 1, 2 and 3.

Hadoop 1.0 Architecture: Hadoop 1.0 uses HDFS as its file system and MapReduce as its processing framework. It has the following components:

1. Job Tracker

2. Task Tracker

3. Name Node

4. Data Nodes

5. Secondary Name Node

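Hadoop 1 follows a master-slave architecture with one master for storage and one for processing. The Name Node is the HDFS master: it keeps the file system namespace and the metadata that maps files to blocks and blocks to Data Nodes, while the Data Nodes (slaves) store the actual blocks. The Job Tracker is the MapReduce master: it accepts jobs and assigns map and reduce tasks to the Task Trackers that run on every data node. The Secondary Name Node is not a standby for the Name Node; it periodically checkpoints the Name Node's metadata by merging the edit log into the fsimage file, so in Hadoop 1 a Name Node failure still takes HDFS down. Node health is tracked with heartbeats: Data Nodes send periodic heartbeat signals to the Name Node and Task Trackers send them to the Job Tracker, and a node that stops reporting is treated as dead. Rack awareness is used both to place block replicas on more than one rack and to schedule tasks close to their data (data locality), which reduces network traffic.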
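The Job Tracker's data-locality preference can be illustrated with a small Python sketch. It is a simplified model, not Hadoop's actual scheduler: `BLOCK_LOCATIONS` stands in for the block-to-data-node metadata held by the Name Node, `RACK_OF` is a hypothetical rack topology, `pick_tracker` and `schedule` are illustrative names, and each Task Tracker is assumed to have a single free task slot. A map task goes to a Task Tracker holding a replica of its block if possible, then to one on the same rack as a replica, and only then anywhere.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical Name Node metadata: data nodes holding a replica of each block
# (three replicas per block, spread over both racks).
BLOCK_LOCATIONS: Dict[str, List[str]] = {
    "blk_1": ["node1", "node2", "node4"],
    "blk_2": ["node2", "node3", "node4"],
    "blk_3": ["node1", "node2", "node4"],
}

# Hypothetical rack topology used for rack awareness.
RACK_OF: Dict[str, str] = {
    "node1": "rack1", "node2": "rack1",
    "node3": "rack2", "node4": "rack2",
}


def pick_tracker(block: str, free_trackers: List[str]) -> Optional[Tuple[str, str]]:
    """Pick a free Task Tracker for the map task that reads `block`.

    Preference: node-local, then rack-local, then any free tracker.
    Returns (tracker, locality), or None when no tracker is free.
    """
    replicas = BLOCK_LOCATIONS[block]
    for tracker in free_trackers:
        if tracker in replicas:
            return tracker, "node-local"
    replica_racks = {RACK_OF[node] for node in replicas}
    for tracker in free_trackers:
        if RACK_OF[tracker] in replica_racks:
            return tracker, "rack-local"
    if free_trackers:
        return free_trackers[0], "off-rack"
    return None


def schedule(blocks: List[str], trackers: List[str]) -> Dict[str, Tuple[str, str]]:
    """Assign one map task per block, assuming one free task slot per tracker."""
    free = list(trackers)
    plan: Dict[str, Tuple[str, str]] = {}
    for block in blocks:
        choice = pick_tracker(block, free)
        if choice is None:
            break  # no free slots: remaining tasks wait until a slot frees up
        plan[block] = choice
        free.remove(choice[0])
    return plan


if __name__ == "__main__":
    plan = schedule(["blk_1", "blk_2", "blk_3"], ["node1", "node2", "node3"])
    for block, (tracker, locality) in plan.items():
        print(f"{block} -> {tracker} ({locality})")
```

With three free trackers, `blk_1` and `blk_2` run node-local, while `blk_3` falls back to `node3`, which does not hold the block but shares `rack2` with one of its replicas.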

Hadoop 2.0 Architecture: Hadoop 2.0 also uses HDFS as its file system, but it introduces YARN (Yet Another Resource Negotiator), which takes resource management out of MapReduce. YARN has the following components:

1. Resource Manager

2. Application Master

3. Containers

4. Node Manager

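It is still a master-slave architecture, but the Job Tracker's responsibilities are split. The Resource Manager (master) tracks the resources available in the cluster and schedules them among applications. Each application gets its own Application Master, which negotiates resources (containers) from the Resource Manager and works with the Node Managers to launch and monitor its tasks. A Node Manager runs on every worker node, launches containers there and reports the node's resource usage to the Resource Manager. A container is an allocation of resources, such as memory and CPU cores, on one node, in which a task runs. Hadoop 2 improves on Hadoop 1 because separating resource management from job scheduling scales to larger clusters and lets frameworks other than MapReduce share the same cluster, and because HDFS in Hadoop 2 supports Name Node high availability with a standby Name Node.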
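To make the division of labour concrete, here is a toy Python model of container allocation. Everything in it is a hypothetical simplification: the class names only mirror the YARN roles, the first-fit policy stands in for YARN's real pluggable schedulers (such as the Capacity Scheduler or Fair Scheduler), and heartbeats, queues and the Application Master's own container are left out. It shows an Application Master's requests for memory and CPU being granted on whichever Node Manager has capacity, trying the preferred host first.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeManager:
    """A worker node as the Resource Manager sees it (capacities are hypothetical)."""
    host: str
    free_mb: int
    free_vcores: int
    containers: List[str] = field(default_factory=list)


@dataclass
class ContainerRequest:
    """An Application Master's request: resources plus an optional preferred host."""
    memory_mb: int
    vcores: int
    preferred_host: Optional[str] = None


class ResourceManager:
    """Toy first-fit allocator; real YARN uses pluggable schedulers and queues."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes
        self._next_id = 1

    @staticmethod
    def _fits(node: NodeManager, req: ContainerRequest) -> bool:
        return node.free_mb >= req.memory_mb and node.free_vcores >= req.vcores

    def allocate(self, app_id: str, req: ContainerRequest) -> Optional[str]:
        """Grant one container, trying the preferred host first (data locality)."""
        # False sorts before True, so the preferred host (if any) comes first.
        candidates = sorted(self.nodes, key=lambda n: n.host != req.preferred_host)
        for node in candidates:
            if self._fits(node, req):
                node.free_mb -= req.memory_mb
                node.free_vcores -= req.vcores
                container_id = f"{app_id}_container_{self._next_id}@{node.host}"
                self._next_id += 1
                node.containers.append(container_id)
                return container_id
        return None  # the request stays pending until resources are released


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("node1", 4096, 4), NodeManager("node2", 2048, 2)])
    print(rm.allocate("app_1", ContainerRequest(2048, 1, preferred_host="node2")))
    print(rm.allocate("app_1", ContainerRequest(2048, 1, preferred_host="node2")))
    print(rm.allocate("app_1", ContainerRequest(4096, 1)))
```

The first request lands on the preferred `node2`, the second falls back to `node1` because `node2` is full, and the third returns `None` because no node has 4096 MB left.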

How does MapReduce work?

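Input data is stored in HDFS as blocks; a block is the unit of storage. The default block size is 64 MB in Hadoop 1 and 128 MB in Hadoop 2. Each block is replicated (three copies by default) on different data nodes.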
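The block arithmetic is a ceiling division. The Python sketch below (the function name is illustrative) computes how a file is cut into blocks for a given block size, ignoring replication.

```python
from typing import List

MB = 1024 * 1024


def split_into_blocks(file_size_bytes: int, block_size_mb: int) -> List[int]:
    """Sizes (in bytes) of the HDFS blocks a file of the given size occupies.

    Every block is full except possibly the last one, which only takes up
    as much space as the remaining data.
    """
    block = block_size_mb * MB
    full_blocks, remainder = divmod(file_size_bytes, block)
    sizes = [block] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    one_gb = 1024 * MB
    print(len(split_into_blocks(one_gb, 64)))   # Hadoop 1 default block size
    print(len(split_into_blocks(one_gb, 128)))  # Hadoop 2 default block size
    blocks = split_into_blocks(500 * MB, 128)
    print(len(blocks), blocks[-1] // MB)
```

A 1 GB file therefore occupies 16 blocks with 64 MB blocks and 8 blocks with 128 MB blocks, and a 500 MB file with 128 MB blocks is stored as three full blocks plus one 116 MB block.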

MapReduce then runs one map task per input split (normally one split per block), and the scheduler tries to run each map task on a node that holds a copy of its block. In the map phase, each map task transforms its input records into intermediate key-value pairs. In the shuffle and sort phase, these pairs are partitioned by key, transferred across the network to the reducers and sorted, so that all values for the same key arrive together. Finally, in the reduce phase, each reducer aggregates the values for each key and writes the final output to HDFS.
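The three phases can be simulated on one machine in Python, using the classic word-count job as the workload. This is a local sketch, not Hadoop code: `run_mapreduce`, the mapper and reducer functions and the CRC32-based `partition` are illustrative assumptions (Hadoop's default `HashPartitioner` uses the key's Java `hashCode()` instead), and there is no networking, fault tolerance or HDFS.

```python
import zlib
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Mapper = Callable[[str], Iterable[Tuple[str, Any]]]
Reducer = Callable[[str, List[Any]], Any]


def partition(key: str, num_reducers: int) -> int:
    """Deterministic stand-in for Hadoop's default hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_mapreduce(splits: List[List[str]], mapper: Mapper, reducer: Reducer,
                  num_reducers: int = 2) -> Dict[str, Any]:
    # Map phase: every record of every split becomes intermediate (key, value)
    # pairs, routed to a reducer partition by key.
    partitions: List[List[Tuple[str, Any]]] = [[] for _ in range(num_reducers)]
    for split in splits:
        for record in split:
            for key, value in mapper(record):
                partitions[partition(key, num_reducers)].append((key, value))

    # Shuffle and sort, then reduce: each reducer sorts its partition by key,
    # so equal keys are adjacent, and reduces one key group at a time.
    output: Dict[str, Any] = {}
    for part in partitions:
        part.sort(key=itemgetter(0))
        for key, group in groupby(part, key=itemgetter(0)):
            output[key] = reducer(key, [value for _, value in group])
    return output


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    return sum(counts)


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["the fox jumps"]]
    print(run_mapreduce(splits, word_count_mapper, word_count_reducer))
```

Because all pairs with the same key go to the same partition, each word is counted by exactly one reducer; for this input "the" is counted 3 times and "fox" twice.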

Real-life example of the MapReduce framework:

For example, suppose a production query calculates the maximum salary in each department, considering only active departments. The query involves:

- Grouping on department

- A HAVING clause on the active flag

- A WHERE clause for other predicates

- The aggregate function MAX on the resulting groups

The MapReduce job runs in these steps:

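1. Map phase: the input is divided into splits (normally one per HDFS block), and one map task is started for each split, preferably on a data node that stores that block; the number of map tasks is driven mainly by the number of splits. Each mapper reads its records, applies the WHERE predicates and, since the active flag belongs to the department, the active-department check, and emits (department, salary) pairs. Filtering here rather than later reduces the amount of data that has to be shuffled.

2. Shuffle and sort phase: the (department, salary) pairs are partitioned by department, sent to the reducers and sorted, so that all salaries of one department reach the same reducer. This is where the grouping by department happens.

3. Reduce phase: each reducer applies the MAX aggregation to the salaries of each department, and the final output is written to HDFS.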
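These steps can be traced with a small Python simulation. The question asks for real data, but no dataset is given here, so the rows below are hypothetical samples in an assumed `emp_id,department,salary,employment_type,dept_active` layout, and `employment_type == "full-time"` is an assumed example of the "other predicates". The active flag is assumed to be a department attribute, so checking it per record in the mapper gives the same result as a HAVING filter after grouping. The optional combiner computes a local MAX inside each map task, which is valid because MAX is associative and commutative. On a real cluster the same logic would be written as a Java `Mapper`/`Reducer` or run through Hadoop Streaming, which lets scripts read records on stdin and write key/value pairs to stdout.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample rows, one string per input split:
# emp_id,department,salary,employment_type,dept_active
SPLIT_1 = """101,Engineering,120000,full-time,true
102,Engineering,135000,contract,true
103,Sales,90000,full-time,true
104,Legacy,99000,full-time,false"""
SPLIT_2 = """105,Engineering,128000,full-time,true
106,Sales,97000,full-time,true
107,Sales,150000,contract,true"""


def mapper(split_text: str) -> Iterator[Tuple[str, int]]:
    """Map: parse each row, apply the filters, emit (department, salary)."""
    for _emp_id, dept, salary, emp_type, active in csv.reader(io.StringIO(split_text)):
        if emp_type != "full-time":  # WHERE clause (assumed example predicate)
            continue
        if active != "true":  # active-department filter, applied early
            continue
        yield dept, int(salary)


def combiner(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Optional combine step: local MAX per department within one map task."""
    local: Dict[str, int] = {}
    for dept, salary in pairs:
        local[dept] = max(salary, local.get(dept, salary))
    return local


def shuffle(map_outputs: List[Dict[str, int]]) -> Dict[str, List[int]]:
    """Shuffle and sort: collect every map task's values under their department key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for output in map_outputs:
        for dept, salary in output.items():
            grouped[dept].append(salary)
    return dict(sorted(grouped.items()))


def reducer(dept: str, salaries: List[int]) -> Tuple[str, int]:
    """Reduce: apply the MAX aggregate to one department's salaries."""
    return dept, max(salaries)


def run_job(splits: List[str]) -> Dict[str, int]:
    map_outputs = [combiner(mapper(split)) for split in splits]  # one map task per split
    return dict(reducer(dept, sal) for dept, sal in shuffle(map_outputs).items())


if __name__ == "__main__":
    print(run_job([SPLIT_1, SPLIT_2]))
```

With these sample rows the job outputs `{'Engineering': 128000, 'Sales': 97000}`: the contract rows are dropped by the WHERE predicate, and the inactive Legacy department never reaches the reducers.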

