Question

In: Computer Science


Introduction to Big Data

1- Explain the two main methods of MapReduce.

2- What are the different nodes that can be found in the Hadoop ecosystem?

3- What are the advantages of using Hadoop?

Solutions

Expert Solution

Answer 1:

MapReduce is a processing technique and a programming model for distributed computing. It is used to process large amounts of data in parallel on large clusters in a reliable manner, and it can be structured so that large-scale computations are tolerant to hardware faults.

The two main methods of MapReduce are as follows:

1) Map

The Map function accepts an input element as its argument and produces zero or more key-value pairs. The types of the keys and values are arbitrary, and a Map task can produce several key-value pairs with the same key, even from the same element. The input elements can be anything, such as a tuple or an entire document.

2) Reduce

The Reduce function takes as its argument a pair consisting of a key and its list of values. The output of the Reduce function is a sequence of zero or more key-value pairs. A Reduce task executes one or more reducers, and the outputs from all the Reduce tasks are merged into a single file.
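The interaction of the two methods can be illustrated with the classic word-count example. The sketch below is a minimal single-process simulation of the MapReduce model, not actual Hadoop code; the function and variable names are illustrative, and the shuffle/sort phase that a real framework performs between Map and Reduce is modeled with a simple dictionary:

```python
from collections import defaultdict

def map_fn(document):
    """Map: emit a (word, 1) key-value pair for every word in the input element."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_fn(key, values):
    """Reduce: take a key and its list of values, and emit the summed count."""
    yield (key, sum(values))

def run_mapreduce(documents):
    # Shuffle/sort phase: group every value emitted by the mappers by its key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Reduce phase: apply the reducer to each key's list of values.
    result = {}
    for key, values in groups.items():
        for k, v in reduce_fn(key, values):
            result[k] = v
    return result

counts = run_mapreduce(["big data is big", "data is everywhere"])
```

Here `counts` maps each word to its total occurrences across all input documents; in a real cluster, the mappers and reducers would run as parallel tasks on different machines.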

Answer 2:

The different nodes that can be found in the Hadoop ecosystem are:

1) NameNode

It is the most important Hadoop daemon. Hadoop employs a master/slave architecture for distributed storage; this storage system is called the Hadoop Distributed File System (HDFS). The NameNode is the master of HDFS: it directs the slave DataNode daemons to perform the low-level I/O tasks and is often referred to as the bookkeeper of HDFS. The NameNode's work is memory- and I/O-intensive. The downside is that it is a single point of failure for the Hadoop cluster.

2) DataNode

This daemon runs on each slave machine and performs the grunt work of the distributed filesystem: reading and writing HDFS blocks to actual files on the local filesystem. DataNodes constantly report to the NameNode, and because each block is replicated across multiple DataNodes, files remain readable even if a DataNode crashes or becomes inaccessible over the network.
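The NameNode/DataNode division of labor and the role of replication can be shown with a small toy model. This is a simplified simulation under assumed names (`ToyCluster`, `write_block`, and so on are invented for illustration), not actual HDFS code; it only captures the idea that the NameNode keeps the block-to-node mapping while DataNodes hold the data, so a read can fall back to another replica when a node fails:

```python
import random

class ToyCluster:
    """Toy model of HDFS block replication (HDFS defaults to a factor of 3)."""

    def __init__(self, num_datanodes, replication=3):
        # Each DataNode is modeled as a dict of block id -> block data.
        self.datanodes = {i: {} for i in range(num_datanodes)}
        self.replication = replication
        # NameNode metadata: block id -> list of DataNodes holding a replica.
        self.block_map = {}

    def write_block(self, block_id, data):
        # The "NameNode" picks `replication` distinct DataNodes for the block.
        nodes = random.sample(list(self.datanodes), self.replication)
        for n in nodes:
            self.datanodes[n][block_id] = data
        self.block_map[block_id] = nodes

    def fail_node(self, node_id):
        # Simulate a DataNode crash: its local storage becomes unreachable.
        del self.datanodes[node_id]

    def read_block(self, block_id):
        # Try each replica location recorded by the NameNode until one is alive.
        for n in self.block_map[block_id]:
            if n in self.datanodes:
                return self.datanodes[n][block_id]
        raise IOError("all replicas lost")

cluster = ToyCluster(num_datanodes=5)
cluster.write_block("blk_1", b"hello")
cluster.fail_node(cluster.block_map["blk_1"][0])  # kill one replica holder
data = cluster.read_block("blk_1")                # read still succeeds
```

A real HDFS additionally re-replicates blocks from a failed node onto healthy ones to restore the target replication factor; the toy model omits that step.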

Answer 3:

The advantages of using Hadoop are:

1) It is an open-source framework and is therefore free. It uses commodity hardware to store and process data, so the cost is low.

2) If any node fails, its running tasks are automatically redirected to other nodes. Moreover, multiple copies of all data are stored automatically, so no data is lost when a node fails.

3) Little administration is required: nodes can be added and removed easily, and failed nodes are detected automatically.

4) Hadoop provides huge, flexible storage because a cluster can span thousands of nodes.

5) Hadoop offers high computing power, which comes from the thousands of nodes in the cluster.

