Question

Briefly explain each of the following:
a) Why is Hadoop not a good fit for a large number of small files?
b) What is a good upper limit for the input split size, and why?
c) What happens if the input splits are too small for a small file?
d) If the amount of data is small, why avoid MapReduce?

Solutions

Expert Solution

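a) Why is Hadoop not a good fit for a large number of small files?

Answer: Hadoop is designed for big data and is poorly suited to many small files. A small file here means one significantly smaller than the HDFS block size, which defaults to 128 MB. The NameNode keeps the metadata for every file and block in memory, so millions of small files can exhaust its memory long before the cluster's disk capacity is used. In addition, each small file is normally processed by its own map task, so a job spends more time starting tasks than processing data.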
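The memory cost on the NameNode can be illustrated with a rough calculation. This is a minimal sketch, assuming about 150 bytes of NameNode heap per namespace object (one per file and one per block); that figure is a commonly quoted rule of thumb, not an exact value, and the function name `namenode_bytes` is made up for this example.

```python
import math

# ASSUMPTION: ~150 bytes of NameNode heap per file or block object.
# This is a rule-of-thumb figure, not a measured value.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (128 MB)


def namenode_bytes(num_files: int, file_size: int) -> int:
    """Estimate metadata bytes: one object per file plus one per block."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


total = 10 * 1024**3           # 10 GiB of data in both scenarios
small_file = 100 * 1024        # stored as 100 KiB files
small = namenode_bytes(total // small_file, small_file)
large = namenode_bytes(total // BLOCK_SIZE, BLOCK_SIZE)

print(f"100 KiB files: {small:,} bytes of metadata")
print(f"128 MiB files: {large:,} bytes of metadata")
```

The same 10 GiB of data costs over a thousand times more NameNode memory when it is stored as 100 KiB files instead of block-sized files.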

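b) What is a good upper limit for the input split size, and why?

Answer: By default, the split size is approximately equal to the HDFS block size (128 MB by default), and the block size is also a good upper limit. A split no larger than one block can be processed on a node that stores that block, which preserves data locality; a larger split spans several blocks that may sit on different nodes, so part of the input has to be copied over the network. The split size is configurable, so the user can tune it to the size of the data in a MapReduce program. Large blocks, and therefore large splits, also minimize seek cost and reduce the metadata generated per block.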
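How the split size follows from the block size can be sketched in a few lines. The formula below mirrors the one used by Hadoop's `FileInputFormat` (split size = max(min size, min(max size, block size)), with a 10% slop on the last split); the Python function names are illustrative, and the sketch assumes a single non-empty, splittable file.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Clamp the block size between the configured minimum and maximum."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size: int, split_size: int) -> int:
    """Count the splits produced for one non-empty, splittable file."""
    remaining = file_size
    splits = 0
    while remaining / split_size > SPLIT_SLOP:
        remaining -= split_size
        splits += 1
    if remaining > 0:
        splits += 1  # the tail becomes the final split
    return splits


MB = 1024 * 1024
split = compute_split_size(block_size=128 * MB, min_size=1, max_size=2**63 - 1)
print(split // MB, "MB per split")
print(count_splits(1024 * MB, split), "splits for a 1 GB file")
print(count_splits(130 * MB, split), "split for a 130 MB file")
```

With the default minimum and maximum, the split size equals the block size, so each map task reads one block.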

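c) What happens if the input splits are too small for a small file?

Answer: Each split is handled by its own map task, so splits that are too small produce many tasks that each do very little work. Task startup and scheduling overhead then dominate, and the processing time per file becomes large. The remedy is to increase the split size, or to combine small inputs into fewer splits, not to reduce it further.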
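The effect of split size on job time can be shown with a toy cost model. This is only a sketch: the 2-second task startup cost, the 100 MB/s read throughput, and the 20 parallel task slots are hypothetical numbers chosen for illustration, not measurements from a real cluster.

```python
import math


def estimate_job(data_bytes: int, split_bytes: int,
                 startup_s: float, throughput_bps: float, slots: int):
    """Return (map task count, rough job time in seconds) for a toy model."""
    tasks = math.ceil(data_bytes / split_bytes)
    per_task_s = startup_s + split_bytes / throughput_bps
    waves = math.ceil(tasks / slots)  # tasks run in waves of `slots`
    return tasks, waves * per_task_s


MB = 1024 * 1024
data = 10 * 1024 * MB  # 10 GB of input

# Hypothetical cluster parameters (assumptions, not measurements).
params = dict(startup_s=2.0, throughput_bps=100 * MB, slots=20)

tiny_tasks, tiny_time = estimate_job(data, 1 * MB, **params)
block_tasks, block_time = estimate_job(data, 128 * MB, **params)

print(f"1 MB splits:   {tiny_tasks} tasks, about {tiny_time:.0f} s")
print(f"128 MB splits: {block_tasks} tasks, about {block_time:.0f} s")
```

Under these assumptions, almost all of the time with 1 MB splits goes to task startup instead of reading data.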

d) If the amount of data is small, why avoid MapReduce?

Answer: Hadoop is highly scalable, largely because of its ability to distribute large data sets across many nodes, and it lets that data be stored and processed affordably. Those benefits only pay off at scale. With a small amount of data, the fixed cost of starting a job, scheduling tasks, and writing intermediate results outweighs the computation itself, so a single-machine program usually finishes sooner.
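For small inputs, the same map and reduce logic fits in a single process. The following is a minimal sketch of a word count done locally with the Python standard library; the function name `word_count` and the sample lines are made up for illustration.

```python
from collections import Counter


def word_count(lines):
    """Count words in an iterable of text lines within one process."""
    counts = Counter()
    for line in lines:
        # "Map" step: split the line into words.
        # "Reduce" step: add each word's occurrences to the running total.
        counts.update(line.split())
    return counts


sample = ["hadoop splits files", "small files hurt hadoop"]
result = word_count(sample)
print(result.most_common(2))
```

No cluster, job submission, or shuffle is involved, which is why this approach is usually the faster choice when the data fits on one machine.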

