a) Why is Hadoop not a good fit for a large number of small files?
Answer: Hadoop is designed for big data and is not well suited for small files. It has limitations with many small files because of its high-capacity design: a small file is significantly smaller than the HDFS block size, which is 128 MB by default, yet every file, block, and directory still consumes metadata in the NameNode's memory, and each map task typically processes one file or block at a time. Many tiny files therefore waste NameNode memory and scheduling capacity without delivering much data to process.
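As a rough illustration (using the commonly cited estimate of about 150 bytes of NameNode heap per file, block, or directory object): storing 10 million 1 KB files creates on the order of 20 million metadata objects, about 20,000,000 x 150 bytes ≈ 3 GB of NameNode memory, even though the files together hold only about 10 GB of actual data.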
b) What is a good upper limit for the input split size? Why?
Answer: By default the input split size is approximately equal to the HDFS block size (128 MB), and the block size is also a good upper limit. The split size is user-configurable, so it can be tuned in the MapReduce program based on the size of the data. Keeping a split no larger than a block means each map task reads from a single block on a single node, which preserves data locality and minimizes seek cost, while keeping splits reasonably large limits the per-split metadata and per-task overhead the framework has to manage.
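For illustration, here is a minimal driver sketch (assuming Hadoop 2.x or later; the class name SplitSizeDemo and the input/output path arguments are made up for the example) showing how a MapReduce program can cap the split size at the block size:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SplitSizeDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "split size demo");
            job.setJarByClass(SplitSizeDemo.class);

            // Cap each split at the HDFS block size (128 MB) so one map task
            // reads from a single block and keeps data locality; raise the
            // minimum so splits do not become too small either.
            FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
            FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // The default identity mapper and reducer are used; the point here
            // is only how the split size is configured.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }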
c) What happens if the input splits are too small, as with many small files?
Answer: The overhead of managing the splits and starting a map task for each one begins to dominate, so the processing time per file becomes very large relative to the useful work done. Instead of shrinking the splits further, the split size should be tuned, or small files combined into larger splits, so that each map task does enough work to amortize its startup cost while the job still spreads across enough nodes; one common approach is sketched below.
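A minimal sketch, under the same assumptions as the earlier example (Hadoop 2.x or later, illustrative class name and paths), using CombineTextInputFormat so that many small files are packed into a few larger splits instead of one tiny split per file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SmallFilesDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "combine small files demo");
            job.setJarByClass(SmallFilesDemo.class);

            // Pack many small files into combined splits of up to 128 MB each,
            // so one map task processes many files instead of one task per file.
            job.setInputFormatClass(CombineTextInputFormat.class);
            CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }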
d) If the amount of data is small, why still use MapReduce?
Answer: Hadoop is highly scalable, largely because of its ability to distribute large data sets across many nodes, so a job that starts small can grow without redesign. MapReduce, together with HDFS, allows data to be stored and processed in a very affordable way, and the stored data remains available for later use. HDFS replication also keeps the stored data safe against node failures.