The Hadoop framework includes many parts. Research more about the following topics and describe them briefly. Explain the use of Hadoop.
Before exploring how Hadoop can be used and how it helps today's world, consider an example.
Example...
Consider Google (the giant company), most popularly used as a search engine. It has to store a lot of information in its databases, but most of that data is in unstructured formats such as videos, images, audio, and web logs, which results in a humongous amount of data. For every query we search on Google, the data has to be managed quickly so that results can be provided. Not only Google, but also Facebook, WhatsApp, and other social networking platforms hold a lot of information/data in unstructured formats. Managing this data is therefore very crucial, and the rapid increase in data is of significant importance. Thus the concept of Big Data came into the picture.
When we have a large amount of data, we need high-end servers with a huge amount of storage and memory. If the amount of data increases, the storage capacity needs to increase, or the data has to be moved to a machine with higher capacity, more memory, and massive processing power. However, such servers (rack servers) are very expensive, and keeping everything on a single server is not feasible because it would lead to frequent failures while processing the data.
If we place multiple servers that have high storage and memory, we still cannot process the data on all of them at the same time, and even where we can, we cannot process it as fast as required, so this approach has drawbacks; this is how the concept of "Distributed Computing" came into existence. Moreover, if there is huge data (petabytes), transferring it over the network would be a time-consuming process. Considering all these problems, the concept of commodity hardware was introduced.
In the concept of Big Data, processing is directed at huge amounts of data. To manage that data, commodity hardware (economical, low-performance systems) is used, and distributed computing is applied on top of it. So, if there is a failure in one system, data replication makes it possible for the work to be taken over and executed on some other node.
Thus the Hadoop project came into use for distributed computing, and it is currently maintained by the Apache Software Foundation. Basically, Hadoop is a Java-based open source framework that is used as one way of storing enormous data sets across a distributed cluster of servers.
The data processing framework is the tool used to work with the data itself; in Hadoop, that framework is MapReduce.
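To make the processing framework concrete, here is a minimal sketch of the classic MapReduce word count in Java. The class and job names are only illustrative, and it assumes the standard Hadoop MapReduce client libraries are available on the classpath.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in its input split
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts produced for each word across all mappers
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Each map task runs on the node where its block of input data is stored, which is exactly the "move the computation to the data" idea described above.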
Hadoop clusters are majorly classified into two categories.
HADOOP 1.X cluster:
It contains the NameNode, DataNode, JobTracker, and TaskTracker daemons.
HADOOP 2.X cluster:
The 1.X cluster has its limitations in terms of scalability and availability. To overcome these drawbacks, the Apache Hadoop community released Hadoop 2.X.
Hadoop provides three layers as its backbone. They are:
HDFS (Hadoop Distributed File System) - the storage layer
YARN (Yet Another Resource Negotiator) - the cluster resource management layer
MapReduce - the data processing layer
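As a small illustration of the storage layer, the sketch below uses the HDFS Java API to write a file and read it back. The NameNode address (namenode-host) and the file path are placeholders, and a running HDFS cluster is assumed.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "namenode-host" is a placeholder for the actual NameNode address
    conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/demo/sample.txt");

    // Write a small file; HDFS splits large files into blocks and
    // replicates each block across several DataNodes for fault tolerance
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello hadoop".getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back and print it to the console
    try (FSDataInputStream in = fs.open(file)) {
      IOUtils.copyBytes(in, System.out, 4096, false);
    }
    fs.close();
  }
}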
In addition to these layers, there are other important components:
Resource Manager:
It is responsible only for scheduling the required resources to applications.
Node Manager:
It is responsible for managing the life cycle of the containers running on its node.
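As a rough sketch of how these two components fit together, the snippet below uses the YARN client API in Java to ask the ResourceManager for the NodeManagers it is currently tracking. It assumes a running Hadoop 2.X cluster whose ResourceManager address is available from yarn-site.xml on the classpath; the class name is only illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListClusterNodes {
  public static void main(String[] args) throws Exception {
    // Picks up the ResourceManager address from yarn-site.xml on the classpath
    Configuration conf = new Configuration();

    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // The ResourceManager tracks every NodeManager in the cluster;
    // each NodeManager reports the containers it is currently running
    for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
      System.out.println(node.getNodeId()
          + "  running containers = " + node.getNumContainers());
    }

    yarnClient.stop();
  }
}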