4. What is ZooKeeper? Who developed it? Describe its main functions.
Hey,
Note: If you have any questions about the answer, please leave a comment. I would be very happy to resolve them.
ZooKeeper is an open source Apache project that provides a centralized service for configuration information, naming, synchronization, and group services across large clusters in distributed systems. The goal is to make these systems easier to manage, with more reliable propagation of changes.
ZooKeeper was originally developed at Yahoo! for streamlining the processes running on big-data clusters by storing the status in local log files on the ZooKeeper servers.
If you had a Hadoop cluster spanning 500 or more commodity servers, you would need centralized management of the entire cluster: naming, group and synchronization services, configuration management, and more. Other open source projects that run on Hadoop clusters need the same cross-cluster services. Embedding ZooKeeper means you don’t have to build synchronization services from scratch. Applications interact with ZooKeeper through a Java™ or C interface.
For applications, ZooKeeper provides an infrastructure for cross-node synchronization by maintaining status information in memory on the ZooKeeper servers. Each ZooKeeper server keeps a copy of the state of the entire system and persists this information in local log files. Large Hadoop clusters are supported by multiple ZooKeeper servers (an ensemble), with one server acting as the leader that keeps the other servers in sync.
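To make the "state in memory, persisted in local log files" idea concrete, here is a minimal sketch in Python. It is only a toy model and not ZooKeeper's actual storage code or file format; the class name `LoggedState`, the JSON log lines, and the file name `toy-txn.log` are all made up for illustration.

```python
import json
import os
import tempfile


class LoggedState:
    """Toy key-value state: held in memory, every change appended to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        # On startup, rebuild the in-memory state by replaying the log.
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    self.state[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Write the change to the log first, then apply it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.state[key] = value


log_file = os.path.join(tempfile.mkdtemp(), "toy-txn.log")
server = LoggedState(log_file)
server.put("/workers/node-1", "running")
server.put("/workers/node-1", "done")

# Simulates a restart: a fresh object recovers the state from the log alone.
restarted = LoggedState(log_file)
```

Reads are served from the in-memory dictionary, while the log is only replayed at startup, which is roughly why this kind of design can answer reads quickly and still survive a restart.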
Within ZooKeeper, an application can create what is called a znode, which is a file-like data node kept in memory on the ZooKeeper servers. The znode can be updated by any node in the cluster, and any node in the cluster can register to be notified of changes to that znode.
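The create / update / notify cycle can be sketched with a small in-memory model. This is a hedged illustration, not the real ZooKeeper client API: `ToyZnodeStore` and its methods are hypothetical names, and the one-shot behaviour of the watch is modelled on ZooKeeper's classic watches, which fire once and then have to be registered again.

```python
from collections import defaultdict


class ToyZnodeStore:
    """In-memory stand-in for a ZooKeeper server; NOT the real client API."""

    def __init__(self):
        self._data = {}                    # path -> bytes
        self._watches = defaultdict(list)  # path -> pending callbacks

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError(f"znode already exists: {path}")
        self._data[path] = data

    def get(self, path, watch=None):
        """Return the znode's data and optionally register a one-shot watch."""
        data = self._data[path]  # raises KeyError if the znode is missing
        if watch is not None:
            self._watches[path].append(watch)
        return data

    def set(self, path, data):
        if path not in self._data:
            raise KeyError(f"no such znode: {path}")
        self._data[path] = data
        # Fire each pending watch once, then forget it (one-shot semantics).
        for callback in self._watches.pop(path, []):
            callback(path)


store = ToyZnodeStore()
store.create("/app/config", b"v1")

seen = []
store.get("/app/config", watch=seen.append)  # a client registers interest
store.set("/app/config", b"v2")              # watch fires here
store.set("/app/config", b"v3")              # no watch left, nothing fires
```

After this runs, `seen` holds a single entry for `/app/config`, because the watch was consumed by the first update. A client that wants to keep following the znode would read it again with a new watch.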
Put simply, applications can synchronize their tasks across the distributed cluster by updating their status in a ZooKeeper znode. ZooKeeper then notifies the nodes that registered interest in that znode of the status change. This cluster-wide status centralization service is critical for management and serialization tasks across a large distributed set of servers.
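As a rough sketch of how status updates can coordinate a cluster, the toy example below has three workers report "done" and a coordinator react once all of them have. Everything here is an assumption for illustration (the `StatusBoard` class, the worker names, the "done" state); a real application would use status znodes and watches through a ZooKeeper client instead.

```python
class StatusBoard:
    """Toy stand-in for a set of status znodes under one parent path."""

    def __init__(self, workers):
        self.status = {worker: "pending" for worker in workers}
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def update(self, worker, state):
        # A worker publishes its new status; every subscriber is told about it.
        self.status[worker] = state
        for callback in list(self.listeners):
            callback(worker, state)

    def all_in(self, state):
        return all(s == state for s in self.status.values())


board = StatusBoard(["node-1", "node-2", "node-3"])
log = []


def on_change(worker, state):
    log.append((worker, state))
    # The coordinator moves on only when every worker has reported "done".
    if board.all_in("done"):
        log.append(("cluster", "ready for next phase"))


board.subscribe(on_change)
for name in ["node-1", "node-2", "node-3"]:
    board.update(name, "done")
```

The point of the design is that workers never talk to each other directly. Each one only writes its own status, and the shared store is the single place where the coordinator learns the state of the whole cluster.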
Please reply if you have any questions.
Thanks.