Introduction
Apache ZooKeeper is a distributed coordination service that provides configuration management, synchronization, and naming for large-scale distributed systems. It gives a cluster a consistent, fault-tolerant place to keep the small pieces of shared state that its nodes must agree on.
Key Features of ZooKeeper:
✅ Leader Election
✅ Configuration Management
✅ Distributed Synchronization
✅ Naming Service
✅ Failure Detection
ZooKeeper in HBase
HBase is a distributed NoSQL database that relies on ZooKeeper for coordination.
🔹 HBase uses ZooKeeper for:
✅ Master Election – Ensures only one HMaster is active.
✅ RegionServer Coordination – Tracks active RegionServers.
✅ Failure Detection – Detects RegionServer crashes and triggers reassignments.
✅ Metadata Location – Stores the location of the hbase:meta catalog table, which in turn records the regions of every table.
👉 Without ZooKeeper: HBase cannot assign or manage regions effectively, leading to inconsistency and failure.
ZooKeeper in Kafka
Kafka is a distributed event streaming platform. Clusters that do not run in KRaft mode require ZooKeeper to manage brokers.
🔹 Kafka uses ZooKeeper for:
✅ Broker Coordination – Tracks active brokers in the cluster.
✅ Topic Management – Stores metadata about topics, partitions, and replicas.
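✅ Controller Election – Elects the controller broker, which then selects the leader for each partition.
✅ Consumer Group Management – Older consumer clients (before Kafka 0.9) kept their offsets in ZooKeeper; newer clients store them in Kafka's internal __consumer_offsets topic.
👉 Without ZooKeeper: Brokers in a ZooKeeper-based cluster cannot coordinate, leading to potential data loss or unavailability.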
ZooKeeper in Sqoop
Sqoop is a tool for importing and exporting data between HDFS and RDBMS.
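🔹 Sqoop has no ZooKeeper dependency of its own; it relies on ZooKeeper indirectly:
✅ HBase Imports – When importing into HBase (for example with --hbase-table), the HBase client inside the Sqoop job finds the cluster through ZooKeeper.
✅ Job Metadata – Saved jobs live in the Sqoop Metastore, which is backed by an HSQLDB database rather than ZooKeeper.
✅ Fault Tolerance and Parallelism – Retries and parallel transfers are handled by the MapReduce tasks that Sqoop launches, not by ZooKeeper.
👉 Without ZooKeeper: Imports into HBase fail because the client cannot locate the cluster; plain HDFS imports and exports are not affected.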
ZooKeeper Installation
Step 1: Install Java
sudo apt update
sudo apt install default-jdk
java --version
Step 2: Create a Dedicated User for ZooKeeper
The remaining steps assume a user named hdoop whose home directory is /home/hdoop.
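The post lists no commands for this step. One way to create the account is sketched here. It assumes the user name hdoop (taken from the paths in the later steps) and a system where useradd and passwd are available. The helper names user_exists and ensure_user are made up for this example and are not part of ZooKeeper.

```shell
#!/usr/bin/env bash
# Sketch: create the dedicated account only if it does not exist yet.
# user_exists and ensure_user are illustrative helper names.

user_exists() {
  # Succeeds when the account named in $1 is known to the system.
  id -u "$1" >/dev/null 2>&1
}

ensure_user() {
  # $1 is the account name to create.
  if user_exists "$1"; then
    echo "user $1 already exists"
  else
    # --create-home gives the user /home/<name>, which the later steps rely on.
    sudo useradd --create-home --shell /bin/bash "$1"
    # Set a password interactively so the account can log in.
    sudo passwd "$1"
  fi
}

# Usage: ensure_user hdoop
```

After defining the functions, run ensure_user hdoop once. Running it again is harmless because the existence check comes first.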
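Step 3: Download and Install ZooKeeper
Run these commands from /home/hdoop so that the archive unpacks where the symlink expects it.
cd /home/hdoop
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.5.9/apache-zookeeper-3.5.9-bin.tar.gz
sudo tar -xzf apache-zookeeper-3.5.9-bin.tar.gz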
ln -s /home/hdoop/apache-zookeeper-3.5.9-bin /home/hdoop/zookeeper
sudo chown -R hdoop:hdoop /home/hdoop/zookeeper
sudo chown -R hdoop:hdoop /home/hdoop/apache-zookeeper-3.5.9-bin
Step 4: Set Up the ZooKeeper Data Directory
sudo mkdir -p /home/hdoop/zookeeper/data
sudo chown hdoop:hdoop /home/hdoop/zookeeper/data
Step 5: Configure ZooKeeper
Create the configuration file by copying the sample provided in the conf folder, then open it for editing.
cp /home/hdoop/zookeeper/conf/zoo_sample.cfg /home/hdoop/zookeeper/conf/zoo.cfg
nano /home/hdoop/zookeeper/conf/zoo.cfg
Make sure the file contains the following settings:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hdoop/zookeeper/data
clientPort=2181
maxClientCnxns=60
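tickTime is the basic time unit in milliseconds, and initLimit and syncLimit are counted in ticks: the time a follower may take to connect and sync with the leader, and the time it may lag behind it. Both limits only come into play once the server is part of an ensemble. The sketch below turns the values into milliseconds. The function names cfg_value and zk_limits are hypothetical, and the script assumes plain key=value lines with no spaces around the equals sign and all three keys present.

```shell
#!/usr/bin/env bash
# Sketch: show the time limits implied by a zoo.cfg file.
# cfg_value and zk_limits are illustrative names, not ZooKeeper tools.

cfg_value() {
  # Print the value of key $1 in file $2; the last occurrence wins.
  awk -F= -v key="$1" '$1 == key { val = $2 } END { print val }' "$2"
}

zk_limits() {
  # $1 is the path to zoo.cfg. initLimit and syncLimit are counted in ticks,
  # so multiplying by tickTime converts them to milliseconds.
  tick=$(cfg_value tickTime "$1")
  init_ticks=$(cfg_value initLimit "$1")
  sync_ticks=$(cfg_value syncLimit "$1")
  echo "tick=${tick}ms initLimit=$((tick * init_ticks))ms syncLimit=$((tick * sync_ticks))ms"
}

# Usage: zk_limits /home/hdoop/zookeeper/conf/zoo.cfg
```

With the settings above, zk_limits prints tick=2000ms initLimit=20000ms syncLimit=10000ms, so a follower gets 20 seconds for its initial sync and may lag by up to 10 seconds.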
Step 6: Set Up the System Service
Create a systemd unit file for ZooKeeper and paste in the definition that follows.
sudo nano /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache ZooKeeper Service
After=network.target
[Service]
Type=forking
User=hdoop
Group=hdoop
ExecStart=/home/hdoop/zookeeper/bin/zkServer.sh start /home/hdoop/zookeeper/conf/zoo.cfg
ExecStop=/home/hdoop/zookeeper/bin/zkServer.sh stop
Restart=always
WorkingDirectory=/home/hdoop/zookeeper
# Optional: zkServer.sh writes its PID file inside dataDir by default.
#PIDFile=/home/hdoop/zookeeper/data/zookeeper_server.pid
[Install]
WantedBy=multi-user.target
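Reload systemd so it picks up the new unit, then start the service, enable it at boot, and check its status.
sudo systemctl daemon-reload
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
sudo systemctl status zookeeper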
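Finally, connect with the ZooKeeper command-line client and list the root znodes to confirm that the server responds. On a fresh installation the listing should contain only the built-in zookeeper node.
/home/hdoop/zookeeper/bin/zkCli.sh -server 127.0.0.1:2181
ls /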
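For a non-interactive check, the server also answers the srvr four-letter command on the client port, and its reply includes a Mode: line. The sketch below extracts that line. The function name zk_mode is made up for this example, and the usage line assumes nc (netcat) is installed and that srvr is allowed by the server's four-letter-word whitelist, which is the default in the 3.5 line.

```shell
#!/usr/bin/env bash
# Sketch: report the server mode from the output of the "srvr" command.
# zk_mode is an illustrative helper name, not a ZooKeeper tool.

zk_mode() {
  # Read "srvr" output on stdin and print the value of its "Mode:" line.
  awk -F': ' '$1 == "Mode" { print $2 }'
}

# Usage against the server set up in this post (requires nc):
#   echo srvr | nc 127.0.0.1 2181 | zk_mode
```

On the single-server setup described here the expected answer is standalone. Running /home/hdoop/zookeeper/bin/zkServer.sh status reports the same mode.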