Spark can be integrated with the Hive metastore to provide a common metastore layer shared by Hive and Spark. In this blog I will detail the steps to reuse the Hive metastore for the Spark engine.
Prerequisite:
1. Existing Hadoop installation
2. Existing Hive Installation
3. Existing Spark Installation: Steps to install Spark can be found here
Step 1: Copy the Hive Metastore RDBMS Driver from hive/lib to spark/jars folder
Command: cp hive/lib/mysql-connector-java-8.0.28.jar spark/jars/
Note: This assumes the Hive metastore is backed by a MySQL database.
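If your metastore uses a different RDBMS, copy that driver instead. For example, assuming a PostgreSQL-backed metastore (the driver file name will vary with your installed version):
Command: cp hive/lib/postgresql-42.x.x.jar spark/jars/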
Step 2: Ensure MySQL and Hive Metastore Services are running
Command:
sudo systemctl start mysql
hive --service metastore &
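To confirm the metastore service is up, check that it is listening on its Thrift port (9083 is the default, assuming hive.metastore.port has not been changed):
Command: netstat -tlnp | grep 9083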
Step 3: Edit $SPARK_HOME/conf/spark-defaults.conf (create it if missing):
Add the following line.
spark.sql.catalogImplementation=hive
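Depending on your setup, Spark may also need to know where the metastore service is running. One common approach (an assumption about your layout, not required for every installation) is to copy Hive's hive-site.xml into Spark's conf directory so that hive.metastore.uris (for example thrift://localhost:9083) is visible to Spark.
Command: cp hive/conf/hive-site.xml spark/conf/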
Step 4: Verify the integration from spark-shell
Start the Spark shell: spark-shell
At the Scala prompt, execute: spark.sql("SHOW DATABASES").show()
If it lists all the Hive databases, the integration is successful.
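As an additional sanity check at the same Scala prompt, you can confirm that Spark picked up the Hive catalog. This is a minimal sketch, assuming the default spark session object that spark-shell creates:
spark.conf.get("spark.sql.catalogImplementation")   // should return "hive"
spark.catalog.listDatabases().show()                // should list the same databases as Hive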
Step 5: Ensure the Hadoop services are running
Command:
To verify: jps
To start:
cd /home/hdoop/hadoop/sbin
./start-all.sh
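On a typical single-node setup, jps should list daemons such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager; the exact set depends on your Hadoop configuration.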
Step 6: Run an HQL query to read data (stored in HDFS) from a Hive table
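For example, from the Scala prompt in spark-shell (the database and table names below, sales_db and orders, are placeholders for whatever exists in your metastore):
spark.sql("USE sales_db")
spark.sql("SELECT * FROM orders LIMIT 10").show()   // reads the table data stored in HDFS
spark.sql("SELECT COUNT(*) FROM orders").show()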
Step 7: Accessing Hive Databases and Tables from spark-sql
If the above configuration is working, Hive databases and tables can be accessed directly from the spark-sql shell.
Command: spark-sql
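At the spark-sql prompt you can run the same statements you would run in the Hive CLI; the database and table names below are placeholders:
SHOW DATABASES;
USE sales_db;
SHOW TABLES;
SELECT * FROM orders LIMIT 10;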