https://github.com/adampaternostro/azure-databricks-external-hive-and-adls
Shows how to use an External Hive (SQL Server) along with ADLS Gen 1 as part of a Databricks initialization script that runs when the cluster is created.
- Host: GitHub
- URL: https://github.com/adampaternostro/azure-databricks-external-hive-and-adls
- Owner: AdamPaternostro
- Created: 2018-10-12T19:26:36.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-11-12T18:40:42.000Z (almost 7 years ago)
- Last Synced: 2025-01-31T04:32:41.716Z (8 months ago)
- Topics: azure, azure-data-lake, azure-databricks, hive-metastore
- Language: PowerShell
- Size: 13.7 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Azure-Databricks-External-Hive-and-ADLS
Shows how to use an external Hive metastore (SQL Server) along with ADLS Gen 1 as part of a Databricks initialization script that runs when the cluster is created.

### Steps
1. Create an Azure Databricks workspace in the Azure portal
2. Open the workspace and click on your name
3. Then select User Settings
4. Create a new token (save the value)
5. Download the Databricks CLI: https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html
6. Configure the CLI: ```databricks configure --token```
7. Update the values in external-metastore.sh (a sketch of what such a script can look like is shown after these steps)
   * This assumes you have a Hive metastore in SQL Server, an ADLS Gen 1 account, and a Service Principal
   * You can create an HDInsight cluster and let that create your metastore
   * Please check your Hive version and update external-metastore.sh accordingly
   * external-metastore.sh tells Databricks not to validate the metastore schema (schema verification is turned off)
8. Upload the external metastore init script (run the lines below; they are safe to re-run as often as you like)
```
dbfs mkdirs dbfs:/databricks/init/
dbfs rm dbfs:/databricks/init/external-metastore.sh
dbfs cp external-metastore.sh dbfs:/databricks/init/external-metastore.sh
```
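For orientation, here is a minimal sketch of what an external-metastore.sh init script of this kind can look like, modeled on the Databricks external Hive metastore documentation. Every value below (SQL Server connection string, user, password, Hive version, Service Principal IDs, tenant ID) is a placeholder, and the real script in this repo may set additional or different keys.
```
#!/bin/sh
# Sketch only: write a Spark conf fragment that points the cluster at an
# external Hive metastore in SQL Server and supplies ADLS Gen 1 credentials.
# Single-quoted EOF disables variable interpolation inside the heredoc.
cat << 'EOF' > /databricks/driver/conf/00-custom-spark.conf
[driver] {
    # External Hive metastore in SQL Server (placeholder values)
    "spark.hadoop.javax.jdo.option.ConnectionURL" = "jdbc:sqlserver://YOUR-SQL-SERVER.database.windows.net:1433;database=YOUR-METASTORE-DB"
    "spark.hadoop.javax.jdo.option.ConnectionUserName" = "YOUR-SQL-USER"
    "spark.hadoop.javax.jdo.option.ConnectionPassword" = "YOUR-SQL-PASSWORD"
    "spark.hadoop.javax.jdo.option.ConnectionDriverName" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

    # Match this to the Hive version of your metastore (see step 7)
    "spark.sql.hive.metastore.version" = "1.2.1"
    "spark.sql.hive.metastore.jars" = "builtin"

    # Do not validate the metastore schema, as described in step 7
    "spark.hadoop.hive.metastore.schema.verification" = "false"

    # ADLS Gen 1 access via a Service Principal (placeholder values)
    "spark.hadoop.dfs.adls.oauth2.access.token.provider.type" = "ClientCredential"
    "spark.hadoop.dfs.adls.oauth2.client.id" = "YOUR-SERVICE-PRINCIPAL-APPLICATION-ID"
    "spark.hadoop.dfs.adls.oauth2.credential" = "YOUR-SERVICE-PRINCIPAL-KEY"
    "spark.hadoop.dfs.adls.oauth2.refresh.url" = "https://login.microsoftonline.com/YOUR-TENANT-ID/oauth2/token"
}
EOF
```
Because it sits under dbfs:/databricks/init/, the script runs as a (legacy) global init script on every cluster at start-up, which is how the metastore and ADLS settings get applied when the cluster is created.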
### To Test ADLS (run this in a notebook)

This assumes you have a CSV file to view; otherwise just do the directory listing.
```
%scala
// List the directory to confirm that the ADLS Gen 1 credentials from the init script work
dbutils.fs.ls("adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH")

// Read the sample file as plain text and display its contents
val df = spark.read.text("adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH/SAMPLE-FILE.csv")
df.show()
```

### To Test External Hive (run this in a notebook)
```
%sql
show tables;
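-- Sketch of a further check (hypothetical table name below): create a table, then run
-- "show tables" again from a *different* cluster; if it still appears, the table
-- definitions are coming from the external SQL Server metastore, not a cluster-local one.
-- CREATE TABLE IF NOT EXISTS external_metastore_smoke_test (id INT);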
```

### Reference
- https://docs.databricks.com/user-guide/advanced/external-hive-metastore.html
- https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html