# Azure-Databricks-External-Hive-and-ADLS
Shows how to configure an external Hive metastore (SQL Server) and ADLS Gen 1 access through a Databricks initialization script that runs when the cluster is created.

### Steps
1. Create an Azure Databricks Workspace in the Azure portal
2. Open the workspace and click on your name
3. Then select User Settings
4. Create a new token (save the value)
5. Download the Databricks CLI: https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html
6. Configure the CLI: `databricks configure --token`
7. Update the values in external-metastore.sh (a sketch of such a script appears after the upload commands below)
* This assumes you have a Hive metastore in SQL Server, an ADLS Gen 1 account, and a service principal
* You can create an HDInsight cluster and let it create your metastore
* Check your Hive version and update external-metastore.sh to match
* external-metastore.sh also tells Databricks to skip Hive metastore schema validation
8. Upload the init script (run the lines below; it is safe to rerun them)

```
# Create the cluster init script directory (no-op if it already exists)
dbfs mkdirs dbfs:/databricks/init/
# Remove any previous copy, then upload the current script
dbfs rm dbfs:/databricks/init/external-metastore.sh
dbfs cp external-metastore.sh dbfs:/databricks/init/external-metastore.sh
```
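
For orientation, here is a minimal sketch of what external-metastore.sh can look like, modeled on the script template in the Databricks external Hive metastore docs linked under Reference. Every angle-bracketed value is a placeholder, and the Hive version shown (1.2.1) is an assumption; substitute your own SQL Server, Hive, and service principal details.

```
#!/bin/sh
# Sketch only: writes cluster-scoped Spark config so the driver points at
# the external SQL Server metastore and can authenticate to ADLS Gen 1.
cat << 'EOF' > /databricks/driver/conf/00-custom-spark.conf
[driver] {
    # JDBC connection to the Hive metastore database in SQL Server
    "spark.hadoop.javax.jdo.option.ConnectionURL" = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<metastore-db>"
    "spark.hadoop.javax.jdo.option.ConnectionUserName" = "<sql-username>"
    "spark.hadoop.javax.jdo.option.ConnectionPassword" = "<sql-password>"
    "spark.hadoop.javax.jdo.option.ConnectionDriverName" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

    # Match this to the Hive version of your metastore (e.g. the one HDInsight created)
    "spark.sql.hive.metastore.version" = "1.2.1"
    "spark.sql.hive.metastore.jars" = "builtin"

    # Skip metastore schema validation, as noted in step 7
    "spark.hadoop.hive.metastore.schema.verification" = "false"

    # OAuth credentials so the cluster can reach ADLS Gen 1 via a service principal
    "spark.hadoop.dfs.adls.oauth2.access.token.provider.type" = "ClientCredential"
    "spark.hadoop.dfs.adls.oauth2.client.id" = "<service-principal-application-id>"
    "spark.hadoop.dfs.adls.oauth2.credential" = "<service-principal-key>"
    "spark.hadoop.dfs.adls.oauth2.refresh.url" = "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}
EOF
```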

### To Test ADLS (run this in a notebook)
This assumes you have a CSV file at the path below (or just do the directory listing).
```
%scala
// List the directory to confirm the cluster can reach ADLS
dbutils.fs.ls("adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH")
// Read the file as raw text and display the first rows
val df = spark.read.text("adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH/SAMPLE-FILE.csv")
df.show()
```
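
If you want the CSV parsed into columns rather than shown as raw lines, here is a small sketch; it assumes the file has a header row, and the path is the same placeholder as above.

```
%scala
// Parse the CSV into a typed DataFrame instead of raw text lines
val csvDf = spark.read
  .option("header", "true")       // first line holds column names (assumption)
  .option("inferSchema", "true")  // let Spark guess column types
  .csv("adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH/SAMPLE-FILE.csv")
csvDf.printSchema()
csvDf.show()
```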

### To Test External Hive (run this in a notebook)
```
%sql
-- Tables listed here are read from the external SQL Server metastore
show tables;
```
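
To go a step further, you can create a table and confirm its metadata lands in the external metastore. A hypothetical smoke test (the table name and ADLS path below are placeholders):

```
%sql
-- Create an external table: metadata goes to the SQL Server metastore,
-- data files live in ADLS (placeholder path)
CREATE TABLE IF NOT EXISTS metastore_smoke_test (id INT, name STRING)
USING PARQUET
LOCATION 'adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH/metastore_smoke_test';

SHOW TABLES;
```

If the table is still listed after a cluster restart, or from a second cluster running the same init script, the external metastore wiring is working.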

### Reference
* https://docs.databricks.com/user-guide/advanced/external-hive-metastore.html
* https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html