
# Azure-Databricks-HDInsight-Hive-Metastore
How to share an HDInsight Hive Metastore with Azure Databricks

## Steps
1. Create a SQL Database (PaaS) server in Azure
2. Create an empty database under the server
3. Create an HDInsight cluster in Azure and point the external Hive metastore to your database
4. Delete the HDInsight cluster after it has been created (provisioning the cluster initializes the Hive metastore schema in your database, which is all you need going forward)
5. Create a Databricks workspace in Azure
6. Launch the workspace and open a notebook
7. Run Step 1 in a cell
8. Run Step 2 in a cell (replace the placeholders with your server name, database name, username, and password)
9. Terminate your Databricks cluster
10. Restart your cluster
11. In a SQL notebook you can run SHOW TABLES (you should see the sample table from HDInsight)
* NOTE: You will not be able to select data from this table, because the table most likely points to blob or ADLS storage. Create all your tables in HDInsight/Databricks as EXTERNAL and make sure the LOCATION property is set to blob or ADLS, as in the sketch after this list. Your Databricks cluster will also need to be authorized to access ADLS or blob storage.
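To illustrate the NOTE above, here is a minimal sketch (run from a Scala notebook cell) of creating an EXTERNAL table whose LOCATION points at blob storage. The storage account, container, column, and table names are hypothetical placeholders, and the cluster must already be authorized to read that storage location.

```
// Minimal sketch with hypothetical names: register an EXTERNAL table whose data
// lives in blob storage, so both HDInsight and Databricks can resolve it through
// the shared metastore.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS sample_table (
    id INT,
    name STRING
  )
  STORED AS PARQUET
  LOCATION 'wasbs://<container>@<storage-account>.blob.core.windows.net/sample_table'
""")

// Both engines should now see the table through the shared metastore.
spark.sql("SHOW TABLES").show()
```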

### Step 1
```
// Create the DBFS directory where the cluster init script will live.
dbutils.fs.mkdirs("dbfs:/databricks/init/")
```
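As a quick optional check (not part of the original steps), you can list the directory to confirm it exists:

```
// List the init-script directory to confirm it was created.
dbutils.fs.ls("dbfs:/databricks/init/").foreach(f => println(f.path))
```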

### Step 2
```
dbutils.fs.put(
  "/databricks/init/external-metastore.sh",
  """#!/bin/sh
    |# Loads environment variables to determine the correct JDBC driver to use.
    |source /etc/environment
    |# Quoting the label (i.e. EOF) with single quotes to disable variable interpolation.
    |cat << 'EOF' > /databricks/driver/conf/00-custom-spark.conf
    |[driver] {
    |  # Hive-specific configuration options for metastores in local mode.
    |  "spark.hadoop.javax.jdo.option.ConnectionURL" = "jdbc:sqlserver://<server-name>.database.windows.net:1433;database=<database-name>;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=300"
    |  "spark.hadoop.javax.jdo.option.ConnectionUserName" = "<username>"
    |  "spark.hadoop.javax.jdo.option.ConnectionPassword" = "<password>"
    |  # Verify the metastore schema rather than silently modifying it.
    |  "hive.metastore.schema.verification" = "true"
    |  "hive.metastore.schema.verification.record.version" = "true"
    |  # Use the Hive 2.1.1 metastore client, with jars downloaded from Maven.
    |  "spark.sql.hive.metastore.version" = "2.1.1"
    |  "spark.sql.hive.metastore.jars" = "maven"
    |EOF
    |# Add the JDBC driver separately since we must use variable expansion to choose
    |# the correct driver version.
    |cat << EOF >> /databricks/driver/conf/00-custom-spark.conf
    |  "spark.hadoop.javax.jdo.option.ConnectionDriverName" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    |}
    |EOF
    |""".stripMargin,
  overwrite = true
)
```
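Once the cluster has been restarted (steps 9 and 10), two quick sanity checks, both optional additions to the original walkthrough, can confirm the script landed and the metastore is reachable:

```
// Print the init script that was written to DBFS.
println(dbutils.fs.head("/databricks/init/external-metastore.sh"))

// If the external metastore connection works, this lists its databases.
spark.sql("SHOW DATABASES").show()
```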