HiveQL Jupyter Kernel
https://github.com/aphp/hiveqlkernel


# HiveQL Kernel

### Requirements

If you plan to connect using Kerberos, first install the SASL development libraries (the command below is for Debian/Ubuntu):

```
sudo apt-get install python3-dev libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit
```

### Installation

To install the kernel:

```
pip install --upgrade hiveqlKernel
jupyter hiveql install --user
```

### Connection configuration

Two methods are available to connect to a Hive server:

* Directly inside the notebook
* Using a configuration file

If the configuration file is present, every new HiveQL kernel loads it at startup; otherwise you must configure the connection inside the notebook. Settings given in the notebook override those from the configuration file.
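
The precedence rule can be pictured as a simple dictionary merge. This is only an illustrative sketch, not the kernel's actual code: the function `effective_config` and the sample values are hypothetical.

```python
# Illustrative only: models the precedence described above,
# not the kernel's real implementation.

def effective_config(file_config, notebook_config):
    """Return the settings in effect: notebook values override file values."""
    merged = dict(file_config)       # start from the configuration file (may be empty)
    merged.update(notebook_config)   # notebook settings win on conflicts
    return merged


file_config = {"pool_size": 5, "default_limit": 20}   # hypothetical file contents
notebook_config = {"default_limit": 50}               # hypothetical notebook setting

print(effective_config(file_config, notebook_config))
# prints {'pool_size': 5, 'default_limit': 50}
```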

#### Configure directly in the notebook cells

Copy and paste the following into a notebook cell, adjust the values to match your environment, and run it.

```
$$ url=hive://@:/
$$ connect_args={"auth": "KERBEROS", "kerberos_service_name": "hive", "configuration": {"tez.queue.name": "myqueue"}}
$$ pool_size=5
$$ max_overflow=10
```

These arguments are passed to SQLAlchemy, for which PyHive registers the `hive` dialect.
See [github.com/dropbox/PyHive](https://github.com/dropbox/PyHive/#sqlalchemy).
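
To make the mapping concrete, here is a minimal sketch of how `$$ key=value` lines could be turned into arguments for SQLAlchemy. The helper `parse_settings` is hypothetical and is not taken from the kernel's source; it only assumes that numeric and dictionary values are written as JSON, as in the example above. The `create_engine` call is left as a comment so the snippet runs without SQLAlchemy or PyHive installed.

```python
import json


def parse_settings(cell_text):
    """Parse `$$ key=value` lines into a dict (hypothetical helper)."""
    settings = {}
    for line in cell_text.splitlines():
        line = line.strip()
        if not line.startswith("$$"):
            continue  # ignore anything that is not a setting line
        key, _, raw = line[2:].strip().partition("=")
        key, raw = key.strip(), raw.strip()
        try:
            settings[key] = json.loads(raw)   # numbers and JSON objects
        except json.JSONDecodeError:
            settings[key] = raw               # plain strings such as the URL
    return settings


cell = """
$$ url=hive://@:/
$$ connect_args={"auth": "KERBEROS", "kerberos_service_name": "hive"}
$$ pool_size=5
$$ max_overflow=10
"""

settings = parse_settings(cell)
url = settings.pop("url")
engine_kwargs = settings  # connect_args, pool_size, max_overflow

# With SQLAlchemy and PyHive installed, the equivalent call would be roughly:
#   engine = sqlalchemy.create_engine(url, **engine_kwargs)
print(url)
print(engine_kwargs)
```

`connect_args`, `pool_size`, and `max_overflow` are standard `create_engine` parameters, which is why they can be forwarded unchanged.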

#### Configure using a configuration file

By default, the HiveQL kernel looks for the configuration file at `~/.hiveql_kernel.conf`. You can point it to another path by setting `HIVE_KERNEL_CONF_FILE`.

The file must contain JSON like this:

```
{
  "url": "hive://@:/",
  "connect_args": {
    "auth": "KERBEROS",
    "kerberos_service_name": "hive",
    "configuration": {"tez.queue.name": "myqueue"}
  },
  "pool_size": 5,
  "max_overflow": 10,
  "default_limit": 20,
  "display_mode": "be"
}
```
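
If you prefer to generate the file from a script, the sketch below writes the same settings with Python's `json` module and reads them back to confirm the result is valid JSON. It writes to a temporary directory so that no existing configuration is overwritten; the `target` path is therefore an assumption made for the example, and in real use you would write to `~/.hiveql_kernel.conf` instead. The values are copied from the example above and must be adapted to your cluster.

```python
import json
import os
import tempfile

# Settings copied from the example above; adapt them to your environment.
config = {
    "url": "hive://@:/",
    "connect_args": {
        "auth": "KERBEROS",
        "kerberos_service_name": "hive",
        "configuration": {"tez.queue.name": "myqueue"},
    },
    "pool_size": 5,
    "max_overflow": 10,
    "default_limit": 20,
    "display_mode": "be",
}

# Assumption for this sketch: write to a temporary directory instead of the
# real home directory, so nothing is overwritten by accident.
target = os.path.join(tempfile.mkdtemp(), ".hiveql_kernel.conf")

with open(target, "w", encoding="utf-8") as fh:
    json.dump(config, fh, indent=2)

# Reading the file back checks that it parses as JSON.
with open(target, encoding="utf-8") as fh:
    loaded = json.load(fh)

print(loaded == config)
```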

### Usage

Inside a HiveQL kernel, you can type HiveQL directly in the cells; the results are displayed as an HTML table.

Other options are also available. For example, to change the default display limit (20 rows):

```
$$ default_limit=50
```

Some Hive statements are extended so that their results can be filtered with a pattern:

```
SHOW TABLES
SHOW DATABASES
```

### Run tests

```
python -m pytest
```

Have fun!