Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/GlareDB/glaredb

GlareDB: An analytics DBMS for distributed data
https://github.com/GlareDB/glaredb

analytics database rust sql

Last synced: about 1 month ago
JSON representation

GlareDB: An analytics DBMS for distributed data

Awesome Lists containing this project

README

        















## About

Data exists everywhere: your laptop, [Postgres], [Snowflake] and as
[files in S3]. It exists in various formats such as Parquet, CSV and JSON.
Regardless, there will always be multiple steps spanning several destinations to
get the insights you need.

**GlareDB is designed to query your data wherever it lives using SQL that you
already know.**

## Install

Install/update `glaredb` in the **current directory**:

```shell
curl https://glaredb.com/install.sh | sh
```

It may be helpful to install the binary in a location on your `PATH`. For
example, `~/.local/bin`.

If you prefer manual installation, download, extract and run the GlareDB binary
from a release in our [releases page].

## Getting Started

After [Installing](#install), get up and running with:

- [**CLI**](#local-cli)
- [Run GlareDB server](#local-server)
- [**Hybrid Execution**](#hybrid-execution)
- [**Python**](#using-glaredb-in-python)

### Local CLI

To start a local session, run the binary:

```shell
./glaredb
```

Or, you can execute SQL and immediately return (**try it out!**):

```shell
# Query a CSV on Hugging Face
./glaredb --query "SELECT * FROM \
'https://huggingface.co/datasets/fka/awesome-chatgpt-prompts/raw/main/prompts.csv';"
```

To see all options use `--help`:

```sh
./glaredb --help
```

### Hybrid Execution

1. Sign up at for a **free** fully-managed
deployment of GlareDB
2. Copy the connection string from GlareDB Cloud, for example:

```shell
./glaredb --cloud-url="glaredb://user:pass@host:port/deployment"
# or
./glaredb
> \open "glaredb://user:pass@host:port/deployment
```

Read our [announcement on Hybrid Execution] for more information.

### Using GlareDB in Python

1. Install the official [GlareDB Python library]

```shell
pip install glaredb
```

2. Import and use `glaredb`.

```python
import glaredb
con = glaredb.connect()
con.sql("select 'hello world';").show()
```

To use **Hybrid Execution**, sign up at and
use the connection string for your deployment. For example:

```python
import glaredb
con = glaredb.connect("glaredb://user:pass@host:port/deployment")
con.sql("select 'hello hybrid exec';").show()
```

GlareDB work with [Pandas] and [Polars] DataFrames out of the box:

```python
import glaredb
import polars as pl

df = pl.DataFrame(
{
"A": [1, 2, 3, 4, 5],
"fruits": ["banana", "banana", "apple", "apple", "banana"],
"B": [5, 4, 3, 2, 1],
"cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
}
)

con = glaredb.connect()

df = con.sql("select * from df where fruits = 'banana'").to_polars();

print(df)
```

### Local Server

The `server` subcommand can be used to launch a server process for GlareDB:

```shell
./glaredb server
```

To see all options for running in server mode, use `--help`:

```sh
./glaredb server --help
```

When launched as a server process, GlareDB can be reached on port `6543` using a
Postgres client. The following example uses `psql` to connect to a locally
running server:

```shell
psql "host=localhost user=glaredb dbname=glaredb port=6543"
```

## Configure the First Data Source

You can use a demo Postgres instance at `pg.demo.glaredb.com`. Adding this
Postgres instance as data source is as easy as running the following command:

```sql
CREATE EXTERNAL DATABASE my_pg
FROM postgres
OPTIONS (
host = 'pg.demo.glaredb.com',
port = '5432',
user = 'demo',
password = 'demo',
database = 'postgres',
);
```

Once the data source has been added, it can be queried using fully qualified
table names:

```sql
SELECT *
FROM my_pg.public.lineitem
WHERE l_shipdate <= date '1998-12-01' - INTERVAL '90'
LIMIT 5;
```

Check out the docs to learn about all [supported data sources]. Many data
sources can be connected to the same GlareDB instance.

Done with this data source? Remove it with the following command:

```sql
DROP DATABASE my_pg;
```

## Supported Data Sources

| Source | Read | `INSERT INTO` | `COPY TO` | Table Function | External Table | External Database |
|-----------------------|:----:|:-------------:|-----------|:--------------:|:--------------:|-------------------|
| **Databases** | -- | -- | -- | -- | -- | -- |
| MySQL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| PostgreSQL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| MariaDB _(via mysql)_ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| MongoDB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Microsoft SQL Server | ✅ | 🚧 | 🚧 | ✅ | ✅ | ✅ |
| Snowflake | ✅ | 🚧 | 🚧 | ✅ | ✅ | ✅ |
| BigQuery | ✅ | 🚧 | 🚧 | ✅ | ✅ | ✅ |
| Cassandra/ScyllaDB | ✅ | 🚧 | 🚧 | ✅ | ✅ | ✅ |
| ClickHouse | ✅ | 🚧 | 🚧 | ✅ | ✅ | ✅ |
| Oracle | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 |
| ADBC | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 |
| ODBC | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 |
| **Database Files** | -- | -- | -- | -- | -- | -- |
| SQLite | ✅ | ✅ | 🚧 | ✅ | ✅ | ✅ |
| Microsoft Excel | ✅ | 🚧 | 🚧 | ✅ | ✅ | ➖ |
| DuckDB | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 |
| **File Formats** | -- | -- | -- | -- | -- | -- |
| Apache Arrow | ✅ | 🚧 | ✅ | ✅ | ✅ | ➖ |
| Apache Parquet | ✅ | 🚧 | ✅ | ✅ | ✅ | ➖ |
| CSV | ✅ | 🚧 | ✅ | ✅ | ✅ | ➖ |
| JSON | ✅ | 🚧 | ✅ | ✅ | ✅ | ➖ |
| BSON | ✅ | 🚧 | ✅ | ✅ | ✅ | ➖ |
| Apache Avro | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | ➖ |
| Apache ORC | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | ➖ |
| **Table Formats** | -- | -- | -- | -- | -- | -- |
| Lance | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ |
| Delta | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ |
| Iceberg | ✅ | 🚧 | 🚧 | ✅ | ✅ | ➖ |

✅ = Supported
➖ = Not Applicable
🚧 = Not Yet Supported

## Building from Source

Building GlareDB requires Rust/Cargo to be installed. Check out [rustup](https://rustup.rs/) for
an easy way to install Rust on your system.

Running the following command will build a release binary:

```shell
just build --release
```

The compiled release binary can be found in `target/release/glaredb`.

## Documentation

Browse GlareDB documentation on our [docs.glaredb.com](https://docs.glaredb.com).

## Contributing

Contributions welcome! Check out [CONTRIBUTING.md](CONTRIBUTING.md) for how to get started.

## License

See [LICENSE](./LICENSE). Unless otherwise noted, this license applies to all files in
this repository.

## Acknowledgements

GlareDB is proudly powered by [Apache Datafusion](https://arrow.apache.org/datafusion/) and [Apache Arrow](https://arrow.apache.org/). We are grateful for the work of the Apache Software Foundation and the community around these projects.

[Postgres]: https://docs.glaredb.com/data-sources/postgres.html
[Snowflake]: https://docs.glaredb.com/data-sources/snowflake.html
[files in S3]: https://docs.glaredb.com/data-sources/s3.html
[releases page]: https://github.com/GlareDB/glaredb/releases
[announcement on Hybrid Execution]: https://glaredb.com/blog/hybrid-execution
[GlareDB Python library]: https://pypi.org/project/glaredb/
[Pandas]: https://github.com/pandas-dev/pandas
[Polars]: https://github.com/pola-rs/polars
[supported data sources]: https://docs.glaredb.com/data-sources/