https://github.com/gigapi/gigapi

GigAPI: The Infinite Timeseries Data Lake. DuckDB OLAP + Parquet Query Engine & Cloud-Native Storage API. Drop-in compatible InfluxDB3 alternative.

api clickhouse clickhouse-server data-lake database datalake duckdb duckdb-api duckdb-engine duckdb-server ducklake gigapipe golang olap parquet qryn query-engine rest-api s3 sql

# GigAPI: The Infinite Timeseries Lakehouse

Like a durable parquet floor, GigAPI provides a rock-solid data foundation for your queries and analytics.

### **Problem**
> Traditional "always-on" OLAP databases such as ClickHouse are fast but expensive to operate, complex to manage and scale, and often exist to promote a cloud product. Data lakes and lakehouses are cheaper but can't always handle real-time ingestion or compaction, and querying ever-growing datasets such as timeseries brings back costly operations and complexity. There are also various _"open core"_ poison solutions out there.

### **Solution**
> GigAPI is a timeseries-optimized "lakehouse" designed for real-time data, and lots of it, that returns query results as fast as possible. By combining DuckDB's performance, FlightSQL's efficiency and Parquet's reliability with smart metadata, we've created a simple, lightweight solution ready to decimate complexity and infrastructure costs for ourselves and others.
> GigAPI is _100% opensource - no open core or cloud product gimmicks_.

### GigAPI Features

* Fast: DuckDB SQL + Parquet powered OLAP API Engine
* Flexible: Schema-less Parquet Ingestion & Compaction
* Simple: Low Maintenance, Portable Catalog, Infinitely Scalable
* Smart: Independent storage/write and compute/read components
* Extensible: Built-In Query Engine _(DuckDB)_ or BYODB _(ClickHouse, DataFusion, etc.)_

> [!WARNING]
> GigAPI is an open beta developed in public. Bugs and changes should be expected. Use at your own risk.

## Usage

> Here's the most basic example. For more complex usage samples, see the [examples](/examples) directory.

```yml
services:
  gigapi:
    image: ghcr.io/gigapi/gigapi:latest
    container_name: gigapi
    hostname: gigapi
    restart: unless-stopped
    volumes:
      - ./data:/data
    ports:
      - "7971:7971"
    environment:
      - GIGAPI_ROOT=/data
```

### Settings

| Env Var Name | Description | Default Value |
|--------------------------|--------------------------------------------------------------|---------------|
| `GIGAPI_ROOT` | Root folder for all the data files | |
| `GIGAPI_MERGE_TIMEOUT_S` | Base timeout between merges (in seconds) | `10` |
| `GIGAPI_SAVE_TIMEOUT_S` | Timeout before saving the new data to the disk (in seconds)| `1` |
| `GIGAPI_NO_MERGES` | Disable merging | `false` |
| `GIGAPI_UI` | Enable UI for querier | `true` |
| `GIGAPI_MODE` | Execution mode (`readonly`, `writeonly`, `compaction`, `aio`) | `"aio"` |
| `GIGAPI_METADATA_TYPE` | Metadata Type (`json` for local, `redis` for distributed) | `"json"` |
| `GIGAPI_METADATA_URL` | Metadata URL for Redis (e.g. `redis://redis:6379/0`) | |
| `HTTP_PORT` | Port to listen on for HTTP server | `7971` |
| `HTTP_HOST` | Host to bind to for HTTP server | `"0.0.0.0"` |
| `HTTP_BASIC_AUTH_USERNAME` | Username for HTTP basic authentication | |
| `HTTP_BASIC_AUTH_PASSWORD` | Password for HTTP basic authentication | |
| `FLIGHTSQL_PORT` | Port to run FlightSQL server | `8082` |
| `FLIGHTSQL_ENABLE` | Enable FlightSQL server | `true` |
| `LOGLEVEL` | Log level (debug, info, warn, error, fatal) | `"info"` |
| `DUCKDB_MEM_LIMIT` | DuckDB memory limit (e.g. 1GB) | `"1GB"` |
| `DUCKDB_THREAD_LIMIT` | DuckDB thread limit (int) | `1` |

> You can override the defaults by setting these environment variables before starting the service.
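
As an illustration of how a wrapper script or client might mirror these defaults, here is a minimal Python sketch. The `GigapiSettings` class and `load_settings` helper are hypothetical names and not part of GigAPI; only the variable names, allowed modes and default values come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GigapiSettings:
    """Hypothetical container for a subset of the settings in the table."""
    root: str
    merge_timeout_s: int
    save_timeout_s: float
    mode: str
    http_port: int
    flightsql_port: int


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    mode = env.get("GIGAPI_MODE", "aio")
    if mode not in {"readonly", "writeonly", "compaction", "aio"}:
        raise ValueError(f"unsupported GIGAPI_MODE: {mode!r}")
    return GigapiSettings(
        root=env.get("GIGAPI_ROOT", ""),  # no default is documented for the root folder
        merge_timeout_s=int(env.get("GIGAPI_MERGE_TIMEOUT_S", "10")),
        save_timeout_s=float(env.get("GIGAPI_SAVE_TIMEOUT_S", "1")),
        mode=mode,
        http_port=int(env.get("HTTP_PORT", "7971")),
        flightsql_port=int(env.get("FLIGHTSQL_PORT", "8082")),
    )
```

Passing a plain dictionary instead of the process environment makes the helper easy to exercise in isolation.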


## Write Support
As write requests arrive, GigAPI parses them and progressively appends the data to Parquet files alongside their metadata. The ingestion buffer is flushed to disk at configurable intervals using a hive partitioning scheme. The generated Parquet files and their metadata are then progressively compacted and sorted over time, based on configuration parameters.

### API
GigAPI provides an HTTP API for clients to write, currently supporting the InfluxDB Line Protocol format
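
A minimal Python sketch of a Line Protocol write might look like the following. The `/write` path is an assumption borrowed from InfluxDB conventions and is not confirmed by this README, the helper names are hypothetical, and the formatter handles only simple numeric fields without escaping.

```python
import urllib.parse
import urllib.request


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point as InfluxDB Line Protocol (numeric fields only, no escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


def build_write_request(base_url, db, lines):
    """Build (but do not send) a POST request carrying newline-separated points.

    The "/write" path is an assumption, not taken from the GigAPI documentation.
    """
    url = f"{base_url}/write?{urllib.parse.urlencode({'db': db})}"
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


points = [
    to_line_protocol("weather", {"location": "us-midwest", "season": "summer"}, {"temperature": 82}),
    to_line_protocol("weather", {"location": "us-east", "season": "summer"}, {"temperature": 80}),
]
request = build_write_request("http://localhost:7971", "mydb", points)
# To send it to a running instance: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the formatting logic testable without a running server.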

### FlightSQL

> [!NOTE]
> _FlightSQL ingestion is coming soon!_

### Data Schema
GigAPI is a schema-on-write database managing databases, tables and schemas on the fly. New columns can be added or removed over time, leaving reconciliation up to readers.

```bash
/data
  /mydb
    /weather
      /date=2025-04-10
        /hour=14
          *.parquet
          metadata.json
        /hour=15
          *.parquet
          metadata.json
```

GigAPI managed parquet files use the following naming schema:
```
{UUID}.{LEVEL}.parquet
```
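
To make the layout concrete, the following Python sketch derives the partition directory for a timestamp and builds or parses a file name of the form above. The helper names are hypothetical, and partitioning by UTC time is an assumption rather than documented behavior.

```python
import uuid
from datetime import datetime, timezone
from pathlib import PurePosixPath


def partition_dir(root, db, table, ts):
    """Return the date=/hour= partition directory for an aware timestamp (UTC assumed)."""
    ts = ts.astimezone(timezone.utc)
    return PurePosixPath(root, db, table, f"date={ts:%Y-%m-%d}", f"hour={ts:%H}")


def parquet_name(level, file_id=None):
    """Build a {UUID}.{LEVEL}.parquet file name, generating a UUID if none is given."""
    if file_id is None:
        file_id = uuid.uuid4()
    return f"{file_id}.{level}.parquet"


def parse_parquet_name(name):
    """Split a {UUID}.{LEVEL}.parquet name into (UUID, level)."""
    file_id, level, ext = name.rsplit(".", 2)
    if ext != "parquet":
        raise ValueError(f"not a parquet file: {name!r}")
    return uuid.UUID(file_id), int(level)


ts = datetime(2025, 4, 10, 14, 30, tzinfo=timezone.utc)
print(partition_dir("/data", "mydb", "weather", ts) / parquet_name(1))
```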

### Parquet Compactor
GigAPI files are progressively compacted based on the following logic _(subject to future changes)_

| Merge Level | Source | Target | Frequency | Max Size |
|---------------|--------|--------|------------------------|----------|
| Level 1 -> 2 | `.1` | `.2` | `MERGE_TIMEOUT_S` = `10` | 100 MB |
| Level 2 -> 3 | `.2` | `.3` | `MERGE_TIMEOUT_S` * `10` | 400 MB |
| Level 3 -> 4 | `.3` | `.4` | `MERGE_TIMEOUT_S` * `10` * `10` | 4 GB |
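
The frequency column grows by a factor of ten per level. A small Python sketch of that schedule follows; `merge_schedule` is a hypothetical helper, and it assumes each level writes into the next numeric suffix.

```python
def merge_schedule(base_timeout_s=10):
    """Return one entry per merge level, following the table above."""
    max_sizes = {1: "100 MB", 2: "400 MB", 3: "4 GB"}
    return [
        {
            "source": f".{level}",
            # Assumption: a merge always targets the next level's suffix.
            "target": f".{level + 1}",
            # Each level waits ten times longer than the previous one.
            "interval_s": base_timeout_s * 10 ** (level - 1),
            "max_size": max_sizes[level],
        }
        for level in (1, 2, 3)
    ]


for entry in merge_schedule():
    print(entry)
```

With the default `GIGAPI_MERGE_TIMEOUT_S` of `10`, the three levels run every 10, 100 and 1000 seconds respectively.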

## Read Support
As read requests arrive, GigAPI parses and transpiles them, using the GigAPI metadata catalog to resolve data locations based on the database, table and time range in each request.
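
To illustrate how a time range narrows the data to scan, this Python sketch lists the `date=`/`hour=` partitions that overlap a range. It is not GigAPI's resolver, which works from the metadata catalog; `partitions_for_range` is a hypothetical helper that relies only on the directory layout and assumes UTC partitioning.

```python
from datetime import datetime, timedelta, timezone


def partitions_for_range(start, end):
    """List date=/hour= partitions overlapping [start, end), assuming UTC partitioning."""
    # Round the start down to the beginning of its hour.
    cur = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    found = []
    while cur < end:
        found.append(f"date={cur:%Y-%m-%d}/hour={cur:%H}")
        cur += timedelta(hours=1)
    return found


start = datetime(2025, 4, 10, 23, 30, tzinfo=timezone.utc)
end = datetime(2025, 4, 11, 1, 0, tzinfo=timezone.utc)
print(partitions_for_range(start, end))
```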

Query Data
```bash
$ curl -X POST "http://localhost:7971/query?db=mydb" \
  -H "Content-Type: application/json" \
  -d "{\"query\": \"SELECT time, temperature FROM weather WHERE time >= epoch_ns('2025-04-24T00:00:00'::TIMESTAMP)\"}"
```

Series can be queried with or without time ranges, e.g. for counting or calculating averages.

```bash
$ curl -X POST "http://localhost:7971/query?db=mydb" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT count(*), avg(temperature) FROM weather"}'
```
```json
{"results":[{"avg(temperature)":87.025,"count_star()":"40"}]}
```
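
The same request can be issued from Python using only the standard library. This is a sketch under the assumption that the endpoint behaves as in the curl examples, returning a JSON object with a `results` array; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(base_url, db, sql):
    """Build the POST request for the /query endpoint shown in the curl examples."""
    url = f"{base_url}/query?{urllib.parse.urlencode({'db': db})}"
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(base_url, db, sql, timeout=10):
    """Send the query and return the rows under "results" (needs a running server)."""
    request = build_query_request(base_url, db, sql)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]
```

Letting `json.dumps` produce the body avoids the shell-quoting pitfalls that arise when a SQL string contains single quotes. Pass the host and port where the HTTP API is listening as `base_url`.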

#### FlightSQL
GigAPI data can be accessed using FlightSQL GRPC clients in any language
```python
from flightsql import connect, FlightSQLClient

# Connect to the FlightSQL server without TLS (FLIGHTSQL_PORT defaults to 8082)
client = FlightSQLClient(host='localhost', port=8082, insecure=True, metadata={'bucket': 'hep'})
conn = connect(client)
cursor = conn.cursor()
# Run an aggregate query and print every returned row
cursor.execute('SELECT count(*), avg(temperature) FROM weather')
print("rows:", [r for r in cursor])
```

#### GigAPI UI
The embedded GigAPI UI can be used to explore and query data using SQL with advanced features

![gigapi_preview](https://github.com/user-attachments/assets/8d550803-daa3-43dc-a4b3-b0779498fce5)

#### Grafana
GigAPI can be used from Grafana using the InfluxDB3 Flight GRPC Datasource

![image](https://github.com/user-attachments/assets/a7849ff4-b8f6-433b-8458-1c47394c5e5f)

> GigAPI readers can be implemented in any language and with any OLAP engine supporting Parquet files.
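
As a rough illustration of such a reader, the Python sketch below builds a DuckDB query that scans a table's Parquet files directly. `build_reader_sql` is a hypothetical helper; `read_parquet` with `hive_partitioning` and `union_by_name` is standard DuckDB, with `union_by_name` reconciling files whose columns differ. It ignores `metadata.json`, so its results may not match what GigAPI itself would return.

```python
def build_reader_sql(root, db, table, select="*", where=None):
    """Build a DuckDB query over every Parquet file of one table.

    The glob follows the /date=.../hour=.../*.parquet layout described earlier.
    """
    glob = f"{root}/{db}/{table}/*/*/*.parquet"
    sql = (
        f"SELECT {select} "
        f"FROM read_parquet('{glob}', hive_partitioning = true, union_by_name = true)"
    )
    if where:
        sql += f" WHERE {where}"
    return sql


print(build_reader_sql("/data", "mydb", "weather", select="count(*), avg(temperature)"))
```

With the `duckdb` Python package installed, the resulting string could be run with `duckdb.sql(...).fetchall()`.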


## GigAPI Diagram

```mermaid
%%{
init: {
'theme': 'base',
'themeVariables': {
'primaryColor': '#6a329f',
'primaryTextColor': '#fff',
'primaryBorderColor': '#7C0000',
'lineColor': '#6f329f',
'secondaryColor': '#006100',
'tertiaryColor': '#fff'
}
}
}%%
graph TD
subgraph "GigAPI System"
HTTP["HTTP API"] --> DataIngestion["Data Ingestion Pipeline"]
GRPC["GRPC API"] --> FlightSQL["FlightSQL Service"]

Configuration["Metadata Store"] --> Storage
Configuration --> DataIngestion
Configuration --> Storage
Configuration --> MergeProcess
MergeProcess --> Configuration

FlightSQL["FlightSQL Service"] --> Storage["Storage System"]
FlightSQL["FlightSQL Service"] --> DuckDB["DuckDB Engine"]

DataIngestion --> Storage["Storage System"]
Storage --> MergeProcess["Merge Process"]
Storage --> QueryEngine["Query Engine"]

DuckDB["DuckDB Engine"] --> Configuration


end

Client["Client Applications"] --> HTTP
Client["Client Applications"] --> GRPC

Storage --> LocalFS["Local Filesystem"]
Storage --> S3["S3 Storage"]

QueryEngine --> DuckDB["DuckDB Engine"]
FlightSQL["FlightSQL Service"] --> Configuration
```

### Got Questions?
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/gigapi/gigapi)

### Contributors

[![Contributors @gigapi/gigapi](https://contrib.rocks/image?repo=gigapi/gigapi)](https://github.com/gigapi/gigapi/graphs/contributors)

### Community

[![Stargazers for @gigapi/gigapi](https://reporoster.com/stars/gigapi/gigapi)](https://github.com/gigapi/gigapi/stargazers)

###### :black_joker: Disclaimers

[^1]: DuckDB ® is a trademark of DuckDB Foundation. All rights reserved by their respective owners. [^1]
[^2]: ClickHouse ® is a trademark of ClickHouse Inc. No direct affiliation or endorsement. [^2]
[^3]: InfluxDB ® is a trademark of InfluxData. No direct affiliation or endorsement. [^3]
[^4]: Released under the MIT license. See LICENSE for details. All rights reserved by their respective owners. [^4]