Create full-fledged APIs for slowly moving datasets without writing a single line of code.
https://github.com/roapi/roapi


# ROAPI

![build](https://github.com/roapi/roapi/workflows/build/badge.svg)
[![Documentation](https://img.shields.io/badge/-documentation-blue)](https://roapi.github.io/docs/index.html)
[![discord](https://dcbadge.vercel.app/api/server/t3pjfZkZkZ?compact=true&style=flat)](https://discord.gg/FchMNYTm7G)

ROAPI automatically spins up read-only APIs for static datasets without
requiring you to write a single line of code. It builds on top of [Apache
Arrow](https://github.com/apache/arrow) and
[Datafusion](https://github.com/apache/arrow-datafusion). The
core of its design can be boiled down to the following:

- [Query frontends](https://roapi.github.io/docs/api/query/index.html) to
translate SQL, FlightSQL, GraphQL and REST API queries into
Datafusion plans.
- Datafusion for query plan execution.
- [Data layer](https://roapi.github.io/docs/config/dataset-formats/index.html)
to load datasets from a variety of sources and formats with automatic schema
inference.
- [Response encoding layer](https://roapi.github.io/docs/api/response.html) to
serialize intermediate Arrow record batches into the formats requested by the
client.

See below for a high-level diagram:

[Figure: roapi-design-diagram]

## Installation

### Install pre-built binary

```bash
# if you are using homebrew
brew install roapi
# or if you prefer pip
pip install roapi
```

Check out the [GitHub release page](https://github.com/roapi/roapi/releases) for
pre-built binaries for each platform. Pre-built Docker images are also available at
[ghcr.io/roapi/roapi](https://github.com/orgs/roapi/packages/container/package/roapi).

### Install from source

```bash
cargo install --locked --git https://github.com/roapi/roapi --branch main --bins roapi
```

## Usage

### Quick start

Spin up APIs for `test_data/uk_cities_with_headers.csv` and
`test_data/spacex_launches.json`:

```bash
roapi \
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
--table "test_data/spacex_launches.json"
```

On Windows, the full scheme (`file://` or `filesystem://`) must be specified, and arguments must be wrapped in double quotes (`"`) instead of single quotes (`'`) to work around Windows command-line limitations:

```bash
roapi \
--table "uk_cities=file://d:/path/to/uk_cities_with_headers.csv" \
--table "file://d:/path/to/test_data/spacex_launches.json"
```

Or using Docker:

```bash
docker run -t --rm -p 8080:8080 ghcr.io/roapi/roapi:latest --addr-http 0.0.0.0:8080 \
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
--table "test_data/spacex_launches.json"
```

For MySQL and SQLite, pass `--table` parameters like these:
```
--table "table_name=mysql://username:password@localhost:3306/database"
--table "table_name=sqlite://path/to/database"
```

To register tables dynamically at runtime, add the `-d` flag to the command. The `--table` parameter cannot be omitted for now.
```bash
roapi \
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
-d
```

Then `POST` the table config to `/api/table` to register the data:
```bash
curl -X POST http://172.24.16.1:8080/api/table \
-H 'Content-Type: application/json' \
-d '[
  {
    "tableName": "uk_cities2",
    "uri": "./test_data/uk_cities_with_headers.csv"
  },
  {
    "tableName": "table_name",
    "uri": "sqlite://path/to/database"
  }
]'
```
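
The same registration request can be assembled from Python. This is a minimal sketch using only the standard library: the helper name `build_register_request` and the `localhost:8080` address are assumptions made for illustration, while the endpoint and the `tableName`/`uri` fields come from the curl example above. The request is built but not sent, so it can be inspected first.

```python
import json
import urllib.request


def build_register_request(base_url, tables):
    """Build (but do not send) a POST request for the /api/table endpoint.

    `tables` maps a table name to its source URI. This helper is
    illustrative and not part of ROAPI itself.
    """
    payload = [{"tableName": name, "uri": uri} for name, uri in tables.items()]
    return urllib.request.Request(
        url=f"{base_url}/api/table",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_register_request(
    "http://localhost:8080",  # assumed address of a ROAPI server started with -d
    {"uk_cities2": "./test_data/uk_cities_with_headers.csv"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it against a running server: urllib.request.urlopen(request)
```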

Query tables using SQL, GraphQL or REST:

```bash
curl -X POST -d "SELECT city, lat, lng FROM uk_cities LIMIT 2" localhost:8080/api/sql
curl -X POST -d "query { uk_cities(limit: 2) {city, lat, lng} }" localhost:8080/api/graphql
curl "localhost:8080/api/tables/uk_cities?columns=city,lat,lng&limit=2"
```

Get inferred schema for all tables:

```bash
curl 'localhost:8080/api/schema'
```

### Config file

You can also configure multiple table sources using a YAML or TOML config file, which supports more
advanced format-specific table options:

```yaml
addr:
  http: 0.0.0.0:8084
  postgres: 0.0.0.0:5433
tables:
  - name: "blogs"
    uri: "test_data/blogs.parquet"

  - name: "ubuntu_ami"
    uri: "test_data/ubuntu-ami.json"
    option:
      format: "json"
      pointer: "/aaData"
      array_encoded: true
    schema:
      columns:
        - name: "zone"
          data_type: "Utf8"
        - name: "name"
          data_type: "Utf8"
        - name: "version"
          data_type: "Utf8"
        - name: "arch"
          data_type: "Utf8"
        - name: "instance_type"
          data_type: "Utf8"
        - name: "release"
          data_type: "Utf8"
        - name: "ami_id"
          data_type: "Utf8"
        - name: "aki_id"
          data_type: "Utf8"

  - name: "spacex_launches"
    uri: "https://api.spacexdata.com/v4/launches"
    option:
      format: "json"

  - name: "github_jobs"
    uri: "https://web.archive.org/web/20210507025928if_/https://jobs.github.com/positions.json"
```

To serve tables using a config file:

```bash
roapi -c ./roapi.yml # or .toml
```

See the [config
documentation](https://roapi.github.io/docs/config/config-file.html) for more
options, including [using a Google spreadsheet as a table
source](https://roapi.github.io/docs/config/dataset-formats/gsheet.html).

### Response serialization

By default, ROAPI encodes responses in JSON format, but you can request
different encodings by specifying the `ACCEPT` header:

```bash
curl -X POST \
-H 'ACCEPT: application/vnd.apache.arrow.stream' \
-d "SELECT launch_library_id FROM spacex_launches WHERE launch_library_id IS NOT NULL" \
localhost:8080/api/sql
```
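
For clients written in Python, the sketch below shows one way to request the Arrow stream encoding and decode it. The helper names and the `localhost:8080` address are assumptions made for illustration; the endpoint, header, and media type are taken from the curl example above. Decoding relies on the third-party `pyarrow` package, which is imported only inside `read_arrow_table` so that building the request needs nothing beyond the standard library.

```python
import urllib.request

ARROW_STREAM = "application/vnd.apache.arrow.stream"


def build_sql_request(base_url, sql, accept="application/json"):
    """Build (but do not send) a POST to /api/sql with the given Accept header."""
    return urllib.request.Request(
        url=f"{base_url}/api/sql",
        data=sql.encode("utf-8"),
        headers={"Accept": accept},
        method="POST",
    )


def read_arrow_table(request):
    """Send the request and decode an Arrow IPC stream response.

    Needs a running ROAPI server and the `pyarrow` package.
    """
    import pyarrow as pa

    with urllib.request.urlopen(request) as response:
        return pa.ipc.open_stream(response.read()).read_all()


request = build_sql_request(
    "http://localhost:8080",  # assumed server address
    "SELECT launch_library_id FROM spacex_launches WHERE launch_library_id IS NOT NULL",
    accept=ARROW_STREAM,
)
print(request.get_header("Accept"))
# With a server running: table = read_arrow_table(request)
```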

### REST API query interface

You can query tables through the REST API by sending `GET` requests to
`/api/tables/{table_name}`. Query operators are specified as query params.

The REST query frontend currently supports the following query operators:

- columns
- sort
- limit
- filter

To sort column `col1` in ascending order and `col2` in descending order, set
query param to: `sort=col1,-col2`.

To find all rows with `col1` equal to string `'foo'`, set query param to:
`filter[col1]='foo'`. You can also do basic comparisons with filters, for
example predicate `0 <= col2 < 5` can be expressed as
`filter[col2]gte=0&filter[col2]lt=5`.
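
The operators above compose into a single query string. As a rough sketch, the hypothetical helper below (`build_rest_query` is not part of ROAPI) assembles such a URL with the standard library; filters are passed as `(column, operator, value)` triples, where an empty operator means equality. Brackets, commas, and single quotes are left unescaped so the output matches the forms shown above.

```python
from urllib.parse import urlencode


def build_rest_query(base_url, table, columns=None, sort=None, limit=None, filters=None):
    """Assemble a /api/tables/{table} URL from the supported query operators."""
    params = []
    if columns:
        params.append(("columns", ",".join(columns)))
    if sort:
        # Prefix a column with "-" for descending order, e.g. ["col1", "-col2"].
        params.append(("sort", ",".join(sort)))
    if limit is not None:
        params.append(("limit", str(limit)))
    for column, operator, value in filters or []:
        # ("col2", "gte", 0) becomes filter[col2]gte=0; "" means equality.
        params.append((f"filter[{column}]{operator}", str(value)))
    query = urlencode(params, safe="[],'")
    url = f"{base_url}/api/tables/{table}"
    return f"{url}?{query}" if query else url


print(build_rest_query(
    "http://localhost:8080",  # assumed server address
    "table_name",
    sort=["col1", "-col2"],
    filters=[("col1", "", "'foo'"), ("col2", "gte", 0), ("col2", "lt", 5)],
))
```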

### GraphQL query interface

To query tables using GraphQL, send the query in a `POST` request to the
`/api/graphql` endpoint.

The GraphQL query frontend supports the same set of operators as the [REST
query frontend](https://roapi.github.io/docs/api/query/rest.html). Here is how
you can apply various operators in a query:

```graphql
{
  table_name(
    filter: { col1: false, col2: { gteq: 4, lt: 1000 } }
    sort: [{ field: "col2", order: "desc" }, { field: "col3" }]
    limit: 100
  ) {
    col1
    col2
    col3
  }
}
```
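
When queries are generated programmatically, the query text can be rendered from a table name and a field list. The sketch below is a simplified illustration that only covers the `limit` operator; `render_graphql_query` and the `localhost:8080` address are assumptions, not part of ROAPI. As in the quick start curl example, the query text is sent as the raw `POST` body.

```python
import urllib.request


def render_graphql_query(table, fields, limit=None):
    """Render a GraphQL query string for a single table (limit only)."""
    args = f"(limit: {limit})" if limit is not None else ""
    selection = ", ".join(fields)
    return f"query {{ {table}{args} {{ {selection} }} }}"


query = render_graphql_query("uk_cities", ["city", "lat", "lng"], limit=2)
print(query)

# Build (but do not send) the request; the server address is assumed.
request = urllib.request.Request(
    "http://localhost:8080/api/graphql",
    data=query.encode("utf-8"),
    method="POST",
)
# With a server running: urllib.request.urlopen(request).read()
```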

### SQL query interface

To query tables using a subset of standard SQL, send the query in a `POST`
request to the `/api/sql` endpoint. This is the only query interface that supports
table joins.

### Key value lookup

You can pick two columns from a table to use as the key and value of a quick
key-value store API by adding the following lines to the config:

```yaml
kvstores:
  - name: "launch_name"
    uri: "test_data/spacex_launches.json"
    key: id
    value: name
```

Key value lookup can be done through simple HTTP GET requests:

```bash
curl -v localhost:8080/api/kv/launch_name/600f9a8d8f798e2a4d5f979e
Starlink-21 (v1.0)%
```
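
Conceptually, a key-value store defined this way maps each row's `key` column to its `value` column. The Python sketch below illustrates that mapping only; it is not how ROAPI implements the feature, and `build_kv_store` is a hypothetical name. The single row reuses the id and name from the lookup above, with all other columns omitted.

```python
def build_kv_store(rows, key, value):
    """Map each row's `key` column to its `value` column."""
    return {row[key]: row[value] for row in rows}


# One row taken from the lookup example; the real dataset has more columns.
rows = [
    {"id": "600f9a8d8f798e2a4d5f979e", "name": "Starlink-21 (v1.0)"},
]

launch_name = build_kv_store(rows, key="id", value="name")
print(launch_name["600f9a8d8f798e2a4d5f979e"])  # prints: Starlink-21 (v1.0)
```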

### Query through Postgres wire protocol

ROAPI can present itself as a Postgres server so users can use Postgres clients
to issue SQL queries.

```bash
$ psql -h 127.0.0.1
psql (12.10 (Ubuntu 12.10-0ubuntu0.20.04.1), server 13)
WARNING: psql major version 12, server major version 13.
Some psql features might not work.
Type "help" for help.

houqp=> select count(*) from uk_cities;
COUNT(UInt8(1))
-----------------
37
(1 row)
```

## Features

Query layer:

- [x] REST API GET
- [x] GraphQL
- [x] SQL
- [x] join between tables
- [x] access to array elements by index
- [x] access to nested struct fields by key
- [ ] column index
- protocol
  - [x] Postgres
  - [x] FlightSQL
- [x] Key value lookup

Response serialization:

- [x] JSON `application/json`
- [x] Arrow `application/vnd.apache.arrow.stream`
- [x] Parquet `application/vnd.apache.parquet`
- [ ] msgpack

Data layer:

- [x] filesystem
- [x] HTTP/HTTPS
- [x] S3
- [x] GCS
- [x] Azure Storage
- [x] Google spreadsheet
- [x] MySQL
- [x] SQLite
- [x] Postgres
- [ ] Airtable
- Data format
  - [x] CSV
  - [x] JSON
  - [x] NDJSON
  - [x] parquet
  - [x] xls, xlsx, xlsb, ods: https://github.com/tafia/calamine
  - [x] [DeltaLake](https://delta.io/)

Misc:

- [ ] auto gen OpenAPI doc for rest layer
- [ ] query input type conversion based on table schema
- [ ] stream arrow encoding response
- [ ] authentication layer

## Development

The core of ROAPI, including query front-ends and data layer, lives in the
self-contained [columnq](https://github.com/roapi/roapi/tree/main/columnq)
crate. It takes queries and outputs Arrow record batches. Data sources will
also be loaded and stored in memory as Arrow record batches.

The [roapi](https://github.com/roapi/roapi/tree/main/roapi) crate wraps
`columnq` with a multi-protocol query layer. It serializes Arrow record batches
produced by `columnq` into different formats based on client request.

### Debug

To log all FlightSQL requests in the console, set `RUST_LOG=tower_http=trace`.

### Build Docker image

```bash
docker build --rm -t ghcr.io/roapi/roapi:latest .
```

### VS Code DevContainer

#### Requirements

- [x] VS Code
- [x] The `ms-vscode-remote.remote-containers` extension installed in VS Code

---

Once both are in place, you will see a prompt on the left side to reopen the project in a dev container. Alternatively, open the command palette and search for "open with remote container". Then:

1. Install dependencies:
```bash
apt-get update && apt-get install --no-install-recommends -y cmake
```
2. Connect to the database from your local machine with a DB client of your choice, using the following credentials:
```
username: user
password: user
database: test
```
Once connected, create a table so you can map it with the `-t` argument, or use the sample in `.devcontainer/db-migration.sql` to populate some tables with data.

3. Run the cargo command with the `database` feature enabled, using the MySQL database as the table source:
```bash
cargo run --bin roapi --features database -- -a localhost:8080 -t posts=mysql://user:user@db:3306/test
```
If you need other features, select the appropriate one from `roapi/Cargo.toml`.