Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
https://github.com/roapi/roapi
analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets
Last synced: 3 months ago
JSON representation
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
- Host: GitHub
- URL: https://github.com/roapi/roapi
- Owner: roapi
- License: apache-2.0
- Created: 2020-12-11T00:40:36.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-24T04:50:29.000Z (3 months ago)
- Last Synced: 2024-10-24T21:57:21.170Z (3 months ago)
- Topics: analytics, arrow, blob-storage, cloud-native, columnar, datafusion, datasets, delta-lake, graphql, in-memory-database, parquet, query, query-frontends, rest-api, rust, s3, sql, static-datasets
- Language: Rust
- Homepage: https://roapi.github.io/docs
- Size: 1.17 MB
- Stars: 3,204
- Watchers: 43
- Forks: 180
- Open Issues: 48
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# ROAPI
![build](https://github.com/roapi/roapi/workflows/build/badge.svg)
[![Documentation](https://img.shields.io/badge/-documentation-blue)](https://roapi.github.io/docs/index.html)
[![discord](https://dcbadge.vercel.app/api/server/t3pjfZkZkZ?compact=true&style=flat)](https://discord.gg/FchMNYTm7G)ROAPI automatically spins up read-only APIs for static datasets without
requiring you to write a single line of code. It builds on top of [Apache
Arrow](https://github.com/apache/arrow) and
[Datafusion](https://github.com/apache/arrow-datafusion). The
core of its design can be boiled down to the following:- [Query frontends](https://roapi.github.io/docs/api/query/index.html) to
translate SQL, FlightSQL, GraphQL and REST API queries into
Datafusion plans.
- Datafusion for query plan execution.
- [Data layer](https://roapi.github.io/docs/config/dataset-formats/index.html)
to load datasets from a variety of sources and formats with automatic schema
inference.
- [Response encoding layer](https://roapi.github.io/docs/api/response.html) to
serialize intermediate Arrow record batch into various formats requested by
client.See below for a high level diagram:
## Installation
### Install pre-built binary
```bash
# if you are using homebrew
brew install roapi
# or if you prefer pip
pip install roapi
```Check out [Github release page](https://github.com/roapi/roapi/releases) for
pre-built binaries for each platform. Pre-built docker images are also available at
[ghcr.io/roapi/roapi](https://github.com/orgs/roapi/packages/container/package/roapi).### Install from source
```bash
cargo install --locked --git https://github.com/roapi/roapi --branch main --bins roapi
```## Usage
### Quick start
Spin up APIs for `test_data/uk_cities_with_headers.csv` and
`test_data/spacex_launches.json`:```bash
roapi \
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
--table "test_data/spacex_launches.json"
```For windows, full scheme(file:// or filesystem://) must filled, and use double quote(") instead of single quote(') to escape windows cmdline limit:
```bash
roapi \
--table "uk_cities=file://d:/path/to/uk_cities_with_headers.csv" \
--table "file://d:/path/to/test_data/spacex_launches.json"
```Or using docker:
```bash
docker run -t --rm -p 8080:8080 ghcr.io/roapi/roapi:latest --addr-http 0.0.0.0:8080 \
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
--table "test_data/spacex_launches.json"
```For MySQL and SQLite, use parameters like this.
```
--table "table_name=mysql://username:password@localhost:3306/database"
--table "table_name=sqlite://path/to/database"
```Want dynamic register data? Add parameter `-d` to command. `--table` parameter cannot be ignored for now.
```bash
roapi \
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
-d
```Then post config to `/api/table` register data.
```bash
curl -X POST http://172.24.16.1:8080/api/table \
-H 'Content-Type: application/json' \
-d '[
{
"tableName": "uk_cities2",
"uri": "./test_data/uk_cities_with_headers.csv"
},
{
"tableName": "table_name",
"uri": "sqlite://path/to/database"
}
]'
```Query tables using SQL, GraphQL or REST:
```bash
curl -X POST -d "SELECT city, lat, lng FROM uk_cities LIMIT 2" localhost:8080/api/sql
curl -X POST -d "query { uk_cities(limit: 2) {city, lat, lng} }" localhost:8080/api/graphql
curl "localhost:8080/api/tables/uk_cities?columns=city,lat,lng&limit=2"
```Get inferred schema for all tables:
```bash
curl 'localhost:8080/api/schema'
```### Config file
You can also configure multiple table sources using YAML or Toml config, which supports more
advanced format specific table options:```yaml
addr:
http: 0.0.0.0:8084
postgres: 0.0.0.0:5433
tables:
- name: "blogs"
uri: "test_data/blogs.parquet"- name: "ubuntu_ami"
uri: "test_data/ubuntu-ami.json"
option:
format: "json"
pointer: "/aaData"
array_encoded: true
schema:
columns:
- name: "zone"
data_type: "Utf8"
- name: "name"
data_type: "Utf8"
- name: "version"
data_type: "Utf8"
- name: "arch"
data_type: "Utf8"
- name: "instance_type"
data_type: "Utf8"
- name: "release"
data_type: "Utf8"
- name: "ami_id"
data_type: "Utf8"
- name: "aki_id"
data_type: "Utf8"- name: "spacex_launches"
uri: "https://api.spacexdata.com/v4/launches"
option:
format: "json"- name: "github_jobs"
uri: "https://web.archive.org/web/20210507025928if_/https://jobs.github.com/positions.json"
```To run serve tables using config file:
```bash
roapi -c ./roapi.yml # or .toml
```See [config
documentation](https://roapi.github.io/docs/config/config-file.html) for more
options including [using Google spreadsheet as a table
source](https://roapi.github.io/docs/config/dataset-formats/gsheet.html).### Response serialization
By default, ROAPI encodes responses in JSON format, but you can request
different encodings by specifying the `ACCEPT` header:```bash
curl -X POST \
-H 'ACCEPT: application/vnd.apache.arrow.stream' \
-d "SELECT launch_library_id FROM spacex_launches WHERE launch_library_id IS NOT NULL" \
localhost:8080/api/sql
```### REST API query interface
You can query tables through REST API by sending `GET` requests to
`/api/tables/{table_name}`. Query operators are specified as query params.REST query frontend currently supports the following query operators:
- columns
- sort
- limit
- filterTo sort column `col1` in ascending order and `col2` in descending order, set
query param to: `sort=col1,-col2`.To find all rows with `col1` equal to string `'foo'`, set query param to:
`filter[col1]='foo'`. You can also do basic comparisons with filters, for
example predicate `0 <= col2 < 5` can be expressed as
`filter[col2]gte=0&filter[col2]lt=5`.### GraphQL query interface
To query tables using GraphQL, send the query through `POST` request to
`/api/graphql` endpoint.GraphQL query frontend supports the same set of operators supported by [REST
query frontend](https://roapi.github.io/docs/api/query/rest.html). Here how is
you can apply various operators in a query:```graphql
{
table_name(
filter: { col1: false, col2: { gteq: 4, lt: 1000 } }
sort: [{ field: "col2", order: "desc" }, { field: "col3" }]
limit: 100
) {
col1
col2
col3
}
}
```### SQL query interface
To query tables using a subset of standard SQL, send the query through `POST`
request to `/api/sql` endpoint. This is the only query interface that supports
table joins.### Key value lookup
You can pick two columns from a table to use a key and value to create a quick
keyvalue store API by adding the following lines to the config:```yaml
kvstores:
- name: "launch_name"
uri: "test_data/spacex_launches.json"
key: id
value: name
```Key value lookup can be done through simple HTTP GET requests:
```bash
curl -v localhost:8080/api/kv/launch_name/600f9a8d8f798e2a4d5f979e
Starlink-21 (v1.0)%
```### Query through Postgres wire protocol
ROAPI can present itself as a Postgres server so users can use Postgres clients
to issue SQL queries.```bash
$ psql -h 127.0.0.1
psql (12.10 (Ubuntu 12.10-0ubuntu0.20.04.1), server 13)
WARNING: psql major version 12, server major version 13.
Some psql features might not work.
Type "help" for help.houqp=> select count(*) from uk_cities;
COUNT(UInt8(1))
-----------------
37
(1 row)
```## Features
Query layer:
- [x] REST API GET
- [x] GraphQL
- [x] SQL
- [x] join between tables
- [x] access to array elements by index
- [x] access to nested struct fields by key
- [ ] column index
- protocol
- [x] Postgres
- [x] FlightSQL
- [x] Key value lookupResponse serialization:
- [x] JSON `application/json`
- [x] Arrow `application/vnd.apache.arrow.stream`
- [x] Parquet `application/vnd.apache.parquet`
- [ ] msgpackData layer:
- [x] filesystem
- [x] HTTP/HTTPS
- [x] S3
- [x] GCS
- [x] Azure Storage
- [x] Google spreadsheet
- [x] MySQL
- [x] SQLite
- [x] Postgres
- [ ] Airtable
- Data format
- [x] CSV
- [x] JSON
- [x] NDJSON
- [x] parquet
- [x] xls, xlsx, xlsb, ods: https://github.com/tafia/calamine
- [x] [DeltaLake](https://delta.io/)Misc:
- [ ] auto gen OpenAPI doc for rest layer
- [ ] query input type conversion based on table schema
- [ ] stream arrow encoding response
- [ ] authentication layer## Development
The core of ROAPI, including query front-ends and data layer, lives in the
self-contained [columnq](https://github.com/roapi/roapi/tree/main/columnq)
crate. It takes queries and outputs Arrow record batches. Data sources will
also be loaded and stored in memory as Arrow record batches.The [roapi](https://github.com/roapi/roapi/tree/main/roapi) crate wraps
`columnq` with a multi-protocol query layer. It serializes Arrow record batches
produced by `columnq` into different formats based on client request.### Debug
To log all FlightSQL requests in console, set `RUST_LOG=tower_http=trace`.
### Build Docker image
```bash
docker build --rm -t ghcr.io/roapi/roapi:latest .
```
### VS Code DevContainer#### Requirements
- [x] Vscode
- [x] Ensure this extension is installed on your vs code `ms-vscode-remote.remote-containers`
---
Once done you will see prompt from left side to reopen the project in dev container or open command palette and search for open with remote container:1. install dependencies
```bash
apt-get update && apt-get install --no-install-recommends -y cmake
```
2. connect to database from your local using db client of choice using the following credentials
```
username: user
password: user
database: test
```
once done create table so you can map it in `-t` arg or consider using sample in `.devcontainer/db-migration.sql` to populate some tables with data3. run cargo command with mysql db as feature
```bash
cargo run --bin roapi --features database -- -a localhost:8080 -t posts=mysql://user:user@db:3306/test
```
otherwise if you are looking for other features
you have to select appropriate one from `roapi/Cargo.toml`