Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/alash3al/xyr
Query any data source using SQL, works with the local filesystem, s3, and more. It should be a very tiny and lightweight alternative to AWS Athena, Presto ... etc.
https://github.com/alash3al/xyr
analytics aws-athena csv distributed-query mysql presto s3 sql sqlite3
Last synced: 2 months ago
JSON representation
Query any data source using SQL, works with the local filesystem, s3, and more. It should be a very tiny and lightweight alternative to AWS Athena, Presto ... etc.
- Host: GitHub
- URL: https://github.com/alash3al/xyr
- Owner: alash3al
- License: apache-2.0
- Created: 2021-11-09T20:05:59.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-01-20T07:27:20.000Z (about 3 years ago)
- Last Synced: 2024-06-20T02:13:30.258Z (8 months ago)
- Topics: analytics, aws-athena, csv, distributed-query, mysql, presto, s3, sql, sqlite3
- Language: Go
- Homepage:
- Size: 85.9 KB
- Stars: 64
- Watchers: 3
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
xyr
====
> `xyr` is a very lightweight, simple, and powerful data ETL platform that helps you to query available data sources using `SQL`.Example (Local Filesystem)
===========================
> here we define a new table called `users` which will load all json files in that directory (recursive) with any of the following json formats: (object/object[] per-file, newline delimited json objects/object[], or event no delimiter json objects/object[] like what kinesis firehose json output format).> Let's image we have a directory of json files called `/tmp/data/users` and here is an example of a json file there:
```json
{"id":10,"email":"[email protected]"}{"id":20,"email":"[email protected]"}{"id": 3,"email":"[email protected]"}{"id": 4,"email":"[email protected]"}
```> Then we can define its schema as following
```hcl
# where xyr should store its internal database
data_dir = "./tmp/db/"# this file is `./config.xyr.hcl`
table "users" {
// the driver we want
driver = "jsondir"// the data source directory
source = "/tmp/data/users"// xyr will try to create a table into its internal storage, so it needs
// to know at least what are the required columns names of your data.
// i.e: {"id": 1, "email": "[email protected]", "age": 20}
// but we only need "id" and "email", so we defined both in the below columns array
// and not that the ordering is the same as our example.
columns = ["id", "email"]// what do you want to load
// in case of jsondir, we can specify a regex pattern to filter the files
// using the filename
// but if we're using an SQL driver we can provide an sql statement that reads the data
// from the source SQL based database.
// i.e: "SELECT * FROM SOME_TABLE"
filter = ".*"
}
```> Now its the time to load it
```bash
$ xyr table:import users
```> Now let's query it
```bash
$ xyr exec "SELECT * FROM users"
```> All tables you define could be joined in the same query easily, let's imagine that we have the following defination
```hcl
# debug mode "affects the log level"
debug = true# how many workers to use to write into our sqlite db
# 0 means current cpu cores count
workers_count = 0# where xyr should store its internal database
data_dir = "./tmp/db/"table "users" {
driver = "s3jsondir"
source = "s3://ACCESS_KEY:SECRET_KEY@/BUCKET_NAME?region=&ssl=false&path=true&perpage=1000"# which prefix we want to select
filter = "xyr/users/"columns = ["id", "email"]
}table "user_vists" {
driver = "postgres"
source = "postgresql://username:password@server:port/dbname?option1=value1"
columns = ["user_id", "vists"]
filter = "SELECT user_id, count(vists) FROM USERS GROUP BY user_id"
}
```> Now let's join them
```bash
$ xyr exec "SELECT * FROM users LEFT JOIN user_vists ON users_vists.user_id = users.id"
```Installation
============
> use this [docker package](https://github.com/alash3al/xyr/pkgs/container/xyr)Supported Drivers
=================
| Driver | Source Connection String |
---------| ------------------------ |
| `jsondir` | `/PATH/TO/JSON/DATA/DIR`|
| `s3jsondir` | `s3://[access_key_url_encoded]:[secret_key_url_encoded]@[endpoint_url]/bucket_name?region=&ssl=false&path=true&perpage=1000&downloaders_count=8&downloader_concurrency=8`|
| `mysql` | `usrname:password@tcp(server:port)/dbname?option1=value1&...`|
| `postgres` | `postgresql://username:password@server:port/dbname?option1=value1`|
| `sqlite3` | `/path/to/db.sqlite?option1=value1`|
| `sqlserver` | `sqlserver://username:password@host/instance?param1=value¶m2=value` |
| | `sqlserver://username:password@host:port?param1=value¶m2=value`|
| | `sqlserver://sa@localhost/SQLExpress?database=master&connection+timeout=30`|
| `hana` | `hdb://user:password@host:port` |
| `clickhouse` | `tcp://host1:9000?username=user&password=qwerty&database=clicks&read_timeout=10&write_timeout=20&alt_hosts=host2:9000,host3:9000` |
| `oracle` | `oracle://user:pass@server1/service?server=server2&server=server3` |Use Cases
=========
- Simple Presto Alternative.
- Simple AWS Athena Alternative.
- Convert your JSON documents into a SQL DB.How does it work?
==================
> internaly `xyr` utilizes `SQLite` as an embeded sql datastore (it may be changed in future and we can add multiple data stores), when you define a table in `XYRCONFIG` file then run `$ xyr table:import` you will be able to import all defined tables as well querying them via `$ xyr exec "SELECT * FROM TABLE_NAME_HERE"` which outputs json result by default.Plan
====
- [x] Building the initial core.
- [x] Add the basic `import` command for importing the tables into `xyr`.
- [x] Add the `exec` command to execute SQL query.
- [x] Add well known SQL drivers
- [x] mysql
- [x] postgres
- [x] sqlite3
- [x] clickhouse
- [x] oracle
- [x] hana
- [x] sqlserver
- [x] Add an S3 driver
- [ ] Adding/Improving documentations
- [ ] Expose another API beside the `CLI` to enable external Apps to query `xyr`.
- [ ] JSON Endpoint?
- [ ] Mysql Protocol?
- [ ] Redis Protocol?
- [ ] Improving the code base (iteration 1).
- [ ] Add another backend instead of sqlite3 as internal datastore?