{"id":21973654,"url":"https://github.com/alash3al/xyr","last_synced_at":"2025-08-24T12:45:45.280Z","repository":{"id":57636350,"uuid":"426370027","full_name":"alash3al/xyr","owner":"alash3al","description":"Query any data source using SQL, works with the local filesystem, s3, and more. It should be a very tiny and lightweight alternative to AWS Athena, Presto ... etc.","archived":false,"fork":false,"pushed_at":"2022-01-20T07:27:20.000Z","size":88,"stargazers_count":65,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-30T10:33:41.159Z","etag":null,"topics":["analytics","aws-athena","csv","distributed-query","mysql","presto","s3","sql","sqlite3"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alash3al.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-11-09T20:05:59.000Z","updated_at":"2024-08-06T15:35:40.000Z","dependencies_parsed_at":"2022-09-26T20:21:54.466Z","dependency_job_id":null,"html_url":"https://github.com/alash3al/xyr","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alash3al%2Fxyr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alash3al%2Fxyr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alash3al%2Fxyr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alash3al%2Fxyr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alash3al","download_url":"https
://codeload.github.com/alash3al/xyr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251328697,"owners_count":21571964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","aws-athena","csv","distributed-query","mysql","presto","s3","sql","sqlite3"],"created_at":"2024-11-29T15:33:53.499Z","updated_at":"2025-04-28T14:27:01.439Z","avatar_url":"https://github.com/alash3al.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"xyr\n====\n\u003e `xyr` is a very lightweight, simple, and powerful data ETL platform that lets you query your data sources using `SQL`.\n\nExample (Local Filesystem)\n===========================\n\u003e Here we define a new table called `users` that loads all JSON files in a directory (recursively). Each file may use any of the following JSON formats: one object/object[] per file, newline-delimited JSON objects/object[], or even undelimited JSON objects/object[] (like the Kinesis Firehose JSON output format).\n\n\u003e Let's imagine we have a directory of JSON files called `/tmp/data/users`, and here is an example of a JSON file there:\n```json\n{\"id\":10,\"email\":\"u10@example.com\"}{\"id\":20,\"email\":\"u20@example.com\"}{\"id\": 3,\"email\":\"u3@example.com\"}{\"id\": 4,\"email\":\"u4@example.com\"}\n```\n\n\u003e Then we can define its schema as follows:\n```hcl\n# where xyr should store its internal database\ndata_dir = \"./tmp/db/\"\n\n# this file is `./config.xyr.hcl`\ntable \"users\" {\n    // the driver we want\n    driver = \"jsondir\"\n\n    
// the data source directory\n    source = \"/tmp/data/users\"\n\n    // xyr creates a table in its internal storage, so it needs to know\n    // at least the names of the columns you want from your data.\n    // e.g. given {\"id\": 1, \"email\": \"user@example.com\", \"age\": 20},\n    // we only need \"id\" and \"email\", so we list both in the columns array below;\n    // note that the ordering is the same as in our example.\n    columns = [\"id\", \"email\"]\n\n    // what to load:\n    // for jsondir, we can specify a regex pattern to filter the files\n    // by filename,\n    // but for an SQL driver we can provide an SQL statement that reads the data\n    // from the source SQL database.\n    // e.g. \"SELECT * FROM SOME_TABLE\"\n    filter = \".*\"\n}\n```\n\n\u003e Now it's time to load it\n```bash\n$ xyr table:import users\n```\n\n\u003e Now let's query it\n```bash\n$ xyr exec \"SELECT * FROM users\"\n```\n\n\u003e All the tables you define can easily be joined in the same query. Let's imagine that we have the following definition:\n```hcl\n# debug mode (affects the log level)\ndebug = true\n\n# how many workers to use to write into our sqlite db\n# 0 means the current CPU core count\nworkers_count = 0\n\n# where xyr should store its internal database\ndata_dir = \"./tmp/db/\"\n\n\ntable \"users\" {\n    driver = \"s3jsondir\"\n    source = \"s3://ACCESS_KEY:SECRET_KEY@/BUCKET_NAME?region=\u0026ssl=false\u0026path=true\u0026perpage=1000\"\n\n    # which prefix we want to select\n    filter = \"xyr/users/\"\n\n    columns = [\"id\", \"email\"]\n}\n\ntable \"user_vists\" {\n    driver = \"postgres\"\n    source = \"postgresql://username:password@server:port/dbname?option1=value1\"\n    columns = [\"user_id\", \"vists\"]\n    filter = \"SELECT user_id, count(vists) FROM USERS GROUP BY user_id\"\n}\n```\n\n\u003e Now let's join them\n```bash\n$ xyr exec \"SELECT * FROM users LEFT JOIN user_vists ON 
user_vists.user_id = users.id\"\n```\n\nInstallation\n============\n\u003e Use this [docker package](https://github.com/alash3al/xyr/pkgs/container/xyr)\n\nSupported Drivers\n=================\n| Driver | Source Connection String |\n|--------| ------------------------ |\n| `jsondir`     | `/PATH/TO/JSON/DATA/DIR`|\n| `s3jsondir`   | `s3://[access_key_url_encoded]:[secret_key_url_encoded]@[endpoint_url]/bucket_name?region=\u0026ssl=false\u0026path=true\u0026perpage=1000\u0026downloaders_count=8\u0026downloader_concurrency=8`|\n| `mysql`       | `username:password@tcp(server:port)/dbname?option1=value1\u0026...`|\n| `postgres`    | `postgresql://username:password@server:port/dbname?option1=value1`|\n| `sqlite3`     | `/path/to/db.sqlite?option1=value1`|\n| `sqlserver`   | `sqlserver://username:password@host/instance?param1=value\u0026param2=value` |\n|               | `sqlserver://username:password@host:port?param1=value\u0026param2=value`|\n|               | `sqlserver://sa@localhost/SQLExpress?database=master\u0026connection+timeout=30`|\n| `hana`        | `hdb://user:password@host:port` |\n| `clickhouse`  | `tcp://host1:9000?username=user\u0026password=qwerty\u0026database=clicks\u0026read_timeout=10\u0026write_timeout=20\u0026alt_hosts=host2:9000,host3:9000` |\n| `oracle`      | `oracle://user:pass@server1/service?server=server2\u0026server=server3` |\n\nUse Cases\n=========\n- Simple Presto Alternative.\n- Simple AWS Athena Alternative.\n- Convert your JSON documents into a SQL DB.\n\nHow does it work?\n==================\n\u003e Internally, `xyr` uses `SQLite` as an embedded SQL datastore (this may change in the future, and support for multiple datastores may be added). Once you define a table in the `XYRCONFIG` file and run `$ xyr table:import`, you can import the tables you defined and then query them via `$ xyr exec \"SELECT * FROM TABLE_NAME_HERE\"`, which outputs JSON results by default.\n\nPlan\n====\n- [x] Build the initial core.\n- [x] Add the basic `import` 
command for importing the tables into `xyr`.\n- [x] Add the `exec` command to execute SQL queries.\n- [x] Add well-known SQL drivers\n    - [x] mysql\n    - [x] postgres\n    - [x] sqlite3\n    - [x] clickhouse\n    - [x] oracle\n    - [x] hana\n    - [x] sqlserver\n- [x] Add an S3 driver\n- [ ] Add/improve the documentation\n- [ ] Expose another API besides the `CLI` to enable external apps to query `xyr`.\n    - [ ] JSON Endpoint?\n    - [ ] MySQL Protocol?\n    - [ ] Redis Protocol?\n- [ ] Improve the code base (iteration 1).\n- [ ] Add another backend instead of sqlite3 as the internal datastore?\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falash3al%2Fxyr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falash3al%2Fxyr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falash3al%2Fxyr/lists"}