{"id":22767171,"url":"https://github.com/sal-openlab/datafusion-server","last_synced_at":"2025-04-15T00:36:39.657Z","repository":{"id":195960999,"uuid":"694155546","full_name":"sal-openlab/datafusion-server","owner":"sal-openlab","description":"Rust DataFusion Server","archived":false,"fork":false,"pushed_at":"2024-05-11T07:01:52.000Z","size":1558,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-05-11T11:22:52.294Z","etag":null,"topics":["arrow","datafusion","rust","sql"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sal-openlab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-20T12:44:31.000Z","updated_at":"2024-05-30T13:18:59.290Z","dependencies_parsed_at":null,"dependency_job_id":"979ff5cb-4fdf-4423-8cd9-360e8bbaf976","html_url":"https://github.com/sal-openlab/datafusion-server","commit_stats":null,"previous_names":["neural-runner/datafusion-server","sal-openlab/datafusion-server"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sal-openlab%2Fdatafusion-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sal-openlab%2Fdatafusion-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sal-openlab%2Fdatafusion-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sal-openlab%2Fdatafusion-server/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sal-openlab","download_url":"https://codeload.github.com/sal-openlab/datafusion-server/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229229755,"owners_count":18040509,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","datafusion","rust","sql"],"created_at":"2024-12-11T13:17:22.927Z","updated_at":"2025-04-15T00:36:39.650Z","avatar_url":"https://github.com/sal-openlab.png","language":"Rust","readme":"# datafusion-server crate\n\n[![crates.io](https://img.shields.io/crates/v/datafusion-server?color=blue)](https://crates.io/crates/datafusion-server)\n[![license](https://img.shields.io/github/license/sal-openlab/datafusion-server?color=blue)](./LICENSE)\n[![build](https://img.shields.io/github/actions/workflow/status/sal-openlab/datafusion-server/push-trigger.yml?logo=github)](https://github.com/sal-openlab/datafusion-server/actions?query=workflow%3Apush-trigger)\n[![pages](https://img.shields.io/github/actions/workflow/status/sal-openlab/datafusion-server/doc-trigger.yml?logo=github\u0026label=docs)](https://sal-openlab.github.io/datafusion-server/)\n\nA multi-session query server for a variety of data sources, implemented in Rust.\n\n* Asynchronous architecture built on the [Tokio](https://tokio.rs/) ecosystem\n* [Apache Arrow](https://arrow.apache.org/) with [Apache DataFusion](https://arrow.apache.org/datafusion/)\n    + Supports multiple data sources with SQL queries\n* Python plugin feature for data source connectors and post processors\n* 
Horizontal scaling architecture between servers using\n  the [Arrow Flight](https://arrow.apache.org/docs/format/Flight.html) gRPC feature\n\nPlease see the **[Documentation](https://sal-openlab.github.io/datafusion-server/introduction/)** for an introductory\ntutorial and a full usage guide. Additionally,\nthe [REST API documentation](https://sal-openlab.github.io/datafusion-server/api/v1/) is available according to the\nOpenAPI specification. Also, refer to\nthe [CHANGELOG](https://github.com/sal-openlab/datafusion-server/blob/main/CHANGELOG.md) for the latest information.\n\n## System Overview\n\n![System Diagram](https://github.com/sal-openlab/datafusion-server/blob/main/doc/system-diagram.svg?raw=true)\n\n## License\n\nLicensed under the [MIT](LICENSE) license.\n\nCopyright \u0026copy; 2022 - 2025 SAL Ltd. - https://sal.co.jp\n\n## Supported environments\n\n* Linux\n* BSD-based Unix incl. macOS / Mac OS X\n* SVR-based Unix\n* Windows incl. WSL2 / Cygwin\n\nand other [LLVM](https://llvm.org/docs/GettingStarted.html#hardware)-supported environments.\n\n## Using pre-built Docker image (currently available for amd64 architecture only)\n\n### Prerequisites\n\n* Docker CE / EE v20+\n\n### Pull container image from GitHub container registry\n\n```sh\n$ docker pull ghcr.io/sal-openlab/datafusion-server/datafusion-server:latest\n```\n\nor pull the version built without the Python plugin:\n\n```sh\n$ docker pull ghcr.io/sal-openlab/datafusion-server/datafusion-server-without-plugin:latest\n```\n\n### Executing container\n\n```sh\n$ docker run -d --rm \\\n    -p 4000:4000 \\\n    -v ./data:/var/datafusion-server/data \\\n    --name datafusion-server \\\n    ghcr.io/sal-openlab/datafusion-server/datafusion-server:latest\n```\n\nIf you are only using the sample data in the container, omit the `-v ./data:/var/datafusion-server/data` option.\n\n## Build the container yourself\n\n### Prerequisites\n\n* Docker CE / EE v20+\n\n### Build two containers, datafusion-server and datafusion-server-without-plugin\n\n```sh\n$ cd 
\u003crepository-root-dir\u003e\n$ ./make-containers.sh\n```\n\n### Executing container\n\n```sh\n$ docker run -d --rm \\\n    -p 4000:4000 \\\n    -v ./bin/data:/var/datafusion-server/data \\\n    --name datafusion-server \\\n    datafusion-server:0.19.8\n```\n\nIf you are only using the sample data in the container, omit the `-v ./bin/data:/var/datafusion-server/data` option.\n\n## Build from source code for use in your project\n\n### Prerequisites\n\n* Rust Toolchain 1.81+ (Edition 2021) from https://www.rust-lang.org\n* _or_ the Rust official container from https://hub.docker.com/_/rust\n\n### How to run\n\n```sh\n$ cargo init server-executor\n$ cd server-executor\n```\n\n#### Example of Cargo.toml\n\n```toml\n[package]\nname = \"server-executor\"\nversion = \"0.1.0\"\nedition = \"2021\"\n\n[dependencies]\ndatafusion-server = \"0.19.8\"\nclap = { version = \"4.5\", features = [\"derive\"] }\n```\n\n#### Example of src/main.rs\n\n```rust\nuse std::path::PathBuf;\n\nuse clap::Parser;\nuse datafusion_server::settings::Settings;\n\n#[derive(Parser)]\n#[clap(author, version, about = \"Arrow and other large datasets web server\", long_about = None)]\nstruct Args {\n    #[clap(\n        long,\n        value_parser,\n        short = 'f',\n        value_name = \"FILE\",\n        help = \"Configuration file\",\n        default_value = \"./config.toml\"\n    )]\n    config: PathBuf,\n}\n\nfn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let args = Args::parse();\n    let settings = Settings::new_with_file(\u0026args.config)?;\n    datafusion_server::execute(settings)?;\n    Ok(())\n}\n```\n\nFor details, see [main.rs](bin/src/main.rs) and [Cargo.toml](bin/Cargo.toml).\n\n#### Example of config.toml\n\n```toml\n# Configuration file of datafusion-server\n\n[server]\nport = 4000\nflight_grpc_port = 50051\nbase_url = \"/\"\ndata_dir = \"./data\"\nplugin_dir = \"./plugins\"\n\n[session]\ndefault_keep_alive = 3600 # in seconds\nupload_limit_size = 20 
# MB\n\n[log]\n# trace, debug, info, warn, error\nlevel = \"debug\"\n```\n\n#### Debug build and run\n\n```sh\n$ cargo run\n```\n\n## datafusion-server with Python plugins feature\n\nRequires Python interpreter v3.7+.\n\n### How to run\n\n#### Example of Cargo.toml\n\n```toml\n[dependencies]\ndatafusion-server = { version = \"0.19.8\", features = [\"plugin\"] }\n```\n\n#### Debug build and run\n\n```sh\n$ cargo run\n```\n\n### Release build with full optimization\n\n#### Example of Cargo.toml\n\n```toml\n[profile.release]\nopt-level = 'z'\nstrip = true\nlto = \"fat\"\ncodegen-units = 1\n\n[dependencies]\ndatafusion-server = { version = \"0.19.8\", features = [\"plugin\"] }\n```\n\n#### Build for release\n\n```sh\n$ cargo build --release\n```\n\n### Clean workspace\n\n```sh\n$ cargo clean\n```\n\n## Usage\n\n### Multiple data sources with SQL query\n\n* Supports many kinds of data source formats (Parquet, JSON, ndJSON, CSV, ...).\n* Data can be retrieved from the local file system and from external REST services.\n    + Processing by JSONPath can be performed if necessary.\n* Query execution across multiple data sources.\n    + SQL query engine uses Arrow DataFusion.\n        - See https://arrow.apache.org/datafusion/user-guide/sql/index.html for more information.\n* Responses are available in Arrow, JSON and CSV formats.\n\n#### Example (local file)\n\n```sh\n$ curl -X \"POST\" \"http://localhost:4000/dataframe/query\" \\\n     -H 'Content-Type: application/json' \\\n     -d $'\n{\n  \"dataSources\": [\n    {\n      \"format\": \"csv\",\n      \"name\": \"sales\",\n      \"location\": \"file:///superstore.csv\",\n      \"options\": {\n        \"inferSchemaRows\": 100,\n        \"hasHeader\": true\n      }\n    }\n  ],\n  \"query\": {\n    \"sql\": \"SELECT * FROM sales\"\n  },\n  \"response\": {\n    \"format\": \"json\"\n  }\n}'\n```\n\n#### Example (remote REST API)\n\n```sh\n$ curl -X \"POST\" \"http://localhost:4000/dataframe/query\" \\\n     -H 'Content-Type: 
application/json' \\\n     -H 'Accept: text/csv' \\\n     -d $'\n{\n  \"dataSources\": [\n    {\n      \"format\": \"json\",\n      \"name\": \"population\",\n      \"location\": \"https://datausa.io/api/data?drilldowns=State\u0026measures=Population\",\n      \"options\": {\n        \"jsonPath\": \"$.data[*]\"\n      }\n    }\n  ],\n  \"query\": {\n    \"sql\": \"SELECT * FROM population WHERE \\\"ID Year\\\"\u003e=2020\"\n  }\n}'\n```\n\n#### Example (Python datasource connector plugin)\n\n```sh\n$ curl -X \"POST\" \"http://localhost:4000/dataframe/query\" \\\n     -H 'Content-Type: application/json' \\\n     -H 'Accept: application/json' \\\n     -d $'\n{\n  \"dataSources\": [\n    {\n      \"format\": \"arrow\",\n      \"name\": \"example\",\n      \"location\": \"excel://example-workbook.xlsx/Sheet1\",\n      \"pluginOptions\": {\n        \"skipRows\": 2\n      }\n    }\n  ],\n  \"query\": {\n    \"sql\": \"SELECT * FROM example\"\n  }\n}'\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsal-openlab%2Fdatafusion-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsal-openlab%2Fdatafusion-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsal-openlab%2Fdatafusion-server/lists"}