{"id":20602023,"url":"https://github.com/datafusion-contrib/datafusion-dft","last_synced_at":"2025-05-10T00:31:27.440Z","repository":{"id":38085290,"uuid":"471244441","full_name":"datafusion-contrib/datafusion-dft","owner":"datafusion-contrib","description":"Batteries included CLI, TUI, and server implementations for DataFusion.","archived":false,"fork":false,"pushed_at":"2025-05-04T12:58:39.000Z","size":16627,"stargazers_count":151,"open_issues_count":33,"forks_count":13,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-05-04T13:42:43.719Z","etag":null,"topics":["arrow","cli","data","database","datafusion","tui"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datafusion-contrib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-18T05:17:13.000Z","updated_at":"2025-05-04T12:58:42.000Z","dependencies_parsed_at":"2024-08-24T07:43:53.032Z","dependency_job_id":"6ae33a49-41ea-4a0c-93a2-d7f3a77743c7","html_url":"https://github.com/datafusion-contrib/datafusion-dft","commit_stats":{"total_commits":95,"total_committers":4,"mean_commits":23.75,"dds":"0.13684210526315788","last_synced_commit":"d3fd671924cb91d0c65a81502595ac5388532271"},"previous_names":["datafusion-contrib/datafusion-dft","datafusion-contrib/datafusion-tui"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-dft","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-dft/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-dft/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-dft/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datafusion-contrib","download_url":"https://codeload.github.com/datafusion-contrib/datafusion-dft/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253346454,"owners_count":21894264,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","cli","data","database","datafusion","tui"],"created_at":"2024-11-16T09:12:44.579Z","updated_at":"2025-05-10T00:31:27.433Z","avatar_url":"https://github.com/datafusion-contrib.png","language":"Rust","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"readme":"# dft - Batteries included DataFusion\n\n## 🚧 DOCS UNDER CONSTRUCTION\nDocumentation is undergoing a significant revamp - the new documentation will be finalized as part of the v0.3 release in the late Spring or early Summer of 2025.\n\n## Overview\n\n`dft` is a batteries-included suite of [DataFusion](https://github.com/apache/arrow-datafusion) applications that provides:\n\n- **Data Source Integration**: Query files from S3, local filesystems, or HuggingFace datasets\n- **Table Format Support**: Native support for Delta Lake\n- **Extensibility**: UDFs defined in WASM (and soon Python)\n- **Helper Functions**: Built-in functions for JSON and Parquet data processing\n\nThe project offers four complementary interfaces:\n\n1. **Text User Interface (TUI)**: An interactive SQL IDE with real-time query analysis, benchmarking, and catalog exploration\n2. **Command Line Interface (CLI)**: A scriptable engine for executing queries from files or command line\n3. **FlightSQL Server**: A standards-compliant SQL interface for programmatic access\n4. **HTTP Server**: A REST API for SQL queries and catalog exploration\n\nAll interfaces share the same execution engine, allowing you to develop locally with the TUI and then seamlessly deploy with the server implementations.\n\n`dft` builds upon [`datafusion-cli`](https://datafusion.apache.org/user-guide/cli/overview.html) with enhanced interactivity, additional integrations, and ready-to-use server implementations.\n\n## User Guide\n\n### Installation\n\n#### From crates.io (Recommended)\n```sh\n# If you have Rust installed\ncargo install datafusion-dft\n\n# For full functionality with all features\ncargo install datafusion-dft --all-features\n```\n\nIf you don't have Rust installed, follow the [installation instructions](https://www.rust-lang.org/tools/install).\n\n#### Feature Flags\nCommon feature combinations:\n```sh\n# Core with S3 support\ncargo install datafusion-dft --features=s3\n\n# Data lake formats\ncargo install datafusion-dft --features=deltalake\n\n# With JSON and Parquet functions\ncargo install datafusion-dft --features=function-json,functions-parquet\n```\n\nSee the [Features documentation](docs/features.md) for all available features.\n\n### Running the apps\n\n```sh\n# Interactive TUI (default)\ndft\n\n# CLI with direct query execution\ndft -c \"SELECT 1 + 2\"\n\n# CLI with file-based query\ndft -f query.sql\n\n# Benchmark a query (with stats)\ndft -c \"SELECT * FROM my_table\" --bench\n\n# Start FlightSQL Server (requires `flightsql` feature)\ndft serve-flightsql\n\n# Start HTTP Server (requires `http` feature)\ndft serve-http\n\n# Generate TPC-H data in the configured DB path\ndft generate-tpch\n```\n\n### Setting Up Tables with DDL\n\n`dft` can automatically load table definitions at startup, giving you a persistent \"database-like\" experience.\n\n#### Using DDL Files\n\n1. Create a DDL file (default: `~/.config/dft/ddl.sql`)\n2. Add your table and view definitions:\n\n```sql\n-- S3 data source (requires s3 feature)\nCREATE EXTERNAL TABLE users \nSTORED AS NDJSON \nLOCATION 's3://bucket/users';\n\n-- Parquet files\nCREATE EXTERNAL TABLE transactions \nSTORED AS PARQUET \nLOCATION 's3://bucket/transactions';\n\n-- Local files\nCREATE EXTERNAL TABLE listings \nSTORED AS PARQUET \nLOCATION 'file://folder/listings';\n\n-- Create views from tables\nCREATE VIEW users_listings AS \nSELECT * FROM users \nLEFT JOIN listings USING (user_id);\n\n-- Delta Lake table (requires deltalake feature)\nCREATE EXTERNAL TABLE delta_table \nSTORED AS DELTATABLE \nLOCATION 's3://bucket/delta_table';\n```\n\n#### Loading DDL\n\n- **TUI**: DDL is automatically loaded at startup\n- **CLI**: Add `--run-ddl` flag to execute DDL before your query\n- **Custom Path**: Configure a custom DDL path in your config file\n  ```toml\n  [execution]\n  ddl_path = \"/path/to/my/ddl.sql\"\n  ```\n\n## Quick Reference\n\n| Feature | Documentation |\n|---------|---------------|\n| **Core Features** | [Features Guide](docs/features.md) |\n| **Database** | [Database Guide](docs/db.md) |\n| **TUI Interface** | [TUI Guide](docs/tui.md) |\n| **CLI Usage** | [CLI Guide](docs/cli.md) |\n| **FlightSQL Server** | [FlightSQL Guide](docs/flightsql_server.md) |\n| **HTTP Server** | [HTTP Guide](docs/http_server.md) |\n| **Configuration Options** | [Config Reference](docs/config.md) |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafusion-contrib%2Fdatafusion-dft","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatafusion-contrib%2Fdatafusion-dft","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafusion-contrib%2Fdatafusion-dft/lists"}