{"id":33937144,"url":"https://github.com/datafusion-contrib/datafusion-objectstore-s3","last_synced_at":"2026-04-06T06:31:32.266Z","repository":{"id":39659715,"uuid":"444484587","full_name":"datafusion-contrib/datafusion-objectstore-s3","owner":"datafusion-contrib","description":"S3 as an ObjectStore for DataFusion","archived":false,"fork":false,"pushed_at":"2023-03-12T05:54:55.000Z","size":75,"stargazers_count":66,"open_issues_count":12,"forks_count":14,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-10-12T00:58:18.000Z","etag":null,"topics":["datafusion","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datafusion-contrib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-04T16:14:51.000Z","updated_at":"2025-10-06T14:07:40.000Z","dependencies_parsed_at":"2023-01-31T10:45:24.571Z","dependency_job_id":null,"html_url":"https://github.com/datafusion-contrib/datafusion-objectstore-s3","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/datafusion-contrib/datafusion-objectstore-s3","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-objectstore-s3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-objectstore-s3/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-objectstore-s3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-object
store-s3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datafusion-contrib","download_url":"https://codeload.github.com/datafusion-contrib/datafusion-objectstore-s3/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-objectstore-s3/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31463011,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"online","status_checked_at":"2026-04-06T02:00:07.287Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datafusion","rust"],"created_at":"2025-12-12T14:48:02.373Z","updated_at":"2026-04-06T06:31:32.261Z","avatar_url":"https://github.com/datafusion-contrib.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataFusion-ObjectStore-S3\n\nS3 as an ObjectStore for [DataFusion](https://github.com/apache/arrow-datafusion).\n\n## Querying files on S3 with DataFusion\n\nThis crate implements the DataFusion `ObjectStore` trait on AWS S3 and on other implementers of the S3 standard. We use the official [AWS Rust SDK](https://github.com/awslabs/aws-sdk-rust) to interact with S3. While it is our understanding that the AWS APIs we are using are relatively stable, we can make no assurances on API stability, either on AWS' part or within this crate. 
This crate's API is tightly coupled to DataFusion, a fast-moving project, so we will make changes in line with upstream changes.\n\n## Examples\n\nExamples for querying AWS and other implementors, such as MinIO, are shown below.\n\nLoad credentials from the default AWS credential provider (such as the environment or ~/.aws/credentials):\n\n```rust\nlet s3_file_system = Arc::new(S3FileSystem::default().await);\n```\n\n`S3FileSystem::default()` is a convenience wrapper for `S3FileSystem::new(None, None, None, None, None, None)`.\n\nConnect to an implementor of the S3 API (MinIO, in this case) using an access key and secret:\n\n```rust\n// Example credentials provided by MinIO\nconst ACCESS_KEY_ID: \u0026str = \"AKIAIOSFODNN7EXAMPLE\";\nconst SECRET_ACCESS_KEY: \u0026str = \"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\";\nconst PROVIDER_NAME: \u0026str = \"Static\";\nconst MINIO_ENDPOINT: \u0026str = \"http://localhost:9000\";\n\nlet s3_file_system = S3FileSystem::new(\n    Some(SharedCredentialsProvider::new(Credentials::new(\n        ACCESS_KEY_ID,\n        SECRET_ACCESS_KEY,\n        None,\n        None,\n        PROVIDER_NAME,\n    ))), // Credentials provider\n    None, // Region\n    Some(Endpoint::immutable(Uri::from_static(MINIO_ENDPOINT))), // Endpoint\n    None, // RetryConfig\n    None, // AsyncSleep\n    None, // TimeoutConfig\n)\n.await;\n```\n\nUsing DataFusion's `ListingTableConfig`, we register a table in a DataFusion `ExecutionContext` so that it can be queried.\n\n```rust\nlet filename = \"data/alltypes_plain.snappy.parquet\";\n\nlet config = ListingTableConfig::new(s3_file_system, filename).infer().await?;\n\nlet table = ListingTable::try_new(config)?;\n\nlet mut ctx = ExecutionContext::new();\n\nctx.register_table(\"tbl\", Arc::new(table))?;\n\nlet df = ctx.sql(\"SELECT * FROM tbl\").await?;\ndf.show().await?;\n```\n\nWe can also register the `S3FileSystem` directly as an `ObjectStore` on an `ExecutionContext`. 
This provides an idiomatic way of creating `TableProviders` that can be queried.\n\n```rust\nctx.register_object_store(\n    \"s3\",\n    Arc::new(S3FileSystem::default().await),\n);\n\nlet input_uri = \"s3://parquet-testing/data/alltypes_plain.snappy.parquet\";\n\n// Look up the object store registered for the URI's scheme, along with the path\nlet (object_store, path) = ctx.object_store(input_uri)?;\n\nlet config = ListingTableConfig::new(object_store, path).infer().await?;\n\nlet table_provider: Arc\u003cdyn TableProvider + Send + Sync\u003e = Arc::new(ListingTable::try_new(config)?);\n```\n\n## Testing\n\nTests are run with [MinIO](https://min.io/), which provides a containerized implementation of the Amazon S3 API.\n\nFirst, fetch the test data submodule:\n\n```bash\ngit submodule update --init --recursive\n```\n\nThen start the MinIO container:\n\n```bash\ndocker run \\\n--detach \\\n--rm \\\n--publish 9000:9000 \\\n--publish 9001:9001 \\\n--name minio \\\n--volume \"$(pwd)/parquet-testing:/data\" \\\n--env \"MINIO_ROOT_USER=AKIAIOSFODNN7EXAMPLE\" \\\n--env \"MINIO_ROOT_PASSWORD=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\" \\\nquay.io/minio/minio server /data \\\n--console-address \":9001\"\n```\n\nOnce the container is running, run the tests as usual:\n\n```bash\ncargo test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafusion-contrib%2Fdatafusion-objectstore-s3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatafusion-contrib%2Fdatafusion-objectstore-s3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafusion-contrib%2Fdatafusion-objectstore-s3/lists"}