{"id":14065507,"url":"https://github.com/dacort/athena-sqlite","last_synced_at":"2025-10-31T01:31:32.083Z","repository":{"id":66459856,"uuid":"227300707","full_name":"dacort/athena-sqlite","owner":"dacort","description":"A SQLite driver for S3 and Amazon Athena 😳","archived":false,"fork":false,"pushed_at":"2019-12-16T06:19:07.000Z","size":87,"stargazers_count":95,"open_issues_count":1,"forks_count":5,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-07T01:23:38.106Z","etag":null,"topics":["amazon-athena","athena","aws","lambda-layer","s3","sar","serverless","sqlite","vfs"],"latest_commit_sha":null,"homepage":"https://serverlessrepo.aws.amazon.com/#/applications/arn:aws:serverlessrepo:us-east-1:689449560910:applications~AthenaSQLITEConnector","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dacort.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2019-12-11T07:14:58.000Z","updated_at":"2025-01-02T10:23:30.000Z","dependencies_parsed_at":"2024-02-15T00:53:50.234Z","dependency_job_id":"19620768-e2f1-49d8-80bd-3ff95c36db4d","html_url":"https://github.com/dacort/athena-sqlite","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dacort/athena-sqlite","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Fathena-sqlite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Fathena-sqlite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Fathena-sqlite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Fathena-sqlite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dacort","download_url":"https://codeload.github.com/dacort/athena-sqlite/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Fathena-sqlite/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281914522,"owners_count":26583082,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-30T02:00:06.501Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon-athena","athena","aws","lambda-layer","s3","sar","serverless","sqlite","vfs"],"created_at":"2024-08-13T07:04:31.790Z","updated_at":"2025-10-31T01:31:31.791Z","avatar_url":"https://github.com/dacort.png","language":"Python","readme":"# Athena SQLite Driver\n\nUsing Athena's new [Query 
## What?

Drop SQLite databases in a single prefix in S3, and Athena will list each file as a database and automatically detect tables and schemas.

Currently, all data types are strings. I'll fix this eventually. All good things in time.

## Status

This project is under active development and very much in its infancy.

Many things are hard-coded or broken into various pieces as I experiment and figure out how everything works.

## Building

The documentation for this is a work in progress. It's currently somewhere between me creating the resources manually and building the assets for the AWS SAR, and most of these steps will eventually be automated away.

### Requirements

- Docker
- Python 3.7

### Lambda layer

First you need to build the Lambda layer. There are two Dockerfiles and build scripts in the `lambda-layer/` directory.

We'll execute each of the build scripts and copy the results to the target directory. This is referenced by the SAR template, [`athena-sqlite.yaml`](athena-sqlite.yaml).

```shell
cd lambda-layer
./build.sh
./build-pyarrow.sh
cp -R layer/ ../target/
```

### Upload sample data

For the purpose of this test, we just have a sample SQLite database you can upload.

`aws s3 cp sample-data/sample_data.sqlite s3://<TARGET_BUCKET>/<TARGET_PREFIX>/`

Feel free to upload your own SQLite databases as well!

### Lambda function

There are three components to the Lambda code:

- `vfs.py` - A SQLite Virtual File System implementation for S3
- `s3qlite.py` - The actual Lambda function that handles Athena metadata/data requests
- `sqlite_db.py` - Helper functions for accessing SQLite databases on S3

Create a function with the code in [lambda-function/s3qlite.py](lambda-function/s3qlite.py) that uses the previously created layer.
The handler will be `s3qlite.lambda_handler`.
Also include the `vfs.py` and `sqlite_db.py` files in your Lambda function.

Configure two environment variables for your Lambda function:

- `TARGET_BUCKET` - The name of the S3 bucket where your SQLite files live
- `TARGET_PREFIX` - The prefix (e.g. `data/sqlite`) that you uploaded the sample SQLite database to

Note that the IAM role you associate the function with will also need `s3:GetObject` and `s3:ListBucket` access to wherever your lovely SQLite databases are stored.
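For a sense of how the function uses these variables, here's an illustrative sketch (not the actual `s3qlite.py`, which also implements the Athena federation request/response protocol):

```python
# Illustrative only: how a handler like s3qlite.lambda_handler might resolve
# its configuration and enumerate databases. The real handler does much more.
import os
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    bucket = os.environ["TARGET_BUCKET"]
    prefix = os.environ["TARGET_PREFIX"].rstrip("/") + "/"

    # Each .sqlite object under the prefix is surfaced as an Athena "database".
    # (This call is why the role needs s3:ListBucket; reads need s3:GetObject.)
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    databases = [
        os.path.splitext(os.path.basename(obj["Key"]))[0]
        for obj in resp.get("Contents", [])
        if obj["Key"].endswith(".sqlite")
    ]
    # ...then answer Athena's metadata/data requests for those databases.
```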
### Configure Athena

Follow the Athena documentation for [Connecting to a data source](https://docs.aws.amazon.com/athena/latest/ug/connect-to-a-data-source.html).
The primary thing to note here is that you need to create a workgroup named `AmazonAthenaPreviewFunctionality` and use that for your testing.
Some functionality will work in the primary workgroup, but you'll get weird errors when you try to query data.

I named my function `s3qlite` :)

### Run queries!

Here are a couple of basic queries that should work:

```sql
SELECT * FROM "s3qlite"."sample_data"."records" LIMIT 10;

SELECT COUNT(*) FROM "s3qlite"."sample_data"."records";
```

If you deploy the SAR app, the data catalog isn't registered automatically, but you can still run queries by using the special `lambda:` schema:

```sql
SELECT * FROM "lambda:s3qlite".sample_data.records LIMIT 10;
```

Where `s3qlite` is the value you provided for the `AthenaCatalogName` parameter.

## TODO

- Move these into issues :)
- Move vfs.py into its own module
    - Maybe add write support to it someday :scream:
- Publish to SAR
- Add tests...always tests
- struct types, probably
- Don't read the entire file every time :)
- Escape column names with invalid characters
- Implement recursive listing

## Serverless App Repo

These are mostly notes I made while figuring out how to get SAR working.

You need to grant SAR access to the bucket:

```shell
aws s3api put-bucket-policy --bucket <BUCKET> --region us-east-1 --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service":  "serverlessrepo.amazonaws.com"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<BUCKET>/*"
    }
  ]
}'
```

For publishing to the SAR, we just execute two commands:

```shell
sam package --template-file athena-sqlite.yaml --s3-bucket <BUCKET> --output-template-file target/out.yaml
sam publish --template target/out.yaml --region us-east-1
```

If you want to deploy using CloudFormation, use this command:

```shell
sam deploy --template-file ./target/out.yaml --stack-name athena-sqlite --capabilities CAPABILITY_IAM --parameter-overrides 'DataBucket=<BUCKET> DataPrefix=tmp/sqlite' --region us-east-1
```
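Once deployed, one way to smoke-test the connector is to kick off one of the queries from the "Run queries!" section via boto3. `<RESULTS_BUCKET>` below is a placeholder for any bucket Athena can write query results to:

```python
# Optional smoke test after deploying; replace <RESULTS_BUCKET> with your own.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString='SELECT * FROM "lambda:s3qlite".sample_data.records LIMIT 10',
    WorkGroup="AmazonAthenaPreviewFunctionality",
    ResultConfiguration={"OutputLocation": "s3://<RESULTS_BUCKET>/athena-results/"},
)
print(response["QueryExecutionId"])
```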