{"id":13725021,"url":"https://github.com/litements/s3sqlite","last_synced_at":"2025-05-07T19:32:27.213Z","repository":{"id":59251404,"uuid":"535609864","full_name":"litements/s3sqlite","owner":"litements","description":"Query SQLite files in S3 using s3fs","archived":false,"fork":false,"pushed_at":"2022-09-14T23:36:13.000Z","size":37,"stargazers_count":502,"open_issues_count":1,"forks_count":8,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-21T08:04:21.971Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/litements.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-09-12T10:11:44.000Z","updated_at":"2025-04-11T03:48:41.000Z","dependencies_parsed_at":"2022-09-17T15:41:06.614Z","dependency_job_id":null,"html_url":"https://github.com/litements/s3sqlite","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/litements%2Fs3sqlite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/litements%2Fs3sqlite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/litements%2Fs3sqlite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/litements%2Fs3sqlite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/litements","download_url":"https://codeload.github.com/litements/s3sqlite/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":
252943861,"owners_count":21829326,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T01:02:10.055Z","updated_at":"2025-05-07T19:32:26.848Z","avatar_url":"https://github.com/litements.png","language":"Python","readme":"# s3sqlite\n\n\u003e Query SQLite databases in S3 using s3fs\n\nAn [APSW](https://rogerbinns.github.io/apsw/) SQLite VFS that enables reading\ndatabases from S3 using\n[s3fs](https://s3fs.readthedocs.io/en/latest/index.html). It only supports\nread operations; any operation that tries to modify the DB file is ignored.\n\nInspired by [sqlite-s3vfs](https://github.com/uktrade/sqlite-s3vfs) and\n[sqlite-s3-query](https://github.com/michalc/sqlite-s3-query).\n\n## Notes about journal mode\n\nThis VFS only works when the DB file is in a journal mode that is **not**\n[WAL](https://sqlite.org/wal.html). However, it will work if you set the journal\nmode to something else just before uploading the file to S3. You can (and\nprobably should) use WAL mode to generate the DB, then change the journal mode\n(and the page size if you need to) before uploading it to S3.\n\nThe test suite\n[includes](https://github.com/litements/s3sqlite/blob/3719f1ce50a7b5cfae754776bc9b2c17292f8d72/test.py#L198)\ntests for that use case. Note that the page size can't be changed while the\ndatabase is in WAL mode: change it before enabling WAL mode, or after switching\nthe database back to rollback journal mode. 
[You need to execute\n`VACUUM;` after changing the page\nsize](https://www.sqlite.org/pragma.html#pragma_page_size) in a SQLite database.\n\n## Example usage\n\n```py\nimport s3fs\nimport s3sqlite\nimport apsw\n\n# Create an S3 filesystem. Check the s3fs docs for more examples:\n# https://s3fs.readthedocs.io/en/latest/\ns3 = s3fs.S3FileSystem(\n    key=\"somekey\",\n    secret=\"secret\",\n    client_kwargs={\"endpoint_url\": \"http://...\"},\n)\n\ns3vfs = s3sqlite.S3VFS(name=\"s3-vfs\", fs=s3)\n\n# Define the S3 location\nkey_prefix = \"mybucket/awesome.sqlite3\"\n\n# Upload the file to S3\ns3vfs.upload_file(\"awesome.sqlite3\", dest=key_prefix)\n\n# Open the database read-only and query it\nwith apsw.Connection(\n    key_prefix, vfs=s3vfs.name, flags=apsw.SQLITE_OPEN_READONLY\n) as conn:\n\n    cursor = conn.execute(\"...\")\n    print(cursor.fetchall())\n\n```\n\n## Installation\n\n```\npython3 -m pip install s3sqlite\n```\n\n## Run tests\n\nThe testing script will use the [Chinook\ndatabase](https://github.com/lerocha/chinook-database/); it will modify (and\n`VACUUM;`) the file to use all the possible combinations of journal modes and\npage sizes.\n\n1. Download the Chinook database:\n\n```sh\ncurl https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite -o chinook.sqlite3\n```\n\n2. Make sure you have Docker installed.\n\nThe testing script will start a [MinIO](https://min.io/) container to run the\ntests locally. After the tests finish, the container will be stopped\nautomatically.\n\n3. Run the tests:\n\n```sh\npython3 -m pytest test.py\n```\n\n## Alternatives\n\n- [sqlite-s3vfs](https://github.com/uktrade/sqlite-s3vfs): This VFS stores the\n  SQLite file as separate DB pages. This enables having a single writer without\n  having to overwrite the whole file. `s3sqlite`'s main difference is that it\n  only needs to upload a single file to S3. 
`sqlite-s3vfs` will split the\n  database into pages and upload the pages separately to a bucket prefix. Having\n  just a single file has some advantages, like making use of object [versioning\n  in the\n  bucket](https://s3fs.readthedocs.io/en/latest/index.html?highlight=version#bucket-version-awareness).\n  I also think that relying on\n  [s3fs](https://s3fs.readthedocs.io/en/latest/index.html) makes the VFS more\n  [flexible](https://s3fs.readthedocs.io/en/latest/index.html#s3-compatible-storage)\n  than calling `boto3` as `sqlite-s3vfs` does. `s3fs` should also handle\n  retries automatically.\n- [sqlite-s3-query](https://github.com/michalc/sqlite-s3-query): This VFS is very\n  similar to `s3sqlite`, but it uses `ctypes` directly to create the VFS and uses\n  `httpx` to make requests to S3.\n\nI decided to create a new VFS that doesn't require using `ctypes` so that it's\neasier to understand and maintain, but I still wanted to have a single file in S3\n(vs. separate DB pages). At the same time, by using\n[s3fs](https://s3fs.readthedocs.io/en/latest/) I know I can use any S3\nstorage supported by that library.\n\n## Other\n\nThe Chinook database used for testing can be obtained from: https://github.com/lerocha/chinook-database/\n\nThe testing section in this README contains a command you can run to get the file.\n\n## License\n\nDistributed under the Apache 2.0 license. See `LICENSE` for more information.\n","funding_links":[],"categories":["Python","backup and replicate"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flitements%2Fs3sqlite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flitements%2Fs3sqlite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flitements%2Fs3sqlite/lists"}