# sqlite-s3vfs

[![PyPI package](https://img.shields.io/pypi/v/sqlite-s3vfs?label=PyPI%20package&color=%234c1)](https://pypi.org/project/sqlite-s3vfs/) [![Test suite](https://img.shields.io/github/actions/workflow/status/uktrade/sqlite-s3vfs/test.yml?label=Test%20suite)](https://github.com/uktrade/sqlite-s3vfs/actions/workflows/test.yml) [![Code coverage](https://img.shields.io/codecov/c/github/uktrade/sqlite-s3vfs?label=Code%20coverage)](https://app.codecov.io/gh/uktrade/sqlite-s3vfs)

Python virtual filesystem for SQLite to read from and write to S3.

No locking is performed, so client code _must_ ensure that writes do not overlap with other writes or reads.
If multiple writes happen at the same time, the database will probably become corrupt and data will be lost.

Based on [simonwo's gist](https://gist.github.com/simonwo/b98dc75feb4b53ada46f224a3b26274c), and inspired by [phiresky's sql.js-httpvfs](https://github.com/phiresky/sql.js-httpvfs), [dacort's Stack Overflow answer](https://stackoverflow.com/a/59434097/1319998), and [michalc's sqlite-s3-query](https://github.com/michalc/sqlite-s3-query).


## How does it work?

sqlite-s3vfs stores the SQLite database in fixed-size _blocks_, each stored as a separate object in S3. SQLite stores its data in fixed-size _pages_, and always writes exactly one page at a time. This virtual filesystem translates page reads and writes into block reads and writes. When SQLite pages are the same size as blocks, which is the case by default, each page write results in exactly one block write.

Separate objects are required because S3 does not support replacing part of an object: to change even 1 byte, the whole object must be re-uploaded.


## Installation

sqlite-s3vfs can be installed from PyPI using `pip`.

```bash
pip install sqlite-s3vfs
```

This will automatically install [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html), [APSW](https://rogerbinns.github.io/apsw/), and their dependencies.


## Usage

sqlite-s3vfs is an [APSW](https://rogerbinns.github.io/apsw/) virtual filesystem that uses [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) to communicate with S3.

```python
import apsw
import boto3
import sqlite_s3vfs

# A boto3 bucket resource
bucket = boto3.Session().resource('s3').Bucket('my-bucket')

# An S3VFS for that bucket
s3vfs = sqlite_s3vfs.S3VFS(bucket=bucket)

# sqlite-s3vfs stores many objects under this prefix
# Note that it's not typical to start a key prefix with '/'
key_prefix = 'my/path/cool.sqlite'

# Connect, insert data, and query. Using the connection as a context
# manager runs the block in a transaction, committed on successful exit
with apsw.Connection(key_prefix, vfs=s3vfs.name) as db:
    cursor = db.cursor()
    cursor.execute('''
        CREATE TABLE foo(x,y);
        INSERT INTO foo VALUES(1,2);
    ''')
    cursor.execute('SELECT * FROM foo;')
    print(cursor.fetchall())
```

See the [APSW documentation](https://rogerbinns.github.io/apsw/) for more examples.


### Serializing (getting a regular SQLite file out of the VFS)

The bytes of a regular SQLite file can be extracted with the `serialize_iter` function, which returns an iterable,

```python
for chunk in s3vfs.serialize_iter(key_prefix=key_prefix):
    print(chunk)
```

or with `serialize_fileobj`, which returns a non-seekable file-like object. This can be passed to boto3's `upload_fileobj` method to upload a regular SQLite file to S3.

```python
target_obj = boto3.Session().resource('s3').Bucket('my-target-bucket').Object('target/cool.sqlite')
target_obj.upload_fileobj(s3vfs.serialize_fileobj(key_prefix=key_prefix))
```


### Deserializing (getting a regular SQLite file into the VFS)

A regular SQLite file can be loaded into the VFS with `deserialize_iter`, which accepts any iterable that yields bytes.

```python
# Any iterable that yields bytes can be used. In this example, bytes come from
# a regular SQLite file already in S3
source_obj = boto3.Session().resource('s3').Bucket('my-source-bucket').Object('source/cool.sqlite')
bytes_iter = source_obj.get()['Body'].iter_chunks()

s3vfs.deserialize_iter(key_prefix='my/path/cool.sqlite', bytes_iter=bytes_iter)
```


### Block size and page size

SQLite writes data in _pages_, which are 4096 bytes by default. sqlite-s3vfs stores data in _blocks_, which are also 4096 bytes by default.
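
The cost of a mismatch can be seen with a little arithmetic. The sketch below is illustrative only and is not part of sqlite-s3vfs. `blocks_touched` and `page_write_cost` are hypothetical helpers. The sketch assumes that block `i` holds bytes `i * block_size` up to `(i + 1) * block_size` of the database file. It also assumes that a block only partly covered by a page write must be downloaded, patched, and re-uploaded whole, because S3 cannot replace part of an object.

```python
def blocks_touched(offset: int, length: int, block_size: int) -> range:
    """Indexes of the blocks overlapped by `length` bytes starting at `offset`.

    Hypothetical helper: assumes block i holds bytes
    [i * block_size, (i + 1) * block_size) of the database file.
    """
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def page_write_cost(page_size: int, block_size: int, page_number: int = 0) -> tuple[int, int]:
    """Return (blocks uploaded, blocks that must be downloaded first) for one page write.

    Hypothetical helper: assumes a block only partly covered by the page is
    fetched, patched in memory and re-uploaded in full.
    """
    offset = page_number * page_size
    end = offset + page_size
    blocks = blocks_touched(offset, page_size, block_size)
    # A block is partial if the page starts after it begins or ends before it ends
    partial = sum(
        1
        for b in blocks
        if offset > b * block_size or end < (b + 1) * block_size
    )
    return len(blocks), partial


# Matching sizes (the default): one whole-block upload per page write
print(page_write_cost(page_size=4096, block_size=4096))  # (1, 0)
# 64 KiB pages over 4 KiB blocks: each page write uploads 16 objects
print(page_write_cost(page_size=65536, block_size=4096))  # (16, 0)
# 4 KiB pages over 64 KiB blocks: each page write downloads and re-uploads a 64 KiB block
print(page_write_cost(page_size=4096, block_size=65536, page_number=3))  # (1, 1)
```

Under these assumptions, matching sizes map every page write onto exactly one whole block. Pages larger than blocks multiply the number of uploads per write. Pages smaller than blocks turn each write into a download plus an upload of a larger object.
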

If you change one, you should change the other to match, for performance reasons.

```python
s3vfs = sqlite_s3vfs.S3VFS(bucket=bucket, block_size=65536)
with apsw.Connection(key_prefix, vfs=s3vfs.name) as db:
    cursor = db.cursor()
    # SQLite only applies a new page_size to an empty database,
    # or to an existing one at the next VACUUM (outside WAL mode)
    cursor.execute('''
        PRAGMA page_size = 65536;
    ''')
```


## Tests

The tests require the dev dependencies to be installed and MinIO to be running:

```bash
pip install -e ".[dev]"
./start-minio.sh
```

They can then be run with pytest:

```bash
pytest
```

Finally, stop MinIO:

```bash
./stop-minio.sh
```