{"id":18037518,"url":"https://github.com/kevmo314/appendable","last_synced_at":"2025-03-27T09:31:43.144Z","repository":{"id":209219848,"uuid":"717489870","full_name":"kevmo314/appendable","owner":"kevmo314","description":"Appendable is an append-only, schema-less, daemon-less database.","archived":false,"fork":false,"pushed_at":"2024-05-22T17:21:34.000Z","size":57876,"stargazers_count":11,"open_issues_count":9,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-05-22T17:44:36.723Z","etag":null,"topics":["database"],"latest_commit_sha":null,"homepage":"https://kevmo314.github.io/appendable/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kevmo314.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-11T16:24:33.000Z","updated_at":"2024-05-28T16:14:00.398Z","dependencies_parsed_at":"2024-01-05T17:44:21.759Z","dependency_job_id":"6197eb83-83c1-4700-aa6c-b3e9532bcd0a","html_url":"https://github.com/kevmo314/appendable","commit_stats":null,"previous_names":["kevmo314/appendable"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kevmo314%2Fappendable","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kevmo314%2Fappendable/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kevmo314%2Fappendable/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kevmo314%2Fappendable/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/owners/kevmo314","download_url":"https://codeload.github.com/kevmo314/appendable/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222226132,"owners_count":16951818,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database"],"created_at":"2024-10-30T13:10:58.249Z","updated_at":"2024-10-30T13:10:58.826Z","avatar_url":"https://github.com/kevmo314.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Appendable\n\n[![discord](https://img.shields.io/badge/Join-Discord-%235865F2.svg)](https://discord.gg/q9PE76FxXw)\n\nAppendable is an append-only, schema-less, daemon-less database.\n\nAppendable doesn't require a conventional server, instead it generates an index\nfile that you can host on your favorite static hosting site.\n\nAppendable currently supports data files in the following formats:\n\n- [x] [JSON Lines](https://jsonlines.org/) `.jsonl`\n- [ ] Parquet\n- [ ] CSV\n- [ ] TSV\n- [ ] RecordIO\n\nwith more formats coming soon.\n\n\u003e [!CAUTION]\n\u003e This README is currently somewhat aspirational and many features are not implemented yet.\n\u003e Check out the \u003ca href=\"https://kevmo314.github.io/appendable/\"\u003etechnical demo\u003c/a\u003e for the functionality.\n\n## Motivation\n\nA smart friend of mine once said\n\n\u003e _The problem with databases is that everybody cares about a different killer feature_\n\nAppendable's primary goals are\n\n- Cost-optimized serving. 
Leverage your favorite static content host instead of\n  maintaining and provisioning resources for a dedicated database server.\n- Speed-optimized (O(1)) index updating for appends. Index file updates are\n  fast and deterministic, making them suitable for real-time and streaming data\n  updates.\n\n## Demonstration\n\nCheck out this repository's \u003ca href=\"https://kevmo314.github.io/appendable/\"\u003eGitHub pages\u003c/a\u003e for an example querying the server.\n\n```ts\nimport Appendable from \"appendable\";\n\nconst db = Appendable.init(\"data.jsonl\", \"index.dat\");\n\nconst results = await db\n  .where(\"timestamp\", \"\u003e=\", \"2023-11-01T00:00:00Z\")\n  .where(\"count\", \"\u003c=\", 15)\n  .orderBy(\"count\", \"DESC\")\n  .orderBy(\"timestamp\", \"ASC\")\n  .limit(20)\n  .get();\n\nconsole.log(results); // contains data.jsonl queried with the above query.\n```\n\n## Getting Started\n\nCheck out the demo's \u003ca href=\"https://github.com/kevmo314/appendable/blob/main/examples/README.md\"\u003eREADME\u003c/a\u003e to get started.\n\n## Advanced Usage\n\n### Real-time updates\n\nAppendable indexes are intended to be very cheap to produce incrementally. It is so\ncheap that it is not unreasonable to generate the index on demand. That is, you can\nrun a server such that `index.dat` produces the output from running\n`./appendable -i index.dat` and cache the latest version on your CDN. Couple this with\na signalling channel to indicate that a version update has occurred to subscribe to\nupdates. 
For example,\n\n```ts\nimport Appendable from \"appendable\";\n\nconst db = Appendable.init(\"data.jsonl\", \"index.dat\");\n\nconst unsubscribe = db\n  .where(\"timestamp\", \"\u003e=\", \"2023-11-01T00:00:00Z\")\n  .where(\"count\", \"\u003c=\", 15)\n  .orderBy(\"count\", \"DESC\")\n  .orderBy(\"timestamp\", \"ASC\")\n  .limit(20)\n  .onSnapshot((results) =\u003e {\n    console.log(results);\n  });\n\n// then elsewhere\n\ndb.dirty();\n```\n\nSnapshot updates will only occur when the underlying data has changed. Therefore, `.dirty()`\ncan be called without too much concern.\n\n### Schemas\n\nA schema file is not required to use Appendable, however if you wish to ensure that\nyour data follows certain types, pass a JSON Schema file with `-s schema.json` and\nAppendable will throw an error instead of inferring the type from the data. This\ncan be useful for detecting consistency issues or enforcing field restrictions.\n\nA word of caution, if you add a non-nullable field to your JSON schema, this will cause\nall your previous data to be invalidated requiring an index regeneration. To avoid this,\npass `--imply-nullable` to indicate that previous data is ok to be null but new data\nshould validate. Be aware that this has implications on the generated types, in particular\nyour client will see the field as nullable despite the schema saying non-nullable.\n\n### Generated types\n\nAppendable can also emit TypeScript type definitions. Pass `-t output.d.ts` to produce\nan inferred type definition file to make your queries type-safe. This can be used with\n\n```ts\nimport Appendable from \"appendable\";\nimport DBTypes from 'output.d.ts';\n\nconst db = Appendable.init\u003cDBTypes\u003e(\"data.jsonl\", \"index.dat\");\n\n...\n```\n\nNote that if a schema file is provided, it is guaranteed that the generated type definition\nfile is stable. 
That is, if the schema file does not change, the type definition file will\nnot change.\n\n### Complex queries\n\nThe demonstration example uses a simple query, however the query builder is syntactic sugar over\na `.query()` call. If you wish to perform more advanced queries, you can do so by calling `.query()`\ndirectly. For example,\n\n```ts\nimport Appendable from \"appendable\";\n\nconst db = Appendable.init(\"data.jsonl\", \"index.dat\");\n\nconst results = await db.query({\n  where: [\n    { operation: \"\u003e=\", key: \"timestamp\", value: \"2023-11-01T00:00:00Z\" },\n    { operation: \"\u003c=\", key: \"count\", value: 15 },\n  ],\n  orderBy: [\n    { key: \"count\", direction: \"DESC\" },\n    { key: \"timestamp\", direction: \"ASC\" },\n  ],\n});\n```\n\n### Permissioning and sharding\n\nAppendable does not support permissioning because it assumes that the data is publicly\nreadable. To accommodate permissions, we recommend guarding the access of your data files\nvia your preferred authentication scheme. That is, create an index file for each user's\ndata. For example, your static file content may look something like\n\n```\n/users/alice/data.jsonl\n/users/alice/index.dat\n/users/bob/data.jsonl\n/users/bob/index.dat\n/users/catherine/data.jsonl\n/users/catherine/index.dat\n```\n\nWhere each user has access to their own data and index file.\n\n### Mutability and skew\n\nAppendable is geared towards data that is immutable, however in practice this might not\nbe ideal. In order to accommodate data mutations, a data integrity hash is maintained so\nwhen data is mutated, the data will be reindexed. Reindexing is O(n) in the age\nof the oldest mutation (hence why appending is O(1) for updating the index!) so mutating\ndata early on in the data file will be more expensive to update.\n\nMutations must be carefully performed because they will cause the previous index\nto be corrupted. 
Therefore, when updating the files on your server, the data and index\nfiles must be updated atomically. This is tricky to do right, however one approach is to\nversion your data and index files.\n\nWhen a mutation is performed, you will need to create a new version of the data\nfile that includes the mutation along with the updated index file. Host the two\nseparately under a different version number so clients that started by querying\nthe previous version can continue to access it.\n\nNote that the performance limitations of indexing make this intractable at any\nappreciable scale so generally speaking, it's strongly recommended to keep your\ndata immutable and append-only within a data file.\n\n### Custom `fetch()` API\n\nFor convenience, Appendable uses the browser's `fetch()` for fetching data files if\nthe data and index files are specified as a string. If you wish to use your own library\nor wish to add your own headers, pass a callback.\n\nThe callback must correctly return a byte slice representing the start and end parameters.\n\nFor example,\n\n```ts\nimport Appendable from \"appendable\";\n\nconst db = Appendable.init(\n  async (start: number, end: number) =\u003e {\n    const response = await fetch(\"data.jsonl\", {\n      headers: { Range: `bytes=${start}-${end}` },\n    });\n    return await response.arrayBuffer();\n  },\n  async (start: number, end: number) =\u003e {\n    const response = await fetch(\"index.dat\", {\n      headers: { Range: `bytes=${start}-${end}` },\n    });\n    return await response.arrayBuffer();\n  },\n);\n```\n\n## Peanut gallery\n\n### Why not query a SQLite database with range requests?\n\nDumping all the data into a SQLite database, hosting it, and then querying with\nsomething like [sql.js](https://sql.js.org/) _could_ (and probably would) work,\nbut I find it particularly elegant that Appendable doesn't change the raw data.\nIn other words, besides producing an index file the `jsonl` file provided stays\nuntouched which means updates 
to the index file can lag behind data changes and\nremain valid because the database doesn't need to shuffle the data around.\n\nAdditionally, Appendable is built for serving statically, so it takes advantage\nof [multiple ranges](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range#requesting_multiple_ranges)\nwhenever possible. Compared to [hosting SQLite](https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/)\nwhich does a great job at minimizing data usage, Appendable further lessens the\nrequest overhead with less data transfer and fewer round trips.\n\n### My data is never append-only, what's the point of this?\n\nA lot of data isn't actually append-only but can often be restructured as if it\nwere append-only. For example, creating a sequence of changelogs or deltas lets\nyou see the history and evolution of a document.\n\nOf course, not all data can be structured like this but Appendable started from\nme observing that a decent chunk of my data _was_ written to an appending table\nand not mutating any existing data and that it avoided some performance issues,\nbut the underlying database didn't take advantage of them. For example, with an\nappend-only data file, Appendable doesn't have to worry about row locking. That\nmeans that there's no tail latency issues when querying and the database can be\nscaled horizontally with conventional CDNs. This isn't possible (well ok, it is\nbut it's [very expensive](https://cloud.google.com/spanner)) with the usual set\nof databases.\n\nIf you're not convinced, think of Appendable as more geared towards time-series\ndatasets. Somewhat like a [kdb+](https://en.wikipedia.org/wiki/Kdb%2B) database\nbut meant for applications instead of more specialized use cases.\n\n### How do I deal with deletion requests if I can only append?\n\nIt's recommended to shard your data in a way that, if you need to delete any of\nit, the entire data file and index file are deleted together. 
For example, data\nrepresenting a document can be deleted by deleting the document's corresponding\ndata file and its index file.\n\nIf, for example, you wish to delete records associated with a given user within\na data file and a mutation must be performed, consult the _Mutability and skew_\nsection above for the associated caveats.\n\n## Limitations\n\n- Max field size: 8793945536512 bytes\n- Max number of rows: 2^64-1\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkevmo314%2Fappendable","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkevmo314%2Fappendable","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkevmo314%2Fappendable/lists"}