{"id":20564454,"url":"https://github.com/tarantool/document","last_synced_at":"2025-04-14T15:13:20.769Z","repository":{"id":46196090,"uuid":"88863243","full_name":"tarantool/document","owner":"tarantool","description":"Effortless JSON storage for Tarantool","archived":false,"fork":false,"pushed_at":"2022-11-17T17:12:43.000Z","size":402,"stargazers_count":24,"open_issues_count":3,"forks_count":4,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-04-14T15:13:10.701Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tarantool.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-04-20T12:28:43.000Z","updated_at":"2023-04-24T05:56:15.000Z","dependencies_parsed_at":"2023-01-22T05:16:18.017Z","dependency_job_id":null,"html_url":"https://github.com/tarantool/document","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarantool%2Fdocument","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarantool%2Fdocument/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarantool%2Fdocument/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarantool%2Fdocument/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tarantool","download_url":"https://codeload.github.com/tarantool/document/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_coun
t":248904637,"owners_count":21180835,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T04:26:48.381Z","updated_at":"2025-04-14T15:13:20.734Z","avatar_url":"https://github.com/tarantool.png","language":"Lua","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src='doc/books.png' width='600' title='Document'\u003e\n\n[Use cases](#use-cases)\u0026nbsp; | \u0026nbsp;[Setup](#setup)\u0026nbsp; | \u0026nbsp;[Status](#status)\u0026nbsp; | \u0026nbsp;[API](#api)\u0026nbsp; | \u0026nbsp;[Contact](#contacts)\n\u003cbr\u003e\u003cbr\u003e\n[![Build Status](https://travis-ci.org/tarantool/document.svg?branch=master)](https://travis-ci.org/tarantool/document)\n\n# Effortless JSON storage for Tarantool\n\nYou may use this module to receive and store structured data you get\nfrom external world. It has a few important strengths:\n\n-   You are not forced to define any kind of schema for your documents\n-   Still, they are stored with very little redundancy\n-   You can build indices on arbitrary fields (even nested)\n-   There are convenient high-level functions for data manipulation\n-   The module works transparently for local spaces, remote spaces and even sharded spaces!\n-   You can do \"eventually consistent\" selects and joins across sharded spaces!\n\n## Use cases\n\nThis module is suitable for projects where having a strict schema is\nnot desirable. 
And especially for small codebases, where you don't\nwant to write lots of boilerplate.\n\n## Setup\n\nThis module has no outside dependencies, so you can just drop\ndocument.lua into the root of your project.\n\nAlternatively, you can use the Tarantool package manager:\n\n```bash\ntarantoolctl rocks install document\n```\n\n## Usage\n\nBoilerplate:\n\n```lua\ndoc = require('document')\njson = require('json')\n\nbox.cfg{}\n\nbox.schema.create_space('test', {if_not_exists = true})\ndoc.create_index(box.space.test, 'primary',\n                 {parts={'id', 'unsigned'}, if_not_exists=true})\n```\n\nActual data manipulation:\n\n```lua\ndoc.insert(box.space.test, {id=1, foo=\"foo\", bar={baz=3}})\ndoc.insert(box.space.test, {id=2, foo=\"bar\", bar={baz=0}})\n\nprint('All tuples')\nfor _, r in doc.select(box.space.test) do\n    print('tuple:', json.encode(r))\nend\n\nprint('Tuples where bar.baz \u003e 0')\nfor _, r in doc.select(box.space.test, {{'$bar.baz', '\u003e', 0}}) do\n    print('tuple:', json.encode(r))\nend\n\nprint('Deleting a tuple where primary key == 2')\ndoc.delete(box.space.test, {{\"$id\", \"==\", 2}})\n```\n\n## How it works\n\nA naive implementation would just store JSON documents as\nstrings inside a tuple, and extract indexed values into separate fields of\nthe tuple.\n\nA more optimized approach is what MongoDB and PostgreSQL do:\ninstead of storing JSON documents as text, invent a compact binary\nformat and store it inside a tuple.\n\nBut we decided to take another approach, and dynamically figure out\nthe document schema. We walk through the incoming document and put each\nleaf element into a separate tuple field, essentially \"flattening\" it.\nIf we have seen a field before, then the schema already contains\na mapping between its path in the document and a position inside the\ntuple. 
If not, then we extend the schema and add a new field,\nassigning a new rightmost column in the tuple to store its data.\n\nWhen data is selected back, we reconstruct the original object using\nthe document schema.\n\nOur experiments show that most documents can achieve 5x to 10x\ncompression with this method, because the schema is stored only once\nper space.\n\n## Queries\n\nQueries are written using Lua tables, and are just lists of conditions\nof the following form:\n\n    {left, op, right}\n\nHere, the `left` and `right` parts of the condition are either regular\nvalues or references to a field name, and `op` is a comparison operator.\n\nExample values for `left` and `right`:\n\n-   `1`\n-   `nil`\n-   `\"foo\"`\n-   `\"$id\"`\n\nHere, `\"$id\"` is a special form that references a tuple field by\nname. You can put a \"path\" there, separated with \".\", like\n`\"$foo.bar.val\"`.\n\nExample values for `op`:\n\n-   `\"\u003e\"`\n-   `\"\u003e=\"`\n-   `\"==\"`\n-   `\"\u003c=\"`\n-   `\"\u003c\"`\n\nQuery examples:\n\n-   `{{\"$id\", \"\u003e\", 10}}`\n-   `{{\"$id\", \"\u003e\", 10}, {\"$id\", \"\u003c\", 100}}`\n-   `{{\"$user.name\", \"==\", \"foo\"}, {\"$qty\", \"==\", 0}}`\n\n## Status\n\n- The functionality for dealing with regular spaces is feature-complete\n- Serialization/deserialization should be reasonably fast for most use-cases (though there are no benchmarks at the moment)\n- Selects/joins across sharded spaces may have bugs. 
There is no automated test coverage for this case.\n\n## API\n\n### `doc.insert(space, tbl)`\n\nInsert document `tbl` into `space`.\n\n### `doc.delete(space, query)`\n\nDelete documents from `space` that match `query` (see Queries above).\n\n### `doc.select(space, query, options)`\n\nSelect documents from `space` that match `query` (see Queries above)\nand return an iterator to the result set.\n\n`options` is a table with the following optional keys:\n- `limit`: maximum number of results to return\n- `offset`: the offset from the beginning of the result set\n\n### `doc.join(space1, space2, query, options)`\n\nPerform an inner join of spaces `space1` and `space2`, where both\nitems satisfy `query` (see Queries above).\n\n`options` is a table with the following optional keys:\n- `limit`: maximum number of results to return\n- `offset`: the offset from the beginning of the result set\n\n## Low level API\n\n### `doc.flatten(space, tbl)`\n\nConverts document `tbl` to a flat array, updating the schema for `space` as necessary.\n\n### `doc.unflatten(space, tbl)`\n\nConverts flat array `tbl` to a nested document, according to the schema for `space`.\n\n### `doc.create_index(space, index_name, options)`\n\nBehaves similarly to `box.space.create_index()`, but allows specifying string field names in addition to numeric ones in `parts`.\n\n### `field_key(space, field_name)`\n\nReturns the integer key for the field named `field_name` in a flattened document. If you need a key for nested documents, use dot notation, like: `\"foo.bar.id\"`.\n\n## Contacts\n\nThis module was initially written by [Konstantin Nazarov](https://github.com/racktear).\n\nYou can reach out to him at [mail@kn.am](mailto:mail@kn.am).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftarantool%2Fdocument","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftarantool%2Fdocument","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftarantool%2Fdocument/lists"}