{"id":13699465,"url":"https://github.com/coleifer/sophy","last_synced_at":"2025-04-07T12:08:46.755Z","repository":{"id":66041659,"uuid":"48258599","full_name":"coleifer/sophy","owner":"coleifer","description":"Fast Python bindings to Sophia Database","archived":false,"fork":false,"pushed_at":"2024-10-15T14:59:10.000Z","size":691,"stargazers_count":80,"open_issues_count":0,"forks_count":7,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-30T16:44:40.133Z","etag":null,"topics":["cython","embedded-database","nosql","python","sophia"],"latest_commit_sha":null,"homepage":"http://sophy.readthedocs.io/en/latest/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coleifer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-12-18T22:07:15.000Z","updated_at":"2025-02-16T12:47:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"11248195-ebbb-4af1-8158-c7a988813f3f","html_url":"https://github.com/coleifer/sophy","commit_stats":{"total_commits":143,"total_committers":4,"mean_commits":35.75,"dds":"0.12587412587412583","last_synced_commit":"bb27135aecc891887e2e92aaa4e7c6dc1b67dc40"},"previous_names":[],"tags_count":26,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coleifer%2Fsophy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coleifer%2Fsophy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coleifer%2Fsophy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/coleifer%2Fsophy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coleifer","download_url":"https://codeload.github.com/coleifer/sophy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247648978,"owners_count":20972945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cython","embedded-database","nosql","python","sophia"],"created_at":"2024-08-02T20:00:33.908Z","updated_at":"2025-04-07T12:08:46.733Z","avatar_url":"https://github.com/coleifer.png","language":"C","readme":"\u003ca href=\"http://sophia.systems/\"\u003e\u003cimg src=\"http://media.charlesleifer.com/blog/photos/sophia-logo.png\" width=\"215px\" height=\"95px\" /\u003e\u003c/a\u003e\n\n[sophy](http://sophy.readthedocs.io/en/latest/), fast Python bindings for\n[Sophia embedded database](http://sophia.systems), v2.2.\n\n\u003ca href=\"https://travis-ci.org/coleifer/sophy\"\u003e\u003cimg src=\"https://api.travis-ci.org/coleifer/sophy.svg?branch=master\" /\u003e\u003c/a\u003e\n\n#### About sophy\n\n* Written in Cython for speed and low overhead\n* Clean, memorable APIs\n* Extensive support for Sophia's features\n* Python 2 **and** Python 3 support\n* No 3rd-party dependencies besides Cython\n* [Documentation on readthedocs](http://sophy.readthedocs.io/en/latest/)\n\n#### About Sophia\n\n* Ordered key/value store\n* Keys and values can be composed of multiple fields and data-types\n* ACID transactions\n* MVCC, optimistic, non-blocking concurrency with multiple readers and writers.\n* Multiple databases per 
environment\n* Multiple- and single-statement transactions across databases\n* Prefix searches\n* Automatic garbage collection and key expiration\n* Hot backup\n* Compression\n* Multi-threaded compaction\n* `mmap` support, direct I/O support\n* APIs for variety of statistics on storage engine internals\n* BSD licensed\n\n#### Some ideas of where Sophia might be a good fit\n\n* Running on application servers, low-latency / high-throughput\n* Time-series\n* Analytics / Events / Logging\n* Full-text search\n* Secondary-index for external data-store\n\n#### Limitations\n\n* Not tested on Windoze.\n\nIf you encounter any bugs in the library, please [open an issue](https://github.com/coleifer/sophy/issues/new), including a description of the bug and any related traceback.\n\n## Installation\n\nThe [sophia](http://sophia.systems) sources are bundled with the `sophy` source\ncode, so the only thing you need to install is [Cython](http://cython.org). You\ncan install from [GitHub](https://github.com/coleifer/sophy) or from\n[PyPI](https://pypi.python.org/pypi/sophy/).\n\nPip instructions:\n\n```console\n$ pip install Cython\n$ pip install sophy\n```\n\nOr to install the latest code from master:\n\n```console\n$ pip install -e git+https://github.com/coleifer/sophy#egg=sophy\n```\n\nGit instructions:\n\n```console\n$ pip install Cython\n$ git clone https://github.com/coleifer/sophy\n$ cd sophy\n$ python setup.py build\n$ python setup.py install\n```\n\nTo run the tests:\n\n```console\n$ python tests.py\n```\n\n![](http://media.charlesleifer.com/blog/photos/sophy-logo.png)\n\n---------------------------------------------\n\n## Overview\n\nSophy is very simple to use. It acts like a Python `dict` object, but in\naddition to normal dictionary operations, you can read slices of data that are\nreturned efficiently using cursors. Similarly, bulk writes using `update()` use\nan efficient, atomic batch operation.\n\nDespite the simple APIs, Sophia has quite a few advanced features. 
There is too\nmuch to cover everything in this document, so be sure to check out the official\n[Sophia storage engine documentation](http://sophia.systems/v2.2/).\n\nThe next section will show how to perform common actions with `sophy`.\n\n## Using Sophy\n\nLet's begin by importing `sophy` and creating an environment. The environment\ncan host multiple databases, each of which may have a different schema. In this\nexample, our database will store a UTF-8 string as the key and another UTF-8 string as the value.\nFinally, we'll open the environment so we can start storing and retrieving data.\n\n```python\nfrom sophy import Sophia, Schema, StringIndex\n\n# Instantiate our environment by passing a directory path which will store the\n# various data and metadata for our databases.\nenv = Sophia('/path/to/store/data')\n\n# We'll define a very simple schema consisting of a single utf-8 string for the\n# key, and a single utf-8 string for the associated value.\nschema = Schema(key_parts=[StringIndex('key')],\n                value_parts=[StringIndex('value')])\n\n# Create a key/value database using the schema above.\ndb = env.add_database('example_db', schema)\n\nif not env.open():\n    raise Exception('Unable to open Sophia environment.')\n```\n\n### CRUD operations\n\nSophy databases use the familiar `dict` APIs for CRUD operations:\n\n```python\n\ndb['name'] = 'Huey'\ndb['animal_type'] = 'cat'\nprint(db['name'], 'is a', db['animal_type'])  # Huey is a cat\n\n'name' in db  # True\n'color' in db  # False\n\ndb['temp_val'] = 'foo'\ndel db['temp_val']\nprint(db['temp_val'])  # raises a KeyError.\n```\n\nUse `update()` for bulk-insert, and `multi_get()` for bulk-fetch. 
Unlike\n`__getitem__()`, calling `multi_get()` with a non-existent key will not raise\nan exception; it returns `None` for that key instead.\n\n```python\ndb.update(k1='v1', k2='v2', k3='v3')\n\nfor value in db.multi_get('k1', 'k3', 'kx'):\n    print(value)\n# v1\n# v3\n# None\n\nresult_dict = db.multi_get_dict(['k1', 'k3', 'kx'])\n# {'k1': 'v1', 'k3': 'v3'}\n```\n\n### Other dictionary methods\n\nSophy databases also provide efficient implementations of `keys()`,\n`values()` and `items()`. Unlike dictionaries, however, iterating directly over\na Sophy database returns the equivalent of `items()` (as opposed to\njust the keys):\n\n```python\n\ndb.update(k1='v1', k2='v2', k3='v3')\n\nlist(db)\n# [('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')]\n\n\ndb.items()\n# same as above.\n\n\ndb.keys()\n# ['k1', 'k2', 'k3']\n\n\ndb.values()\n# ['v1', 'v2', 'v3']\n```\n\nThere are two ways to get the count of items in a database. You can use the\n`len()` function, which is not very efficient since it must allocate a cursor\nand iterate through the full database. An alternative is the `index_count`\nproperty, which may not be exact as it includes transactional duplicates and\nnot-yet-merged duplicates.\n\n```python\n\nprint(len(db))\n# 4\n\nprint(db.index_count)\n# 4\n```\n\n### Fetching ranges\n\nBecause Sophia is an ordered data-store, performing ordered range scans is\nefficient. 
To retrieve a range of key-value pairs with Sophy, use ordinary\ndictionary lookup with a `slice`.\n\n```python\n\ndb.update(k1='v1', k2='v2', k3='v3', k4='v4')\n\n\n# Slice key-ranges are inclusive:\ndb['k1':'k3']\n# [('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')]\n\n\n# Inexact matches are fine, too:\ndb['k1.1':'k3.1']\n# [('k2', 'v2'), ('k3', 'v3')]\n\n\n# Leave the start or end empty to retrieve from the first/to the last key:\ndb[:'k2']\n# [('k1', 'v1'), ('k2', 'v2')]\n\ndb['k3':]\n# [('k3', 'v3'), ('k4', 'v4')]\n\n\n# To retrieve a range in reverse order, use the higher key first:\ndb['k3':'k1']\n# [('k3', 'v3'), ('k2', 'v2'), ('k1', 'v1')]\n```\n\nTo retrieve a range in reverse order where the start or end is unspecified,\npass `True` as the `step` value of the slice to indicate reverse order:\n\n```python\n\ndb[:'k2':True]\n# [('k2', 'v2'), ('k1', 'v1')]\n\ndb['k3'::True]\n# [('k4', 'v4'), ('k3', 'v3')]\n\ndb[::True]\n# [('k4', 'v4'), ('k3', 'v3'), ('k2', 'v2'), ('k1', 'v1')]\n```\n\n### Cursors\n\nFor finer-grained control over iteration, or to do prefix-matching, Sophy\nprovides a cursor interface.\n\nThe `cursor()` method accepts 5 parameters:\n\n* `order` (default=`\u003e=`) -- semantics for matching the start key and ordering\n  results.\n* `key` -- the start key\n* `prefix` -- search for prefix matches\n* `keys` (default=`True`) -- return keys while iterating\n* `values` (default=`True`) -- return values while iterating\n\nSuppose we were storing events in a database and were using an\nISO-8601-formatted date-time as the key. Since ISO-8601 timestamps sort\nlexicographically, we could retrieve events in chronological order simply by\niterating. 
To retrieve a particular slice of time, a prefix could be specified:\n\n```python\n\n# Iterate over events for July, 2017:\nfor timestamp, event_data in db.cursor(key='2017-07-01T00:00:00',\n                                       prefix='2017-07-'):\n    do_something()\n```\n\n### Transactions\n\nSophia supports ACID transactions. Even better, a single transaction can cover\noperations on multiple databases in a given environment.\n\nExample usage:\n\n```python\n\naccount_balance = env.add_database('balance', ...)\ntransaction_log = env.add_database('transaction_log', ...)\n\n# ...\n\ndef transfer_funds(from_acct, to_acct, amount):\n    with env.transaction() as txn:\n        # To write to a database within a transaction, obtain a reference to\n        # a wrapper object for the db:\n        txn_acct_bal = txn[account_balance]\n        txn_log = txn[transaction_log]\n\n        # Transfer the asset by updating the respective balances. Note that we\n        # are operating on the wrapper database, not the db instance.\n        from_bal = txn_acct_bal[from_acct]\n        to_bal = txn_acct_bal[to_acct]\n        txn_acct_bal[from_acct] = from_bal - amount\n        txn_acct_bal[to_acct] = to_bal + amount\n\n        # Log the transaction in the transaction_log database. Again, we use\n        # the wrapper for the database:\n        txn_log[from_acct, to_acct, get_timestamp()] = amount\n```\n\nMultiple transactions are allowed to be open at the same time, but if there are\nconflicting changes, an exception will be raised when attempting to commit the\noffending transaction:\n\n```python\n\n# Create a basic k/v store. 
Schema.key_value() is a convenience/factory-method.\nkv = env.add_database('main', Schema.key_value())\n\n# ...\n\n# Instead of using the context manager, we'll call begin() explicitly so we\n# can show the interaction of 2 open transactions.\ntxn = env.transaction().begin()\n\nt_kv = txn[kv]\nt_kv['k1'] = 'v1'\n\ntxn2 = env.transaction().begin()\nt2_kv = txn2[kv]\n\nt2_kv['k1'] = 'v1-x'\n\ntxn2.commit()  # ERROR !!\n# SophiaError('txn is not finished, waiting for concurrent txn to finish.')\n\ntxn.commit()  # OK\n\n# Try again?\ntxn2.commit()  # ERROR !!\n# SophiaError('transaction rolled back by another concurrent transaction.')\n```\n\n## Index types, multi-field keys and values\n\nSophia supports multi-field keys and values. Additionally, the individual\nfields can have different data-types. Sophy provides the following field\ntypes:\n\n* `StringIndex` - stores UTF8-encoded strings, e.g. text.\n* `BytesIndex` - stores bytestrings, e.g. binary data.\n* `JsonIndex` - stores arbitrary objects as UTF8-encoded JSON data.\n* `MsgPackIndex` - stores arbitrary objects using `msgpack` serialization.\n* `PickleIndex` - stores arbitrary objects using the Python `pickle` library.\n* `UUIDIndex` - stores UUIDs.\n* `U64Index` and its reverse-ordered counterpart, `U64RevIndex`\n* `U32Index` and its reverse-ordered counterpart, `U32RevIndex`\n* `U16Index` and its reverse-ordered counterpart, `U16RevIndex`\n* `U8Index` and its reverse-ordered counterpart, `U8RevIndex`\n* `SerializedIndex` - essentially a `BytesIndex` that accepts two\n  functions: one for serializing the value to the db, and another for\n  deserializing.\n\nTo store arbitrary data encoded using msgpack, you could use `MsgPackIndex`:\n\n```python\n\nschema = Schema(StringIndex('key'), MsgPackIndex('value'))\ndb = sophia_env.add_database('main', schema)\n```\n\nTo declare a database with a multi-field key or value, you will pass the\nindividual fields as arguments when constructing the `Schema` object. 
To\ninitialize a schema where the key is composed of two strings and a 64-bit\nunsigned integer, and the value is composed of a string, you would write:\n\n```python\n\nkey = [StringIndex('last_name'), StringIndex('first_name'), U64Index('area_code')]\nvalue = [StringIndex('address_data')]\nschema = Schema(key_parts=key, value_parts=value)\n\naddress_book = sophia_env.add_database('address_book', schema)\n```\n\nTo store data, we use the same dictionary methods as usual, just passing tuples\ninstead of individual values:\n\n```python\nsophia_env.open()\n\naddress_book['kitty', 'huey', 66604] = '123 Meow St'\naddress_book['puppy', 'mickey', 66604] = '1337 Woof-woof Court'\n```\n\nTo retrieve our data:\n\n```python\nhuey_address = address_book['kitty', 'huey', 66604]\n```\n\nTo delete a row:\n\n```python\ndel address_book['puppy', 'mickey', 66604]\n```\n\nIndexing and slicing work as you would expect.\n\n**Note:** when working with a multi-part value, a tuple containing the value\ncomponents will be returned. When working with a scalar value, instead of\nreturning a 1-item tuple, the value itself is returned.\n\n## Configuring and Administering Sophia\n\nSophia can be configured using special properties on the `Sophia` and\n`Database` objects. Refer to the [configuration\ndocument](http://sophia.systems/v2.2/conf/sophia.html) for details on the\navailable options, including whether they are read-only, and the expected\ndata-type.\n\nFor example, to query Sophia's status, you can use the `status` property, which\nis a read-only setting returning a string:\n\n```python\nprint(env.status)\n# online\n```\n\nOther properties can be changed by assigning a new value to the property. For\nexample, to read and then increase the number of threads used by the scheduler:\n\n```python\nnthreads = env.scheduler_threads\nenv.scheduler_threads = nthreads + 2\n```\n\nDatabase-specific properties are available as well. 
For example, to get the\nnumber of GET and SET operations performed on a database, you would write:\n\n```python\nprint(db.stat_get, 'get operations')\nprint(db.stat_set, 'set operations')\n```\n\nRefer to the [documentation](http://sophia.systems/v2.2/conf/sophia.html) for\ncomplete lists of settings. Dotted paths are translated into\nunderscore-separated attributes; for example, `scheduler.threads` becomes `scheduler_threads`.\n","funding_links":[],"categories":["C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoleifer%2Fsophy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoleifer%2Fsophy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoleifer%2Fsophy/lists"}