https://github.com/coleifer/kvkit
dank key/value store high-level APIs
https://github.com/coleifer/kvkit
Last synced: 9 months ago
JSON representation
dank key/value store high-level APIs
- Host: GitHub
- URL: https://github.com/coleifer/kvkit
- Owner: coleifer
- Created: 2017-07-31T15:28:37.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-09T16:27:01.000Z (almost 8 years ago)
- Last Synced: 2025-03-30T16:44:40.069Z (10 months ago)
- Language: Python
- Size: 35.2 KB
- Stars: 18
- Watchers: 4
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# kvkit
High-level Python toolkit for ordered key/value stores.
Supports:
* [BerkeleyDB](http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/downloads/index.html) via [bsddb3](https://www.jcea.es/programacion/pybsddb_doc/).
* [KyotoCabinet](http://fallabs.com/kyotocabinet/) via [Python 2.x bindings](http://fallabs.com/kyotocabinet/pythonlegacydoc/).
* [LevelDB](http://leveldb.org/) via [plyvel](https://plyvel.readthedocs.io/en/latest/).
* [RocksDB](http://rocksdb.org/) via [pyrocksdb](https://pyrocksdb.readthedocs.io/en/v0.4/)
* [Sqlite4 LSM DB](https://www.sqlite.org/src4/doc/trunk/www/lsmusr.wiki) via [python-lsm-db](https://lsm-db.readthedocs.io/en/latest/)
Right now KyotoCabinet is the most well-supported database, but the SQLite4 LSM is also pretty robust. The other databases implement the minimal slicing APIs to enable the Model/Secondary Indexing APIs to work.
This project should be considered **experimental**.
### Features
* Store structured data models.
* Secondary indexes and arbitrarily complex querying.
* Graph database (Hexastore) with search pipeline.
* High-level slicing APIs.
### Models
`kvkit` provides a lightweight structured data model API. Individual fields on the model can be optionally typed, and also support secondary indexes.
Field types:
* `Field()`: simplest field type, treated as raw bytes.
* `DateTimeField()`: store Python `datetime` objects.
* `DateField()`: store Python `date` objects.
* `LongField()`: store Python `int` and `long`. Values are encoded as an 8 byte `long long`, big-endian.
* `FloatField()`: store Python `float`. Values are encoded as an 8 byte double-precision float, big-endian.
A `Model` is composed of one or more fields, in addition to a required `id` field which stores an automatically-generated integer ID.
`Model` classes are defined declaratively, a-la many popular Python ORMs:
```python
# KyotoCabinet on-disk B-tree.
db = TreeDB('address_book.kct')
# Create a base model-class pointing at our db.
class BaseModel(Model):
class Meta:
database = db
class Contact(BaseModel):
last_name = Field(index=True)
first_name = Field(index=True)
class PhoneNumber(BaseModel):
contact_id = LongField(index=True)
phone_number = Field()
def get_contact(self):
return Contact.load(self.contact_id)
class Address(BaseModel):
contact_id = LongField(index=True)
street = Field()
city = Field(index=True)
state = Field(index=True)
postal_code = Field()
def get_contact(self):
return Contact.load(self.contact_id)
```
To create a new contact and add a phone number for them, we might write:
```python
huey = Contact.create(
last_name='Leifer',
first_name='Huey',
dob=datetime.date(2011, 5, 1))
phone = PhoneNumber.create(
contact_id=huey.id,
phone_number='555-1234')
```
Let's say we need to look up Huey's phone number(s). We might write:
```python
huey = Contact.get(Contact.first_name == 'Huey')
phones = PhoneNumber.query(PhoneNumber.contact_id == huey.id)
for phone in phones:
print phone.phone_number
```
If there were more than one person named "Huey" in our database, we could be more specific by specifying additional query clauses:
```python
huey_leifer = Contact.get(
(Contact.first_name == 'Huey') &
(Contact.last_name == 'Leifer'))
```
To query all contacts whose last name begins with "Le" we could write:
```python
Contact.query(Contact.last_name.startswith('Le'))
```
If we wanted to express a range, such as "Le" -> "Mo", we could write:
```python
Contact.query(
(Contact.last_name >= 'Le') &
(Contact.last_name <= 'Mo'))
```
Fields can be queried using the following operations:
* `==` for equality
* `<` and `<=`
* `>` and `>=`
* `!=` for inequality
* `.startswith()` for prefix search
Multiple clauses can be combined using set operations:
* `&` for AND (intersection)
* `|` for OR (union)
### Graph database (Hexastore)
The graph database is based on an idea described in the Redis [secondary indexing documentation](http://redis.io/topics/indexes#representing-and-querying-graphs-using-an-hexastore). The idea is that the database will store triples of `subject`, `predicate` and `object`. These can be any application-specific values. For example, I might want to store my friends and some information about them:
```python
db = CacheTreeDB() # KyotoCabinet in-memory B-tree
graph = Hexastore(db)
data = (
('charlie', 'friends', 'huey'),
('charlie', 'friends', 'mickey'),
('charlie', 'friends', 'zaizee'),
('huey', 'friends', 'charlie'),
('huey', 'friends', 'zaizee'),
('zaizee', 'friends', 'huey'),
('charlie', 'lives', 'KS'),
('huey', 'lives', 'KS'),
('mickey', 'lives', 'KS'),
('zaizee', 'lives', 'MO'),
)
graph.store_many(data)
```
To do a simple query asking who my friends are, I can write:
```python
for result in graph.query(s='charlie', p='friends'):
print result['o']
# prints huey, mickey, zaizee
```
I can also ask for other things, like all the people who live in Kansas:
```python
for result in graph.query(p='lives', o='KS'):
print result['s']
# prints charlie, huey, mickey
```
Things get especially interesting when you construct a pipeline using variables. Let's get all of my friends who live in Kansas:
```python
X = graph.v.X # Create a variable reference.
results = graph.search(
('charlie', 'friends', X),
(X, 'lives', 'KS'))
print results['X']
# prints set(['huey', 'mickey'])
```
In this query, we will use two variables, and answer the question "Who has friends who live in Missouri?"
```python
X = graph.v.X
Y = graph.v.Y
results = graph.search(
(X, 'lives', 'MO'),
(Y, 'friends', X))
print results['Y']
# prints set(['charlie', 'huey'])
# charlie and huey are friends with zaizee, who lives in MO.
```
### Unified Slicing API
`kvkit` provides unified indexing and slicing APIs. Slices obey the following rules:
* Inclusive of both endpoints.
* If the start key does not exist, the next-highest key will be used, if one exists.
* If the end key does not exist, the next-lowest key will be used, if one exists.
* Supports efficient iteration forwards or backwards.
```pycon
>>> from kvkit import CacheTreeDB # KyotoCabinet in-memory B-tree
>>> db = CacheTreeDB()
>>> # Populate some data.
>>> for key in ['aa', 'aa1', 'aa2', 'bb', 'cc', 'dd', 'ee']:
... db[key] = key
...
>>> list(db['aa':'cc'])
[('aa', 'aa'), ('aa1', 'aa1'), ('aa2', 'aa2'), ('bb', 'bb'), ('cc', 'cc')]
>>> list(db['aa0':'cc2']) # Example where start & end do not exist.
[('aa1', 'aa1'), ('aa2', 'aa2'), ('bb', 'bb'), ('cc', 'cc')]
```
In addition to slicing, all databases implement the following dictionary-like methods:
* `update()`
* `keys()`
* `values()`
* `items()`
* `__setitem__` and `__delitem__`
* `__iter__`
All databases also implement:
* `incr()`
* `decr()`
* `open()`
* `close()`
### Installation
`kvkit` can be installed from PyPI:
```console
$ pip install kvkit
```