https://github.com/bluedynamics/souper

Generic Indexed Storage based on the Zope Object Database (ZODB) and repoze.catalog

Host: GitHub
URL: https://github.com/bluedynamics/souper
Owner: bluedynamics
License: other
Created: 2012-06-29T18:24:35.000Z (almost 14 years ago)
Default Branch: master
Last Pushed: 2022-12-05T11:56:51.000Z (over 3 years ago)
Last Synced: 2025-09-24T20:51:53.025Z (9 months ago)
Language: Python
Homepage: https://pypi.org/project/souper/
Size: 64.5 KB
Stars: 5
Watchers: 6
Forks: 4
Open Issues: 2
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- License: LICENSE.rst

.. image:: https://travis-ci.org/bluedynamics/souper.svg?branch=master
    :target: https://travis-ci.org/bluedynamics/souper

ZODB Storage for lots of (light weight) data.

Utilizes:

- ZODB and its BTrees,

- node (and node.ext.zodb),

- repoze.catalog.

.. image:: https://raw.githubusercontent.com/bluedynamics/souper/master/docs/Souper-64.png

Souper is a tool for programmers. It offers an integrated storage tied together with indexes in a catalog.

The records in the storage are generic.

Any data can be stored on a record, as long as it can be pickled and persisted in the ZODB.

Souper can be used in any Python application, either standalone using the pure ZODB or with Pyramid, Zope or Plone.

Using Souper
============

Providing a Locator
-------------------

Soups are looked up by adapting ``souper.interfaces.IStorageLocator`` to some context.

Souper does not provide any default locator.

So one needs to be provided first. Let's assume ``context`` is some persistent dict-like instance:

.. code-block:: pycon

    >>> from zope.interface import implementer
    >>> from zope.interface import Interface
    >>> from zope.component import provideAdapter
    >>> from souper.interfaces import IStorageLocator
    >>> from souper.soup import SoupData

    >>> @implementer(IStorageLocator)
    ... class StorageLocator(object):
    ...
    ...     def __init__(self, context):
    ...         self.context = context
    ...
    ...     def storage(self, soup_name):
    ...         # create the soup data on first access
    ...         if soup_name not in self.context:
    ...             self.context[soup_name] = SoupData()
    ...         return self.context[soup_name]

    >>> provideAdapter(StorageLocator, adapts=[Interface])

Now we have a locator that creates soups by name on the fly. With it, getting a soup by name is easy:

.. code-block:: pycon

    >>> from souper.soup import get_soup
    >>> soup = get_soup('mysoup', context)
    >>> soup
    <...>

Providing a Catalog Factory
---------------------------

Depending on your needs, the catalog and its indexes may look different from use case to use case.

The catalog factory is responsible for creating a catalog for a soup. The factory is a named utility implementing ``souper.interfaces.ICatalogFactory``.

The name of the utility has to be the same as the name of the soup.

Here ``repoze.catalog`` is used. To let the indexes access the data on the records by key, the ``NodeAttributeIndexer`` is used.

For special cases one may write custom indexers, but the default one is fine most of the time:

.. code-block:: pycon

    >>> from zope.interface import implementer
    >>> from zope.component import provideUtility
    >>> from souper.interfaces import ICatalogFactory
    >>> from souper.soup import NodeAttributeIndexer
    >>> from souper.soup import NodeTextIndexer
    >>> from repoze.catalog.catalog import Catalog
    >>> from repoze.catalog.indexes.field import CatalogFieldIndex
    >>> from repoze.catalog.indexes.text import CatalogTextIndex
    >>> from repoze.catalog.indexes.keyword import CatalogKeywordIndex

    >>> @implementer(ICatalogFactory)
    ... class MySoupCatalogFactory(object):
    ...
    ...     def __call__(self, context=None):
    ...         catalog = Catalog()
    ...         # field index on the 'user' attribute
    ...         userindexer = NodeAttributeIndexer('user')
    ...         catalog[u'user'] = CatalogFieldIndex(userindexer)
    ...         # text index built from the 'text' and 'user' attributes
    ...         textindexer = NodeTextIndexer(['text', 'user'])
    ...         catalog[u'text'] = CatalogTextIndex(textindexer)
    ...         # keyword index on the 'keywords' attribute
    ...         keywordindexer = NodeAttributeIndexer('keywords')
    ...         catalog[u'keywords'] = CatalogKeywordIndex(keywordindexer)
    ...         return catalog

    >>> provideUtility(MySoupCatalogFactory(), name="mysoup")

The catalog factory is only used internally by the soup, but one may want to check that it works:

.. code-block:: pycon

    >>> from zope.component import getUtility
    >>> catalogfactory = getUtility(ICatalogFactory, name='mysoup')
    >>> catalogfactory
    <...>

    >>> catalog = catalogfactory()
    >>> sorted(catalog.items())
    [(u'keywords', <...>),
    (u'text', <...>),
    (u'user', <...>)]

Adding records
--------------

As mentioned above the ``souper.soup.Record`` is the one and only kind of data added to the soup.

A record has attributes containing the data:

.. code-block:: pycon

    >>> from souper.soup import get_soup
    >>> from souper.soup import Record
    >>> soup = get_soup('mysoup', context)
    >>> record = Record()
    >>> record.attrs['user'] = 'user1'
    >>> record.attrs['text'] = u'foo bar baz'
    >>> record.attrs['keywords'] = [u'1', u'2', u'ü']
    >>> record_id = soup.add(record)

A record may contain other records. But to index them, one would need a custom indexer.

So, usually contained records are valuable for later display, not for searching:

.. code-block:: pycon

    >>> record['homeaddress'] = Record()
    >>> record['homeaddress'].attrs['zip'] = '6020'
    >>> record['homeaddress'].attrs['town'] = 'Innsbruck'
    >>> record['homeaddress'].attrs['country'] = 'Austria'
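
Indexing data from a contained record could be done with a small custom indexer. The following is only a sketch under stated assumptions: ``SubrecordAttributeIndexer`` and ``FakeRecord`` are hypothetical names that are not part of souper, and the indexer is assumed to be a callable taking the record and a default value, which is the convention ``repoze.catalog`` uses for callable discriminators:

.. code-block:: python

    class SubrecordAttributeIndexer(object):
        """Hypothetical indexer reading one attribute of a contained record."""

        def __init__(self, child, attr):
            self.child = child  # key of the contained record, e.g. 'homeaddress'
            self.attr = attr    # attribute name on the contained record

        def __call__(self, context, default=None):
            # Return ``default`` if the subrecord or the attribute is missing.
            try:
                return context[self.child].attrs[self.attr]
            except (KeyError, AttributeError):
                return default


    class FakeRecord(dict):
        """Stand-in for a record: dict-like children plus an ``attrs`` dict."""

        def __init__(self):
            super(FakeRecord, self).__init__()
            self.attrs = {}


    record = FakeRecord()
    record['homeaddress'] = FakeRecord()
    record['homeaddress'].attrs['zip'] = '6020'

    zip_indexer = SubrecordAttributeIndexer('homeaddress', 'zip')
    print(zip_indexer(record))
    print(zip_indexer(FakeRecord(), default='n/a'))

In a catalog factory, such an indexer would presumably be handed to an index in the same way as ``NodeAttributeIndexer``, for example as the argument of ``CatalogFieldIndex``.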

Access data
-----------

Even without any query a record can be fetched by id:

.. code-block:: pycon

    >>> from souper.soup import get_soup
    >>> soup = get_soup('mysoup', context)
    >>> record = soup.get(record_id)

All records can be accessed through the container BTree:

.. code-block:: pycon

    >>> soup.data.keys()[0] == record_id
    True

Query data
----------

How to query a repoze catalog is documented well.

Sorting works the same way too.

Queries are passed to the soup's ``query`` method, which in turn uses the repoze catalog.

It returns a generator:

.. code-block:: pycon

    >>> from repoze.catalog.query import Eq
    >>> [r for r in soup.query(Eq('user', 'user1'))]
    [<...>]

    >>> [r for r in soup.query(Eq('user', 'nonexists'))]
    []

To also get the size of the result set, pass ``with_size=True`` to the query.

The first item returned by the generator is the size:

.. code-block:: pycon

    >>> [r for r in soup.query(Eq('user', 'user1'), with_size=True)]
    [1, <...>]
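
Because the size arrives as the first item of the generator, it is usually convenient to split it off before iterating over the records. A minimal sketch of that consumption pattern follows; ``split_size`` and ``fake_query`` are hypothetical helpers, with ``fake_query`` standing in for a call such as ``soup.query(..., with_size=True)``:

.. code-block:: python

    def split_size(results):
        """Return (size, iterator over the remaining items)."""
        iterator = iter(results)
        size = next(iterator)  # the first item is the size of the result set
        return size, iterator


    def fake_query():
        # Hypothetical stand-in for ``soup.query(..., with_size=True)``.
        yield 2
        yield 'record-a'
        yield 'record-b'


    size, records = split_size(fake_query())
    print(size)
    print(list(records))

The remaining iterator stays lazy, so the size can be used, for example for batching, before any record is consumed.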

To optimize the handling of large result sets, one may choose not to fetch the records themselves but to get a generator returning lightweight objects instead. The records are fetched on call:

.. code-block:: pycon

    >>> lazy = [l for l in soup.lazy(Eq('user', 'user1'))]
    >>> lazy
    [<...>]

    >>> lazy[0]()
    <...>

Here, too, the size is passed as the first value of the generator if ``with_size=True`` is passed.
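
The idea behind fetching on call can be illustrated without souper. The sketch below is not souper's implementation: ``LazyRef``, ``storage`` and ``fetch`` are hypothetical names that only demonstrate the pattern of a lightweight handle that loads its record when it is called:

.. code-block:: python

    class LazyRef(object):
        """Hypothetical lightweight handle holding only a key and a loader."""

        def __init__(self, key, fetch):
            self.key = key
            self._fetch = fetch

        def __call__(self):
            # The record is loaded only at this point.
            return self._fetch(self.key)


    storage = {1: {'user': 'user1'}, 2: {'user': 'user2'}}
    fetched = []  # keeps track of which keys were actually loaded


    def fetch(key):
        fetched.append(key)
        return storage[key]


    refs = [LazyRef(key, fetch) for key in sorted(storage)]
    print(fetched)      # nothing has been loaded so far
    record = refs[0]()  # loading happens on call
    print(record)
    print(fetched)

Only the handles that are actually called cause a lookup, which is what keeps large result sets cheap to build.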

Delete a record
---------------

To remove a record from the soup, Python's ``del`` is used as one would do on any dict:

.. code-block:: pycon

    >>> del soup[record]

Reindex
-------

After a record's data has changed, the record needs to be reindexed:

.. code-block:: pycon

    >>> record.attrs['user'] = 'user1'
    >>> soup.reindex(records=[record])

Sometimes one may want to reindex all data. Then ``reindex`` has to be called without parameters.

It may take a while:

.. code-block:: pycon

    >>> soup.reindex()

Rebuild catalog
---------------

Usually after a change to the catalog factory, e.g. when an index was added, a rebuild of the catalog is needed.

It replaces the current catalog with a new one created by the catalog factory and reindexes all data.

It may take a while:

.. code-block:: pycon

    >>> soup.rebuild()

Reset (or clear) the soup
-------------------------

To remove all data from the soup and to empty and rebuild the catalog, call ``clear``.

**Attention**: *All data is lost!*

.. code-block:: pycon

    >>> soup.clear()

Source Code
===========

The sources are in a Git DVCS with its main branches at GitHub.

We'd be happy to see many forks and pull-requests to make souper even better.

Contributors
============

- Robert Niederreiter 

- Jens W. Klein
