Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bretthoerner/timak
Timelines (activity streams) backed by Riak
- Host: GitHub
- URL: https://github.com/bretthoerner/timak
- Owner: bretthoerner
- License: other
- Created: 2011-08-08T15:36:14.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2011-09-14T20:37:44.000Z (about 13 years ago)
- Last Synced: 2024-10-13T21:43:54.767Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 116 KB
- Stars: 55
- Watchers: 5
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE
README
=====
timak
=====

timak is a Python library for storing timelines (activity streams) in Riak. It is very alpha and rough around the edges.
It is loosely based on my understanding of Yammer's `Streamie`_.
Example
-------

Timelines are unique sets of objects (unique by the ID you provide), ordered by a datetime (that you also provide). They are bounded, so items fall off the end when a (user-defined) capacity is reached.
>>> from datetime import datetime
>>> import riak
>>> from timak.timelines import Timeline
>>> conn = riak.RiakClient()
>>> tl = Timeline(connection=conn, max_items=3)
>>> # tl.add("key", "unique_id", "score")
>>> tl.add("brett:tweets", 1, datetime(2011, 1, 1))
[1]
>>> tl.add("brett:tweets", 2, datetime(2011, 1, 2))
[2, 1]
>>> tl.add("brett:tweets", 3, datetime(2011, 1, 3))
[3, 2, 1]
>>> tl.add("brett:tweets", 4, datetime(2011, 1, 4))
[4, 3, 2]
>>> tl.delete("brett:tweets", 2, datetime(2011, 1, 2))
[4, 3]

If you provide a ``datetime.datetime`` value as the score, Timak will automatically convert it to a sortable score value.
As you can see, the default order is descending by the date you provide, and object IDs are returned by default. You can also provide an ``obj_data`` argument (it must be JSON-serializable), which will be returned instead.
>>> tl.add("brett:tweets", 5, datetime(2011, 1, 5), obj_data={'body': 'Hello world, this is my first tweet'})
[{'body': 'Hello world, this is my first tweet'}, 4, 3]

Why?
----

I needed *highly available*, *linearly scalable* timelines where readers and writers *don't block* one another. Because Riak is a Dynamo-based system, multiple writers can update a single value and I can merge the conflicts on a later read. I can also add a machine to the cluster for more throughput, and since it's simply fetching denormalized timelines by key it should be incredibly performant.
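To make that concrete, here is a minimal sketch of sibling resolution. It assumes each conflicting Riak value is a newest-first list of ``(score, obj_id)`` pairs and that a merge keeps the newest score per ID before re-truncating; this illustrates the idea, not timak's actual merge code::

    def merge_siblings(siblings, max_items=3):
        # Keep the newest score seen for each object ID across all
        # conflicting writes.
        best = {}
        for timeline in siblings:
            for score, obj_id in timeline:
                if obj_id not in best or score > best[obj_id]:
                    best[obj_id] = score
        # Re-sort newest-first and enforce the capacity bound.
        merged = sorted(((s, o) for o, s in best.items()), reverse=True)
        return merged[:max_items]

    a = [("2011-01-03", 3), ("2011-01-02", 2)]  # one writer's view
    b = [("2011-01-04", 4), ("2011-01-02", 2)]  # a concurrent writer's view
    merge_siblings([a, b])
    # [('2011-01-04', 4), ('2011-01-03', 3), ('2011-01-02', 2)]

A real merge also has to remember deletions (tombstones) so that an item removed by one writer isn't resurrected by a sibling that still contains it.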
So what? I could write this in...
---------------------------------

PostgreSQL or MySQL
```````````````````

This would be a very simple table in an RDBMS. It could even be boundless (though without some PL/SQL hackery, large ``OFFSET`` values are very expensive). You'd be hitting large indexes instead of fetching values directly by key. The biggest problem is that it all has to fit on a single system, unless you manually shard the data (and re-shard if you ever outgrow that size). Plus you'd have to deal with availability using read slaves and failover.
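For comparison, a minimal sketch of that naive table using the stdlib ``sqlite3`` module (the schema and names are illustrative, not anything timak uses)::

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE timeline (
            key    TEXT NOT NULL,
            obj_id INTEGER NOT NULL,
            score  TEXT NOT NULL,  -- ISO-8601 datetimes sort correctly as text
            PRIMARY KEY (key, obj_id)
        )
    """)
    conn.execute("CREATE INDEX timeline_key_score ON timeline (key, score DESC)")
    conn.execute("INSERT INTO timeline VALUES (?, ?, ?)",
                 ("brett:tweets", 1, "2011-01-01T00:00:00"))

    # Deep pagination is where this hurts: the database walks the index
    # past every skipped row before it reaches the page you asked for.
    page, per_page = 100, 20
    rows = conn.execute(
        "SELECT obj_id FROM timeline WHERE key = ?"
        " ORDER BY score DESC LIMIT ? OFFSET ?",
        ("brett:tweets", per_page, page * per_page)).fetchall()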
MongoDB
```````

The only possible difference I see from the RDBMSs above is that you could use Mongo's "auto-sharding." If that's your thing, and you trust it, then I wish you the best of luck. You may want to `read this`_.
Redis
`````

You can fake timelines in Redis using a list or a sorted set. As with an RDBMS, you have to handle all of the sharding yourself, re-shard on growth, and use slaves and failover for availability. On top of that, and even more critical for my use case: all of your timelines would have to fit in RAM. If you have this problem and that kind of money, please send me some.
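For reference, the sorted-set version is only a few lines, assuming the ``redis-py`` client (the key name and bound are illustrative)::

    from datetime import datetime, timezone

    import redis

    r = redis.Redis()
    MAX_ITEMS = 3

    def add(key, obj_id, when):
        # Score by timestamp, then drop everything below the newest
        # MAX_ITEMS; ranks run lowest-score-first, so the negative index
        # keeps the top of the set.
        r.zadd(key, {obj_id: when.timestamp()})
        r.zremrangebyrank(key, 0, -(MAX_ITEMS + 1))

    add("brett:tweets", "1", datetime(2011, 1, 1, tzinfo=timezone.utc))
    r.zrevrange("brett:tweets", 0, -1)  # newest first

Note the two commands aren't atomic as written; in practice you'd wrap them in a ``MULTI``/``EXEC`` pipeline.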
Cassandra
`````````

Probably another great fit. You could even store much longer timelines, though I'm not sure what the equivalent of a ``SELECT`` with ``OFFSET`` costs on the columns in a Cassandra row.
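With today's CQL and the DataStax ``cassandra-driver`` that layout might look like the sketch below (keyspace, table, and names are illustrative)::

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS timelines WITH replication"
                    " = {'class': 'SimpleStrategy', 'replication_factor': 1}")
    session.execute("""
        CREATE TABLE IF NOT EXISTS timelines.timeline (
            key    text,
            score  timestamp,
            obj_id int,
            PRIMARY KEY (key, score)
        ) WITH CLUSTERING ORDER BY (score DESC)
    """)

    # Reading the head of a timeline is a cheap clustered slice...
    rows = session.execute(
        "SELECT obj_id FROM timelines.timeline WHERE key = %s LIMIT 20",
        ("brett:tweets",))
    # ...but there is no OFFSET: deep pages mean paging from the last
    # score you saw, i.e. cursors rather than random access.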
TODO
----

1. Add a better API with cursors (last seen ``obj_date``?) for pagination; a rough sketch follows this list.
2. Built-in Django support for updates on ``post_save`` and ``post_delete``.
3. Compress values.
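A hypothetical shape for item 1, where the cursor is simply the last score the client saw (``get_page`` and its arguments are invented for illustration and are not timak API)::

    def get_page(items, per_page=20, before=None):
        # `items` is a newest-first list of (score, obj_id) pairs, as
        # stored in the timeline value.
        if before is not None:
            items = [pair for pair in items if pair[0] < before]
        page = items[:per_page]
        next_cursor = page[-1][0] if len(page) == per_page else None
        return page, next_cursor

    timeline = [("2011-01-04", 4), ("2011-01-03", 3), ("2011-01-02", 2)]
    first, cursor = get_page(timeline, per_page=2)
    # first == [('2011-01-04', 4), ('2011-01-03', 3)], cursor == '2011-01-03'
    rest, _ = get_page(timeline, per_page=2, before=cursor)
    # rest == [('2011-01-02', 2)]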