Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ranaroussi/pystore
Fast data store for Pandas time-series data
dask database dataframe datastore pandas parquet timeseries
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/ranaroussi/pystore
- Owner: ranaroussi
- License: apache-2.0
- Created: 2018-05-26T20:38:44.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2024-07-10T17:44:13.000Z (6 months ago)
- Last Synced: 2024-10-01T11:40:24.765Z (3 months ago)
- Topics: dask, database, dataframe, datastore, pandas, parquet, timeseries
- Language: Python
- Homepage:
- Size: 155 KB
- Stars: 558
- Watchers: 37
- Forks: 100
- Open Issues: 30
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.rst
- Funding: .github/FUNDING.yml
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-systematic-trading - PyStore - Fast data store for Pandas time-series data (Databases / TimeSeries Analysis)
README
PyStore - Fast data store for Pandas timeseries data
====================================================

.. image:: https://img.shields.io/badge/python-2.7,%203.5+-blue.svg?style=flat
    :target: https://pypi.python.org/pypi/pystore
    :alt: Python version

.. image:: https://img.shields.io/pypi/v/pystore.svg?maxAge=60
    :target: https://pypi.python.org/pypi/pystore
    :alt: PyPi version

.. image:: https://img.shields.io/pypi/status/pystore.svg?maxAge=60
    :target: https://pypi.python.org/pypi/pystore
    :alt: PyPi status

.. image:: https://img.shields.io/travis/ranaroussi/pystore/master.svg?maxAge=1
    :target: https://travis-ci.com/ranaroussi/pystore
    :alt: Travis-CI build status

.. image:: https://www.codefactor.io/repository/github/ranaroussi/pystore/badge
    :target: https://www.codefactor.io/repository/github/ranaroussi/pystore
    :alt: CodeFactor

.. image:: https://img.shields.io/github/stars/ranaroussi/pystore.svg?style=social&label=Star&maxAge=60
    :target: https://github.com/ranaroussi/pystore
    :alt: Star this repo

.. image:: https://img.shields.io/twitter/follow/aroussi.svg?style=social&label=Follow&maxAge=60
    :target: https://twitter.com/aroussi
    :alt: Follow me on twitter
`PyStore `_ is a simple (yet powerful)
datastore for Pandas dataframes. While it can store any Pandas object,
**it was designed with storing timeseries data in mind**.

It's built on top of `Pandas `_, `Numpy `_,
`Dask `_, and `Parquet `_
(via `pyarrow `_)
to provide an easy-to-use datastore for Python developers, one that can
query millions of rows per second per client.

==> Check out `this Blog post `_
for the reasoning and philosophy behind PyStore, as well as a detailed tutorial with code examples.

==> Follow `this PyStore tutorial `_ in Jupyter notebook format.
Quickstart
==========

Install PyStore
---------------

Install using `pip`:

.. code:: bash

    $ pip install pystore --upgrade --no-cache-dir

Install using `conda`:

.. code:: bash

    $ conda install -c ranaroussi pystore
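Once installed, PyStore reads its storage location from the ``PYSTORE_PATH`` environment variable, defaulting to ``~/pystore`` (see the usage example below). A stdlib-only sketch of that lookup order, assuming this resolution logic; the function name here is illustrative and not part of PyStore's API:

```python
import os
from pathlib import Path

def resolve_storage_path() -> Path:
    """Mirror PyStore's documented default: the PYSTORE_PATH
    environment variable if set, otherwise ~/pystore."""
    env = os.environ.get("PYSTORE_PATH")
    return Path(env).expanduser() if env else Path.home() / "pystore"

# With the variable set, it wins:
os.environ["PYSTORE_PATH"] = "/tmp/mystore"
print(resolve_storage_path())  # /tmp/mystore

# Without it, fall back to the home-directory default:
del os.environ["PYSTORE_PATH"]
print(resolve_storage_path())
```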
**INSTALLATION NOTE:**
If you don't have Snappy installed (compression/decompression library),
you'll need to `install it first `_.

Using PyStore
-------------

.. code:: python
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import pystore
    import quandl

    # Set storage path (optional)
    # Defaults to `~/pystore` or the `PYSTORE_PATH` environment variable (if set)
    pystore.set_path("~/pystore")

    # List stores
    pystore.list_stores()

    # Connect to datastore (create it if it doesn't exist)
    store = pystore.store('mydatastore')

    # List existing collections
    store.list_collections()

    # Access a collection (create it if it doesn't exist)
    collection = store.collection('NASDAQ')

    # List items in collection
    collection.list_items()

    # Load some data from Quandl
    aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

    # Store the first 100 rows of the data in the collection under "AAPL"
    collection.write('AAPL', aapl[:100], metadata={'source': 'Quandl'})

    # Read the item's data
    item = collection.item('AAPL')
    data = item.data  # <-- Dask dataframe (see dask.pydata.org)
    metadata = item.metadata
    df = item.to_pandas()

    # Append the rest of the rows to the "AAPL" item
    collection.append('AAPL', aapl[100:])

    # Read the item's data again
    item = collection.item('AAPL')
    data = item.data
    metadata = item.metadata
    df = item.to_pandas()

    # --- Query functionality ---

    # Query available symbols based on metadata
    collection.list_items(some_key='some_value', other_key='other_value')

    # --- Snapshot functionality ---

    # Snapshot a collection
    # (Point-in-time named reference for all current symbols in a collection)
    collection.create_snapshot('snapshot_name')

    # List available snapshots
    collection.list_snapshots()

    # Get a version of a symbol given a snapshot name
    collection.item('AAPL', snapshot='snapshot_name')

    # Delete a collection snapshot
    collection.delete_snapshot('snapshot_name')

    # ...

    # Delete the item from the current version
    collection.delete_item('AAPL')

    # Delete the collection
    store.delete_collection('NASDAQ')

Using Dask schedulers
---------------------

PyStore 0.1.18+ supports using Dask distributed.
To use a local Dask scheduler, add this to your code:
.. code:: python

    from dask.distributed import LocalCluster
    pystore.set_client(LocalCluster())

To use a distributed Dask scheduler, add this to your code:

.. code:: python
    pystore.set_client("tcp://xxx.xxx.xxx.xxx:xxxx")
    pystore.set_path("/path/to/shared/volume/all/workers/can/access")

Concepts
========

PyStore provides namespaced *collections* of data.
These collections allow bucketing data by *source*, *user*, or some other metric
(for example, frequency: End-Of-Day; Minute Bars; etc.). Each collection (or namespace)
maps to a directory containing partitioned **parquet files** for each item (e.g. symbol).

A good practice is to create collections that may look something like this:
* collection.EOD
* collection.ONEMINUTE

Requirements
============

* Python 2.7 or Python 3.5+
* Pandas
* Numpy
* Dask
* Pyarrow
* `Snappy `_ (Google's compression/decompression library)
* multitasking

PyStore was tested to work on \*nix-like systems, including macOS.
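The store → collection → item hierarchy described under Concepts maps directly onto directories of partitioned parquet files. A stdlib-only sketch of what such a layout looks like on disk (the names and the single placeholder file per item are illustrative assumptions, not PyStore's exact on-disk format):

```python
import tempfile
from pathlib import Path

# Build a toy datastore layout: store / collection / item / parquet partitions
root = Path(tempfile.mkdtemp()) / "pystore"
for symbol in ("AAPL", "MSFT"):
    item_dir = root / "mydatastore" / "NASDAQ" / symbol
    item_dir.mkdir(parents=True)
    # Each item directory holds partitioned parquet files (placeholder here)
    (item_dir / "part.0.parquet").touch()

# Print the resulting tree, relative to the storage root
for path in sorted(p.relative_to(root) for p in root.rglob("*")):
    print(path)
```

Because each item is just a directory, operations like ``delete_item`` or ``delete_collection`` amount to removing the corresponding subtree.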
Dependencies:
-------------

PyStore uses `Snappy `_,
a fast and efficient compression/decompression library from Google.
You'll need to install Snappy on your system before installing PyStore.

\* See the ``python-snappy`` `Github repo `_ for more information.
**\*nix Systems:**

- APT: ``sudo apt-get install libsnappy-dev``
- RPM: ``sudo yum install libsnappy-devel``

**macOS:**
First, install Snappy's C library using `Homebrew `_:

.. code::

    $ brew install snappy

Then, install Python's snappy using conda:

.. code::

    $ conda install python-snappy -c conda-forge

...or, using `pip`:

.. code::

    $ CPPFLAGS="-I/usr/local/include -L/usr/local/lib" pip install python-snappy
**Windows:**

Windows users should check out `Snappy for Windows `_ and
`this Stackoverflow post `_ for help on installing Snappy and ``python-snappy``.
Roadmap
=======

PyStore currently offers support for local filesystems (including attached network drives).
I plan on adding support for Amazon S3 (via `s3fs `_),
Google Cloud Storage (via `gcsfs `_),
and Hadoop Distributed File System (via `hdfs3 `_) in the future.

Acknowledgements
================

PyStore is hugely inspired by `Man AHL `_'s
`Arctic `_ which uses
MongoDB for storage and allows for versioning and other features.
I highly recommend you check it out.

License
=======

PyStore is licensed under the **Apache License, Version 2.0**. A copy is included in LICENSE.txt.
-----
I'm very interested in your experience with PyStore.
Please drop me a note with any feedback you have.

Contributions welcome!
\- **Ran Aroussi**