Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ranaroussi/pystore
Fast data store for Pandas time-series data
dask database dataframe datastore pandas parquet timeseries
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/ranaroussi/pystore
- Owner: ranaroussi
- License: apache-2.0
- Created: 2018-05-26T20:38:44.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2024-07-10T17:44:13.000Z (6 months ago)
- Last Synced: 2024-10-01T11:40:24.765Z (3 months ago)
- Topics: dask, database, dataframe, datastore, pandas, parquet, timeseries
- Language: Python
- Homepage:
- Size: 155 KB
- Stars: 558
- Watchers: 37
- Forks: 100
- Open Issues: 30
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.rst
- Funding: .github/FUNDING.yml
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-systematic-trading - PyStore - Fast data store for Pandas time-series data (Databases / TimeSeries Analysis)
README
PyStore - Fast data store for Pandas timeseries data
====================================================

.. image:: https://img.shields.io/badge/python-2.7,%203.5+-blue.svg?style=flat
    :target: https://pypi.python.org/pypi/pystore
    :alt: Python version

.. image:: https://img.shields.io/pypi/v/pystore.svg?maxAge=60
    :target: https://pypi.python.org/pypi/pystore
    :alt: PyPi version

.. image:: https://img.shields.io/pypi/status/pystore.svg?maxAge=60
    :target: https://pypi.python.org/pypi/pystore
    :alt: PyPi status

.. image:: https://img.shields.io/travis/ranaroussi/pystore/master.svg?maxAge=1
    :target: https://travis-ci.com/ranaroussi/pystore
    :alt: Travis-CI build status

.. image:: https://www.codefactor.io/repository/github/ranaroussi/pystore/badge
    :target: https://www.codefactor.io/repository/github/ranaroussi/pystore
    :alt: CodeFactor

.. image:: https://img.shields.io/github/stars/ranaroussi/pystore.svg?style=social&label=Star&maxAge=60
    :target: https://github.com/ranaroussi/pystore
    :alt: Star this repo

.. image:: https://img.shields.io/twitter/follow/aroussi.svg?style=social&label=Follow&maxAge=60
    :target: https://twitter.com/aroussi
    :alt: Follow me on twitter
`PyStore `_ is a simple (yet powerful)
datastore for Pandas dataframes. While it can store any Pandas object,
**it was designed with storing timeseries data in mind**.

It's built on top of `Pandas `_, `Numpy `_,
`Dask `_, and `Parquet `_
(via `pyarrow `_)
to provide an easy-to-use datastore for Python developers, one that can
query millions of rows per second per client.

==> Check out `this Blog post `_
for the reasoning and philosophy behind PyStore, as well as a detailed tutorial with code examples.

==> Follow `this PyStore tutorial `_ in Jupyter notebook format.
Quickstart
==========

Install PyStore
---------------

Install using `pip`:

.. code:: bash

    $ pip install pystore --upgrade --no-cache-dir

Install using `conda`:

.. code:: bash

    $ conda install -c ranaroussi pystore
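Once installed, PyStore reads its storage location from the ``PYSTORE_PATH`` environment variable, defaulting to ``~/pystore`` (see the usage example below). A stdlib-only sketch of that lookup order, assuming this resolution logic; the function name here is illustrative and not part of PyStore's API:

```python
import os
from pathlib import Path

def resolve_storage_path() -> Path:
    """Mirror PyStore's documented default: the PYSTORE_PATH
    environment variable if set, otherwise ~/pystore."""
    env = os.environ.get("PYSTORE_PATH")
    return Path(env).expanduser() if env else Path.home() / "pystore"

# With the variable set, it wins:
os.environ["PYSTORE_PATH"] = "/tmp/mystore"
print(resolve_storage_path())  # /tmp/mystore

# Without it, fall back to the home-directory default:
del os.environ["PYSTORE_PATH"]
print(resolve_storage_path())
```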
**INSTALLATION NOTE:**
If you don't have Snappy installed (compression/decompression library),
you'll need to `install it first `_.

Using PyStore
-------------

.. code:: python
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import pystore
    import quandl

    # Set storage path (optional)
    # Defaults to `~/pystore` or the `PYSTORE_PATH` environment variable (if set)
    pystore.set_path("~/pystore")

    # List stores
    pystore.list_stores()

    # Connect to datastore (create it if it doesn't exist)
    store = pystore.store('mydatastore')

    # List existing collections
    store.list_collections()

    # Access a collection (create it if it doesn't exist)
    collection = store.collection('NASDAQ')

    # List items in collection
    collection.list_items()

    # Load some data from Quandl
    aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

    # Store the first 100 rows of the data in the collection under "AAPL"
    collection.write('AAPL', aapl[:100], metadata={'source': 'Quandl'})

    # Read the item's data
    item = collection.item('AAPL')
    data = item.data  # <-- Dask dataframe (see dask.pydata.org)
    metadata = item.metadata
    df = item.to_pandas()

    # Append the rest of the rows to the "AAPL" item
    collection.append('AAPL', aapl[100:])

    # Read the item's data again
    item = collection.item('AAPL')
    data = item.data
    metadata = item.metadata
    df = item.to_pandas()

    # --- Query functionality ---

    # Query available symbols based on metadata
    collection.list_items(some_key='some_value', other_key='other_value')

    # --- Snapshot functionality ---

    # Snapshot a collection
    # (Point-in-time named reference for all current symbols in a collection)
    collection.create_snapshot('snapshot_name')

    # List available snapshots
    collection.list_snapshots()

    # Get a version of a symbol given a snapshot name
    collection.item('AAPL', snapshot='snapshot_name')

    # Delete a collection snapshot
    collection.delete_snapshot('snapshot_name')

    # ...

    # Delete the item from the current version
    collection.delete_item('AAPL')

    # Delete the collection
    store.delete_collection('NASDAQ')

Using Dask schedulers
---------------------

PyStore 0.1.18+ supports using Dask distributed.
To use a local Dask scheduler, add this to your code:
.. code:: python

    from dask.distributed import LocalCluster
    pystore.set_client(LocalCluster())

To use a distributed Dask scheduler, add this to your code:

.. code:: python
    pystore.set_client("tcp://xxx.xxx.xxx.xxx:xxxx")
    pystore.set_path("/path/to/shared/volume/all/workers/can/access")

Concepts
========

PyStore provides namespaced *collections* of data.
These collections allow bucketing data by *source*, *user*, or some other metric
(for example, frequency: End-Of-Day; Minute Bars; etc.). Each collection (or namespace)
maps to a directory containing partitioned **parquet files** for each item (e.g. symbol).

A good practice is to create collections that may look something like this:
* collection.EOD
* collection.ONEMINUTE

Requirements
============

* Python 2.7 or Python 3.5+
* Pandas
* Numpy
* Dask
* Pyarrow
* `Snappy `_ (Google's compression/decompression library)
* multitasking

PyStore was tested to work on \*nix-like systems, including macOS.
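The store → collection → item hierarchy described under Concepts maps directly onto directories of partitioned parquet files. A stdlib-only sketch of what such a layout looks like on disk (the names and the single placeholder file per item are illustrative assumptions, not PyStore's exact on-disk format):

```python
import tempfile
from pathlib import Path

# Build a toy datastore layout: store / collection / item / parquet partitions
root = Path(tempfile.mkdtemp()) / "pystore"
for symbol in ("AAPL", "MSFT"):
    item_dir = root / "mydatastore" / "NASDAQ" / symbol
    item_dir.mkdir(parents=True)
    # Each item directory holds partitioned parquet files (placeholder here)
    (item_dir / "part.0.parquet").touch()

# Print the resulting tree, relative to the storage root
for path in sorted(p.relative_to(root) for p in root.rglob("*")):
    print(path)
```

Because each item is just a directory, operations like ``delete_item`` or ``delete_collection`` amount to removing the corresponding subtree.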
Dependencies:
-------------

PyStore uses `Snappy `_,
a fast and efficient compression/decompression library from Google.
You'll need to install Snappy on your system before installing PyStore.

\* See the ``python-snappy`` `Github repo `_ for more information.
**\*nix Systems:**

- APT: ``sudo apt-get install libsnappy-dev``
- RPM: ``sudo yum install libsnappy-devel``

**macOS:**
First, install Snappy's C library using `Homebrew `_:

.. code::

    $ brew install snappy

Then, install Python's snappy using conda:

.. code::

    $ conda install python-snappy -c conda-forge

...or, using `pip`:

.. code::

    $ CPPFLAGS="-I/usr/local/include -L/usr/local/lib" pip install python-snappy
**Windows:**

Windows users should check out `Snappy for Windows `_ and
`this Stackoverflow post `_ for help on installing Snappy and ``python-snappy``.
Roadmap
=======

PyStore currently offers support for local filesystems (including attached network drives).
I plan on adding support for Amazon S3 (via `s3fs `_),
Google Cloud Storage (via `gcsfs `_),
and Hadoop Distributed File System (via `hdfs3 `_) in the future.

Acknowledgements
================

PyStore is hugely inspired by `Man AHL `_'s
`Arctic `_ which uses
MongoDB for storage and allows for versioning and other features.
I highly recommend you check it out.

License
=======

PyStore is licensed under the **Apache License, Version 2.0**. A copy is included in LICENSE.txt.
-----
I'm very interested in your experience with PyStore.
Please drop me a note with any feedback you have.

Contributions welcome!
\- **Ran Aroussi**