{"id":13533935,"url":"https://github.com/ranaroussi/pystore","last_synced_at":"2025-04-01T22:31:04.383Z","repository":{"id":39877251,"uuid":"134993131","full_name":"ranaroussi/pystore","owner":"ranaroussi","description":"Fast data store for Pandas time-series data","archived":false,"fork":false,"pushed_at":"2024-07-10T17:44:13.000Z","size":159,"stargazers_count":575,"open_issues_count":32,"forks_count":101,"subscribers_count":37,"default_branch":"main","last_synced_at":"2025-03-09T12:18:48.475Z","etag":null,"topics":["dask","database","dataframe","datastore","pandas","parquet","timeseries"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ranaroussi.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.rst","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":null,"patreon":"ranaroussi","open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2018-05-26T20:38:44.000Z","updated_at":"2025-03-03T17:45:01.000Z","dependencies_parsed_at":"2022-07-12T21:32:36.393Z","dependency_job_id":"00cf6486-7952-4998-af16-f90393f38e46","html_url":"https://github.com/ranaroussi/pystore","commit_stats":{"total_commits":195,"total_committers":7,"mean_commits":"27.857142857142858","dds":0.09743589743589742,"last_synced_commit":"f3e94d4bf2174743051f26da43030b02e5b997e8"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/ranaroussi%2Fpystore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ranaroussi%2Fpystore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ranaroussi%2Fpystore/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ranaroussi%2Fpystore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ranaroussi","download_url":"https://codeload.github.com/ranaroussi/pystore/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246720466,"owners_count":20822908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dask","database","dataframe","datastore","pandas","parquet","timeseries"],"created_at":"2024-08-01T07:01:24.442Z","updated_at":"2025-04-01T22:31:04.095Z","avatar_url":"https://github.com/ranaroussi.png","language":"Python","funding_links":["https://patreon.com/ranaroussi"],"categories":["Databases"],"sub_categories":["TimeSeries Analysis"],"readme":"PyStore - Fast data store for Pandas timeseries data\n====================================================\n\n.. image:: https://img.shields.io/badge/python-2.7,%203.5+-blue.svg?style=flat\n    :target: https://pypi.python.org/pypi/pystore\n    :alt: Python version\n\n.. image:: https://img.shields.io/pypi/v/pystore.svg?maxAge=60\n    :target: https://pypi.python.org/pypi/pystore\n    :alt: PyPi version\n\n.. image:: https://img.shields.io/pypi/status/pystore.svg?maxAge=60\n    :target: https://pypi.python.org/pypi/pystore\n    :alt: PyPi status\n\n.. 
image:: https://img.shields.io/travis/ranaroussi/pystore/master.svg?maxAge=1\n    :target: https://travis-ci.com/ranaroussi/pystore\n    :alt: Travis-CI build status\n\n.. image:: https://www.codefactor.io/repository/github/ranaroussi/pystore/badge\n    :target: https://www.codefactor.io/repository/github/ranaroussi/pystore\n    :alt: CodeFactor\n\n.. image:: https://img.shields.io/github/stars/ranaroussi/pystore.svg?style=social\u0026label=Star\u0026maxAge=60\n    :target: https://github.com/ranaroussi/pystore\n    :alt: Star this repo\n\n.. image:: https://img.shields.io/twitter/follow/aroussi.svg?style=social\u0026label=Follow\u0026maxAge=60\n    :target: https://twitter.com/aroussi\n    :alt: Follow me on twitter\n\n\\\n\n\n`PyStore \u003chttps://github.com/ranaroussi/pystore\u003e`_ is a simple (yet powerful)\ndatastore for Pandas dataframes, and while it can store any Pandas object,\n**it was designed with storing timeseries data in mind**.\n\nIt's built on top of `Pandas \u003chttp://pandas.pydata.org\u003e`_, `Numpy \u003chttp://numpy.pydata.org\u003e`_,\n`Dask \u003chttp://dask.pydata.org\u003e`_, and `Parquet \u003chttp://parquet.apache.org\u003e`_\n(via `pyarrow \u003chttps://github.com/apache/arrow\u003e`_),\nto provide an easy to use datastore for Python developers that can easily\nquery millions of rows per second per client.\n\n\n==\u003e Check out `this Blog post \u003chttps://medium.com/@aroussi/fast-data-store-for-pandas-time-series-data-using-pystore-89d9caeef4e2\u003e`_\nfor the reasoning and philosophy behind PyStore, as well as a detailed tutorial with code examples.\n\n==\u003e Follow `this PyStore tutorial \u003chttps://github.com/ranaroussi/pystore/blob/master/examples/pystore-tutorial.ipynb\u003e`_ in Jupyter notebook format.\n\n\nQuickstart\n==========\n\nInstall PyStore\n---------------\n\nInstall using `pip`:\n\n.. code:: bash\n\n    $ pip install pystore --upgrade --no-cache-dir\n\nInstall using `conda`:\n\n.. 
code:: bash\n\n    $ conda install -c ranaroussi pystore\n\n**INSTALLATION NOTE:**\nIf you don't have Snappy installed (compression/decompression library), you'll need to\n`install it first \u003chttps://github.com/ranaroussi/pystore#dependencies\u003e`_.\n\n\nUsing PyStore\n-------------\n\n.. code:: python\n\n    #!/usr/bin/env python\n    # -*- coding: utf-8 -*-\n\n    import pystore\n    import quandl\n\n    # Set storage path (optional)\n    # Defaults to `~/pystore` or `PYSTORE_PATH` environment variable (if set)\n    pystore.set_path(\"~/pystore\")\n\n    # List stores\n    pystore.list_stores()\n\n    # Connect to datastore (create it if it doesn't exist)\n    store = pystore.store('mydatastore')\n\n    # List existing collections\n    store.list_collections()\n\n    # Access a collection (create it if it doesn't exist)\n    collection = store.collection('NASDAQ')\n\n    # List items in collection\n    collection.list_items()\n\n    # Load some data from Quandl\n    aapl = quandl.get(\"WIKI/AAPL\", authtoken=\"your token here\")\n\n    # Store the first 100 rows of the data in the collection under \"AAPL\"\n    collection.write('AAPL', aapl[:100], metadata={'source': 'Quandl'})\n\n    # Reading the item's data\n    item = collection.item('AAPL')\n    data = item.data  # \u003c-- Dask dataframe (see dask.pydata.org)\n    metadata = item.metadata\n    df = item.to_pandas()\n\n    # Append the rest of the rows to the \"AAPL\" item\n    collection.append('AAPL', aapl[100:])\n\n    # Reading the item's data\n    item = collection.item('AAPL')\n    data = item.data\n    metadata = item.metadata\n    df = item.to_pandas()\n\n\n    # --- Query functionality ---\n\n    # Query available symbols based on metadata\n    collection.list_items(some_key='some_value', other_key='other_value')\n\n\n    # --- Snapshot functionality ---\n\n    # Snapshot a collection\n    # (Point-in-time named reference for all current symbols in a collection)\n    
collection.create_snapshot('snapshot_name')\n\n    # List available snapshots\n    collection.list_snapshots()\n\n    # Get a version of a symbol given a snapshot name\n    collection.item('AAPL', snapshot='snapshot_name')\n\n    # Delete a collection snapshot\n    collection.delete_snapshot('snapshot_name')\n\n\n    # ...\n\n\n    # Delete the item from the current version\n    collection.delete_item('AAPL')\n\n    # Delete the collection\n    store.delete_collection('NASDAQ')\n\n\nUsing Dask schedulers\n---------------------\n\nPyStore 0.1.18+ supports using Dask distributed.\n\nTo use a local Dask scheduler, add this to your code:\n\n.. code:: python\n\n    from dask.distributed import LocalCluster\n    pystore.set_client(LocalCluster())\n\n\nTo use a distributed Dask scheduler, add this to your code:\n\n.. code:: python\n\n    pystore.set_client(\"tcp://xxx.xxx.xxx.xxx:xxxx\")\n    pystore.set_path(\"/path/to/shared/volume/all/workers/can/access\")\n\n\n\nConcepts\n========\n\nPyStore provides namespaced *collections* of data.\nThese collections allow bucketing data by *source*, *user* or some other metric\n(for example frequency: End-Of-Day; Minute Bars; etc.). Each collection (or namespace)\nmaps to a directory containing partitioned **parquet files** for each item (e.g. 
symbol).\n\nA good practice is to create collections that may look something like this:\n\n* collection.EOD\n* collection.ONEMINUTE\n\nRequirements\n============\n\n* Python 2.7 or Python 3.5+\n* Pandas\n* Numpy\n* Dask\n* Pyarrow\n* `Snappy \u003chttp://google.github.io/snappy/\u003e`_ (Google's compression/decompression library)\n* multitasking\n\nPyStore was tested to work on \\*nix-like systems, including macOS.\n\n\nDependencies:\n-------------\n\nPyStore uses `Snappy \u003chttp://google.github.io/snappy/\u003e`_,\na fast and efficient compression/decompression library from Google.\nYou'll need to install Snappy on your system before installing PyStore.\n\n\\* See the ``python-snappy`` `GitHub repo \u003chttps://github.com/andrix/python-snappy#dependencies\u003e`_ for more information.\n\n***nix Systems:**\n\n- APT: ``sudo apt-get install libsnappy-dev``\n- RPM: ``sudo yum install libsnappy-devel``\n\n**macOS:**\n\nFirst, install Snappy's C library using `Homebrew \u003chttps://brew.sh\u003e`_:\n\n.. code::\n\n    $ brew install snappy\n\nThen, install Python's snappy using conda:\n\n.. code::\n\n    $ conda install python-snappy -c conda-forge\n\n...or, using `pip`:\n\n.. 
code::\n\n    $ CPPFLAGS=\"-I/usr/local/include -L/usr/local/lib\" pip install python-snappy\n\n\n**Windows:**\n\nWindows users should check out `Snappy for Windows \u003chttps://snappy.machinezoo.com\u003e`_ and `this Stack Overflow post \u003chttps://stackoverflow.com/a/43756412/1783569\u003e`_ for help on installing Snappy and ``python-snappy``.\n\n\nRoadmap\n=======\n\nPyStore currently offers support for the local filesystem (including attached network drives).\nI plan on adding support for Amazon S3 (via `s3fs \u003chttp://s3fs.readthedocs.io/\u003e`_),\nGoogle Cloud Storage (via `gcsfs \u003chttps://github.com/dask/gcsfs/\u003e`_)\nand Hadoop Distributed File System (via `hdfs3 \u003chttp://hdfs3.readthedocs.io/\u003e`_) in the future.\n\nAcknowledgements\n================\n\nPyStore is hugely inspired by `Man AHL \u003chttp://www.ahl.com/\u003e`_'s\n`Arctic \u003chttps://github.com/manahl/arctic\u003e`_, which uses\nMongoDB for storage and allows for versioning and other features.\nI highly recommend you check it out.\n\n\n\nLicense\n=======\n\n\nPyStore is licensed under the **Apache License, Version 2.0**, a copy of which is included in LICENSE.txt.\n\n-----\n\nI'm very interested in your experience with PyStore.\nPlease drop me a note with any feedback you have.\n\nContributions welcome!\n\n\\- **Ran Aroussi**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Franaroussi%2Fpystore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Franaroussi%2Fpystore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Franaroussi%2Fpystore/lists"}