https://github.com/jmaces/statstream
Statistics for Streaming Data
- Host: GitHub
- URL: https://github.com/jmaces/statstream
- Owner: jmaces
- License: mit
- Created: 2019-11-01T14:16:37.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-08-17T19:15:16.000Z (over 2 years ago)
- Last Synced: 2024-10-09T12:57:04.155Z (3 months ago)
- Topics: data-science, numpy, statistics, streaming-data
- Language: Python
- Size: 69.3 KB
- Stars: 9
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.rst
- Contributing: .github/CONTRIBUTING.rst
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.rst
README
=============================================
``statstream``: Statistics for Streaming Data
=============================================

.. add project badges here

.. image:: https://readthedocs.org/projects/statstream/badge/?version=latest
   :target: https://statstream.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://github.com/jmaces/statstream/actions/workflows/pr-check.yml/badge.svg?branch=master
   :target: https://github.com/jmaces/statstream/actions/workflows/pr-check.yml?branch=master
   :alt: CI Status

.. image:: https://codecov.io/gh/jmaces/statstream/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/jmaces/statstream
   :alt: Code Coverage

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black
   :alt: Code Style: Black

.. teaser-start

``statstream`` is a lightweight Python package providing data analysis and statistics utilities for streaming data.
Its main goal is to provide **single-pass** variants of conventional `numpy `_
data analysis and statistics functionality for **streaming** data that is
either generated on the fly or too large to be handled at once. Data can be
streamed in chunks called **mini-batches**, which makes ``statstream``
particularly useful in combination with machine learning and deep learning
packages like `keras `_, `tensorflow `_, or `pytorch `_.

.. teaser-end
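
As a rough illustration of the single-pass, mini-batch idea (a sketch in plain
NumPy, not ``statstream``'s own implementation), the mean of a stream can be
computed by keeping only a running sum and a sample count. The helper name
``single_pass_mean`` and the convention that samples lie along axis 0 of each
batch are assumptions made for this example.

.. code-block:: python

    import numpy as np


    def single_pass_mean(batches):
        """Mean over axis 0 of all samples, visiting each mini-batch once.

        Hypothetical helper for illustration only; assumes the first axis of
        every batch indexes samples.
        """
        total = None  # running sum of all samples seen so far
        count = 0  # number of samples seen so far
        for batch in batches:
            batch = np.asarray(batch, dtype=float)
            batch_sum = batch.sum(axis=0)
            total = batch_sum if total is None else total + batch_sum
            count += batch.shape[0]
        if count == 0:
            raise ValueError("no data was streamed")
        return total / count


    # Usage: the generator hands out one mini-batch at a time, and
    # single_pass_mean never needs more than the current batch in memory.
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 3))
    batches = (data[i:i + 64] for i in range(0, len(data), 64))
    print(np.allclose(single_pass_mean(batches), data.mean(axis=0)))  # True

Only the sum and the count survive between batches, so the memory used by the
helper does not grow with the length of the stream.
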
.. example

``statstream`` functions consume iterators that provide batches of data.
They compute statistics of these batches and combine them to obtain statistics
for the full data set.

.. code-block:: python

    import statstream

    # some_iterable yields mini-batches of data, e.g. NumPy arrays
    mean = statstream.streaming_mean(some_iterable)

The `Overview `_ and
`Examples `_ sections
of our documentation provide more realistic and complete examples.

.. project-info-start
Project Information
===================

``statstream`` is released under the `MIT license `_,
its documentation lives at `Read the Docs `_,
its code on `GitHub `_,
and the latest release can be found on `PyPI `_.
It is tested on Python 2.7 and 3.5+.

If you'd like to contribute to ``statstream``, you're most welcome.
We have written a `short guide `_ to help you get started!

.. project-info-end
.. literature-start

Further Reading
===============

Additional information on the algorithmic aspects of ``statstream`` can be found
in the following works:

- Tony F. Chan, Gene H. Golub, and Randall J. LeVeque,
  “Updating formulae and a pairwise algorithm for computing sample variances”,
  1979.
- Radim Řehůřek,
  “Scalability of Semantic Analysis in Natural Language Processing”,
  2011.

.. literature-end
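
The first reference gives an update formula for merging the sums of squared
deviations of two chunks of data. Written in terms of counts and means, it lets
per-batch statistics be combined into statistics for the full stream. The
sketch below applies that pairwise update batch by batch in plain NumPy. It is
an illustration of the published formula rather than ``statstream``'s code, and
the names ``BatchStats``, ``batch_stats``, ``combine``, and
``streaming_variance`` are hypothetical.

.. code-block:: python

    from typing import Iterable, NamedTuple

    import numpy as np


    class BatchStats(NamedTuple):
        """Summary of a chunk of samples (hypothetical helper)."""

        count: int
        mean: np.ndarray
        m2: np.ndarray  # sum of squared deviations from the chunk mean


    def batch_stats(batch):
        """Statistics of one non-empty mini-batch; samples lie along axis 0."""
        mean = batch.mean(axis=0)
        return BatchStats(batch.shape[0], mean, ((batch - mean) ** 2).sum(axis=0))


    def combine(a, b):
        """Merge two chunks with the pairwise update of Chan, Golub & LeVeque."""
        n = a.count + b.count
        delta = b.mean - a.mean
        mean = a.mean + delta * (b.count / n)
        m2 = a.m2 + b.m2 + delta ** 2 * (a.count * b.count / n)
        return BatchStats(n, mean, m2)


    def streaming_variance(batches: Iterable, ddof: int = 0):
        """Single-pass variance over axis 0; ddof has the same role as in numpy.var."""
        stats = None
        for batch in batches:
            batch = np.asarray(batch, dtype=float)
            if batch.shape[0] == 0:
                continue  # an empty mini-batch carries no information
            current = batch_stats(batch)
            stats = current if stats is None else combine(stats, current)
        if stats is None:
            raise ValueError("no data was streamed")
        return stats.m2 / (stats.count - ddof)


    # Usage: compare against NumPy's two-pass result on in-memory data.
    rng = np.random.default_rng(0)
    data = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
    batches = (data[i:i + 100] for i in range(0, len(data), 100))
    print(np.allclose(streaming_variance(batches), data.var(axis=0)))  # True

Merging two chunks costs the same small amount of work no matter how many
samples each chunk summarizes, and working with deviations from the mean tends
to be numerically better behaved than the naive "sum of squares minus squared
sum" formula.
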
Acknowledgments
===============

During the setup of this project, we were heavily influenced and inspired by
the work of `Hynek Schlawack `_, in particular his
`attrs `_ package and his blog posts on
`testing and packaging `_
and `deploying to PyPI `_.
Thank you for sharing your experiences and insights.