{"id":19389755,"url":"https://github.com/asuiu/streamerate","last_synced_at":"2025-07-06T05:39:49.155Z","repository":{"id":162516500,"uuid":"636866193","full_name":"asuiu/streamerate","owner":"asuiu","description":"Iterable Java8 style Streams for Python","archived":false,"fork":false,"pushed_at":"2025-03-27T00:51:25.000Z","size":474,"stargazers_count":8,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-27T01:01:31.741Z","etag":null,"topics":["java-streams","map-reduce","mapreduce","python","python-iterables","python-itertools","python-mapreduce","python-multiprocessing","python-multithreading","python-streaming","python3","streaming"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/asuiu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-05T20:47:09.000Z","updated_at":"2025-03-27T00:51:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"3cde7b06-3e6e-4428-9b66-1a541bf31b86","html_url":"https://github.com/asuiu/streamerate","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asuiu%2Fstreamerate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asuiu%2Fstreamerate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asuiu%2Fstreamerate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asuiu%2Fstr
eamerate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/asuiu","download_url":"https://codeload.github.com/asuiu/streamerate/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250539320,"owners_count":21447288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java-streams","map-reduce","mapreduce","python","python-iterables","python-itertools","python-mapreduce","python-multiprocessing","python-multithreading","python-streaming","python3","streaming"],"created_at":"2024-11-10T10:17:18.588Z","updated_at":"2025-04-24T00:30:58.539Z","avatar_url":"https://github.com/asuiu.png","language":"Python","readme":"# streamerate\n[![Build Status](https://github.com/asuiu/streamerate/actions/workflows/python-package.yml/badge.svg?branch=master)](https://github.com/asuiu/streamerate/actions/workflows/python-package.yml)\n\n__[streamerate](https://github.com/asuiu/streamerate)__  is a powerful pure-Python library inspired by **[Fluent Interface pattern](https://en.wikipedia.org/wiki/Fluent_interface)** (used by Java 8 streams), providing a chainable and expressive approach to processing iterable data.\n\n\nBy leveraging the **[Fluent Interface pattern](https://en.wikipedia.org/wiki/Fluent_interface)**, [streamerate](https://github.com/asuiu/streamerate) enables you to chain together multiple operations, such as filtering, mapping, and reducing, to create complex data processing pipelines with ease. 
With streamerate, you can write elegant and readable code that efficiently operates on streams of data, facilitating the development of clean and expressive Python applications.\n\n\n__[streamerate](https://github.com/asuiu/streamerate)__ empowers you to write elegant and functional code, unlocking the full potential of your iterable data processing pipelines.\n\nThe library is distributed under the permissive [MIT license](https://opensource.org/license/mit/), allowing you to freely use, modify, and distribute it in both open-source and commercial projects.\n\n*Note:* __[streamerate](https://github.com/asuiu/streamerate)__ originated as part of the [pyxtension](https://github.com/asuiu/pyxtension) project but has since been split out into a standalone library.\n\n\n## Installation\n```\npip install streamerate\n```\nor from GitHub:\n```\ngit clone https://github.com/asuiu/streamerate.git\ncd streamerate\npython setup.py install\n```\nor\n```\ngit submodule add https://github.com/asuiu/streamerate.git\n```\n\n### For developers (running unit tests)\n\nFirst install [Poetry](https://python-poetry.org/docs/) and make sure it\nis in your `$PATH`. Create a virtual environment using Poetry and then\nuse the script to run tests:\n```bash\npoetry install\npoetry run ./run_tests.py\n```\n\n\n## Modules overview\n\n### streams.py\n#### stream\n`stream` subclasses `collections.Iterable`. 
It behaves like any Python iterable but adds methods suited to multithreaded and multiprocess processing.\nIt is used to build stream processing pipelines similar to those in [Scala](http://www.scala-lang.org/) and the [MapReduce](https://en.wikipedia.org/wiki/MapReduce) programming model.\nAnyone who has used [Apache Spark](http://spark.apache.org/) [RDD](http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations) functions will find this processing model very easy to use.\n\n### [streams](https://github.com/asuiu/streamerate/blob/master/streams.py)\n**Never again will you have to write code like this**:\n```python\n\u003e from functools import reduce\n\u003e lst = range(1,6)\n\u003e reduce(lambda x, y: x * y, map(lambda _: _ * _, filter(lambda _: _ % 2 == 0, lst)))\n64\n```\nFrom now on, you may simply write the following lines:\n```python\n\u003e the_stream = stream( range(1,6) )\n\u003e the_stream.\\\n    filter(lambda _: _ % 2 == 0).\\\n    map(lambda _: _ * _).\\\n    reduce(lambda x, y: x * y)\n64\n```\n\n#### A Word Count [Map-Reduce](https://en.wikipedia.org/wiki/MapReduce) naive example using multiprocessing map\n```python\ncorpus = [\n    \"MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.\",\n    \"At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web\",\n    \"Conceptually similar approaches have been very well known since 1995 with the Message Passing Interface standard having reduce and scatter operations.\"]\n\ndef reduceMaps(m1, m2):\n    for k, v in m2.items():\n        m1[k] = m1.get(k, 0) + v\n    return m1\n\nword_counts = stream(corpus).\\\n    mpmap(lambda line: stream(line.lower().split(' ')).countByValue()).\\\n    reduce(reduceMaps)\n```\n\n#### Basic methods\n##### **map(f)**\nIdentical to the builtin `map`, but returns a stream.\n\n\n##### **mpmap(self, f: Callable[[_K], _V], poolSize: int = 
cpu_count(), bufferSize: Optional[int] = None)**\nParallel ordered map using `multiprocessing.Pool.imap()`.\n\nIt can replace `map` when you need to split computations across multiple cores and the order of results matters.\n\nIt spawns at most `poolSize` processes and applies the `f` function.\n\nIt won't take more than `bufferSize` elements from the input unless the output has already requested them, so you can use it with `takeWhile` on infinite streams without worrying that it will keep working in the background.\n\nThe elements in the result stream appear in the same order as in the initial iterable.\n\n```\n:type f: (T) -\u003e V\n:rtype: `stream`\n```\n\n\n##### **mpfastmap(self, f: Callable[[_K], _V], poolSize: int = cpu_count(), bufferSize: Optional[int] = None)**\nParallel unordered map using `multiprocessing.Pool.imap_unordered()`.\n\nIt can replace `map` when the order of results doesn't matter.\n\nIt spawns at most `poolSize` processes and applies the `f` function.\n\nIt won't take more than `bufferSize` elements from the input unless the output has already requested them, so you can use it with `takeWhile` on infinite streams without worrying that it will keep working in the background.\n\nThe elements in the result stream appear in an unpredictable order.\n\n```\n:type f: (T) -\u003e V\n:rtype: `stream`\n```\n\n\n##### **fastmap(self, f: Callable[[_K], _V], poolSize: int = cpu_count(), bufferSize: Optional[int] = None)**\nParallel unordered map using a multithreaded pool.\nIt can replace `map` when the order of results doesn't matter.\n\nIt spawns at most `poolSize` threads and applies the `f` function.\n\nThe elements in the result stream appear in an **unpredictable** order.\n\nIt won't take more than `bufferSize` elements from the input unless the output has already requested them, so you can use it with `takeWhile` on infinite streams without worrying that it will keep working in the background.\n\nBecause of CPython 
[GIL](https://wiki.python.org/moin/GlobalInterpreterLock), it's most useful for I/O-bound work or CPU-intensive native functions, or on the Jython or IronPython interpreters.\n\n:type f: (T) -\u003e V\n\n:rtype: `stream`\n\n##### **mtmap(self, f: Callable[[_K], _V], poolSize: int = cpu_count(), bufferSize: Optional[int] = None)**\nParallel ordered map using a multithreaded pool.\nIt can replace `map`, and the order of the output stream is the same as that of the input.\n\nIt spawns at most `poolSize` threads and applies the `f` function.\n\nThe elements in the result stream appear in the **same** order as in the input.\n\nIt won't take more than `bufferSize` elements from the input unless the output has already requested them, so you can use it with `takeWhile` on infinite streams without worrying that it will keep working in the background.\n\nBecause of CPython [GIL](https://wiki.python.org/moin/GlobalInterpreterLock), it's most useful for I/O-bound work or CPU-intensive native functions, or on the Jython or IronPython interpreters.\n\n:type f: (T) -\u003e V\n\n:rtype: `stream`\n\n##### **gtmap(self, f: Callable[[_K], _V], poolSize: int = cpu_count())**\n\n##### **flatMap(predicate=_IDENTITY_FUNC)**\n:param predicate: a function that receives the elements of this collection and returns an iterable\n\nBy default, predicate is the identity function\n\n:type predicate: (V)-\u003e collections.Iterable[T]\n\n:return: a stream of objects of the same type as the elements of the iterables returned by predicate()\n\nExample:\n```python\nstream([[1, 2], [3, 4], [4, 5]]).flatMap().toList() == [1, 2, 3, 4, 4, 5]\n```\n\n\n##### **filter(predicate)**\nIdentical to the builtin `filter`, but returns a stream.\n\n\n##### **reversed()**\nReturns a reversed stream.\n\n\n##### **exists(predicate)**\nTests whether a predicate holds for some of the elements of this sequence.\n\n:rtype: bool\n\nExample:\n```python\nstream([1, 2, 3]).exists(lambda _: _ == 0) -\u003e False\nstream([1, 2, 3]).exists(lambda _: _ == 1) -\u003e True\n```\n\n\n##### 
**keyBy(keyfunc = _IDENTITY_FUNC)**\nTransforms a stream of values into a stream of (key, value) tuples.\n\n:param keyfunc: function to map values to keys\n\n:type keyfunc: (V) -\u003e T\n\n:return: stream of Key, Value pairs\n\n:rtype: stream[( T, V )]\n\nExample:\n```python\nstream([1, 2, 3, 4]).keyBy(lambda _:_ % 2) -\u003e [(1, 1), (0, 2), (1, 3), (0, 4)]\n```\n\n##### **groupBy()**\ngroupBy([keyfunc]) -\u003e Makes an iterator that returns keys and groups from the iterable.\n\nThe iterable need not be sorted by the key function, but the key function must return hashable objects.\n\n:param keyfunc: [Optional] A function computing a key value for each element.\n\n:type keyfunc: (T) -\u003e (V)\n\n:return: (key, sub-iterator) grouped by each value of key(value).\n\n:rtype: stream[ ( V, slist[T] ) ]\n\nExample:\n```python\nstream([1, 2, 3, 4]).groupBy(lambda _: _ % 2) -\u003e [(0, [2, 4]), (1, [1, 3])]\n```\n\n##### **countByValue()**\nReturns a collections.Counter of values.\n\nExample:\n```python\nstream(['a', 'b', 'a', 'b', 'c', 'd']).countByValue() == {'a': 2, 'b': 2, 'c': 1, 'd': 1}\n```\n\n##### **distinct()**\nReturns a stream of distinct values. Values must be hashable.\n```python\nstream(['a', 'b', 'a', 'b', 'c', 'd']).distinct() == {'a', 'b', 'c', 'd'}\n```\n\n\n##### **reduce(f, init=None)**\nTakes the same arguments as the builtin reduce() function (functools.reduce in Python 3).\n\n##### **throttle(max_req: int, interval: float) -\u003e \"stream[_K]\"**\nThrottles the stream.\n\n:param max_req: number of requests\n:param interval: period in number of seconds\n:return: throttled stream\n\nExample:\n```py\n\u003e\u003e\u003e s = Stream()\n\u003e\u003e\u003e throttled_stream = s.throttle(10, 1.5)\n\u003e\u003e\u003e for item in throttled_stream:\n...     
print(item)\n```\n\n##### **toSet()**\nReturns an sset() instance.\n\n\n##### **toList()**\nReturns an slist() instance.\n\n\n##### **toMap()**\nReturns an sdict() instance.\n\n\n##### **sorted(key=None, cmp=None, reverse=False)**\nTakes the same arguments as the builtin sorted().\n\n\n##### **size()**\nReturns the length of the stream. Use carefully on infinite streams.\n\n\n##### **join(f)**\nReturns a string joined by f. Provides the same functionality as the str.join() builtin method.\n\nIf f is a string, it is used to join the stream; otherwise f should be a callable that returns the string to join with.\n\n\n##### **mkString(f)**\nIdentical to join(f).\n\n\n##### **take(n)**\nReturns the first n elements of the stream.\n\n\n##### **head()**\nReturns the first element of the stream.\n\n\n##### **zip()**\nBehaves like itertools.izip() (the builtin zip() in Python 3).\n\n##### **unique(predicate=_IDENTITY_FUNC)**\nReturns a stream of unique (according to predicate) elements, in the same order as in the original stream.\n\nThe items returned by predicate should be hashable and comparable.\n\n\n#### Statistics related methods\n##### **entropy()**\nCalculates the Shannon entropy of the values in the stream.\n\n\n##### **pstddev()**\nCalculates the population standard deviation.\n\n\n##### **mean()**\nReturns the arithmetic mean of the values.\n\n\n##### **sum()**\nReturns the sum of the elements in the stream.\n\n\n##### **min(key=_IDENTITY_FUNC)**\nSame functionality as the builtin min() function.\n\n\n##### **min_default(default, key=_IDENTITY_FUNC)**\nSame functionality as min(), but returns :default: when called on an empty stream.\n\n\n##### **max()**\nSame functionality as the builtin max().\n\n\n##### **maxes(key=_IDENTITY_FUNC)**\nReturns a stream of the max values in the stream.\n\n\n##### **mins(key=_IDENTITY_FUNC)**\nReturns a stream of the min values in the stream.\n\n\n### Other classes\n##### slist\nInherits `streams.stream` and built-in `list` classes, and keeps in memory a list allowing faster index access\n##### sset\nInherits 
`streams.stream` and built-in `set` classes, and keeps in memory the whole set of values\n##### sdict\nInherits `streams.stream` and built-in `dict`, and keeps in memory the dict object.\n##### defaultstreamdict\nInherits `streams.sdict` and adds the functionality of `collections.defaultdict` from the stdlib\n\n\n### License\nstreamerate is released under the MIT license.\n\n### Alternatives\nOther libraries also provide Fluent Interface streams and can serve as alternatives to streamerate, though they offer far fewer streaming features:\n- https://pypi.org/project/lazy-streams/\n- https://pypi.org/project/pystreams/\n- https://pypi.org/project/fluentpy/\n- https://github.com/matthagy/scalaps\n- https://pypi.org/project/infixpy/ mentioned [here](https://stackoverflow.com/questions/49001986/left-to-right-application-of-operations-on-a-list-in-python3/62585964?noredirect=1#comment111806251_62585964)\n\n\nTwo further libraries take a different approach from the Fluent pattern and implement a kind of piping: https://github.com/sspipe/sspipe and https://github.com/JulienPalard/Pipe\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasuiu%2Fstreamerate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasuiu%2Fstreamerate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasuiu%2Fstreamerate/lists"}