{"id":19389758,"url":"https://github.com/asuiu/pyxtension","last_synced_at":"2025-04-09T06:12:31.166Z","repository":{"id":57458668,"uuid":"44428902","full_name":"asuiu/pyxtension","owner":"asuiu","description":"Pure Python extensions library that includes Scala-like streams, Json with attribute access syntax, and other common use stuff","archived":false,"fork":false,"pushed_at":"2025-03-27T10:35:30.000Z","size":330,"stargazers_count":44,"open_issues_count":13,"forks_count":1,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-27T10:48:14.558Z","etag":null,"topics":["java-streams","mapreduce","python","python-iterables","python-itertools","python-json","python-mapreduce","python-multiprocessing","python-multithreading","python-streaming","streaming"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/asuiu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-10-17T08:06:01.000Z","updated_at":"2025-03-27T10:35:34.000Z","dependencies_parsed_at":"2024-05-22T12:24:39.445Z","dependency_job_id":"d8ba5f50-6edf-49f0-90f6-d9b4cd16b9ce","html_url":"https://github.com/asuiu/pyxtension","commit_stats":{"total_commits":155,"total_committers":3,"mean_commits":"51.666666666666664","dds":"0.058064516129032295","last_synced_commit":"31bacfaa0355f372ff05171c318ce8c5edef5410"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asuiu%2Fpyxtension","tags_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/asuiu%2Fpyxtension/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asuiu%2Fpyxtension/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asuiu%2Fpyxtension/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/asuiu","download_url":"https://codeload.github.com/asuiu/pyxtension/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247987285,"owners_count":21028895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java-streams","mapreduce","python","python-iterables","python-itertools","python-json","python-mapreduce","python-multiprocessing","python-multithreading","python-streaming","streaming"],"created_at":"2024-11-10T10:17:18.752Z","updated_at":"2025-04-09T06:12:31.160Z","avatar_url":"https://github.com/asuiu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pyxtension\n[![Build Status](https://github.com/asuiu/pyxtension/actions/workflows/python-package.yml/badge.svg?branch=main)](https://github.com/asuiu/pyxtension/actions/workflows/python-package.yml)\n\n[pyxtension](https://github.com/asuiu/pyxtension) is a pure Python MIT-licensed library that includes Scala-like streams (using [Fluent Interface pattern](https://en.wikipedia.org/wiki/Fluent_interface)), Json with attribute access syntax, and other common-use stuff.\n\n###### Note:\n\n**Drop support \u0026 maintenance for Python 2.x version, due to [Py2 
death](https://www.python.org/doc/sunset-python-2/).**\n\n Although Py2 version will remain in the repository, I won't update PyPi package, so the last Py2 version of the `pyxtension` available at [PyPi](https://pypi.org/project/pyxtension/) will remain [`1.12.7`](https://pypi.org/project/pyxtension/1.12.7/)\n\nStarting with [`1.13.0`](https://pypi.org/project/pyxtension/1.13.0/) I've migrated the packaging \u0026 distributing method to [Wheel](https://pythonwheels.com/).\n\n## Installation\n```\npip install pyxtension\n```\nor from Github:\n```\ngit clone https://github.com/asuiu/pyxtension.git\ncd pyxtension\npython setup.py install\n```\nor\n```\ngit submodule add https://github.com/asuiu/pyxtension.git\n```\n\n## Modules overview\n### Json.py\n##### Json\nA `dict` subclass to represent a Json object. You should be able to use this\nabsolutely anywhere you can use a `dict`. While this is probably the class you\nwant to use, there are a few caveats that follow from this being a `dict` under\nthe hood.\n\n**Never again will you have to write code like this**:\n```python\nbody = {\n    'query': {\n        'filtered': {\n            'query': {\n                'match': {'description': 'addictive'}\n            },\n            'filter': {\n                'term': {'created_by': 'ASU'}\n            }\n        }\n    }\n}\n```\n\nFrom now on, you may simply write the following three lines:\n```python\nbody = Json()\nbody.query.filtered.query.match.description = 'addictive'\nbody.query.filtered.filter.term.created_by = 'ASU'\n```\n### streams.py\n#### stream\n`stream` subclasses `collections.Iterable`. 
It's the same Python iterable, but with more added methods, suitable for multithreaded and multiprocess processing.\nUsed to create stream processing pipelines, similar to those used in [Scala](http://www.scala-lang.org/) and the [MapReduce](https://en.wikipedia.org/wiki/MapReduce) programming model.\nThose who have used [Apache Spark](http://spark.apache.org/) [RDD](http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations) functions will find this model of processing very easy to use.\n\n### [streams](https://github.com/asuiu/pyxtension/blob/master/streams.py)\n**Never again will you have to write code like this**:\n```python\n\u003e from functools import reduce\n\u003e lst = range(1,6)\n\u003e reduce(lambda x, y: x * y, map(lambda _: _ * _, filter(lambda _: _ % 2 == 0, lst)))\n64\n```\nFrom now on, you may simply write the following lines:\n```python\n\u003e the_stream = stream( range(1,6) )\n\u003e the_stream.\\\n    filter(lambda _: _ % 2 == 0).\\\n    map(lambda _: _ * _).\\\n    reduce(lambda x, y: x * y)\n64\n```\n\n#### A Word Count [Map-Reduce](https://en.wikipedia.org/wiki/MapReduce) naive example using multiprocessing map\n```python\ncorpus = [\n    \"MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.\",\n    \"At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web\",\n    \"Conceptually similar approaches have been very well known since 1995 with the Message Passing Interface standard having reduce and scatter operations.\"]\n\ndef reduceMaps(m1, m2):\n    for k, v in m2.items():\n        m1[k] = m1.get(k, 0) + v\n    return m1\n\nword_counts = stream(corpus).\\\n    mpmap(lambda line: stream(line.lower().split(' ')).countByValue()).\\\n    reduce(reduceMaps)\n```\n\n#### Basic methods\n###### **map(f)**\nIdentical to the builtin `map`, but returns a stream\n\n\n###### **mpmap(self, f: Callable[[_K], _V], poolSize: int = 
cpu_count(), bufferSize: Optional[int] = None)**\nParallel ordered map using `multiprocessing.Pool.imap()`.\n\nIt can replace `map` when you need to split computation across multiple cores and the order of results matters.\n\nIt spawns at most `poolSize` processes and applies the `f` function.\n\nIt won't take more than `bufferSize` elements from the input unless they were already required by the output, so you can use it with `takeWhile` on infinite streams without worrying that it will keep working in the background.\n\nThe elements in the result stream appear in the same order as they appear in the initial iterable.\n\n```\n:type f: (T) -\u003e V\n:rtype: `stream`\n```\n\n\n###### **mpfastmap(self, f: Callable[[_K], _V], poolSize: int = cpu_count(), bufferSize: Optional[int] = None)**\nParallel unordered map using `multiprocessing.Pool.imap_unordered()`.\n\nIt can replace `map` when the order of results doesn't matter.\n\nIt spawns at most `poolSize` processes and applies the `f` function.\n\nIt won't take more than `bufferSize` elements from the input unless they were already required by the output, so you can use it with `takeWhile` on infinite streams without worrying that it will keep working in the background.\n\nThe elements in the result stream appear in an unpredictable order.\n\n```\n:type f: (T) -\u003e V\n:rtype: `stream`\n```\n\n\n###### **fastmap(self, f: Callable[[_K], _V], poolSize: int = cpu_count(), bufferSize: Optional[int] = None)**\nParallel unordered map using a multithreaded pool.\nIt can replace `map` when the order of results doesn't matter.\n\nIt spawns at most `poolSize` threads and applies the `f` function.\n\nThe elements in the result stream appear in an **unpredictable** order.\n\nIt won't take more than `bufferSize` elements from the input unless they were already required by the output, so you can use it with `takeWhile` on infinite streams without worrying that it will keep working in the background.\n\nBecause of the CPython 
[GIL](https://wiki.python.org/moin/GlobalInterpreterLock), it's most useful for I/O-bound or CPU-intensive native functions, or on Jython or IronPython interpreters.\n\n:type f: (T) -\u003e V\n\n:rtype: `stream`\n\n###### **mtmap(self, f: Callable[[_K], _V], poolSize: int = cpu_count(), bufferSize: Optional[int] = None)**\nParallel ordered map using a multithreaded pool.\nIt can replace `map`, and the order of the output stream will be the same as that of the input.\n\nIt spawns at most `poolSize` threads and applies the `f` function.\n\nThe elements in the result stream appear in the **same** order as in the input.\n\nIt won't take more than `bufferSize` elements from the input unless they were already required by the output, so you can use it with `takeWhile` on infinite streams without worrying that it will keep working in the background.\n\nBecause of the CPython [GIL](https://wiki.python.org/moin/GlobalInterpreterLock), it's most useful for I/O-bound or CPU-intensive native functions, or on Jython or IronPython interpreters.\n\n:type f: (T) -\u003e V\n\n:rtype: `stream`\n\n\n###### **flatMap(predicate=_IDENTITY_FUNC)**\n:param predicate: a function that receives the elements of this collection and returns an iterable\n\nBy default, predicate is the identity function\n\n:type predicate: (V)-\u003e collections.Iterable[T]\n\n:return: a stream of the elements of the iterables returned by predicate()\n\nExample:\n```python\nstream([[1, 2], [3, 4], [4, 5]]).flatMap().toList() == [1, 2, 3, 4, 4, 5]\n```\n\n\n###### **filter(predicate)**\nIdentical to the builtin `filter`, but returns a stream\n\n\n###### **reversed()**\nreturns reversed stream\n\n\n###### **exists(predicate)**\nTests whether a predicate holds for some of the elements of this sequence.\n\n:rtype: bool\n\nExample:\n```python\nstream([1, 2, 3]).exists(lambda _: _ == 0) -\u003e False\nstream([1, 2, 3]).exists(lambda _: _ == 1) -\u003e True\n```\n\n\n###### **keyBy(keyfunc = _IDENTITY_FUNC)**\nTransforms stream of values to a 
stream of tuples (key, value)\n\n:param keyfunc: function to map values to keys\n\n:type keyfunc: (V) -\u003e T\n\n:return: stream of Key, Value pairs\n\n:rtype: stream[( T, V )]\n\nExample:\n```python\nstream([1, 2, 3, 4]).keyBy(lambda _:_ % 2) -\u003e [(1, 1), (0, 2), (1, 3), (0, 4)]\n```\n\n###### **groupBy()**\ngroupBy([keyfunc]) -\u003e Make an iterator that returns consecutive keys and groups from the iterable.\n\nThe iterable need not be sorted by the same key function, but the key function must return hashable objects.\n\n:param keyfunc: [Optional] A function computing a key value for each element.\n\n:type keyfunc: (T) -\u003e (V)\n\n:return: (key, sub-iterator) grouped by each value of key(value).\n\n:rtype: stream[ ( V, slist[T] ) ]\n\nExample:\n```python\nstream([1, 2, 3, 4]).groupBy(lambda _: _ % 2) -\u003e [(0, [2, 4]), (1, [1, 3])]\n```\n\n###### **countByValue()**\nReturns a collections.Counter of values\n\nExample:\n```python\nstream(['a', 'b', 'a', 'b', 'c', 'd']).countByValue() == {'a': 2, 'b': 2, 'c': 1, 'd': 1}\n```\n\n###### **distinct()**\nReturns stream of distinct values. Values must be hashable.\n```python\nstream(['a', 'b', 'a', 'b', 'c', 'd']).distinct() == {'a', 'b', 'c', 'd'}\n```\n\n\n###### **reduce(f, init=None)**\nsame arguments as the builtin reduce() function\n\n\n###### **toSet()**\nreturns sset() instance\n\n\n###### **toList()**\nreturns slist() instance\n\n\n###### **toMap()**\nreturns sdict() instance\n\n\n###### **sorted(key=None, cmp=None, reverse=False)**\nsame arguments as the builtin sorted()\n\n\n###### **size()**\nreturns length of stream. Use carefully on infinite streams.\n\n\n###### **join(f)**\nReturns a string joined by f. 
Provides the same functionality as the `str.join()` method.\n\nIf f is a string, it is used to join the stream; otherwise f should be a callable that returns the string to be used for the join\n\n\n###### **mkString(f)**\nidentical to join(f)\n\n\n###### **take(n)**\nreturns first n elements from stream\n\n\n###### **head()**\nreturns first element from stream\n\n\n###### **zip()**\nthe same behavior as the builtin zip() (itertools.izip() in Python 2)\n\n###### **unique(predicate=_IDENTITY_FUNC)**\nReturns a stream of unique (according to predicate) elements appearing in the same order as in the original stream\n\nThe items returned by predicate should be hashable and comparable.\n\n\n#### Statistics related methods\n###### **entropy()**\ncalculates the Shannon entropy of the values from stream\n\n\n###### **pstddev()**\nCalculates the population standard deviation.\n\n\n###### **mean()**\nreturns the arithmetic mean of the values\n\n\n###### **sum()**\nreturns the sum of elements from stream\n\n\n###### **min(key=_IDENTITY_FUNC)**\nsame functionality as the builtin min() function\n\n\n###### **min_default(default, key=_IDENTITY_FUNC)**\nsame functionality as min(), but returns `default` when called on an empty stream\n\n\n###### **max()**\nsame functionality as the builtin max()\n\n\n###### **maxes(key=_IDENTITY_FUNC)**\nreturns a stream of max values from stream\n\n\n###### **mins(key=_IDENTITY_FUNC)**\nreturns a stream of min values from stream\n\n\n### Other classes\n##### slist\nInherits `streams.stream` and built-in `list` classes, and keeps in memory a list allowing faster index access\n##### sset\nInherits `streams.stream` and built-in `set` classes, and keeps in memory the whole set of values\n##### sdict\nInherits `streams.stream` and built-in `dict`, and keeps in memory the dict object.\n##### defaultstreamdict\nInherits `streams.sdict` and adds the functionality of `collections.defaultdict` from stdlib\n\n\n### 
[Json](https://github.com/asuiu/pyxtension/blob/master/Json.py)\n\n[Json](https://github.com/asuiu/pyxtension/blob/master/Json.py) is a module that provides mapping objects that allow their elements to be accessed both as keys and as attributes:\n\n```python\n\u003e from pyxtension import Json\n\u003e a = Json({'foo': 'bar'})\n\u003e a.foo\n'bar'\n\u003e a['foo']\n'bar'\n```\n\nAttribute access makes it easy to create convenient, hierarchical settings objects:\n```python\n    with open('settings.yaml') as fileobj:\n        settings = Json(yaml.safe_load(fileobj))\n\n    cursor = connect(**settings.db.credentials).cursor()\n\n    cursor.execute(\"SELECT column FROM table;\")\n```\n\n### Basic Usage\n\nJson comes with two different classes, `Json` and `JsonList`.\n`Json` is fairly similar to the native `dict`: it extends it and is a mutable mapping that allows creating, accessing, and deleting key-value pairs as attributes.\n`JsonList` is similar to the native `list`: it extends it and also converts the `dict` objects it contains into `Json` instances.\n\n#### Construction\n###### Directly from a JSON string\n```python\n\u003e Json('{\"key1\": \"val1\", \"lst1\": [1,2] }')\n{'key1': 'val1', 'lst1': [1, 2]}\n```\n###### From `tuple`s:\n```python\n\u003e Json( ('key1','val1'), ('lst1', [1,2]) )\n{'key1': 'val1', 'lst1': [1, 2]}\n# keep in mind that you should provide at least two tuples with key-value pairs\n```\n###### As a built-in `dict`\n```python\n\u003e Json( [('key1','val1'), ('lst1', [1,2])] )\n{'key1': 'val1', 'lst1': [1, 2]}\n\n\u003e Json({'key1': 'val1', 'lst1': [1, 2]})\n{'key1': 'val1', 'lst1': [1, 2]}\n```\n#### Convert to a `dict`\n```python\n\u003e json = Json({'key1': 'val1', 'lst1': [1, 2]})\n\u003e json.toOrig()\n{'key1': 'val1', 'lst1': [1, 2]}\n```\n\n#### Valid Names\n\nAny key can be used as an attribute as long as:\n\n1. 
The key represents a valid attribute (i.e., it is a string composed only of\n   alphanumeric characters and underscores that doesn't start with a number)\n2. The key does not shadow a class attribute (e.g., get).\n\n#### Attributes vs. Keys\nThere is a minor difference between accessing a value as an attribute vs.\naccessing it as a key: when a dict is accessed as an attribute, it will\nautomatically be converted to a `Json` object. This allows you to recursively\naccess keys:\n```python\n\u003e attr = Json({'foo': {'bar': 'baz'}})\n\u003e attr.foo.bar\n'baz'\n```\nRelatedly, by default, sequence types that aren't `bytes`, `str`, or `unicode`\n(e.g., `list`s, `tuple`s) will automatically be converted to `tuple`s, with any\nmappings converted to `Json`:\n```python\n\u003e attr = Json({'foo': [{'bar': 'baz'}, {'bar': 'qux'}]})\n\u003e for sub_attr in attr.foo:\n\u003e     print(sub_attr.bar)\nbaz\nqux\n```\nTo get this recursive functionality for keys that cannot be used as attributes,\nyou can replicate the behavior by using dict syntax on a `Json` object:\n```python\n\u003e json = Json({1: {'two': 3}})\n\u003e json[1].two\n3\n```\n`JsonList` usage examples:\n```\n\u003e json = Json('{\"lst\":[1,2,3]}')\n\u003e type(json.lst)\n\u003cclass 'pyxtension.Json.JsonList'\u003e\n\n\u003e json = Json('{\"1\":[1,2]}')\n\u003e json[\"1\"][1]\n2\n```\n\n\nAssignment as keys will still work:\n```python\n\u003e json = Json({'foo': {'bar': 'baz'}})\n\u003e json['foo']['bar'] = 'baz'\n\u003e json.foo\n{'bar': 'baz'}\n```\n\n### frozendict\n`frozendict` is a simple immutable dictionary: you can't change the internal variables of the class, and they are all immutable objects. 
Reinvoking `__init__` also doesn't alter the object.\n\nThe API is the same as `dict`, without the methods that would mutate the object.\n\n`frozendict` is also hashable and can be used as a key in other dictionaries, provided that all values of the frozendict are also hashable.\n\n```python\n\u003e\u003e\u003e from pyxtension import frozendict\n\n\u003e\u003e\u003e fd = frozendict({\"A\": \"B\", \"C\": \"D\"})\n\u003e\u003e\u003e print(fd)\n{'A': 'B', 'C': 'D'}\n\n\u003e\u003e\u003e fd[\"A\"] = \"C\"\nTypeError: object is immutable\n\n\u003e\u003e\u003e hash(fd)\n-5063792767678978828\n```\n\n### License\npyxtension is released under the MIT license.\nThe idea for the [Json](https://github.com/asuiu/pyxtension/blob/master/Json.py) module was inspired by [addict](https://github.com/mewwts/addict) and [AttrDict](https://github.com/bcj/AttrDict),\nbut it has better performance with lower memory consumption.\n\n### Alternatives\nThere are other libraries that support Fluent Interface streams as alternatives to Pyxtension, though they offer far fewer streaming features:\n- https://pypi.org/project/lazy-streams/\n- https://pypi.org/project/pystreams/\n- https://pypi.org/project/fluentpy/\n- https://github.com/matthagy/scalaps\n- https://pypi.org/project/infixpy/ mentioned [here](https://stackoverflow.com/questions/49001986/left-to-right-application-of-operations-on-a-list-in-python3/62585964?noredirect=1#comment111806251_62585964)\n- https://github.com/sspipe/sspipe\n\n\nand something quite different from the Fluent pattern, which does a kind of piping: https://github.com/sspipe/sspipe and https://github.com/JulienPalard/Pipe\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasuiu%2Fpyxtension","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasuiu%2Fpyxtension","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasuiu%2Fpyxtension/lists"}