{"id":15628932,"url":"https://github.com/pwwang/pipda","last_synced_at":"2025-04-15T16:41:29.049Z","repository":{"id":40360036,"uuid":"316357203","full_name":"pwwang/pipda","owner":"pwwang","description":"A framework for data piping in python","archived":false,"fork":false,"pushed_at":"2023-10-10T21:27:55.000Z","size":870,"stargazers_count":37,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T22:34:57.133Z","etag":null,"topics":["data-wrangling","dplyr","pandas","piping","python"],"latest_commit_sha":null,"homepage":"https://pwwang.github.io/pipda/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pwwang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-26T23:34:47.000Z","updated_at":"2024-12-31T19:33:41.000Z","dependencies_parsed_at":"2024-06-18T16:49:25.451Z","dependency_job_id":"c0f51336-e99d-4c55-9303-a43af39dfb04","html_url":"https://github.com/pwwang/pipda","commit_stats":{"total_commits":139,"total_committers":2,"mean_commits":69.5,"dds":0.2517985611510791,"last_synced_commit":"e74af56a2f3e3bd7daacc3c27aca0d8b762e0342"},"previous_names":[],"tags_count":56,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pwwang%2Fpipda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pwwang%2Fpipda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pwwang%2Fpipda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/pwwang%2Fpipda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pwwang","download_url":"https://codeload.github.com/pwwang/pipda/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248694562,"owners_count":21146945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-wrangling","dplyr","pandas","piping","python"],"created_at":"2024-10-03T10:24:52.140Z","updated_at":"2025-04-15T16:41:29.029Z","avatar_url":"https://github.com/pwwang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pipda\n\n[![Pypi][7]][8] [![Github][9]][10] [![PythonVers][11]][8] [![Codacy][16]][14] [![Codacy coverage][15]][14] ![Docs building][13] ![Building][12]\n\nA framework for data piping in Python\n\nInspired by [siuba][1], [dfply][2], [plydata][3] and [dplython][4], but with simple yet powerful APIs to mimic the `dplyr` and `tidyr` packages in Python\n\n[API][17] | [Change Log][18] | [Documentation][19]\n\n## Installation\n\n```shell\npip install -U pipda\n```\n\n## Usage\n\n### Verbs\n\n- A verb is pipeable (able to be called like `data \u003e\u003e verb(...)`)\n- A verb is dispatchable by the type of its first argument\n- A verb evaluates other arguments using the first one\n- A verb passes down its context if none is specified in the arguments\n\n```python\nimport pandas as pd\nfrom pipda import (\n    register_verb,\n    register_func,\n    register_operator,\n    evaluate_expr,\n    Operator,\n    Symbolic,\n    Context\n)\n\nf = Symbolic()\n\ndf = 
pd.DataFrame({\n    'x': [0, 1, 2, 3],\n    'y': ['zero', 'one', 'two', 'three']\n})\n\ndf\n\n#    x      y\n# 0  0   zero\n# 1  1    one\n# 2  2    two\n# 3  3  three\n\n@register_verb(pd.DataFrame)\ndef head(data, n=5):\n    return data.head(n)\n\ndf \u003e\u003e head(2)\n#    x     y\n# 0  0  zero\n# 1  1   one\n\n@register_verb(pd.DataFrame, context=Context.EVAL)\ndef mutate(data, **kwargs):\n    data = data.copy()\n    for key, val in kwargs.items():\n        data[key] = val\n    return data\n\ndf \u003e\u003e mutate(z=1)\n#    x      y  z\n# 0  0   zero  1\n# 1  1    one  1\n# 2  2    two  1\n# 3  3  three  1\n\ndf \u003e\u003e mutate(z=f.x)\n#    x      y  z\n# 0  0   zero  0\n# 1  1    one  1\n# 2  2    two  2\n# 3  3  three  3\n```\n\n### Functions used as verb arguments\n\n```python\n# A verb can be used as an argument passed to another verb\n# dep=True makes the `data` argument implicit, so it is omitted when calling\n@register_verb(pd.DataFrame, context=Context.EVAL, dep=True)\ndef if_else(data, cond, true, false):\n    # Map True to `true` and False to `false` without modifying `cond` in place\n    return cond.map({True: true, False: false})\n\n# The function is then also a singledispatch generic function\n\ndf \u003e\u003e mutate(z=if_else(f.x\u003e1, 20, 10))\n#    x      y   z\n# 0  0   zero  10\n# 1  1    one  10\n# 2  2    two  20\n# 3  3  three  20\n```\n\n```python\n# A function without a data argument\n@register_func\ndef length(strings):\n    return [len(s) for s in strings]\n\ndf \u003e\u003e mutate(z=length(f.y))\n#    x      y  z\n# 0  0   zero  4\n# 1  1    one  3\n# 2  2    two  3\n# 3  3  three  5\n```\n\n### Context\n\nThe context defines how a reference (`f.A`, `f['A']`, `f.A.B`) is evaluated. For example, under `Context.SELECT` a reference such as `f.x` evaluates to the column name `'x'`, whereas under `Context.EVAL` (as in `mutate` above) it evaluates to the column data.\n\n```python\n@register_verb(pd.DataFrame, context=Context.SELECT)\ndef select(df, *columns):\n    return df[list(columns)]\n\ndf \u003e\u003e select(f.x, f.y)\n#    x      y\n# 0  0   zero\n# 1  1    one\n# 2  2    two\n# 3  3  three\n```\n\n## How it works\n\n```R\ndata 
%\u003e% verb(arg1, ..., key1=kwarg1, ...)\n```\n\nThe above is a typical `dplyr`/`tidyr` data piping syntax.\n\nThe counterpart Python syntax we want is:\n\n```python\ndata \u003e\u003e verb(arg1, ..., key1=kwarg1, ...)\n```\n\nTo implement this, we defer the execution of `verb` by turning the call into a `Verb` object, which holds everything needed to execute the function later. The `Verb` object is not executed until `data` is piped in. This is made possible by the [`executing`][5] package, which lets us locate the AST node where the function is called, so that we can tell whether the function is being called in piping mode.\n\nIf an argument refers to a column of the data and that column is involved in the later computation, then the argument also needs to be deferred. For example, with `dplyr` in `R`:\n\n```R\ndata %\u003e% mutate(z=a)\n```\n\nadds a column named `z` with the data from column `a`.\n\nIn Python, we want to do the same with:\n\n```python\ndata \u003e\u003e mutate(z=f.a)\n```\n\nwhere `f.a` is a `Reference` object that carries the column information without fetching the data, even though Python evaluates the expression immediately.\n\nThe trick here is `f`. Like other packages, we introduce the `Symbolic` object, which connects the parts of the argument and turns the whole argument into an `Expression` object. 
This object holds the execution information, which is used later once piping is detected.\n\n## Documentation\n\n[https://pwwang.github.io/pipda/][19]\n\nSee also [datar][6] for real-world usage.\n\n[1]: https://github.com/machow/siuba\n[2]: https://github.com/kieferk/dfply\n[3]: https://github.com/has2k1/plydata\n[4]: https://github.com/dodger487/dplython\n[5]: https://github.com/alexmojaki/executing\n[6]: https://github.com/pwwang/datar\n[7]: https://img.shields.io/pypi/v/pipda?style=flat-square\n[8]: https://pypi.org/project/pipda/\n[9]: https://img.shields.io/github/v/tag/pwwang/pipda?style=flat-square\n[10]: https://github.com/pwwang/pipda\n[11]: https://img.shields.io/pypi/pyversions/pipda?style=flat-square\n[12]: https://img.shields.io/github/actions/workflow/status/pwwang/pipda/build.yml?label=CI\u0026style=flat-square\n[13]: https://img.shields.io/github/actions/workflow/status/pwwang/pipda/docs.yml?label=docs\u0026style=flat-square\n[14]: https://app.codacy.com/gh/pwwang/pipda/dashboard\n[15]: https://img.shields.io/codacy/coverage/75d312da24c94bdda5923627fc311a99?style=flat-square\n[16]: https://img.shields.io/codacy/grade/75d312da24c94bdda5923627fc311a99?style=flat-square\n[17]: https://pwwang.github.io/pipda/api/pipda/\n[18]: https://pwwang.github.io/pipda/CHANGELOG/\n[19]: https://pwwang.github.io/pipda/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpwwang%2Fpipda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpwwang%2Fpipda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpwwang%2Fpipda/lists"}