https://github.com/multimeric/pipechain
Functional pipelines in Python using method chaining
https://github.com/multimeric/pipechain
Last synced: 9 days ago
JSON representation
Functional pipelines in Python using method chaining
- Host: GitHub
- URL: https://github.com/multimeric/pipechain
- Owner: multimeric
- License: mit
- Created: 2021-10-26T13:52:26.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-24T11:58:34.000Z (about 3 years ago)
- Last Synced: 2024-10-10T09:31:03.978Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 44.9 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 3
-
Metadata Files:
- Readme: readme.ipynb
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PipeChain\n",
"\n",
"## Motivation\n",
"\n",
"PipeChain is a utility library for creating functional pipelines.\n",
"Let's start with a motivating example.\n",
"We have a list of Australian phone numbers from our users.\n",
"We need to clean this data before we insert it into the database.\n",
"With PipeChain, you can do this whole process in one neat pipeline:\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "{'0255507491', '0455505488', '0491570156', '0493225813', '1800975707'}"
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pipechain import PipeChain, PLACEHOLDER as _\n",
"\n",
"nums = [\n",
" \"493225813\",\n",
" \"0491 570 156\",\n",
" \"55505488\",\n",
" \"Barry\",\n",
" \"02 5550 7491\",\n",
" \"491570156\",\n",
" \"\",\n",
" \"1800 975 707\"\n",
"]\n",
"\n",
"PipeChain(\n",
" nums\n",
").pipe(\n",
" # Remove spaces\n",
" map, lambda x: x.replace(\" \", \"\"), _\n",
").pipe(\n",
" # Remove non-numeric entries\n",
" filter, lambda x: x.isnumeric(), _\n",
").pipe(\n",
" # Add the mobile code to the start of 8-digit numbers\n",
" map, lambda x: \"04\" + x if len(x) == 8 else x, _\n",
").pipe(\n",
" # Add the 0 to the start of 9-digit numbers\n",
" map, lambda x: \"0\" + x if len(x) == 9 else x, _\n",
").pipe(\n",
" # Convert to a set to remove duplicates\n",
" set\n",
").eval()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Without PipeChain, we would have to horrifically nest our code, or else use a lot of temporary variables:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "{'0255507491', '0455505488', '0491570156', '0493225813', '1800975707'}"
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(\n",
" map(\n",
" lambda x: \"0\" + x if len(x) == 9 else x,\n",
" map(\n",
" lambda x: \"04\" + x if len(x) == 8 else x,\n",
" filter(\n",
" lambda x: x.isnumeric(),\n",
" map(\n",
" lambda x: x.replace(\" \", \"\"),\n",
" nums\n",
" )\n",
" )\n",
" )\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Installation\n",
"\n",
"```bash\n",
"pip install pipechain\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"### Basic Usage\n",
"\n",
"PipeChain has only two exports: `PipeChain`, and `PLACEHOLDER`.\n",
"\n",
"`PipeChain` is a class that defines a pipeline.\n",
"You create an instance of the class, and then call `.pipe()` to add another function onto the pipeline:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "PipeChain(arg=1, pipes=[functools.partial()])"
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pipechain import PipeChain, PLACEHOLDER\n",
"PipeChain(1).pipe(str)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, you call `.eval()` to run the pipeline and return the result:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "'1'"
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PipeChain(1).pipe(str).eval()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can \"feed\" the pipe at either end, either during construction (`PipeChain(\"foo\")`), or during evaluation `.eval(\"foo\")`:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "'1'"
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PipeChain().pipe(str).eval(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each call to `.pipe()` takes a function, and any additional arguments you provide, both positional and keyword, will be forwarded to the function:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "['c', 'b', 'a']"
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PipeChain([\"b\", \"a\", \"c\"]).pipe(sorted, reverse=True).eval()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Argument Position\n",
"By default, the previous value is passed as the first positional argument to the function:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "8"
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PipeChain(2).pipe(pow, 3).eval()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The only magic here is that if you use the `PLACEHOLDER` variable as an argument to `.pipe()`, then the pipeline will replace it with the output of the previous pipe at runtime:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "9"
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PipeChain(2).pipe(pow, 3, PLACEHOLDER).eval()"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Note that you can rename `PLACEHOLDER` to something more usable using Python's import statement, e.g."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "9"
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pipechain import PLACEHOLDER as _\n",
"PipeChain(2).pipe(pow, 3, _).eval()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Methods\n",
"It might not see like methods will play that well with this pipe convention, but after all, they are just functions.\n",
"You should be able to access any object's method as a function by accessing it on that object's parent class.\n",
"In the below example, `str` is the parent class of \"\":"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "'abc'"
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\"\".join([\"a\", \"b\", \"c\"])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "'abc'"
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PipeChain([\"a\", \"b\", \"c\"]).pipe(str.join, \"\", _).eval()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Operators\n",
"\n",
"The same goes for operators, such as `+`, `*`, `[]` etc.\n",
"We just have to use the `operator` module in the standard library:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "15"
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from operator import add, mul, getitem\n",
"\n",
"PipeChain(5).pipe(mul, 3).eval()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "8"
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PipeChain(5).pipe(add, 3).eval()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": "'b'"
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PipeChain([\"a\", \"b\", \"c\"]).pipe(getitem, 1).eval()"
]
},
{
"cell_type": "markdown",
"source": [
"## Test Suite\n",
"\n",
"Note, you will need poetry installed.\n",
"\n",
"To run the test suite, use:\n",
"\n",
"```bash\n",
"git clone https://github.com/multimeric/PipeChain.git\n",
"cd PipeChain\n",
"poetry install\n",
"poetry run pytest test/test.py\n",
"```"
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 1
}