{"id":13571175,"url":"https://github.com/calebwin/pipelines","last_synced_at":"2025-04-09T18:19:44.235Z","repository":{"id":100568336,"uuid":"163543441","full_name":"calebwin/pipelines","owner":"calebwin","description":"An experimental programming language for data flow","archived":false,"fork":false,"pushed_at":"2019-10-18T20:14:14.000Z","size":1133,"stargazers_count":374,"open_issues_count":2,"forks_count":9,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-04-09T18:19:30.838Z","etag":null,"topics":["compiler","language","nim","parallel","pipeline","pipelines","python"],"latest_commit_sha":null,"homepage":"","language":"Nim","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/calebwin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-12-29T21:53:12.000Z","updated_at":"2025-02-08T12:34:34.000Z","dependencies_parsed_at":"2023-05-15T23:45:27.598Z","dependency_job_id":null,"html_url":"https://github.com/calebwin/pipelines","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/calebwin%2Fpipelines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/calebwin%2Fpipelines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/calebwin%2Fpipelines/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/calebwin%2Fpipelines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/calebwin","download_url":"https://codeload.github.com/calebwin/pipeline
s/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248085325,"owners_count":21045139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compiler","language","nim","parallel","pipeline","pipelines","python"],"created_at":"2024-08-01T14:00:59.576Z","updated_at":"2025-04-09T18:19:44.201Z","avatar_url":"https://github.com/calebwin.png","language":"Nim","readme":"\u003cp align=\"center\"\u003e\n\u003c!-- \u003cimg width=\"250px\" src=\"https://i.imgur.com/kTap42K.png\"/\u003e --\u003e\n    \u003cimg height=\"90px\" src=\"https://i.imgur.com/rbx2Hlh.png\"/\u003e\u003c!--https://i.imgur.com/YfK7YdY.png--\u003e\n\u003c/p\u003e\n\u003c!--- https://i.imgur.com/rbx2Hlh.png or https://i.imgur.com/YfK7YdY.png) ---\u003e\n\u003c!--- 
https://carbon.now.sh/?bg=rgba(239%2C228%2C176%2C1)\u0026t=zenburn\u0026wt=none\u0026l=python\u0026ds=true\u0026dsyoff=20px\u0026dsblur=68px\u0026wc=false\u0026wa=true\u0026pv=56px\u0026ph=56px\u0026ln=false\u0026fm=Ubuntu%20Mono\u0026fs=17px\u0026lh=136%25\u0026si=false\u0026code=from%2520utils%2520import%2520customers%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520as%2520customers%2520%2523%2520a%2520generator%2520function%2520in%2520the%2520utils%2520module%250Afrom%2520utils%2520import%2520parse_row%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520as%2520parser%250Afrom%2520utils%2520import%2520get_recommendations%2520%2520%2520as%2520recommender%250Afrom%2520utils%2520import%2520print_recommendations%2520as%2520printer%250A%250Acustomers%2520%257C%253E%2520parser%2520%257C%253E%2520recommender%2520%257C%253E%2520printer\u0026es=2x\u0026wm=false ---\u003e\n\nPipelines is a language and runtime for crafting massively parallel pipelines. Unlike other languages for defining data flow, the Pipeline language requires implementation of components to be defined separately in the Python scripting language. This allows the details of implementations to be separated from the structure of the pipeline, while providing access to thousands of active libraries for machine learning, data analysis and processing. 
Skip to [Getting Started](https://github.com/calebwin/pipelines#getting-started) to install the Pipeline compiler.\n\n### An example\n\nAs an introductory example, a simple pipeline for Fizz Buzz on even numbers could be written as follows -\n\n```python\nfrom fizzbuzz import numbers\nfrom fizzbuzz import even\nfrom fizzbuzz import fizzbuzz\nfrom fizzbuzz import printer\n\nnumbers\n/\u003e even \n|\u003e fizzbuzz where (number=*, fizz=\"Fizz\", buzz=\"Buzz\")\n|\u003e printer\n```\n\nMeanwhile, the implementation of the components would be written in Python -\n\n```python\ndef numbers():\n    for number in range(1, 100):\n        yield number\n\ndef even(number):\n    return number % 2 == 0\n\ndef fizzbuzz(number, fizz, buzz):\n    if number % 15 == 0: return fizz + buzz\n    elif number % 3 == 0: return fizz\n    elif number % 5 == 0: return buzz\n    else: return number\n\ndef printer(number):\n    print(number)\n```\n\nRunning the Pipeline document would safely execute each component of the pipeline in parallel and output the expected result.\n\n### The imports\n\nComponents are scripted in Python and linked into a pipeline using imports. The syntax for an import has 3 parts - (1) the path to the module, (2) the name of the function, and (3) the alias for the component. Here's an example -\n```python\nfrom parser import parse_fasta as parse\n```\nThat's really all there is to imports. Once a component is imported, it can be referenced anywhere in the document by its alias.\n\n### The stream\n\nEvery pipeline operates on a stream of data. The stream of data is created by a Python [generator](https://docs.python.org/3/tutorial/classes.html#generators). 
The following is an example of a generator that generates a stream of numbers from 0 to 999.\n```python\ndef numbers():\n    for number in range(0, 1000):\n        yield number\n```\nHere's a generator that reads entries from a file.\n```python\ndef customers():\n    # the with block closes the file once the stream is exhausted\n    with open(\"customers.csv\", 'r') as f:\n        for line in f:\n            yield line\n```\nThe first component in a pipeline is always the generator. The generator is run in parallel with all other components and each element of data is passed through the other components.\n```python\nfrom utils import customers             as customers # a generator function in the utils module\nfrom utils import parse_row             as parser\nfrom utils import get_recommendations   as recommender\nfrom utils import print_recommendations as printer\n\ncustomers |\u003e parser |\u003e recommender |\u003e printer\n```\n\n### The pipes\n\nPipes are what connect components together to form a pipeline. As of now, there are 2 types of pipes in the Pipeline language - (1) transformer pipes, and (2) filter pipes. Transformer pipes are used when input is to be passed through a component. For example, a function can be defined to determine the potential of a particle and a function can be defined to print the potential.\n```python\nparticles |\u003e get_potential |\u003e printer\n```\nThe above pipeline code would pass data from the stream generated by `particles` through `get_potential` and then the output of `get_potential` through `printer`. Filter pipes work similarly except they use the following component to filter data. 
For example, a function can be defined to determine if a person is over 50 and then print their names to a file.\n```python\npopulation /\u003e over_50 |\u003e printer\n```\nThis would use the function referenced by `over_50` to filter out data from the stream generated by `population` and then pass output to `printer`.\n\n### The `where` keyword\n\nThe `where` keyword lets you pass multiple parameters to a component, as opposed to just the output of the previous component. For example, a function can be defined to print to a file the names of all applicants under a certain age.\n```python\napplicants\n|\u003e printer where (person=*, age_limit=21)\n```\nThis could be done using a filter as well.\n```python\napplicants\n/\u003e age_limit where (person=*, age=21)\n|\u003e printer\n```\nIn this case, the function for `age_limit` could look something like this -\n```python\ndef age_limit(person, age):\n    return person.age \u003c= age\n```\nNote that this function still has just one return value - the boolean expression that is used to determine whether input to the component is passed on as output.\n\n### The `to` keyword\nThe `to` keyword is for when the previous component has multiple return values and you want to specify which ones to pass on to the next component. As an example, if you had a function for calculating the electronegativity and electron affinity of an atom, you could use it in a pipeline as follows -\n```python\natoms\n|\u003e calculator to (electronegativity, electron_affinity)\n|\u003e printer where (line=electronegativity)\n```\nHere's an example using a filter.\n```python\natoms\n/\u003e below where (atom=*, limit=2) to (is_below, electronegativity, electron_affinity) with is_below\n|\u003e printer where (line=electronegativity)\n```\nNote the use of the `with` keyword here. 
This is necessary for filters to specify which return value of the function is used to filter out elements in the stream.\n\n### Getting started\nAll you need to get started is the Pipelines compiler. You can install it by downloading the executable from [Releases](https://github.com/calebwin/pipelines/releases).\n\u003e If you have the [Nimble](https://github.com/nim-lang/nimble/) package manager installed and `~/.nimble/bin` permanently added to your PATH environment variable (look this up if you don't know how to do this), you can also install by running the following command.\n\u003e ```\n\u003e nimble install pipelines\n\u003e ```\nPipelines' only dependency is [the Python interpreter](https://www.python.org/downloads/release/python-2715/) being installed on your system. At the moment, most versions 2.7 and earlier are supported and support for Python 3 is in the works. Once Pipelines is installed and added to your PATH, you can create a `.pipeline` file and run or compile it from anywhere on your system -\n```console\n$ pipelines\nthe .pipeline compiler (v:0.1.0)\n\nusage:\n  pipelines                Show this\n  pipelines \u003cfile\u003e         Compile .pipeline file\n  pipelines \u003cfolder\u003e       Compile all .pipeline files in folder\n  pipelines run \u003cfile\u003e     Run .pipeline file\n  pipelines clean \u003cfolder\u003e Remove all compiled .py files from folder\n\nfor more info, go to github.com/calebwin/pipelines\n```\n\n### Some next steps\n\nThere are several things I'm hoping to implement in the future for this project. I'm hoping to implement some sort of `and` operator for piping data from the stream into multiple components in parallel with the output ending up in the stream in a nondeterministic order. 
Further down the line, I plan on porting the whole thing to C and putting in a complete error handling system.\n\u003c!---\n- String imports\n- Control allocation of processes with Pool\n- Use Pipe instead of multiple Queue\n- Only have num_cpus running at one time\n---\u003e\n","funding_links":[],"categories":["Nim"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcalebwin%2Fpipelines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcalebwin%2Fpipelines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcalebwin%2Fpipelines/lists"}