{"id":19341872,"url":"https://github.com/matchory/data-pipe","last_synced_at":"2026-05-16T04:32:19.909Z","repository":{"id":62524858,"uuid":"380965373","full_name":"matchory/data-pipe","owner":"matchory","description":"An opinionated framework for building data enrichment pipelines in PHP","archived":false,"fork":false,"pushed_at":"2021-09-23T14:51:54.000Z","size":104,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-06T11:45:18.421Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matchory.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-28T08:47:18.000Z","updated_at":"2024-05-15T11:57:03.000Z","dependencies_parsed_at":"2022-11-02T13:47:44.250Z","dependency_job_id":null,"html_url":"https://github.com/matchory/data-pipe","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matchory%2Fdata-pipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matchory%2Fdata-pipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matchory%2Fdata-pipe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matchory%2Fdata-pipe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matchory","download_url":"https://codeload.github.com/matchory/data-pipe/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_co
unt":240449497,"owners_count":19803120,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T03:32:55.662Z","updated_at":"2026-05-16T04:32:14.890Z","avatar_url":"https://github.com/matchory.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"Data Pipe [![Latest Stable Version](http://poser.pugx.org/matchory/data-pipe/v)](https://packagist.org/packages/matchory/data-pipe) [![Total Downloads](http://poser.pugx.org/matchory/data-pipe/downloads)](https://packagist.org/packages/matchory/data-pipe) [![Latest Unstable Version](http://poser.pugx.org/matchory/data-pipe/v/unstable)](https://packagist.org/packages/matchory/data-pipe) [![License](http://poser.pugx.org/matchory/data-pipe/license)](https://packagist.org/packages/matchory/data-pipe)\n=========\n\u003e An opinionated framework for building data enrichment pipelines in PHP\n\nData Pipe is a framework to create data enrichment pipelines in PHP. 
Such an application works by taking a piece of information, enriching it with additional\ndata, and enhancing that data by applying transformations to it.\n\nAs a more tangible example, take a _customer_ pipeline: It ingests the name of a customer, retrieves their _shopping history_ and _age_, then enhances the\nrecord by removing old items from the shopping history, and assigning a targeting group to the customer.\n\nWhile that, of course, merely describes some arbitrary business logic, Data Pipe helps you to describe this process with a set of reusable, composable, and\nencapsulated steps!\n\nPreface\n-------\nPlease note that this package is still **under active development** and **NOT ready** to be used in production environments yet. We're still building our own\nworkflow on top of data-pipe, so everything is subject to change until the 1.0 release. If you're interested in shaping the future of this library, you're very\nwelcome to jump in!\n\nInstallation\n------------\nInstall the library as a dependency using Composer:\n```bash\ncomposer require matchory/data-pipe\n```\n\n### Symfony Usage\nThis package includes a Symfony integration. Please [read the instructions](./src/Integration/Symfony/README.md) to get started.  \nThe integration will add fully automatic pipeline configuration to your app.\n\n### Laravel Usage\nThis package includes an **incomplete** Laravel integration. Please [read the instructions](./src/Integration/Laravel/README.md) to get started.\n\n\u003e **Note:**\n\u003e We haven't implemented Laravel support yet, because we don't currently need it. 
If you're interested in using `data-pipe` within a Laravel application, and would\n\u003e like to have automatic pipeline configuration as with Symfony, please [open an issue](https://github.com/matchory/data-pipe/issues).\n\n\nUsage\n-----\n\u003e **Note:** Before getting started with Data Pipe, you should familiarize\n\u003e yourself with [its core concepts](#core-concepts).\n\nData Pipe works by setting up pipelines with a pre-configured set of inter-dependent nodes. There are currently two types of\nnodes: [Collector nodes](#collector-nodes) and [Transformer nodes](#transformer-nodes) (which are both variants of generic pipeline nodes).  \nNodes take a payload object, modify it, and return it. Collector nodes add new data, while transformer nodes transform existing values. This distinction might seem\nirrelevant, but it allows for many runtime optimizations.\n\n### Creating nodes\nIn its simplest form, a collector node might look like this:\n```php\nuse Matchory\\DataPipe\\Nodes\\AbstractCollector as Node;\nuse Matchory\\DataPipe\\PipelineContext;\n\nclass MyNode extends Node\n{\n    public function __construct(protected $yourInternalAgeApi) {}\n\n    public function pipe(PipelineContext $context): PipelineContext\n    {\n        // Work with the data payload\n        $email = $context-\u003egetPayload()-\u003egetAttribute('email');\n        \n        // Perform domain-specific work\n        $age = $this-\u003eyourInternalAgeApi-\u003equery($email);\n        \n        // Propose the change instead of updating the payload directly\n        if ($age) {\n            $context-\u003eproposeChange($this, 'age', $age);\n        }\n        \n        return $context;\n    }\n}\n```\n\n### Proposing changes\nNote that you cannot directly update the payload: Every node receives just a clone of the actual payload. Instead, you can _propose_ a change to the payload.\nData Pipe provides a simple algorithm for\n[best-fit change application](#best-fit-change-application). 
This allows you to keep and compare multiple values for a single attribute.\n\n### Creating pipelines\nNow that we have a node, let's create a pipeline to add it to:\n```php\nuse Matchory\\DataPipe\\Payload\\Payload;\nuse Matchory\\DataPipe\\Pipeline;\nuse Symfony\\Component\\EventDispatcher\\EventDispatcher;\n\n$nodes = [\n    // $yourInternalAgeApi is your own API client, required by MyNode's constructor\n    new MyNode($yourInternalAgeApi),\n];\n$eventDispatcher = new EventDispatcher();\n$pipeline = new Pipeline($nodes, $eventDispatcher);\n\nfunction fetchNextPayload(): Generator {\n    yield new Payload([\n        'email' =\u003e 'foo@bar.com'\n    ]);\n}\n\n$pipeline-\u003eprocess(fetchNextPayload());\n```\n\n### DI usage\nThis is a contrived example, of course; in reality, a dependency-injection container would handle almost everything for you:\n```php\nuse Matchory\\DataPipe\\Pipeline;\n\nclass EntryPoint {\n    public function main(Pipeline $pipeline, Generator $recordFetcher): void\n    {\n        foreach ($recordFetcher as $record) {\n            $pipeline-\u003eprocess($recordFetcher);\n        }\n    }\n}\n```\n\nCore Concepts\n-------------\nData Pipe uses a few building blocks to structure your pipelines.\n\n### Pipeline nodes\nNodes are the stages forming a pipeline. They can depend on other nodes having been executed previously; these dependencies are resolved before the\npipeline runs, so you don't have to define an order manually. Every payload processed by the pipeline will be piped to all nodes in it, each having the option\nto suggest changes to the data.  \nThere are two types of nodes currently:\n\n#### Collector nodes\nNodes that enhance a record with additional information are called _collector nodes_. These nodes may optionally define a _cost_, which is used to order them\nand to determine whether executing additional nodes is even necessary.  \nImagine you have two data sources -- your own, internal database, and an external system that charges per API call. The node for your database will have a lower\ncost than that of the external API. 
Now, if we're looking for a piece of information, we'll first execute the \"cheaper\" node (your internal database), then,\n_only if it can't satisfy our request_, we'll also execute the more expensive node.\n\nThe more nodes you have, the more apparent the advantage of granular costs will be: Information will always be acquired with the cheapest means possible.\n\n#### Transformer nodes\nTransformer nodes allow you to refine, modify, or compare previously gathered information. They differ from collector nodes in that they're typically\nexecuted _after_ those nodes.\n\n### Best-Fit change application\nThe more data sources you have, the more variants of a piece of information you will collect. The difficulty lies in determining the _best_ of those variants;\nthink of an email address, for example:\n\n- dxdtnfa1n5@privaterelay.appleid.com\n- foobar@trashmail.to\n- john.doe@company.com\n- john.doe+yourdomain.tld@gmail.com\n- john.doe@gmail.com\n\nDepending on a few rules, you're probably able to infer which is the closest variant to what you're looking for. Now, to keep a sequence of nodes from\noverriding each other's results, they can _suggest changes_ instead of setting an attribute on the payload:\n\n```php\n$context-\u003eproposeChange($this, 'attribute_name', 42);\n```\n\nAll nodes may propose changes to existing data, along with an optional _confidence score_: In the email case above, for example, we'd probably have a grey-list\nof trashmail domains, and assign that address a low confidence score. The idea here is: _take that email if nothing better can be found later on_.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatchory%2Fdata-pipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatchory%2Fdata-pipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatchory%2Fdata-pipe/lists"}