{"id":13616681,"url":"https://github.com/code-rhapsodie/dataflow-bundle","last_synced_at":"2025-04-13T11:06:23.261Z","repository":{"id":52682228,"uuid":"214142947","full_name":"code-rhapsodie/dataflow-bundle","owner":"code-rhapsodie","description":"Data processing framework inspired by PortPHP","archived":false,"fork":false,"pushed_at":"2024-10-31T15:40:49.000Z","size":285,"stargazers_count":17,"open_issues_count":2,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-13T11:05:52.886Z","etag":null,"topics":["database","dataflow","dataflow-bundle","frame","php","portphp","readers","symfony-bundle","writer"],"latest_commit_sha":null,"homepage":"https://www.code-rhapsodie.fr/","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/code-rhapsodie.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-10T09:34:48.000Z","updated_at":"2025-04-09T10:14:51.000Z","dependencies_parsed_at":"2023-11-16T16:49:16.693Z","dependency_job_id":"77f97100-32db-4f46-b4cd-ebb001fa0f68","html_url":"https://github.com/code-rhapsodie/dataflow-bundle","commit_stats":{"total_commits":46,"total_committers":7,"mean_commits":6.571428571428571,"dds":0.4782608695652174,"last_synced_commit":"25b2e9ec0f3c5d71f203706e292608f2d055c13f"},"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-rhapsodie%2Fdataflow-bundle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-r
hapsodie%2Fdataflow-bundle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-rhapsodie%2Fdataflow-bundle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-rhapsodie%2Fdataflow-bundle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/code-rhapsodie","download_url":"https://codeload.github.com/code-rhapsodie/dataflow-bundle/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248703200,"owners_count":21148118,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","dataflow","dataflow-bundle","frame","php","portphp","readers","symfony-bundle","writer"],"created_at":"2024-08-01T20:01:31.877Z","updated_at":"2025-04-13T11:06:23.218Z","avatar_url":"https://github.com/code-rhapsodie.png","language":"PHP","readme":"# Code Rhapsodie Dataflow Bundle\n\nDataflowBundle is a bundle for Symfony 3.4+\nproviding an easy way to create import / export dataflow.\n\n| Dataflow | Symfony                  | Support |\n|----------|--------------------------|---------|\n| 5.x      | 7.x                      | yes     |\n| 4.x      | 3.4 \\| 4.x \\| 5.x \\| 6.x | yes     |\n| 3.x      | 3.4 \\| 4.x \\| 5.x        | no      |\n| 2.x      | 3.4 \\| 4.x               | no      |\n| 1.x      | 3.4 \\| 4.x               | no      |\n\nDataflow uses a linear generic workflow in three parts:\n\n* one reader\n* any number of steps that can be synchronous or asynchronous\n* one or more writers\n\nThe reader can read data from anywhere and 
return data row by row. Each step processes the current row data.\nThe steps are executed in the order in which they are added.\nAnd, one or more writers save the row anywhere you want.\n\nAs the following schema shows, you can define more than one dataflow:\n\n![Dataflow schema](src/Resources/doc/schema.png)\n\n# Features\n\n* Define and configure a Dataflow\n* Run the Job scheduled\n* Run one Dataflow from the command line\n* Define the schedule for a Dataflow from the command line\n* Enable/Disable a scheduled Dataflow from the command line\n* Display the list of scheduled Dataflow from the command line\n* Display the result for the last Job for a Dataflow from the command line\n* Work with multiple Doctrine DBAL connections\n\n## Installation\n\nSecurity notice: Symfony 4.x is not supported before 4.1.12, see https://github.com/advisories/GHSA-pgwj-prpq-jpc2\n\n### Add the dependency\n\nTo install this bundle, run this command :\n\n```shell script\n$ composer require code-rhapsodie/dataflow-bundle\n```\n\n#### Suggest\n\nYou can use the generic readers, writers and steps from [PortPHP](https://github.com/portphp/portphp).\n\nFor the writers, you must use the adapter `CodeRhapsodie\\DataflowBundle\\DataflowType\\Writer\\PortWriterAdapter` like\nthis:\n\n```php\n\u003c?php\n// ...\n$streamWriter = new \\Port\\Writer\\StreamMergeWriter();\n\n$builder-\u003eaddWriter(new \\CodeRhapsodie\\DataflowBundle\\DataflowType\\Writer\\PortWriterAdapter($streamWriter));\n// ...\n```\n\n### Register the bundle\n\nAdd `CodeRhapsodie\\DataflowBundle\\CodeRhapsodieDataflowBundle::class =\u003e ['all' =\u003e true],\n` in the `config/bundles.php` file.\n\nLike this:\n\n```php\n\u003c?php\n\nreturn [\n     // ...\n    CodeRhapsodie\\DataflowBundle\\CodeRhapsodieDataflowBundle::class =\u003e ['all' =\u003e true],\n    // ...\n];\n```\n\n### Update the database\n\nThis bundle uses Doctrine DBAL to store Dataflow schedule into the database table (`cr_dataflow_scheduled`)\nand jobs 
(`cr_dataflow_job`).\n\nIf you use [Doctrine Migration Bundle](https://symfony.com/doc/master/bundles/DoctrineMigrationsBundle/index.html),\n[Phinx](https://phinx.org/),\n[Kaliop Migration Bundle](https://github.com/kaliop-uk/ezmigrationbundle) or any other migration tool,\nyou can add a new migration with the SQL queries generated by this command:\n\n```shell script\n$ bin/console code-rhapsodie:dataflow:dump-schema\n```\n\nIf the tables already exist, you can add a new migration with the update SQL queries generated by this command:\n\n```shell script\n$ bin/console code-rhapsodie:dataflow:dump-schema --update\n```\n\n## Configuration\n\nBy default, the Doctrine DBAL connection used is `default`. To use a different connection,\nadd this to your Symfony configuration:\n\n```yaml\ncode_rhapsodie_dataflow:\n  dbal_default_connection: test #Name of the default connection used by Dataflow bundle\n```\n\nBy default, the `logger` service will be used to log all exceptions and custom messages.\nIf you want to use another logger, like a specific Monolog handler, add this configuration:\n\n```yaml\ncode_rhapsodie_dataflow:\n  default_logger: monolog.logger.custom #Service ID of the logger you want Dataflow to use\n```\n\n### Messenger mode\n\nDataflow can delegate the execution of its jobs to the Symfony messenger component, if available.\nThis allows jobs to be executed concurrently by workers instead of sequentially.\n\nTo enable messenger mode:\n\n```yaml\ncode_rhapsodie_dataflow:\n  messenger_mode:\n    enabled: true\n    # bus: 'messenger.default_bus' #Service ID of the bus you want Dataflow to use, if not the default one\n```\n\nYou also need to route Dataflow messages to the proper transport:\n\n```yaml\n# config/packages/messenger.yaml\nframework:\n  messenger:\n    transports:\n      async: '%env(MESSENGER_TRANSPORT_DSN)%'\n\n    routing:\n      CodeRhapsodie\\DataflowBundle\\MessengerMode\\JobMessage: async\n```\n\n## Define a dataflow 
type\n\nThis bundle uses a fixed and simple workflow structure so that you can focus on the data processing logic of\nyour dataflow.\n\nA dataflow type defines the different parts of your dataflow. A dataflow is made of:\n\n- exactly one *Reader*\n- any number of *Steps*\n- one or more *Writers*\n\nDataflow types can be configured with options.\n\nA dataflow type must implement `CodeRhapsodie\\DataflowBundle\\DataflowType\\DataflowTypeInterface`.\n\nTo help with creating your dataflow types, an abstract\nclass `CodeRhapsodie\\DataflowBundle\\DataflowType\\AbstractDataflowType`\nis provided, allowing you to define your dataflow through a handy\nbuilder `CodeRhapsodie\\DataflowBundle\\DataflowType\\DataflowBuilder`.\n\nHere is an example of a dataflow type class:\n\n```php\n\u003c?php\nnamespace CodeRhapsodie\\DataflowExemple\\DataflowType;\n\nuse CodeRhapsodie\\DataflowBundle\\DataflowType\\AbstractDataflowType;\nuse CodeRhapsodie\\DataflowBundle\\DataflowType\\DataflowBuilder;\nuse CodeRhapsodie\\DataflowExemple\\Reader\\FileReader;\nuse CodeRhapsodie\\DataflowExemple\\Writer\\FileWriter;\nuse Symfony\\Component\\OptionsResolver\\OptionsResolver;\n\nclass MyFirstDataflowType extends AbstractDataflowType\n{\n    private $myReader;\n\n    private $myWriter;\n\n    public function __construct(FileReader $myReader, FileWriter $myWriter)\n    {\n        $this-\u003emyReader = $myReader;\n        $this-\u003emyWriter = $myWriter;\n    }\n\n    protected function buildDataflow(DataflowBuilder $builder, array $options): void\n    {\n        $this-\u003emyWriter-\u003esetDestinationFilePath($options['to-file']);\n\n        $builder\n            -\u003esetReader($this-\u003emyReader-\u003eread($options['from-file']))\n            -\u003eaddStep(function ($data) use ($options) {\n                // TODO : Write your code here...\n                return $data;\n            })\n            -\u003eaddWriter($this-\u003emyWriter)\n        ;\n    }\n\n    protected function configureOptions(OptionsResolver 
$optionsResolver): void\n    {\n        $optionsResolver-\u003esetDefaults(['to-file' =\u003e '/tmp/dataflow.csv', 'from-file' =\u003e null]);\n        $optionsResolver-\u003esetRequired('from-file');\n    }\n\n    public function getLabel(): string\n    {\n        return 'My First Dataflow';\n    }\n\n    public function getAliases(): iterable\n    {\n        return ['mfd'];\n    }\n}\n\n```\n\nDataflow types must be tagged with `coderhapsodie.dataflow.type`.\n\nIf you're using Symfony auto-configuration for your services, this tag will be automatically added to all services\nimplementing `DataflowTypeInterface`.\n\nOtherwise, manually add the tag `coderhapsodie.dataflow.type` in your dataflow type service configuration:\n\n```yaml\nCodeRhapsodie\\DataflowExemple\\DataflowType\\MyFirstDataflowType:\n  tags:\n    - { name: coderhapsodie.dataflow.type }\n```\n\n### Use options for your dataflow type\n\nThe `AbstractDataflowType` can help you define options for your Dataflow type.\n\nAdd this method in your DataflowType class:\n\n```php\n\u003c?php\n// ...\nuse Symfony\\Component\\OptionsResolver\\OptionsResolver;\n\nclass MyFirstDataflowType extends AbstractDataflowType\n{\n    // ...\n    protected function configureOptions(OptionsResolver $optionsResolver): void\n    {\n        $optionsResolver-\u003esetDefaults(['to-file' =\u003e '/tmp/dataflow.csv', 'from-file' =\u003e null]);\n        $optionsResolver-\u003esetRequired('from-file');\n    }\n\n}\n```\n\nWith this configuration, the option `from-file` is required. For advanced usage of the option resolver, read\nthe [Symfony documentation](https://symfony.com/doc/current/components/options_resolver.html).\n\nFor asynchronous management, `AbstractDataflowType` comes with two default options:\n\n- loopInterval: defaults to 0. Update this interval if you wish to customise the `tick` loop duration.\n- emitInterval: defaults to 0. 
Update this interval to have a control when reader must emit new data in the flow\n  pipeline.\n\n### Logging\n\nAll exceptions will be caught and written in the logger.\nIf you want to add custom messages in the log, you can inject the logger in your readers / steps / writers.\nIf your DataflowType class extends `AbstractDataflowType`, the logger is accessible as `$this-\u003elogger`.\n\n```php\n\u003c?php\n// ...\nuse Symfony\\Component\\OptionsResolver\\OptionsResolver;\n\nclass MyDataflowType extends AbstractDataflowType\n{\n    // ...\n    protected function buildDataflow(DataflowBuilder $builder, array $options): void\n    {\n        $this-\u003emyWriter-\u003esetLogger($this-\u003elogger);\n    }\n\n}\n```\n\nWhen using the `code-rhapsodie:dataflow:run-pending` command, this logger will also be used to save the log in the\ncorresponding job in the database.\n\n### Check if your DataflowType is ready\n\nExecute this command to check if your DataflowType is correctly registered:\n\n```shell script\n$ bin/console debug:container --tag coderhapsodie.dataflow.type\n```\n\nThe result is like this:\n\n```\nSymfony Container Public and Private Services Tagged with \"coderhapsodie.dataflow.type\" Tag\n===========================================================================================\n\n ---------------------------------------------------------------- ---------------------------------------------------------------- \n  Service ID                                                       Class name                                                      \n ---------------------------------------------------------------- ---------------------------------------------------------------- \n  CodeRhapsodie\\DataflowExemple\\DataflowType\\MyFirstDataflowType   CodeRhapsodie\\DataflowExemple\\DataflowType\\MyFirstDataflowType  \n ---------------------------------------------------------------- ---------------------------------------------------------------- \n\n```\n\n### 
Readers\n\n*Readers* provide the dataflow with elements to import / export. Usually, elements are read from an external resource\n(file, database, web service, etc.).\n\nA *Reader* can be any `iterable`.\n\nThe only constraint on the type of the returned elements is that they cannot be `false`.\n\nThe reader can be a generator, as in this example:\n\n```php\n\u003c?php\n\nnamespace CodeRhapsodie\\DataflowExemple\\Reader;\n\nclass FileReader\n{\n    public function read(string $filename): iterable\n    {\n        if (!$filename) {\n            throw new \\Exception(\"The file name is not defined. Pass it to the 'read' method.\");\n        }\n\n        if (!$fh = fopen($filename, 'r')) {\n            throw new \\Exception(\"Unable to open file '\".$filename.\"' for reading.\");\n        }\n\n        // Yield one row per line, split on the '|' separator\n        while (false !== ($read = fgets($fh))) {\n            yield explode('|', trim($read));\n        }\n\n        fclose($fh);\n    }\n}\n```\n\nYou can set up this reader as follows:\n\n```php\n$builder-\u003esetReader($this-\u003emyReader-\u003eread($options['from-file']));\n```\n\n### Steps\n\n*Steps* are operations performed on the elements before they are handled by the *Writers*. 
Usually, steps are either:\n\n- converters, that alter the element\n- filters, that conditionally prevent further operations on the element\n- generators, that can include asynchronous operations\n\nA *Step* can be any callable, taking the element as its argument, and returning either:\n\n- the element, possibly altered\n- `false`, if no further operations should be performed on this element\n\nA few examples:\n\n```php\n\u003c?php\n//[...]\n$builder-\u003eaddStep(function ($item) {\n    // Titles are changed to all caps before export\n    $item['title'] = strtoupper($item['title']);\n\n    return $item;\n});\n\n// asynchronous step with a scale factor of 2\n$builder-\u003eaddStep(function ($item): \\Generator {\n    yield new \\Amp\\Delayed(1000); // asynchronous processing lasting 1 second\n\n    // Titles are changed to lowercase before export\n    $item['title'] = strtolower($item['title']);\n\n    return $item;\n}, 2);\n\n$builder-\u003eaddStep(function ($item) {\n    // Private items are not exported\n    if ($item['private']) {\n        return false;\n    }\n\n    return $item;\n});\n//[...]\n```\n\nNote: you can guarantee the writing order for asynchronous operations by giving every step a scale factor of 1.\n\n### Writers\n\n*Writers* perform the actual import / export operations.\n\nA *Writer* must implement `CodeRhapsodie\\DataflowBundle\\DataflowType\\Writer\\WriterInterface`.\nAs this interface is not compatible with `Port\\Writer`, the\nadapter `CodeRhapsodie\\DataflowBundle\\DataflowType\\Writer\\PortWriterAdapter` is provided.\n\nThis example shows how to use a predefined PortPHP writer:\n\n```php\n$builder-\u003eaddWriter(new PortWriterAdapter(new \\Port\\FileWriter()));\n```\n\nOr your own Writer:\n\n```php\n\u003c?php\nnamespace CodeRhapsodie\\DataflowExemple\\Writer;\n\nuse CodeRhapsodie\\DataflowBundle\\DataflowType\\Writer\\WriterInterface;\n\nclass FileWriter implements WriterInterface\n{\n    private $fh;\n\n    /** @var string */\n    private $path;\n\n  
  public function setDestinationFilePath(string $path) {\n        $this-\u003epath = $path;\n    }\n\n    public function prepare()\n    {\n        if (null === $this-\u003epath) {\n            throw new \\Exception('Define the destination file name before use');\n        }\n        if (!$this-\u003efh = fopen($this-\u003epath, 'w')) {\n            throw new \\Exception('Unable to open in write mode the output file.');\n        }\n    }\n\n    public function write($item)\n    {\n        fputcsv($this-\u003efh, $item);\n    }\n\n    public function finish()\n    {\n        fclose($this-\u003efh);\n    }\n}\n```\n\n#### CollectionWriter\n\nIf you want to write multiple items from a single item read, you can use the generic `CollectionWriter`. This writer\nwill iterate over any `iterable` it receives, and pass each item from that collection to your own writer that handles\nsingle items.\n\n```php\n$builder-\u003eaddWriter(new CollectionWriter($mySingleItemWriter));\n```\n\n#### DelegatorWriter\n\nIf you want to call different writers depending on what item is read, you can use the generic `DelegatorWriter`.\n\nAs an example, let's suppose our items are arrays with the first entry being either `product` or `order`. 
We want to use\na different writer based on that value.\n\nFirst, create your writers implementing `DelegateWriterInterface` (this interface extends `WriterInterface` so your\nwriters can still be used without the `DelegatorWriter`).\n\n```php\n\u003c?php\nnamespace CodeRhapsodie\\DataflowExemple\\Writer;\n\nuse CodeRhapsodie\\DataflowBundle\\DataflowType\\Writer\\DelegateWriterInterface;\n\nclass ProductWriter implements DelegateWriterInterface\n{\n    public function supports($item): bool\n    {\n        return 'product' === reset($item);\n    }\n\n    public function prepare()\n    {\n    }\n\n    public function write($item)\n    {\n        // Process your product\n    }\n\n    public function finish()\n    {\n    }\n}\n```\n\n```php\n\u003c?php\nnamespace CodeRhapsodie\\DataflowExemple\\Writer;\n\nuse CodeRhapsodie\\DataflowBundle\\DataflowType\\Writer\\DelegateWriterInterface;\n\nclass OrderWriter implements DelegateWriterInterface\n{\n    public function supports($item): bool\n    {\n        return 'order' === reset($item);\n    }\n\n    public function prepare()\n    {\n    }\n\n    public function write($item)\n    {\n        // Process your order\n    }\n\n    public function finish()\n    {\n    }\n}\n```\n\nThen, configure your `DelegatorWriter` and add it to your dataflow type.\n\n```php\n    protected function buildDataflow(DataflowBuilder $builder, array $options): void\n    {\n        // Snip add reader and steps\n\n        $delegatorWriter = new DelegatorWriter();\n        $delegatorWriter-\u003eaddDelegate(new ProductWriter());\n        $delegatorWriter-\u003eaddDelegate(new OrderWriter());\n\n        $builder-\u003eaddWriter($delegatorWriter);\n    }\n```\n\nDuring execution, the `DelegatorWriter` will simply pass each item received to its first delegate (in the order those\nwere added) that supports it. 
If no delegate supports an item, an exception will be thrown.\n\n## Queue\n\nAll pending dataflow jobs are stored in a queue in the database.\n\nAdd this command to your crontab to execute all queued jobs:\n\n```shell script\n$ SYMFONY_ENV=prod php bin/console code-rhapsodie:dataflow:run-pending\n```\n\n## Commands\n\nSeveral commands are provided to manage schedules and run jobs.\n\n`code-rhapsodie:dataflow:run-pending` Executes the jobs in the queue according to their schedule.\n\nWhen messenger mode is enabled, jobs will still be created according to their schedule, but execution will be handled by\nthe messenger component instead.\n\n`code-rhapsodie:dataflow:schedule:list` Displays the list of scheduled dataflows.\n\n`code-rhapsodie:dataflow:schedule:change-status` Enables or disables a scheduled dataflow.\n\n`code-rhapsodie:dataflow:schedule:add` Adds a schedule for a dataflow.\n\n`code-rhapsodie:dataflow:job:show` Displays the last result of a job.\n\n`code-rhapsodie:dataflow:execute` Executes one dataflow job.\n\n`code-rhapsodie:dataflow:dump-schema` Generates the schema create / update SQL queries.\n\n### Work with many databases\n\nAll commands have a `--connection` option to select which Doctrine DBAL connection to use during execution.\n\nExample:\n\nThis command uses the `default` DBAL connection to generate all schema update queries.\n\n```shell script\n$ bin/console code-rhapsodie:dataflow:dump-schema --update --connection=default\n```\n\nTo execute all pending jobs for a specific connection, use:\n\n```shell script\n# Run for dataflow DBAL connection\n$ bin/console code-rhapsodie:dataflow:run-pending --connection=dataflow\n# Run for default DBAL connection\n$ bin/console code-rhapsodie:dataflow:run-pending --connection=default\n```\n\n# Issues and feature requests\n\nPlease report issues and request features at https://github.com/code-rhapsodie/dataflow-bundle/issues.\n\nPlease note that only the latest releases of the 4.x and 5.x versions of 
this bundle are actively supported.\n\n# Contributing\n\nContributions are very welcome. Please see [CONTRIBUTING.md](CONTRIBUTING.md) for\ndetails. Thanks to [everyone who has contributed](https://github.com/code-rhapsodie/dataflow-bundle/graphs/contributors)\nalready.\n\n# License\n\nThis package is licensed under the [MIT license](LICENSE).\n","funding_links":[],"categories":["PHP"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-rhapsodie%2Fdataflow-bundle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode-rhapsodie%2Fdataflow-bundle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-rhapsodie%2Fdataflow-bundle/lists"}