{"id":13459562,"url":"https://github.com/WaylonWalker/find-kedro","last_synced_at":"2025-03-24T18:30:40.436Z","repository":{"id":57429606,"uuid":"254984683","full_name":"WaylonWalker/find-kedro","owner":"WaylonWalker","description":"kedro plugin to automatically construct pipelines using pytest style pattern matching","archived":false,"fork":false,"pushed_at":"2023-05-17T13:11:07.000Z","size":623,"stargazers_count":21,"open_issues_count":6,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-11T01:37:51.590Z","etag":null,"topics":["kedro","kedro-hook","kedro-plugin","pipelines","python"],"latest_commit_sha":null,"homepage":"https://find.kedro.dev","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WaylonWalker.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"contributing.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"github":"WaylonWalker","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2020-04-12T01:07:28.000Z","updated_at":"2024-04-12T11:41:14.000Z","dependencies_parsed_at":"2024-01-15T14:49:07.260Z","dependency_job_id":"73c39427-acf4-4961-af29-f7c2723fc74b","html_url":"https://github.com/WaylonWalker/find-kedro","commit_stats":{"total_commits":56,"total_committers":4,"mean_commits":14.0,"dds":0.5,"last_synced_commit":"c985b173795afed551d3b2db057b8d6eac7b6067"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WaylonWalker%
2Ffind-kedro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WaylonWalker%2Ffind-kedro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WaylonWalker%2Ffind-kedro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WaylonWalker%2Ffind-kedro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WaylonWalker","download_url":"https://codeload.github.com/WaylonWalker/find-kedro/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245328077,"owners_count":20597353,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["kedro","kedro-hook","kedro-plugin","pipelines","python"],"created_at":"2024-07-31T10:00:19.664Z","updated_at":"2025-03-24T18:30:40.094Z","avatar_url":"https://github.com/WaylonWalker.png","language":"Python","funding_links":["https://github.com/sponsors/WaylonWalker"],"categories":["[Kedro plugins](https://docs.kedro.org/en/stable/extend_kedro/plugins.html)","[Plugins](https://docs.kedro.org/en/stable/extend_kedro/plugins.html)"],"sub_categories":[],"readme":"# ![Find Kedro Title](./art/find-kedro.png)\n\n`find-kedro` is a small library to enhance your kedro experience.  It looks through your modules to find kedro pipelines, nodes, and iterables (lists, sets, tuples) of nodes.  It then assembles them into a dictionary of pipelines: each module creates a separate pipeline, and `__default__` is a combination of all pipelines.  
This format is compatible with the kedro `_create_pipelines` format.\n\n\n![Python package](https://github.com/WaylonWalker/find-kedro/workflows/Python%20package/badge.svg)\n\n![Test](https://github.com/WaylonWalker/find-kedro/workflows/Test/badge.svg)\n\n[![Build-Docs](https://github.com/WaylonWalker/find-kedro/workflows/Build-Docs/badge.svg?branch=master)](https://find-kedro.waylonwalker.com)\n\n\n## ![Motivation](./art/headers/1.png)\n\n`kedro` is a ✨ fantastic project that allows for super-fast prototyping of data pipelines, while yielding production-ready pipelines. `find-kedro` enhances this experience by adding pytest-like node/pipeline discovery, eliminating the need to bubble up pipelines through modules.\n\nWhen working on larger pipeline projects, it is advisable to break your project down into sub-modules, which requires knowing how to build Python libraries and how to import each module correctly.  While this is not too difficult, in some cases it can trip up even the most senior engineers, who lose precious feature development time to debugging a library.\n\n## ![Installation](./art/headers/2.png)\n\n`find-kedro` is published on PyPI and can be installed with `pip`.\n\n``` console\npip install find-kedro\n```\n\n## ![Python Usage](./art/headers/3.png)\n\nThe recommended usage of `find-kedro` is to call it directly from your project's `hooks.py` module (kedro `0.17.x` and later) or `run.py` module (earlier versions).\n\n### \u003e 0.17.x +\n\nFrom kedro `0.17.x` onward, `find-kedro` can be used in `ProjectHooks` as the return value of `register_pipelines` in `hooks.py`.\n\n``` python\nfrom pathlib import Path\nfrom typing import Dict\n\nfrom kedro.framework.hooks import hook_impl\nfrom kedro.pipeline import Pipeline\nfrom find_kedro import find_kedro\n\nclass ProjectHooks:\n    @hook_impl\n    def register_pipelines(self) -\u003e Dict[str, Pipeline]:\n        # Search every *.py module under ./pipelines\n        return find_kedro(\n            file_patterns=[\"*.py\"],\n            directory=Path(__file__).parent / \"pipelines\",\n        )\n```\n\n### \u003c 0.17.x\n\nBefore kedro `0.17.x`, `find-kedro` can be added to the `ProjectContext` in `run.py`.\n\n``` python\nfrom kedro.context import KedroContext\nfrom kedro.pipeline import Pipeline\nfrom find_kedro import 
find_kedro\n\nclass ProjectContext(KedroContext):\n    def _get_pipelines(self) -\u003e Pipeline:\n        return find_kedro()\n```\n\n### Creating nodes\n\n`find-kedro` will not execute any functions.  It will simply look for variables that match the `pattern` and check whether they are a `kedro.pipeline.Pipeline`, a `kedro.pipeline.node.Node`, or a list of `kedro.pipeline.node.Node` objects.  If so, it will collect them into the dictionary of pipelines.\n\nThere are typically **three** ways that pipelines are constructed with `find-kedro`: **lists**, **single-nodes**, and **pipelines**.\n\n#### Lists\n\nAny pattern-matched list will be flattened and collected into the pipeline.  Nodes can be created all at once in the list definition.\n\n``` python\n# my-proj/pipelines/data_engineering/pipeline\nfrom kedro.pipeline import node\nfrom .nodes import split_data\n\npipeline = [\n    node(\n        split_data,\n        [\"example_iris_data\", \"params:example_test_data_ratio\"],\n        dict(\n            train_x=\"example_train_x\",\n            train_y=\"example_train_y\",\n            test_x=\"example_test_x\",\n            test_y=\"example_test_y\",\n        ),\n    )\n]\n```\n\nIt is often convenient to keep the node definition close to the function definition.  
I usually define the list at the top of the file, then append to it as I go.\n\n``` python\n# my-proj/pipelines/data_engineering/pipeline\nfrom kedro.pipeline import node\nfrom .nodes import split_data\n\nnodes = []\nnodes.append(\n    node(\n        split_data,\n        [\"example_iris_data\", \"params:example_test_data_ratio\"],\n        dict(\n            train_x=\"example_train_x\",\n            train_y=\"example_train_y\",\n            test_x=\"example_test_x\",\n            test_y=\"example_test_y\",\n        ),\n    )\n)\n```\n\n#### Nodes\n\nAll pattern-matched `kedro.pipeline.node.Node` objects will get collected into the pipeline.\n\n``` python\n# my-proj/pipelines/data_engineering/pipeline\nfrom kedro.pipeline import node\nfrom .nodes import split_data\n\nsplit_node = node(\n    split_data,\n    [\"example_iris_data\", \"params:example_test_data_ratio\"],\n    dict(\n        train_x=\"example_train_x\",\n        train_y=\"example_train_y\",\n        test_x=\"example_test_x\",\n        test_y=\"example_test_y\",\n    ),\n)\n```\n\n#### Pipeline\n\nAll pattern-matched `kedro.pipeline.Pipeline` objects will get collected into the pipeline.\n\n``` python\n# my-project/pipelines/data_engineering/pipeline\nfrom kedro.pipeline import node, Pipeline\nfrom .nodes import split_data\n\nsplit_node = Pipeline(\n    [\n        node(\n            split_data,\n            [\"example_iris_data\", \"params:example_test_data_ratio\"],\n            dict(\n                train_x=\"example_train_x\",\n                train_y=\"example_train_y\",\n                test_x=\"example_test_x\",\n                test_y=\"example_test_y\",\n            ),\n        )\n    ]\n)\n```\n\n### `create_pipeline`\n\n`find-kedro` also looks for `create_pipeline` functions and adds those to your pipelines.\n\n``` python\n# my-project/pipelines/data_engineering/pipeline\nfrom kedro.pipeline import node, Pipeline\nfrom .nodes import split_data\n\ndef 
create_pipeline():\n    return Pipeline(\n        [\n            node(\n                split_data,\n                [\"example_iris_data\", \"params:example_test_data_ratio\"],\n                dict(\n                    train_x=\"example_train_x\",\n                    train_y=\"example_train_y\",\n                    test_x=\"example_test_x\",\n                    test_y=\"example_test_y\",\n                ),\n            )\n        ]\n    )\n```\n\n### Fully Qualified imports\n\nWhen using fully qualified imports (`from my_proj.pipelines.data_science.nodes import split_data`) instead of relative imports (`from .nodes import split_data`), you will need to make sure that your project is installed, is in your current path, or that you set the `directory`.\n\n## ![CLI Usage](./art/headers/4.png)\n\nThe CLI provides a handy interface to search your project for nodes.\n\n```\nUsage: find-kedro [OPTIONS]\n\nOptions:\n  --file-patterns TEXT       glob-style file patterns for Python node module\n                             discovery\n\n  --patterns TEXT            prefixes or glob names for Python pipeline, node,\n                             or list object discovery\n\n  -d, --directory DIRECTORY  Path to save the static site to\n  --version                  Prints version and exits\n  -v, --verbose              Prints extra information for debugging\n  --help                     Show this message and exit.\n```\n\nExample run with a slightly modified default `kedro new` project.\n\n``` bash\n❯ find-kedro\n{\n  \"__default__\": [\n    \"split_data([example_iris_data,params:example_test_data_ratio]) -\u003e [example_test_x,example_test_y,example_train_x,example_train_y]\",\n    \"train_model([example_train_x,example_train_y,parameters]) -\u003e [example_model]\",\n    \"predict([example_model,example_test_x]) -\u003e [example_predictions]\",\n    \"report_accuracy([example_predictions,example_test_y]) -\u003e None\"\n  ],\n  \"src.default_kedro_159.pipelines.data_engineering.pipeline\": [\n    
\"split_data([example_iris_data,params:example_test_data_ratio]) -\u003e [example_test_x,example_test_y,example_train_x,example_train_y]\"\n  ],\n  \"src.default_kedro_159.pipelines.data_science.pipeline\": [\n    \"train_model([example_train_x,example_train_y,parameters]) -\u003e [example_model]\",\n    \"predict([example_model,example_test_x]) -\u003e [example_predictions]\",\n    \"report_accuracy([example_predictions,example_test_y]) -\u003e None\"\n  ]\n}\n```\n## ![Contributing](./art/headers/5.png)\n\n**You're Awesome** for considering a contribution!  Contributions are welcome, please check out the [Contributing Guide](./contributing.md) for more information.  Please be a positive member of the community and embrace feedback\n\n## ![Versioning](./art/headers/6.png)\n\nWe use [SemVer](https://semver.org/) for versioning. For the versions available, see the [tags on this repository](./tags).\n\n\n## ![Authors](./art/headers/7.png)\n\n[![Waylon Walker](https://avatars1.githubusercontent.com/u/22648375?s=64\u0026v=4)](https://github.com/WaylonWalker) - Waylon Walker - _Original Author_\n\n[![Zain Patel](https://avatars3.githubusercontent.com/u/30357972?s=64\u0026v=4)](https://github.com/mzjp2) - Zain Patel\n\n[![Data Engineer One](https://avatars1.githubusercontent.com/u/64087279?s=64\u0026v=4)](https://github.com/dataengineerone) - Data Engineer One\n\n\n## ![License](./art/headers/8.png)\n\nThis project is licensed under the MIT License - see the LICENSE.md file for details\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWaylonWalker%2Ffind-kedro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FWaylonWalker%2Ffind-kedro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWaylonWalker%2Ffind-kedro/lists"}