{"id":13459566,"url":"https://github.com/WaylonWalker/kedro-auto-catalog","last_synced_at":"2025-03-24T18:30:39.219Z","repository":{"id":65911259,"uuid":"602246915","full_name":"WaylonWalker/kedro-auto-catalog","owner":"WaylonWalker","description":"Kedro catalog create with default configuration","archived":false,"fork":false,"pushed_at":"2023-07-14T19:34:25.000Z","size":36,"stargazers_count":6,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-14T07:11:26.673Z","etag":null,"topics":["data","data-science","kedro","kedro-catalog","kedro-hook","kedro-plugin"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WaylonWalker.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-15T20:02:13.000Z","updated_at":"2023-11-28T20:22:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"f9814bb2-dacb-4807-96ff-fca4ff6d7875","html_url":"https://github.com/WaylonWalker/kedro-auto-catalog","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WaylonWalker%2Fkedro-auto-catalog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WaylonWalker%2Fkedro-auto-catalog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WaylonWalker%2Fkedro-auto-catalog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WaylonWalker%2Fke
dro-auto-catalog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WaylonWalker","download_url":"https://codeload.github.com/WaylonWalker/kedro-auto-catalog/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245328073,"owners_count":20597352,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-science","kedro","kedro-catalog","kedro-hook","kedro-plugin"],"created_at":"2024-07-31T10:00:19.743Z","updated_at":"2025-03-24T18:30:38.861Z","avatar_url":"https://github.com/WaylonWalker.png","language":"Python","funding_links":[],"categories":["[Kedro plugins](https://docs.kedro.org/en/stable/extend_kedro/plugins.html)"],"sub_categories":[],"readme":"# Kedro Auto Catalog\n\n\u003cimg src=\"https://user-images.githubusercontent.com/22648375/219141193-22fdf6c4-a633-4f64-b7ee-01474a0f7dfb.png\" width=\"250\" align=right\u003e\n\nA configurable version of the built-in `kedro catalog create` CLI command. 
Default\ntypes can be configured in the project's `settings.py`, so that generated entries use\nthese types rather than `MemoryDataSet`.\n\n[![PyPI - Version](https://img.shields.io/pypi/v/kedro-auto-catalog.svg)](https://pypi.org/project/kedro-auto-catalog)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/kedro-auto-catalog.svg)](https://pypi.org/project/kedro-auto-catalog)\n\n---\n\n**Table of Contents**\n\n- [Installation](#installation)\n- [Configuration](#configuration)\n- [Usage](#usage)\n- [Example](#example)\n- [subdirs and layers](#subdirs-and-layers)\n- [License](#license)\n\n## Installation\n\n```console\npip install kedro-auto-catalog\n```\n\n## Configuration\n\nSet the project defaults by adding this dict to\n`src/\u003cproject_name\u003e/settings.py`.\n\n```python\nAUTO_CATALOG = {\n    \"directory\": \"data\",\n    \"subdirs\": [\"raw\", \"intermediate\", \"primary\"],\n    \"layers\": [\"raw\", \"intermediate\", \"primary\"],\n    \"default_extension\": \"parquet\",\n    \"default_type\": \"pandas.ParquetDataSet\",\n}\n```\n\n## Usage\n\nTo automatically create catalog entries for the `__default__` pipeline, run this from the command line.\n\n```bash\nkedro auto-catalog -p __default__\n```\n\nFor a reminder of the available options, use the `--help` flag.\n\n```bash\n❯ kedro auto-catalog --help\nUsage: kedro auto-catalog [OPTIONS]\n\n  Create Data Catalog YAML configuration with missing datasets.\n\n  Add configurable datasets to Data Catalog YAML configuration file for each\n  dataset in a registered pipeline if it is missing from the `DataCatalog`.\n\n  The catalog configuration will be saved to\n  `\u003cconf_source\u003e/\u003cenv\u003e/catalog/\u003cpipeline_name\u003e.yml` file.\n\n  Configure the project defaults in `src/\u003cproject_name\u003e/settings.py` with this\n  dict.\n\nOptions:\n  -e, --env TEXT       Environment to create Data Catalog YAML file in.\n                       Defaults to `base`.\n  -p, --pipeline TEXT  Name of a pipeline.  
[required]\n  -h, --help           Show this message and exit.\n```\n\n## Example\n\nUsing the\n[kedro-spaceflights](https://github.com/quantumblacklabs/kedro-starter-spaceflights)\nexample, running `kedro auto-catalog -p __default__` yields the following\ncatalog in `conf/base/catalog/__default__.yml`:\n\n```yaml\nX_test:\n  filepath: data/X_test.parquet\n  type: pandas.ParquetDataSet\nX_train:\n  filepath: data/X_train.parquet\n  type: pandas.ParquetDataSet\ny_test:\n  filepath: data/y_test.parquet\n  type: pandas.ParquetDataSet\ny_train:\n  filepath: data/y_train.parquet\n  type: pandas.ParquetDataSet\n```\n\n## subdirs and layers\n\nWith the example configuration, `\"subdirs\": [\"raw\", \"intermediate\",\n\"primary\"]` and `\"layers\": [\"raw\", \"intermediate\", \"primary\"]`, any leading\nsubdir or layer name in a dataset name is stripped from the file name and used as\nthe directory and layer. For example, renaming `y_test` to `raw_y_test` puts\n`y_test.parquet` in the `raw` directory and assigns it to the `raw` layer.\n\n```yaml\nraw_y_test:\n  filepath: data/raw/y_test.parquet\n  layer: raw\n  type: pandas.ParquetDataSet\n```\n\n## License\n\n`kedro-auto-catalog` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWaylonWalker%2Fkedro-auto-catalog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FWaylonWalker%2Fkedro-auto-catalog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWaylonWalker%2Fkedro-auto-catalog/lists"}