{"id":21514856,"url":"https://github.com/getindata/data-pipelines-cli","last_synced_at":"2025-04-09T20:11:49.792Z","repository":{"id":37098508,"uuid":"429075613","full_name":"getindata/data-pipelines-cli","owner":"getindata","description":"CLI for data platform","archived":false,"fork":false,"pushed_at":"2023-12-08T18:25:16.000Z","size":2210,"stargazers_count":19,"open_issues_count":3,"forks_count":3,"subscribers_count":8,"default_branch":"develop","last_synced_at":"2025-03-23T22:07:03.644Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/getindata.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-11-17T14:21:25.000Z","updated_at":"2024-06-28T08:25:26.000Z","dependencies_parsed_at":"2023-10-20T16:48:19.421Z","dependency_job_id":null,"html_url":"https://github.com/getindata/data-pipelines-cli","commit_stats":{"total_commits":119,"total_committers":12,"mean_commits":9.916666666666666,"dds":0.6890756302521008,"last_synced_commit":"dbc334997f87280998838a6ac13c88c3c1fbb632"},"previous_names":[],"tags_count":40,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fdata-pipelines-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fdata-pipelines-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fdata-pipelines-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fdata-pipelines-cli/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/getindata","download_url":"https://codeload.github.com/getindata/data-pipelines-cli/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247615482,"owners_count":20967182,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-23T23:53:15.285Z","updated_at":"2025-04-09T20:11:49.731Z","avatar_url":"https://github.com/getindata.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# data-pipelines-cli\n\n[![Python Version](https://img.shields.io/badge/python-3.9%20%7C%203.10-blue.svg)](https://github.com/getindata/data-pipelines-cli)\n[![PyPI Version](https://badge.fury.io/py/data-pipelines-cli.svg)](https://pypi.org/project/data-pipelines-cli/)\n[![Downloads](https://pepy.tech/badge/data-pipelines-cli)](https://pepy.tech/project/data-pipelines-cli)\n[![Maintainability](https://api.codeclimate.com/v1/badges/e44ed9383a42b59984f6/maintainability)](https://codeclimate.com/github/getindata/data-pipelines-cli/maintainability)\n[![Test Coverage](https://api.codeclimate.com/v1/badges/e44ed9383a42b59984f6/test_coverage)](https://codeclimate.com/github/getindata/data-pipelines-cli/test_coverage)\n[![Documentation Status](https://readthedocs.org/projects/data-pipelines-cli/badge/?version=latest)](https://data-pipelines-cli.readthedocs.io/en/latest/?badge=latest)\n\nCLI for data platform\n\n## Documentation\n\nRead the full documentation at 
[https://data-pipelines-cli.readthedocs.io/](https://data-pipelines-cli.readthedocs.io/en/latest/index.html)\n\n## Installation\nUse the package manager [pip](https://pip.pypa.io/en/stable/) to install [dp (data-pipelines-cli)](https://pypi.org/project/data-pipelines-cli/):\n\n```bash\npip install data-pipelines-cli[bigquery,docker,datahub,gcs]\n```\n\n## Usage\nFirst, create a repository with a global configuration file that you or your organization will be using. The repository\nshould contain a `dp.yml.tmpl` file similar to this:\n```yaml\n_templates_suffix: \".tmpl\"\n_envops:\n    autoescape: false\n    block_end_string: \"%]\"\n    block_start_string: \"[%\"\n    comment_end_string: \"#]\"\n    comment_start_string: \"[#\"\n    keep_trailing_newline: true\n    variable_end_string: \"]]\"\n    variable_start_string: \"[[\"\n\ntemplates:\n  my-first-template:\n    template_name: my-first-template\n    template_path: https://github.com/\u003cYOUR_USERNAME\u003e/\u003cYOUR_TEMPLATE\u003e.git\n\nvars:\n  username: [[ YOUR_USERNAME ]]\n```\nThanks to [copier](https://copier.readthedocs.io/en/stable/), you can use template syntax to create\neasily modifiable configuration templates. Just create a `copier.yml` file next to the `dp.yml.tmpl` one and configure\nthe template questions (read more in the [copier documentation](https://copier.readthedocs.io/en/stable/configuring/)).\n\nThen, run `dp init \u003cCONFIG_REPOSITORY_URL\u003e` to initialize **dp**. You can also omit the `\u003cCONFIG_REPOSITORY_URL\u003e` argument;\n**dp** will then be initialized with an empty config.\n\n### Project creation\n\nYou can use `dp create \u003cNEW_PROJECT_PATH\u003e` to choose one of the previously added templates and create the project in the\n`\u003cNEW_PROJECT_PATH\u003e` directory. You can also use `dp create \u003cNEW_PROJECT_PATH\u003e \u003cLINK_TO_TEMPLATE_REPOSITORY\u003e` to point\ndirectly to a template repository. 
If `\u003cLINK_TO_TEMPLATE_REPOSITORY\u003e` matches the name of a template defined in\n**dp**'s config file, `dp create` will select that template by name instead of trying to download the repository.\n\n`dp template-list` lists all added templates.\n\n### Project update\n\nTo update your pipeline project, use `dp update \u003cPIPELINE_PROJECT_PATH\u003e`. It will sync your existing project with the\ntemplate version selected by the `--vcs-ref` option (default `HEAD`).\n\n### Project deployment\n\n`dp deploy` will sync with your bucket provider. The provider will be chosen automatically based on the remote URL.\nUsually, it is worth pointing `dp deploy` to a JSON or YAML file with provider-specific data such as access tokens or project\nnames. For example, to connect to Google Cloud Storage, run:\n```bash\necho '{\"token\": \"\u003cPATH_TO_YOUR_TOKEN\u003e\", \"project_name\": \"\u003cYOUR_PROJECT_NAME\u003e\"}' \u003e gs_args.json\ndp deploy --dags-path \"gs://\u003cYOUR_GS_PATH\u003e\" --blob-args gs_args.json\n```\nHowever, in some cases you do not need to do so, e.g. when using `gcloud` with properly set local credentials. In that\ncase, you can try running just the `dp deploy --dags-path \"gs://\u003cYOUR_GS_PATH\u003e\"` command. 
Please refer to the\n[documentation](https://data-pipelines-cli.readthedocs.io/en/latest/usage.html#project-deployment) for more information.\n\nWhen finished, call `dp clean` to remove compilation-related directories.\n\n### Variables\nYou can put a dictionary of variables to be passed to `dbt` in your `config/\u003cENV\u003e/dbt.yml` file, following the convention\npresented in [the guide at the dbt site](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-variables#defining-variables-in-dbt_projectyml).\nFor example, if one of the fields of `config/\u003cSNOWFLAKE_ENV\u003e/snowflake.yml` looks like this:\n```yaml\nschema: \"{{ var('snowflake_schema') }}\"\n```\nyou should put the following in your `config/\u003cSNOWFLAKE_ENV\u003e/dbt.yml` file:\n```yaml\nvars:\n  snowflake_schema: EXAMPLE_SCHEMA\n```\nand then run `dp run --env \u003cSNOWFLAKE_ENV\u003e` (or any similar command).\n\n## Contributing\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\n\nPlease make sure to update tests as appropriate.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetindata%2Fdata-pipelines-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgetindata%2Fdata-pipelines-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetindata%2Fdata-pipelines-cli/lists"}