{"id":24024745,"url":"https://github.com/cognitedata/inso-extpipes-cli","last_synced_at":"2025-04-16T04:22:44.888Z","repository":{"id":41312438,"uuid":"454946314","full_name":"cognitedata/inso-extpipes-cli","owner":"cognitedata","description":"CLI and GitHub-Action to configure and maintain CDF Projects (Extraction-Pipelines). See README for instructions.","archived":false,"fork":false,"pushed_at":"2025-02-26T08:52:52.000Z","size":451,"stargazers_count":1,"open_issues_count":8,"forks_count":2,"subscribers_count":48,"default_branch":"main","last_synced_at":"2025-03-29T04:51:14.050Z","etag":null,"topics":["cdf","cli","github-actions","governance","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cognitedata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-02T21:47:26.000Z","updated_at":"2025-02-04T09:57:32.000Z","dependencies_parsed_at":"2022-08-19T02:51:35.738Z","dependency_job_id":"b117e671-a57d-4f95-98b8-49d8d9b183bd","html_url":"https://github.com/cognitedata/inso-extpipes-cli","commit_stats":{"total_commits":24,"total_committers":3,"mean_commits":8.0,"dds":"0.41666666666666663","last_synced_commit":"9151c91cdd95ca94305209aaed9fdb8a3490141a"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognitedata%2Finso-extpipes-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognitedata%2Finso-extpipes-cli/tags"
,"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognitedata%2Finso-extpipes-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognitedata%2Finso-extpipes-cli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cognitedata","download_url":"https://codeload.github.com/cognitedata/inso-extpipes-cli/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249194680,"owners_count":21228034,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdf","cli","github-actions","governance","python"],"created_at":"2025-01-08T15:34:34.028Z","updated_at":"2025-04-16T04:22:44.865Z","avatar_url":"https://github.com/cognitedata.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"inso-extpipes-cli\n===\n# Table Of Contents\n- [inso-extpipes-cli](#inso-extpipes-cli)\n- [Table Of Contents](#table-of-contents)\n- [scope of work](#scope-of-work)\n  - [to be done](#to-be-done)\n- [how to run](#how-to-run)\n- [Extpipes CLI commands](#extpipes-cli-commands)\n  - [`Deploy` command](#deploy-command)\n  - [Configuration](#configuration)\n    - [Configuration for all commands](#configuration-for-all-commands)\n      - [Environment variables](#environment-variables)\n    - [Configuration for `deploy` command](#configuration-for-deploy-command)\n  - [run local with poetry](#run-local-with-poetry)\n  - [run local with Python and Poetry](#run-local-with-python-and-poetry)\n  - [Run locally with Docker](#run-locally-with-docker)\n 
   - [production build](#production-build)\n    - [development build](#development-build)\n  - [run as github action](#run-as-github-action)\n  - [Contribute](#contribute)\n  - [Versioning](#versioning)\n# scope of work\n\n- It provides a configuration-driven deployment for Cognite Extraction Pipelines (`extpipes` for short)\n- It can be run\n  - from `poetry run`\n  - from `python -m`\n  - from `docker run`\n  - and as GitHub Action\n\n- The implementation is based on these templates:\n  - `cognitedata/transformation-cli`\n  - `cognitedata/python-extractor-utils`\n    - using `CogniteConfig` and `LoggingConfig`\n    - and extended with custom config sections\n  - the configuration structure and example expect a CDF Project configured with `cognitedata/inso-cdf-project-cli`\n\n## to be done\n\n- [x] `.dockerignore` (pycache)\n- [x] logs folder handling (docker volume mount)\n- [x] logger.info() or print() or click.echo(click.style(..))\n    - logger debug support\n- [ ] compile as EXE (when Python is not available on customer server)\n  - code-signed exe required for Windows\n\n# how to run\nFollow the initial setup first:\n\n1. Fill out relevant configurations from `configs`\n   - Fill out/change `extpipes` from `example-config-extpipesv2.yml`\n2. Change `.env_example` to `.env`\n3. Fill out `.env`\n\n# Extpipes CLI commands\n\n## `Deploy` command\n\nThe extpipes-cli `deploy` command applies the configuration file settings to your CDF project and creates the necessary CDF Extraction-Pipelines.\n\nBy default it **automatically deletes** CDF Extraction-Pipelines which are not\ncovered by the given configuration. 
You can deactivate this with the\n- `--automatic-delete no` parameter\n- or the `automatic-delete: false` setting in the configuration file.\n\nThe command is also configured to run from a GitHub-Action workflow.\n\n```bash\n➟  extpipes-cli --help\nUsage: extpipes-cli [OPTIONS] COMMAND [ARGS]...\n\nOptions:\n  --version                Show the version and exit.\n  --cdf-project-name TEXT  CDF Project to interact with the CDF API, the\n                           'CDF_PROJECT',environment variable can be used\n                           instead. Required for OAuth2.\n  --cluster TEXT           The CDF cluster where CDF Project is hosted (e.g.\n                           api, europe-west1-1),Provide this or make sure to\n                           set the 'CLCDF_USTER' environment variable.\n                           Default: api\n  --host TEXT              The CDF host where CDF Project is hosted (e.g.\n                           https://api.cognitedata.com),Provide this or make\n                           sure to set the 'CDF_HOST' environment\n                           variable.Default: https://api.cognitedata.com/\n  --client-id TEXT         IdP client ID to interact with the CDF API. Provide\n                           this or make sure to set the 'CDF_CLIENT_ID'\n                           environment variable if you want to authenticate\n                           with OAuth2.\n  --client-secret TEXT     IdP client secret to interact with the CDF API.\n                           Provide this or make sure to set the\n                           'CDF_CLIENT_SECRET' environment variable if you\n                           want to authenticate with OAuth2.\n  --token-url TEXT         IdP token URL to interact with the CDF API. 
Provide\n                           this or make sure to set the 'CDF_TOKEN_URL'\n                           environment variable if you want to authenticate\n                           with OAuth2.\n  --scopes TEXT            IdP scopes to interact with the CDF API, relevant\n                           for OAuth2 authentication method. The 'CDF_SCOPES'\n                           environment variable can be used instead.\n  --audience TEXT          IdP Audience to interact with the CDF API, relevant\n                           for OAuth2 authentication method. The\n                           'CDF_AUDIENCE' environment variable can be used\n                           instead.\n  --dotenv-path TEXT       Provide a relative or absolute path to an .env file\n                           (for command line usage only)\n  --debug                  Print debug information\n  --dry-run                Log only planned CDF API actions while doing\n                           nothing. Defaults to False.\n  -h, --help               Show this message and exit.\n\nCommands:\n  deploy  Deploy a list of Extraction Pipelines from a configuration file\n```\n\n```bash\n➟  extpipes-cli deploy --help\nUsage: extpipes-cli deploy [OPTIONS] [CONFIG_FILE]\n\n  Deploy a list of Extraction Pipelines from a configuration file\n\nOptions:\n  --automatic-delete  Delete extpipes which are not specified in config-file\n  -h, --help          Show this message and exit.\n```\n\n## Configuration\n\nYou must pass a YAML configuration file as an argument when running the program.\n\n### Configuration for all commands\n\n_(January'23: only one command is supported right now, but the CLI solution can be extended in the future)_\n\nAll commands share a `cognite` and a `logging` section in the YAML manifest, which follow the same structure as the Cognite Database-Extractor configuration.\n\nThe configuration file supports variable expansion (`${EXTPIPES_**}`); the values are provided either\n1. As environment-variables,\n2. 
Through an `.env` file (Note: this doesn't overwrite existing environment variables.)\n3. As command-line parameters\n\nBelow is an example configuration:\n\n```yaml\n# follows the same parameter structure as the DB extractor configuration\ncognite:\n  host: ${CDF_HOST}\n  project: ${CDF_PROJECT}\n  #\n  # AAD IdP login credentials:\n  #\n  idp-authentication:\n    client-id: ${CDF_CLIENT_ID}\n    secret: ${CDF_CLIENT_SECRET}\n    scopes:\n      - ${CDF_SCOPES}\n    token_url: ${CDF_TOKEN_URL}\n\n# https://docs.python.org/3/library/logging.config.html#logging-config-dictschema\nlogging:\n  version: 1\n  formatters:\n    formatter:\n      # class: \"tools.formatter.StackdriverJsonFormatter\"\n      format: \"[%(asctime)s] [%(levelname)s] [%(name)s]: %(message)s\"\n  handlers:\n    file:\n      class: \"logging.FileHandler\"\n      filename: ./logs/deploy-trading.log\n      formatter: \"formatter\"\n      mode: \"w\"\n      level: \"DEBUG\"\n    console:\n      class: \"logging.StreamHandler\"\n      level: \"DEBUG\"\n      formatter: \"formatter\"\n      stream: \"ext://sys.stderr\"\n  root:\n    level: \"DEBUG\"\n    handlers: [ \"console\", \"file\" ]\n```\n\n#### Environment variables\n\nDetails about the environment variables:\n\n- `HOST`\n  - The URL to your CDF cluster.\n  - Example: `https://westeurope-1.cognitedata.com`\n- `PROJECT`\n  - The CDF project.\n- `CLIENT_ID`\n  - The client ID of the app registration you have created for the CLI.\n- `CLIENT_SECRET`\n  - The client secret you have created for the app registration.\n- `TOKEN_URL = https://login.microsoftonline.com/\u003ctenant id\u003e/oauth2/v2.0/token`\n  - If you're using Azure AD, replace `\u003ctenant id\u003e` with your Azure tenant ID.\n- `SCOPES`\n  - Usually: `https://\u003ccluster-name\u003e.cognitedata.com/.default`\n\n### Configuration for `deploy` command\n\nIn addition to the sections described above, the configuration file for the `deploy` command requires more sections (some of them 
optional):\n\nConfiguration example:\n\n```yaml\nextpipes:\n  features:\n    # NOT USED: extpipe-pattern only documentation atm\n    extpipe-pattern: '{source}:{short-name}:{table-name}:{suffix}'\n\n    # The default and recommended value is: true\n    # to keep the deployment in sync with configuration\n    # which means non configured extpipes get automatically deleted\n    automatic-delete: true\n\n    # can contain multiple contacts, can be overwritten on pipeline level\n    default-contacts:\n      - name: Yours Truly\n        email: yours.truly@cognite.com\n        role: admin\n        send-notification: false\n\n  pipelines:\n      # required\n      # max 255 char, external-id provided by client\n    - external-id: src:001:sap:sap_funcloc:continuous\n      # optional: str, default to external-id\n      name: src:001:sap:sap_funcloc:continuous\n      # optional: str\n      description: describe or defaults to auto-generated description, that it is \"deployed through extpipes-cli@v3.0.0\"\n      # optional: str\n      data-set-external-id: src:001:sap\n      # optional: \"On trigger\", \"Continuous\" or cron expression\n      schedule: Continuous\n      # optional: [{},{}]\n      # defaults to features.default-contacts (if exist)\n      contacts:\n        - name: Fizz Buzz\n          email: fizzbuzz@cognite.com\n          role: admin\n          send-notification: true\n      # optional: str\n      source: az-func\n      # optional: {}\n      metadata:\n        version: extpipes-cli@v3.1.0\n      # optional: str max 10000 char\n      # Documentation text field, supports Markdown for text formatting.\n      documentation: Documentation which can include Mermaid diagrams?\n      # optional: str\n      # Usually user email is expected here, defaults to extpipes + version?\n      created-by: extpipes-cli@v3.1.0\n\n      # optional: [{},{}]\n      raw-tables:\n        - db-name: src:001:sap\n          table-name: sap_funcloc\n\n      # optional: {}\n      
extpipe-config:\n        # str\n        config: |\n          nested yaml/json/ini which is simply a string for this config\n        # optional: str\n        description: describe the config, or autogenerate?\n```\n\n## run local with poetry\n\n```bash\npoetry build\npoetry install\npoetry update\n\npoetry run extpipes-cli deploy --debug configs/example-config-extpipes.yml\n```\n\n## run local with Python and Poetry\n\n```bash\npoetry shell\n# extpipes-cli is defined in pyproject.toml\nextpipes-cli deploy ./configs/example-config-extpipes.yml\n```\n\n## Run locally with Docker\n\n### production build\n- `.dockerignore` file\n- volumes for `configs` (to read) and `logs` folder (to write)\n\n```bash\ndocker build -t extpipes-cli:prod --target=production .\n\n# ${PWD} because only absolute paths can be mounted\n# poetry project is deployed to /opt/extpipes-cli/\ndocker run --env-file=.env --volume ${PWD}/configs:/configs --volume ${PWD}/logs:/opt/extpipes-cli/logs extpipes-cli:prod deploy /configs/config-deploy-example.yml\n```\n\n### development build\n\nDebugging the Docker container with all dev-dependencies and poetry installed\n\n- volumes for `configs` (to read) and `logs` folder (to write)\n- volumes for `src` (to read/write)\n\n```bash\n# using the 'development' target of the Dockerfile multi-stages\n➟  docker build -t extpipes-cli:dev --target=development .\n\n# start bash in container\n➟  docker run --env-file=.env --volume ${PWD}/configs:/configs --volume ${PWD}/logs:/logs --volume ${PWD}/src:/opt/extpipes-cli/src -it --entrypoint /bin/bash extpipes-cli:dev\n\n# run project from inside container\n\u003e poetry shell\n\u003e extpipes-cli --help\n\u003e extpipes-cli --dry-run yes deploy /configs/config-deploy-example.yml\n# logs are available on your host in mounted './logs/' folder\n# 'src/' changes are mounted to your host ./src folder\n```\n\n## run as github action\n\n```yaml\njobs:\n  deploy:\n    name: Deploy Extraction Pipelines\n    environment: dev\n 
   runs-on: ubuntu-latest\n    # environment variables\n    env:\n      PROJECT: yourcdfproject\n      CLUSTER: bluefield\n      IDP_TENANT: abcde-12345\n      HOST: https://bluefield.cognitedata.com/\n    steps:\n      - name: Deploy extpipes\n        # best practice is to use a tagged release (and not '@main')\n        # find a released tag here: https://github.com/cognitedata/inso-extpipes-cli/releases\n        uses: cognitedata/inso-extpipes-cli@v2.2.1\n        env:\n            CLIENT_ID: ${{ secrets.CLIENT_ID }}\n            CLIENT_SECRET: ${{ secrets.CLIENT_SECRET }}\n            HOST: ${{ env.HOST }}\n            PROJECT: ${{ env.PROJECT }}\n            TOKEN_URL: https://login.microsoftonline.com/${{ env.IDP_TENANT }}/oauth2/v2.0/token\n            SCOPES: ${{ env.HOST }}.default\n        # additional parameters for running the action\n        with:\n          config_file: ./configs/example-config-extpipes.yml\n```\n\n## Contribute\n\n1. `poetry install`\n2. To run all checks locally - which is typically needed if the GitHub check is failing - e.g. you haven't set up `pre-commit` to run automatically:\n\n   - `poetry install \u0026\u0026 poetry shell`\n   - `pre-commit install`  # Only needed if not installed\n   - `pre-commit run --all-files`\n\n## Versioning\n\n- Remark: each new version requires manual changes in\n  - the version in `pyproject.toml`\n  - the version in `src/extpipes/__init__` (used by `--version` parameter)\n  - the `action.yml` file\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcognitedata%2Finso-extpipes-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcognitedata%2Finso-extpipes-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcognitedata%2Finso-extpipes-cli/lists"}