{"id":37433864,"url":"https://github.com/damavis/airflow-pentaho-plugin","last_synced_at":"2026-01-16T06:38:20.771Z","repository":{"id":42986559,"uuid":"250022548","full_name":"damavis/airflow-pentaho-plugin","owner":"damavis","description":"Pentaho plugin for Apache Airflow - Orquestate pentaho transformations and jobs from Airflow","archived":false,"fork":false,"pushed_at":"2025-09-26T12:05:02.000Z","size":152,"stargazers_count":40,"open_issues_count":3,"forks_count":17,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-10-28T19:05:12.938Z","etag":null,"topics":["airflow","airflow-plugin","data-engineering","pentaho-data-integration"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/damavis.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-03-25T15:52:02.000Z","updated_at":"2025-09-26T12:05:06.000Z","dependencies_parsed_at":"2024-05-14T11:28:47.801Z","dependency_job_id":"93587493-7045-49de-ae3d-b7a3f4253abf","html_url":"https://github.com/damavis/airflow-pentaho-plugin","commit_stats":{"total_commits":99,"total_committers":5,"mean_commits":19.8,"dds":"0.12121212121212122","last_synced_commit":"59bc7997f967892737112236826e29cd8e797eef"},"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"purl":"pkg:github/damavis/airflow-pentaho-plugin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/damavis%2Fairflow-pentaho-plugin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/damavis%2Fairflow-pentaho-plugin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/damavis%2Fairflow-pentaho-plugin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/damavis%2Fairflow-pentaho-plugin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/damavis","download_url":"https://codeload.github.com/damavis/airflow-pentaho-plugin/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/damavis%2Fairflow-pentaho-plugin/sbom","scorecard":{"id":317931,"data":{"date":"2025-08-11","repo":{"name":"github.com/damavis/airflow-pentaho-plugin","commit":"f8340c565d2729a4faf4882d2efc272a180d11e5"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4.8,"checks":[{"name":"Code-Review","score":0,"reason":"Found 2/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":10,"reason":"GitHub workflow tokens follow principle of least privilege","details":["Info: topLevel 'contents' permission set to 'read': .github/workflows/python-publish.yml:16","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if 
the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Pinned-Dependencies","score":1,"reason":"dependency not pinned by hash detected -- score normalized to 1","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-publish.yml:24: update your workflow using https://app.stepsecurity.io/secureworkflow/damavis/airflow-pentaho-plugin/python-publish.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-publish.yml:26: update your workflow using https://app.stepsecurity.io/secureworkflow/damavis/airflow-pentaho-plugin/python-publish.yml/master?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/python-publish.yml:31","Warn: pipCommand not pinned by hash: .github/workflows/python-publish.yml:32","Warn: pipCommand not pinned by hash: .github/workflows/python-publish.yml:33","Info:   0 out of   2 GitHub-owned GitHubAction dependencies pinned","Info:   1 out of   1 third-party GitHubAction dependencies pinned","Info:   0 out of   3 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build 
process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Packaging","score":10,"reason":"packaging workflow 
detected","details":["Info: Project packages its releases by way of GitHub Actions.: .github/workflows/python-publish.yml:19"],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 4 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T00:40:46.091Z","repository_id":42986559,"created_at":"2025-08-18T00:40:46.092Z","updated_at":"2025-08-18T00:40:46.092Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28477906,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","airflow-plugin","data-engineering","pentaho-data-integration"],"created_at":"2026-01-16T06:38:20.101Z","updated_at":"2026-01-16T06:38:20.765Z","avatar_url":"https://github.com/damavis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pentaho Airflow plugin\n\n[![Build Status](https://api.travis-ci.com/damavis/airflow-pentaho-plugin.svg?branch=master)](https://app.travis-ci.com/damavis/airflow-pentaho-plugin)\n[![codecov](https://codecov.io/gh/damavis/airflow-pentaho-plugin/branch/master/graph/badge.svg)](https://codecov.io/gh/damavis/airflow-pentaho-plugin)\n[![PyPI](https://img.shields.io/pypi/v/airflow-pentaho-plugin)](https://pypi.org/project/airflow-pentaho-plugin/)\n[![PyPI - 
Downloads](https://img.shields.io/pypi/dm/airflow-pentaho-plugin)](https://pypi.org/project/airflow-pentaho-plugin/)\n\nThis plugin runs Jobs and Transformations through Carte servers.\nIt can orchestrate a large number of transformations and jobs, taking care\nof the dependencies between them, even across different instances.\nThis is done by using `CarteJobOperator` and `CarteTransOperator`.\n\nIt also runs Pan (transformations) and Kitchen (jobs) in local mode,\nboth from a repository and from local XML files. For this approach, use\n`KitchenOperator` and `PanOperator`.\n\n## Requirements\n\n1. A deployed Apache Airflow system.\n2. One or more working PDI CE installations.\n3. A Carte server for the Carte operators.\n\n## Setup\n\nThe same setup process must be performed on the webserver, the scheduler\nand the workers (those that run these tasks) to get it working. If you want to\ndeploy specific workers to run this kind of task, see\n[Queues](https://airflow.apache.org/docs/stable/concepts.html#queues),\nin the **Airflow** *Concepts* section.\n\n### Pip package\n\nFirst, install the package with the `pip install` command.\n\n```bash\npip install airflow-pentaho-plugin\n```\n\n### Airflow connection\n\nThen, add a new connection to Airflow Connections. To do this,\ngo to the Airflow web UI and click on `Admin -\u003e Connections` in the top menu.\nNow, click on the `Create` tab.\n\nUse the HTTP connection type. Enter the **Conn Id** (this plugin uses `pdi_default`\nby default), and the username and password for your Pentaho Repository.\n\nAt the bottom of the form, fill the **Extra** field with `pentaho_home`, the\npath where your pdi-ce is installed, and `rep`, the repository name for this\nconnection, using a JSON-formatted string as follows.\n\n```json\n{\n    \"pentaho_home\": \"/opt/pentaho\",\n    \"rep\": \"Default\"\n}\n```\n\n### Carte\n\nIn order to use `CarteJobOperator`, the connection must be configured differently. 
Fill\n`host` (including `http://` or `https://`) and `port` with the Carte hostname and port,\n`username` and `password` with the PDI repository credentials, and `extra` as follows.\n\n```json\n{\n    \"rep\": \"Default\",\n    \"carte_username\": \"cluster\",\n    \"carte_password\": \"cluster\"\n}\n```\n\n## Usage\n\n### CarteJobOperator\n\n`CarteJobOperator` runs jobs on remote slave servers. Here\nis an example of `CarteJobOperator` usage.\n\n```python\n# For versions before 2.0\n# from airflow.operators.airflow_pentaho import CarteJobOperator\n\nfrom airflow_pentaho.operators.carte import CarteJobOperator\n\n# ... #\n\n# Define the task using the CarteJobOperator\navg_spent = CarteJobOperator(\n    conn_id='pdi_default',\n    task_id=\"average_spent\",\n    job=\"/home/bi/average_spent\",\n    params={\"date\": \"{{ ds }}\"},  # Date in yyyy-mm-dd format\n    dag=dag)\n\n# ... #\n\nsome_task \u003e\u003e avg_spent \u003e\u003e another_task\n```\n\n### KitchenOperator\n\n`KitchenOperator` runs jobs. Let's suppose that we have\na *Job* saved at `/home/bi/average_spent` in our repository that takes\nthe argument `date` as an input parameter. Let's define the task using the\n`KitchenOperator`.\n\n```python\n# For versions before 2.0\n# from airflow.operators.airflow_pentaho import KitchenOperator\n\nfrom airflow_pentaho.operators.kettle import KitchenOperator\n\n# ... #\n\n# Define the task using the KitchenOperator\navg_spent = KitchenOperator(\n    conn_id='pdi_default',\n    queue=\"pdi\",\n    task_id=\"average_spent\",\n    directory=\"/home/bi\",\n    job=\"average_spent\",\n    params={\"date\": \"{{ ds }}\"},  # Date in yyyy-mm-dd format\n    dag=dag)\n\n# ... #\n\nsome_task \u003e\u003e avg_spent \u003e\u003e another_task\n```\n\n### CarteTransOperator\n\n`CarteTransOperator` runs transformations on remote slave\nservers. 
Here is an example of `CarteTransOperator` usage.\n\n```python\n# For versions before 2.0\n# from airflow.operators.airflow_pentaho import CarteTransOperator\n\nfrom airflow_pentaho.operators.carte import CarteTransOperator\n\n# ... #\n\n# Define the task using the CarteTransOperator\nenrich_customers = CarteTransOperator(\n    conn_id='pdi_default',\n    task_id=\"enrich_customer_data\",\n    trans=\"/home/bi/enrich_customer_data\",\n    params={\"date\": \"{{ ds }}\"},  # Date in yyyy-mm-dd format\n    dag=dag)\n\n# ... #\n\nsome_task \u003e\u003e enrich_customers \u003e\u003e another_task\n```\n\n### PanOperator\n\n`PanOperator` runs transformations. Let's suppose that\nwe have one saved at `/home/bi/clean_somedata`. Let's define the task using the\n`PanOperator`. In this case, the transformation receives a parameter that\ndetermines the file to be cleaned.\n\n```python\n# For versions before 2.0\n# from airflow.operators.airflow_pentaho import PanOperator\n\nfrom airflow_pentaho.operators.kettle import PanOperator\n\n# ... #\n\n# Define the task using the PanOperator\nclean_input = PanOperator(\n    conn_id='pdi_default',\n    queue=\"pdi\",\n    task_id=\"cleanup\",\n    directory=\"/home/bi\",\n    trans=\"clean_somedata\",\n    params={\"file\": \"/tmp/input_data/{{ ds }}/sells.csv\"},\n    dag=dag)\n\n# ... #\n\nsome_task \u003e\u003e clean_input \u003e\u003e another_task\n```\n\nFor more information, please see `sample_dags/pdi_flow.py`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdamavis%2Fairflow-pentaho-plugin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdamavis%2Fairflow-pentaho-plugin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdamavis%2Fairflow-pentaho-plugin/lists"}