{"id":15060642,"url":"https://github.com/hrialan/dataform-prune","last_synced_at":"2026-03-09T10:02:02.259Z","repository":{"id":246372622,"uuid":"820939597","full_name":"hrialan/dataform-prune","owner":"hrialan","description":"An open-source tool for automating the cleanup of outdated objects in Dataform configurations, optimizing data workflows with seamless CI/CD integration.","archived":false,"fork":false,"pushed_at":"2024-07-08T14:17:19.000Z","size":1520,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-04T17:50:32.832Z","etag":null,"topics":["automation","bigquery","data-analytics","dataform"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hrialan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-27T13:34:05.000Z","updated_at":"2024-07-08T14:17:03.000Z","dependencies_parsed_at":"2024-06-27T17:03:57.868Z","dependency_job_id":"bb5e05a1-4a56-4ce8-a3be-ef2620979700","html_url":"https://github.com/hrialan/dataform-prune","commit_stats":{"total_commits":44,"total_committers":2,"mean_commits":22.0,"dds":0.09090909090909094,"last_synced_commit":"40469015265494a767eb78db6d73a3099b8bb1f8"},"previous_names":["hrialan/dataform-prune"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/hrialan/dataform-prune","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrialan%2Fdataform-prune","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/hrialan%2Fdataform-prune/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrialan%2Fdataform-prune/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrialan%2Fdataform-prune/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hrialan","download_url":"https://codeload.github.com/hrialan/dataform-prune/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrialan%2Fdataform-prune/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30290921,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T02:57:19.223Z","status":"ssl_error","status_checked_at":"2026-03-09T02:56:26.373Z","response_time":61,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","bigquery","data-analytics","dataform"],"created_at":"2024-09-24T23:01:54.897Z","updated_at":"2026-03-09T10:02:02.234Z","avatar_url":"https://github.com/hrialan.png","language":"JavaScript","readme":"# Dataform Prune\n\nAn open-source tool for automating the cleanup of outdated objects in Dataform configurations, optimizing data workflows with seamless CI/CD integration.\n\n## Overview\n\n**dataform-prune** is an open-source solution that optimizes Dataform configurations by removing obsolete or unused warehouse objects. 
Seamlessly integrate it into your CI/CD pipelines using JavaScript scripts and a Docker image for improved performance and storage efficiency.\n\n## Features\n\n- **Automated Cleanup:** Easily remove outdated tables, views, and datasets.\n- **Storage Optimization:** Maintain a lean and performant data warehouse.\n- **Seamless Integration:** Use within your CI/CD pipelines for regular, automated maintenance.\n\n## Usage\n\n### Prerequisites\n\n- Node.js\n- Dataform CLI and a Dataform project\n- Google Cloud Platform (GCP) account with BigQuery access\n\n### Manual Usage\n\nFirst, create a compilation file of your Dataform project by running the following commands in the root of your Dataform project:\n\n```sh\ndataform install\ndataform compile --json \u003e dataform-output.json\n```\n\nA new JSON file (`dataform-output.json`) will be created in the root of your Dataform project, containing all defined actions and datasets. You can now clone the Dataform-Prune repository and run the following command:\n\n```sh\nnode prune.js --dataformOutputFile /path/to/the/just/created/json/file \\\n              --bqTableRegexToIgnore /regex/to/ignore/tables/in/your/warehouse \\\n              --bqTableNamesToIgnore /comma/separated/table/names/to/ignore/in/your/warehouse \\\n              --deleteUnmanagedBqTables /true/if/you/want/to/delete/unmanaged/tables/in/your/warehouse\n```\n\ne.g.\n```sh\nnode prune.js --dataformOutputFile dataform-output.json \\\n              --bqTableRegexToIgnore \"^t_prm_|^t_test\" \\\n              --bqTableNamesToIgnore \"table1,table2\" \\\n              --deleteUnmanagedBqTables true\n```\n\nEnsure you have BigQuery admin permissions in the project where the tables are located for the script to run correctly.\n\n\n### CI/CD Pipeline and Automation\n\nTo automate the pruning process, you can use this tool in a CI/CD pipeline with Google Cloud Build.\n\nA Docker image for this tool is available on Docker Hub. 
You can directly use this image in your Cloud Build configuration file.\n\nExample Cloud Build configuration file:\n\n```yaml\nsteps:\n  - name: 'node'\n    id: 'Compile Dataform project'\n    entrypoint: 'sh'\n    args:\n      - '-c'\n      - |\n        npm install -g @dataform/cli@^2.9.0\n        dataform install\n        dataform compile --json \u003e dataform-output.json\n\n  - name: 'hrialan/dataform-prune:latest'\n    id: 'Dataform prune'\n    args: [\"--dataformOutputFile\", \"dataform-output.json\",\n           \"--bqTableRegexToIgnore\", \"^t_prm_|^v_am\",\n           \"--deleteUnmanagedBqTables\", \"true\",\n           \"--autoApprove\", \"true\"]\n```\n\n⚠️ Caution: With the `--autoApprove` flag set to true, the tool will delete the tables/views without asking for confirmation.\n\nTo follow best practices in production, initially set `deleteUnmanagedBqTables` to false when creating a PR, and set it to true when merging the PR to your default branch. This can easily be configured in your CI/CD file.\n\n## Contributing\nWe welcome contributions! If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.\n\n## License\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## Contact\nFor any inquiries or support, please open an issue on GitHub or contact me at `dataform-prune@hrialan.simpelogin.com`.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrialan%2Fdataform-prune","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhrialan%2Fdataform-prune","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrialan%2Fdataform-prune/lists"}