{"id":13633362,"url":"https://github.com/InfuseAI/piperider","last_synced_at":"2025-04-18T10:34:52.712Z","repository":{"id":36953501,"uuid":"476144964","full_name":"InfuseAI/piperider","owner":"InfuseAI","description":"Code review for data in dbt","archived":false,"fork":false,"pushed_at":"2024-03-13T03:13:14.000Z","size":34203,"stargazers_count":480,"open_issues_count":19,"forks_count":23,"subscribers_count":14,"default_branch":"main","last_synced_at":"2024-10-29T21:41:49.990Z","etag":null,"topics":["code-review","continuous-integration","data-exploration","data-observability","data-pipeline","data-profiler","data-profiling","data-quality","data-reliability","data-science","data-testing","data-visualization","dbt","dbt-metrics","eda","exploratory-data-analysis","pull-requests","python","reporting"],"latest_commit_sha":null,"homepage":"https://www.piperider.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InfuseAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-03-31T04:05:11.000Z","updated_at":"2024-09-30T09:03:01.000Z","dependencies_parsed_at":"2023-01-17T08:01:39.774Z","dependency_job_id":"97a2f900-f541-4813-bc15-2d1ad7458640","html_url":"https://github.com/InfuseAI/piperider","commit_stats":{"total_commits":2542,"total_committers":30,"mean_commits":84.73333333333333,"dds":0.7891424075531078,"last_synced_commit":"e2fdb2c34a592e91842eaad228446899a79f4265"},"previous_names":["infuseai/piperider-cli"],"tags_count":176,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/InfuseAI%2Fpiperider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfuseAI%2Fpiperider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfuseAI%2Fpiperider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfuseAI%2Fpiperider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/InfuseAI","download_url":"https://codeload.github.com/InfuseAI/piperider/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223779721,"owners_count":17201220,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code-review","continuous-integration","data-exploration","data-observability","data-pipeline","data-profiler","data-profiling","data-quality","data-reliability","data-science","data-testing","data-visualization","dbt","dbt-metrics","eda","exploratory-data-analysis","pull-requests","python","reporting"],"created_at":"2024-08-01T23:00:34.802Z","updated_at":"2024-11-09T02:32:04.763Z","avatar_url":"https://github.com/InfuseAI.png","language":"Python","readme":"[![ci-tests](https://github.com/infuseai/piperider-cli/actions/workflows/tests.yaml/badge.svg)](https://github.com/infuseai/piperider-cli/actions/workflows/tests.yaml/badge.svg)\n[![codecov](https://codecov.io/gh/InfuseAI/piperider/branch/main/graph/badge.svg?token=iVbQKGM1JA)](https://codecov.io/gh/InfuseAI/piperider)\n[![release](https://img.shields.io/github/release/infuseAI/piperider-cli/all.svg?style=flat-square)](https://githu
b.com/infuseAI/piperider-cli/releases)\n[![pypi](https://img.shields.io/pypi/v/piperider?style=flat-square)](https://pypi.org/project/piperider/)\n[![python](https://img.shields.io/pypi/pyversions/piperider?style=flat-square)](https://pypi.org/project/piperider/)\n[![downloads](https://img.shields.io/pypi/dw/piperider?style=flat-square)](https://pypi.org/project/piperider/#files)\n[![license](https://img.shields.io/github/license/infuseai/piperider?style=flat-square)](https://github.com/InfuseAI/piperider/blob/main/LICENSE)\n[![InfuseAI Discord Invite](https://img.shields.io/discord/664381609771925514?color=%237289DA\u0026label=chat\u0026logo=discord\u0026logoColor=white\u0026style=flat-square)](https://discord.com/invite/5zb2aK9KBV)\n\n\u003cp align=\"left\"\u003e\n  \u003ca href=\"https://docs.piperider.io/\" alt=\"documentation site\" title=\"Piperider Documentation\"\u003e Docs \u003c/a\u003e |\n  \u003ca href=\"https://discord.com/invite/5zb2aK9KBV\"\u003e Discord \u003c/a\u003e |\n  \u003ca href=\"https://blog.infuseai.io/data-reliability-automated-with-piperider-7a823521ef11\"\u003e Blog \u003c/a\u003e \n\u003c/p\u003e\n\n\u003e \\[!IMPORTANT\\]\n\u003e PipeRider has been superseded by [Recce](https://datarecce.io). We recommend that users requiring pre-merge data validation checks migrate to Recce. PipeRider will no longer be updated on a regular basis. You are still welcome to open a PR with bug fixes or feature requests. 
For questions and help regarding this update, please contact [product@piperider.io](mailto:product@infuseai.io) or leave a message in the [Recce Discord](https://discord.gg/VpwXRC34jz).\n\n# Code review for data in dbt\n\nPipeRider automatically compares your data to highlight the differences in impacted downstream dbt models so you can\nmerge your Pull Requests with confidence.\n\n### How it works:\n\n- Easy to connect your data source -\u003e PipeRider leverages\n  the [connection profiles in your dbt project](https://docs.getdbt.com/docs/get-started/connection-profiles) to connect\n  to the data warehouse\n- Generate profiling statistics of your models to get a high-level overview of your data\n- Compare target branch changes with the main branch in an HTML report\n- Post a quick summary of the data changes to your PR, so others can be confident too\n\n### Core concepts\n\n- **Easy to install**: Leveraging dbt's configuration settings, PipeRider can be installed within 2 minutes\n- **Fast comparison**: By collecting profiling statistics (e.g. uniqueness, averages, quantiles, histograms) and metric\n  queries, comparing downstream data impact takes little time, speeding up your team's review\n- **Valuable insights**: Various profiling statistics displayed in the HTML report give fast insights into your data\n\n# Quickstart\n\n1. **Install PipeRider**\n\n   ```bash\n   pip install piperider[\u003cconnector\u003e]\n   ```\n\n   You can find all supported data source connectors [here](https://docs.piperider.io/reference/supported-data-sources).\n\n1. **Add the PipeRider tag to your model**: Go to your dbt project, and add the PipeRider tag to the model you want to\n   profile.\n\n   ```sql\n   --models/staging/stg_customers.sql\n   {{ config(\n      tags=[\"piperider\"]\n   ) }}\n\n   select ...\n   ```\n\n   Then list the models that PipeRider will run:\n\n   ```\n    dbt list -s tag:piperider --resource-type model\n   ```\n\n1. 
**Run PipeRider**\n\n   ```bash\n   piperider run\n   ```\n\nTo see the full quick start guide, please refer\nto the [PipeRider documentation](https://docs.piperider.io/get-started/quick-start).\n\n# Features\n\n- **Model profiling**: PipeRider can profile your [dbt models](https://docs.getdbt.com/docs/build/models) and obtain\n  information such as basic data composition, quantiles, histograms, text length, top categories, and more.\n- **Metric queries**: PipeRider can integrate with [dbt metrics](https://docs.getdbt.com/docs/build/metrics) and present\n  the time-series data of metrics in the report.\n- **HTML report**: PipeRider generates a static HTML report each time it runs, which can be viewed locally or shared.\n- **Report comparison**: You can compare two previously generated reports or use a single command to compare the\n  differences between the current branch and the main branch. The latter is designed specifically for code review\n  scenarios. In our pull requests on GitHub, we want to know not only which files have changed, but also the impact\n  of these changes on the data. PipeRider can easily generate comparison reports with a single command to provide this\n  information.\n- **CI integration**: The key to CI is automation, and in the code review process, automating this workflow is even more\n  meaningful. PipeRider can easily integrate into your CI process. When new commits are pushed to your PR branch,\n  reports can be automatically generated to give reviewers more confidence in the changes.\n\n# Example Report Demo\n\nWe use the example project [git-repo-analytics](https://github.com/InfuseAI/git-repo-analytics) to demonstrate how to\nuse PipeRider, dbt, and DuckDB to analyze the [dbt-core](https://github.com/dbt-labs/dbt-core) repository. 
Here are the generated\nresults (updated daily):\n\n[Run Report](https://piperider-github-readme.s3.ap-northeast-1.amazonaws.com/single-run-report/index.html)\n\n[Comparison Report](https://piperider-github-readme.s3.ap-northeast-1.amazonaws.com/comparison-report/index.html)\n\n[Comparison Summary in a PR](https://github.com/InfuseAI/git-repo-analytics/pull/19)\n\n# PipeRider Cloud (beta)\n\n[PipeRider Cloud](http://cloud.piperider.io/) allows you to upload reports and share them with your team members. For\ninformation on pricing plans, please refer to the [pricing page](https://www.piperider.io/#pricing).\n\n# PipeRider Compare Action\n\nPipeRider provides the [PipeRider Compare Action](https://github.com/marketplace/actions/piperider-compare-action) to\nquickly integrate into your GitHub Actions workflow. It has the following features:\n\n- Automatically generates a report comparing the PR branch to the main branch\n- Uploads the report to GitHub artifacts or PipeRider Cloud\n- Adds a comment to the pull request with a comparison summary and a link to the report.\n\nYou can refer to\nthe example [workflow YAML](https://github.com/InfuseAI/jaffle_shop/blob/main/.github/workflows/pr-compare.yml) and\nthe [example pull request](https://github.com/InfuseAI/jaffle_shop/pull/19).\n\n# Development\n\nSee [setup dev environment](DEVELOP.md) and the [contributing guidelines](CONTRIBUTING.md) to get started.\n\n**We love chatting with our users! [Let us know](mailto:product@infuseai.io) if you have any questions, feedback, or\nneed help trying out PipeRider! :heart:**\n","funding_links":[],"categories":["Data","Data Quality","Data Tracking","Python"],"sub_categories":["Data Tracking"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FInfuseAI%2Fpiperider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FInfuseAI%2Fpiperider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FInfuseAI%2Fpiperider/lists"}