{"id":15066766,"url":"https://github.com/gibbsbravo/datadelta","last_synced_at":"2025-08-18T19:31:37.895Z","repository":{"id":40555076,"uuid":"440552255","full_name":"gibbsbravo/DataDelta","owner":"gibbsbravo","description":"The best Python package for comparing two dataframes","archived":false,"fork":false,"pushed_at":"2021-12-29T21:50:19.000Z","size":526,"stargazers_count":10,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-14T00:46:04.714Z","etag":null,"topics":["analytics","comparison","data","data-analytics","database","database-management","databases","dataops","dataops-platform","devops","pandas","pandas-dataframe","testing","testing-tools","version-control"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gibbsbravo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-21T14:51:22.000Z","updated_at":"2024-12-07T18:48:10.000Z","dependencies_parsed_at":"2022-08-27T21:41:21.050Z","dependency_job_id":null,"html_url":"https://github.com/gibbsbravo/DataDelta","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gibbsbravo%2FDataDelta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gibbsbravo%2FDataDelta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gibbsbravo%2FDataDelta/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gibbsbravo%2FDataDelta/manifests","owner_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/owners/gibbsbravo","download_url":"https://codeload.github.com/gibbsbravo/DataDelta/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230268748,"owners_count":18199806,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","comparison","data","data-analytics","database","database-management","databases","dataops","dataops-platform","devops","pandas","pandas-dataframe","testing","testing-tools","version-control"],"created_at":"2024-09-25T01:11:54.090Z","updated_at":"2024-12-18T12:13:14.310Z","avatar_url":"https://github.com/gibbsbravo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv id=\"top\"\u003e\u003c/div\u003e\n\n\u003c!-- PROJECT LOGO --\u003e\n\u003cbr /\u003e\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://github.com/gibbsbravo/DataDelta\"\u003e\n    \u003cimg src=\"https://github.com/gibbsbravo/DataDelta/blob/master/images/DataDeltaLogo.png?raw=true\" alt=\"Logo\" width=\"200\" height=\"200\"\u003e\n  \u003c/a\u003e\n  \u003cp align=\"center\" style=\"font-weight: bold; font-style:italic\"\u003e\n    The best Python package for comparing two dataframes\n    \u003cbr /\u003e\n    \u003ca href=\"https://github.com/gibbsbravo/DataDelta\"\u003e\u003cstrong\u003eExplore the docs »\u003c/strong\u003e\u003c/a\u003e\n    \u003cbr /\u003e\n    \n  \u003c/p\u003e\n\u003c/div\u003e\n\n\u003c!-- TABLE OF CONTENTS --\u003e\n\u003cdetails\u003e\n  \u003csummary\u003eTable of Contents\u003c/summary\u003e\n  
\u003col\u003e\n    \u003cli\u003e\n      \u003ca href=\"#about-the-project\"\u003eAbout The Project\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#getting-started\"\u003eGetting Started\u003c/a\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#dependencies\"\u003eDependencies\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#installation\"\u003eInstallation\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#usage-examples\"\u003eUsage Examples\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#contributing\"\u003eContributing\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#example-html-report-output\"\u003eExample HTML Report Output\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#contact\"\u003eContact\u003c/a\u003e\u003c/li\u003e\n  \u003c/ol\u003e\n\u003c/details\u003e\n\n\u003c!-- ABOUT THE PROJECT --\u003e\n\n## About The Project\n\nDataDelta is a very useful Python package for easily comparing two pandas dataframes for use in data analysis, data engineering, and tracking table changes across time.\n\nDataDelta generates a \u003ca href=\"#example-html-report-output\"\u003ereport\u003c/a\u003e as both a Python dict and HTML file that summarizes the key changes between two dataframes through completing a series of tests (that can also be selected individually). 
The Python dict report is intended for use in a DevOps / DataOps testing pipeline to verify that table changes are expected.\n\n\u003ca href=\"https://github.com/gibbsbravo/DataDelta/issues\"\u003eReport Bug\u003c/a\u003e\n·\n\u003ca href=\"https://github.com/gibbsbravo/DataDelta/issues\"\u003eRequest Feature\u003c/a\u003e\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\u003c!-- GETTING STARTED --\u003e\n\n## Getting Started\n\nDataDelta can be installed through pip, or cloned locally if you want to make changes.\n\n### Dependencies\n\nDataDelta has very few dependencies:\n\n- \u003ca href='https://pandas.pydata.org/'\u003epandas\u003c/a\u003e: _a fast, powerful, flexible and easy to use open source data analysis and manipulation tool_ - the library DataDelta is built on for comparing dataframes\n- \u003ca href='https://numpy.org/'\u003enumpy\u003c/a\u003e: _The fundamental package for scientific computing with Python_ - used for transformations and calculations\n- \u003ca href='https://jinja.palletsprojects.com/en/3.0.x/'\u003ejinja2\u003c/a\u003e: _a fast, expressive, extensible templating engine_ - used to generate the HTML report\n- \u003ca href='https://docs.pytest.org/en/6.2.x/'\u003epytest\u003c/a\u003e (optional): _a mature full-featured Python testing tool that helps you write better programs_ - used for testing\n\n### Installation\n\n- Install using pip from PyPI:\n  ```sh\n  pip install datadelta\n  ```\n\nOR\n\n- Clone the repo locally:\n  ```sh\n  git clone https://github.com/gibbsbravo/DataDelta.git\n  ```\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\u003c!-- USAGE EXAMPLES --\u003e\n\n## Usage Examples\n\n- Quick-start code to generate the summary report of dataframe changes:\n\n  ```python\n  import pandas as pd\n  import datadelta as delta\n\n  old_df = pd.read_csv('MainTestData_old_df.csv') # Add your old dataframe here\n  new_df = 
pd.read_csv('MainTestData_new_df.csv') # Add your new dataframe here\n  primary_key = 'A' # Set the primary key\n  column_subset = None # Specify the subset of columns of interest or leave None to compare all columns\n\n  # The consolidated_report dictionary will contain the summary changes\n  consolidated_report, record_changes_comparison_df = delta.create_consolidated_report(\n      old_df, new_df, primary_key, column_subset)\n\n  # This will create a report named datadelta_html_report.html in the current working directory containing the summary changes\n  delta.export_html_report(consolidated_report, record_changes_comparison_df,\n                        export_file_name='datadelta_html_report.html',\n                        overwrite_existing_file=False)\n  ```\n\n- Get dataframe summary:\n\n  ```python\n  import pandas as pd\n  import datadelta as delta\n\n  new_df = pd.read_csv('MainTestData_new_df.csv') # Add your new dataframe here\n  primary_key = 'A' # Set the primary key\n  column_subset = None # Specify the subset of columns of interest or leave None to include all columns\n\n  # Returns a report summarizing the key attributes and values of a dataframe\n  summary_report = delta.get_df_summary(\n      input_df=new_df, primary_key=primary_key, column_subset=column_subset, max_cols=15)\n  ```\n\n- Get record count changes report:\n\n  ```python\n  import pandas as pd\n  import datadelta as delta\n\n  old_df = pd.read_csv('MainTestData_old_df.csv') # Add your old dataframe here\n  new_df = pd.read_csv('MainTestData_new_df.csv') # Add your new dataframe here\n  primary_key = 'A' # Set the primary key\n\n  # Returns a report summarizing any changes to the number of records (and composition) between two dataframes\n  record_count_change_report = delta.check_record_count(\n      old_df, new_df, primary_key)\n  ```\n\nOther functions include:\n\n- check_column_names: Returns a report summarizing any changes to column names between two dataframes\n- check_datatypes: Returns a report summarizing any columns with different datatypes\n- 
check_chg_in_values: Returns a report summarizing any records with changes in values\n- get_records_in_both_tables: Returns the records found in both dataframes\n- get_record_changes_comparison_df: Returns a dataframe comparing any records with changes in values by column\n- export_html_report: Exports an HTML report of the differences between two dataframes\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\u003c!-- CONTRIBUTING --\u003e\n\n## Contributing\n\nContributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.\n\nIf you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag \"enhancement\".\nDon't forget to give the project a star! Thanks again!\n\n1. Fork the Project\n2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the Branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\u003c!-- Example Report --\u003e\n\n## Example HTML Report Output\n\n![Report Screenshot][report-screenshot]\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\u003c!-- LICENSE --\u003e\n\n## License\n\nDistributed under the GNU General Public License v3 (GPLv3). 
See `LICENSE.txt` for more information.\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\u003c!-- CONTACT --\u003e\n\n## Contact\n\nAndrew Gibbs-Bravo - andrewgbravo@gmail.com\n\nProject Link: [https://github.com/gibbsbravo/DataDelta](https://github.com/gibbsbravo/DataDelta)\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\u003c!-- MARKDOWN LINKS \u0026 IMAGES --\u003e\n\u003c!-- https://www.markdownguide.org/basic-syntax/#reference-style-links --\u003e\n\n[contributors-shield]: https://img.shields.io/github/contributors/gibbsbravo/DataDelta.svg?style=for-the-badge\n[contributors-url]: https://github.com/gibbsbravo/DataDelta/graphs/contributors\n[forks-shield]: https://img.shields.io/github/forks/gibbsbravo/DataDelta.svg?style=for-the-badge\n[forks-url]: https://github.com/gibbsbravo/DataDelta/network/members\n[stars-shield]: https://img.shields.io/github/stars/gibbsbravo/DataDelta.svg?style=for-the-badge\n[stars-url]: https://github.com/gibbsbravo/DataDelta/stargazers\n[issues-shield]: https://img.shields.io/github/issues/gibbsbravo/DataDelta.svg?style=for-the-badge\n[issues-url]: https://github.com/gibbsbravo/DataDelta/issues\n[license-shield]: https://img.shields.io/github/license/gibbsbravo/DataDelta.svg?style=for-the-badge\n[license-url]: https://github.com/gibbsbravo/DataDelta/blob/master/LICENSE.txt\n[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge\u0026logo=linkedin\u0026colorB=555\n[report-screenshot]: https://github.com/gibbsbravo/DataDelta/blob/master/images/DatasetComparisonReport.png?raw=true\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgibbsbravo%2Fdatadelta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgibbsbravo%2Fdatadelta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgibbsbravo%2Fdatadelta/lists"}