{"id":49249388,"url":"https://github.com/senzing-garage/g2audit","last_synced_at":"2026-04-24T23:35:51.760Z","repository":{"id":38331805,"uuid":"370737808","full_name":"senzing-garage/g2audit","owner":"senzing-garage","description":"Distributed with Senzing API package","archived":false,"fork":false,"pushed_at":"2026-02-13T23:22:35.000Z","size":680,"stargazers_count":0,"open_issues_count":6,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-02-14T01:48:04.527Z","etag":null,"topics":["g2-python","g2tool","senzing-cleanup","senzing-g2-python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/senzing-garage.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-05-25T15:16:22.000Z","updated_at":"2026-02-13T18:49:37.000Z","dependencies_parsed_at":"2024-01-27T17:30:25.207Z","dependency_job_id":"7fd9eda2-d345-4bbb-b4d3-47af7992cabe","html_url":"https://github.com/senzing-garage/g2audit","commit_stats":null,"previous_names":["senzing-garage/g2audit"],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/senzing-garage/g2audit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/senzing-garage%2Fg2audit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/senzing-garage%2Fg2audit/tags","releases_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/senzing-garage%2Fg2audit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/senzing-garage%2Fg2audit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/senzing-garage","download_url":"https://codeload.github.com/senzing-garage/g2audit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/senzing-garage%2Fg2audit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32245150,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T13:21:15.438Z","status":"ssl_error","status_checked_at":"2026-04-24T13:21:15.005Z","response_time":64,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["g2-python","g2tool","senzing-cleanup","senzing-g2-python"],"created_at":"2026-04-24T23:35:51.048Z","updated_at":"2026-04-24T23:35:51.743Z","avatar_url":"https://github.com/senzing-garage.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# g2audit\n\n## Overview\n\nThe [G2Audit.py] utility compares two entity resolution result sets and computes the precision, recall and F1 scores between them. It\ncan be used to compare different runs to a truth set to determine which one is best or to determine the full effect a configuration change had\nagainst a prior run of the same data. 
There are many articles that describe this, including:\n\n- https://senzing.zendesk.com/hc/en-us/articles/360045624093-Understanding-the-G2Audit-statistics\n- https://senzing.zendesk.com/hc/en-us/articles/360050643034-Exploratory-Data-Analysis-4-Comparing-ER-results\n- https://senzing.zendesk.com/hc/en-us/articles/360051016033-How-to-create-an-entity-resolution-truth-set\n\nThis project is designed to be used along with:\n\n- https://github.com/senzing-garage/g2snapshot to extract the entity resolution result set from a Senzing database\n- https://github.com/senzing-garage/g2explorer to explore the audit result statistics and examples (requires the data to be loaded into Senzing)\n\nUsage:\n\n```console\npython3 G2Audit.py --help\nusage: G2Audit.py [-h] [-n NEWERFILE] [-p PRIORFILE] [-o OUTPUTROOT] [-D]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -n NEWERFILE, --newer_csv_file NEWERFILE\n                        the latest entity map file\n  -p PRIORFILE, --prior_csv_file PRIORFILE\n                        the prior entity map file\n  -o OUTPUTROOT, --output_file_root OUTPUTROOT\n                        the output file root name (both a .csv and a .json file\n                        will be created)\n  -D, --debug           print debug statements\n```\n\n## Contents\n\n1. [Prerequisites]\n2. [Installation]\n3. [Typical use]\n4. [Output files]\n\n### Prerequisites\n\n- Python 3.6 or higher\n\n_Plenty of RAM! This process runs very fast because it loads each data set into memory. This is not a problem if your control or truth set is under a million records. But if you get into the\ntens or hundreds of millions of records, you will need to run this on a computer with enough RAM to load both sets into memory at the same time._\n\n### Installation\n\n1. 
Place the following file in a directory of your choice:\n   - [G2Audit.py]\n\n### Typical use\n\n#### For comparing to a truthset to find the best result\n\n```console\npython3 G2Audit.py -n /path/to/candidate1-result.csv -p /path/to/truthset.csv -o /path/to/audit1-result\n\npython3 G2Audit.py -n /path/to/candidate2-result.csv -p /path/to/truthset.csv -o /path/to/audit2-result\n```\n\nYou will find the precision, recall and F1 scores in the audit1-result.json and audit2-result.json files along with examples of entities that have been split or merged.\n\n#### For analyzing the full effect of a configuration change\n\n```console\npython3 G2Audit.py -n /path/to/v2-config-test1-result.csv -p /path/to/v1-config-test1-result.csv -o /path/to/audit-result-cfg2-cfg1\n```\n\nThis would determine the effect the version 2 config changes had on the test1 data set compared to the version 1 config result on the same data.\n\nConfiguration updates are usually made to reduce false positives or negatives on specific examples reported by users. 
Performing this kind of an audit can help ensure their examples\nwere corrected without drastically affecting the overall precision and recall scores.\n\n### Output files\n\n#### json statistics file\n\n![Alt text](images/json-file-screenshot.jpg?raw=true \"Screen shot\")\n\n#### csv statistics file\n\n![Alt text](images/csv-file-screenshot.jpg?raw=true \"Screen shot\")\n\n- AUDIT_ID groups the records involved in a split or merged entity.\n- AUDIT_CATEGORY is either \"SPLIT\" or \"MERGED\" or \"SPLIT+MERGED\" and applies to the group so is the same for every record in the AUDIT_ID.\n- AUDIT_RESULT is either \"new_positive\", \"new_negative\", or \"same\" and shows that particular record's role in the audit result.\n- DATA_SOURCE and RECORD_ID indicate the specific record.\n- PRIOR_ID indicates the unique entity or cluster ID in the prior or truth set.\n- PRIOR_SCORE indicates the reported score for this record in relation to the prior entity reported by the prior or truth set.\n- NEWER_ID indicates the unique entity or cluster ID in the newer or candidate result set.\n- NEWER_SCORE indicates the reported score for this record in relation to the newer entity reported by the newer or candidate result set.\n\n_The scores are usually only provided on data run through the Senzing software which even provides scores of records that were not matched.\nFor instance in the screen shot above, line 8 shows that even though the entity was split, there was still a relationship created on name and\ndate of birth._\n\n[G2Audit.py]: G2Audit.py\n[Installation]: #Installation\n[Output files]: #Output-files\n[Prerequisites]: #Prerequisites\n[Typical use]: #Typical-use\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsenzing-garage%2Fg2audit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsenzing-garage%2Fg2audit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsenzing-garage%2Fg2audit/lists"}