{"id":20619348,"url":"https://github.com/dedupeio/address-matching","last_synced_at":"2025-04-15T11:55:02.337Z","repository":{"id":14218340,"uuid":"16925188","full_name":"dedupeio/address-matching","owner":"dedupeio","description":"Python script for matching a list of messy addresses against a gazetteer using dedupe.","archived":false,"fork":false,"pushed_at":"2020-03-31T20:47:01.000Z","size":51230,"stargazers_count":62,"open_issues_count":5,"forks_count":19,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-28T19:53:40.312Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dedupeio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-02-17T20:11:57.000Z","updated_at":"2024-11-21T15:52:18.000Z","dependencies_parsed_at":"2022-07-10T05:32:15.298Z","dependency_job_id":null,"html_url":"https://github.com/dedupeio/address-matching","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dedupeio%2Faddress-matching","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dedupeio%2Faddress-matching/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dedupeio%2Faddress-matching/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dedupeio%2Faddress-matching/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dedupeio","download_url":"https://codeload.github.com/dedupeio/address-matching/tar.gz/refs/heads/master","
host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249067775,"owners_count":21207395,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T12:11:25.801Z","updated_at":"2025-04-15T11:55:02.318Z","avatar_url":"https://github.com/dedupeio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"address-matching\n================\n\nPython script for matching a list of messy addresses against a gazetteer using dedupe. This also functions as a pseudo geocoder if your Gazetteer has lat/long information.\n\nPart of the [Dedupe.io](https://dedupe.io/) cloud service and open source toolset for de-duplicating and finding fuzzy matches in your data.\n\n## Setup\nHere's how to get this script working - without having dedupe already installed.\n```bash\ngit clone git@github.com:datamade/address-matching.git\ncd address-matching\npip install \"numpy\u003e=1.6\"\npip install -r requirements.txt\n```\n\n## Gazetteer\nYou will need a Gazetteer of all unique addresses in a given area. For this example, we used the [Cook County Address Point shapefile](https://datacatalog.cookcountyil.gov/GIS-Maps/ccgisdata-Address-Point-Chicago/jev2-4wjs).\n\n\n## List addresses you want to match\nThis program takes a list of addresses and matches them to individual records in the Gazetteer. For this example, we are using a messy list of early childhood education locations in Chicago. This file can have multiple entries referring to the same place. 
\n\n## Usage\nOnce you have a Gazetteer and a messy input file, run `address_matching.py`:\n\n```bash\npython address_matching.py\n```\n\nYou will be prompted to label some training pairs so that dedupe can learn which records match. [More on this here](https://github.com/datamade/dedupe/blob/master/README.md#training).\n\nThe output will be saved to `address_matching_output.csv`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdedupeio%2Faddress-matching","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdedupeio%2Faddress-matching","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdedupeio%2Faddress-matching/lists"}