DeepMatcher
=============

.. image:: https://travis-ci.org/anhaidgroup/deepmatcher.svg?branch=master
    :target: https://travis-ci.org/anhaidgroup/deepmatcher

.. image:: https://img.shields.io/badge/License-BSD%203--Clause-blue.svg
    :target: https://opensource.org/licenses/BSD-3-Clause

DeepMatcher is a Python package for performing entity and text matching using deep learning.
It provides built-in neural networks and utilities that let you train and apply
state-of-the-art deep learning models for entity matching in fewer than 10 lines of code.
The models are also easily customizable: the modular design allows any subcomponent to be
altered or swapped out for a custom implementation.

As an example, consider labeled tuple pairs such as the following:

.. image:: https://raw.githubusercontent.com/anhaidgroup/deepmatcher/master/docs/source/_static/match_input_ex.png

DeepMatcher uses such labeled pairs to train a neural network that performs matching, i.e.,
predicts a match / non-match label for each pair. The trained network can then be used to
obtain labels for unlabeled tuple pairs.

Paper and Data
****************

For details on the architecture of the models used, take a look at our paper `Deep
Learning for Entity Matching`_ (SIGMOD '18). All public datasets used in
the paper can be downloaded from the `datasets page <Datasets.md>`__.

Quick Start: DeepMatcher in 30 seconds
******************************************

There are four main steps in using DeepMatcher:

1. Data processing: Load and process labeled training, validation and test CSV data.

.. code-block:: python

   import deepmatcher as dm
   train, validation, test = dm.data.process(path='data_directory',
       train='train.csv', validation='validation.csv', test='test.csv')

2. Model definition: Specify the neural network architecture. By default, DeepMatcher uses
   the built-in hybrid model (as discussed in section 4.4 of `our paper
   <http://pages.cs.wisc.edu/~anhai/papers1/deepmatcher-sigmod18.pdf>`__). The model can
   be customized to your heart's desire.

.. code-block:: python

   model = dm.MatchingModel()

3. Model training: Train the neural network.

.. code-block:: python

   model.run_train(train, validation, best_save_path='best_model.pth')

4. Application: Evaluate the model on the test set and apply it to unlabeled data.

.. code-block:: python

   model.run_eval(test)

   unlabeled = dm.data.process_unlabeled(path='data_directory/unlabeled.csv', trained_model=model)
   model.run_prediction(unlabeled)

Installation
**************

We currently support only Python 3.5 and later. Installing with pip is recommended:

.. code-block:: bash

   pip install deepmatcher

Tutorials
**********

**Using DeepMatcher:**

1. `Getting Started`_: A more in-depth guide to help you get familiar with the basics of
   using DeepMatcher.
2. `Data Processing`_: Advanced guide on what data processing involves and how to
   customize it.
3. `Matching Models`_: Advanced guide on neural network architectures for entity matching
   and how to customize them.

**Entity Matching Workflow:**

`End to End Entity Matching`_: A guide to developing a complete entity
matching workflow. The tutorial discusses how to use DeepMatcher with `Magellan`_ to
perform blocking, sampling, labeling and matching to obtain matching tuple pairs from two
tables.

**DeepMatcher for other matching tasks:**

`Question Answering with DeepMatcher`_: A tutorial on how to use DeepMatcher for question
answering. Specifically, it looks at `WikiQA`_, a benchmark dataset for the task of
answer selection.

API Reference
***************

API docs `are here`_.

Support
**********

Take a look at `the FAQ <FAQ.md>`__ for common issues. If you run into problems or have
questions not answered in the FAQ, please `file GitHub issues`_ and we will address them
as soon as possible.

The Team
**********

DeepMatcher was developed by University of Wisconsin-Madison graduate students Sidharth Mudgal
and Han Li, under the supervision of Prof. AnHai Doan and Prof. Theodoros Rekatsinas.

.. _`Deep Learning for Entity Matching`: http://pages.cs.wisc.edu/~anhai/papers1/deepmatcher-sigmod18.pdf
.. _`Prof. AnHai Doan's data repository`: https://sites.google.com/site/anhaidgroup/useful-stuff/data
.. _`Magellan`: https://sites.google.com/site/anhaidgroup/projects/magellan
.. _`Getting Started`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/getting_started.ipynb
.. _`Data Processing`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/data_processing.ipynb
.. _`Matching Models`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/matching_models.ipynb
.. _`End to End Entity Matching`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/end_to_end_em.ipynb
.. _`are here`: https://anhaidgroup.github.io/deepmatcher/html/
.. _`Question Answering with DeepMatcher`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/question_answering.ipynb
.. _`WikiQA`: https://aclweb.org/anthology/D15-1237
.. _`file GitHub issues`: https://github.com/anhaidgroup/deepmatcher/issues