{"id":20858443,"url":"https://github.com/nickcrews/mismo","last_synced_at":"2026-03-16T02:27:23.469Z","repository":{"id":183313167,"uuid":"501938875","full_name":"NickCrews/mismo","owner":"NickCrews","description":"The SQL/Ibis powered sklearn of record linkage","archived":false,"fork":false,"pushed_at":"2024-10-29T22:52:54.000Z","size":3852,"stargazers_count":14,"open_issues_count":29,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-30T00:47:36.006Z","etag":null,"topics":["deduplication","duckdb","entity-resolution","ibis","python","record-linkage","sql"],"latest_commit_sha":null,"homepage":"https://nickcrews.github.io/mismo/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NickCrews.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.md","funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-10T07:12:41.000Z","updated_at":"2024-10-29T22:49:44.000Z","dependencies_parsed_at":"2023-10-12T13:16:12.951Z","dependency_job_id":"1b3fa0f3-3ff7-41fb-b9b0-b41e34f9b6f6","html_url":"https://github.com/NickCrews/mismo","commit_stats":null,"previous_names":["nickcrews/mismo"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NickCrews%2Fmismo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NickCrews%2Fmismo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NickCrews%2Fmismo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nic
kCrews%2Fmismo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NickCrews","download_url":"https://codeload.github.com/NickCrews/mismo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225130757,"owners_count":17425506,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deduplication","duckdb","entity-resolution","ibis","python","record-linkage","sql"],"created_at":"2024-11-18T04:46:01.917Z","updated_at":"2026-03-16T02:27:23.461Z","avatar_url":"https://github.com/NickCrews.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Mismo\n\n[![PyPI - Version](https://img.shields.io/pypi/v/mismo.svg)](https://pypi.org/project/mismo)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mismo.svg)](https://pypi.org/project/mismo)\n\nThe SQL/Ibis powered sklearn of record linkage.\n\nStill in the alpha stage. Breaking changes will happen frequently\nand with no warning. Once things have stabilized, I\nwill come up with a stability policy. Any suggestions as\nto how you want the API to look would be greatly appreciated.\nI do use this in my work, so at least I do a decent job of\nensuring correctness.\n\n-----\n\n## Goals\n\nMismo tries to be the sklearn of record linkage, backed by the scalability\nand power of SQL and [Ibis](https://ibis-project.org/). 
It is made of many small\ndata structures and functions, each with a well-defined and standard API\nthat allows them to be composed together and extended easily.\nNone of the other record linkage packages I have seen, such as\n[Splink](https://github.com/moj-analytical-services/splink),\n[Dedupe](https://www.github.com/dedupeio/dedupe), or\n[Record Linkage Toolkit](https://github.com/J535D165/recordlinkage),\nhad all of these properties, so I decided to make my own.\n\nSee [Goals and Alternatives](https://nickcrews.github.io/mismo/concepts/goals_and_alternatives)\nfor a more detailed discussion of the goals of Mismo and how it compares to other\nrecord linkage packages.\n\n## Features\n- Supports larger-than-memory datasets, executed on powerful SQL engines.\n  Use DuckDB for prototyping and for jobs up to roughly 10M records,\n  or Spark or other distributed backends for larger tasks, without\n  needing to change your code!\n- Use the clean, strongly typed, Pythonic DataFrame APIs of [Ibis](https://ibis-project.org/).\n- Small, modular functions and data structures that are easy to plug together\n  and extend.\n- Layered API: Use top-level APIs if your task is common enough that it is\n  supported out of the box.\n\n## Installation\n\n[`mismo` is available on PyPI](https://pypi.org/project/mismo/).\nI try to publish semver'ed releases after most changes.\n\nIf I forget to do this, then there are also [prereleases on PyPI](https://pypi.org/project/mismo/#history).\nThese are published every week by a GitHub Action using the HEAD commit of this repo.\n\nYou can also install directly from a branch or a specific commit from GitHub:\n\n```console\nuv pip install \"mismo[viz] @ git+https://github.com/NickCrews/mismo@\u003cSOME-SHA-OR-BRANCH\u003e\"\n```\n\n## Examples\n\nSee the [example notebook](https://nickcrews.github.io/mismo/examples/patent_deduplication).\n\n## Documentation\n\nSee the [documentation](https://nickcrews.github.io/mismo).\n\n## Contributing\n\nSee the 
[contributing guide](https://nickcrews.github.io/mismo/contributing/).\n\n## License\n\n`mismo` is distributed under the terms of the\n[LGPL-3.0-or-later](https://spdx.org/licenses/LGPL-3.0-or-later.html) license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnickcrews%2Fmismo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnickcrews%2Fmismo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnickcrews%2Fmismo/lists"}