{"id":20874494,"url":"https://github.com/wikidata/soweego","last_synced_at":"2025-04-07T08:18:32.591Z","repository":{"id":43961262,"uuid":"140611498","full_name":"Wikidata/soweego","owner":"Wikidata","description":"Link Wikidata items to large catalogs","archived":false,"fork":false,"pushed_at":"2025-03-03T18:41:20.000Z","size":8248,"stargazers_count":96,"open_issues_count":61,"forks_count":9,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-31T06:07:15.939Z","etag":null,"topics":["data-matching","entity-linking","entity-resolution","identifiers","knowledge-graph","record-linkage","wikidata","wikimedia"],"latest_commit_sha":null,"homepage":"https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Wikidata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-11T18:00:28.000Z","updated_at":"2025-02-27T18:45:48.000Z","dependencies_parsed_at":"2024-01-14T03:50:30.575Z","dependency_job_id":"57caac3c-628a-4abf-888e-881e8da4f71a","html_url":"https://github.com/Wikidata/soweego","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wikidata%2Fsoweego","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wikidata%2Fsoweego/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wikidata%2Fsoweego/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wikidata%2Fsoweego/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Wikidata","download_url":"https://codeload.github.com/Wikidata/soweego/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247615385,"owners_count":20967184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-matching","entity-linking","entity-resolution","identifiers","knowledge-graph","record-linkage","wikidata","wikimedia"],"created_at":"2024-11-18T06:33:08.404Z","updated_at":"2025-04-07T08:18:32.568Z","avatar_url":"https://github.com/Wikidata.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# soweego: link Wikidata to large catalogs\n[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/Wikidata/soweego/master.svg)](https://results.pre-commit.ci/latest/github/Wikidata/soweego/master)\n[![Documentation Status](https://readthedocs.org/projects/soweego/badge/?version=latest)](https://soweego.readthedocs.io/en/latest/?badge=latest)\n[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat\u0026labelColor=ef8336)](https://pycqa.github.io/isort/)\n[![License](https://img.shields.io/github/license/Wikidata/soweego.svg)](https://www.gnu.org/licenses/gpl-3.0.html)\n\n*soweego* is a pipeline that connects [Wikidata](https://wikidata.org/) to large-scale third-party catalogs.\n\n*soweego* is the only system that makes *statisticians, epidemiologists, historians,* and *computer scientists* agree.\nWhy? Because it performs *record linkage, data matching,* and *entity resolution* at the same time.\nToo easy, they all seem to be [synonyms](https://en.wikipedia.org/wiki/Record_linkage#Naming_conventions)!\n\nOh, *soweego* also embeds [Machine Learning](https://en.wikipedia.org/wiki/Machine_learning) and advocates for [Linked Data](https://en.wikipedia.org/wiki/Linked_data).\n\n![Is soweego similar to the Go game?](https://upload.wikimedia.org/wikipedia/commons/9/96/Crosscut.jpg)\n\n# Official Project Pages\n*soweego* is made possible thanks to the [Wikimedia Foundation](https://wikimediafoundation.org/):\n- https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego\n- https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2\n\n# Documentation\nhttps://soweego.readthedocs.io/\n\n# Highlights\n- Run the whole [pipeline](#run-the-pipeline), or\n- use the [command line](#use-the-command-line);\n- [import](https://soweego.readthedocs.io/en/latest/importer.html) large catalogs into a SQL database;\n- [gather](https://soweego.readthedocs.io/en/latest/wikidata.html) live Wikidata datasets;\n- [connect](https://soweego.readthedocs.io/en/latest/linker.html) them to target catalogs via *rule-based* and *supervised* linkers;\n- [upload](https://soweego.readthedocs.io/en/latest/ingester.html) links to Wikidata and [Mix'n'match](https://tools.wmflabs.org/mix-n-match/);\n- [synchronize](https://soweego.readthedocs.io/en/latest/validator.html#module-soweego.validator.checks) Wikidata to imported catalogs;\n- [enrich](https://soweego.readthedocs.io/en/latest/validator.html#module-soweego.validator.enrichment) Wikidata items with relevant statements.\n\n# Get Ready\nInstall [Docker](https://docs.docker.com/install/) and [Compose](https://docs.docker.com/compose/install/), then enter *soweego*:\n\n```\n$ git clone -b v1.1 https://github.com/Wikidata/soweego.git\n$ cd soweego\n$ ./docker/run.sh\nBuilding soweego\n...\n\nroot@70c9b4894a30:/app/soweego#\n```\n\nNow it's too late to get out!\n\n# Run the Pipeline\nPiece of cake:\n\n```\n:/app/soweego# python -m soweego run CATALOG\n```\n\nPick `CATALOG` from `discogs`, `imdb`, or `musicbrainz`.\n\nThese steps are executed by default:\n1. import the target catalog into a local database;\n2. link Wikidata to the target with a supervised linker;\n3. synchronize Wikidata to the target.\n\nResults are in `/app/shared/results`.\n\n# Use the Command Line\nYou can launch every single *soweego* action with CLI commands:\n\n```\n:/app/soweego# python -m soweego\nUsage: soweego [OPTIONS] COMMAND [ARGS]...\n\n  Link Wikidata to large catalogs.\n\nOptions:\n  -l, --log-level \u003cTEXT CHOICE\u003e...\n                                  Module name followed by one of [DEBUG, INFO,\n                                  WARNING, ERROR, CRITICAL]. Multiple pairs\n                                  allowed.\n  --help                          Show this message and exit.\n\nCommands:\n  importer  Import target catalog dumps into a SQL database.\n  ingester  Take soweego output into Wikidata items.\n  linker    Link Wikidata items to target catalog identifiers.\n  run       Launch the whole pipeline.\n  sync      Sync Wikidata to target catalogs.\n```\n\nJust two things to remember:\n1. you can always get `--help`;\n2. each command may have sub-commands.\n\n# Contribute\nThe best way is to [import a new catalog](https://soweego.readthedocs.io/en/latest/new_catalog.html).\nPlease also have a look at the [guidelines](CONTRIBUTING.md).\n\n# License\nThe source code is under the terms of the [GNU General Public License, version 3](https://www.gnu.org/licenses/gpl.html).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwikidata%2Fsoweego","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwikidata%2Fsoweego","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwikidata%2Fsoweego/lists"}