<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: wikid

[![tests](https://github.com/explosion/wikid/actions/workflows/tests.yml/badge.svg)](https://github.com/explosion/wikid/actions/workflows/tests.yml)
[![spaCy](https://img.shields.io/static/v1?label=made%20with%20%E2%9D%A4%20and&message=spaCy&color=09a3d5&style=flat-square)](https://spacy.io)
<br/> _No REST for the `wikid`_ :jack_o_lantern: - generate a SQLite database
and a spaCy `KnowledgeBase` from Wikipedia & Wikidata dumps. `wikid` was
designed with named entity linking (NEL) in spaCy as its main use case.
<br/> Note that this repository is still experimental, so the public API may
change at any time.

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).

### ⏯ Commands

The following commands are defined by the project. They can be executed using
[`spacy project run [name]`](https://spacy.io/api/cli#project-run). Commands are
only re-run if their inputs have changed.

| Command          | Description                                                                                      |
| ---------------- | ------------------------------------------------------------------------------------------------ |
| `parse`          | Parse the Wiki dumps. This can take a long time if you're not using the filtered dumps!          |
| `download_model` | Download the spaCy language model.                                                               |
| `create_kb`      | Create the KB from the SQLite database containing the Wiki content.                              |
| `delete_db`      | Delete the SQLite database that the `parse` step built from the Wikidata and Wikipedia dumps.    |
| `clean`          | Delete all generated artifacts except for the SQLite database.                                   |

### ⏭ Workflows

The following workflows are defined by the project. They can be executed using
[`spacy project run [name]`](https://spacy.io/api/cli#project-run) and will run
the specified commands in order. Commands are only re-run if their inputs have
changed.

| Workflow | Steps                                              |
| -------- | -------------------------------------------------- |
| `all`    | `parse` &rarr; `download_model` &rarr; `create_kb` |

### 🗂 Assets

The following assets are defined by the project. They can be fetched by running
[`spacy project assets`](https://spacy.io/api/cli#project-assets) in the project
directory.

| File                                            | Source | Description                                                     |
| ----------------------------------------------- | ------ | --------------------------------------------------------------- |
| `assets/wikidata_entity_dump.json.bz2`          | URL    | Wikidata entity dump. The download can take a long time!        |
| `assets/wikipedia_dump.xml.bz2`                 | URL    | Wikipedia dump. The download can take a long time!              |
| `assets/wikidata_entity_dump_filtered.json.bz2` | URL    | Filtered Wikidata entity dump for demo purposes (English only). |
| `assets/wikipedia_dump_filtered.xml.bz2`        | URL    | Filtered Wikipedia dump for demo purposes (English only).       |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
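The `parse` command has to work through very large dumps, so the input needs to be streamed rather than loaded into memory all at once. The following is a minimal sketch, not `wikid`'s actual parser, of reading a Wikidata entity dump such as `assets/wikidata_entity_dump.json.bz2` one entity at a time. It assumes the standard Wikidata JSON dump layout: a single JSON array with one entity per line, where every entity line except the last ends with a comma. The function name `iter_wikidata_entities` is hypothetical.

```python
import bz2
import json
from typing import Iterator


def iter_wikidata_entities(path: str) -> Iterator[dict]:
    """Yield entities from a bz2-compressed Wikidata JSON dump, one at a time.

    Assumes the standard dump layout: one JSON array with "[" and "]" on
    their own lines and one entity per line in between, each entity line
    ending with a comma except the last.
    """
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip the array brackets and any blank lines.
            if line in ("[", "]", ""):
                continue
            # Drop the trailing comma that separates array elements.
            if line.endswith(","):
                line = line[:-1]
            yield json.loads(line)
```

Because this is a generator, memory use stays roughly constant regardless of dump size. For example, `next(iter_wikidata_entities("assets/wikidata_entity_dump_filtered.json.bz2"))["id"]` would return the ID of the first entity, and its English label, if present, sits under `entity["labels"]["en"]["value"]`.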
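The Wikipedia asset (`assets/wikipedia_dump.xml.bz2`) is a MediaWiki XML export, which can also be streamed page by page. The following minimal sketch uses the standard library's `xml.etree.ElementTree.iterparse`. It assumes the usual `<page>` / `<title>` / `<revision>` / `<text>` layout and deliberately ignores the XML namespace, because that namespace varies with the export schema version. The helper names `_local` and `iter_wikipedia_pages` are hypothetical and are not part of `wikid`.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def _local(tag: str) -> str:
    """Strip the "{namespace}" prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_wikipedia_pages(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) pairs from a bz2-compressed MediaWiki XML dump."""
    with bz2.open(path, "rb") as f:
        for _, elem in ET.iterparse(f, events=("end",)):
            # Only act once a whole <page> element has been parsed.
            if _local(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = _local(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            yield title, text
            # Drop the finished page's children and text to limit memory growth.
            elem.clear()
```

Clearing each `<page>` after it has been yielded keeps memory use low on full dumps. A production parser would also handle redirects and namespaces explicitly.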
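The command and workflow sections note that commands are only re-run if their inputs have changed. spaCy projects keep track of this in a `project.lock` file. The sketch below illustrates the idea in a simplified form and is not spaCy's implementation, which performs a more involved check. It hashes each dependency file with MD5 and compares the result against a plain dict that stands in for the lock file. `file_checksum`, `needs_rerun`, and `record` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def file_checksum(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks for large dumps."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_rerun(deps: Iterable[Path], lock: Dict[str, str]) -> bool:
    """Return True if any dependency is new or has changed since the last run.

    `lock` maps file paths (as strings) to checksums recorded previously;
    it is a hypothetical stand-in for spaCy's project.lock.
    """
    for dep in deps:
        if lock.get(str(dep)) != file_checksum(dep):
            return True
    return False


def record(deps: Iterable[Path], lock: Dict[str, str]) -> None:
    """Store the current checksum of each dependency after a successful run."""
    for dep in deps:
        lock[str(dep)] = file_checksum(dep)
```

Hashing file contents instead of comparing modification times means that re-downloading an identical dump does not trigger another expensive `parse` run.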