{"id":15134752,"url":"https://github.com/rosette-api-community/rosettepedia","last_synced_at":"2025-10-23T11:30:45.866Z","repository":{"id":84699316,"uuid":"92202034","full_name":"rosette-api-community/rosettepedia","owner":"rosette-api-community","description":"Augment Rosette API entity extraction results with information from Wikipedia.","archived":false,"fork":false,"pushed_at":"2019-04-22T17:30:52.000Z","size":25,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-01-30T18:03:30.198Z","etag":null,"topics":["entities","entity-extraction","language","mediawiki","mediawiki-api","natural-language-processing","nlp","python","wikidata","wikipedia"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rosette-api-community.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-23T17:41:50.000Z","updated_at":"2022-11-16T15:50:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"1248d95f-1bcf-4f2a-9d2f-88c4be5192d4","html_url":"https://github.com/rosette-api-community/rosettepedia","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-community%2Frosettepedia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-community%2Frosettepedia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-communit
y%2Frosettepedia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-community%2Frosettepedia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rosette-api-community","download_url":"https://codeload.github.com/rosette-api-community/rosettepedia/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237811593,"owners_count":19370152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["entities","entity-extraction","language","mediawiki","mediawiki-api","natural-language-processing","nlp","python","wikidata","wikipedia"],"created_at":"2024-09-26T05:24:11.638Z","updated_at":"2025-10-23T11:30:45.490Z","avatar_url":"https://github.com/rosette-api-community.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Rosettepedia\n\nThis repository includes Python code demonstrating how to combine [Rosette API](https://developer.rosette.com/) entity-extraction results with results from the [MediaWiki API](https://www.mediawiki.org/wiki/API:Main_page) to provide additional information about entities based on information available in Wikipedia Infoboxes and Wikidata.\n\n## Setup\n\n### Installing Dependencies with Virtualenv\nThe script is written for Python 3.  
If you are comfortable installing external Python packages globally, you may skip this section.\n\nYou can install the dependencies using `virtualenv` so that you don't alter your global site packages.\n\nThe process for installing the dependencies using `virtualenv` is as follows for `bash` or similar shells:\n\nEnsure your `virtualenv` is up to date.\n\n    $ pip install -U virtualenv\n\n**Note**: You may need to use `pip3` depending on your Python installation.\n\n`cd` into the repository directory (where this `README.md` file is located) and create a Python 3 virtual environment with:\n\n    $ python3 $(which virtualenv) .\n\nActivate the virtual environment:\n\n    $ source bin/activate\n\nOnce you've activated the virtual environment, you can install the requirements safely without affecting your global site packages.\n\n### Installing the Dependencies\nYou can install the dependencies from the provided `requirements.txt` via `pip` (or `pip3`, depending on your installation of Python 3):\n\n    $ pip install -r requirements.txt\n\n## Running `rosettepedia.py`\nYou can use the script from the command line as follows:\n\n    ./rosettepedia.py -h\n    usage: rosettepedia.py [-h] [-i INPUT] [-u] [-k KEY] [-a API_URL]\n                           [-l LANGUAGE] -w WIKIPEDIA_LANGUAGE [-v]\n\n    Augment Rosette API entity extraction results with information from Wikipedia.\n\n    optional arguments:\n      -h, --help            show this help message and exit\n      -i INPUT, --input INPUT\n                            Path to a file containing input data (if not specified\n                            data is read from stdin) (default: None)\n      -u, --content-uri     Specify that the input is a URI (otherwise load text\n                            from file) (default: False)\n      -k KEY, --key KEY     Rosette API Key (default: None)\n      -a API_URL, --api-url API_URL\n                            Alternative Rosette API URL (default:\n      
                      https://api.rosette.com/rest/v1/)\n      -l LANGUAGE, --language LANGUAGE\n                            A three-letter (ISO 639-2 T) code that will override\n                            automatic language detection (default: None)\n      -w WIKIPEDIA_LANGUAGE, --wikipedia-language WIKIPEDIA_LANGUAGE\n                            A three-letter (ISO 639-2 T) code that determines\n                            which Wikipedia language to use for looking up Infobox\n                            information if available (default: None)\n      -v, --verbose, --adm  Output verbosely (i.e., get the full Annotated Data\n                            Model (ADM) as JSON) (default: False)\n\n**Note**: If you prefer not to enter your Rosette API key every time you run the script, you can store it in the environment variable `$ROSETTE_USER_KEY` instead.\n\n**Note**: See the [Rosette API developer documentation](https://developer.rosette.com/features-and-functions#-entity-linking) for languages that support entity linking.  
As of this writing, the API supports Chinese (`zho`), English (`eng`), Japanese (`jpn`), and Spanish (`spa`).\n\n### Examples\nThe simplest way to use the script is to pipe in a string:\n\n    $ echo \"OPEC will meet in Vienna this week.\" | ./rosettepedia.py -w eng \u003e opec.json\n    Extracting entities via Rosette API ...\n    Done!\n    Augmenting entities via MediaWiki API ...\n    fetching \"en\" Infobox/Wikidata for entity: Q7795 (OPEC) ...\n    fetching \"en\" Infobox/Wikidata for entity: Q1741 (Vienna) ...\n    Done!\n\nWe can inspect the results with [`jq`](https://stedolan.github.io/jq/):\n\n    $ jq .entities opec.json \n    [\n      {\n        \"type\": \"ORGANIZATION\",\n        \"mention\": \"OPEC\",\n        \"normalized\": \"OPEC\",\n        \"count\": 1,\n        \"entityId\": \"Q7795\",\n        \"wikipedia\": {\n          \"infobox\": {\n            \"name\": \"Organization of the Petroleum Exporting Countries\",\n            \"image_flag\": \"Flag of OPEC.svg\",\n            \"image_map\": \"OPEC.svg\",\n            \"org_type\": \"International cartel\",\n            \"membership_type\": \"Membership\",\n            \"admin_center_type\": \"Headquarters\",\n            \"admin_center\": \"Vienna, Austria\",\n            \"languages_type\": \"Official language\",\n            \"languages\": \"English\",\n            \"leader_title1\": \"Secretary General\",\n            \"leader_name1\": \"Mohammed Barkindo\",\n            \"established\": \"Baghdad, Iraq\",\n            \"established_event1\": \"Statute\",\n            \"established_date1\": \"10–14 September 1960\",\n            \"established_event2\": \"In effect\",\n            \"established_date2\": \"January 1961\",\n            \"currency\": \"(US$ /bbl)\"\n          },\n          \"wikidata\": {\n            \"website\": \"http://www.opec.org\",\n            \"image\": \"OPEC-building-01.jpg\",\n            \"instance\": \"international organization\",\n            \"category\": 
\"Category:OPEC\"\n          },\n          \"title\": \"OPEC\",\n          \"url\": \"https://en.wikipedia.org/wiki/OPEC\"\n        }\n      },\n      {\n        \"type\": \"LOCATION\",\n        \"mention\": \"Vienna\",\n        \"normalized\": \"Vienna\",\n        \"count\": 1,\n        \"entityId\": \"Q1741\",\n        \"wikipedia\": {\n          \"infobox\": {\n            \"name\": \"Vienna\",\n            \"native_name\": \"Wien\",\n            \"settlement_type\": \"Capital city\",\n            \"image_flag\": \"Flag of Wien.svg\",\n            \"image_seal\": \"Vienna seal 1926.svg\",\n            \"image_shield\": \"Wien 3 Wappen.svg\",\n            \"shield_size\": \"80px\",\n            \"image_map\": \"Wien in Austria.svg\",\n            \"map_caption\": \"Location of Vienna in Austria\",\n            \"subdivision_type\": \"Country\",\n            \"subdivision_name\": \"Austria\",\n            \"leader_party\": \"SPÖ\",\n            \"leader_title\": \"Mayor and Governor\",\n            \"leader_name\": \"Michael Häupl\",\n            \"leader_title1\": \"Vice-Mayors and Vice-Governors\",\n            \"area_magnitude\": \"2 chaiz\",\n            \"area_total_km2\": \"414.65\",\n            \"area_land_km2\": \"395.26\",\n            \"area_water_km2\": \"19.39\",\n            \"elevation_m\": \"151 (Lobau) – 542 (Hermannskogel)\",\n            \"elevation_ft\": \"495–1778\",\n            \"population_total\": \"1,867,960\",\n            \"population_as_of\": \"1. 
January 2017\",\n            \"population_density_km2\": \"4326.1\",\n            \"population_metro\": \"2,600,000\",\n            \"population_blank2_title\": \"Ethnicity\",\n            \"population_blank2\": \"61.2% Austrian38.8% Other\",\n            \"population_demonym\": \"Viennese, Wiener\",\n            \"population_note\": \"Statistik Austria, VCÖ – Mobilität mit Zukunft\",\n            \"postal_code_type\": \"Postal code\",\n            \"postal_code\": \"1010–1423, 1600, 1601, 1810, 1901\",\n            \"website\": \"www.wien.gv.at\",\n            \"footnotes\": \"frameless|x30px\",\n            \"blank1_name\": \"- GDP total (2014)http://ec.europa.eu/eurostat/documents/2995521/7192292/1-26022016-AP-EN.pdf/602b34e8-abba-439e-b555-4c3cb1dbbe6e\",\n            \"blank1_info\": \"€82 billion/ US$110 billion\",\n            \"blank2_name\": \"- GDP per capita(2014)http://ec.europa.eu/eurostat/documents/2995521/7192292/1-26022016-AP-EN.pdf/602b34e8-abba-439e-b555-4c3cb1dbbe6e\",\n            \"blank2_info\": \"€47,300/ US$63,000XE.com average GBP/ USD ex. 
rate in 2014\",\n            \"timezone\": \"CET\",\n            \"utc_offset\": \"+1\",\n            \"timezone_DST\": \"CEST\",\n            \"utc_offset_DST\": \"+2\",\n            \"blank_name\": \"Vehicle registration\",\n            \"blank_info\": \"W\"\n          },\n          \"wikidata\": {\n            \"image\": \"Collage von Wien.jpg\",\n            \"coordinates\": {\n              \"latitude\": 48.20833,\n              \"longitude\": 16.373064,\n              \"altitude\": null,\n              \"precision\": 1e-06,\n              \"globe\": \"http://www.wikidata.org/entity/Q2\"\n            },\n            \"website\": \"https://www.wien.gv.at/\",\n            \"instance\": [\n              \"city\",\n              \"capital\",\n              \"city with millions of inhabitants\",\n              \"federal capital\",\n              \"municipality of Austria\",\n              \"place with town rights and privileges\",\n              \"statuatory city of Austria\",\n              \"state of Austria\",\n              \"district of Austria\",\n              \"metropolis\",\n              \"tourist destination\"\n            ],\n            \"country\": [\n              \"Austria\",\n              \"First Republic of Austria\",\n              \"Austria-Hungary\",\n              \"Republic of German-Austria\",\n              \"Austrian Empire\",\n              \"Federal State of Austria\",\n              \"Nazi Germany\",\n              \"Habsburg Empire\",\n              \"Archduchy of Austria\",\n              \"Duchy of Austria\",\n              \"March of Austria\",\n              \"Duchy of Bavaria\",\n              \"Allied-occupied Austria\"\n            ],\n            \"category\": \"Category:Vienna\"\n          },\n          \"title\": \"Vienna\",\n          \"url\": \"https://en.wikipedia.org/wiki/Vienna\"\n        }\n      }\n    ]\n\nAnother way to use the script is to have Rosette API extract content from a web page by supplying a URL and 
using the `-u/--content-uri` option:\n\n    $ ./rosettepedia.py -u -i 'https://ja.wikipedia.org/wiki/アメリカスカップ' -w jpn \u003e アメリカスカップ.json\n    Extracting entities via Rosette API ...\n    ...\n    Done!\n    $ jq '.entities[]|select(.entityId == \"Q29\")' アメリカスカップ.json\n    {\n      \"type\": \"LOCATION\",\n      \"mention\": \"Español\",\n      \"normalized\": \"Español\",\n      \"count\": 1,\n      \"entityId\": \"Q29\",\n      \"wikipedia\": {\n        \"infobox\": {},\n        \"wikidata\": {\n          \"coordinates\": {\n            \"latitude\": 40,\n            \"longitude\": -3,\n            \"altitude\": null,\n            \"precision\": 1,\n            \"globe\": \"http://www.wikidata.org/entity/Q2\"\n          },\n          \"image\": \"Relief Map of Spain.png\",\n          \"continent\": [\n            \"ヨーロッパ\",\n            \"アフリカ\"\n          ],\n          \"instance\": [\n            \"主権国家\",\n            \"国\",\n            \"欧州連合加盟国\",\n            \"国際連合加盟国\",\n            \"欧州評議会加盟国\"\n          ],\n          \"category\": \"Category:スペイン\",\n          \"country\": \"スペイン\"\n        },\n        \"title\": \"スペイン\",\n        \"url\": \"https://ja.wikipedia.org/wiki/スペイン\"\n      }\n    }\n\nSince Rosette API links entities to language-independent Wikidata IDs (such as `Q29` for Spain), you can use the `-w/--wikipedia-language` option to get Wikipedia Infobox/Wikidata information in a language other than that of the document.  
For example you can get German Wikipedia info for the entities extracted from a Japanese document:\n\n    $ ./rosettepedia.py -u -i 'https://ja.wikipedia.org/wiki/アメリカスカップ' -w deu \u003e アメリカスカップ.deu.json\n    Extracting entities via Rosette API ...\n    ...\n    Done!\n    $ jq '.entities[]|select(.entityId == \"Q29\")' アメリカスカップ.deu.json\n    {\n      \"type\": \"LOCATION\",\n      \"mention\": \"Español\",\n      \"normalized\": \"Español\",\n      \"count\": 1,\n      \"entityId\": \"Q29\",\n      \"wikipedia\": {\n        \"infobox\": {\n          \"NAME-AMTSSPRACHE\": \"Reino de España\",\n          \"NAME-DEUTSCH\": \"Königreich Spanien\",\n          \"BILD-FLAGGE\": \"Flag of Spain.svg\",\n          \"ARTIKEL-FLAGGE\": \"Flagge Spaniens\",\n          \"BILD-WAPPEN\": \"Escudo de España (mazonado).svg\",\n          \"BILD-WAPPEN-BREITE\": \"120px\",\n          \"ARTIKEL-WAPPEN\": \"Wappen Spaniens\",\n          \"WAHLSPRUCH\": \"„Plus Ultra“lat., „Darüber hinaus“\",\n          \"AMTSSPRACHE\": \"Spanisch\\namtlich regional:\\n Aragonesisch\\n Aranesisch\\n Asturisch\\n Baskisch\\n Galicisch\\n Katalanisch\",\n          \"HAUPTSTADT\": \"Madrid\",\n          \"STAATSFORM\": \"parlamentarische Erbmonarchie\",\n          \"REGIERUNGSSYSTEM\": \"parlamentarische Demokratie\",\n          \"STAATSOBERHAUPT\": \"König Felipe VI.\",\n          \"REGIERUNGSCHEF\": \"Regierungspräsident Mariano Rajoy\",\n          \"FLÄCHE\": \"505.970Europäische Union (Eurostat): Spanien – Länderinfo, Stand 2014.\",\n          \"EINWOHNER\": \"46.438.422 (1. Januar 2016)\",\n          \"BEV-DICHTE\": \"92\",\n          \"BEV-ZUNAHME\": \"–0,02% (2015–2016)\",\n          \"BIP\": \"2011World Economic Outlook Database, April 2012 des Internationalen Währungsfonds\\n $ 1.493 Milliarden (12.)\\n $ 1.413 Milliarden (13.)\\n $ 32.360 (27.)\\n $ 30.626 (29.)\",\n          \"BIP-ERWEITERT\": \"* Total (nominal)\\n Total (KKP)\\n BIP/Einw. (nominal)\\n BIP/Einw. 
(KKP)\",\n          \"HDI\": \"0,869 (27.) (2013)Human Development Report Office: Spain – Country Profile: Human Development Indicators, abgerufen am 26. Oktober 2014\",\n          \"WÄHRUNG\": \"Euro (EUR)\",\n          \"NATIONALHYMNE\": \"Marcha Real155x125px\",\n          \"ZEITZONE\": \"UTC+1 MEZUTC+2 MESZ (März bis Oktober)Kanarische Inseln:UTC±0UTC+1 (März bis Oktober)\",\n          \"KFZ-KENNZEICHEN\": \"E\",\n          \"ISO 3166\": \"ES, ESP, 724\",\n          \"INTERNET-TLD\": \".es\",\n          \"TELEFON-VORWAHL\": \"+34\",\n          \"BILD-LAGE\": \"Spain in the European Union on the globe (Europe centered).svg\",\n          \"BILD-LAGE-IMAGEMAP\": \"EuropaGlobus1\"\n        },\n        \"wikidata\": {\n          \"coordinates\": {\n            \"latitude\": 40,\n            \"longitude\": -3,\n            \"altitude\": null,\n            \"precision\": 1,\n            \"globe\": \"http://www.wikidata.org/entity/Q2\"\n          },\n          \"image\": \"Relief Map of Spain.png\",\n          \"continent\": [\n            \"Europa\",\n            \"Afrika\"\n          ],\n          \"instance\": [\n            \"souveräner Staat\",\n            \"Land\",\n            \"Mitgliedstaat der Europäischen Union\",\n            \"Mitgliedstaat der Vereinten Nationen\",\n            \"Mitglied des Europarats\"\n          ],\n          \"category\": \"Kategorie:Spanien\",\n          \"country\": \"Spanien\"\n        },\n        \"title\": \"Spanien\",\n        \"url\": \"https://de.wikipedia.org/wiki/Spanien\"\n      }\n    }\n\nGiven the additional information provided by the `wikipedia` extended attributes, you can filter down to only those entities that satisfy certain properties.  
For instance, you can query for only those entities that have geo-coordinates:\n\n    $ jq '.entities[]|select(.wikipedia.wikidata|has(\"coordinates\"))' アメリカスカップ.json\n    ...\n    {\n      \"type\": \"LOCATION\",\n      \"mention\": \"JPN\",\n      \"normalized\": \"JPN\",\n      \"count\": 1,\n      \"entityId\": \"Q17\",\n      \"wikipedia\": {\n        \"infobox\": {},\n        \"wikidata\": {\n          \"coordinates\": {\n            \"latitude\": 35,\n            \"longitude\": 136,\n            \"altitude\": null,\n            \"precision\": 1,\n            \"globe\": \"http://www.wikidata.org/entity/Q2\"\n          },\n          \"instance\": [\n            \"主権国家\",\n            \"国\",\n            \"島国\",\n            \"国際連合加盟国\"\n          ],\n          \"continent\": \"アジア\",\n          \"category\": \"Category:日本\",\n          \"country\": \"日本\"\n        },\n        \"title\": \"日本\",\n        \"url\": \"https://ja.wikipedia.org/wiki/日本\"\n      }\n    }\n\nOr find those entities that have a linked website:\n\n    $ jq '.entities[]|select(.wikipedia.wikidata|has(\"website\"))' アメリカスカップ.json\n    ...\n    {\n      \"type\": \"ORGANIZATION\",\n      \"mention\": \"国際ルール\",\n      \"normalized\": \"国際ルール\",\n      \"count\": 1,\n      \"entityId\": \"Q46199\",\n      \"wikipedia\": {\n        \"infobox\": {\n          \"名称\": \"国際バスケットボール連盟\",\n          \"略称\": \"FIBA\",\n          \"設立\": \"1932年\",\n          \"本部\": \"・ジュネーヴ\",\n          \"会長\": \"ホラシオ・ムラトーレ\",\n          \"事務総長\": \"パトリック・バウマン\",\n          \"ウェブサイト\": \"http://www.fiba.com/\"\n        },\n        \"wikidata\": {\n          \"website\": \"http://www.fiba.com/\",\n          \"image\": \"FIBA headquarter.JPG\",\n          \"category\": null,\n          \"instance\": \"国際競技連盟\"\n        },\n        \"title\": \"国際バスケットボール連盟\",\n        \"url\": \"https://ja.wikipedia.org/wiki/国際バスケットボール連盟\"\n      }\n    }\n    
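If you would rather post-process the output in Python than with `jq`, the same kind of filtering can be done with the standard `json` module. The following is a minimal sketch and not part of `rosettepedia.py`: it assumes only the output layout shown above (a top-level `entities` list whose items may carry a `wikipedia.wikidata` object), and the helper names `load_output` and `entities_with` are hypothetical.

```python
import json


def load_output(path):
    # Read a JSON file written by rosettepedia.py (e.g. opec.json above).
    with open(path, encoding='utf-8') as f:
        return json.load(f)


def entities_with(doc, key):
    # Hypothetical helper (not part of rosettepedia.py): return the entities
    # whose wikipedia.wikidata object has `key`, similar in spirit to the
    # jq select() filters above.
    matches = []
    for entity in doc.get('entities', []):
        # Assumption: an entity may lack a wikipedia attribute, and wikidata
        # may be missing or null, so fall back to an empty dict in both cases.
        wikipedia = entity.get('wikipedia') or {}
        wikidata = wikipedia.get('wikidata') or {}
        if key in wikidata:
            matches.append(entity)
    return matches
```

For example, `entities_with(load_output('opec.json'), 'coordinates')` should return only the Vienna entity from the first example, since the OPEC entry has no `coordinates` key in its `wikidata` object.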
","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosette-api-community%2Frosettepedia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frosette-api-community%2Frosettepedia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosette-api-community%2Frosettepedia/lists"}