{"id":13595400,"url":"https://github.com/facebookresearch/GENRE","last_synced_at":"2025-04-09T13:31:52.276Z","repository":{"id":39340525,"uuid":"301511303","full_name":"facebookresearch/GENRE","owner":"facebookresearch","description":"Autoregressive Entity Retrieval","archived":false,"fork":false,"pushed_at":"2023-07-06T06:48:28.000Z","size":11272,"stargazers_count":783,"open_issues_count":17,"forks_count":103,"subscribers_count":18,"default_branch":"main","last_synced_at":"2025-04-04T04:47:43.130Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-10-05T19:01:01.000Z","updated_at":"2025-03-25T09:23:45.000Z","dependencies_parsed_at":"2022-07-14T03:40:30.844Z","dependency_job_id":"0cdbb062-c4b0-4815-bbe4-cc5ae3d3137d","html_url":"https://github.com/facebookresearch/GENRE","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FGENRE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FGENRE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FGENRE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FGENRE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresea
rch/GENRE/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248049301,"owners_count":21039200,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:49.387Z","updated_at":"2025-04-09T13:31:50.984Z","avatar_url":"https://github.com/facebookresearch.png","language":"Python","funding_links":[],"categories":["Python","Development","其他_NLP自然语言处理"],"sub_categories":["Tools","其他_文本生成、文本对话"],"readme":"![](Genre-TwoColor-Light-BG.png)\n\nThe GENRE (Generative ENtity REtrieval) system as presented in [Autoregressive Entity Retrieval](https://arxiv.org/abs/2010.00904) implemented in pytorch.\n\n```bibtex\n@inproceedings{decao2021autoregressive,\n  author    = {Nicola {De Cao} and\n               Gautier Izacard and\n               Sebastian Riedel and\n               Fabio Petroni},\n  title     = {Autoregressive Entity Retrieval},\n  booktitle = {9th International Conference on Learning Representations, {ICLR} 2021,\n               Virtual Event, Austria, May 3-7, 2021},\n  publisher = {OpenReview.net},\n  year      = {2021},\n  url       = {https://openreview.net/forum?id=5k8F6UU39V},\n}\n```\n\n![](mGenre-TwoColor-Light-BG.png)\n\nThe mGENRE system as presented in [Multilingual Autoregressive Entity Linking](https://arxiv.org/abs/2103.12528)\n\n```bibtex\n@article{de-cao-etal-2022-multilingual,\n    title = \"Multilingual Autoregressive Entity Linking\",\n    author = \"De Cao, Nicola  and\n      Wu, Ledell  and\n      Popat, Kashyap  and\n      Artetxe, Mikel  and\n      Goyal, Naman  and\n     
 Plekhanov, Mikhail  and\n      Zettlemoyer, Luke  and\n      Cancedda, Nicola  and\n      Riedel, Sebastian  and\n      Petroni, Fabio\",\n    journal = \"Transactions of the Association for Computational Linguistics\",\n    volume = \"10\",\n    year = \"2022\",\n    address = \"Cambridge, MA\",\n    publisher = \"MIT Press\",\n    url = \"https://aclanthology.org/2022.tacl-1.16\",\n    doi = \"10.1162/tacl_a_00460\",\n    pages = \"274--290\",\n}\n```\n\n**Please consider citing our works if you use code from this repository.**\n\nIn a nutshell, (m)GENRE uses a sequence-to-sequence approach to entity retrieval (e.g., linking), based on a fine-tuned [BART](https://arxiv.org/abs/1910.13461) architecture, or [mBART](https://arxiv.org/abs/2001.08210) for the multilingual variant. (m)GENRE performs retrieval by generating the unique entity name conditioned on the input text, using constrained beam search to generate only valid identifiers. Here is an example of generation for Wikipedia page retrieval for open-domain question answering:\n\n![](GENRE-animation-QA.gif)\n\nFor end-to-end entity linking, GENRE re-generates the input text annotated with markup:\n\n![](GENRE-animation-EL.gif)\n\nGENRE achieves state-of-the-art results on multiple datasets.\n\nmGENRE performs multilingual entity linking in 100+ languages, treating language as a latent variable and marginalizing over it:\n\n![](mGENRE-animation-EL.gif)\n\n## Main dependencies\n* python\u003e=3.7\n* pytorch\u003e=1.6\n* fairseq\u003e=0.10 (optional for training GENRE) **NOTE: fairseq is going through changes without backward compatibility. Install `fairseq` from source and use [this](https://github.com/nicola-decao/fairseq/tree/fixing_prefix_allowed_tokens_fn) commit for reproducibility. 
See [here](https://github.com/pytorch/fairseq/pull/3276) for the current PR that should fix `fairseq/master`.**\n* transformers\u003e=4.2 (optional for inference of GENRE)\n\n## Examples \u0026 Usage\n\nFor a full overview of the (m)GENRE API, see:\n* [examples for GENRE](https://github.com/facebookresearch/GENRE/blob/main/examples_genre) on how to use GENRE with both pytorch fairseq and huggingface transformers;\n* [examples for mGENRE](https://github.com/facebookresearch/GENRE/blob/main/examples_mgenre) on how to use mGENRE.\n\n### GENRE\nAfter importing and loading the model and a prefix tree (trie), you can generate predictions (in this example, for Entity Disambiguation) with a simple call like:\n\n```python\nimport pickle\n\nfrom genre.fairseq_model import GENRE\nfrom genre.trie import Trie\n\n# load the prefix tree (trie)\nwith open(\"../data/kilt_titles_trie_dict.pkl\", \"rb\") as f:\n    trie = Trie.load_from_dict(pickle.load(f))\n\n# load the model\nmodel = GENRE.from_pretrained(\"models/fairseq_entity_disambiguation_aidayago\").eval()\n\n# generate Wikipedia titles\nmodel.sample(\n    sentences=[\"Einstein was a [START_ENT] German [END_ENT] physicist.\"],\n    prefix_allowed_tokens_fn=lambda batch_id, sent: trie.get(sent.tolist()),\n)\n```\n\n\n\n\n    [[{'text': 'Germany', 'score': tensor(-0.1856)},\n      {'text': 'Germans', 'score': tensor(-0.5461)},\n      {'text': 'German Empire', 'score': tensor(-2.1858)}]]\n\n\n### mGENRE\nMaking predictions with mGENRE is very similar, but we additionally need to map `(title, language_ID)` to Wikidata IDs and (optionally) marginalize over predictions of the same entity:\n\n```python\nimport pickle\n\nfrom genre.fairseq_model import mGENRE\nfrom genre.trie import MarisaTrie, Trie\n\nwith open(\"../data/lang_title2wikidataID-normalized_with_redirect.pkl\", \"rb\") as f:\n    lang_title2wikidataID = pickle.load(f)\n\n# memory efficient prefix tree (trie) implemented with `marisa_trie`\nwith 
open(\"../data/titles_lang_all105_marisa_trie_with_redirect.pkl\", \"rb\") as f:\n    trie = pickle.load(f)\n\n# load the model\nmodel = mGENRE.from_pretrained(\"../models/fairseq_multilingual_entity_disambiguation\").eval()\n\n# generate Wikipedia titles and language IDs\nmodel.sample(\n    sentences=[\"[START] Einstein [END] era un fisico tedesco.\"],\n    # Italian for \"[START] Einstein [END] was a German physicist.\"\n    prefix_allowed_tokens_fn=lambda batch_id, sent: [\n        e for e in trie.get(sent.tolist()) if e \u003c len(model.task.target_dictionary)\n    ],\n    text_to_id=lambda x: max(lang_title2wikidataID[\n        tuple(reversed(x.split(\" \u003e\u003e \")))\n    ], key=lambda y: int(y[1:])),\n    marginalize=True,\n)\n```\n\n\n\n\n    [[{'id': 'Q937',\n       'texts': ['Albert Einstein \u003e\u003e it',\n        'Alberto Einstein \u003e\u003e it',\n        'Einstein \u003e\u003e it'],\n       'scores': tensor([-0.0808, -1.4619, -1.5765]),\n       'score': tensor(-0.0884)},\n      {'id': 'Q60197',\n       'texts': ['Alfred Einstein \u003e\u003e it'],\n       'scores': tensor([-1.4337]),\n       'score': tensor(-3.2058)},\n      {'id': 'Q15990626',\n       'texts': ['Albert Einstein (disambiguation) \u003e\u003e en'],\n       'scores': tensor([-1.0998]),\n       'score': tensor(-3.6478)}]]\n\n\n\n## Models \u0026 Datasets\n\nFor **GENRE**, use [this](https://github.com/facebookresearch/GENRE/blob/main/scripts_genre/download_all_models.sh) script to download all models and [this](https://github.com/facebookresearch/GENRE/blob/main/scripts_genre/download_all_datasets.sh) script to download all datasets. See [here](https://github.com/facebookresearch/GENRE/blob/main/examples_genre) for the list of all individual models for each task, for both pytorch fairseq and huggingface transformers. 
See the [example](https://github.com/facebookresearch/GENRE/blob/main/examples_genre) on how to download additional optional files like the prefix tree (trie) for KILT Wikipedia.\n\nFor **mGENRE**, a single model is available [here](https://dl.fbaipublicfiles.com/GENRE/fairseq_multilingual_entity_disambiguation.tar.gz). See the [example](https://github.com/facebookresearch/GENRE/blob/main/examples_mgenre) on how to download additional optional files like the prefix tree (trie) for Wikipedia in all languages and the mapping between titles and Wikidata IDs.\n\nA pre-trained **mBART** model for 125 languages is available [here](https://dl.fbaipublicfiles.com/GENRE/mbart.cc100.tar.gz).\n\n## Troubleshooting\nIf the module cannot be found, prefix the Python command with `PYTHONPATH=.`\n\n## License\nGENRE is licensed under the CC-BY-NC 4.0 license. The text of the license can be found [here](https://github.com/facebookresearch/GENRE/blob/main/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FGENRE","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2FGENRE","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FGENRE/lists"}