{"id":13512731,"url":"https://github.com/studio-ousia/luke","last_synced_at":"2026-04-03T20:06:34.728Z","repository":{"id":37397911,"uuid":"171815215","full_name":"studio-ousia/luke","owner":"studio-ousia","description":"LUKE -- Language Understanding with Knowledge-based Embeddings","archived":false,"fork":false,"pushed_at":"2023-11-19T14:32:06.000Z","size":4270,"stargazers_count":722,"open_issues_count":14,"forks_count":101,"subscribers_count":24,"default_branch":"master","last_synced_at":"2025-03-31T00:31:40.059Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/studio-ousia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-02-21T06:40:46.000Z","updated_at":"2025-03-19T16:09:42.000Z","dependencies_parsed_at":"2024-01-13T21:46:25.806Z","dependency_job_id":null,"html_url":"https://github.com/studio-ousia/luke","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/studio-ousia/luke","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/studio-ousia%2Fluke","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/studio-ousia%2Fluke/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/studio-ousia%2Fluke/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/studio-ousia%2Fluke/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/studio-ousia","download
_url":"https://codeload.github.com/studio-ousia/luke/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/studio-ousia%2Fluke/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31374088,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T17:53:18.093Z","status":"ssl_error","status_checked_at":"2026-04-03T17:53:17.617Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T04:00:29.713Z","updated_at":"2026-04-03T20:06:34.705Z","avatar_url":"https://github.com/studio-ousia.png","language":"Jupyter Notebook","funding_links":[],"categories":["Papers","Jupyter Notebook"],"sub_categories":["Language Models"],"readme":"\u003cimg src=\"resources/luke_logo.png\" width=\"200\" alt=\"LUKE\"\u003e\n\n[![CircleCI](https://circleci.com/gh/studio-ousia/luke.svg?style=svg\u0026circle-token=49524bfde04659b8b54509f7e0f06ec3cf38f15e)](https://circleci.com/gh/studio-ousia/luke)\n\n---\n\n**LUKE** (**L**anguage **U**nderstanding with **K**nowledge-based\n**E**mbeddings) is a new pretrained contextualized representation of words and\nentities based on the transformer. 
It was proposed in our paper\n[LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057).\nIt achieves state-of-the-art results on important NLP benchmarks including\n**[SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/)** (extractive\nquestion answering),\n**[CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/)** (named entity\nrecognition), **[ReCoRD](https://sheng-z.github.io/ReCoRD-explorer/)**\n(cloze-style question answering),\n**[TACRED](https://nlp.stanford.edu/projects/tacred/)** (relation\nclassification), and\n**[Open Entity](https://www.cs.utexas.edu/~eunsol/html_pages/open_entity.html)**\n(entity typing).\n\nThis repository contains the source code to pretrain the model and fine-tune it\nto solve downstream tasks.\n\n## News\n\n**November 9, 2022: The large version of LUKE-Japanese is available**\n\nThe large version of LUKE-Japanese is available on the Hugging Face Model Hub:\n\n- [luke-japanese-large](https://huggingface.co/studio-ousia/luke-japanese-large)\n- [luke-japanese-large-lite](https://huggingface.co/studio-ousia/luke-japanese-large-lite)\n\nThis model achieves state-of-the-art results on three datasets in\n[JGLUE](https://github.com/yahoojapan/JGLUE).\n\n| Model                         | MARC-ja   | JSTS                | JNLI      | JCommonsenseQA |\n| ----------------------------- | --------- | ------------------- | --------- | -------------- |\n|                               | acc       | Pearson/Spearman    | acc       | acc            |\n| **LUKE Japanese large**       | **0.965** | **0.932**/**0.902** | **0.927** | 0.893          |\n| _Baselines:_                  |           |\n| Tohoku BERT large             | 0.955     | 0.913/0.872         | 0.900     | 0.816          |\n| Waseda RoBERTa large (seq128) | 0.954     | 0.930/0.896         | 0.924     | **0.907**      |\n| Waseda RoBERTa large (seq512) | 0.961     | 0.926/0.892         | 0.926     | 0.891          
|\n| XLM RoBERTa large             | 0.964     | 0.918/0.884         | 0.919     | 0.840          |\n\n**October 27, 2022: The Japanese version of LUKE is available**\n\nThe Japanese version of LUKE is now available on the Hugging Face Model Hub:\n\n- [luke-japanese-base](https://huggingface.co/studio-ousia/luke-japanese-base)\n- [luke-japanese-base-lite](https://huggingface.co/studio-ousia/luke-japanese-base-lite)\n\nThis model outperforms other base-sized models on four datasets in\n[JGLUE](https://github.com/yahoojapan/JGLUE).\n\n| Model                  | MARC-ja   | JSTS                | JNLI      | JCommonsenseQA |\n| ---------------------- | --------- | ------------------- | --------- | -------------- |\n|                        | acc       | Pearson/Spearman    | acc       | acc            |\n| **LUKE Japanese base** | **0.965** | **0.916**/**0.877** | **0.912** | **0.842**      |\n| _Baselines:_           |           |\n| Tohoku BERT base       | 0.958     | 0.909/0.868         | 0.899     | 0.808          |\n| NICT BERT base         | 0.958     | 0.910/0.871         | 0.902     | 0.823          |\n| Waseda RoBERTa base    | 0.962     | 0.913/0.873         | 0.895     | 0.840          |\n| XLM RoBERTa base       | 0.961     | 0.877/0.831         | 0.893     | 0.687          |\n\n**April 13, 2022: The mLUKE fine-tuning code is available**\n\n[The example code](examples) is updated. Now it is based on\n[allennlp](https://github.com/allenai/allennlp) and\n[transformers](https://github.com/huggingface/transformers). You can reproduce\nthe experiments in the [LUKE](https://arxiv.org/abs/2010.01057) and\n[mLUKE](https://arxiv.org/abs/2110.08151) papers with this implementation. For\nthe details, please see `README.md` under each example directory. 
The older code\nused in [the LUKE paper](https://arxiv.org/abs/2010.01057) has been moved to\n[`examples/legacy`](examples/legacy).\n\n**April 13, 2022: The detailed instructions for pretraining LUKE models are\navailable**\n\nFor those interested in pretraining LUKE models, we explain how to prepare\ndatasets and run the pretraining code in [`pretraining.md`](pretraining.md).\n\n**November 24, 2021: Entity disambiguation example is available**\n\nExample code for entity disambiguation based on LUKE has been added to this\nrepository. This model was originally proposed in\n[our paper](https://arxiv.org/abs/1909.00426), and achieved state-of-the-art\nresults on five standard entity disambiguation datasets: AIDA-CoNLL, MSNBC,\nAQUAINT, ACE2004, and WNED-WIKI.\n\nFor further details, please refer to\n[`examples/entity_disambiguation`](examples/entity_disambiguation).\n\n**August 3, 2021: New example code based on Hugging Face Transformers and\nAllenNLP is available**\n\nNew fine-tuning examples for three downstream tasks, i.e., _NER_, _relation\nclassification_, and _entity typing_, have been added to LUKE. These examples\nare built on Hugging Face Transformers and AllenNLP. 
The fine-tuning\nmodels are defined using AllenNLP's simple Jsonnet config files!\n\nThe example code is available in [`examples`](examples).\n\n**May 5, 2021: LUKE is added to Hugging Face Transformers**\n\nLUKE has been added to the\n[master branch of the Hugging Face Transformers library](https://github.com/huggingface/transformers).\nYou can now solve entity-related tasks (e.g., named entity recognition, relation\nclassification, entity typing) easily using this library.\n\nFor example, the LUKE-large model fine-tuned on the TACRED dataset can be used\nas follows:\n\n```python\nfrom transformers import LukeTokenizer, LukeForEntityPairClassification\nmodel = LukeForEntityPairClassification.from_pretrained(\"studio-ousia/luke-large-finetuned-tacred\")\ntokenizer = LukeTokenizer.from_pretrained(\"studio-ousia/luke-large-finetuned-tacred\")\ntext = \"Beyoncé lives in Los Angeles.\"\nentity_spans = [(0, 7), (17, 28)]  # character-based entity spans corresponding to \"Beyoncé\" and \"Los Angeles\"\ninputs = tokenizer(text, entity_spans=entity_spans, return_tensors=\"pt\")\noutputs = model(**inputs)\nlogits = outputs.logits\npredicted_class_idx = int(logits[0].argmax())\nprint(\"Predicted class:\", model.config.id2label[predicted_class_idx])\n# Predicted class: per:cities_of_residence\n```\n\nWe also provide the following three Colab notebooks that show how to reproduce\nour experimental results on CoNLL-2003, TACRED, and Open Entity datasets using\nthe library:\n\n- [Reproducing experimental results of LUKE on CoNLL-2003 Using Hugging Face Transformers](https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_conll_2003.ipynb)\n- [Reproducing experimental results of LUKE on TACRED Using Hugging Face Transformers](https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_tacred.ipynb)\n- [Reproducing experimental results of LUKE on Open Entity Using Hugging Face 
Transformers](https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_open_entity.ipynb)\n\nPlease refer to the\n[official documentation](https://huggingface.co/transformers/master/model_doc/luke.html)\nfor further details.\n\n**November 5, 2021: LUKE-500K (base) model**\n\nWe released LUKE-500K (base), a new pretrained LUKE model which is smaller than\nexisting LUKE-500K (large). The experimental results of the LUKE-500K (base) and\nLUKE-500K (large) on SQuAD v1 and CoNLL-2003 are shown as follows:\n\n| Task                          | Dataset                                                      | Metric | LUKE-500K (base) | LUKE-500K (large) |\n| ----------------------------- | ------------------------------------------------------------ | ------ | ---------------- | ----------------- |\n| Extractive Question Answering | [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/)    | EM/F1  | 86.1/92.3        | 90.2/95.4         |\n| Named Entity Recognition      | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | F1     | 93.3             | 94.3              |\n\nWe tuned only the batch size and learning rate in the experiments based on\nLUKE-500K (base).\n\n## Comparison with State-of-the-Art\n\nLUKE outperforms the previous state-of-the-art methods on five important NLP\ntasks:\n\n| Task                           | Dataset                                                                      | Metric | LUKE-500K (large) | Previous SOTA                                                             |\n| ------------------------------ | ---------------------------------------------------------------------------- | ------ | ----------------- | ------------------------------------------------------------------------- |\n| Extractive Question Answering  | [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/)                    | EM/F1  | **90.2**/**95.4** | 89.9/95.1 ([Yang et al., 
2019](https://arxiv.org/abs/1906.08237))         |\n| Named Entity Recognition       | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/)                 | F1     | **94.3**          | 93.5 ([Baevski et al., 2019](https://arxiv.org/abs/1903.07785))           |\n| Cloze-style Question Answering | [ReCoRD](https://sheng-z.github.io/ReCoRD-explorer/)                         | EM/F1  | **90.6**/**91.2** | 83.1/83.7 ([Li et al., 2019](https://www.aclweb.org/anthology/D19-6011/)) |\n| Relation Classification        | [TACRED](https://nlp.stanford.edu/projects/tacred/)                          | F1     | **72.7**          | 72.0 ([Wang et al. , 2020](https://arxiv.org/abs/2002.01808))             |\n| Fine-grained Entity Typing     | [Open Entity](https://www.cs.utexas.edu/~eunsol/html_pages/open_entity.html) | F1     | **78.2**          | 77.6 ([Wang et al. , 2020](https://arxiv.org/abs/2002.01808))             |\n\nThese numbers are reported in\n[our EMNLP 2020 paper](https://arxiv.org/abs/2010.01057).\n\n## Installation\n\nLUKE can be installed using [Poetry](https://python-poetry.org/):\n\n```bash\npoetry install\n\n# If you want to run pretraining for LUKE\npoetry install --extras \"pretraining opennlp\"\n# If you want to run pretraining for mLUKE\npoetry install --extras \"pretraining icu\"\n```\n\nThe virtual environment automatically created by Poetry can be activated by\n`poetry shell`.\n\n**A note on installing `torch`**\n\nThe pytorch installed via `poetry install` does not necessarily match your\nhardware. 
In such a case, see [the official site](https://pytorch.org/) and\nreinstall the correct version with the `pip` command.\n\n```bash\npoetry run pip3 uninstall torch torchvision torchaudio\n# Example for Linux with CUDA 11.3\npoetry run pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113\n```\n\n## Released Models\n\nOur pretrained models can be used with the\n[transformers](https://github.com/huggingface/transformers) library. The model\ndocumentation can be found at the following links:\n[LUKE](https://huggingface.co/docs/transformers/main/en/model_doc/luke) and\n[mLUKE](https://huggingface.co/docs/transformers/main/en/model_doc/mluke).\n\nCurrently, the following models are available on\n[the Hugging Face Model Hub](https://huggingface.co/models).\n\n|           Name            |                                         model_name                                          | Entity Vocab Size | Params |\n| :-----------------------: | :-----------------------------------------------------------------------------------------: | :---------------: | :----: |\n|      **LUKE (base)**      |           [studio-ousia/luke-base](https://huggingface.co/studio-ousia/luke-base)           |       500K        | 253 M  |\n|     **LUKE (large)**      |          [studio-ousia/luke-large](https://huggingface.co/studio-ousia/luke-large)          |       500K        | 484 M  |\n|     **mLUKE (base)**      |          [studio-ousia/mluke-base](https://huggingface.co/studio-ousia/mluke-base)          |       1.2M        | 586 M  |\n|     **mLUKE (large)**     |         [studio-ousia/mluke-large](https://huggingface.co/studio-ousia/mluke-large)         |       1.2M        | 868 M  |\n| **LUKE Japanese (base)**  |  [studio-ousia/luke-japanese-base](https://huggingface.co/studio-ousia/luke-japanese-base)  |       570K        | 281 M  |\n| **LUKE Japanese (large)** | 
[studio-ousia/luke-japanese-large](https://huggingface.co/studio-ousia/luke-japanese-large) |       570K        | 562 M  |\n\n### Lite Models\n\nThe entity embeddings cause a large memory footprint as they contain all the\nWikipedia entities that we used in pretraining. However, in some downstream\ntasks (e.g., entity typing, named entity recognition, and relation\nclassification), we only need special entity embeddings such as `[MASK]`. Also,\nyou may want to only use the word representations.\n\nWith such use cases in mind, to make our models easier to use, we have uploaded\nlite models that contain only the special entity embeddings. These models perform exactly\nthe same as the full models but have far fewer parameters, which enables\nfine-tuning the model on small GPUs.\n\n|           Name            |                                              model_name                                               | Params |\n| :-----------------------: | :---------------------------------------------------------------------------------------------------: | :----: |\n|      **LUKE (base)**      |           [studio-ousia/luke-base-lite](https://huggingface.co/studio-ousia/luke-base-lite)           | 125 M  |\n|     **LUKE (large)**      |          [studio-ousia/luke-large-lite](https://huggingface.co/studio-ousia/luke-large-lite)          | 356 M  |\n|     **mLUKE (base)**      |          [studio-ousia/mluke-base-lite](https://huggingface.co/studio-ousia/mluke-base-lite)          | 279 M  |\n|     **mLUKE (large)**     |         [studio-ousia/mluke-large-lite](https://huggingface.co/studio-ousia/mluke-large-lite)         | 561 M  |\n| **LUKE Japanese (base)**  |  [studio-ousia/luke-japanese-base-lite](https://huggingface.co/studio-ousia/luke-japanese-base-lite)  | 134 M  |\n| **LUKE Japanese (large)** | [studio-ousia/luke-japanese-large-lite](https://huggingface.co/studio-ousia/luke-japanese-large-lite) | 415 M  |\n\n## Fine-tuning LUKE models\n\nWe release the fine-tuning code 
based on\n[allennlp](https://github.com/allenai/allennlp) and\n[transformers](https://github.com/huggingface/transformers) under\n[`examples`](examples). You can run fine-tuning experiments very easily with\npre-defined config files and the `allennlp train` command. For details and\nexample commands for each task, please see the task directory under\n[`examples`](examples).\n\n## Pretraining LUKE models\n\nThe detailed instructions for pretraining LUKE models can be found in\n[`pretraining.md`](pretraining.md).\n\n## Citation\n\nIf you use LUKE in your work, please cite the\n[original paper](https://aclanthology.org/2020.emnlp-main.523/).\n\n```\n@inproceedings{yamada-etal-2020-luke,\n    title = \"{LUKE}: Deep Contextualized Entity Representations with Entity-aware Self-attention\",\n    author = \"Yamada, Ikuya  and\n      Asai, Akari  and\n      Shindo, Hiroyuki  and\n      Takeda, Hideaki  and\n      Matsumoto, Yuji\",\n    booktitle = \"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)\",\n    year = \"2020\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2020.emnlp-main.523\",\n    doi = \"10.18653/v1/2020.emnlp-main.523\",\n}\n```\n\nFor mLUKE, please cite\n[this paper](https://aclanthology.org/2022.acl-long.505/).\n\n```\n@inproceedings{ri-etal-2022-mluke,\n    title = \"m{LUKE}: {T}he Power of Entity Representations in Multilingual Pretrained Language Models\",\n    author = \"Ri, Ryokan  and\n      Yamada, Ikuya  and\n      Tsuruoka, Yoshimasa\",\n    booktitle = \"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\n    year = \"2022\",\n    publisher = \"Association for Computational Linguistics\",\n    url = 
\"https://aclanthology.org/2022.acl-long.505\",\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstudio-ousia%2Fluke","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstudio-ousia%2Fluke","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstudio-ousia%2Fluke/lists"}