https://github.com/mrpeerat/mrefined
mReFinED: An Efficient End-to-End Multilingual Entity Linking System
deep-learning entity-linking machine-learning multilingual multilingual-entity-linking nlp
- Host: GitHub
- URL: https://github.com/mrpeerat/mrefined
- Owner: mrpeerat
- License: other
- Created: 2023-12-27T03:30:48.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-03T14:36:28.000Z (over 2 years ago)
- Last Synced: 2025-04-14T12:55:10.787Z (about 1 year ago)
- Topics: deep-learning, entity-linking, machine-learning, multilingual, multilingual-entity-linking, nlp
- Language: Python
- Homepage: https://aclanthology.org/2023.findings-emnlp.1007/
- Size: 225 KB
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# Overview
We propose mReFinED, the first end-to-end multilingual entity linking (MEL) model. mReFinED supports 9 languages: AR, DE, EN, ES, FA, JA, SR, TA, and TR. The experiments in our paper show that mReFinED outperforms the best existing work on the end-to-end MEL task while running 44 times faster than the previous state of the art (mGENRE).
# mReFinED's Paper
The mReFinED model architecture is described in the paper below (https://aclanthology.org/2023.findings-emnlp.1007):
```bibtex
@inproceedings{limkonchotiwat-etal-2023-mrefined,
title = "m{R}e{F}in{ED}: An Efficient End-to-End Multilingual Entity Linking System",
author = "Limkonchotiwat, Peerat and
Cheng, Weiwei and
Christodoulopoulos, Christos and
Saffari, Amir and
Lehmann, Jens",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.1007",
doi = "10.18653/v1/2023.findings-emnlp.1007",
pages = "15080--15089",
}
```
## mReFinED
- This repository is a replica of mReFinED from [Amazon's mReFinED](https://github.com/amazon-science/ReFinED/tree/mrefined).
- We have improved the training and inference code to make the results easier to reproduce.
- We also provide the mReFinED model and training data :)
## Hardware Requirements
- mReFinED has low hardware requirements. A GPU is recommended for fast inference but is not strictly required.
- Creating the training data took about 15 days on CPU only; with GPUs, this can be reduced to about 2 days.
- Training took about 10 days on 8 V100 GPUs.
- Inference uses a single V100 GPU.
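
Since a GPU is optional for inference, a script can fall back to the CPU when none is present. The snippet below is a minimal sketch of that choice, assuming PyTorch is the backend; the helper name `pick_device` is hypothetical, and how the resulting device string is handed to the mReFinED inference code is not covered here.

```python
def pick_device() -> str:
    """Return "cuda:0" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # assumption: the environment uses PyTorch
    except ImportError:
        # Without PyTorch there is nothing to query, so report the CPU.
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

On a CPU-only machine, expect inference to be noticeably slower than on the single V100 mentioned above.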
# Model, Data, and Code
## Materials
- **Model**: XXXXXXX
- **Training data**: XXXXXXXX
## Example Script
- mReFinED: Creating training data
```bash
cd mReFinED/src/
export PYTHONPATH=$PYTHONPATH:src
python refined/offline_data_generation/preprocess_all_multilingual_combine.py
```
- mReFinED: Training
```bash
cd mReFinED/src/
export PYTHONPATH=$PYTHONPATH:src
bash refined/training/train/multilingual_train.sh
```
- Mention detection for unlabeled entities in Wikipedia. This step can be skipped by using [WikiNEuRal](https://huggingface.co/Babelscape/wikineural-multilingual-ner) instead.
```bash
cd mReFinED/src/refined/training/train
python multilingual_md_train_xtreme.py
python md_on_wiki.py
python multilingual_md_train_xtreme_wikipedia.py
```
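
The three command-line stages above can also be launched from one small Python driver. The following is a sketch only, not part of the repository: the `Stage` class, the stage names, and the `run_stage` helper are hypothetical, while the directories and script names are copied from the commands above. It defaults to a dry run that prints each command instead of executing it, and it does not assume any particular order between stages.

```python
import os
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    workdir: str                           # directory to cd into, relative to the checkout
    commands: Tuple[Tuple[str, ...], ...]  # commands run in order inside workdir
    extend_pythonpath: bool = False        # mirrors `export PYTHONPATH=$PYTHONPATH:src`


# Stage names are hypothetical labels; paths come from the README commands.
STAGES: Dict[str, Stage] = {
    "data": Stage(
        "mReFinED/src",
        (("python", "refined/offline_data_generation/preprocess_all_multilingual_combine.py"),),
        extend_pythonpath=True,
    ),
    "train": Stage(
        "mReFinED/src",
        (("bash", "refined/training/train/multilingual_train.sh"),),
        extend_pythonpath=True,
    ),
    "mention_detection": Stage(
        "mReFinED/src/refined/training/train",
        (
            ("python", "multilingual_md_train_xtreme.py"),
            ("python", "md_on_wiki.py"),
            ("python", "multilingual_md_train_xtreme_wikipedia.py"),
        ),
    ),
}


def build_env(stage: Stage, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and append ":src" to PYTHONPATH when the stage asks for it."""
    env = dict(os.environ if base_env is None else base_env)
    if stage.extend_pythonpath:
        env["PYTHONPATH"] = env.get("PYTHONPATH", "") + ":src"
    return env


def run_stage(name: str, repo_root: str = ".", dry_run: bool = True) -> List[Tuple[str, ...]]:
    """Run (or, in a dry run, only print) every command of the named stage."""
    stage = STAGES[name]
    cwd = Path(repo_root) / stage.workdir
    for command in stage.commands:
        if dry_run:
            print(f"[dry-run] cd {cwd} && {' '.join(command)}")
        else:
            # check=True stops at the first failing command.
            subprocess.run(list(command), cwd=cwd, env=build_env(stage), check=True)
    return list(stage.commands)


if __name__ == "__main__":
    for stage_name in STAGES:
        run_stage(stage_name, dry_run=True)
```

Passing `dry_run=False` executes the commands, which only makes sense inside a checkout that contains the `mReFinED/src` tree and the required data.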
- mReFinED: Inference
```python
print('hi')
```
- mReFinED on Mewsli-9
```python
print('hi')
```
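
Until the evaluation script is filled in, the sketch below shows one common way to score end-to-end entity linking output: exact-match precision, recall, and F1 over (document, span, entity) tuples. It is not the paper's evaluation code. The tuple layout, the `score_links` helper, and the toy examples are assumptions made for illustration, and loading Mewsli-9 itself is omitted.

```python
from typing import Dict, Iterable, Tuple

# Assumed layout: (doc_id, start_char, end_char, wikidata_qid)
Link = Tuple[str, int, int, str]


def score_links(gold: Iterable[Link], predicted: Iterable[Link]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over linked mention spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data: one correct link, one missed gold link, one spurious prediction.
    gold = [("doc1", 0, 6, "Q64"), ("doc1", 20, 27, "Q183")]
    predicted = [("doc1", 0, 6, "Q64"), ("doc1", 30, 35, "Q90")]
    print(score_links(gold, predicted))
```

A prediction counts as correct only when both the span boundaries and the entity ID match, so mention detection errors and disambiguation errors are penalised together.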
## Security
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
## License
This library is licensed under the CC-BY-NC 4.0 License.
## Contact us
If you have questions, please open a GitHub issue instead of emailing us, as some of the listed email addresses are no longer active.