https://github.com/izuna385/jel

allennlp entity-linking jel natural-language-processing python pytorch question-answering transformers

# jel: Japanese Entity Linker
* jel (Japanese Entity Linker) is a bi-encoder-based entity linker for Japanese.
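
In a bi-encoder, the mention and each candidate entity are encoded independently into vectors, and candidates are ranked by vector similarity. The sketch below illustrates only that ranking step under loud assumptions: the embeddings are hand-written toy values, and the inner-product similarity, the softmax normalization, and the names `rank_entities`, `entity_a`, `entity_b`, and `entity_c` are all hypothetical. It is not jel's actual implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Turn raw similarity scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_entities(mention_vec, entity_vecs):
    """Rank candidate titles by similarity to the mention vector (assumed scoring)."""
    titles = list(entity_vecs)
    probs = softmax([dot(mention_vec, entity_vecs[t]) for t in titles])
    return sorted(zip(titles, probs), key=lambda pair: pair[1], reverse=True)


# Toy embeddings; a real system would obtain these from trained encoders.
mention_vec = [1.0, 0.0]
entity_vecs = {
    "entity_a": [0.9, 0.1],
    "entity_b": [0.1, 0.9],
    "entity_c": [0.5, 0.5],
}

ranking = rank_entities(mention_vec, entity_vecs)
for title, prob in ranking:
    print(f"{title}: {prob:.4f}")
```

Because entity vectors do not depend on the mention, they can be computed once and indexed ahead of time, which is the usual reason for choosing a bi-encoder over a model that scores each mention-entity pair jointly.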

# Usage
* Currently, `link` and `question` methods are supported.

## `el.link`
* This returns each named entity found in the input, together with its candidate entities drawn from Wikipedia titles.
```python
from jel import EntityLinker
el = EntityLinker()

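# Rough English gloss of the input: "Today I went to the Mac in 東京都 (Tokyo) to buy
# an Apple, met Steve Jobs and Donald, and moved to 堀田区 (a ward name)."
el.link('今日は東京都のマックにアップルを買いに行き、スティーブジョブスとドナルドに会い、堀田区に引っ越した。')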
>> [
{
"text": "東京都",
"label": "GPE",
"span": [
3,
6
],
"predicted_normalized_entities": [
[
"東京都庁",
0.1084
],
[
"東京",
0.0633
],
[
"国家地方警察東京都本部",
0.0604
],
[
"東京都",
0.0598
],
...
]
},
{
"text": "アップル",
"label": "ORG",
"span": [
11,
15
],
"predicted_normalized_entities": [
[
"アップル",
0.2986
],
[
"アップル インコーポレイテッド",
0.1792
],
...
]
},
...
]
```
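
The returned structure can be post-processed with plain Python. The sketch below assumes the output shape shown above: a list of dicts whose `predicted_normalized_entities` field holds `[title, score]` pairs. It picks the highest-scoring candidate for each mention. `top_candidates` is a hypothetical helper, not part of jel, and the sample data is copied from the example output so the block runs without the library installed.

```python
from typing import Any


def top_candidates(mentions: list[dict[str, Any]]) -> dict[str, tuple[str, float]]:
    """Map each mention text to its highest-scoring (title, score) candidate.

    If the same mention text occurs more than once, the last occurrence wins.
    Mentions with no candidates are skipped.
    """
    best = {}
    for mention in mentions:
        candidates = mention["predicted_normalized_entities"]
        if not candidates:
            continue
        title, score = max(candidates, key=lambda pair: pair[1])
        best[mention["text"]] = (title, score)
    return best


# Sample data copied (and shortened) from the example output above.
sample = [
    {
        "text": "東京都",
        "label": "GPE",
        "span": [3, 6],
        "predicted_normalized_entities": [["東京都庁", 0.1084], ["東京", 0.0633]],
    },
    {
        "text": "アップル",
        "label": "ORG",
        "span": [11, 15],
        "predicted_normalized_entities": [
            ["アップル", 0.2986],
            ["アップル インコーポレイテッド", 0.1792],
        ],
    },
]

for text, (title, score) in top_candidates(sample).items():
    print(f"{text} -> {title} ({score:.4f})")
```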

## `el.question`
* This returns candidate entities for a question, drawn from Wikipedia titles.
```python
>>> # English gloss of the question: "Who is the prime minister of Japan?"
>>> el.question('日本の総理大臣は?')
[('菅内閣', 0.05791765857101555), ('枢密院', 0.05592481946602986), ('党', 0.05430194711042564), ('総選挙', 0.052795400668513175)]
```
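
The scores in this example are low and close together, so a caller may want to keep only candidates above some cutoff. A minimal sketch, assuming the return value is a list of `(title, score)` tuples as shown above: `filter_by_score` is a hypothetical helper, and the threshold of `0.055` is chosen only for illustration.

```python
def filter_by_score(candidates, min_score):
    """Keep (title, score) pairs whose score is at least min_score, preserving order."""
    return [(title, score) for title, score in candidates if score >= min_score]


# Sample data copied from the example output above.
candidates = [
    ("菅内閣", 0.05791765857101555),
    ("枢密院", 0.05592481946602986),
    ("党", 0.05430194711042564),
    ("総選挙", 0.052795400668513175),
]

kept = filter_by_score(candidates, min_score=0.055)
print([title for title, _ in kept])
```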

## Setup
```
$ pip install jel
$ python -m spacy download ja_core_news_md
```

## Run as API
```
$ uvicorn jel.api.server:app --reload --port 8000 --host 0.0.0.0 --log-level trace
```

### Example
```
# link
$ curl localhost:8000/link -X POST -H "Content-Type: application/json" \
-d '{"sentence": "日本の総理は菅総理だ。"}'

# question
$ curl localhost:8000/question -X POST -H "Content-Type: application/json" \
-d '{"sentence": "日本で有名な総理は?"}'
```
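
The same requests can be sent from Python using only the standard library. This sketch mirrors the `curl` calls above (POST, JSON body with a `sentence` field, port 8000). It assumes the server replies with a JSON body, which is not confirmed here. `build_request` and `call_api` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(endpoint: str, sentence: str,
                  base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request equivalent to the curl examples above."""
    payload = json.dumps({"sentence": sentence}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{endpoint}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(endpoint: str, sentence: str):
    """Send the request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(endpoint, sentence)) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the server from "Run as API" already running:
#   call_api("link", "日本の総理は菅総理だ。")
#   call_api("question", "日本で有名な総理は?")
```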

## Test
`$ python -m pytest`

## Notes
* Installing `faiss==1.5.3` from pip causes a `_swigfaiss` error.
* To solve this, see [this issue](https://github.com/facebookresearch/faiss/issues/821#issuecomment-573531694).

## LICENSE
Apache 2.0 License.

## CITATION
```
@INPROCEEDINGS{manabe2019chive,
author = {真鍋陽俊, 岡照晃, 海川祥毅, 髙岡一馬, 内田佳孝, 浅原正幸},
title = {複数粒度の分割結果に基づく日本語単語分散表現},
booktitle = "言語処理学会第25回年次大会(NLP2019)",
year = "2019",
pages = "NLP2019-P8-5",
publisher = "言語処理学会",
}
```