https://github.com/chakki-works/namaco
Character Based Named Entity Recognition.
deep-learning keras machine-learning named-entity-recognition natural-language-processing
- Host: GitHub
- URL: https://github.com/chakki-works/namaco
- Owner: chakki-works
- Created: 2017-10-11T00:24:21.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-04-03T09:11:25.000Z (about 7 years ago)
- Last Synced: 2025-04-04T17:02:23.319Z (3 months ago)
- Topics: deep-learning, keras, machine-learning, named-entity-recognition, natural-language-processing
- Language: Python
- Homepage:
- Size: 5.24 MB
- Stars: 40
- Watchers: 5
- Forks: 10
- Open Issues: 5
Metadata Files:
- Readme: README.md
# namaco
***namaco*** is a library for character-based Named Entity Recognition.
namaco focuses especially on Japanese and Chinese named entity recognition.

# Demo
The following demo shows Chinese Named Entity Recognition:
## Feature Support
namaco provides the following features:
* Training a model on your own data.
* Tagging sentences with a trained model.

## Install
To install namaco, simply run:
```
$ pip install namaco
```

## Data format
The data must be in the following tab-separated (TSV) format, with one character and its tag per line:
```
安 B-PERSON
倍 E-PERSON
首 O
相 O
が O
訪 O
米 S-LOC
し O
た O
本 B-DATE
日 E-DATE
```

## Get Started
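
Before working through the steps below, it may help to see how a file in the data format above breaks down into sentences and tag sequences. The following is a minimal sketch in plain Python, not namaco's own reader: the function name `read_char_tags` is hypothetical, and it assumes that columns are separated by a tab and that a blank line separates sentences (a common CoNLL-style convention that the sample above does not confirm).

```python
import io


def read_char_tags(lines):
    """Parse 'char<TAB>tag' lines into parallel lists of sentences and labels.

    Assumption: a blank line marks a sentence boundary.
    """
    sents, labels = [], []
    chars, tags = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if chars:
                sents.append(chars)
                labels.append(tags)
                chars, tags = [], []
            continue
        char, tag = line.split("\t")
        chars.append(char)
        tags.append(tag)
    # Flush the last sentence when the input does not end with a blank line.
    if chars:
        sents.append(chars)
        labels.append(tags)
    return sents, labels


sample = (
    "安\tB-PERSON\n倍\tE-PERSON\n首\tO\n相\tO\nが\tO\n"
    "訪\tO\n米\tS-LOC\nし\tO\nた\tO\n"
    "\n"
    "本\tB-DATE\n日\tE-DATE\n"
)
x, y = read_char_tags(io.StringIO(sample))
print(len(x), "sentences")
```

The tags in the sample use `B-` (begin), `E-` (end), `S-` (single character) and `O` (outside) prefixes, so every character carries exactly one label.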
### Import
First, import the necessary modules:
```python
import os
import namaco
from namaco.data.reader import load_data_and_labels
from namaco.data.preprocess import prepare_preprocessor
from namaco.config import ModelConfig, TrainingConfig
from namaco.models import CharNER
```
These include the data loader, a preprocessor, and the config classes.

Then, set the parameters to use later:
```python
DATA_ROOT = 'data/ja/ner'
SAVE_ROOT = './models' # trained model
LOG_ROOT = './logs' # checkpoint, tensorboard
model_file = os.path.join(SAVE_ROOT, 'model.h5')
model_config = ModelConfig()
training_config = TrainingConfig()
```

### Loading data
After importing the modules, read data for training and validation:
```python
train_path = os.path.join(DATA_ROOT, 'train.txt')
valid_path = os.path.join(DATA_ROOT, 'valid.txt')
x_train, y_train = load_data_and_labels(train_path)
x_valid, y_valid = load_data_and_labels(valid_path)
```

After reading the data, prepare the preprocessor and the model:
```python
p = prepare_preprocessor(x_train, y_train)
model = CharNER(model_config, p.vocab_size(), p.tag_size())
```

Now we are ready for training :)
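
To make the preprocessor's role more concrete, here is a minimal sketch of the kind of character and tag indexing such a component typically performs. It is an illustration only and does not reproduce namaco's internals: `build_index`, `encode`, and the `<PAD>`/`<UNK>` symbols are hypothetical names, and the sizes of the two indexes are presumably what values like `p.vocab_size()` and `p.tag_size()` correspond to.

```python
def build_index(sequences, specials):
    """Assign an integer id to every distinct item, reserving the first ids for specials."""
    index = {tok: i for i, tok in enumerate(specials)}
    for seq in sequences:
        for item in seq:
            index.setdefault(item, len(index))
    return index


def encode(seq, index, max_len, unk_id=None):
    """Convert a sequence to ids, truncating or padding with id 0 to max_len."""
    ids = []
    for item in seq[:max_len]:
        if item in index:
            ids.append(index[item])
        elif unk_id is not None:
            ids.append(unk_id)  # unseen character
        else:
            raise KeyError(item)
    return ids + [0] * (max_len - len(ids))


# Toy data standing in for x_train / y_train.
x_toy = [list("安倍首相"), list("訪米")]
y_toy = [["B-PERSON", "E-PERSON", "O", "O"], ["O", "S-LOC"]]

char_index = build_index(x_toy, ["<PAD>", "<UNK>"])
tag_index = build_index(y_toy, ["<PAD>"])

print(encode(list("安倍が"), char_index, max_len=5, unk_id=1))  # [2, 3, 1, 0, 0]
print(encode(y_toy[1], tag_index, max_len=4))                   # [3, 4, 0, 0]
```

Reserving id 0 for padding lets variable-length sentences be batched into fixed-size arrays, and a separate unknown id handles characters that never appeared in the training data.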
### Training a model
Let's train a model using ***Trainer***, which manages the whole training process.
Create an instance of the `Trainer` class and pass the training and validation data to its `train` method:
```python
trainer = namaco.Trainer(model,
                         model.loss,
                         training_config,
                         log_dir=LOG_ROOT,
                         save_path=model_file,
                         preprocessor=p)
trainer.train(x_train, y_train, x_valid, y_valid)
```

If training is progressing normally, a progress bar is displayed as follows:
```commandline
...
Epoch 3/15
702/703 [============================>.] - ETA: 0s - loss: 60.0129 - f1: 89.70
703/703 [==============================] - 319s - loss: 59.9278
Epoch 4/15
702/703 [============================>.] - ETA: 0s - loss: 59.9268 - f1: 90.03
703/703 [==============================] - 324s - loss: 59.8417
Epoch 5/15
702/703 [============================>.] - ETA: 0s - loss: 58.9831 - f1: 90.67
703/703 [==============================] - 297s - loss: 58.8993
...
```

### Tagging a sentence
We can use ***Tagger*** to tag text.
Create an instance of the `Tagger` class, then pass it the text to tag:
```python
tagger = namaco.Tagger(model_file, preprocessor=p, tokenizer=list)
```

Let's try tagging the sentence `安倍首相が訪米した`:
```python
>>> sent = '安倍首相が訪米した'
>>> tagger.analyze(sent)
{
    "language": "jp",
    "text": "安倍首相が訪米した",
    "entities": [
        {
            "text": "安倍",
            "type": "Person",
            "score": 0.972231,
            "beginOffset": 0,
            "endOffset": 2
        },
        {
            "text": "米",
            "type": "Location",
            "score": 0.941431,
            "beginOffset": 6,
            "endOffset": 7
        }
    ]
}
```
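
The entity list above can be related back to the per-character tags from the data format. The following is a minimal sketch, not namaco's implementation: `iobes_to_entities` is a hypothetical helper that groups `B-`/`I-`/`E-`/`S-` tags into spans with character offsets. It keeps the raw tag names (`PERSON`, `LOC`) because how the library maps them to display names such as `Person` or `Location` is not shown here, and it omits scores.

```python
def iobes_to_entities(chars, tags):
    """Group per-character B/I/E/S tags into entity spans with offsets."""
    entities = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, kind = tag.partition("-")
        if prefix == "S":
            # Single-character entity.
            entities.append({"text": chars[i], "type": kind,
                             "beginOffset": i, "endOffset": i + 1})
            start = None
        elif prefix == "B":
            start, label = i, kind
        elif prefix == "I" and start is not None and kind == label:
            continue  # still inside the current entity
        elif prefix == "E" and start is not None and kind == label:
            entities.append({"text": "".join(chars[start:i + 1]), "type": label,
                             "beginOffset": start, "endOffset": i + 1})
            start = None
        else:
            # "O" or an inconsistent tag sequence: drop any open span.
            start = None
    return entities


chars = list("安倍首相が訪米した")
tags = ["B-PERSON", "E-PERSON", "O", "O", "O", "O", "S-LOC", "O", "O"]
entities = iobes_to_entities(chars, tags)
for entity in entities:
    print(entity)
```

For this sentence the sketch yields the same offsets as the output above: `安倍` at 0 to 2 and `米` at 6 to 7, with `endOffset` being exclusive.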