https://github.com/hironsan/neraug
A text augmentation tool for named entity recognition.
- Host: GitHub
- URL: https://github.com/hironsan/neraug
- Owner: Hironsan
- License: mit
- Created: 2021-07-21T06:52:25.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2021-07-22T11:39:07.000Z (over 3 years ago)
- Last Synced: 2024-10-11T08:24:09.466Z (about 1 month ago)
- Topics: deep-learning, machine-learning, natural-language-processing, nlp
- Language: Python
- Homepage:
- Size: 121 KB
- Stars: 53
- Watchers: 6
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# neraug
This Python library helps you augment text data for named entity recognition (NER).
## Augmentation Example
![](./docs/images/example.png)
Reference: [An Analysis of Simple Data Augmentation for Named Entity Recognition](https://aclanthology.org/2020.coling-main.343/)

## Installation
To install the library:
```bash
pip install neraug
```

## Usage
Here is an example using one of the algorithms, `DictionaryReplacement`:
```python
>>> from neraug.augmentator import DictionaryReplacement
>>> from neraug.scheme import IOBES
>>> ne_dic = {'Tokyo Big Sight': 'LOC'}
>>> augmentator = DictionaryReplacement(ne_dic, str.split, IOBES)
>>> x = ['I', 'went', 'to', 'Tokyo']
>>> y = ['O', 'O', 'O', 'S-LOC']
>>> x_augs, y_augs = augmentator.augment(x, y, n=1)
>>> x_augs
[['I', 'went', 'to', 'Tokyo', 'Big', 'Sight']]
>>> y_augs
[['O', 'O', 'O', 'B-LOC', 'I-LOC', 'E-LOC']]
```

The library supports the following augmentation algorithms:
- DictionaryReplacement
- LabelWiseTokenReplacement
- MentionReplacement
- ShuffleWithinSegment

It also supports the following tagging schemes:
- IOB2
- IOBES
- BILOU

## Reference
We gratefully acknowledge the following research:
- [An Analysis of Simple Data Augmentation for Named Entity Recognition](https://aclanthology.org/2020.coling-main.343/)
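The paper above studies simple augmentations including label-wise token replacement, mention replacement, and shuffle within segments, which match by name the `LabelWiseTokenReplacement`, `MentionReplacement`, and `ShuffleWithinSegment` augmentators listed earlier. As a rough illustration only, here is a minimal plain-Python sketch of those three ideas. It is not neraug's implementation and does not use its API: the function names, the parameters `p`, `rng`, `label_to_tokens`, and `type_to_mentions`, and the assumption of well-formed IOB2 tags are all hypothetical choices made for this sketch.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical sketch, not neraug's API. Assumes well-formed IOB2 tags,
# e.g. ["O", "B-LOC", "I-LOC"]. Default probabilities are arbitrary.


def split_segments(tags: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Split tags into (start, end, label) spans: one per mention or run of "O"."""
    segments: List[Tuple[int, int, str]] = []
    start = 0
    for i in range(1, len(tags) + 1):
        # Close the current segment at the end of the sequence, before a B- tag,
        # or where the tags switch between "O" and an entity tag.
        if i == len(tags) or tags[i].startswith("B-") or (tags[i] == "O") != (tags[start] == "O"):
            label = "O" if tags[start] == "O" else tags[start][2:]
            segments.append((start, i, label))
            start = i
    return segments


def label_wise_token_replacement(tokens: Sequence[str], tags: Sequence[str],
                                 label_to_tokens: Dict[str, List[str]], p: float = 0.3,
                                 rng: Optional[random.Random] = None) -> Tuple[List[str], List[str]]:
    """With probability p, replace each token by another token seen with the same tag."""
    rng = rng if rng is not None else random.Random()
    new_tokens = []
    for token, tag in zip(tokens, tags):
        candidates = label_to_tokens.get(tag)
        if candidates and rng.random() < p:
            new_tokens.append(rng.choice(candidates))
        else:
            new_tokens.append(token)
    # Tags are unchanged: each replacement keeps the original tag.
    return new_tokens, list(tags)


def mention_replacement(tokens: Sequence[str], tags: Sequence[str],
                        type_to_mentions: Dict[str, List[List[str]]], p: float = 0.3,
                        rng: Optional[random.Random] = None) -> Tuple[List[str], List[str]]:
    """With probability p, replace each mention by a (non-empty) mention of the same type."""
    rng = rng if rng is not None else random.Random()
    new_tokens: List[str] = []
    new_tags: List[str] = []
    for start, end, label in split_segments(tags):
        candidates = type_to_mentions.get(label, []) if label != "O" else []
        if candidates and rng.random() < p:
            mention = rng.choice(candidates)
            new_tokens.extend(mention)
            # Re-encode the tags, since the new mention may have a different length.
            new_tags.extend(["B-" + label] + ["I-" + label] * (len(mention) - 1))
        else:
            new_tokens.extend(tokens[start:end])
            new_tags.extend(tags[start:end])
    return new_tokens, new_tags


def shuffle_within_segments(tokens: Sequence[str], tags: Sequence[str], p: float = 0.3,
                            rng: Optional[random.Random] = None) -> Tuple[List[str], List[str]]:
    """With probability p, shuffle the tokens inside each segment; tags stay aligned."""
    rng = rng if rng is not None else random.Random()
    new_tokens = list(tokens)
    for start, end, _ in split_segments(tags):
        if rng.random() < p:
            segment = new_tokens[start:end]
            rng.shuffle(segment)
            new_tokens[start:end] = segment
    return new_tokens, list(tags)


if __name__ == "__main__":
    x = ["I", "went", "to", "Tokyo"]
    y = ["O", "O", "O", "B-LOC"]
    mentions = {"LOC": [["Tokyo", "Big", "Sight"]]}
    print(mention_replacement(x, y, mentions, p=1.0, rng=random.Random(0)))
    # (['I', 'went', 'to', 'Tokyo', 'Big', 'Sight'], ['O', 'O', 'O', 'B-LOC', 'I-LOC', 'I-LOC'])
```

With `p=1.0` the only `LOC` mention is always replaced, giving the same tokens as the `DictionaryReplacement` example but with IOB2 tags (`I-LOC` where IOBES uses `E-LOC`). In the paper the replacement candidates are drawn from the training set; here they are passed in by hand for brevity.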
## Citation
```bibtex
@misc{neraug,
title={neraug: A data augmentation tool for named entity recognition},
author={Hiroki Nakayama},
url={https://github.com/Hironsan/neraug},
year={2021}
}
```