https://github.com/oya163/nepali-ner

Named Entity Recognition in Nepali Language

flask heroku named-entity-recognition nepali nepali-language ner

Host: GitHub
URL: https://github.com/oya163/nepali-ner
Owner: oya163
License: mit
Created: 2019-11-15T19:02:44.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2023-01-12T18:29:52.000Z (over 3 years ago)
Last Synced: 2025-03-29T21:11:17.836Z (over 1 year ago)
Topics: flask, heroku, named-entity-recognition, nepali, nepali-language, ner
Language: Jupyter Notebook
Homepage: https://nepner.herokuapp.com/
Size: 79.3 MB
Stars: 10
Watchers: 3
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Named Entity Recognition for Nepali Language

Code to reproduce [Named Entity Recognition for Nepali Language](https://arxiv.org/abs/1908.05828)

We publicly release versions 1 and 2 of the Nepali NER dataset. We named the dataset EBIQUITY because the paper and dataset were produced while working in the EBIQUITY lab at UMBC. Each version is further divided into raw and stemmed (brute-force approach) variants.

* v1 - IO tagging scheme
* v2 - BIO tagging scheme with corrections. Correction details are given in README.txt inside the dataset folder. **Recommended.**
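The difference between the two tagging schemes can be illustrated with a minimal sketch. It assumes IO tags are bare labels such as `PER` and `O`, which may not match the exact label strings in the dataset files, and the function names are hypothetical. IO tagging cannot separate two adjacent entities of the same type, so the conversion below merges them into one span; this is one reason a BIO dataset is more informative than an automatic conversion of an IO one.

```python
def io_to_bio(tags):
    """Convert IO tags (e.g. "PER", "O") to BIO tags (e.g. "B-PER", "I-PER", "O").

    Assumption: IO tags are bare entity labels. Adjacent tokens with the
    same label are treated as one entity, because IO carries no boundary.
    """
    bio = []
    prev = "O"
    for tag in tags:
        if tag == "O":
            bio.append("O")
        elif tag == prev:
            bio.append("I-" + tag)  # continues the entity opened earlier
        else:
            bio.append("B-" + tag)  # first token of a new entity
        prev = tag
    return bio


def bio_to_io(tags):
    """Drop the B-/I- prefixes, going back to plain IO labels."""
    return [tag if tag == "O" else tag.split("-", 1)[1] for tag in tags]


if __name__ == "__main__":
    io_tags = ["PER", "PER", "O", "LOC", "O"]
    print(io_to_bio(io_tags))
```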

The Nepali National Corpus can be found [here](https://www.sketchengine.eu/nepali-national-corpus/).

Nepali sentences were collected from online news websites from [2015-2016](https://github.com/sndsabin/Nepali-News-Classifier) and [2009-2010](https://pdfs.semanticscholar.org/c8c4/d371c9b8a759b3927de6c2b0f1fa98f4501c.pdf).

## Dataset statistics

Entity counts are numbers of tokens; the last row gives the number of sentences.

| Entities        | EBIQUITY | ILPRL |
|-----------------|----------|-------|
| PER             | 5059     | 262   |
| ORG             | 3811     | 180   |
| LOC             | 2313     | 273   |
| MISC            | 0        | 461   |
| Total sentences | 3606     | 548   |
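Token-level counts like these can be recomputed with a short script. The sketch below assumes a CoNLL-style layout (one token per line, the tag in the last whitespace-separated column, a blank line between sentences); the actual layout of the dataset files may differ, and `count_entity_tokens` is a hypothetical helper, not part of this repository.

```python
from collections import Counter


def count_entity_tokens(lines):
    """Count tagged tokens per entity type and the number of sentences.

    Assumption: each non-blank line ends with the tag, and sentences are
    separated by blank lines. Both IO ("PER") and BIO ("B-PER") tags work.
    """
    token_counts = Counter()
    sentences = 0
    in_sentence = False
    for line in lines:
        line = line.strip()
        if not line:
            if in_sentence:
                sentences += 1  # a blank line closes the current sentence
            in_sentence = False
            continue
        in_sentence = True
        tag = line.split()[-1]
        if tag != "O":
            # Strip a B-/I- prefix if present so counts are per entity type.
            token_counts[tag.split("-", 1)[-1]] += 1
    if in_sentence:
        sentences += 1  # the file may not end with a blank line
    return token_counts, sentences


if __name__ == "__main__":
    sample = "w1 B-PER\nw2 I-PER\nw3 O\n\nw4 B-LOC\nw5 O\n".splitlines()
    counts, n_sentences = count_entity_tokens(sample)
    print(dict(counts), n_sentences)
```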

## Embedding comparison

| Embeddings          | Raw        | Stemmed    |
|---------------------|------------|------------|
| Random              | 73.98      | 76.410     |
| Word2Vec_CBOW       | 74.465     | 82.230     |
| Word2Vec_Skip Gram  | 76.873     | 84.330     |
| GloVe               | 75.718     | 83.833     |
| fastText_Pretrained | 80.403     | 82.068     |
| fastText_CBOW       | 78.343     | 81.415     |
| fastText_Skip Gram  | **81.793** | **85.535** |

## Results

These results were obtained using the [conlleval](https://www.clips.uantwerpen.be/conll2000/chunking/conlleval.txt) script.

| Model                  | EBIQUITY   | ILPRL      |
|------------------------|------------|------------|
| Stanford CRF           | 75.160     | 56.250     |
| BiLSTM                 | 85.535     | 77.718     |
| BiLSTM + POS           | 84.235     | 81.963     |
| BiLSTM + CNN (C)       | 86.520     | 80.045     |
| BiLSTM + CNN (G)       | **86.893** | 80.843     |
| BiLSTM + CNN (C) + POS | 84.970     | 81.860     |
| BiLSTM + CNN (G) + POS | 85.210     | **82.190** |
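conlleval scores predictions at the entity (chunk) level: a predicted entity counts as correct only if both its boundaries and its type match the gold annotation. The sketch below is a simplified re-implementation of that idea for a single BIO-tagged sequence, not the conlleval script itself, and it assumes the scores above are F1 values. The function names are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO sequence."""
    spans = set()
    start, etype = None, None
    # The trailing "O" is a sentinel that closes a span ending at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # the open span keeps going
        if etype is not None:
            spans.add((start, i, etype))  # close the open span (end is exclusive)
            start, etype = None, None
        if tag != "O":
            # A B- tag, or an I- tag with no matching open span, starts a span.
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold, pred):
    """Entity-level F1: a span is correct only on an exact boundary and type match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC"]
    pred = ["B-PER", "O", "O", "B-LOC"]  # PER span is cut short, so it is wrong
    print(entity_f1(gold, pred))
```

Note that a partially correct entity earns no credit under this metric, which is why entity-level scores are typically lower than token-level accuracy.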

## Comparison

| Model                     | EBIQUITY   | ILPRL      |
|---------------------------|------------|------------|
| Bam et al. SVM            | 66.26      | 46.26      |
| Ma and Hovy w/ glove      | 83.63      | 72.1       |
| Lample et al. w/ word2vec | 86.49      | 78.48      |
| BiLSTM + CNN (G)          | **86.893** | 80.843     |
| BiLSTM + CNN (G) + POS    | 85.210     | **82.190** |

## Usage

To run 5-fold cross-validation for the BiLSTM + POS + grapheme-level CNN model:

    python main.py -k 5 -d cuda:0 -p -g
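For readers unfamiliar with k-fold cross-validation, here is a minimal sketch of how a list of sentences can be split into 5 folds, each used once as the test set. It illustrates the general technique only: how `main.py` actually partitions the data is not shown here, and `k_fold_splits` is a hypothetical helper. The sketch does not shuffle; shuffling the sentences once beforehand is a common choice.

```python
def k_fold_splits(items, k):
    """Yield (train, test) pairs so that every item is in exactly one test fold."""
    n = len(items)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of items")
    # The first n % k folds get one extra item so fold sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = items[start:start + size]
        train = items[:start] + items[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    sentences = list(range(12))  # stand-ins for 12 tagged sentences
    for fold, (train, test) in enumerate(k_fold_splits(sentences, 5), start=1):
        print(fold, len(train), len(test))
```

The reported score for a model would then be the average of the per-fold scores.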

## Web App

- A simple Flask-based [web app](https://nepner.herokuapp.com/)

## Reference

- https://github.com/bamtercelboo/pytorch_NER_BiLSTM_CNN_CRF

## Contact

- osingh1@umbc.edu

## Citation

If this dataset helped you in your research, feel free to cite the paper :smile:

	@INPROCEEDINGS{8998477,
	author={O. M. {Singh} and A. {Padia} and A. {Joshi}},
	booktitle={2019 IEEE 5th International Conference on Collaboration and Internet Computing (CIC)},
	title={Named Entity Recognition for Nepali Language},
	year={2019},
	volume={},
	number={},
	pages={184-190},
	keywords={Named Entity Recognition;Nepali;Low-resource;BiLSTM;CNN;Grapheme},
	doi={10.1109/CIC48465.2019.00031},
	ISSN={null},
	month={Dec},}
