{"id":25670102,"url":"https://github.com/oya163/nepali-ner","last_synced_at":"2025-04-23T02:22:08.440Z","repository":{"id":87687614,"uuid":"221986119","full_name":"oya163/nepali-ner","owner":"oya163","description":"Named Entity Recognition in Nepali Language","archived":false,"fork":false,"pushed_at":"2023-01-12T18:29:52.000Z","size":83147,"stargazers_count":10,"open_issues_count":0,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-29T21:11:17.836Z","etag":null,"topics":["flask","heroku","named-entity-recognition","nepali","nepali-language","ner"],"latest_commit_sha":null,"homepage":"https://nepner.herokuapp.com/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oya163.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-15T19:02:44.000Z","updated_at":"2024-09-09T17:19:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"b290021b-e89a-4c18-955c-e4e0821640b1","html_url":"https://github.com/oya163/nepali-ner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oya163%2Fnepali-ner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oya163%2Fnepali-ner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oya163%2Fnepali-ner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oya163%2Fnepali-ner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/oya163","download_url":"https://codeload.github.com/oya163/nepali-ner/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250354962,"owners_count":21416820,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flask","heroku","named-entity-recognition","nepali","nepali-language","ner"],"created_at":"2025-02-24T11:29:34.446Z","updated_at":"2025-04-23T02:22:08.432Z","avatar_url":"https://github.com/oya163.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Named Entity Recognition for Nepali Language\r\n\r\nCode to reproduce [Named Entity Recognition for Nepali Language](https://arxiv.org/abs/1908.05828)\r\n\r\nWe publicly release versions 1 and 2 of the Nepali NER dataset. We named the dataset EBIQUITY because we published the paper and dataset while working at the EBIQUITY lab at UMBC. Each version is further divided into raw and stemmed (brute-force approach) variants.\r\n\r\n* v1 - IO tagging scheme\r\n* v2 - BIO tagging scheme with corrections. The corrections are detailed in README.txt inside the dataset folder. 
**Recommended to use**\r\n\r\nThe National Nepali Corpus can be found [here](https://www.sketchengine.eu/nepali-national-corpus/)\r\n\r\nNepali sentences were collected from online news websites for the years [2015-2016](https://github.com/sndsabin/Nepali-News-Classifier) and [2009-2010](https://pdfs.semanticscholar.org/c8c4/d371c9b8a759b3927de6c2b0f1fa98f4501c.pdf)\r\n\r\n## Dataset statistics\r\n\r\nEntity counts are based on the number of tokens.\r\n\r\n| Entities        | EBIQUITY | ILPRL |\r\n|-----------------|------|-------|\r\n| PER             | 5059 | 262   |\r\n| ORG             | 3811 | 180   |\r\n| LOC             | 2313 | 273   |\r\n| MISC            | 0    | 461   |\r\n| Total sentences | 3606 | 548   |\r\n\r\n## Embedding comparison\r\n\r\n| Embeddings          | Raw       | Stemmed |\r\n|---------------------|-----------|---------|\r\n| Random              | 73.98     | 76.410  |\r\n| Word2Vec_CBOW       | 74.465    | 82.230  |\r\n| Word2Vec_Skip Gram  | 76.873    | 84.330  |\r\n| GloVe               | 75.718    | 83.833  |\r\n| fastText_Pretrained | 80.403    | 82.068  |\r\n| fastText_CBOW       | 78.343    | 81.415  |\r\n| fastText_Skip Gram  | **81.793**    | **85.535**  |\r\n\r\n## Results\r\n\r\nThese results were obtained using the [conlleval](https://www.clips.uantwerpen.be/conll2000/chunking/conlleval.txt) tool.\r\n\r\n| Model                | EBIQUITY | ILPRL  |\r\n|------------------------|----------|--------|\r\n| Stanford CRF           | 75.160   | 56.250 |\r\n| BiLSTM                 | 85.535   | 77.718 |\r\n| BiLSTM + POS           | 84.235   | 81.963 |\r\n| BiLSTM + CNN (C)       | 86.520   | 80.045 |\r\n| BiLSTM + CNN (G)       | **86.893**   | 80.843 |\r\n| BiLSTM + CNN (C) + POS | 84.970   | 81.860 |\r\n| BiLSTM + CNN (G) + POS | 85.210   | **82.190** |\r\n\r\n## Comparison\r\n\r\n| Model                   | EBIQUITY | ILPRL  |\r\n|---------------------------|----------|--------|\r\n| Bam et al. 
SVM            | 66.26    | 46.26  |\r\n| Ma and Hovy w/ glove      | 83.63    | 72.1   |\r\n| Lample et al. w/ word2vec | 86.49    | 78.48  |\r\n| BiLSTM + CNN (G)          | **86.893**   | 80.843 |\r\n| BiLSTM + CNN (G) + POS    | 85.210   | **82.190** |\r\n\r\n## Usage\r\n\r\nTo run 5-fold cross-validation for the BiLSTM + POS + grapheme-level CNN model:\r\n\r\n    python main.py -k 5 -d cuda:0 -p -g\r\n\r\n\r\n## Web App\r\n- A simple Flask-based [web app](https://nepner.herokuapp.com/)\r\n\r\n\r\n## Reference\r\n- https://github.com/bamtercelboo/pytorch_NER_BiLSTM_CNN_CRF\r\n\r\n\r\n## Contact\r\n- osingh1@umbc.edu\r\n\r\n\r\n## Citation\r\n\r\nIf this dataset helped you in your research, feel free to cite the paper :smile:\r\n\r\n\t@INPROCEEDINGS{8998477,\r\n\tauthor={O. M. {Singh} and A. {Padia} and A. {Joshi}},\r\n\tbooktitle={2019 IEEE 5th International Conference on Collaboration and Internet Computing (CIC)},\r\n\ttitle={Named Entity Recognition for Nepali Language},\r\n\tyear={2019},\r\n\tvolume={},\r\n\tnumber={},\r\n\tpages={184-190},\r\n\tkeywords={Named Entity Recognition;Nepali;Low-resource;BiLSTM;CNN;Grapheme},\r\n\tdoi={10.1109/CIC48465.2019.00031},\r\n\tISSN={null},\r\n\tmonth={Dec},}\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foya163%2Fnepali-ner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foya163%2Fnepali-ner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foya163%2Fnepali-ner/lists"}