[![tests](https://github.com/dayyass/pytorch-ner/actions/workflows/tests.yml/badge.svg)](https://github.com/dayyass/pytorch-ner/actions/workflows/tests.yml)
[![linter](https://github.com/dayyass/pytorch-ner/actions/workflows/linter.yml/badge.svg)](https://github.com/dayyass/pytorch-ner/actions/workflows/linter.yml)
[![codecov](https://codecov.io/gh/dayyass/pytorch-ner/branch/main/graph/badge.svg?token=WSB83O6GVV)](https://codecov.io/gh/dayyass/pytorch-ner)

[![python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://github.com/dayyass/pytorch-ner#requirements)
[![release (latest by date)](https://img.shields.io/github/v/release/dayyass/pytorch-ner)](https://github.com/dayyass/pytorch-ner/releases/latest)
[![license](https://img.shields.io/github/license/dayyass/pytorch-ner?color=blue)](https://github.com/dayyass/pytorch-ner/blob/main/LICENSE)

[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-black)](https://github.com/dayyass/pytorch-ner/blob/main/.pre-commit-config.yaml)
[![code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![pypi version](https://img.shields.io/pypi/v/pytorch-ner)](https://pypi.org/project/pytorch-ner)
[![pypi downloads](https://img.shields.io/pypi/dm/pytorch-ner)](https://pypi.org/project/pytorch-ner)

### Named Entity Recognition (NER) with PyTorch
Pipeline for training **NER** models using **PyTorch**.

**ONNX** export is supported.

### Usage
Instead of writing custom code for a specific NER task, you just need to:
1. install the pipeline:
```shell script
pip install pytorch-ner
```
2. run the pipeline:
- either in the **terminal**:
```shell script
pytorch-ner-train --path_to_config config.yaml
```
- or in **Python**:
```python3
import pytorch_ner

pytorch_ner.train(path_to_config="config.yaml")
```

#### Config
The user interface consists of a single file, [**config.yaml**](https://github.com/dayyass/pytorch-ner/blob/main/config.yaml).<br/>
Edit it to create the desired configuration.

Default **config.yaml**:
```yaml
torch:
  device: 'cpu'
  seed: 42

data:
  train_data:
    path: 'data/conll2003/train.txt'
    sep: ' '
    lower: true
    verbose: true
  valid_data:
    path: 'data/conll2003/valid.txt'
    sep: ' '
    lower: true
    verbose: true
  test_data:
    path: 'data/conll2003/test.txt'
    sep: ' '
    lower: true
    verbose: true
  token2idx:
    min_count: 1
    add_pad: true
    add_unk: true

dataloader:
  preprocess: true
  token_padding: '<PAD>'
  label_padding: 'O'
  percentile: 100
  batch_size: 256

model:
  embedding:
    embedding_dim: 128
  rnn:
    rnn_unit: LSTM  # GRU, RNN
    hidden_size: 256
    num_layers: 1
    dropout: 0
    bidirectional: true

optimizer:
  optimizer_type: Adam  # torch.optim
  clip_grad_norm: 0.1
  params:
    lr: 0.001
    weight_decay: 0
    amsgrad: false

train:
  n_epoch: 10
  verbose: true

save:
  path_to_folder: 'models'
  export_onnx: true
```

**NOTE**: to export the trained model to **ONNX**, set the following config parameter:
```yaml
save:
  export_onnx: true
```

#### Data Format
The pipeline works with text files in which each line contains a token and its label, separated by a delimiter. Sentences are separated by an empty line.
Labels should already be in the desired tagging scheme, e.g. IO, BIO or BILUO.

Example:
```
token_11    label_11
token_12    label_12

token_21    label_21
token_22    label_22
token_23    label_23

...
```

#### Output
After training, the pipeline produces the following files:
- `model.pth` - PyTorch NER model
- `model.onnx` - ONNX NER model (optional)
- `token2idx.json` - mapping from each token to its index
- `label2idx.json` - mapping from each label to its index
- `config.yaml` - config that was used to train the model
- `logging.txt` - log file

### Models
List of implemented models:
- [x] BiLSTM
- [ ] BiLSTMCRF
- [ ] BiLSTMAttn
- [ ] BiLSTMAttnCRF
- [ ] BiLSTMCNN
- [ ] BiLSTMCNNCRF
- [ ] BiLSTMCNNAttn
- [ ] BiLSTMCNNAttnCRF

### Evaluation
All results were obtained on the CoNLL-2003 [dataset](https://github.com/dayyass/pytorch-ner/tree/develop/data/conll2003). We did not search for the best hyperparameters.

| Model  | Train F1-weighted | Validation F1-weighted | Test F1-weighted |
| ------ | ----------------- | ---------------------- | ---------------- |
| BiLSTM | 0.968             | 0.928                  | 0.876            |

### Requirements
Python >= 3.6

### Citation
If you use **pytorch_ner** in a scientific publication, we would appreciate a reference to the following BibTeX entry:
```bibtex
@misc{dayyass2020ner,
    author       = {El-Ayyass, Dani},
    title        = {Pipeline for training NER models using PyTorch},
    howpublished = {\url{https://github.com/dayyass/pytorch_ner}},
    year         = {2020}
}
```
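The **Data Format** section describes the input files. The following is a minimal sketch of how such a file could be read into parallel lists of token and label sequences. It is not the pipeline's own loader, and the function name `read_conll_like` is hypothetical. The sketch makes three assumptions:
- The `sep` and `lower` keys in the default config control the delimiter and lowercasing.
- The first field on a line is the token.
- The last field on a line is the label.

Taking the first and last fields also tolerates the runs of spaces used in the example above.

```python
from typing import Iterable, List, Tuple


def read_conll_like(
    lines: Iterable[str], sep: str = " ", lower: bool = True
) -> Tuple[List[List[str]], List[List[str]]]:
    """Hypothetical reader for "token<sep>label" lines with blank-line sentence breaks."""
    token_seqs: List[List[str]] = []
    label_seqs: List[List[str]] = []
    tokens: List[str] = []
    labels: List[str] = []

    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            # An empty line closes the current sentence (if one is open).
            if tokens:
                token_seqs.append(tokens)
                label_seqs.append(labels)
                tokens, labels = [], []
            continue
        fields = line.split(sep)
        # Assumption: the first field is the token and the last field is the label.
        token, label = fields[0], fields[-1]
        tokens.append(token.lower() if lower else token)
        labels.append(label)

    # The last sentence may not be followed by an empty line.
    if tokens:
        token_seqs.append(tokens)
        label_seqs.append(labels)

    return token_seqs, label_seqs
```

Because the function accepts any iterable of lines, an open file works directly, e.g. `with open("data/conll2003/train.txt", encoding="utf-8") as f: tokens, labels = read_conll_like(f)`.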
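The `token2idx` block of **config.yaml** exposes `min_count`, `add_pad` and `add_unk`. A plausible reading is that the vocabulary keeps tokens seen at least `min_count` times and optionally reserves entries for padding and unknown tokens. The sketch below implements that reading only, so it may not match how `pytorch_ner` actually builds its vocabulary. The following are all assumptions:
- the function name `build_token2idx`;
- the `<UNK>` string (only `<PAD>` appears in the default config, as `token_padding`);
- giving special tokens the lowest indices.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_token2idx(
    sentences: Iterable[List[str]],
    min_count: int = 1,
    add_pad: bool = True,
    add_unk: bool = True,
    pad_token: str = "<PAD>",
    unk_token: str = "<UNK>",  # hypothetical name, not given in the README
) -> Dict[str, int]:
    """Hypothetical vocabulary builder mirroring the token2idx config keys."""
    counts = Counter(token for sentence in sentences for token in sentence)
    token2idx: Dict[str, int] = {}
    # Assumption: special tokens take the lowest indices.
    if add_pad:
        token2idx[pad_token] = len(token2idx)
    if add_unk:
        token2idx[unk_token] = len(token2idx)
    # Keep tokens seen at least min_count times, most frequent first.
    for token, count in counts.most_common():
        if count >= min_count and token not in token2idx:
            token2idx[token] = len(token2idx)
    return token2idx
```

With `min_count: 1` (the default), every training token gets an index. Raising it maps rare tokens to the unknown entry at lookup time.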
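The **Output** section lists `token2idx.json` and `label2idx.json`. Assume each file holds a flat JSON object that maps strings to integer indices. Under that assumption, a new sentence could be turned into model inputs, and predicted indices back into labels, roughly as follows. The helper names and the `<UNK>` fallback key are hypothetical. Lowercasing mirrors the `lower: true` setting in the default config.

```python
import json
from typing import Dict, List


def load_mapping(path: str) -> Dict[str, int]:
    """Load a string-to-index mapping such as token2idx.json or label2idx.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def encode_tokens(
    tokens: List[str],
    token2idx: Dict[str, int],
    unk_token: str = "<UNK>",  # hypothetical; check the keys of your token2idx.json
    lower: bool = True,
) -> List[int]:
    """Map tokens to indices, falling back to the unknown-token index."""
    unk_idx = token2idx[unk_token]
    return [token2idx.get(t.lower() if lower else t, unk_idx) for t in tokens]


def decode_labels(indices: List[int], label2idx: Dict[str, int]) -> List[str]:
    """Invert label2idx and map predicted indices back to label strings."""
    idx2label = {idx: label for label, idx in label2idx.items()}
    return [idx2label[i] for i in indices]
```

Point `load_mapping` at the files in your output folder (see `save.path_to_folder`). Before relying on the fallback, check which unknown-token key your `token2idx.json` actually contains.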
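The **Evaluation** table reports weighted F1, but the README does not say exactly how it is computed. One common definition, which scikit-learn uses for `f1_score(y_true, y_pred, average="weighted")`, takes the per-label F1 and averages it with weights equal to each label's support in the gold data. The dependency-free sketch below implements that definition over flattened token-level labels. Whether the pipeline scores individual tokens (rather than entity spans) is an assumption. If `O` tokens are included, they carry most of the weight.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted mean of per-label F1 over token-level labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    support = Counter(y_true)  # gold count per label (the weights)
    predicted = Counter(y_pred)  # predicted count per label
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    weighted_sum = 0.0
    for label, n_gold in support.items():
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / n_gold
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weighted_sum += n_gold * f1

    # Labels that appear only in y_pred have zero support and thus zero weight.
    return weighted_sum / len(y_true)
```

For sentence-level outputs, flatten first, e.g. `weighted_f1([l for s in gold for l in s], [l for s in pred for l in s])`. Padded positions should be excluded before scoring.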