{"id":16190693,"url":"https://github.com/devmount/neural-network-pos-tagger","last_synced_at":"2025-06-17T14:34:49.822Z","repository":{"id":39712773,"uuid":"135211520","full_name":"devmount/neural-network-pos-tagger","owner":"devmount","description":"Train and evaluate neural network language models for POS tagging, tag input sentences according to a trained model.","archived":false,"fork":false,"pushed_at":"2024-07-30T21:18:22.000Z","size":20631,"stargazers_count":12,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-02T22:51:12.728Z","etag":null,"topics":["embeddings","feedforward-neural-network","neural-network","neural-networks","nlp","part-of-speech-tagger","pos-tagger","pos-tagging","recurrent-neural-networks","word-embeddings"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devmount.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-28T21:37:35.000Z","updated_at":"2024-07-30T21:18:20.000Z","dependencies_parsed_at":"2024-10-27T19:53:26.324Z","dependency_job_id":null,"html_url":"https://github.com/devmount/neural-network-pos-tagger","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/devmount/neural-network-pos-tagger","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmount%2Fneural-network-pos-tagger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmount%2Fneural-network-pos-tagger
/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmount%2Fneural-network-pos-tagger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmount%2Fneural-network-pos-tagger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devmount","download_url":"https://codeload.github.com/devmount/neural-network-pos-tagger/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmount%2Fneural-network-pos-tagger/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260379808,"owners_count":23000144,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embeddings","feedforward-neural-network","neural-network","neural-networks","nlp","part-of-speech-tagger","pos-tagger","pos-tagging","recurrent-neural-networks","word-embeddings"],"created_at":"2024-10-10T07:43:55.954Z","updated_at":"2025-06-17T14:34:44.807Z","avatar_url":"https://github.com/devmount.png","language":"Python","funding_links":["https://www.buymeacoffee.com/devmount"],"categories":[],"sub_categories":[],"readme":"# Neural Network POS Tagger\n\n\u003e Part-of-Speech Tagging with Neural Networks for a conversational agent\n\nThis toolkit was implemented during my master's thesis, that aimed to improve the natural language understanding of an artificial conversational agent. This agent utilized a Hidden Markov Model to calculate Part-of-Speech tags for input words. 
To achieve better results, two different classification architectures are implemented and evaluated: a Feed-forward Neural Network and a Recurrent Neural Network.\n\nThis repository contains the toolkit to train and evaluate language models for POS tagging and to tag input sentences according to a trained model. The Python scripts can be used directly or through an API. The toolkit is licensed under GPL-3.0.\n\n## Setup\n\nCheck that Python version \u003e= 3.5 is installed:\n\n    $ python --version\n    Python 3.6.3\n\nInstall dependencies (consider using a virtual environment):\n\n    pip install -r requirements.txt\n\nIf the installation was successful, change to the directory of the Tagger and everything should be ready to run:\n\n    cd fnn-tagger/\n\n## Script Usage\n\nThe individual Python scripts of this toolkit can be called directly, as documented below.\n\n### Configuration\n\nStatic settings are located in the `settings.py` script. It contains the following configuration options:\n\n| option | description |\n| ------ | ----------- |\n| `ARCHITECTURE` | Neural network architecture that will be used. 
Possible values: `FNN`, `RNN` |\n| `VOCAB_SIZE` | Dimension of the vocabulary |\n| `N_PAST_WORDS` | Number of preceding words to take into account for the POS tag training of the current word (FNN only) |\n| `N_TIMESTEPS` | Number of previous training steps to take into account (RNN only) |\n| `EMBEDDING_SIZE` | Dimension of the word embeddings (FNN only) |\n| `H_SIZE` | Dimension of the hidden layer |\n| `TEST_RATIO` | Ratio of test data extracted from the training data |\n| `BATCH_SIZE` | Size of the training batches |\n| `N_EPOCHS` | Number of training epochs |\n| `CHECKPOINT_EVERY` | Evaluate and save model state after this number of training steps |\n| `REPLACEMENT_FILE` | Preprocess training data by normalizing terms with the help of replacements, stored in this file |\n\nTraining, evaluation and tagging can be executed using the `tagger.py` script, which is the core script of this toolkit. Its general usage is:\n\n    python tagger.py [-h] [--train TRAIN] [--tag TAG] [--evaluate EVALUATE]\n                     [--reset] [-p PASTWORDS] [-e EMBEDDINGSIZE] [-s HIDDENSIZE]\n                     [-n NEPOCHS] [-t TIMESTEPS] [-f] [-q] [-i]\n\n    optional arguments:\n      -h, --help            show this help message and exit\n      --train TRAIN         Invokes training of a language model on given corpus\n      --tag TAG             Tags a given sentence with the pretrained language\n                            model\n      --evaluate EVALUATE   Evaluates pretrained language model with a given\n                            evaluation file\n      --reset               Removes all stored training and log data\n      -p PASTWORDS, --pastwords PASTWORDS\n                            Number of preceding words to take into account\n      -e EMBEDDINGSIZE, --embeddingsize EMBEDDINGSIZE\n                            Dimension of the word embeddings\n      -s HIDDENSIZE, --hiddensize HIDDENSIZE\n                            Dimension of the hidden layer\n      
-n NEPOCHS, --nepochs NEPOCHS\n                            Number of training epochs\n      -t TIMESTEPS, --timesteps TIMESTEPS\n                            Number of past trained words\n      -f, --force           Force operation without confirmation\n      -q, --quiet           No output messages\n      -i, --inline          Only one line output\n\nThe following sections explain the usage of the specific flags and their appropriate combination for each action.\n\n### Training\n\nTo train the Tagger, call the `tagger.py` script with the `--train` flag. The batch training will start according to your static configuration. Once you have reached sufficient accuracy, you can interrupt the training with \u003ckbd\u003eCTRL\u003c/kbd\u003e+\u003ckbd\u003eC\u003c/kbd\u003e or wait until the training process finishes.\n\n    $ python tagger.py --train data/test.corpus\n    Training starts...\n    Loading training data from \"data/test.corpus\"...\n    Generating vocabulary...\n    Generating tensors...\n    Initializing model...\n    Step 100: loss 0.9, accuracy 91% - saved model checkpoint to 'saved/model-100'\n    Step 200: loss 0.2, accuracy 98% - saved model checkpoint to 'saved/model-200'\n    Step 300: loss 0.0, accuracy 100% - saved model checkpoint to 'saved/model-300'\n\nYou can also call the script with inline configuration. To train a model using the FNN architecture, use the flags `-p`, `-e`, `-s`, and `-n`. You must pass exactly these 4 flags; otherwise the static configuration from `settings.py` will be used. An example call would be:\n\n    python tagger.py --train data/test.corpus -p 1 -e 250 -s 350 -n 5\n\nTo train a model using the RNN architecture, use the flags `-t`, `-e`, `-s`, and `-n`. You must pass exactly these 4 flags; otherwise the static configuration from `settings.py` will be used. 
An example call would be:\n\n    python tagger.py --train data/test.corpus -t 8 -e 100 -s 100 -n 5\n\n### Tagging\n\nTo tag a sentence with a pretrained model, call the `tagger.py` script with the `--tag` parameter followed by the sentence to be tagged. The output attaches a tag to every word.\n\n    $ python tagger.py --tag \"Show all modules of Bachelor Informatics\"\n    The tagged sentence is:\n     Show     all       modules     of       Bachelor         Informatics\n    R_LIST   R_LIST   M_MTSModule    X    C_Program:degree   C_Program:name\n\nMake sure that `settings.py` is configured with the same values that were used to train the model; otherwise the tagger cannot load the pretrained model correctly.\n\nIf you don't want to edit `settings.py`, you can also call the script with inline configuration. To tag a sentence using the FNN architecture, use the flags `-p`, `-e` and `-s`. You must pass exactly these 3 flags; otherwise the static configuration from `settings.py` will be used. An example call would be:\n\n    python tagger.py --tag \"Show all modules of Bachelor Informatics\" -p 1 -e 250 -s 350\n\nTo tag a sentence using the RNN architecture, use the flags `-t`, `-e` and `-s`. You must pass exactly these 3 flags; otherwise the static configuration from `settings.py` will be used. An example call would be:\n\n    python tagger.py --tag \"Show all modules of Bachelor Informatics\" -t 8 -e 100 -s 100\n\n### Evaluation\n\nTo evaluate a pretrained model on an external test set, call the `tagger.py` script with the `--evaluate` parameter followed by the path to the file that contains the evaluation data. 
The evaluation data file must contain one sentence per line, each consisting of space-separated word/tag tuples.\n\n    $ python tagger.py --evaluate data/evaluation.txt\n    Loading saved vocabulary...\n    Generating tensors...\n    Evaluation starts...\n\n    # RESULTS:\n\n      20 / 29   0.690   sentences correct\n    207 / 208   0.995   words recognized\n    197 / 208   0.947   tags correct\n                0.966   kappa score\n\n    # ERRORS:\n\n    count           expected                    computed\n    ------------------------------------------------------------\n    4     bachelor/C_Program:degree   bachelor/C_Program:name\n    4     master/C_Program:degree     master/C_Program:name\n    1     institute/X_Chair:name      institute/C_Program:name\n    1     quality/C_Chair:name        quality/C_Program:name\n    1     and/C_Chair:name            \u003cUNKNOWN_WORD\u003e/X\n\nMake sure that `settings.py` is configured with the same values that were used to train the model; otherwise the evaluation cannot load the pretrained model.\n\nIf you don't want to edit `settings.py`, you can also call the script with inline configuration. To evaluate a model trained with the FNN architecture, use the flags `-p`, `-e` and `-s`. You must pass exactly these 3 flags; otherwise the static configuration from `settings.py` will be used. An example call would be:\n\n    python tagger.py --evaluate data/evaluation.txt -p 1 -e 250 -s 350\n\nTo evaluate a model trained with the RNN architecture, use the flags `-t`, `-e` and `-s`. You must pass exactly these 3 flags; otherwise the static configuration from `settings.py` will be used. 
An example call would be:\n\n    python tagger.py --evaluate data/evaluation.txt -t 8 -e 100 -s 100\n\nIf you don't need the list of errors in the evaluation result, you can print the result in one line by adding the `-i` flag, e.g.:\n\n    $ python tagger.py --evaluate data/evaluation.txt -p 1 -e 250 -s 350 -i\n    data/evaluation_known.txt: 197/208 (94.7%) tags correct, 0.966 kappa score\n\n### Reset\n\nTo reset the tagger and delete all previously created files, call the `tagger.py` script with the `--reset` flag and confirm with 'Yes' (or \u003ckbd\u003eEnter\u003c/kbd\u003e):\n\n    $ python tagger.py --reset\n    Really delete all training data and log files? [Yes/no]\n    Reset was executed. All files successfully deleted.\n\nTo skip the confirmation prompt, use the `-f` flag to force a direct deletion:\n\n    python tagger.py --reset -f\n\nTo suppress all output messages as well, use the `-q` flag for a quiet deletion:\n\n    python tagger.py --reset -q\n\n## API Usage\n\nThis toolkit is designed to be used in other applications. The API is documented below.\n\n### Configuration\n\nImport the tagging script according to your directory structure, e.g.:\n\n    import tagger as nn\n\nTo instantiate the tagger, call the `Tagger()` class. Without any parameters, the static configuration from `settings.py` will be used:\n\n    t = nn.Tagger()\n\nIf you prefer inline configuration, pass the corresponding parameters according to the neural network architecture (the FNN needs `n_past_words`, the RNN needs `n_timesteps`). 
See these two examples, one for each architecture:\n\n    t = nn.Tagger('FNN', n_past_words=1, embedding_size=250, h_size=350, n_epochs=5)\n    t = nn.Tagger('RNN', n_timesteps=8, embedding_size=100, h_size=100, n_epochs=5)\n\n### Training\n\nTo train the initialized tagger, call the `train()` method with the path to the corpus file, e.g.:\n\n    t.train('data/test.corpus')\n\nThe trained model will be stored in the `saved/` directory.\n\n### Tagging\n\nA sentence can be tagged with a pretrained model by calling the `tag()` method. Additional parameters let you print the tagging output in tabular form to the console (`pretty_print`) or completely mute console messages concerning model loading (`silent`).\n\n    tagged_sentence = t.tag('Show all modules of Bachelor Informatics', format_list=False, pretty_print=True, silent=False)\n\nIf you want to process lists instead of strings, you can use `format_list=True` as parameter. The input sentence then has to be a list of words, and the output will be a list of word/tag tuples:\n\n    \u003e\u003e\u003e sentence = ['Show', 'all', 'modules', 'of', 'Bachelor', 'Informatics']\n    \u003e\u003e\u003e tagged_sentence = t.tag(sentence, format_list=True, pretty_print=False, silent=True)\n    \u003e\u003e\u003e print(list(tagged_sentence))\n    [('Show', 'R_LIST'), ('all', 'R_LIST'), ('modules', 'M_MTSModule'), ('of', 'X'), ('Bachelor', 'C_Program:degree'), ('Informatics', 'C_Program:name')]\n\n### Evaluation\n\nTo evaluate a pretrained model, call the `evaluate()` method. 
You have an additional parameter to print the main evaluation results in one single line.\n\n    t.evaluate('data/evaluation.txt', print_inline=False)\n\n## Resources\n\n### TensorFlow\n\n- \u003chttps://www.tensorflow.org/get_started/mnist/pros\u003e\n- \u003chttps://www.tensorflow.org/versions/master/api_docs/python/tf/nn\u003e\n\n### POS Tagger\n\n- \u003chttps://github.com/mrahtz/tensorflow-pos-tagger\u003e\n- \u003chttps://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb\u003e\n\n---\n\nIf you like and use this POS tagger and want to give some love back, feel free to...\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.buymeacoffee.com/devmount\" target=\"_blank\"\u003e\n  \u003cimg alt=\"Buy me a coffee\" src=\"https://user-images.githubusercontent.com/5441654/44213163-60a91100-a16d-11e8-9d5d-7d862cae7b7c.png\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevmount%2Fneural-network-pos-tagger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevmount%2Fneural-network-pos-tagger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevmount%2Fneural-network-pos-tagger/lists"}