https://github.com/hyperparticle/neural-lemmatizer-allennlp
A simple NN model capable of training on and predicting lemmas for each word in a sentence, based on PyTorch and AllenNLP
- Host: GitHub
- URL: https://github.com/hyperparticle/neural-lemmatizer-allennlp
- Owner: Hyperparticle
- License: mit
- Created: 2019-03-06T13:38:11.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-02-06T19:53:07.000Z (over 4 years ago)
- Last Synced: 2025-01-14T11:55:11.380Z (6 months ago)
- Topics: allennlp, deep-learning, lemmatizer, machine-learning, neural-network, nlp, pytorch
- Language: Python
- Homepage:
- Size: 133 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# A Dead Simple Neural Lemmatizer using AllenNLP
## Getting Started
Install AllenNLP with the command
```bash
pip install allennlp
```

To train, run `train.sh`, or
```bash
allennlp train config/small.json \
--serialization-dir logs/main \
--include-package library
```

All logs are written to the `logs/main` directory. To train again, delete `logs/main` first; otherwise
training will fail with an error.

To predict with a trained model, run `predict.sh` or
```bash
allennlp predict logs/main/model.tar.gz data/Bengali_Dataset.txt \
--output-file data/predict.txt \
--predictor simple \
--include-package library \
--use-dataset-reader
```

To evaluate a trained model, run `evaluate.sh` or
```bash
allennlp evaluate logs/main/model.tar.gz data/Bengali_Dataset.txt \
--include-package library
```

The `predict` command above writes its predictions to `data/predict.txt`.
## Visualize Model Performance in TensorBoard
To view `tensorboard` logs, just run
```bash
tensorboard --logdir logs
```

Make sure `tensorflow` is installed to be able to run the `tensorboard` command.
## Providing Your Own Data or Model
Prepare your data in the format
```
word lemma
```

with each sentence separated by an empty line. Then split your data into train, validation, and test sets.
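
The parsing and splitting step could be sketched as follows. This is a minimal illustration, not code from this repository: it assumes the word and lemma on each line are separated by whitespace, and the function names, the 80/10/10 split ratios, and the seed are all hypothetical choices.

```python
import random
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # one (word, lemma) pair per token


def parse_sentences(text: str) -> List[Sentence]:
    """Parse `word lemma` lines; an empty line ends the current sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumption: word and lemma are separated by whitespace.
        # A line with a single token raises ValueError here.
        word, lemma = line.split(maxsplit=1)
        current.append((word, lemma))
    if current:
        sentences.append(current)
    return sentences


def split_sentences(
    sentences: List[Sentence],
    train: float = 0.8,
    validation: float = 0.1,
    seed: int = 13,
) -> Tuple[List[Sentence], List[Sentence], List[Sentence]]:
    """Shuffle whole sentences and cut them into train/validation/test."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def format_sentences(sentences: List[Sentence]) -> str:
    """Serialize sentences back to the `word lemma` format."""
    blocks = ["\n".join(f"{w} {l}" for w, l in s) for s in sentences]
    return "\n\n".join(blocks) + "\n"
```

Splitting at the sentence level, rather than the line level, keeps every sentence intact in exactly one of the three files. Each resulting list can be written out with `format_sentences` and referenced from the config.
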
Copy a config file in the `config` directory and change `train_data_path`, `validation_data_path`,
and `test_data_path` to the paths of your train, validation, and test files, respectively.

Then run training with your new config file
```bash
allennlp train path_to_config.json \
--serialization-dir logs/main \
--include-package library
```

Finally, output predictions with your new model using
```bash
allennlp predict logs/main/model.tar.gz path_to_input_file.txt \
--output-file path_to_output_file.txt \
--predictor simple \
--include-package library \
--use-dataset-reader
```
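
The config change described in this section (copying a config and pointing `train_data_path`, `validation_data_path`, and `test_data_path` at your own files) can also be scripted. The sketch below is a hedged example, not part of this repository: it assumes the config file is plain JSON without comments or Jsonnet syntax, and the helper names `with_data_paths` and `copy_config` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def with_data_paths(
    config: Dict[str, Any], train: str, validation: str, test: str
) -> Dict[str, Any]:
    """Return a copy of the config with the three data paths replaced."""
    updated = dict(config)  # shallow copy; the input dict is left unchanged
    updated["train_data_path"] = train
    updated["validation_data_path"] = validation
    updated["test_data_path"] = test
    return updated


def copy_config(
    base_path: str, new_path: str, train: str, validation: str, test: str
) -> None:
    """Read a JSON config, swap in the data paths, and write a new file."""
    # Assumption: the base config parses as plain JSON.
    config = json.loads(Path(base_path).read_text(encoding="utf-8"))
    updated = with_data_paths(config, train, validation, test)
    Path(new_path).write_text(
        json.dumps(updated, indent=2) + "\n", encoding="utf-8"
    )
```

For example, `copy_config("config/small.json", "config/custom.json", "data/train.txt", "data/valid.txt", "data/test.txt")` would write a new config to pass to `allennlp train`. Every path in that call except `config/small.json` is a placeholder.
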