Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bentrevett/pytorch-pos-tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
- Host: GitHub
- URL: https://github.com/bentrevett/pytorch-pos-tagging
- Owner: bentrevett
- License: mit
- Created: 2019-09-18T10:26:15.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-06-04T11:37:11.000Z (over 3 years ago)
- Last Synced: 2023-10-20T18:12:19.794Z (about 1 year ago)
- Topics: cnn, lstm, natural-language-processing, part-of-speech-tagger, pos, pos-tagging, pytorch, pytorch-implementation, pytorch-nlp, pytorch-tutorial, pytorch-tutorials, recurrent-neural-networks, rnn, torchtext, tutorial, tutorials
- Language: Jupyter Notebook
- Homepage:
- Size: 176 KB
- Stars: 165
- Watchers: 3
- Forks: 27
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
# PyTorch PoS Tagging
## Note: This repo only works with torchtext 0.9 or above, which requires PyTorch 1.8 or above. If you are using torchtext 0.8, please use [this](https://github.com/bentrevett/pytorch-pos-tagging/tree/torchtext08) branch.
This repo contains tutorials covering how to perform part-of-speech (PoS) tagging using [PyTorch](https://github.com/pytorch/pytorch) 1.8, [torchtext](https://github.com/pytorch/text) 0.9, and [spaCy](https://spacy.io/) 3.0, using Python 3.8.
These tutorials will cover getting started with the most common approach to PoS tagging: recurrent neural networks (RNNs). The first notebook introduces a bi-directional LSTM (BiLSTM) network. The second covers how to fine-tune a pretrained Transformer model.
**If you find any mistakes or disagree with any of the explanations, please do not hesitate to [submit an issue](https://github.com/bentrevett/pytorch-pos-tagging/issues/new). I welcome any feedback, positive or negative!**
## Getting Started
To install PyTorch, see the installation instructions on the [PyTorch website](https://pytorch.org).
To install TorchText:
``` bash
pip install torchtext
```
To install the transformers library:
```bash
pip install transformers
```
We'll also make use of spaCy to tokenize our data. To install spaCy, follow the instructions [here](https://spacy.io/usage/), making sure to install the English models:
``` bash
python -m spacy download en_core_web_sm
```
## Tutorials
* 1 - [BiLSTM for PoS Tagging](https://github.com/bentrevett/pytorch-pos-tagging/blob/master/1_bilstm.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bentrevett/pytorch-pos-tagging/blob/master/1_bilstm.ipynb)
This tutorial covers the workflow of a PoS tagging project with PyTorch and TorchText. We'll introduce the basic TorchText concepts: defining how data is processed, using TorchText's datasets, and using pre-trained embeddings. Using PyTorch, we'll build a strong baseline model: a multi-layer bi-directional LSTM. We'll also show how the model can be used for inference to tag any input text.
* 2 - [Fine-tuning Pretrained Transformers for PoS Tagging](https://github.com/bentrevett/pytorch-pos-tagging/blob/master/2_transformer.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bentrevett/pytorch-pos-tagging/blob/master/2_transformer.ipynb)
This tutorial covers how to fine-tune a pretrained Transformer model, provided by the `transformers` library, by integrating it with TorchText. We use a pretrained BERT model to produce embeddings for the input text and feed these embeddings to a linear layer that predicts a tag for each token.
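
As a rough illustration of the model described in the first tutorial, here is a minimal sketch of a multi-layer bi-directional LSTM tagger. It is not the notebook's code: the class name `BiLSTMTagger`, the hyperparameter values, and the random token ids are all assumptions made for this example, and the notebook remains the reference for the actual implementation.

```python
import torch
import torch.nn as nn


class BiLSTMTagger(nn.Module):
    """Hypothetical multi-layer BiLSTM tagger (illustrative only)."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, n_tags,
                 n_layers=2, dropout=0.25, pad_idx=0):
        super().__init__()
        # Maps token ids to dense vectors; the padding index stays at zero.
        self.embedding = nn.Embedding(vocab_size, embedding_dim,
                                      padding_idx=pad_idx)
        # nn.LSTM only applies inter-layer dropout when there is more than one layer.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim,
                            num_layers=n_layers,
                            bidirectional=True,
                            dropout=dropout if n_layers > 1 else 0.0)
        # Forward and backward hidden states are concatenated, hence * 2.
        self.fc = nn.Linear(hidden_dim * 2, n_tags)
        self.dropout = nn.Dropout(dropout)

    def forward(self, tokens):
        # tokens: [seq_len, batch_size]
        embedded = self.dropout(self.embedding(tokens))
        outputs, _ = self.lstm(embedded)        # [seq_len, batch_size, hidden_dim * 2]
        return self.fc(self.dropout(outputs))   # [seq_len, batch_size, n_tags]


# Assumed toy sizes; random ids stand in for a numericalized batch.
model = BiLSTMTagger(vocab_size=100, embedding_dim=16, hidden_dim=32, n_tags=5)
model.eval()
tokens = torch.randint(1, 100, (7, 3))          # 7 tokens per sentence, 3 sentences
with torch.no_grad():
    logits = model(tokens)
predicted_tags = logits.argmax(dim=-1)          # one tag index per token
```

The model emits one score vector per token, so training can use a per-token cross-entropy loss that ignores padding positions.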
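
The "encoder plus linear layer" structure from the second tutorial can be sketched in the same way. To keep the example self-contained and free of downloads, a small randomly initialised `nn.TransformerEncoder` stands in for the pretrained BERT model. That substitution is an assumption of this sketch, as are the class names `StandInEncoder` and `EncoderTagger` and all sizes; in the tutorial's setting the encoder would come from the `transformers` library.

```python
import torch
import torch.nn as nn


class StandInEncoder(nn.Module):
    """Placeholder for a pretrained encoder such as BERT (NOT pretrained)."""

    def __init__(self, vocab_size, hidden_dim, n_heads=4, n_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=n_heads,
                                           dim_feedforward=64)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids):
        # token_ids: [seq_len, batch_size] -> [seq_len, batch_size, hidden_dim]
        return self.encoder(self.embedding(token_ids))


class EncoderTagger(nn.Module):
    """Hypothetical tagging head: contextual embeddings -> linear layer -> tag scores."""

    def __init__(self, encoder, hidden_dim, n_tags, dropout=0.25):
        super().__init__()
        self.encoder = encoder
        self.fc = nn.Linear(hidden_dim, n_tags)
        self.dropout = nn.Dropout(dropout)

    def forward(self, token_ids):
        hidden = self.dropout(self.encoder(token_ids))
        return self.fc(hidden)                  # [seq_len, batch_size, n_tags]


# Assumed toy sizes; random ids stand in for a tokenized batch.
encoder = StandInEncoder(vocab_size=100, hidden_dim=32)
tagger = EncoderTagger(encoder, hidden_dim=32, n_tags=5)
tagger.eval()
token_ids = torch.randint(0, 100, (7, 3))       # 7 tokens per sentence, 3 sentences
with torch.no_grad():
    tag_logits = tagger(token_ids)
```

Because the encoder's parameters are left trainable here, an optimizer built from `tagger.parameters()` would update both the encoder and the linear head, which is what fine-tuning refers to.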
## References
Here are some things I looked at while making these tutorials. Some of it may be out of date.
* https://github.com/pytorch/text/blob/master/torchtext/datasets/sequence_tagging.py
* https://github.com/pytorch/text/blob/master/test/sequence_tagging.py