Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Dilated CNNs for NER in TensorFlow
- Host: GitHub
- URL: https://github.com/iesl/dilated-cnn-ner
- Owner: iesl
- Created: 2017-04-07T07:44:11.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2019-03-09T01:04:42.000Z (almost 6 years ago)
- Last Synced: 2025-01-03T03:09:11.599Z (11 days ago)
- Topics: cnns, machine-learning, named-entity-recognition, natural-language-processing, neural-networks, tensorflow
- Language: Python
- Size: 160 KB
- Stars: 243
- Watchers: 23
- Forks: 60
- Open Issues: 7
Metadata Files:
- Readme: README.md
README
# dilated-cnn-ner
This code implements the models described in the paper
"[Fast and Accurate Entity Recognition with Iterated Dilated Convolutions](https://arxiv.org/abs/1702.02098)"
by [Emma Strubell](https://cs.umass.edu/~strubell), [Patrick Verga](https://cs.umass.edu/~pat),
[David Belanger](https://cs.umass.edu/~belanger) and [Andrew McCallum](https://cs.umass.edu/~mccallum).

Requirements
-----
This code uses TensorFlow v[1.0, 1.4) and Python 2.7. It will probably train on a CPU, but honestly we haven't tried, and we highly recommend training on a GPU.
Setup
-----
1. Set up environment variables. For example, from the root directory of this project:

```
export DILATED_CNN_NER_ROOT=`pwd`
export DATA_DIR=/path/to/conll-2003
```

2. Get some pretrained word embeddings, e.g. [SENNA embeddings](http://ronan.collobert.com/senna/download.html) or
[Glove embeddings](https://nlp.stanford.edu/projects/glove/). The code expects a space-separated file
with one word and its embedding per line, e.g.:
```
word 0.45 0.67 0.99 ...
```
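Such a file can be loaded into a word-to-vector lookup with a few lines of Python. This is a minimal sketch for illustration; the file name and helper function are hypothetical, not part of this repo:

```python
# Illustrative only: parse a space-separated embeddings file into a dict
# mapping each word to its vector. Not the repo's actual loading code.
import numpy as np

def load_embeddings(path):
    vectors = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# e.g. embeddings = load_embeddings("data/embeddings/glove.6B.100d.txt")
```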
Make a directory for the embeddings:
```
mkdir -p data/embeddings
```
and place the file there.

3. Perform all data preprocessing for a given configuration. For example:
```
./bin/preprocess.sh conf/conll/dilated-cnn.conf
```

This calls `preprocess.py`, which loads the data from text files, maps the tokens, labels and any other features to integers, and writes them to TensorFlow tfrecords.
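Conceptually, each preprocessed sentence becomes a serialized `tf.train.Example`. The sketch below shows the general idea only; the feature names and layout are assumptions, not the exact schema `preprocess.py` uses:

```python
# Illustrative only: serialize token/label id sequences to a TFRecord file.
# Feature names ("tokens", "labels") are assumed for this sketch.
import tensorflow as tf

def _int64_list(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def write_examples(sentences, path):
    """sentences: iterable of (token_ids, label_ids) pairs of int lists."""
    with tf.python_io.TFRecordWriter(path) as writer:
        for token_ids, label_ids in sentences:
            example = tf.train.Example(features=tf.train.Features(feature={
                "tokens": _int64_list(token_ids),
                "labels": _int64_list(label_ids),
            }))
            writer.write(example.SerializeToString())

# e.g. write_examples([([4, 17, 2], [0, 1, 0])], "data/train.tfrecord")
```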
Training
----
Once the data preprocessing is completed, you can train a tagger:

```
./bin/train-cnn.sh conf/conll/dilated-cnn.conf
```

Evaluation
----
By default, the trainer will write the model which achieved the best dev F1. To evaluate a saved model on the dev set:

```
./bin/eval-cnn.sh conf/conll/dilated-cnn.conf --load_model path/to/model
```
To evaluate a saved model on the test set:

```
./bin/eval-cnn.sh conf/conll/dilated-cnn.conf test --load_model path/to/model
```

Configs
----
Configuration files (`conf/*`) specify all the data, parameters, etc. for an experiment.
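Hyperparameters such as the number of filters, filter width, and dilation rates are the kind of settings an experiment like this is driven by. Purely for orientation, here is a minimal TensorFlow 1.x sketch of a stack of dilated 1-D convolutions; it illustrates the general technique from the paper, not this repo's actual implementation or configuration keys:

```python
# Illustrative only: a stack of dilated 1-D convolutions over token
# representations. Layer choices, sizes, and names are assumptions.
import tensorflow as tf  # assumes TensorFlow 1.x, per Requirements

def dilated_block(inputs, num_filters=300, kernel_size=3, dilation_rates=(1, 2, 4)):
    """inputs: a [batch, tokens, dim] float tensor."""
    h = inputs
    for i, rate in enumerate(dilation_rates):
        # Each successive layer sees an exponentially wider token context.
        h = tf.layers.conv1d(h, filters=num_filters, kernel_size=kernel_size,
                             dilation_rate=rate, padding="same",
                             activation=tf.nn.relu, name="dilated_conv_%d" % i)
    return h
```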