https://github.com/napsternxg/deepsequenceclassification
Deep neural network based model for sequence to sequence classification
https://github.com/napsternxg/deepsequenceclassification
deep-neural-networks named-entity-recognition sequence-classification
Last synced: 9 months ago
JSON representation
Deep neural network based model for sequence to sequence classification
- Host: GitHub
- URL: https://github.com/napsternxg/deepsequenceclassification
- Owner: napsternxg
- License: gpl-2.0
- Created: 2015-12-25T06:03:22.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2017-11-22T04:19:42.000Z (about 8 years ago)
- Last Synced: 2025-03-01T11:11:14.286Z (10 months ago)
- Topics: deep-neural-networks, named-entity-recognition, sequence-classification
- Language: Python
- Size: 34.2 KB
- Stars: 77
- Watchers: 9
- Forks: 20
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[](https://twitter.com/TheShubhanshu)
[](https://github.com/napsternxg/DeepSequenceClassification/edit/master/LICENSE)
# Deep Sequence Classification
Generic library for training models for deep neural networks for text sequence classification tasks.
## Usage:
* Create a json file `config.json` (default name) using the template in `config.json.sample` and specify the parameters for your training.
* Best strategy is to save the training and test files in vector format in advance and then give their paths in `data_vectors` parameter in the file.
* Train a model:
```
python model.py --config config_multitask.json --verbose 1
```
* Resume training from saved weights:
```
python model.py --config config_multitask.json --verbose 1 --weights output/models/model_multi_brnn_multitask_h2-45.h5 --base_epochs 45
```
## Preprocessing:
* Currently, we support the preprocessing for the following file formats:
```
DOCUMENT 1
For , Shubhanshu A. B. Mishra has made several programming projects after being inspired by Linus Torvalds, a very renowned programmer.
```
* Each file can contain multiple `DOCNO`.
* The dir structure consists of many folders of data split for cross validation. It is as follows:
```
data/
data/CV_files
data/CV_files/1/file1.xml
data/CV_files/1/file2.xml
data/CV_files/1/file3.xml
...
data/CV_files/5/file1.xml
data/CV_files/5/file2.xml
```
## Supports:
* Boundary and Category Detection
* Simple RNN and Bidirectional RNN
* Multi task sequence learning (Boundary + Category trained using same model)
* CNN + BRNN
## Coming Up:
## Use Cases:
* Named Entity Recognition
* POS Tagging
* Dependency Parsing
## Author:
* Shubhanshu Mishra
## Dependencies:
* Theano
* Keras
* BeautifulSoup (with lxml)
* numpy
* lxml (requires libxml2, libxslt and libxml2-dev)
Install theano and keras using the following commands:
```
pip install --user --upgrade --no-deps git+git://github.com/Theano/Theano.git
pip install --user --upgrade --no-deps git+git://github.com/fchollet/keras.git
```