CNN for Chinese Text Classification in Tensorflow
- Host: GitHub
- URL: https://github.com/indiejoseph/cnn-text-classification-tf-chinese
- Owner: indiejoseph
- Archived: true
- Created: 2016-03-09T03:41:17.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2018-01-23T03:20:23.000Z (almost 7 years ago)
- Last Synced: 2024-05-23T04:46:51.470Z (8 months ago)
- Topics: chinese, cnn, convolutional-neural-networks, deep-learning, nlp, tensorflow, text-classification
- Language: Python
- Size: 3.49 MB
- Stars: 237
- Watchers: 27
- Forks: 109
- Open Issues: 9
Metadata Files:
- Readme: README.md
README
## CNN for Chinese Text Classification in Tensorflow
Sentiment classification forked from [dennybritz/cnn-text-classification-tf](https://github.com/dennybritz/cnn-text-classification-tf). The data helper now supports Chinese, and the embedding was changed from word level to character level, though that increased the vocabulary size. I've also implemented the network structure from [Character-Aware Neural Language Models](http://arxiv.org/pdf/1508.06615v4.pdf), a CNN followed by a highway network, to improve performance. This version can achieve an accuracy of 98% on the Chinese corpus.

**[This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post.](http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/)**
It is a slightly simplified implementation of Kim's [Convolutional Neural Networks for Sentence Classification](http://arxiv.org/abs/1408.5882) paper in Tensorflow.
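To illustrate what character-level input means for Chinese text, here is a minimal sketch in plain Python 3. It is not the repository's data helper: the names `build_char_vocab`, `encode`, `PAD` and `UNK` are hypothetical, and the two example sentences are made up. Each character becomes one token, so no word segmentation is needed before the embedding lookup.

```python
from collections import Counter

PAD = "<PAD>"  # hypothetical padding token, always id 0
UNK = "<UNK>"  # hypothetical token for characters not in the vocabulary


def build_char_vocab(texts, min_count=1):
    """Map every non-space character seen at least min_count times to an id."""
    counts = Counter(ch for text in texts for ch in text if not ch.isspace())
    vocab = {PAD: 0, UNK: 1}
    for ch, n in sorted(counts.items()):
        if n >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab, max_len):
    """Turn a string into a fixed-length list of character ids."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text if not ch.isspace()]
    ids = ids[:max_len]                               # truncate long inputs
    return ids + [vocab[PAD]] * (max_len - len(ids))  # pad short inputs


# Made-up example sentences, 7 characters each.
texts = ["这部电影很好看", "这部电影不好看"]
vocab = build_char_vocab(texts)
encoded = [encode(t, vocab, max_len=8) for t in texts]
```

The resulting id sequences are what an embedding layer of size `--embedding_dim` would consume. How the actual data helper handles padding and unknown characters may differ.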
## Requirements
- Python 2.7
- Tensorflow 0.9.0
- Numpy

## Running
Print parameters:
```bash
./train.py --help
```

```
optional arguments:
  -h, --help            show this help message and exit
  --embedding_dim EMBEDDING_DIM
                        Dimensionality of character embedding (default: 128)
  --filter_sizes FILTER_SIZES
                        Comma-separated filter sizes (default: '1,2,3,4,5,6,8')
  --num_filters NUM_FILTERS
                        Number of filters per filter size (default: '50,100,150,150,200,200,200')
  --l2_reg_lambda L2_REG_LAMBDA
                        L2 regularizaion lambda (default: 0.0)
  --dropout_keep_prob DROPOUT_KEEP_PROB
                        Dropout keep probability (default: 0.5)
  --batch_size BATCH_SIZE
                        Batch Size (default: 32)
  --num_epochs NUM_EPOCHS
                        Number of training epochs (default: 100)
  --evaluate_every EVALUATE_EVERY
                        Evaluate model on dev set after this many steps
                        (default: 100)
  --checkpoint_every CHECKPOINT_EVERY
                        Save model after this many steps (default: 100)
  --allow_soft_placement ALLOW_SOFT_PLACEMENT
                        Allow device soft device placement
  --noallow_soft_placement
  --log_device_placement LOG_DEVICE_PLACEMENT
                        Log placement of ops on devices
  --nolog_device_placement
```
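The defaults for `--filter_sizes` and `--num_filters` are both comma-separated strings of the same length. The sketch below shows one way to read them together; it assumes that the i-th filter count belongs to the i-th filter size, which the help text suggests but does not state outright. `parse_int_list` is a hypothetical helper, not a function from the repository.

```python
def parse_int_list(flag_value):
    """Parse a comma-separated flag such as '1,2,3' into a list of ints."""
    return [int(part) for part in flag_value.split(",")]


filter_sizes = parse_int_list("1,2,3,4,5,6,8")
num_filters = parse_int_list("50,100,150,150,200,200,200")

# Assumption: the i-th filter count belongs to the i-th filter size.
if len(filter_sizes) != len(num_filters):
    raise ValueError("filter_sizes and num_filters must have the same length")

filters_by_size = dict(zip(filter_sizes, num_filters))

# With max-over-time pooling (as in Kim's paper) each filter yields one
# feature, so the concatenated pooled vector has one entry per filter.
total_features = sum(num_filters)

print(filters_by_size)
print(total_features)
```

Under that pairing, wider filters get more feature maps, and the pooled feature vector has 1050 entries with the default flags.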
Train:
```bash
./train.py
```

## References
- [Convolutional Neural Networks for Sentence Classification](http://arxiv.org/abs/1408.5882)
- [A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification](http://arxiv.org/abs/1510.03820)