https://github.com/guokr/caver

Caver: a toolkit for multilabel text classification.
attention-model cnn deep-learning multi-label-classification nlp pytorch text-classification


Host: GitHub
URL: https://github.com/guokr/caver
Owner: guokr
License: gpl-3.0
Created: 2018-07-11T08:07:39.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2019-06-11T02:46:57.000Z (almost 6 years ago)
Last Synced: 2024-11-19T06:01:44.721Z (6 months ago)
Topics: attention-model, cnn, deep-learning, multi-label-classification, nlp, pytorch, text-classification
Language: Python
Homepage: https://guokr.github.io/Caver/
Size: 11.3 MB
Stars: 39
Watchers: 10
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

        
Caver


Raising a torch in the cave to see the words on the wall: tag your short text in 3 lines. Caver is built on Facebook's PyTorch project to keep the implementation simple.

Demo • Requirements • Install • Pre-trained models • Train • Examples • Document

Quick Demo


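```python
from caver import CaverModel

model = CaverModel("./checkpoint_path")

# Inputs are space-separated characters; Latin-letter and digit runs stay whole.
sentence = [
    # "Is watching American TV series a reliable way to learn English?"
    "看 美 剧 学 英 语 靠 谱 吗",
    # "Kobe Bryant joins Yao Ming as global ambassador for the 2019 Basketball World Cup"
    "科 比 携 手 姚 明 出 任 2019 篮 球 世 界 杯 全 球 大 使",
    # "How to survive to the end in Game of Thrones"
    "如 何 在 《 权 力 的 游 戏 》 中 苟 到 最 后",
    # "Can RNG beat team TOP in the League of Legends LPL Summer Split?"
    "英 雄 联 盟 LPL 夏 季 赛 RNG 能 否 击 败 TOP 战 队",
]

model.predict([sentence[0]], top_k=3)
# ['美剧', '英语', '英语学习']
# (American TV series, English, English learning)

model.predict([sentence[1]], top_k=5)
# ['篮球', 'NBA', '体育', 'NBA 球员', '运动']
# (basketball, NBA, sports, NBA players, exercise)

model.predict([sentence[2]], top_k=7)
# ['权力的游戏（美剧）', '美剧', '影视评论', '电视剧', '电影', '文学', '小说']
# (Game of Thrones (TV series), American TV series, film and TV reviews, TV series, movies, literature, novels)

model.predict([sentence[3]], top_k=6)
# ['英雄联盟（LoL）', '电子竞技', '英雄联盟职业联赛（LPL）', '游戏', '网络游戏', '多人联机在线竞技游戏 (MOBA)']
# (League of Legends, esports, League of Legends Pro League, games, online games, MOBA)
```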
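
The demo inputs are already split into space-separated characters, with Latin-letter and digit runs such as `LPL` or `2019` kept whole. If your text is raw, a small helper along these lines could produce that shape. This is a minimal sketch: `to_char_tokens` is a hypothetical name, not part of Caver, and the tokenization rule is inferred from the demo strings only, so check it against how your checkpoint was actually trained.

```python
import re

# Either a run of ASCII letters/digits (kept as one token) or any single
# non-whitespace character (each CJK character or punctuation mark).
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+|\S")


def to_char_tokens(text: str) -> str:
    """Hypothetical helper (not a Caver API): space-separate a raw string."""
    return " ".join(_TOKEN_RE.findall(text))


raw = "英雄联盟LPL夏季赛RNG能否击败TOP战队"
print(to_char_tokens(raw))
```

The result can then be passed to `model.predict([...], top_k=...)` in the same way as the pre-split strings in the demo.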

Requirements


* PyTorch
* tqdm
* torchtext
* numpy
* Python 3

Install


```bash
$ pip install caver --user
```

Do you have any pre-trained models?


Yes, we have released two pre-trained models trained on the Zhihu NLPCC 2018 [open dataset](http://tcci.ccf.org.cn/conference/2018/taskdata.php).

To use a pre-trained model for text tagging, download it (along with the other files needed for inference) from the Caver releases page. Alternatively, run the following commands to download and unpack the files into your current directory:

```bash
$ wget -O - https://github.com/guokr/Caver/releases/download/0.1/checkpoints_char_cnn.tar.gz | tar zxvf -
$ wget -O - https://github.com/guokr/Caver/releases/download/0.1/checkpoints_char_lstm.tar.gz | tar zxvf -
```
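
If `wget` is not available, the same download-and-unpack step can be approximated with the Python standard library. The sketch below streams the archive straight into `tarfile`, much like the pipe above. The function names are illustrative, not part of Caver; the URLs are the two release archives listed above; and the layout of the extracted files is not assumed here.

```python
import tarfile
import urllib.request
from typing import BinaryIO

# The two release archives referenced in the wget commands.
CHECKPOINT_URLS = [
    "https://github.com/guokr/Caver/releases/download/0.1/checkpoints_char_cnn.tar.gz",
    "https://github.com/guokr/Caver/releases/download/0.1/checkpoints_char_lstm.tar.gz",
]


def extract_tar_gz(stream: BinaryIO, dest_dir: str = ".") -> None:
    """Unpack a gzip-compressed tar stream into dest_dir."""
    # "r|gz" reads the archive as a forward-only stream, so nothing is
    # written to disk except the extracted files.
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        archive.extractall(dest_dir)


def download_and_extract(url: str, dest_dir: str = ".") -> None:
    """Fetch url and unpack it into dest_dir (illustrative helper)."""
    with urllib.request.urlopen(url) as response:
        extract_tar_gz(response, dest_dir)


# Example call (performs a network download, so it is left commented out):
# download_and_extract(CHECKPOINT_URLS[0])
```

Only extract archives from sources you trust, since `extractall` writes whatever paths the archive contains.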

How to train on your own dataset


```bash
# Replace each {...} placeholder with your own value.
# --batch_size and --epoch show example values (16 and 10); adjust them as needed.
$ python3 train.py --input_data_dir {path to your original dataset} \
                   --output_data_dir {path to store the preprocessed dataset} \
                   --train_filename train.tsv \
                   --valid_filename valid.tsv \
                   --checkpoint_dir {path to save the checkpoints} \
                   --model {fastText/CNN/LSTM} \
                   --batch_size 16 \
                   --epoch 10
```
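
The command above expects separate `train.tsv` and `valid.tsv` files. If you only have a single TSV file, a split along the following lines may help. This is a sketch under stated assumptions: the column layout that `train.py` expects is not described here, so each line is treated as one opaque example; the file is assumed to have no header row; and `split_lines`, `write_split`, and the 10% validation share are illustrative choices, not Caver defaults.

```python
import random
from typing import List, Sequence, Tuple


def split_lines(
    lines: Sequence[str], valid_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle lines reproducibly and return (train_lines, valid_lines)."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one line for validation.
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


def write_split(
    source_path: str,
    train_path: str = "train.tsv",
    valid_path: str = "valid.tsv",
    valid_fraction: float = 0.1,
) -> None:
    """Read source_path and write the two files named in the train command."""
    with open(source_path, encoding="utf-8") as src:
        lines = [line for line in src if line.strip()]  # drop blank lines
    train, valid = split_lines(lines, valid_fraction)
    with open(train_path, "w", encoding="utf-8") as out:
        out.writelines(train)
    with open(valid_path, "w", encoding="utf-8") as out:
        out.writelines(valid)
```

Write the two files into the directory you pass as `--input_data_dir` so that the `--train_filename` and `--valid_filename` arguments resolve.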

More Examples


This section is still being updated. In the meantime, see the [examples](https://github.com/guokr/Caver/tree/master/examples) directory.