https://github.com/brightmart/sentiment_analysis_fine_grain
Multi-label Classification with BERT; Fine Grained Sentiment Analysis from AI challenger
bert fine-grained-classification language-model multi-label-classification online pre-train sentiment-analysis text-classification textcnn
- Host: GitHub
- URL: https://github.com/brightmart/sentiment_analysis_fine_grain
- Owner: brightmart
- Created: 2018-11-03T15:08:01.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2018-11-15T13:13:29.000Z (about 6 years ago)
- Last Synced: 2025-01-05T07:11:09.366Z (8 days ago)
- Topics: bert, fine-grained-classification, language-model, multi-label-classification, online, pre-train, sentiment-analysis, text-classification, textcnn
- Language: Jupyter Notebook
- Homepage:
- Size: 3.39 MB
- Stars: 592
- Watchers: 29
- Forks: 161
- Open Issues: 8
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-bert - brightmart/sentiment_analysis_fine_grain - Multi-label Classification with BERT; Fine Grained Sentiment Analysis from AI challenger. (BERT Sentiment Analysis)
- awesome-transformer-nlp - brightmart/sentiment_analysis_fine_grain - Multi-label classification with BERT; Fine Grained Sentiment Analysis from AI challenger. (Tasks / Classification)
README
## Introduction
With this repository, you will be able to train a multi-label classification model with BERT and
deploy BERT for online prediction.
You can also find a short tutorial on how to use BERT with Chinese: BERT short chinese tutorial
You can find an introduction to fine-grained sentiment analysis from AI Challenger.
## Basic Ideas
Add something here.
## Experiment on New Models
For more, check model/bert_cnn_fine_grain_model.py
## Performance

Model | TextCNN(No-pretrain) | TextCNN(Pretrain-Finetuning) | Bert(base_model_zh) | Bert(base_model_zh,pre-train on corpus)
--- | --- | --- | --- | ---
F1 Score | 0.678 | 0.685 | ADD A NUMBER HERE | ADD A NUMBER HERE

Notice: F1 Score is reported on validation set
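
The table does not say which F1 variant is reported. As a hedged illustration only, the sketch below computes a macro-averaged F1 (the unweighted mean of per-class F1) over the sentiment classes of a single aspect. The function name `macro_f1` and the toy labels are assumptions for this example and are not taken from this repository.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: sentiment values for one aspect (1, 0, -2 as in the sample data).
score = macro_f1([1, 1, 0, -2], [1, 0, 0, -2])
print(round(score, 4))
```

If the reported score is averaged over several aspects, the same function could be applied per aspect and the results averaged; that aggregation is also an assumption.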
## Usage
### Bert for Multi-label Classification [data for fine-tuning and pre-train]
export BERT_BASE_DIR=BERT_BASE_DIR/chinese_L-12_H-768_A-12
export TEXT_DIR=TEXT_DIR
nohup python run_classifier_multi_labels_bert.py \
--task_name=sentiment_analysis \
--do_train=true \
--do_eval=true \
--data_dir=$TEXT_DIR \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=512 \
--train_batch_size=4 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--output_dir=./checkpoint_bert &
1. First, download the pre-trained model from Google and put it in a folder (e.g. BERT_BASE_DIR):
chinese_L-12_H-768_A-12 from bert
2. Second, prepare training data (e.g. train.tsv) and validation data (e.g. dev.tsv) and put them under a
folder (e.g. TEXT_DIR). You can also download data from here: data to train bert for AI challenger-Sentiment Analysis.
It contains processed data that you can use both for fine-tuning on sentiment analysis and for pre-training with BERT.
It was generated by following this notebook step by step:
preprocess_char.ipynb
You can generate the data yourself as long as its format is compatible with the
processor SentimentAnalysisFineGrainProcessor (aliased as sentiment_analysis).
Data format: label1,label2,label3\t here is sentence or sentences\t
Each line contains only two columns: the first is the target (one or more labels) and the second is the input string.
The input does not need to be tokenized.
sample:"0_1,1_-2,2_-2,3_-2,4_1,5_-2,6_-2,7_-2,8_1,9_1,10_-2,11_-2,12_-2,13_-2,14_-2,15_1,16_-2,17_-2,18_0,19_-2 浦东五莲路站,老饭店福瑞轩属于上海的本帮菜,交通方便,最近又重新装修,来拨草了,饭店活动满188元送50元钱,环境干净,简单。朋友提前一天来预订包房也没有订到,只有大堂,五点半到店基本上每个台子都客满了,都是附近居民,每道冷菜量都比以前小,味道还可以,热菜烤茄子,炒河虾仁,脆皮鸭,照牌鸡,小牛排,手撕腊味花菜等每道菜都很入味好吃,会员价划算,服务员人手太少,服务态度好,要能团购更好。可以用支付宝方便"
Check the sample data in the ./BERT_BASE_DIR folder.
For more detail, check create_model and SentimentAnalysisFineGrainProcessor in run_classifier.py
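
To make the data format concrete, here is a minimal sketch of how one line could be parsed and turned into a multi-hot target vector. It is not the repository's processor: the helper names (`parse_example`, `build_label_index`, `to_multi_hot`) are hypothetical, and reading a label such as `0_1` as "aspect index, underscore, sentiment value" is an assumption based on the sample above.

```python
def parse_example(line):
    """Split one line of 'label1,label2\\ttext\\t' into (labels, text)."""
    columns = line.rstrip("\n").split("\t")
    labels = [label for label in columns[0].split(",") if label]
    text = columns[1]
    return labels, text


def build_label_index(all_labels):
    """Map every distinct label string to a stable integer index."""
    return {label: i for i, label in enumerate(sorted(set(all_labels)))}


def to_multi_hot(labels, label_index):
    """Return a 0/1 vector with a 1 at the position of each active label."""
    vector = [0] * len(label_index)
    for label in labels:
        vector[label_index[label]] = 1
    return vector


line = "0_1,1_-2,2_0\t环境干净,服务态度好\t\n"
labels, text = parse_example(line)
label_index = build_label_index(["0_1", "1_-2", "2_0"])
print(labels, text, to_multi_hot(["0_1", "2_0"], label_index))
```

A multi-hot vector like this is the usual target for a sigmoid output layer in multi-label classification; whether `create_model` uses exactly this encoding should be checked in the source.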
### Pre-train Bert model based on open-sourced model, then do classification task
1. Generate raw data: [ADD SOMETHING HERE]
Make sure each line is a sentence. Between each document there is a blank line.
You can find the generated data in the zip file.
Use write_pre_train_doc() from preprocess_char.ipynb
2. Generate data for the pre-train stage using:
export BERT_BASE_DIR=./BERT_BASE_DIR/chinese_L-12_H-768_A-12
nohup python create_pretraining_data.py \
--input_file=./PRE_TRAIN_DIR/bert_*_pretrain.txt \
--output_file=./PRE_TRAIN_DIR/tf_examples.tfrecord \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--do_lower_case=True \
--max_seq_length=512 \
--max_predictions_per_seq=60 \
--masked_lm_prob=0.15 \
--random_seed=12345 \
--dupe_factor=5 > nohup_pre.out &
3. Pre-train the model with the generated data:
python run_pretraining.py
4. Fine-tuning:
python run_classifier.py
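
The raw pre-training file format from step 1 (one sentence per line, a blank line between documents) can be sketched as follows. This is not the notebook's `write_pre_train_doc()`; the helper names and the choice of sentence-ending punctuation are assumptions made for illustration.

```python
import re

# One sentence = a run of non-terminator characters plus an optional terminator.
# The set of terminators (Chinese and ASCII) is an illustrative assumption.
_SENTENCE = re.compile(r"[^。!?!?]+[。!?!?]?")


def split_sentences(document):
    """Split a document into sentences, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE.findall(document) if s.strip()]


def format_pretrain_corpus(documents):
    """One sentence per line; a blank line separates documents."""
    blocks = []
    for document in documents:
        sentences = split_sentences(document)
        if sentences:
            blocks.append("\n".join(sentences))
    return "\n\n".join(blocks) + ("\n" if blocks else "")


corpus = format_pretrain_corpus(["环境干净。服务态度好!", "交通方便。"])
print(corpus)
```

The returned string can be written to a file matching the `--input_file` pattern used in step 2.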
### TextCNN
1. Download the cache file for sentiment analysis (tokens are at word level).
2. Train the model:
python train_cnn_fine_grain.py
The cache file for the TextCNN model was generated by following the steps in preprocess_word.ipynb.
It contains everything you need to run TextCNN.
It includes the processed train/validation/test sets, the word vocabulary, and a dict that maps labels to indices.
Take train_valid_test_vocab_cache.pik and put it under the preprocess_word/ folder.
Raw data is also included in this zip file.
### Pre-train TextCNN
1. pre-train TextCNN with masked language model
python train_cnn_lm.py
2. fine-tuning for TextCNN
python train_cnn_fine_grain.py
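
As a rough illustration of the masked language model objective, the sketch below hides a fraction of tokens and records the originals as prediction targets. It is a simplified assumption, not the logic of train_cnn_lm.py: the function name `mask_tokens` is hypothetical, and the default of 0.15 simply mirrors the `masked_lm_prob` value used for BERT in this README.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Replace a random subset of tokens with MASK_TOKEN.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token the model should predict.
    """
    rng = random.Random(seed)
    num_to_mask = max(1, round(len(tokens) * mask_prob)) if tokens else 0
    positions = sorted(rng.sample(range(len(tokens)), num_to_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK_TOKEN
    return masked, targets


tokens = ["w%d" % i for i in range(20)]
masked, targets = mask_tokens(tokens, seed=0)
print(masked)
print(targets)
```

During pre-training the model sees `masked` as input and is trained to recover the tokens in `targets`; fine-tuning then reuses the pre-trained weights for classification.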
### Deploy BERT for online prediction
With the session-and-feed style, you can easily deploy BERT.
For online prediction with BERT, check more from here
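
In a session-and-feed setup, each request has to be converted into fixed-length arrays before it is fed to the model. The sketch below shows only that preparation step, under stated assumptions: the names `input_ids`, `input_mask` and `segment_ids` follow the reference BERT implementation, the toy vocabulary is hypothetical, and splitting text into single characters is a simplification of the real tokenizer.

```python
def build_features(text, vocab, max_seq_length):
    """Build padded BERT-style inputs for a single sentence."""
    # Reserve two positions for the [CLS] and [SEP] markers.
    tokens = ["[CLS]"] + list(text)[: max_seq_length - 2] + ["[SEP]"]
    unk_id = vocab["[UNK]"]
    input_ids = [vocab.get(token, unk_id) for token in tokens]
    input_mask = [1] * len(input_ids)          # 1 = real token, 0 = padding
    padding = [0] * (max_seq_length - len(input_ids))
    input_ids += padding                       # assumes [PAD] has id 0
    input_mask += padding
    segment_ids = [0] * max_seq_length         # single-sentence input
    return input_ids, input_mask, segment_ids


# Hypothetical toy vocabulary; a real one is loaded from vocab.txt.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "好": 4, "吃": 5}
ids, mask, segments = build_features("好吃啊", toy_vocab, max_seq_length=8)
print(ids, mask, segments)
```

The three lists would then be passed as the values for the model's input placeholders on each prediction call, with the graph and checkpoint loaded once at startup.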
## Reference
1. Bidirectional Encoder Representations from Transformers for Language Understanding
5. Convolutional Neural Networks for Sentence Classification