{"id":13535168,"url":"https://github.com/brightmart/sentiment_analysis_fine_grain","last_synced_at":"2025-04-05T11:08:48.159Z","repository":{"id":108167869,"uuid":"155994397","full_name":"brightmart/sentiment_analysis_fine_grain","owner":"brightmart","description":"Multi-label Classification with BERT; Fine Grained Sentiment Analysis from AI challenger","archived":false,"fork":false,"pushed_at":"2018-11-15T13:13:29.000Z","size":3556,"stargazers_count":594,"open_issues_count":8,"forks_count":162,"subscribers_count":28,"default_branch":"master","last_synced_at":"2025-03-29T10:08:07.502Z","etag":null,"topics":["bert","fine-grained-classification","language-model","multi-label-classification","online","pre-train","sentiment-analysis","text-classification","textcnn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brightmart.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-11-03T15:08:01.000Z","updated_at":"2025-03-17T02:14:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"122ea279-0eab-44cf-8a01-a5318999145d","html_url":"https://github.com/brightmart/sentiment_analysis_fine_grain","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brightmart%2Fsentiment_analysis_fine_grain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brightmart%2Fsentiment_analysis_fine_grain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brightmart%2Fsentiment_analysis_fine_grain/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brightmart%2Fsentiment_analysis_fine_grain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brightmart","download_url":"https://codeload.github.com/brightmart/sentiment_analysis_fine_grain/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247325693,"owners_count":20920714,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","fine-grained-classification","language-model","multi-label-classification","online","pre-train","sentiment-analysis","text-classification","textcnn"],"created_at":"2024-08-01T08:00:50.695Z","updated_at":"2025-04-05T11:08:48.126Z","avatar_url":"https://github.com/brightmart.png","language":"Jupyter Notebook","funding_links":[],"categories":["BERT Sentiment Analysis","Tasks"],"sub_categories":["Classification"],"readme":"## Introduction\n\nWith this repository, you can train a multi-label classification model with BERT and\n\ndeploy BERT for online prediction. 
\n\nYou can also find a short tutorial on using BERT with Chinese: \u003ca href='https://github.com/brightmart/sentiment_analysis_fine_grain/blob/master/README_bert_chinese_tutorial.md'\u003eBERT short Chinese tutorial\u003c/a\u003e\n\nFor background on the task, see the \u003ca href='https://challenger.ai/competition/fsauor2018'\u003eintroduction to fine-grained sentiment analysis from AI Challenger\u003c/a\u003e\n\n## Basic Ideas\n\nAdd something here.\n\n\n## Experiment on New Models\n      \n   \u003cimg src=\"https://github.com/brightmart/sentiment_analysis_fine_grain/blob/master/data/img/fine_grain.jpg\"  width=\"67%\" height=\"67%\" /\u003e\n\n   For more, check model/bert_cnn_fine_grain_model.py\n   \n## Performance \n\nModel | TextCNN (no pre-training) | TextCNN (pre-training + fine-tuning) | BERT (base_model_zh) | BERT (base_model_zh, pre-trained on corpus)\n--- | --- | --- | --- | ---\nF1 Score | 0.678 | 0.685 | ADD A NUMBER HERE | ADD A NUMBER HERE\n\nNote: the F1 score is reported on the validation set.\n\n\u003cimg src=\"https://github.com/brightmart/sentiment_analysis_fine_grain/blob/master/data/img/bert_sa.jpg\"  width=\"65%\" height=\"65%\" /\u003e\n\n## Usage\n   \n   ### BERT for Multi-label Classification [\u003ca href='https://pan.baidu.com/s/1ZS4dAdOIAe3DaHiwCDrLKw'\u003edata for fine-tuning and pre-training\u003c/a\u003e]\n   \n    export BERT_BASE_DIR=BERT_BASE_DIR/chinese_L-12_H-768_A-12\n    export TEXT_DIR=TEXT_DIR\n    nohup python run_classifier_multi_labels_bert.py \\\n      --task_name=sentiment_analysis \\\n      --do_train=true \\\n      --do_eval=true \\\n      --data_dir=$TEXT_DIR \\\n      --vocab_file=$BERT_BASE_DIR/vocab.txt \\\n      --bert_config_file=$BERT_BASE_DIR/bert_config.json \\\n      
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \\\n      --max_seq_length=512 \\\n      --train_batch_size=4 \\\n      --learning_rate=2e-5 \\\n      --num_train_epochs=3 \\\n      --output_dir=./checkpoint_bert \u0026\n    \n 1. First, download the pre-trained model from Google and put it in a folder (e.g. BERT_BASE_DIR):\n \n    chinese_L-12_H-768_A-12 from \u003ca href='https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip'\u003ebert\u003c/a\u003e\n \n 2. Second, prepare training data (e.g. train.tsv) and validation data (e.g. dev.tsv), and put them under a folder (e.g. TEXT_DIR).\n \n     You can also download the data here: \u003ca href='https://pan.baidu.com/s/1ZS4dAdOIAe3DaHiwCDrLKw'\u003edata to train bert for AI challenger-Sentiment Analysis\u003c/a\u003e.\n      \n     The download contains processed data for both fine-tuning on sentiment analysis and pre-training with BERT. \n      \n     It was generated by following this notebook step by step:\n      \n     preprocess_char.ipynb \n      \n     You can generate the data yourself, as long as its format is compatible with \n      \n     the processor SentimentAnalysisFineGrainProcessor (alias: sentiment_analysis).\n  \n \n     Data format:  label1,label2,label3\\t here is sentence or sentences\\t\n     \n     Each line contains only two columns: the first is the target (one or more labels), the second is the input string.\n      \n     The input does not need to be tokenized.\n     \n     Sample: \"0_1,1_-2,2_-2,3_-2,4_1,5_-2,6_-2,7_-2,8_1,9_1,10_-2,11_-2,12_-2,13_-2,14_-2,15_1,16_-2,17_-2,18_0,19_-2 浦东五莲路站，老饭店福瑞轩属于上海的本帮菜，交通方便，最近又重新装修，来拨草了，饭店活动满188元送50元钱，环境干净，简单。朋友提前一天来预订包房也没有订到，只有大堂，五点半到店基本上每个台子都客满了，都是附近居民，每道冷菜量都比以前小，味道还可以，热菜烤茄子，炒河虾仁，脆皮鸭，照牌鸡，小牛排，手撕腊味花菜等每道菜都很入味好吃，会员价划算，服务员人手太少，服务态度好，要能团购更好。可以用支付宝方便\"\n     \n     Check the sample data in the ./BERT_BASE_DIR folder.\n \n     For more detail, check create_model and SentimentAnalysisFineGrainProcessor in run_classifier.py.\n   \n   ### Pre-train BERT 
model based on the open-sourced model, then run the classification task\n   \n   1. Generate raw data: [ADD SOMETHING HERE]\n      \n      Make sure each line is a sentence, with a blank line between documents.\n      \n      You can find the generated data in the zip file.\n      \n           Use write_pre_train_doc() from preprocess_char.ipynb \n   \n   2. Generate data for the pre-training stage using:\n       \n          export BERT_BASE_DIR=./BERT_BASE_DIR/chinese_L-12_H-768_A-12\n          nohup python create_pretraining_data.py \\\n          --input_file=./PRE_TRAIN_DIR/bert_*_pretrain.txt \\\n          --output_file=./PRE_TRAIN_DIR/tf_examples.tfrecord \\\n          --vocab_file=$BERT_BASE_DIR/vocab.txt \\\n          --do_lower_case=True \\\n          --max_seq_length=512 \\\n          --max_predictions_per_seq=60 \\\n          --masked_lm_prob=0.15 \\\n          --random_seed=12345 \\\n          --dupe_factor=5 \u003e nohup_pre.out \u0026 \n      \n   3. Pre-train the model with the generated data: \n       \n       python run_pretraining.py  \n   \n   4. Fine-tune:\n       \n      python run_classifier.py \n   \n   ### TextCNN\n    \n   1. Download the \u003ca href='https://pan.baidu.com/s/19aMHbPgfpBxz9sS-sYsjOg'\u003ecache file for sentiment analysis (tokens are at word level)\u003c/a\u003e\n     \n   2. Train the model:\n        \n      python train_cnn_fine_grain.py\n      \n        \n     The cache file for the TextCNN model was generated by following the steps in preprocess_word.ipynb. \n     \n     It contains everything you need to run TextCNN: \n     \n     the processed train/validation/test sets, the word vocabulary, and a dict mapping labels to indices. \n     \n     Take train_valid_test_vocab_cache.pik and put it under the preprocess_word/ folder.\n     \n     The raw data is also included in this zip file.\n \n      \n   ### Pre-train TextCNN\n   \n   1. Pre-train TextCNN with a masked language model\n      \n      python train_cnn_lm.py \n   \n   2. 
Fine-tune TextCNN\n      \n      python train_cnn_fine_grain.py\n      \n  ### Deploy BERT for online prediction\n    \n    With a session and feed-style inputs, you can easily deploy BERT.\n    \n  \u003ca href='https://github.com/brightmart/bert_language_understanding/blob/master/run_classifier_predict_online.py'\u003eOnline prediction with BERT; see this script for details\u003c/a\u003e\n\n    \n## Reference\n\n1. \u003ca href='https://arxiv.org/pdf/1810.04805.pdf'\u003eBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding\u003c/a\u003e\n\n2. \u003ca href='https://github.com/google-research/bert'\u003egoogle-research/bert\u003c/a\u003e\n\n3. \u003ca href='https://github.com/pengshuang/AI-Comp'\u003epengshuang/AI-Comp\u003c/a\u003e\n\n4. \u003ca href='https://github.com/AIChallenger/AI_Challenger_2018'\u003eAI Challenger 2018\u003c/a\u003e\n\n5. \u003ca href='https://arxiv.org/abs/1408.5882'\u003eConvolutional Neural Networks for Sentence Classification\u003c/a\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrightmart%2Fsentiment_analysis_fine_grain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrightmart%2Fsentiment_analysis_fine_grain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrightmart%2Fsentiment_analysis_fine_grain/lists"}