{"id":13535124,"url":"https://github.com/NLPScott/bert-Chinese-classification-task","last_synced_at":"2025-04-02T00:32:30.579Z","repository":{"id":45661771,"uuid":"159150601","full_name":"NLPScott/bert-Chinese-classification-task","owner":"NLPScott","description":"BERT Chinese text classification in practice","archived":false,"fork":false,"pushed_at":"2018-12-11T07:10:49.000Z","size":465,"stargazers_count":735,"open_issues_count":19,"forks_count":224,"subscribers_count":30,"default_branch":"master","last_synced_at":"2024-11-02T23:32:19.654Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NLPScott.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-11-26T10:21:25.000Z","updated_at":"2024-09-14T08:59:57.000Z","dependencies_parsed_at":"2022-07-14T19:00:59.758Z","dependency_job_id":null,"html_url":"https://github.com/NLPScott/bert-Chinese-classification-task","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPScott%2Fbert-Chinese-classification-task","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPScott%2Fbert-Chinese-classification-task/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPScott%2Fbert-Chinese-classification-task/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPScott%2Fbert-Chinese-classification-task/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NLPScott","download_url":"https://codeload.github.com/NLPScott/bert-
Chinese-classification-task/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246735353,"owners_count":20825221,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T08:00:49.981Z","updated_at":"2025-04-02T00:32:30.004Z","avatar_url":"https://github.com/NLPScott.png","language":"Python","readme":"# bert-Chinese-classification-task\nBERT Chinese text classification in practice\n\nAdd NewsProcessor to run_classifier_word.py; it reads and preprocesses the news data. \\\nIn the main method, register the news task type so that its data and labels are handled: \\\n processors = { \\\n        \"cola\": ColaProcessor,\\\n        \"mnli\": MnliProcessor,\\\n        \"mrpc\": MrpcProcessor,\\\n        \"news\": NewsProcessor,\\\n    }\n    \ndownload_glue_data.py downloads the other GLUE benchmark datasets used in the BERT paper into glue_data.\n\nThe data directory contains a sample of the news data.\n\nexport GLUE_DIR=/search/odin/bert/extract_code/glue_data \\\nexport BERT_BASE_DIR=/search/odin/bert/chinese_L-12_H-768_A-12/ \\\nexport BERT_PYTORCH_DIR=/search/odin/bert/chinese_L-12_H-768_A-12/\n\npython run_classifier_word.py \\\n  --task_name NEWS \\\n  --do_train \\\n  --do_eval \\\n  --data_dir $GLUE_DIR/NewsAll/ \\\n  --vocab_file $BERT_BASE_DIR/vocab.txt \\\n  --bert_config_file $BERT_BASE_DIR/bert_config.json \\\n  --init_checkpoint $BERT_PYTORCH_DIR/pytorch_model.bin \\\n  --max_seq_length 256 \\\n  --train_batch_size 32 \\\n  --learning_rate 2e-5 \\\n  --num_train_epochs 3.0 \\\n  --output_dir ./newsAll_output/ \\\n  --local_rank 3\n  \n  
Chinese classification task in practice\n\nThe experiment classifies Chinese text into 34 topics (including current affairs and politics, entertainment, and sports). The preprocessing step in run_classifier.py needs a NewsProcessor module, similar to MrpcProcessor but with the encoding handling adjusted for Chinese. The data, about 800,000 samples, was split 4:1 into training and test sets. Training on a single GPU took 18 hours and reached an accuracy of 92.8%.\n\neval_accuracy = 0.9281581998809113\n\neval_loss = 0.2222444740207354\n\nglobal_step = 59826\n\nloss = 0.14488934577978746\n","funding_links":[],"categories":["BERT classification task:","Tasks"],"sub_categories":["Classification"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNLPScott%2Fbert-Chinese-classification-task","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNLPScott%2Fbert-Chinese-classification-task","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNLPScott%2Fbert-Chinese-classification-task/lists"}