{"id":13535069,"url":"https://github.com/benywon/ChineseBert","last_synced_at":"2025-04-02T00:32:14.527Z","repository":{"id":95488434,"uuid":"158323226","full_name":"benywon/ChineseBert","owner":"benywon","description":"This is a chinese Bert model specific for question answering","archived":false,"fork":false,"pushed_at":"2019-08-08T11:35:01.000Z","size":1257,"stargazers_count":27,"open_issues_count":3,"forks_count":8,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-08-02T08:09:53.635Z","etag":null,"topics":["chinese-nlp","deep-learning","natural-language-processing"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benywon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-11-20T03:04:55.000Z","updated_at":"2024-02-29T04:51:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"c4a4896f-06ec-43c2-ab98-c24b28dc1513","html_url":"https://github.com/benywon/ChineseBert","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benywon%2FChineseBert","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benywon%2FChineseBert/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benywon%2FChineseBert/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benywon%2FChineseBert/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/benywon","download_url":"https://codeload.github.com/benywo
n/ChineseBert/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222788514,"owners_count":17037777,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chinese-nlp","deep-learning","natural-language-processing"],"created_at":"2024-08-01T08:00:49.289Z","updated_at":"2024-11-02T23:30:22.502Z","avatar_url":"https://github.com/benywon.png","language":"Python","funding_links":[],"categories":["BERT QA \u0026 RC task:","Tasks"],"sub_categories":["Question Answering (QA)"],"readme":"# ChineseBert\nThis is a chinese Bert model specific for question answering. We provide two models, a large model which is a 16 layer 1024 transformer, and a small model with 8 layer and 512 hidden size. 
Our implementation differs from the original paper (https://arxiv.org/abs/1810.04805): we replace the position embedding with an LSTM, which shows advantages when the text length varies a lot.\n\nIt currently runs on Python 3 and PyTorch.\n\n-------------------------------------\n\n#Stats:\n\nData: 200M Chinese internet question-answer pairs.\n\nTokenizer: we use the [sentencepiece](https://github.com/google/sentencepiece) tokenizer with a vocabulary size of 35,000.\n\nWe train both the large and the small model for 2M steps, and neither showed signs of overfitting.\n\nThe large model takes 12 days for one epoch on 8 V100 GPUs connected with NVLink.\nThe small model takes 2 days for one epoch on 8 V100 GPUs connected with NVLink.\n\n------------------------------------------\n#Usage:\n\nFeed the model a Chinese question-answer pair to get the combined representation.\n\nRefer to main.py for more detail.\n\nThe model has been tested with sequence lengths below 1024.\n\n\n------------------------------------\n\nAs the torch model file is very large, download it from Google Drive via\n get_model.sh\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenywon%2FChineseBert","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenywon%2FChineseBert","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenywon%2FChineseBert/lists"}