{"id":15035701,"url":"https://github.com/codertimo/bert-pytorch","last_synced_at":"2025-10-08T17:11:02.441Z","repository":{"id":37431684,"uuid":"153113207","full_name":"codertimo/BERT-pytorch","owner":"codertimo","description":"Google AI 2018 BERT pytorch implementation","archived":false,"fork":false,"pushed_at":"2023-09-15T12:57:08.000Z","size":101,"stargazers_count":6447,"open_issues_count":69,"forks_count":1324,"subscribers_count":124,"default_branch":"master","last_synced_at":"2025-09-14T08:19:09.864Z","etag":null,"topics":["bert","language-model","nlp","pytorch","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codertimo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-10-15T12:58:15.000Z","updated_at":"2025-09-12T16:37:58.000Z","dependencies_parsed_at":"2023-02-19T07:30:50.982Z","dependency_job_id":"aace1d36-9878-4a7f-ba81-71a82b36f62e","html_url":"https://github.com/codertimo/BERT-pytorch","commit_stats":{"total_commits":52,"total_committers":5,"mean_commits":10.4,"dds":0.07692307692307687,"last_synced_commit":"d10dc4f9d5a6f2ca74380f62039526eb7277c671"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/codertimo/BERT-pytorch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codertimo%2FBERT-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codertimo%2FBERT-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codertimo%2FBERT-pytorch/releases","manifests_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/codertimo%2FBERT-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codertimo","download_url":"https://codeload.github.com/codertimo/BERT-pytorch/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codertimo%2FBERT-pytorch/sbom","scorecard":{"id":297668,"data":{"date":"2025-08-11","repo":{"name":"github.com/codertimo/BERT-pytorch","commit":"d10dc4f9d5a6f2ca74380f62039526eb7277c671"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.1,"checks":[{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Code-Review","score":3,"reason":"Found 3/8 approved changesets -- score normalized to 3","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least 
privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":10,"reason":"license file 
detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Vulnerabilities","score":0,"reason":"16 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2018-34 / GHSA-2fc2-6r4j-p65h","Warn: Project is vulnerable to: PYSEC-2021-856 / GHSA-5545-2q6w-2gh6","Warn: Project is vulnerable to: PYSEC-2019-108 / GHSA-9fq2-x9r6-wfmf","Warn: Project is vulnerable to: PYSEC-2018-33 / GHSA-cw6w-4rcx-xphc","Warn: Project is vulnerable to: PYSEC-2021-857 / GHSA-f7c7-j99h-c22f","Warn: Project is vulnerable to: GHSA-fpfv-jqm9-f5jm","Warn: Project is vulnerable to: PYSEC-2017-1 / GHSA-frgw-fgh6-9g52","Warn: Project is vulnerable to: GHSA-3749-ghw9-m3mg","Warn: Project is 
vulnerable to: PYSEC-2022-43015 / GHSA-47fc-vmwq-366v","Warn: Project is vulnerable to: PYSEC-2025-41 / GHSA-53q9-r3pm-6pq6","Warn: Project is vulnerable to: PYSEC-2024-252 / GHSA-5pcm-hx3q-hm94","Warn: Project is vulnerable to: GHSA-887c-mr87-cxwp","Warn: Project is vulnerable to: PYSEC-2024-251 / GHSA-pg7h-5qx3-wjr3","Warn: Project is vulnerable to: PYSEC-2024-250","Warn: Project is vulnerable to: PYSEC-2024-259","Warn: Project is vulnerable to: PYSEC-2017-74"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 27 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-17T19:56:18.835Z","repository_id":37431684,"created_at":"2025-08-17T19:56:18.835Z","updated_at":"2025-08-17T19:56:18.835Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278981518,"owners_count":26079640,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners"}},"keywords":["bert","language-model","nlp","pytorch","transformer"],"created_at":"2024-09-24T20:29:13.191Z","updated_at":"2025-10-08T17:11:02.422Z","avatar_url":"https://github.com/codertimo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BERT-pytorch\n\n[![LICENSE](https://img.shields.io/github/license/codertimo/BERT-pytorch.svg)](https://github.com/codertimo/BERT-pytorch/blob/master/LICENSE)\n![GitHub issues](https://img.shields.io/github/issues/codertimo/BERT-pytorch.svg)\n[![GitHub stars](https://img.shields.io/github/stars/codertimo/BERT-pytorch.svg)](https://github.com/codertimo/BERT-pytorch/stargazers)\n[![CircleCI](https://circleci.com/gh/codertimo/BERT-pytorch.svg?style=shield)](https://circleci.com/gh/codertimo/BERT-pytorch)\n[![PyPI](https://img.shields.io/pypi/v/bert-pytorch.svg)](https://pypi.org/project/bert_pytorch/)\n[![PyPI - Status](https://img.shields.io/pypi/status/bert-pytorch.svg)](https://pypi.org/project/bert_pytorch/)\n[![Documentation Status](https://readthedocs.org/projects/bert-pytorch/badge/?version=latest)](https://bert-pytorch.readthedocs.io/en/latest/?badge=latest)\n\nPyTorch implementation of Google AI's 2018 BERT, with simple annotations\n\n\u003e BERT 2018 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding\n\u003e Paper URL : https://arxiv.org/abs/1810.04805\n\n\n## Introduction\n\nGoogle AI's BERT paper reports strong results across a range of NLP tasks (new state of the art on 11 tasks), \nincluding surpassing the human F1 score on the SQuAD v1.1 QA task. \nThe paper showed that a Transformer (self-attention) based encoder can serve as a powerful \nalternative to previous language models when trained with a suitable language model training method. 
\nMore importantly, they showed that this pre-trained language model can be transferred \nto any NLP task without building a task-specific model architecture.\n\nThis result will likely be recorded in NLP history, \nand I expect many further papers about BERT to be published very soon.\n\nThis repo is an implementation of BERT. The code is simple and quick to understand.\nSome of the code is based on [The Annotated Transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html)\n\nThis project is currently a work in progress, and the code is not verified yet.\n\n## Installation\n```\npip install bert-pytorch\n```\n\n## Quickstart\n\n**NOTICE : Your corpus should be prepared with two sentences per line, separated by a tab (\\t)**\n\n### 0. Prepare your corpus\n```\nWelcome to the \\t the jungle\\n\nI can stay \\t here all night\\n\n```\n\nor a tokenized corpus (tokenization is not included in the package)\n```\nWel_ _come _to _the \\t _the _jungle\\n\n_I _can _stay \\t _here _all _night\\n\n```\n\n\n### 1. Build a vocab based on your corpus\n```shell\nbert-vocab -c data/corpus.small -o data/vocab.small\n```\n\n### 2. Train your own BERT model\n```shell\nbert -c data/corpus.small -v data/vocab.small -o output/bert.model\n```\n\n## Language Model Pre-training\n\nIn the paper, the authors introduce two new language model training methods, \nwhich are \"masked language model\" and \"predict next sentence\".\n\n\n### Masked Language Model \n\n\u003e Original Paper : 3.3.1 Task #1: Masked LM \n\n```\nInput Sequence  : The man went to [MASK] store with [MASK] dog\nTarget Sequence :                  the                his\n```\n\n#### Rules:\n15% of the input tokens are randomly selected and changed according to the following sub-rules\n\n1. 80% of the selected tokens are replaced with a `[MASK]` token\n2. 10% of the selected tokens are replaced with a `[RANDOM]` token (another word)\n3. 10% of the selected tokens are left unchanged. 
These tokens still need to be predicted.\n\n### Predict Next Sentence\n\n\u003e Original Paper : 3.3.2 Task #2: Next Sentence Prediction\n\n```\nInput : [CLS] the man went to the store [SEP] he bought a gallon of milk [SEP]\nLabel : IsNext\n\nInput : [CLS] the man heading to the store [SEP] penguin [MASK] are flight ##less birds [SEP]\nLabel : NotNext\n```\n\n\"Can this sentence directly follow the previous one?\"\n\nThis task targets understanding the relationship between two text sentences, which is\nnot directly captured by language modeling\n\n#### Rules:\n\n1. 50% of the time, the next sentence is the actual continuation.\n2. 50% of the time, the next sentence is an unrelated sentence.\n\n\n## Author\nJunseong Kim, Scatter Lab (codertimo@gmail.com / junseong.kim@scatterlab.co.kr)\n\n## License\n\nThis project follows the Apache 2.0 License, as written in the LICENSE file\n\nCopyright 2018 Junseong Kim, Scatter Lab, respective BERT contributors\n\nCopyright (c) 2018 Alexander Rush : [The Annotated Transformer](https://github.com/harvardnlp/annotated-transformer)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodertimo%2Fbert-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodertimo%2Fbert-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodertimo%2Fbert-pytorch/lists"}