{"id":47590174,"url":"https://github.com/david-wb/bert-text-summarizer","last_synced_at":"2026-04-01T17:16:50.094Z","repository":{"id":57414708,"uuid":"257102901","full_name":"david-wb/bert-text-summarizer","owner":"david-wb","description":"A BERT-based text summarization tool","archived":false,"fork":false,"pushed_at":"2021-03-01T03:41:48.000Z","size":29,"stargazers_count":5,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-01-03T07:30:14.430Z","etag":null,"topics":["bert","deep-learning","extractive-summarization","machine-learning","nlp","tensorflow","text-summarization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/david-wb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-19T21:10:28.000Z","updated_at":"2025-03-10T04:48:42.000Z","dependencies_parsed_at":"2022-09-16T08:52:48.940Z","dependency_job_id":null,"html_url":"https://github.com/david-wb/bert-text-summarizer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/david-wb/bert-text-summarizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-wb%2Fbert-text-summarizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-wb%2Fbert-text-summarizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-wb%2Fbert-text-summarizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-wb%2Fbert-text-summarizer/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/david-wb","download_url":"https://codeload.github.com/david-wb/bert-text-summarizer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-wb%2Fbert-text-summarizer/sbom","scorecard":{"id":325928,"data":{"date":"2025-08-11","repo":{"name":"github.com/david-wb/bert-text-summarizer","commit":"de57ac002d663c670f773bae7fbc385c18ba88b2"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.6,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows 
found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Code-Review","score":0,"reason":"Found 0/16 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":0,"reason":"license file not detected","details":["Warn: project does not have a license file"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-18T02:29:10.606Z","repository_id":57414708,"created_at":"2025-08-18T02:29:10.606Z","updated_at":"2025-08-18T02:29:10.606Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31290537,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T13:12:26.723Z","status":"ssl_error","status_checked_at":"2026-04-01T13:12:25.102Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","deep-learning","extractive-summarization","machine-learning","nlp","tensorflow","text-summarization"],"created_at":"2026-04-01T17:16:49.049Z","updated_at":"2026-04-01T17:16:50.082Z","avatar_url":"https://github.com/david-wb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A BERT-based Text Summarizer\n\nCurrently, only **extractive** summarization is supported.\n\nUsing a word limit of 200, this simple model achieves approximately the following ROUGE F1 scores on the CNN/DM validation set.\n\n```buildoutcfg\nROUGE-1: 37.78\nROUGE-2: 15.78\n```\n\n## How does it work?\n\nDuring preprocessing, the input text is divided into chunks up to 512 tokens long. 
Each sentence is\n tokenized using the official BERT tokenizer and a special `[CLS]` token is placed \n at the beginning of each sentence. The ROUGE-1 and ROUGE-2 scores of each sentence with \n respect to the example summary are calculated. The model outputs a single value corresponding to each `[CLS]` token and is\n trained to directly predict the mean of the ROUGE-1 and ROUGE-2 scores. \n \n During post-processing, the sentences are ranked according to their\n predicted ROUGE score. Finally, the top sentences are selected until the \n word limit is reached and re-sorted according to their original positions within the text.\n \n## Install\n```buildoutcfg\npip install -U bert-text-summarizer\n```\n\n## Usage\n\n### Get training data\n\n```buildoutcfg\nbert-text-summarizer get-cnndm-train --max-examples=10000\n```\n\nThis outputs a TFRecord file named `cnndm_train.tfrec` by default.\n\nIf `--max-examples` is omitted, the entire CNN/DM training set is processed, which may take more than an hour to complete.\n\n### Train the model\n\n```buildoutcfg\nbert-text-summarizer train-ext-summarizer \\\n  --saved-model-dir=bert_ext_summ_model \\\n  --train-data-path=cnndm_train.tfrec \\\n  --epochs=10\n```\n\n### Get summary\n\n```buildoutcfg\nbert-text-summarizer get-summary \\\n  --saved-model-dir=bert_ext_summ_model \\\n  --article-file=article.txt \\\n  --max-words=150\n```\n\nYou can create a summary programmatically like this:\n```python\nimport tensorflow_hub as hub\nfrom official.nlp.bert import tokenization\n\nfrom bert_text_summarizer.extractive.model import ExtractiveSummarizer\n\n# Create the tokenizer (if you already have the vocab.txt file, you can skip this TF Hub step)\nbert_layer = hub.KerasLayer(\"https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1\", trainable=False)\nvocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()\ndo_lower_case = bert_layer.resolved_object.do_lower_case.numpy()\ntokenizer = tokenization.FullTokenizer(vocab_file, do_lower_case)\n\n# Create the 
summarizer\npredictor = ExtractiveSummarizer(tokenizer=tokenizer, saved_model_dir='bert_ext_summ_model')\n\n# Get the article summary (the context manager closes the file after reading)\nwith open('article.txt', 'r') as f:\n    article = f.read().strip()\nsummary = predictor.get_summary(text=article, max_words=200)\nprint(summary)\n```\n\n### Evaluate on the CNN/DM validation set\n\n```\nbert-text-summarizer eval-ext-summarizer \\\n  --saved-model-dir=bert_ext_summ_model\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavid-wb%2Fbert-text-summarizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavid-wb%2Fbert-text-summarizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavid-wb%2Fbert-text-summarizer/lists"}