{"id":19589700,"url":"https://github.com/jackaduma/secbert","last_synced_at":"2025-04-27T12:32:49.140Z","repository":{"id":37358058,"uuid":"307754128","full_name":"jackaduma/SecBERT","owner":"jackaduma","description":"pretrained BERT model for cyber security text, learned CyberSecurity Knowledge","archived":false,"fork":false,"pushed_at":"2023-04-28T22:14:19.000Z","size":502,"stargazers_count":81,"open_issues_count":4,"forks_count":17,"subscribers_count":9,"default_branch":"main","last_synced_at":"2023-08-02T20:13:14.926Z","etag":null,"topics":["apt","attention","bert","bert-embeddings","cyber-security","cyber-threat-intelligence","cybersecurity","deep-learning-security","deeplearning","machine-learning-security","nlp","nlp-machine-learning","security","security-automation","threat-analysis","threat-detection","threat-hunting","threat-intelligence","transformer-encoder","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jackaduma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-27T16:00:49.000Z","updated_at":"2023-07-29T19:38:53.000Z","dependencies_parsed_at":"2022-09-03T16:40:44.543Z","dependency_job_id":null,"html_url":"https://github.com/jackaduma/SecBERT","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FSecBERT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FSecBERT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FSecBERT/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FSecBERT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jackaduma","download_url":"https://codeload.github.com/jackaduma/SecBERT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224070397,"owners_count":17250652,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apt","attention","bert","bert-embeddings","cyber-security","cyber-threat-intelligence","cybersecurity","deep-learning-security","deeplearning","machine-learning-security","nlp","nlp-machine-learning","security","security-automation","threat-analysis","threat-detection","threat-hunting","threat-intelligence","transformer-encoder","transformers"],"created_at":"2024-11-11T08:20:19.039Z","updated_at":"2024-11-11T08:20:19.864Z","avatar_url":"https://github.com/jackaduma.png","language":"Python","readme":"\u003c!--\n * @Author: Kun\n * @Date: 2020-11-24 22:58:24\n * @LastEditTime: 2023-04-29 06:12:28\n * @LastEditors: Kun\n * @Description: \n * @FilePath: /my_open_projects/SecBERT/README.md\n--\u003e\n# \u003cp align=center\u003e**`SecBERT`**\u003c/p\u003e\n\n[![standard-readme compliant](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/jackaduma/SecBERT)\n[![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://paypal.me/jackaduma?locale.x=zh_XC)\n\n[**中文说明**](./README.zh-CN.md) | [**English**](./README.md)\n\n`SecBERT` is a `BERT` model trained on cyber security 
text; it encodes cyber security domain knowledge.\n\n* `SecBERT` is trained on text from the following corpora:\n  \n  * [APTnotes](https://github.com/kbandla/APTnotes)\n  \n  * [Stucco-Data: Cyber security data sources](https://stucco.github.io/data/)\n  \n  * [CASIE: Extracting Cybersecurity Event Information from Text](https://ebiquity.umbc.edu/_file_directory_/papers/943.pdf)\n  \n  * [SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using Natural Language Processing (SecureNLP)](https://competitions.codalab.org/competitions/17262)\n\n* `SecBERT` has its own vocabulary (`secvocab`) built to best match the training corpus. We trained both [SecBERT](https://huggingface.co/jackaduma/SecBERT) and [SecRoBERTa](https://huggingface.co/jackaduma/SecRoBERTa) versions.\n\n\n## **Downloading Trained Models**\n\nSecBERT models are now installable directly via Hugging Face's `transformers` library:\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"jackaduma/SecBERT\")\nmodel = AutoModelForMaskedLM.from_pretrained(\"jackaduma/SecBERT\")\n\ntokenizer = AutoTokenizer.from_pretrained(\"jackaduma/SecRoBERTa\")\nmodel = AutoModelForMaskedLM.from_pretrained(\"jackaduma/SecRoBERTa\")\n```\n\n------\n\n## **Pretrained-Weights**\n\nWe release the PyTorch version of the trained models. 
The PyTorch version is created with the [Hugging Face](https://github.com/huggingface/pytorch-pretrained-BERT) library, and this repo shows how to use it.\n\n[Huggingface Modelhub](https://huggingface.co/models)\n\n  * [SecBERT](https://huggingface.co/jackaduma/SecBERT)\n\n  * [SecRoBERTa](https://huggingface.co/jackaduma/SecRoBERTa)\n\n\n### **Using SecBERT in your own model**\n\nSecBERT models include all the files needed to plug them into your own model, and they are in the same format as BERT.\n\nIf you use PyTorch, refer to [Hugging Face's repo](https://github.com/huggingface/pytorch-pretrained-BERT), which provides detailed instructions on using BERT models.\n\n\n## **Fill Mask**\n\nWe propose building a language model that works on cyber security text; as a result, it can improve downstream tasks (NER, text classification, semantic understanding, Q\u0026A) in the cyber security domain.\n\nFirst, the figure below compares the fill-mask pipeline of [Google BERT](), [AllenAI SciBERT](https://github.com/allenai/scibert), and our [SecBERT](https://github.com/jackaduma/SecBERT).\n\n```bash\ncd lm\npython eval_fillmask_lm.py\n```\n\n\u003cimg src=\"./fill-mask-result.png\" width=\"150%\" height=\"150%\"\u003e\n\n\n## **Downstream-tasks**\n\n### TODO\n\n\n------\n## **Star-History**\n\n![star-history](https://api.star-history.com/svg?repos=jackaduma/SecBERT\u0026type=Date \"star-history\")\n\n------\n\n## Donation\nIf this project helps you reduce development time, you can buy me a cup of coffee :)\n\nAliPay (支付宝)\n\u003cdiv align=\"center\"\u003e\n\t\u003cimg src=\"./misc/ali_pay.png\" alt=\"ali_pay\" width=\"400\" /\u003e\n\u003c/div\u003e\n\nWechatPay (微信)\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"./misc/wechat_pay.png\" alt=\"wechat_pay\" width=\"400\" /\u003e\n\u003c/div\u003e\n\n------\n\n## **License**\n\n[MIT](LICENSE) © 
Kun","funding_links":["https://paypal.me/jackaduma?locale.x=zh_XC"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackaduma%2Fsecbert","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjackaduma%2Fsecbert","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackaduma%2Fsecbert/lists"}