https://github.com/jackaduma/secbert
A pretrained BERT model for cyber security text that has learned cyber security knowledge.
- Host: GitHub
- URL: https://github.com/jackaduma/secbert
- Owner: jackaduma
- License: mit
- Created: 2020-10-27T16:00:49.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-04-28T22:14:19.000Z (about 2 years ago)
- Last Synced: 2023-08-02T20:13:14.926Z (almost 2 years ago)
- Topics: apt, attention, bert, bert-embeddings, cyber-security, cyber-threat-intelligence, cybersecurity, deep-learning-security, deeplearning, machine-learning-security, nlp, nlp-machine-learning, security, security-automation, threat-analysis, threat-detection, threat-hunting, threat-intelligence, transformer-encoder, transformers
- Language: Python
- Size: 490 KB
- Stars: 81
- Watchers: 9
- Forks: 17
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# **`SecBERT`**
[**中文说明 (Chinese)**](./README.zh-CN.md) | [**English**](./README.md)
`SecBERT` is a `BERT` model trained on cyber security text; it has learned cyber security domain knowledge.
* `SecBERT` is trained on papers from the following corpora:
* [APTnotes](https://github.com/kbandla/APTnotes)
* [Stucco-Data: Cyber security data sources](https://stucco.github.io/data/)
* [CASIE: Extracting Cybersecurity Event Information from Text](https://ebiquity.umbc.edu/_file_directory_/papers/943.pdf)
* [SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using Natural Language Processing (SecureNLP)](https://competitions.codalab.org/competitions/17262)

`SecBERT` has its own vocabulary (`secvocab`) built to best match the training corpus; the sketch below shows the kind of difference this makes at tokenization time. We trained [SecBERT](https://huggingface.co/jackaduma/SecBERT) and [SecRoBERTa](https://huggingface.co/jackaduma/SecRoBERTa) versions.
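As an illustration (not from the repo), this minimal sketch tokenizes the same security sentence with the generic `bert-base-uncased` vocabulary and with `secvocab`; the example sentence is our own, and the expectation is that the domain vocabulary splits security terms into fewer word pieces.

```
from transformers import AutoTokenizer

# Generic BERT vocabulary vs. SecBERT's domain vocabulary (secvocab)
generic = AutoTokenizer.from_pretrained("bert-base-uncased")
domain = AutoTokenizer.from_pretrained("jackaduma/SecBERT")

text = "The malware exfiltrates credentials over an encrypted C2 channel."
print(generic.tokenize(text))  # generic WordPiece splits rare security terms
print(domain.tokenize(text))   # secvocab should keep more terms intact
```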
## **Downloading Trained Models**
SecBERT models are now installable directly within Hugging Face's `transformers` framework:
```
from transformers import AutoTokenizer, AutoModelForMaskedLM

# SecBERT
tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecBERT")
model = AutoModelForMaskedLM.from_pretrained("jackaduma/SecBERT")

# SecRoBERTa
tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecRoBERTa")
model = AutoModelForMaskedLM.from_pretrained("jackaduma/SecRoBERTa")
```
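Note that the two checkpoints ship matching tokenizers: SecBERT follows BERT's WordPiece scheme, while SecRoBERTa, being RoBERTa-based, uses byte-level BPE, so always pair a tokenizer with the model it was trained with.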
------
## **Pretrained-Weights**
We release the PyTorch version of the trained models. The PyTorch version is created using the [Hugging Face](https://github.com/huggingface/pytorch-pretrained-BERT) library, and this repo shows how to use it.
[Huggingface Modelhub](https://huggingface.co/models)
* [SecBERT](https://huggingface.co/jackaduma/SecBERT)
* [SecRoBERTa](https://huggingface.co/jackaduma/SecRoBERTa)
### **Using SecBERT in your own model**
SecBERT models include all necessary files to be plugged into your own model and are in the same format as BERT.
If you use PyTorch, refer to [Hugging Face's repo](https://github.com/huggingface/pytorch-pretrained-BERT) where detailed instructions on using BERT models are provided.
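Beyond the masked-LM head, a common pattern is to load SecBERT as a plain encoder and pool its hidden states into sentence features for a downstream classifier. The sketch below is one minimal way to do that; the mean-pooling choice and the example text are our own assumptions, not the repo's method.

```
import torch
from transformers import AutoTokenizer, AutoModel

# Load SecBERT as a plain encoder (no MLM head) for feature extraction
tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecBERT")
encoder = AutoModel.from_pretrained("jackaduma/SecBERT")

inputs = tokenizer("Phishing email delivers a malicious macro.", return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# Mean-pool token embeddings into a single sentence vector
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # (1, hidden_size), e.g. (1, 768) for a BERT-base-sized model
```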
## **Fill Mask**
We propose to build a language model that works on cyber security text; as a result, it can improve downstream tasks (NER, text classification, semantic understanding, Q&A) in the cyber security domain.
First, the commands below run the Fill-Mask pipeline comparison across [Google BERT](https://github.com/google-research/bert), [AllenAI SciBERT](https://github.com/allenai/scibert), and our [SecBERT](https://github.com/jackaduma/SecBERT):
```
cd lm
python eval_fillmask_lm.py
```
## **Downstream-tasks**
### TODO
------
## **Star-History**
------
## Donation
If this project helps you reduce development time, you can buy me a cup of coffee :)

AliPay (支付宝)

WechatPay (微信)
------
## **License**
[MIT](LICENSE) © Kun