{"id":21359354,"url":"https://github.com/trendmicro/nlp-securespacy","last_synced_at":"2025-06-17T18:34:59.878Z","repository":{"id":237125221,"uuid":"792197378","full_name":"trendmicro/NLP-SecureSpacy","owner":"trendmicro","description":null,"archived":false,"fork":false,"pushed_at":"2024-05-08T18:33:53.000Z","size":205,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-16T06:26:53.459Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trendmicro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-26T07:12:10.000Z","updated_at":"2024-07-15T01:31:59.000Z","dependencies_parsed_at":"2024-11-22T05:28:05.741Z","dependency_job_id":"e1f17863-cd28-41e7-9e75-64343af351e2","html_url":"https://github.com/trendmicro/NLP-SecureSpacy","commit_stats":null,"previous_names":["trendmicro/nlp-securespacy"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/trendmicro/NLP-SecureSpacy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2FNLP-SecureSpacy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2FNLP-SecureSpacy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2FNLP-SecureSpacy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2FNLP-SecureSpacy/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/trendmicro","download_url":"https://codeload.github.com/trendmicro/NLP-SecureSpacy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2FNLP-SecureSpacy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260419284,"owners_count":23006247,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-22T05:27:54.867Z","updated_at":"2025-06-17T18:34:54.866Z","avatar_url":"https://github.com/trendmicro.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# securespacy\n\n`securespacy` is a Python module that contains our custom tokenizer and named entity extractor for Spacy v3. The following named entities can be extracted by using `securespacy`:\n\n- IP\n- URL\n- DOMAIN\n- EMAIL\n- MALWARE\n- CVE\n- HASH\n- INTRUSION_SET\n- CITY\n- COUNTRY\n\n`securespacy` uses Spacy's **Entity Ruler**, which is a rules-based matching approach in order to extract additional named entities from the text. 
In other words, we use regular expressions and other static rules to detect entities, complementing Spacy's named entity recognition (NER), which relies on trained language models.\n\n## Installation\n```bash\npip install git+https://github.com/trendmicro/NLP-SecureSpacy.git\n```\n\n\n## Usage\n\n```python\nimport spacy\n\nimport securespacy\nfrom securespacy import tagger\nfrom securespacy.tokenizer import custom_tokenizer\nfrom securespacy.patterns import add_entity_ruler_pipeline\n\ntext = ('The quick brown fox owns the domain quickbrownfox[.]sh with the ip address 10.231.31.8 '\n        'with the server located in Manila, Philippines.')\n\n# Load a stock model, swap in the custom tokenizer, then add the entity ruler.\nnlp = spacy.load(\"en_core_web_sm\")\nnlp.tokenizer = custom_tokenizer(nlp)\nadd_entity_ruler_pipeline(nlp)\ndoc = nlp(text)\n\nfor ent in doc.ents:\n    print(f\"{ent.label_:\u003c15} {ent}\")\n\n# Output:\n# DOMAIN          quickbrownfox[.]sh\n# IP              10.231.31.8\n# CITY            Manila\n# COUNTRY         Philippines\n```\n\n## Flair Wrapper\n\nsecurespacy can be used with Flair.\n
The API is slightly different.\n\n**N.B.** To speed up `phrase_matcher()`, a dictionary is cached in `~/.tokenized_matcher.pickle`.\nDelete this file to regenerate it whenever the dictionary files change (usually when you update SecureSpacy).\n\n```python\nfrom flair.models import SequenceTagger\nfrom flair.data import Sentence\nfrom securespacy.flair import SecureSpacyFlairWrapper\n\n# Load Flair's pretrained NER tagger.\ntagger = SequenceTagger.load('ner')\ntext = 'We were able to find a second variant (detected as Trojan.MacOS.GMERA.B) that was uploaded to VirusTotal.'\nwrapper = SecureSpacyFlairWrapper()\n# Tokenize with the securespacy tokenizer.\nsentence = Sentence(text, use_tokenizer=wrapper.tokenizer)\ntagger.predict(sentence)\n# Run the securespacy phrase matcher on the same sentence.\nwrapper.phrase_matcher(sentence)\nfor ent in sentence.get_spans('ner'):\n    print(ent)\n```\n\nHere, `sentence` is a `flair.data.Sentence` object.\n\n## References\n- https://spacy.io/usage/rule-based-matching#entityruler\n\n\n## Maintenance\n\nTo import the latest data from [MITRE ATT\u0026CK Techniques](https://github.com/mitre-attack/attack-stix-data/tree/master/enterprise-attack), download the latest JSON and run\n`./src/securespacy/data/convert-mitre-enterprise.py`. Do a manual pass before merging the converted files, as\nshort software names (such as `Net`, `at`, `at.exe`) can cause misclassifications.\n\nMerge `mitre-malware.txt` into the case-sensitive list `malware-cased.txt`.\n\n## License\n\nSee LICENSE.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrendmicro%2Fnlp-securespacy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrendmicro%2Fnlp-securespacy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrendmicro%2Fnlp-securespacy/lists"}