{"id":13296912,"url":"https://github.com/muchdogesec/cxe2stix_helper","last_synced_at":"2025-05-02T14:33:45.248Z","repository":{"id":243654181,"uuid":"809776616","full_name":"muchdogesec/cxe2stix_helper","owner":"muchdogesec","description":"[ARCHIVED -- USE CVE2STIX] A small Python wrapper to download data using cve2stix and cpe2stix.","archived":true,"fork":false,"pushed_at":"2024-12-12T15:56:15.000Z","size":481,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-05T18:53:14.779Z","etag":null,"topics":["cpe","cve","nvd","stix2","stix2-patterns"],"latest_commit_sha":null,"homepage":"https://www.dogesec.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/muchdogesec.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-03T12:35:07.000Z","updated_at":"2024-12-12T15:56:57.000Z","dependencies_parsed_at":"2024-09-09T07:24:51.656Z","dependency_job_id":"3fefabd5-aeef-41d1-83e8-7056e6b54883","html_url":"https://github.com/muchdogesec/cxe2stix_helper","commit_stats":null,"previous_names":["muchdogesec/cxe2stix_helper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muchdogesec%2Fcxe2stix_helper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muchdogesec%2Fcxe2stix_helper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muchdogesec%2Fcxe2stix_helper/releases","manifests_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/muchdogesec%2Fcxe2stix_helper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/muchdogesec","download_url":"https://codeload.github.com/muchdogesec/cxe2stix_helper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252053936,"owners_count":21687196,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpe","cve","nvd","stix2","stix2-patterns"],"created_at":"2024-07-29T17:21:18.086Z","updated_at":"2025-05-02T14:33:45.228Z","avatar_url":"https://github.com/muchdogesec.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# cxe2stix_helper\n\n## Before you begin\n\nWe host a full web API that includes all objects created by cve2stix and cpe2stix, [Vulmatch](https://www.vulmatch.com/).\n\n## Overview\n\n![](docs/cve2stix.png)\n\nA small wrapper to download data using [cve2stix](https://github.com/muchdogesec/cve2stix/) and [cpe2stix](https://github.com/muchdogesec/cpe2stix/), organising it into STIX bundles based on time ranges.\n\n## Install the script\n\n```shell\n# clone the latest code\ngit clone https://github.com/muchdogesec/cxe2stix_helper -b main --recurse-submodules\n# create a venv\ncd cxe2stix_helper\npython3 -m venv cxe2stix_helper-venv\nsource cxe2stix_helper-venv/bin/activate\n# install requirements\npip3 install -r requirements.txt\n```\n\n### Configuration options\n\ncxe2stix_helper has various settings that are defined in an `.env` file.\n\nTo create a template for the file:\n\n```shell\ncp 
.env.example .env\n```\n\nFor more information about what the variables do and how to set them, read the `.env.markdown` file.\n\n## Usage\n\n```shell\npython3 cxe2stix_helper.py \\\n\t--run_cve2stix boolean \\\n\t--run_cpe2stix boolean \\\n\t--last_modified_earliest date \\\n\t--last_modified_latest date \\\n\t--file_time_range dictionary\n```\n\nWhere:\n\n* `run_cve2stix` (optional, boolean): will run the cve2stix script with the defined settings\n\t* default: `false`\n* `run_cpe2stix` (optional, boolean): will run the cpe2stix script with the defined settings\n\t* default: `false`\n* `last_modified_earliest` (required, date in format `YYYY-MM-DDThh:mm:ss`): used in the cve2stix/cpe2stix config\n\t* default: none\n* `last_modified_latest` (required, date in format `YYYY-MM-DDThh:mm:ss`): used in the cve2stix/cpe2stix config\n\t* default: none\n* `file_time_range` (optional): defines how much data should be packed in each output bundle. Use `d` for days, `m` for months, `y` for years. Note that if no results are found for a time period, a bundle will not be generated. This usually explains why you see \"missing\" bundles for a day or month. 
\n\t* default `1m` (1 month)\n\n### Example 1: Get 3 months of CPE data (split into STIX bundles of 1 month)\n\n```shell\npython3 cxe2stix_helper.py \\\n\t--run_cpe2stix \\\n\t--last_modified_earliest 2023-03-04T00:00:00 \\\n\t--last_modified_latest 2023-06-04T23:59:59 \\\n\t--file_time_range 1m\n```\n\nWill generate 4 bundle files in directories as follows:\n\n```txt\noutput\n└── bundles\n\t└── cpe\n\t\t└── 2023\n\t\t\t├── cpe-bundle-2023_03_04-2023_03_31.json\n\t\t\t├── cpe-bundle-2023_04_01-2023_04_30.json\n\t\t\t├── cpe-bundle-2023_05_01-2023_05_31.json\n\t\t\t└── cpe-bundle-2023_06_01-2023_06_04.json\n```\n\n### Example 2: Get 3 days of CVE data (split into STIX bundles of 1 day)\n\n```shell\npython3 cxe2stix_helper.py \\\n\t--run_cve2stix \\\n\t--last_modified_earliest 2023-01-01T00:00:00 \\\n\t--last_modified_latest 2023-01-03T23:59:59 \\\n\t--file_time_range 1d\n```\n\nWill generate 3 bundle files:\n\n* `cve-bundle-2023_01_01-2023_01_01.json`\n* `cve-bundle-2023_01_02-2023_01_02.json`\n* `cve-bundle-2023_01_03-2023_01_03.json`\n\n```txt\noutput\n└── bundles\n\t└── cve\n\t\t└── 2023-01\n\t\t\t├── cve-bundle-2023_01_01-2023_01_01.json\n\t\t\t├── cve-bundle-2023_01_02-2023_01_02.json\n\t\t\t└── cve-bundle-2023_01_03-2023_01_03.json\n```\n\n### Example 3: Get 2 days of CVE and CPE data (split into STIX bundles of 2 months)\n\n```shell\npython3 cxe2stix_helper.py \\\n\t--run_cve2stix \\\n\t--run_cpe2stix \\\n\t--last_modified_earliest 2023-01-01T00:00:00 \\\n\t--last_modified_latest 2023-01-02T23:59:59 \\\n\t--file_time_range 2m\n```\n\nWill generate 2 bundle files:\n\n```txt\noutput\n└── bundles\n \t├── cve\n\t│ \t└── 2023\n\t│ \t\t└── cve-bundle-2023_01_01-2023_01_02.json\n\t└── cpe\n\t  \t└── 2023\n\t  \t\t└── cpe-bundle-2023_01_01-2023_01_02.json\n```\n\n## Why not run the scripts (cpe2stix / cve2stix) independently?\n\nThe APIs can return a large amount of data, and downloading large time ranges in one run can cause an issue.\n\nWe use a range of downstream 
tools that require STIX bundles in smaller sizes and with certain naming conventions.\n\nRunning the scripts independently would therefore mean manually editing the .env files for many time ranges, each time.\n\ncxe2stix_helper is designed to automate the process of downloading very large datasets whilst also allowing control over the output filenames.\n\nIf you want to keep a copy of each individual STIX .json object, you should use cve2stix or cpe2stix. cxe2stix_helper only outputs the final bundles.\n\n## Recommendations for running large backfills\n\n### CVE\n\nThe first CVE was published on `1988-10-01T04:00:00.000`. There are 250,888 CVEs at the time of writing, and this number is increasing rapidly.\n\nNote that whilst the first CVE was published in October 1988, it appears all CVEs published before 2005 were updated at the end of 2005 (or afterwards).\n\nThere are more CPEs (1,267,211 currently) than CVEs, but the STIX objects created from them are smaller, so their bundles have a smaller file size. The earliest CPEs have a last modified date in 2007.\n\nDue to the volume and size of CVEs, we recommend iterating through the data in days. This keeps all bundles (even those after 2018) under 10MB.\n\nHere is what we use:\n\n```shell\npython3 cxe2stix_helper.py \\\n\t--run_cve2stix \\\n\t--run_cpe2stix \\\n\t--last_modified_earliest 2005-01-01T00:00:00 \\\n\t--last_modified_latest 2024-11-30T23:59:59 \\\n\t--file_time_range 1d\n```\n\n## Git submodule use\n\nWe try to keep this repo in sync with the remote cve2stix / cpe2stix repos (used here as Git submodules) when changes happen.\n\nSometimes this is not the case (either because we've forgotten, or because there are breaking changes).\n\nIf we've simply forgotten, you can update the Git submodules in this repo as follows:\n\n```shell\ncd cpe2stix \u0026\u0026 \\\ngit checkout main \u0026\u0026 \\\ngit pull \u0026\u0026 \\\ncd .. 
\u0026\u0026 \\\ncd cve2stix \u0026\u0026 \\\ngit checkout main \u0026\u0026 \\\ngit pull \u0026\u0026 \\\ncd ..\n```\n\n## Support for Cloudflare R2 + Github action\n\nWe use a Github action to run this script daily and store the bundles generated by cxe2stix_helper on Cloudflare R2.\n\nThe script runs at 0700 UTC every day (Github's servers use UTC) using the cron schedule `\"0 7 * * *\"`.\n\nThe action is defined in `/.github/workflows/daily-r2.yml`.\n\nEssentially, the action runs the following command every day, where `YESTERDAY` is the previous day's date:\n\n```shell\npython3 cxe2stix_helper.py \\\n\t--run_cve2stix \\\n\t--run_cpe2stix \\\n\t--last_modified_earliest \"YESTERDAY (00:00:00)\" \\\n\t--last_modified_latest \"YESTERDAY (23:59:59)\" \\\n\t--file_time_range 1d\n```\n\nThe action will store the data in the bucket as follows:\n\n```txt\ncxe2stix-helper-github-action-output\n├── cve\n│ \t└── 2023-01\n│  \t\t└── cve-bundle-2023_01_01-2023_01_02.json\n└── cpe\n\t  └── 2023-01\n\t  \t└── cpe-bundle-2023_01_01-2023_01_02.json\n```\n\nIf you'd like to run the action in your own repository to create your own data store, you will need to do the following:\n\n### Create Cloudflare bucket/keys\n\nFirst, go to Cloudflare.com and navigate to R2. Create a new bucket called `cxe2stix-helper-github-action-output`.\n\nNext, create a Cloudflare API key and make sure to set its permissions to `Admin Read \u0026 Write`. 
For security, it is also worth limiting the scope of the key to the bucket `cxe2stix-helper-github-action-output` (defined in the action).\n\n### Set Github vars\n\nThen, in the Github repo, go to `repo \u003e settings \u003e secrets and variables \u003e actions \u003e new repository secret`.\n\n![](docs/github-repo-vars.png)\n\nThen choose one of the following options:\n\n#### Option 1: use `CLOUDFLARE_*` vars\n\nSet the following secrets:\n\n```txt\nCLOUDFLARE_ACCOUNT_ID=#Get this in Cloudflare R2 UI\nCLOUDFLARE_ACCESS_KEY_ID=#Get this in Cloudflare R2 UI\nCLOUDFLARE_ACCESS_KEY_SECRET=#Get this in Cloudflare R2 UI\nNVD_API_KEY=#Get this from https://nvd.nist.gov/developers/request-an-api-key\n```\n\nYou most likely want to use this approach.\n\n#### Option 2: use `RCLONE_CONFIG` var\n\nIn the `RCLONE_CONFIG` var, add a valid rclone config file (the section title must be `[r2]`), e.g.\n\n```\n[r2]\ntype = s3\nprovider = Cloudflare\naccess_key_id = \u003cACCESS_KEY\u003e\nsecret_access_key = \u003cSECRET_ACCESS_KEY\u003e\nregion = auto\nendpoint = https://\u003cACCOUNT_ID\u003e.r2.cloudflarestorage.com\nacl = private\n```\n\nThis approach potentially allows you to use services other than Cloudflare, if you know what you're doing.\n\nWhere:\n\n* `[r2]`: An alias for the storage service, used to reference it in file operations. It should always be `[r2]`.\n* `type = s3`: The type of file operation API. R2 supports the S3 protocol.\n* `provider = Cloudflare`: The storage provider ID. You can run `man rclone` in your terminal to see the supported providers.\n* `access_key_id`: You need to create a token with Admin Read \u0026 Write permissions in the R2 console (note: I am not sure if this is a bug, but I couldn’t get it to work with any other permission level).\n* `secret_access_key`: Same as above.\n* `endpoint`: The URL that rclone uses to operate files. 
You can find the account ID at the top right of the R2 homepage.\n\n### Backfill advice\n\nDue to its size, a full backfill will time out if you try to run it on Github. Similarly, if you set `file_time_range` above `1d`, the action is likely to time out due to data sizes. It's better to run the backfill locally and then start the automated action from the day after the backfill ends (day N+1).\n\nHere are the rclone commands you can use to upload the locally downloaded backfill files:\n\n```shell\nrclone copy output/bundles/cpe r2:cti-public/cxe2stix-helper-github-action-output/cpe --exclude '.*{/**,}' \u0026\u0026 \\\nrclone copy output/bundles/cve r2:cti-public/cxe2stix-helper-github-action-output/cve --exclude '.*{/**,}'\n```\n\nYou will need to replace `cti-public` with your bucket name. `/cxe2stix-helper-github-action-output/cpe` is the path to the directory in the bucket where you want to store the files.\n\nNote that by default this command will overwrite old files. If you need to delete the directories, you can use rclone as follows:\n\n```shell\nrclone purge r2:cti-public/cxe2stix-helper-github-action-output/cpe \u0026\u0026 \\\nrclone purge r2:cti-public/cxe2stix-helper-github-action-output/cve\n```\n\n## Support\n\n[Minimal support provided via the DOGESEC community](https://community.dogesec.com/).\n\n## License\n\n[Apache 2.0](/LICENSE).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuchdogesec%2Fcxe2stix_helper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmuchdogesec%2Fcxe2stix_helper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuchdogesec%2Fcxe2stix_helper/lists"}