{"id":24506440,"url":"https://github.com/cak/osv-data-collection","last_synced_at":"2025-03-15T08:45:09.664Z","repository":{"id":273247890,"uuid":"919075951","full_name":"cak/osv-data-collection","owner":"cak","description":"A project for accessing data from the OSV vulnerability database and exporting it to various formats for analysis and reporting","archived":false,"fork":false,"pushed_at":"2025-01-19T18:17:27.000Z","size":32,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-19T19:26:31.857Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-19T16:42:20.000Z","updated_at":"2025-01-19T18:17:30.000Z","dependencies_parsed_at":"2025-01-19T19:36:35.426Z","dependency_job_id":null,"html_url":"https://github.com/cak/osv-data-collection","commit_stats":null,"previous_names":["cak/osv-data-collection"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cak%2Fosv-data-collection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cak%2Fosv-data-collection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cak%2Fosv-data-collection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cak%2Fosv-data-collection/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/cak","download_url":"https://codeload.github.com/cak/osv-data-collection/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243707304,"owners_count":20334615,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-21T23:37:13.260Z","updated_at":"2025-03-15T08:45:09.642Z","avatar_url":"https://github.com/cak.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Open Source Vulnerability (OSV) Data Processor\n\nThis project provides tools to download and process vulnerability data from the [OSV database](https://osv.dev/). It supports multiple ecosystems to collect and save CVEs for open-source libraries and packages.\n\n## Features\n\n- Download and extract OSV vulnerability data for selected ecosystems.\n- Process JSON files to extract CVE identifiers.\n- Save CVEs to a CSV file with ecosystem metadata.\n- Modular design for easy addition of new ecosystems.\n\n## Requirements\n\n- Python 3.11+\n- The only required package is `requests`.\n\n### Installation\n\n1. Clone the repository:\n   ```bash\n   git clone git@github.com:cak/osv-data-collection.git\n   cd osv-data-collection\n   ```\n\n2. Create a virtual environment and activate it:\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n   ```\n\n3. 
Install dependencies using `pip`:\n   ```bash\n   pip install requests\n   ```\n   \n---\n\n## Scripts Overview\n\n### `osv_downloader.py`\n\n**Purpose**: Download and extract OSV data for specified ecosystems.\n\n#### Key Functions:\n- **`create_directory`**: Ensures a directory exists.\n- **`download_file`**: Downloads a ZIP file from a URL.\n- **`extract_zip`**: Extracts a ZIP file and removes it after extraction.\n- **`download_and_extract_osv`**: Combines the above steps to handle ecosystem-specific OSV data.\n\n#### Usage:\n```bash\npython osv_downloader.py\n```\n\n#### Output:\n- Data for each ecosystem is extracted into the `./data/{ecosystem}/` directory.\n\n---\n\n### `osv_processor.py`\n\n**Purpose**: Process downloaded OSV JSON files to extract CVEs and save them to CSV files.\n\n#### Key Functions:\n- **`fetch_osv_data`**: Reads JSON files for a specific ecosystem and extracts CVE identifiers from `id`, `aliases`, and `related` fields.\n- **`save_osv_cves`**: Saves the extracted CVEs and ecosystem information into a CSV file.\n\n#### Usage:\n```bash\npython osv_processor.py\n```\n\n#### Output:\n- A CSV file for each ecosystem is saved in the `./output/` directory. Example: `./output/PyPI-cves.csv`\n\n---\n\n## Supported Ecosystems\n\nThe following ecosystems are covered:\n- PyPI (Python)\n- npm (JavaScript/Node.js)\n- crates.io (Rust)\n- Go (Go modules)\n- RubyGems (Ruby)\n- Maven (Java)\n- NuGet (.NET)\n- Packagist (PHP)\n- Hex (Elixir/Erlang)\n- Pub (Dart)\n- R (CRAN and Bioconductor)\n\n### Adding New Ecosystems\nTo add a new ecosystem, include its name in the `ecosystems` list in both `osv_downloader.py` and `osv_processor.py`. The scripts will handle the rest automatically.\n\n---\n\n## Example Workflow\n\n1. **Download OSV Data**:\n   Run `osv_downloader.py` to fetch and extract data for all supported ecosystems.\n   ```bash\n   python osv_downloader.py\n   ```\n\n2. 
**Process Data**:\n   Run `osv_processor.py` to extract CVEs and save them to CSV files.\n   ```bash\n   python osv_processor.py\n   ```\n\n3. **Check Output**:\n   Each ecosystem's extracted CVEs are saved as a CSV file in the `./output/` directory (for example, `./output/PyPI-cves.csv`).\n\n---\n\n## Contributing\n\nFeel free to submit issues or pull requests to improve this project. Contributions are welcome!\n\n---\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcak%2Fosv-data-collection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcak%2Fosv-data-collection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcak%2Fosv-data-collection/lists"}