https://github.com/cveproject/cve-reference-ingest



# Automated ADP Reference Ingest Capability

## About

The automated ADP Reference Ingest Capability is a GitHub Actions workflow that automatically adds references to an ADP container. It can pull ADP references from multiple source repositories on GitHub, de-duplicate those entries, and maintain a master list of references.

## Data Flow

Data flows through a multi-stage GitHub Action. The first two stages can be repeated for as many data source repositories as desired.

### Stage 1 - Check out source repository

During the first stage, the source repository is checked out onto the GitHub Actions runner. This requires that the source repository be public, or that the GitHub Action be configured with permission to access it. See [GitHub's documentation](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository#enabling-workflows-for-forks-of-private-repositories) for more information about configuring actions to use private repositories.

### Stage 2 - Identify new data in source repository

This stage performs two major operations:

1. Identifies the new files that have been added to the source data repository since the last run of the GitHub Action.
2. Creates copies of the new files in the main reference list, which this repository keeps in the `references` folder.

To perform these operations, the ADP Automated Reference Ingest Capability relies on the `last_run_shas` directory, which records the last Git commit SHA from the data source repository that the GitHub Action processed. This stage uses that SHA as the starting point and the most recent SHA as the ending point. The previous SHA is retrieved through the GitHub API and decoded from base64 by the `read-file-via-api.py` Python helper script.

Once the previous and current SHAs have been determined, `git diff` is called to get the list of _files_ that changed between the two SHAs. These files are then copied to the `references` folder of the `cve-reference-ingest` repository by the `create-file-via-api.py` Python helper script for processing in a later stage.
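
The two lookups described above can be sketched roughly as follows. This is a minimal illustration, not the code of `read-file-via-api.py`: the function names are hypothetical, and it assumes the saved SHA is fetched from GitHub's repository contents endpoint, whose JSON response carries the file body in a base64-encoded `content` field.

```python
import base64
import subprocess


def decode_saved_sha(contents_response: dict) -> str:
    """Extract the saved SHA from a GitHub repository contents response.

    Assumes the response JSON has a base64-encoded ``content`` field.
    """
    raw = base64.b64decode(contents_response["content"])
    return raw.decode("utf-8").strip()


def parse_changed_files(diff_output: str) -> list[str]:
    """Turn the output of ``git diff --name-only`` into a list of paths."""
    return [line.strip() for line in diff_output.splitlines() if line.strip()]


def changed_files(repo_dir: str, previous_sha: str, current_sha: str) -> list[str]:
    """List the files that differ between two commits of a checked-out repo."""
    result = subprocess.run(
        ["git", "diff", "--name-only", previous_sha, current_sha],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # Raise if git exits with an error.
    )
    return parse_changed_files(result.stdout)
```

If only newly added files should be considered, adding `--diff-filter=A` to the `git diff` call would restrict the list accordingly; whether the workflow does so is not stated here.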

#### Stage 2 - Configuration

Stage 2 makes requests to the GitHub REST API, which requires an access token for authentication.

* Details on how to create access tokens can be found in GitHub's documentation [here](https://docs.github.com/en/[email protected]/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens)
* Details on how secrets are used in GitHub Actions can be found in GitHub's documentation [here](https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions?tool=cli)
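
As a rough illustration of how a token stored as a secret might reach the helper scripts, the sketch below reads it from an environment variable and builds request headers for the GitHub REST API. The variable name `GH_ACCESS_TOKEN` and the function name are assumptions made for this example, not names taken from this repository.

```python
import os


def github_headers(env_var: str = "GH_ACCESS_TOKEN") -> dict[str, str]:
    """Build GitHub REST API headers from a token held in the environment.

    ``GH_ACCESS_TOKEN`` is a hypothetical name; use whichever variable the
    workflow maps the repository secret to.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"{env_var} is not set; cannot authenticate")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

Failing early when the variable is missing keeps an unauthenticated request from being mistaken for a network error later in the stage.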

#### Stage 2 - Error Management

Stage 2 is a crucial point for maintaining state: it must run to completion, with its steps in order. If Stage 2 were to fail and the run continued, Stage 4, the primary stage of the GitHub Action, would complete incorrectly and reach an invalid state that requires manual fixing.

Therefore, if Stage 2 fails, the GitHub Action fails and triggers an email to the team.

If a failed state is reached, a debug message is written to the logs for later review. Almost all failures in this stage are network-related failures of calls to the GitHub API. These are retried automatically the next time the GitHub Action runs, which gives the action another chance to copy the files.

If this stage fails repeatedly over a 24-hour period, a member of the team should investigate.
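
One way to picture the "complete in order or fail the whole run" behaviour is the sketch below. It is an assumption about structure, not the workflow's actual code: `run_in_order` and the step names passed to it are hypothetical.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("stage2")


def run_in_order(steps: Sequence[tuple[str, Callable[[], None]]]) -> int:
    """Run steps one after another and stop at the first failure.

    Returns 0 when every step succeeds and 1 otherwise, so the caller can
    pass the value to ``sys.exit`` and fail the workflow step.
    """
    for name, step in steps:
        try:
            step()
        except Exception:
            # Log the full traceback so the failure can be reviewed later.
            logger.exception("Stage 2 step %r failed; aborting", name)
            return 1
    return 0
```

A script would typically end with `sys.exit(run_in_order(...))`; a non-zero exit code marks the step as failed in GitHub Actions, which prevents later stages from running against incomplete state.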

### Stage 3 - Check out GitHub Action repository

During the third stage, the `cve-reference-ingest` repository is checked out onto the GitHub Actions runner, providing the action with the crucial `last_run_shas` and `references` folders.

### Stage 4 - CVE Services

Stage 4 is responsible for 3 major operations:

1. Determining the new references in the `cve-reference-ingest` repository that need to be processed.
2. Writing the references to CVEs using CVE Services.
3. Updating and committing the `last_run_shas` for any sources that data was pulled from.

For step 1, the GitHub Action checks its current Git SHA against the SHA saved in `last_run_shas`. A call to `git diff` then determines which files changed between those two points.

For step 2, the GitHub Action passes each reference file to the `adp.py` Python helper script. The script checks that the CVE the reference is for exists, checks that the reference is not already present in the ADP container (if one exists), and finally writes the new reference to the ADP container.

For step 3, after processing all the files, the GitHub Action updates the appropriate `last_run_shas` files and commits the changes to the `cve-reference-ingest` repository.
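
The duplicate check in step 2 might look something like the sketch below. It assumes an ADP container is a dictionary with a `references` list whose entries each carry a `url` field, as in the CVE record format, and that two references count as duplicates when their URLs match exactly. Both the function name and that matching rule are assumptions; `adp.py` may compare references differently.

```python
def add_reference(adp_container: dict, reference: dict) -> bool:
    """Append ``reference`` unless its URL is already present.

    Returns True when the reference was added and False for a duplicate.
    """
    # Create the list if the container does not have one yet.
    references = adp_container.setdefault("references", [])
    known_urls = {ref.get("url") for ref in references}
    if reference["url"] in known_urls:
        return False
    references.append(reference)
    return True
```

Returning a boolean lets the caller skip the write to CVE Services entirely when nothing new was added.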

#### Stage 4 - Configuration

Stage 4 requires a CVE Services account and API key. Speak to your organization's CNA to have an account created. The API key then needs to be added as a secret, as described in the Stage 2 configuration.

#### Stage 4 - Error Handling

While `adp.py` is processing a file, any network request to CVE Services that fails is automatically retried once. If the second attempt also fails, the file is copied to the `retry` folder, where it is queued to be re-attempted at a later time.
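
The retry-once-then-queue behaviour can be sketched as below. This is an illustration under stated assumptions rather than the logic of `adp.py`: `write_reference` is a hypothetical callable standing in for the CVE Services request, and network failures are assumed to surface as `OSError` (or a subclass such as `ConnectionError`).

```python
import shutil
from pathlib import Path
from typing import Callable


def process_reference_file(
    path: Path,
    write_reference: Callable[[Path], None],
    retry_dir: Path,
    attempts: int = 2,
) -> bool:
    """Try to write a reference file, retrying on network errors.

    Returns True on success and False if the file was queued for retry.
    """
    for _ in range(attempts):
        try:
            write_reference(path)
            return True
        except OSError:
            continue  # Network failure: try again if attempts remain.
    # Every attempt failed: queue a copy for the retry stage.
    retry_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, retry_dir / path.name)
    return False
```

With the default `attempts=2`, the request is made once and retried once, matching the description above.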

### Stage 5 - Retry

This stage runs only if there is a reference file in the `retry` directory. Files are added to that directory after failures in Stage 4. The GitHub Action attempts to write each queued reference to the ADP container of its CVE. If the write fails, the file remains in the `retry` folder to be tried again during the next run of the GitHub Action. If the write succeeds, the file is removed from the `retry` folder.
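
A minimal sketch of that loop follows. As before, `write_reference` is a hypothetical stand-in for the call to CVE Services, assumed to raise `OSError` on failure; the function name is illustrative only.

```python
from pathlib import Path
from typing import Callable


def drain_retry_folder(
    retry_dir: Path, write_reference: Callable[[Path], None]
) -> list[Path]:
    """Re-attempt every queued file and return the files still queued."""
    still_queued: list[Path] = []
    if not retry_dir.is_dir():
        return still_queued  # Nothing queued, so there is nothing to do.
    for path in sorted(retry_dir.iterdir()):
        if not path.is_file():
            continue
        try:
            write_reference(path)
        except OSError:
            still_queued.append(path)  # Leave it for the next run.
        else:
            path.unlink()  # Written successfully, so remove it from the queue.
    return still_queued
```

An empty return value means the queue has been cleared and the stage will not trigger on the next run unless Stage 4 queues new files.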