{"id":21016604,"url":"https://github.com/gsa/site-scanning-engine","last_synced_at":"2025-05-15T05:33:02.911Z","repository":{"id":37576842,"uuid":"300761835","full_name":"GSA/site-scanning-engine","owner":"GSA","description":"The repository for the rearchitected site-scanning project, specifically the scanning engine itself. ","archived":false,"fork":false,"pushed_at":"2025-05-07T17:04:52.000Z","size":10068,"stargazers_count":24,"open_issues_count":0,"forks_count":12,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-05-07T18:23:10.630Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/GSA/site-scanning/issues","language":"TypeScript","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GSA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":"docs/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-10-03T00:02:51.000Z","updated_at":"2025-05-07T17:04:55.000Z","dependencies_parsed_at":"2023-02-15T10:16:22.324Z","dependency_job_id":"b47072d5-556a-48b9-847a-476924c9417e","html_url":"https://github.com/GSA/site-scanning-engine","commit_stats":{"total_commits":618,"total_committers":16,"mean_commits":38.625,"dds":0.6019417475728155,"last_synced_commit":"64dc2d4a81bf90c1950ab14d3c230a29a48de678"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GSA%2Fsite-scanning-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GSA%2Fsite-scanning-engine/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GSA%2Fsite-scanning-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GSA%2Fsite-scanning-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GSA","download_url":"https://codeload.github.com/GSA/site-scanning-engine/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254282738,"owners_count":22045128,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T10:14:52.078Z","updated_at":"2025-05-15T05:32:59.599Z","avatar_url":"https://github.com/GSA.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Site Scanning Engine\n\n[![CodeQL](https://github.com/GSA/site-scanning-engine/actions/workflows/codeql.yml/badge.svg)](https://github.com/GSA/site-scanning-engine/actions/workflows/codeql.yml)\n[![Semgrep](https://github.com/GSA/site-scanning-engine/actions/workflows/semgrep.yml/badge.svg)](https://github.com/GSA/site-scanning-engine/actions/workflows/semgrep.yml)\n[![Snyk 
Scan](https://github.com/GSA/site-scanning-engine/actions/workflows/snyk.yml/badge.svg)](https://github.com/GSA/site-scanning-engine/actions/workflows/snyk.yml)\n[![MegaLinter](https://github.com/GSA/site-scanning-engine/actions/workflows/megalinter.yml/badge.svg)](https://github.com/GSA/site-scanning-engine/actions/workflows/megalinter.yml)\n[![woke](https://github.com/GSA/site-scanning-engine/actions/workflows/woke.yml/badge.svg)](https://github.com/GSA/site-scanning-engine/actions/workflows/woke.yml)\n\n## Description\n\nThis repository is for the Site Scanning engine itself, the codebase that actually runs the scans and generates data. This is the new base scanner repository which uses Headless Chrome, powered by Puppeteer for scanning.\n\nFor more detailed documentation about the Site Scanning program, including **who it's for**, **what it does**, **long-term goals**, etc. please visit the \n[Site Scanning program website](https://digital.gov/site-scanning), especially the [Technical Details page](https://digital.gov/guides/site-scanning/technical-details/).  \n\nThe project's issue tracker and other relevant repositories and links [can be found here](https://github.com/GSA/site-scanning?tab=readme-ov-file#site-scanning).  
\n\n## Table of Contents\n\n- [Site Scanning Engine](#site-scanning-engine)\n  - [Description](#description)\n  - [Table of Contents](#table-of-contents)\n  - [Quickstart](#quickstart)\n    - [Installation](#installation)\n    - [Dotenv](#dotenv)\n    - [Docker](#docker)\n    - [Build and Start all apps](#build-and-start-all-apps)\n    - [Ingest Website List](#ingest-website-list)\n    - [Enqueue scans](#enqueue-scans)\n    - [Run individual Scans](#run-individual-scans)\n  - [Test](#test)\n  - [Deploy](#deploy)\n\n- [Development Documentation](./docs)\n  - [Project Layout](./docs/layout.md)\n  - [Detailed Development](./docs/development.md)\n  - [Architecture](./docs/architecture/README.md)\n\n## Quickstart\n\nDevelopment Requirements:\n\n- `git`\n- `nodejs`\n- `nvm` (see [.nvmrc](./.nvmrc) for the current `node` version)\n- `docker`\n- `docker-compose`\n- Cloud Foundry CLI (aka `cf`)\n- `redis-cli` (optional)\n\n### Installation\n\nFirst, clone the repository:\n\n```bash\ngit clone https://github.com/GSA/site-scanning-engine/\n```\n\nFrom the project root run:\n\n```bash\nnvm use\n```\n\nThis switches to the Node version specified in `.nvmrc` (run `nvm install` first if that version is not yet installed).\n\n```bash\nnpm i\n```\n\nThis will install all production and development Node dependencies.\n\n### Dotenv\n\nThe project uses a dotenv (`.env`) file for local development credentials.\nNote that this file is not version-controlled and should only be used for\nlocal development.\n\nBefore starting Docker, create a `.env` file in the project root and add\nthe following values, replacing `\u003cadd_a_key_here\u003e` with local passwords\nthat are at least 8 characters long.\n\n**Note: this is only for local development and has no impact on the Cloud.gov configuration**\n\n```env\n# postgres configuration\nDATABASE_HOST=localhost\nDATABASE_PORT=5432\nPOSTGRES_USER=postgres\nPOSTGRES_PASSWORD=\u003cadd_a_key_here\u003e\n\n# redis configuration\nQUEUE_HOST=localhost\nQUEUE_PORT=6379\n\n# Minio Config -- Minio is an S3 api 
compliant storage\nMINIO_ACCESS_KEY=\u003cadd_a_key_here\u003e\nMINIO_SECRET_KEY=\u003cadd_a_key_here\u003e\nAWS_ACCESS_KEY_ID=\u003cadd_a_key_here\u003e\nAWS_SECRET_ACCESS_KEY=\u003cadd_a_key_here\u003e\nS3_HOSTNAME=localhost\nS3_PORT=9000\nS3_BUCKET_NAME=site-scanning-snapshot\n\n# Sets the development environment name to dev\nNODE_ENV=dev\n```\n\n### Docker\n\nFrom the project root run:\n\n```bash\ndocker-compose up --build -d\n```\n\nThis will build `(--build)` all of the Docker containers and\nnetwork interfaces listed in the\n[docker-compose.yml](docker-compose.yml) file and start them\nrunning in the background `(-d)`.\n\n`docker-compose down` will stop and remove all containers\nand network interfaces.\n\nRunning `docker-compose up --build -d` will rebuild all of\nthe containers. This is useful if you need to wipe data from\nthe database, for instance.\n\nIf you encounter any issues starting the containers with\n`docker-compose`, specifically related to OOM errors\n(or Exit 137), try increasing the resources in your Docker\npreferences.\n\n#### Building application images\n\nTo build the application image, go to the project root\nand run:\n\n```bash\ndocker build -f apps/scan-engine/Dockerfile .\n```\n\n### Build and Start all apps\n\n`cd` to the project root and run:\n\n```bash\nnpm run build:all\n```\n\nThis command builds the apps, compiling them from TypeScript\nto JavaScript and doing any minification and optimization in the\nprocess. All of the app artifacts end up in the `/dist` directory.\nThis is ultimately what gets pushed to Cloud Foundry.\n\nNote that you can also build apps separately:\n\n```bash\nnpm run build:api\nnpm run build:scan-engine\nnpm run build:cli\n```\n\nNext, you can start the apps with the following command:\n\n```bash\nnpm run start:all\n```\n\nThe apps are started as follows: first the API starts and then\nthe Site Scanning worker follows. 
This is designed so that the\nAPI app runs any shared configuration against the database first.\n\nNote that you can start the apps individually as follows:\n\n```bash\nnpm run start:api\nnpm run start:scan-engine\n```\n\n### Ingest Website List\n\nThe Site Scanning engine relies on a list of federal domains and metadata about\nthose domains to operate. This list is ingested into the system\nfrom a public repository using the [Ingest Service](libs/ingest).\n\nTo run the ingest service, do the following:\n\n```bash\nnpm run ingest -- --limit 200\n```\n\nThe `--limit` parameter is optional, but it can be useful to use a smaller\nsubset of the total list for local development.\n\n### Enqueue scans\n\nTo enqueue scans for all sites in the website table:\n\n```bash\nnpx nest start cli -- enqueue-scans\n```\n\n### Run individual Scans\n\nTo scan a single site, which must be in the website table, run a command like:\n\n```bash\nnpx nest start cli -- scan-site --url 18f.gov\n```\n\nNOTE: This is intended for testing scan behavior, and doesn't currently\nwrite results to the database.\n\n## Test\n\nFrom the project root run:\n\n```bash\nnpm run test:unit\n```\n\nThis runs all unit tests.\n\n## Deploy\n\nFirst, log in to Cloud.gov using the CLI and choose the organization and space.\n\nThen, you can use the `cloudgov-deploy.sh` script to build and deploy the apps.\n\nYou can optionally pass a different manifest file with `cloudgov-deploy.sh manifest-dev.yml`.\n\n## Additional Documentation\n\n- [License](docs/LICENSE.md)\n- [Contributing](docs/CONTRIBUTING.md)\n- [Security](docs/SECURITY.md)\n- [Code of Conduct](docs/CODE_OF_CONDUCT.md)\n- [Deployment](docs/deployment.md)\n- [Development](docs/development.md)\n- [Environment Provisioning](docs/environment_provisioning.md)\n- [Layout](docs/layout.md)\n- [Architectural 
Diagram](docs/architecture/diagrams/images/architecture-cloud-gov.png)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsa%2Fsite-scanning-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgsa%2Fsite-scanning-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsa%2Fsite-scanning-engine/lists"}