{"id":33572363,"url":"https://github.com/adobe/spacecat-audit-worker","last_synced_at":"2026-06-17T13:01:26.309Z","repository":{"id":217801951,"uuid":"701157925","full_name":"adobe/spacecat-audit-worker","owner":"adobe","description":"SpaceCat Audit Worker for auditing edge delivery sites.","archived":false,"fork":false,"pushed_at":"2026-06-15T15:55:08.000Z","size":101600,"stargazers_count":13,"open_issues_count":64,"forks_count":13,"subscribers_count":31,"default_branch":"main","last_synced_at":"2026-06-15T16:21:58.390Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adobe.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-10-06T03:47:31.000Z","updated_at":"2026-06-15T15:24:02.000Z","dependencies_parsed_at":"2026-01-03T07:01:48.469Z","dependency_job_id":null,"html_url":"https://github.com/adobe/spacecat-audit-worker","commit_stats":null,"previous_names":["adobe/spacecat-audit-worker"],"tags_count":1832,"template":false,"template_full_name":null,"purl":"pkg:github/adobe/spacecat-audit-worker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adobe%2Fspacecat-audit-worker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adobe%2Fspacecat-audit-worker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adobe%2Fspacecat-audi
t-worker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adobe%2Fspacecat-audit-worker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adobe","download_url":"https://codeload.github.com/adobe/spacecat-audit-worker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adobe%2Fspacecat-audit-worker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34449283,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-11-28T11:03:21.608Z","updated_at":"2026-06-17T13:01:26.299Z","avatar_url":"https://github.com/adobe.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SpaceCat Audit Worker\n\n\u003e SpaceCat Audit Worker for auditing edge delivery sites.\n\n## Status\n[![codecov](https://img.shields.io/codecov/c/github/adobe-rnd/spacecat-audit-worker.svg)](https://codecov.io/gh/adobe-rnd/spacecat-audit-worker)\n[![CircleCI](https://img.shields.io/circleci/project/github/adobe-rnd/spacecat-audit-worker.svg)](https://circleci.com/gh/adobe-rnd/spacecat-audit-worker)\n[![GitHub 
license](https://img.shields.io/github/license/adobe-rnd/spacecat-audit-worker.svg)](https://github.com/adobe-rnd/spacecat-audit-worker/blob/master/LICENSE.txt)\n[![GitHub issues](https://img.shields.io/github/issues/adobe-rnd/spacecat-audit-worker.svg)](https://github.com/adobe-rnd/spacecat-audit-worker/issues)\n[![LGTM Code Quality Grade: JavaScript](https://img.shields.io/lgtm/grade/javascript/g/adobe-rnd/spacecat-audit-worker.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/adobe-rnd/spacecat-audit-worker)\n[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/semantic-release/semantic-release)\n\n## Installation\n\n```bash\n$ npm install @adobe/spacecat-audit-worker\n```\n\n## Usage\n\nSee the [API documentation](docs/API.md).\n\n## Development\n\n### Build\n\n```bash\n$ npm install\n```\n\n### Test\n\n```bash\n$ npm test\n```\n\n### Lint\n\n```bash\n$ npm run lint\n```\n\n## Message Body Formats\n\nAudit worker consumes the `AUDIT_JOBS_QUEUE` queue, performs the requested audit, then queues the result to `AUDIT_RESULTS_QUEUE` for the interested parties to consume later on.\n\nExpected message body format in `AUDIT_JOBS_QUEUE` is:\n\n```json\n{\n  \"type\": \"string\",\n  \"siteId\": \"string\"\n}\n```\n\nOutput message body format sent to `AUDIT_RESULTS_QUEUE` is:\n\n```json\n{\n  \"type\": \"string\",\n  \"url\": \"string\",\n  \"auditContext\": \"object\",\n  \"auditResult\": \"object\"\n}\n```\n\n## How to Run Locally\n\n**Prerequisite:** Connection to Adobe Corp VPN is required for accessing KLAM and Vault.\n\n### 1. Using `nodemon` and AWS Credentials\n\nEveryone working on Spacecat should have access to the development environments via [KLAM](https://klam.corp.adobe.com/). 
\nIf you don’t have access, please refer to the engineering onboarding guide or contact your Spacecat team representative.\n\nAfter logging into KLAM, you’ll receive the following credentials required to access AWS resources such as S3 for local development:\n\n- `AWS_ACCESS_KEY_ID`\n- `AWS_SECRET_ACCESS_KEY`\n- `AWS_SESSION_TOKEN`\n\n**IMPORTANT: DO NOT USE THE AWS TOKENS FROM KLAM PRODUCTION PROFILES. USE ONLY DEV TOKENS.**\n\n\n### Steps to use `nodemon`\n\n#### 1. Configure Local Environment Variables\n\nBoth development scripts (`npm start` and `npm run start:unpacked`) require environment variables to be set. The **`start:unpacked`** script uses [dotenv](https://github.com/motdotla/dotenv) to automatically load them from a `.env` file in the project root. For **`npm start`**, you need to manually export the variables in your shell or use a tool to load `.env` before running the command.\n\nThe `.env` file should contain:\n\n1. **AWS Credentials** (from KLAM) for accessing S3 and other AWS services\n2. **Application Secrets** required by various audits (API keys, service endpoints, etc.)\n\n**Creating your `.env` file:**\n\nCreate a `.env` file in the root directory with the following structure:\n\n```bash\n# AWS Credentials (from KLAM - use DEV profile only)\nAWS_REGION=us-east-1\nAWS_ACCESS_KEY_ID=\u003cyour-access-key-from-klam\u003e\nAWS_SECRET_ACCESS_KEY=\u003cyour-secret-key-from-klam\u003e\nAWS_SESSION_TOKEN=\u003cyour-session-token-from-klam\u003e\n\n# Core Spacecat Configuration\nPOSTGREST_URL=\u003cyour-postgrest-endpoint\u003e\nS3_SCRAPER_BUCKET_NAME=spacecat-scraper-results\n\n# Add additional secrets as needed for specific audits\n# Example: SEO_API_KEY=your-key-here\n# Example: SLACK_BOT_TOKEN=your-token-here\n```\n\n**Where to get application secrets:**\n\nApplication secrets (API keys, tokens, etc.) are stored in AWS Secrets Manager. 
You can retrieve them using:\n\n```bash\n# Fetch secrets from AWS Secrets Manager and save to .env\n./scripts/populate-env.sh\n```\n\nThis script pulls all secrets from `/helix-deploy/spacecat-services/audit-worker/latest` and appends them to your `.env` file.\n\n**Important Notes:**\n- **Never commit `.env` to git** - it's already in `.gitignore`\n- **Use only DEV credentials from KLAM** - never production tokens\n- **Refresh AWS credentials regularly** - KLAM tokens expire after a few hours\n- Only `npm run start:unpacked` automatically loads `.env` via dotenv\n- For `npm start`, either export variables manually or use `source .env` (with proper format)\n\n#### 2. Run/Debug with `npm start` (Source Mode)\n\nOnce your `.env` file is set up, you'll need to export the environment variables before starting the dev server.\n\n**Option A: Export variables in your shell**\n```bash\nexport AWS_REGION=us-east-1\nexport AWS_ACCESS_KEY_ID=\u003cyour-key\u003e\nexport AWS_SECRET_ACCESS_KEY=\u003cyour-secret\u003e\nexport AWS_SESSION_TOKEN=\u003cyour-token\u003e\n# ... export other variables\nnpm start\n```\n\n**Option B: Use a shell script to source `.env`**\n```bash\n# Make sure your .env file has proper export syntax:\n# export AWS_REGION=us-east-1\n# export AWS_ACCESS_KEY_ID=...\n\nsource .env\nnpm start\n```\n\nThis runs the source code directly with hot-reloading. To use breakpoints, make sure to use the debugging tools provided by your IDE (e.g., VSCode, WebStorm, etc.).\n\n**Note:** `npm start` does **not** automatically load `.env` - you must set environment variables manually. The `test/dev/server.mjs` script relies on `process.env` already being populated.\n\n#### 3. Run/Debug with `npm run start:unpacked` (Bundle Mode)\n\nTo test the **actual bundled Lambda artifact** that gets deployed to AWS, use the `start:unpacked` script. This is useful for debugging bundle-specific issues like missing dependencies or runtime module resolution problems.\n\n**Steps:**\n\n1. 
**Ensure your `.env` file is configured** (see step 1 above)\n   \n   Unlike `npm start`, the `start:unpacked` script (`test/dev/server-unpacked.mjs`) uses `dotenv` to **automatically load** your `.env` file. No manual exports needed! The bundled code will also attempt to fetch additional secrets from AWS Secrets Manager using the path `/helix-deploy/spacecat-services/audit-worker/latest`.\n\n2. **Build the bundle:**\n   ```bash\n   npm run build\n   ```\n\n3. **Prepare the unpacked bundle:**\n   ```bash\n   # Remove old artifacts\n   rm -rf dist/spacecat-services/unpacked\n   \n   # Unzip the bundle into the unpacked directory\n   cd dist/spacecat-services\n   unzip audit-worker@*.zip -d unpacked/\n   cd ../..\n   ```\n\n4. **Start the dev server:**\n   ```bash\n   npm run start:unpacked\n   ```\n   \n   You should see output like:\n   ```\n   ✓ Loaded .env from: /path/to/.env\n   Unpacked bundle loaded successfully (using lambda adapter)\n   loaded 23 package parameter in 408ms\n   loaded 82 package parameter in 5374ms\n   Started development server at http://localhost:3000/\n   ```\n\n5. **Attach the Chrome debugger:**\n   - Open Chrome and navigate to `chrome://inspect`\n   - Click \"Configure...\" and ensure `localhost:9229` is listed\n   - Under \"Remote Target\", click \"inspect\" on the running Node process\n   - Set breakpoints in the bundled code at `dist/spacecat-services/unpacked/index.js`\n\n**Note:** The bundled version reflects the exact production runtime behavior, including:\n- How helix-deploy packages dependencies\n- How secrets are loaded from AWS Secrets Manager\n- How the Lambda adapter processes requests\n\n#### 3. Trigger an Audit\n\nWith the server running, you can trigger an audit using a `curl` POST request. 
The request body should include the audit type and `siteId`:\n\n```json\n{\n  \"type\": \"\u003caudit handler name\u003e\",\n  \"siteId\": \"\u003csiteId\u003e\"\n}\n```\n\n- A list of audit handler names can be found in the [index.js file](https://github.com/adobe/spacecat-audit-worker/blob/main/src/index.js#L45).\n- You can retrieve a `siteId` using:\n    - The [Spacecat API](https://opensource.adobe.com/spacecat-api-service/#tag/site/operation/getSiteByBaseUrl)\n    - The Slack command: `@spacecat-dev get site domain.com`\n\nExample `curl` request to trigger the \"apex\" audit:\n\n```bash\ncurl -X POST http://localhost:3000 \\\n     -H \"Content-Type: application/json\" \\\n     -d '{ \"type\": \"apex\", \"siteId\": \"9ab0575a-c238-4470-ae82-9d37fb2d0e78\" }'\n```\n\n#### 4. Inspect the Audit Result\n\nOnce the audit completes, the results are saved to the Postgres data store.\n\nTo retrieve the audit result, use the [Spacecat API](https://opensource.adobe.com/spacecat-api-service/#tag/audit/operation/getLatestAuditForSite).\n\nFor example, to fetch the result for the \"apex\" audit triggered above:\n\n```bash\ncurl -H \"x-api-key: \u003cYOUR_API_KEY\u003e\" \\\n     \"https://spacecat.experiencecloud.live/api/ci/sites/9ab0575a-c238-4470-ae82-9d37fb2d0e78/latest-audit/apex\"\n```\n\n**Note:**  \nAlways verify the timestamp of the returned audit result. If the audit failed to save (e.g., due to a bug), you might receive results from a previous run.\n\n\n\n\n### 2. Using AWS SAM and Docker.\n\n1. Ensure you have [Docker](https://docs.docker.com/desktop/setup/install/mac-install/), [AWS SAM](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html) and [jq](https://jqlang.org/) installed.\n2. Login to AWS using [KLAM](https://klam.corp.adobe.com/) and login with your [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).\n    * KLAM dev project: `SpaceCat Development (AWS3338)`\n3. 
To provide secrets to the audit, please run `./scripts/populate-env.sh` once. It will fetch all secrets from AWS Secrets Manager.\n4. To run the audit locally, execute the following commands:\n    ```bash\n    source env.sh\n    npm run local-build\n    npm run local-run\n    ```\n5. The starting point of the execution is `src/index-local.js`. The output of the audit can be found in `output.txt`.\n6. To hot reload any changes in the `/src` folder, you can use `npm run local-watch`. Note: This requires running `npm run local-build` at least once beforehand.\n\nIf you need to add additional secrets, make sure to adjust the Lambda `template.yml` accordingly.\n\n\n## Audit Worker Flow\n\n![SpaceCat (Star Catalogue) - Audit Flow](https://github.com/adobe/spacecat-audit-worker/assets/1171225/78632887-3edf-4aee-b28a-4cecc3c28fc8)\n\n\n## What is a Spacecat Audit\n\nA Spacecat audit is an operation designed for various purposes, including inspection, data collection, verification, and more, all performed on a given `URL`. Spacecat supports two types of audits:\n\n1. **Traditional (Runner-based) Audits**: Single-function audits that execute their logic in one pass. Best for straightforward checks like uptime monitoring or performance scoring.\n\n2. **Step-based Audits**: Multi-step workflows where each step can be processed by different specialized workers. Ideal for complex scenarios requiring different processing capabilities or coordination between multiple services.\n\nSpacecat audits run periodically: weekly, daily, and even hourly. By default, the results of these audits are automatically stored in the Postgres data store and sent to the `audit-results-queue`. The results can then be queried by type via [the Spacecat API](https://opensource.adobe.com/spacecat-api-service/#tag/audit).\n\n## Audit Steps\n\nA Spacecat audit consists of seven steps, six of which are provided by default. 
The only step that typically changes between different audits is the core runner, which contains the business logic.\n\n1. **Site Provider**: This step reads the message with `siteId` information and retrieves the site object from the database. By default, the `defaultSiteProvider` reads the site object from the Star Catalogue. This step can be overridden.\n1. **Org Provider**: This step retrieves the organization information from the Star Catalogue. This step can be overridden.\n1. **URL Resolver**: This step calculates which URL to run the audit against. By default, the `defaultUrlResolver` sends an HTTP request to the site's `baseURL` and returns the `finalURL` after following the redirects. This step can be overridden.\n1. **Runner**: The core function that contains the audit's business logic. **No default runner is provided**. The runner should return an object with `auditResult`, which holds the audit result, and `fullAuditRef`, a string that holds a reference (often a URL) to the audit.\n1. **Persister**: The core function that stores the `auditResult`, `fullAuditRef`, and the audit metadata. By default, the `defaultPersister` stores the information back in the Star Catalogue.\n1. **Message Sender**: The core function that sends the audit result to a downstream component via a message (queue, email, HTTP). By default, the `defaultMessageSender` sends the audit result to the `audit-results-queue` in Spacecat.\n1. **Post Processors**: A list of post-processing functions that further process the audit result for various reasons. By default, no post processor is provided. 
These should be added only if needed.\n\n## How to create a new Audit\n\nWhen implementing a new audit, first decide which type of audit best suits your needs:\n\n- Choose a **Traditional (Runner-based) Audit** when:\n  - Your audit logic can execute in a single pass\n  - You don't need to coordinate with other specialized workers\n  - The processing time fits within Lambda execution limits\n  - Example: Checking site uptime, collecting CWV metrics\n\n- Choose a **Step-based Audit** when:\n  - Your audit requires multiple specialized processing steps\n  - You need to coordinate with other workers (e.g., content scrapers)\n  - Processing might exceed Lambda execution limits\n  - Example: Content analysis requiring scraping, processing, and analysis steps\n\n### Creating a Traditional Audit\n\nTo create a traditional audit, you'll need to create an audit handler function. This function should accept a `url` and a `context` (see [HelixUniversal](https://github.com/adobe/helix-universal/blob/main/src/adapter.d.ts#L120) ) object as parameters, and it should return an `auditResult` along with `fullAuditRef`. Here's an example:\n\n```js\nexport async function auditRunner(url, context) {\n\n  // your audit logic goes here...\n\n  return {\n    auditResult: results,\n    fullAuditRef: baseURL,\n  };\n}\n\nexport default new AuditBuilder()\n  .withRunner(auditRunner)\n  .build();\n\n```\n\n### Creating a Step-based Audit\n\nFor step-based audits, use the `addStep()` method...\n\n### How to customize audit steps\n\nAll audits share common components, such as persisting audit results to a database or sending them to SQS for downstream components to consume. These common functionalities are managed by default functions. 
However, if desired, you can override them as follows:\n\n```js\nexport async function auditRunner(url, context) {\n\n  // your audit logic goes here...\n\n  return {\n    auditResult: results,\n    fullAuditRef: baseURL,\n  };\n}\n\nexport async function differentUrlResolver(site) {\n  // logic to override the default behavior of the audit step\n\n  return 'url';\n}\n\nexport default new AuditBuilder()\n  .withUrlResolver(differentUrlResolver)\n  .withRunner(auditRunner)\n  .build();\n\n```\n\n### How to prevent audit results from being sent to the SQS queue\n\nWith a no-op `messageSender`, audit results are not sent to the audit results SQS queue:\n\n```js\nexport async function auditRunner(url, context) {\n\n  // your audit logic goes here...\n\n  return {\n    auditResult: results,\n    fullAuditRef: baseURL,\n  };\n}\n\nexport default new AuditBuilder()\n  .withRunner(auditRunner)\n  .withMessageSender(() =\u003e {}) // no-op message sender\n  .build();\n\n```\n\n### How to add a custom post processor\n\nYou can add a post-processing step for your audit using `AuditBuilder`'s `withPostProcessors` function. The list of post-processing functions will be executed sequentially after the audit run.\n\nPost-processor functions take `auditUrl` and `auditData` as their first two parameters. 
The `auditData` object contains the following properties:\n\n```\nauditData = {\n  siteId: string,\n  isLive: boolean,\n  auditedAt: string,\n  auditType: string,\n  auditResult: object,\n  fullAuditRef: string,\n};\n```\n\nHere's the full example:\n\n```js\nexport async function auditRunner(url, context) {\n\n  // your audit logic goes here...\n\n  return {\n    auditResult: results,\n    fullAuditRef: baseURL,\n  };\n}\n\nasync function postProcessor(auditUrl, auditData, context) {\n  // your post-processing logic goes here\n  // you can obtain the dataAccess from context\n  // const { dataAccess } = context;\n}\n\nexport default new AuditBuilder()\n  .withRunner(auditRunner)\n  .withPostProcessors([ postProcessor ]) // you can submit multiple post processors\n  .build();\n\n```\n\n### How to add Opportunities and Suggestions\n\nIn the handler, the `opportunityAndSuggestions` function is responsible for converting audit data into an opportunity and synchronizing suggestions.\n\nThis function utilizes the `convertToOpportunity` function to create or update an opportunity based on the audit data and type.\n\nThe `buildKey` function is used to generate a unique key for each suggestion based on specific properties of the audit data.\n\nIt then uses the `syncSuggestions` function to map new suggestions to the opportunity and synchronize them.\n\n```js\nimport { syncSuggestions } from '../utils/data-access.js';\nimport { convertToOpportunity } from '../common/opportunity.js';\nimport { createOpportunityData } from './opportunity-data-mapper.js';\n\nexport async function opportunityAndSuggestions(auditUrl, auditData, context) {\n  const opportunity = await convertToOpportunity(\n    auditUrl,\n    auditData,\n    context,\n    createOpportunityData,\n    auditType,\n  );\n\n  const { log } = context;\n  \n  // buildKey and syncSuggestions logic based on the auditType goes here...\n}\n```\n\n```js\nexport default new AuditBuilder()\n  .withRunner(auditRunner)\n  
.withPostProcessors([opportunityAndSuggestions])\n  .build();\n```\n\n\nThe logic for converting to an opportunity is in `common/opportunity.js`. The function `convertToOpportunity` is used to create a new opportunity or update an existing one based on the audit type. The function takes the audit URL, audit data, context, createOpportunityData, auditType, and props as arguments. It first fetches the opportunities for the site. If the opportunity is not found, it creates a new one. If the opportunity is found, it updates the existing one with the new data. The function returns the opportunity entity.\n\n\nHow to map the opportunity data in the handler's `opportunity-data-mapper.js` file:\n\n```js\nexport function createOpportunityData(parameters) {\n  return {\n    runbook: 'runbook',\n    origin: 'origin',\n    title: 'title',\n    description: 'description',\n    guidance: {\n      steps: [\n        'step1',\n        'step2',\n      ],\n    },\n    tags: ['tag1'],\n    data: {data},\n  };\n}\n```\n\n\n### Auto-Detection of Publish and Regression Detection\n\nThe `syncSuggestionsWithPublishDetection` function extends `syncSuggestions` with automatic detection of when fixes are published and regression detection.\n\n#### Features\n\n1. **Reconcile Disappeared Suggestions**: When a suggestion disappears from audit data (e.g., a broken backlink is fixed), the system can automatically:\n   - Check if the issue was fixed using the AI-suggested fix (`isIssueFixedWithAISuggestion` callback)\n   - Mark the suggestion as `FIXED`\n   - Create a `FixEntity` to track the fix\n\n2. **Publish Deployed Fix Entities**: After reconciliation, the system verifies deployed fixes on production:\n   - Calls `isIssueResolvedOnProduction` callback for each deployed fix\n   - Moves `FixEntity` from `DEPLOYED` to `PUBLISHED` status when verified\n\n3. 
**Regression Detection**: When a previously `FIXED` suggestion reappears in audit data:\n   - Checks if all fix entities are fully published\n   - If so, logs a warning about potential regression\n\n#### Usage\n\n```js\nimport { syncSuggestionsWithPublishDetection } from '../utils/data-access.js';\n\nawait syncSuggestionsWithPublishDetection({\n  context,\n  opportunity,\n  newData: auditResult.items,\n  buildKey: (item) =\u003e item.uniqueKey,\n  mapNewSuggestion: (item) =\u003e ({ type: 'FIX', data: item }),\n  // Optional: Check if disappeared suggestion was fixed using AI suggestion\n  isIssueFixedWithAISuggestion: async (suggestion) =\u003e {\n    // Return true if the issue was fixed using the suggested fix\n    return checkIfFixedWithSuggestion(suggestion);\n  },\n  // Optional: Build fix entity payload when issue is fixed\n  buildFixEntityPayload: (suggestion, opportunity, isAuthorOnly) =\u003e ({\n    opportunityId: opportunity.getId(),\n    status: isAuthorOnly ? 'DEPLOYED' : 'PUBLISHED',\n    suggestions: [suggestion.getId()],\n  }),\n  // Optional: Verify if issue is resolved on production\n  isIssueResolvedOnProduction: async (suggestion) =\u003e {\n    // Return true if the issue is verified fixed on production\n    return verifyFixOnProduction(suggestion);\n  },\n});\n```\n\n#### Author-Only Opportunity Types\n\nSome opportunity types represent changes that only affect the authoring environment (no publish step required). 
For these types:\n\n- Fix entities are created with `DEPLOYED` status instead of `PUBLISHED`\n- The publish verification step is skipped\n- Regression detection checks for `DEPLOYED` fix entities instead of `PUBLISHED`\n\nCurrently defined author-only types:\n- `security-permissions-redundant`\n- `security-permissions`\n\nTo add a new author-only type, add it to the `AUTHOR_ONLY_OPPORTUNITY_TYPES` array in `src/utils/data-access.js`.\n\n### How to add auto-suggest to an audit\nA new auto-suggest feature can be added as a post processor step to the existing audit.\n\nThe `AuditBuilder` is chaining all post processors together and passing the `auditData` object to each post processor.\nThe `auditData` object can be updated by each post processor and the updated `auditData` object will be passed to the next post processor.\nIf the `auditData` object is not updated by a post processor, the previous `auditData` object will be used.\n\nThe auto-suggest post processor should verify if the site is enabled for suggestions and if the audit was run successfully:\n\n```js\nexport const generateSuggestionData = async (finalUrl, auditData, context, site) =\u003e {\n  const { dataAccess, log } = context;\n  const { Configuration } = dataAccess;\n\n  if (auditData.auditResult.success === false) {\n    log.info('Audit failed, skipping suggestions generation');\n    return { ...auditData };\n  }\n\n  const configuration = await Configuration.findLatest();\n  if (!configuration.isHandlerEnabledForSite('[audit-name]-auto-suggest', site)) {\n    log.info('Auto-suggest is disabled for site');\n    return {...auditData};\n  }\n}\n```\n\n```js\nexport default new AuditBuilder()\n  .withRunner(auditRunner)\n  .withPostProcessors([ generateSuggestionData, convertToOpportunity ])\n  .build();\n```\n\n## Step-Based Audits\n\nSpacecat supports multi-step audit workflows where each step can be processed by different workers. 
This enables complex audit scenarios that may require different processing capabilities or need to be split across multiple services.\n\n### Creating a Step-Based Audit\n\nHere's an example of how to create a step-based audit:\n\n```js\nimport { Audit } from '@adobe/spacecat-shared-data-access';\n\nconst { AUDIT_STEP_DESTINATIONS } = Audit;\n\nexport default new AuditBuilder()\n  // First step: Prepare content scraping\n  .addStep('prepare', async (context) =\u003e {\n    const { site, finalUrl, log } = context;\n    log.info(`Preparing content scrape for ${site.getBaseURL()}`);\n    \n    // First step MUST return auditResult and fullAuditRef\n    return {\n      auditResult: { status: 'preparing' },\n      fullAuditRef: `s3://content-bucket/${site.getId()}/raw.json`,\n      // Additional data for content scraper\n      urls: [{ url: finalUrl }],\n      siteId: site.getId(),\n    };\n  }, AUDIT_STEP_DESTINATIONS.CONTENT_SCRAPER)\n\n  // Second step: Process results\n  .addStep('process', async (context) =\u003e {\n    const { site, audit } = context;\n    // Access previous audit data via audit.getFullAuditRef()\n    return {\n      type: 'content-import',\n      siteId: site.getId(),\n    };\n  }, AUDIT_STEP_DESTINATIONS.IMPORT_WORKER)\n\n  // Final step: Analyze results (no destination needed for final step)\n  .addStep('analyze', async (context) =\u003e {\n    const { audit } = context;\n    const results = await analyzeContent(audit.getFullAuditRef());\n    return {\n      status: 'complete',\n      findings: results,\n    };\n  })\n  .build();\n```\n\n### Step Requirements\n\n1. **First Step**\n   - Must return an object containing both `auditResult` and `fullAuditRef`\n   - These are used to create the initial audit record\n   - Receives `finalUrl` in context (resolved from site's baseURL)\n   - No previous audit data available in context\n\n2. 
**Intermediate Steps**\n   - Must specify a destination queue via `AUDIT_STEP_DESTINATIONS`\n   - Return data will be formatted according to destination requirements\n   - Have access to the audit record via `context.audit`\n   - Can access previous step data via `audit.getFullAuditRef()`\n\n3. **Final Step**\n   - Must not specify a destination\n   - Return data will be stored as the final audit result\n   - Has access to all previous audit data via `context.audit`\n\n### Step Context\n\nEach step receives a context object containing:\n- `site`: The site being audited (with methods like `getBaseURL()`, `getId()`)\n- `audit`: The audit record (undefined for the first step)\n- `finalUrl`: The resolved URL (only in the first step)\n- `scrapeResultPaths`: Map(url -\u003e path) of all successfully scraped URLs (only after a scrape step, and only when the destination is `SCRAPE_CLIENT`)\n- Standard context properties (`log`, `dataAccess`, etc.)\n\n### Destinations\n\nThe `AUDIT_STEP_DESTINATIONS` enum defines supported destination queues. 
Each destination has specific payload format requirements:\n\n```js\nCONTENT_SCRAPER: {\n  // Formats payload for content scraper queue\n  payload: {\n    urls: Array\u003c{url: string}\u003e,\n    jobId: string,\n    auditContext: {\n      next: string,\n      auditId: string,\n      auditType: string,\n      fullAuditRef: string\n    }\n  }\n}\n\nIMPORT_WORKER: {\n  // Formats payload for import worker queue\n  payload: {\n    type: string,\n    siteId: string,\n    auditContext: {\n      next: string,\n      auditId: string,\n      auditType: string,\n      fullAuditRef: string\n    }\n  }\n}\n\nSCRAPE_CLIENT: {\n    // Formats payload for scrape client\n    payload: {\n        urls: Array\u003c{url: string}\u003e,\n        processingType: string,\n        options: object,\n        maxScrapeAge: number,\n        auditData: {\n          siteId: string,\n          completionQueueUrl: string, \n          auditContext: {\n            next: string,\n            auditId: string,\n            auditType: string,\n            fullAuditRef: string\n          }\n        }\n    }\n}\n```\n\n### Error Handling\n\nThe step-based audit implementation includes several validations:\n\n1. **Step Configuration**\n   - All steps except the last must specify a valid destination\n   - Step handlers must be functions\n   - First step must return both `auditResult` and `fullAuditRef`\n\n2. **Audit Context**\n   - For subsequent steps, audit ID must be valid\n   - Audit record must exist in database\n   - Audit type must match current audit\n\n3. **Destination Validation**\n   - Destinations must be from `AUDIT_STEP_DESTINATIONS`\n   - Each destination must have valid queue URL and payload formatting\n\nIf any validation fails, the audit will throw an error and stop processing.\n\n### Message Flow Example\n\nHere's how messages flow between workers in a step-based audit:\n\n```js\n// 1. 
Initial trigger message to audit-worker\n{\n  \"type\": \"content-audit\",\n  \"siteId\": \"123\",\n  \"auditContext\": {}\n}\n\n// 2. After first step, audit-worker sends to content-scraper\n{\n  \"urls\": [{ \"url\": \"https://example.com\" }],\n  \"jobId\": \"audit-456\",\n  \"auditContext\": {\n    \"next\": \"process\",\n    \"auditId\": \"audit-456\",\n    \"auditType\": \"content-audit\",\n    \"fullAuditRef\": \"s3://content-bucket/123/raw.json\"\n  }\n}\n\n// 3. Content-scraper completes, sends to audit-worker\n{\n  \"type\": \"content-audit\",\n  \"siteId\": \"123\",\n  \"auditContext\": {\n    \"next\": \"process\",\n    \"auditId\": \"audit-456\",\n    \"auditType\": \"content-audit\",\n    \"fullAuditRef\": \"s3://content-bucket/123/raw.json\"\n  }\n}\n\n// 4. Audit-worker processes second step, sends to import-worker\n{\n  \"type\": \"content-import\",\n  \"siteId\": \"123\",\n  \"auditContext\": {\n    \"next\": \"analyze\",\n    \"auditId\": \"audit-456\",\n    \"auditType\": \"content-audit\",\n    \"fullAuditRef\": \"s3://content-bucket/123/raw.json\"\n  }\n}\n\n// 5. Import-worker completes, sends to audit-worker\n{\n  \"type\": \"content-audit\",\n  \"siteId\": \"123\",\n  \"auditContext\": {\n    \"next\": \"analyze\",\n    \"auditId\": \"audit-456\",\n    \"auditType\": \"content-audit\",\n    \"fullAuditRef\": \"s3://content-bucket/123/processed.json\"\n  }\n}\n\n// 6. Final step completes, audit-worker sends results\n{\n  \"type\": \"content-audit\",\n  \"url\": \"https://example.com\",\n  \"auditContext\": {\n    \"auditId\": \"audit-456\",\n    \"auditType\": \"content-audit\",\n    \"fullAuditRef\": \"s3://content-bucket/123/processed.json\"\n  },\n  \"auditResult\": {\n    \"status\": \"complete\",\n    \"findings\": [/*...*/]\n  }\n}\n```\n\nEach message preserves the `auditContext` to maintain the step chain. 
The `next` field determines which step runs next, while `auditId` and `fullAuditRef` track the audit state across workers.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadobe%2Fspacecat-audit-worker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadobe%2Fspacecat-audit-worker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadobe%2Fspacecat-audit-worker/lists"}