https://github.com/rangle/a11y-violations-crawler
A performant webpage crawler that runs each page through axe-core and displays and/or stores the resulting violations for reporting.
- Host: GitHub
- URL: https://github.com/rangle/a11y-violations-crawler
- Owner: rangle
- Created: 2021-01-13T17:44:52.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2021-02-11T19:47:19.000Z (about 5 years ago)
- Last Synced: 2025-01-24T13:32:38.593Z (over 1 year ago)
- Language: CSS
- Size: 1.51 MB
- Stars: 2
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# POC - a11y crawler/scanner
The goal is to give clients and/or developers a tool that surfaces a website's accessibility issues so that they can be addressed.
Create a tool that involves:
- crawling a website
- gathering the internal URLs
- generating an accessibility report for each URL
- generating an accessibility summary report for the whole website
## Installation
The POC uses npm and Express (expressjs), and was created with Node v12.
Run `yarn` from the root folder to install dependencies.
## Pre-Run Setup
Copy the `.env_sample` file in `crawler-scanner/` and rename it to `.env.development`.
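For example, assuming the package sits under `packages/` (as the workspace commands below suggest):
`cp packages/crawler-scanner/.env_sample packages/crawler-scanner/.env.development`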
## Running the crawler/checker from the browser (via React frontend)
- navigate to `localhost:3000`
- on the homepage, there is a small form to launch a scan
- enter the URL and, if you also want to scan (not just crawl), check the box
- press the submit button
- refresh the page once the scan has completed
## Running only the crawler/checker server
`yarn workspace a11y-crawler start`
## Running only the React server
`yarn workspace a11y-frontend-react start`
## Running the crawler from the command line
Running this will launch node-crawler and crawl the URL you have provided.
It will generate a txt result file with all the valid URLs found.
- `cd` into the `packages/crawler-scanner/src/lib` folder
- run `node crawler.js --siteUrl <url> [--saveFile <fileName>]`
- if no `--saveFile` is provided, it defaults to the URL's hostname
- folder/file creation: `./crawls/<hostname>/<timestamp>/<fileName>.txt`
Example call: `node crawler.js --siteUrl https://www.yahoo.ca`
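For reference, a minimal sketch of what a crawl like this can look like with the `crawler` (node-crawler) package; the options and variable names here are illustrative assumptions, not necessarily what `crawler.js` actually does:

```js
const Crawler = require('crawler');

const siteUrl = 'https://www.yahoo.ca'; // starting URL (illustrative)
const { hostname } = new URL(siteUrl);
const seen = new Set([siteUrl]);

const crawler = new Crawler({
  maxConnections: 10,
  callback: (error, res, done) => {
    if (!error && res.$) {
      // res.$ is a cheerio handle over the fetched HTML
      res.$('a[href]').each((_, a) => {
        try {
          const link = new URL(res.$(a).attr('href'), res.options.uri);
          // keep only internal URLs that have not been visited yet
          if (link.hostname === hostname && !seen.has(link.href)) {
            seen.add(link.href);
            crawler.queue(link.href);
          }
        } catch (e) { /* ignore malformed hrefs */ }
      });
    }
    done();
  },
});

// 'drain' fires when the queue is empty: every reachable internal URL is in `seen`,
// ready to be written to the txt result file
crawler.on('drain', () => console.log([...seen].join('\n')));

crawler.queue(siteUrl);
```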
## Running the Puppeteer checker from the command line
Running this will launch the Puppeteer axe checker. It will read the txt result
file generated in the crawl, and generate JSON result files listing out all the
accessibility violations for each URL.
- `cd` into the `packages/crawler-scanner/src/lib` folder
- run `node checker.js --crawlFilePath <pathToCrawlTxtFile> --filePrefix <prefix>`
Example: `node checker.js --crawlFilePath /Users/magalibautista/workspace/rangle/a11y-crawler-poc/src/crawls/www.yahoo.ca/2021-01-15T18-44-23.571Z/www.yahoo.ca.txt --filePrefix yahoo`
- The `scans` folder will be created in `./src/public/`
- A folder named after the hostname will be created in `/scans`
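Similarly, a minimal sketch of the checking step using Puppeteer with the `@axe-core/puppeteer` bindings; the function and file names are illustrative assumptions, not the repo's actual code:

```js
const fs = require('fs');
const puppeteer = require('puppeteer');
const { AxePuppeteer } = require('@axe-core/puppeteer');

// Load a single URL, run axe-core against it, and write the violations to JSON.
async function checkUrl(url, outFile) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  // 'networkidle2' is one answer to "how do we know when a page has loaded?"
  await page.goto(url, { waitUntil: 'networkidle2' });

  const results = await new AxePuppeteer(page).analyze();
  fs.writeFileSync(outFile, JSON.stringify(results.violations, null, 2));

  await browser.close();
}

checkUrl('https://www.yahoo.ca', 'www.yahoo.ca.json').catch(console.error);
```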
## TODOs
- allow more parameters to be passed to the axe-core library for different kinds of scans
- add an option to run Puppeteer headless (flag or dev environment var)
- allow a user to upload a sitemap-type file to bypass crawling (FUTURE)
- make the front-end prettier
- ensure there is no timeout when launching scan from the frontend (long polling?)
- progress bar (via sockets), or notify the user via email
- potentially generate partial results right away
## Notes
- could become a mini SaaS application
- /public should be outside of /src
- make sure the `./scans` folder exists, or create it (use the mkdirp module; see the sketch after this list)
- files should be stored outside of /src
- how do we know when a page has loaded?
- check why google.com keeps looping, and whether we can prevent that
- **FRONT END MUST BE ACCESSIBLE**
- could Cypress replace Puppeteer?
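A minimal sketch for the `./scans` note above; on Node >= 10.12, `fs.mkdirSync` with `recursive: true` behaves like `mkdirp` without the extra dependency:

```js
const fs = require('fs');

// Create ./scans (and any missing parent folders); a no-op if it already exists.
fs.mkdirSync('./scans', { recursive: true });
```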