{"id":13341463,"url":"https://github.com/lincprog/CAASES","last_synced_at":"2025-03-11T21:31:06.613Z","repository":{"id":240806380,"uuid":"750414952","full_name":"lincprog/CAASES","owner":"lincprog","description":"CAASES is a web crawler for extracting website accessibility information using ASES. It provides metrics like URL, title, HTML file size, and eMAG-based error/warning summaries across markup, behavior, content, design, multimedia, and forms.","archived":false,"fork":false,"pushed_at":"2024-05-20T20:19:06.000Z","size":25,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-24T08:30:55.405Z","etag":null,"topics":["web-accessibility","web-crawler"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lincprog.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-30T15:52:03.000Z","updated_at":"2024-05-23T01:06:09.000Z","dependencies_parsed_at":"2024-05-21T00:25:38.813Z","dependency_job_id":null,"html_url":"https://github.com/lincprog/CAASES","commit_stats":null,"previous_names":["lincprog/caases"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lincprog%2FCAASES","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lincprog%2FCAASES/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lincprog%2FCAASES/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/lincprog%2FCAASES/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lincprog","download_url":"https://codeload.github.com/lincprog/CAASES/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243115372,"owners_count":20238747,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["web-accessibility","web-crawler"],"created_at":"2024-07-29T19:25:26.355Z","updated_at":"2025-03-11T21:31:06.322Z","avatar_url":"https://github.com/lincprog.png","language":"Python","readme":"# **CAASES**\r\n\r\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/Selenium)](https://www.python.org/downloads/)  [![License: MIT](https://img.shields.io/badge/License-MIT-red.svg)](https://www.gnu.org/licenses/mit)\r\n\r\nCAASES is a web crawler created for extracting accessibility information from websites through the [Avaliador e Simulador de Acessibilidade em Sítios (ASES)](https://asesweb.governoeletronico.gov.br/)\r\n\r\nThe collected information includes:\r\n\r\n## Page Info\r\n\r\n|    Variable         | Description | Dtype |\r\n|---------------------|-------------|-------|\r\n| `url`               | URL of the web page\t | string |\r\n| `name`              | The title of the web page | string |\r\n| `size_bytes`        | The size of the page's HTML file in bytes | float |\r\n| `with_html`         | Indicates whether information was collected from the HTML file or the URL | boolean |\r\n| `num_lines_of_code` | The number of lines of code in the web page | integer |\r\n\r\n## 
Errors/Warnings Summary\r\n\r\nThe accessibility summary based on [eMAG](https://emag.governoeletronico.gov.br/) recommendations presents errors and warnings for six evaluation sections: **markup**, **behavior**, **content/information**, **presentation/design**, **multimedia** and **forms**.\r\n\r\n| Variable              | Description              | Dtype   |\r\n|-----------------------|--------------------------|---------|\r\n| `url`                 | URL of the web page\t | string |\r\n| `ases_pct`            | Web page accessibility percentage | float |\r\n| `n_markup_errors`     | Number of markup errors on the page | integer |\r\n| `n_behavior_errors`   | Number of behavior errors on the page | integer |\r\n| `n_information_errors`| Number of information errors on the page | integer |\r\n| `n_presentation_errors`| Number of presentation errors on the page | integer |\r\n| `n_multimedia_errors` | Number of multimedia errors on the page | integer |\r\n| `n_form_errors`       | Number of form errors on the page | integer |\r\n| `n_markup_warnings`   | Number of markup warnings on the page | integer |\r\n| `n_behavior_warnings` | Number of behavior warnings on the page | integer |\r\n| `n_information_warnings`| Number of information warnings on the page | integer |\r\n| `n_presentation_warnings`| Number of presentation warnings on the page | integer |\r\n| `n_multimedia_warnings` | Number of multimedia warnings on the page | integer |\r\n| `n_form_warnings`     | Number of form warnings on the page | integer |\r\n\r\n## Errors/Warnings eMAG\r\n\r\n| Variable           | Description       | Dtype       |\r\n|--------------------|-------------------|-------------|\r\n| `url`              | URL of the web page\t | string |\r\n| `category`         | Category of the content (`mark`, `behavior`, `information`, `presentation`, `multimedia`, `form`) | string |\r\n| `info_type`        | Type of information (`error` or `warning`) | string |\r\n| `recommendation`   | eMAG 
recommendation | string |\r\n| `count`            | Quantity of recommendations | integer |\r\n| `source_code_lines`| Lines of code to which the recommendation applies | list(string) |\r\n\r\n## Installation \u0026 File Tree\r\n\r\nCAASES can be installed directly from the source using the following commands:\r\n\r\n```bash\r\ngit clone https://github.com/lincprog/CAASES.git\r\ncd CAASES\r\npip install -r requirements.txt\r\n```\r\n\r\nThe project structure is defined as follows:\r\n```\r\n📦CAASES\r\n ┣ 📂data  (data collected by the crawler, including the tables described above)\r\n ┃ ┣ 📜emag_summary.csv\r\n ┃ ┣ 📜err_warn_summary.csv\r\n ┃ ┗ 📜page_info.csv\r\n ┣ 📂html_files  (stores HTML files downloaded by the crawler)\r\n ┣ 📂logs  (stores log files generated by the crawler)\r\n ┣ 📜broken_urls.txt  (contains a list of broken URLs encountered during execution)\r\n ┣ 📜docker-compose.yml (Docker Compose configuration file)\r\n ┣ 📜main.py  (runs the crawling process)\r\n ┣ 📜models.py  (definitions for structures used for data collection)\r\n ┣ 📜README.md\r\n ┣ 📜requirements.txt  (dependencies required for the project)\r\n ┣ 📜urls.txt  (a list of URLs to be processed by the crawler)\r\n ┗ 📜utils.py  (utility functions and helper methods used throughout the project)\r\n```\r\n\r\n## Usage\r\n\r\n**Please ensure [Docker](https://www.docker.com/products/docker-desktop/) is installed on your system.**\r\n\r\n1. Navigate to the directory containing the `docker-compose.yml` file.\r\n2. Open a terminal window.\r\n3. Run the command below to start the Docker containers defined in the `docker-compose.yml` file:\r\n```bash\r\ndocker-compose up -d\r\n```\r\n4. Once the containers are running, navigate to the directory containing the **main\\.py** file.\r\n5. 
Execute the **main\\.py** file using the appropriate command for your Python environment:\r\n```bash\r\npython main.py\r\n```\r\n\r\nOptionally, to monitor the execution in Selenium Grid, open http://localhost:4444 in your browser.\r\n\r\n## CAASES in action\r\n\r\nThe following works use CAASES:\r\n\r\n- Marcos, C. O., Gustavo, S. S., \u0026 Antonio, F. L. J. J. (2024). Dados da avaliação de acessibilidade Web nos portais das Instituições de Ensino Superior no Brasil [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10612128","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flincprog%2FCAASES","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flincprog%2FCAASES","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flincprog%2FCAASES/lists"}