https://github.com/autogluon/autogluon-brokenlinks
Repo to hold data on broken links of AutoGluon websites for reference
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/autogluon/autogluon-brokenlinks
- Owner: autogluon
- License: apache-2.0
- Created: 2023-07-31T21:14:29.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2025-03-01T08:10:28.000Z (over 1 year ago)
- Last Synced: 2025-03-02T03:48:02.104Z (over 1 year ago)
- Language: Python
- Size: 353 KB
- Stars: 0
- Watchers: 8
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# AutoGluon Link Checker
AutoGluon Link Checker crawls AutoGluon's documentation and identifies broken links. It runs daily through GitHub Actions and produces reports for both the stable and development versions of the documentation.
## Features
- **Crawling**: Checks the links found in the documentation.
- **Edge Case Handling**: Manages common issues like bot detection and DNS problems.
- **Reporting**: Generates comprehensive CSV reports highlighting broken links and their origins.
- **Configurable Allowlist**: Easily manage known false positives to reduce noise in reports.
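To make the crawling and reporting features concrete, here is a minimal sketch of how links might be collected from one page and how broken ones might be written out with their origin. It is not taken from `get_broken_links.py`: the `LinkCollector` class, the `extract_links` and `write_report` helpers, and the CSV column names are all hypothetical, and only the standard library is used.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href targets from <a> tags on one page (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they were found on.
                self.links.append(urljoin(self.base_url, value))


def extract_links(page_url: str, html: str) -> list:
    """Return every link target found in `html`, resolved against `page_url`."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return collector.links


def write_report(rows, out) -> None:
    """Write (origin page, broken link, status) rows as CSV."""
    writer = csv.writer(out)
    writer.writerow(["origin", "broken_link", "status"])  # assumed column names
    writer.writerows(rows)
```

Recording the origin page next to each broken link is what lets a maintainer find the page that needs fixing, which is presumably why the reports include it.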
## Installation
1. **Clone the Repository**:
```bash
git clone https://github.com/autogluon/autogluon-brokenlinks.git
cd autogluon-brokenlinks
```
2. **Install Dependencies**:
Ensure you have Python 3.9 installed. Then, install the required packages:
```bash
pip install -r requirements.txt
```
## Usage
Run the link checker script with the following command:
```bash
python get_broken_links.py
```
This will generate CSV files with broken links for both stable and development documentation.
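Once a report exists, a short script can summarize it. The sketch below counts rows per value of a chosen column. The `count_by_column` helper and the header names in the sample data are assumptions for illustration; check the header of an actual generated CSV and adjust the column name accordingly.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column: str) -> Counter:
    """Count report rows grouped by the value in `column`."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Stand-in for a generated report; the real header may differ.
sample = io.StringIO(
    "origin,broken_link,status\n"
    "https://auto.gluon.ai/stable/index.html,https://example.com/a,404\n"
    "https://auto.gluon.ai/stable/index.html,https://example.com/b,500\n"
    "https://auto.gluon.ai/stable/install.html,https://example.com/c,404\n"
)
counts = count_by_column(sample, "origin")
for page, n in counts.most_common():
    print(n, page)
```

To run this against a real report, replace `sample` with an open file handle, for example `open(path, newline="")`, where `path` points at one of the generated CSV files.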
## GitHub Actions Integration
The link checker is set up to run daily using GitHub Actions. The workflow is defined in `.github/workflows/broken_link_checker.yml`. It automatically commits and pushes CSV reports of broken links to the repository.
## Configuration
- **Allowed Domains**: Modify `ALLOWED_403_DOMAINS` in `get_broken_links.py` to add domains that are allowed to return 403 errors.
- **Ignored URLs**: Update `IGNORE_STRINGS_IN_URL` to skip specific URLs during the check.
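As a rough illustration of how these two settings could influence a check, consider the sketch below. Only the names `ALLOWED_403_DOMAINS` and `IGNORE_STRINGS_IN_URL` come from this README. Their values here, the `should_skip` and `is_broken` helpers, and the exact matching rules are assumptions, so the real script may behave differently.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration; the real lists live in get_broken_links.py.
ALLOWED_403_DOMAINS = ["linkedin.com", "example.org"]
IGNORE_STRINGS_IN_URL = ["localhost", "mailto:"]


def should_skip(url: str) -> bool:
    """Return True if the URL contains any ignored substring."""
    return any(fragment in url for fragment in IGNORE_STRINGS_IN_URL)


def is_broken(url: str, status_code: int) -> bool:
    """Classify a response, tolerating 403 from allowlisted domains."""
    if status_code < 400:
        return False
    host = urlparse(url).hostname or ""
    # Match the domain itself or any of its subdomains.
    allowed = any(host == d or host.endswith("." + d) for d in ALLOWED_403_DOMAINS)
    if status_code == 403 and allowed:
        return False
    return True
```

In this sketch the allowlist only excuses a 403 response, so a 404 from an allowlisted domain is still reported. Some sites answer automated requests with 403 even when the page exists, which is the kind of false positive an allowlist like this is meant to suppress.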
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
## License
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.
## Contact
For questions or feedback, please open an issue on GitHub or contact the maintainers directly.