{"id":31785320,"url":"https://github.com/quantecon/action-link-checker","last_synced_at":"2025-10-10T11:52:57.226Z","repository":{"id":317467610,"uuid":"1067525711","full_name":"QuantEcon/action-link-checker","owner":"QuantEcon","description":"AI-powered GitHub Action for validating web links in HTML files","archived":false,"fork":false,"pushed_at":"2025-10-01T03:21:24.000Z","size":37,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-01T04:23:56.766Z","etag":null,"topics":["ai","documentation","github-action","html-validation","link-checker","quantecon"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QuantEcon.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"numfocus","custom":"https://numfocus.org/donate-to-quantecon"}},"created_at":"2025-10-01T01:36:24.000Z","updated_at":"2025-10-01T03:21:28.000Z","dependencies_parsed_at":"2025-10-01T04:24:00.558Z","dependency_job_id":"decec775-746c-4e31-ab9c-e2e06f1974ee","html_url":"https://github.com/QuantEcon/action-link-checker","commit_stats":null,"previous_names":["quantecon/action-link-checker"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/QuantEcon/action-link-checker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantEcon%2Faction-link-checker","tag
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantEcon%2Faction-link-checker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantEcon%2Faction-link-checker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantEcon%2Faction-link-checker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QuantEcon","download_url":"https://codeload.github.com/QuantEcon/action-link-checker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantEcon%2Faction-link-checker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279003717,"owners_count":26083610,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","documentation","github-action","html-validation","link-checker","quantecon"],"created_at":"2025-10-10T11:52:49.229Z","updated_at":"2025-10-10T11:52:57.214Z","avatar_url":"https://github.com/QuantEcon.png","language":"Python","readme":"# AI-Powered Link Checker Action\n\n[![CI](https://github.com/QuantEcon/action-link-checker/actions/workflows/ci.yml/badge.svg)](https://github.com/QuantEcon/action-link-checker/actions/workflows/ci.yml)\n[![GitHub 
Marketplace](https://img.shields.io/badge/Marketplace-AI%20Link%20Checker-blue.svg?colorA=24292e\u0026colorB=0366d6\u0026style=flat\u0026longCache=true\u0026logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAAOCAYAAAAfSC3RAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAM6wAADOsB5dZE0gAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAAERSURBVCiRhZG/SsMxFEafKoEMFhyrdsFt6FYHNycunTo0Q4LgEhcnW4PgYgchGjoYiQ6ON2ARpK9nCxjUjuIFP+B+h3O/xmE2yVxPOJGkC3RgJ8qA3bQn7SiTKCQdC4J8HDW0v85CZaUHNzxhQcHdJvjZwM4mXaKJ4BdDMKxIsYoim1Smk2X6HPUdCnU5gO5D9POqvayBzY8nwoJJ+G9h9vGB0U8h8dNPgGLKlv1n6cJgAjjfY9lv1CVKq5f3oUAe5dJz9n3RkBhGA1ouJ/hT5a4c8yQQYSdF8vhN5gT1igMgZ9nJgzUqm9E1V+8rbYQhptmEURKA=)](https://github.com/marketplace/actions/ai-link-checker)\n\nA sophisticated GitHub Action that validates web links in HTML files with AI-powered suggestions for improvements. Goes beyond traditional link checkers by providing intelligent recommendations and handling modern web challenges.\n\n## 🎯 Features\n\n- **🤖 AI-Powered Suggestions**: Intelligent recommendations for broken or redirected links\n- **🔍 Smart Detection**: Bot-blocking awareness and enhanced robustness \n- **⚡ Performance Optimized**: Respectful rate limiting and efficient scanning\n- **🎛️ Flexible Modes**: Full project or PR-changed files scanning\n- **🔧 Highly Configurable**: Custom timeouts, status codes, and behaviors\n- **📊 Rich Reporting**: GitHub issues, artifacts, and detailed JSON output\n- **📚 Documentation Ready**: Perfect for Jupyter Book and documentation sites\n\n## Features\n\n- **Smart Link Validation**: Checks external web links in HTML files with configurable timeout and redirect handling\n- **Enhanced Robustness**: Intelligent detection of bot-blocked sites to reduce false positives\n- **AI-Powered Suggestions**: Provides intelligent recommendations for broken or redirected links\n- **Two Scanning Modes**: Full project scan or PR-specific changed files only  \n- **Configurable Status Codes**: Define which HTTP status codes to 
silently report (e.g., 403, 503)\n- **Redirect Detection**: Identifies and suggests updates for redirected links\n- **GitHub Integration**: Creates issues, PR comments, and workflow artifacts\n- **MyST Markdown Support**: Works with Jupyter Book projects by scanning HTML output\n- **Performance Optimized**: Respectful rate limiting, improved timeouts, and efficient scanning\n\n## Usage\n\n### Basic Usage\n\n```yaml\n- name: Check links in documentation\n  uses: QuantEcon/action-link-checker@v1\n```\n\n### Weekly Full Project Scan\n\n```yaml\nname: Weekly Link Check\non:\n  schedule:\n    - cron: '0 9 * * 1'  # Monday at 9 AM UTC\n  workflow_dispatch:\n\njobs:\n  link-check:\n    runs-on: ubuntu-latest\n    permissions:\n      contents: read\n      issues: write\n    steps:\n      - uses: actions/checkout@v4\n        with:\n          ref: gh-pages  # Check the published site\n      \n      - name: AI-powered link check\n        uses: QuantEcon/action-link-checker@v1\n        with:\n          html-path: '.'\n          mode: 'full'\n          fail-on-broken: 'false'\n          create-issue: 'true'\n          ai-suggestions: 'true'\n          silent-codes: '403,503'\n          issue-title: 'Weekly Link Check Report'\n          notify: 'maintainer1,maintainer2'\n```\n\n### PR-Triggered Changed Files Only\n\n```yaml\nname: PR Link Check\non:\n  pull_request:\n    branches: [ main ]\n\njobs:\n  link-check:\n    runs-on: ubuntu-latest\n    permissions:\n      contents: read\n      pull-requests: write\n    steps:\n      - uses: actions/checkout@v4\n      \n      - name: Build documentation\n        run: jupyter-book build .\n      \n      - name: Check links in changed files\n        uses: QuantEcon/action-link-checker@v1\n        with:\n          html-path: './_build/html'\n          mode: 'changed'\n          fail-on-broken: 'true'\n          ai-suggestions: 'true'\n          silent-codes: '403,503'\n```\n\n### Complete Advanced Usage\n\n```yaml\n- name: Comprehensive 
link checking\n  uses: QuantEcon/action-link-checker@v1\n  with:\n    html-path: './_build/html'\n    mode: 'full'\n    silent-codes: '403,503,429'\n    fail-on-broken: 'false'\n    ai-suggestions: 'true'\n    create-issue: 'true'\n    issue-title: 'Link Check Report - Broken Links Found'\n    create-artifact: 'true'\n    artifact-name: 'detailed-link-report'\n    notify: 'team-lead,docs-maintainer'\n    timeout: '30'\n    max-redirects: '5'\n```\n\n## False Positive Reduction\n\nThe action includes intelligent logic to reduce false positives for legitimate sites:\n\n### Bot Blocking Detection\n- **Major Sites**: Automatically detects common sites that block automated requests (Netflix, Amazon, Facebook, etc.)\n- **Encoding Issues**: Identifies encoding errors that often indicate bot protection\n- **Status Code Analysis**: Recognizes rate limiting (429) and bot blocking patterns\n- **Silent Reporting**: Marks likely bot-blocked sites as silent instead of broken\n\n### Improved Robustness\n- **Browser-like Headers**: Uses realistic browser headers to reduce blocking\n- **Increased Timeout**: Default 45-second timeout for slow-loading legitimate sites\n- **Smart Error Handling**: Distinguishes between genuine broken links and temporary blocks\n\n### AI Suggestion Filtering\n- **Constructive Suggestions**: Only suggests fixes, not removals, for legitimate domains\n- **Manual Review**: Suggests manual verification for unknown domains instead of automatic removal\n- **Domain Whitelist**: Recognizes trusted domains (GitHub, Python.org, etc.) 
and handles them appropriately\n\n## AI-Powered Suggestions\n\nThe action includes intelligent analysis that can suggest:\n\n### Automatic Fixes\n- **HTTPS Upgrades**: Detects `http://` links that should be `https://`\n- **GitHub Branch Updates**: Finds `/master/` links that should be `/main/`\n- **Documentation Migrations**: Suggests updated URLs for moved documentation sites\n- **Version Updates**: Recommends newer versions of deprecated documentation\n\n### Redirect Optimization\n- **Final Destination**: Suggests updating redirected links to their final destination\n- **Performance**: Eliminates unnecessary redirect chains\n- **Reliability**: Reduces dependency on redirect services\n\n### Example AI Suggestions Output:\n```\n🤖 http://docs.python.org/2.7/library/urllib.html\n   Issue: Broken link (Status: 404)\n   💡 version_update: https://docs.python.org/3/library/urllib.html\n      Reason: Python 2.7 is deprecated, consider Python 3 documentation\n\n🤖 http://github.com/user/repo/blob/master/README.md\n   Issue: Redirected 1 times\n   💡 redirect_update: https://github.com/user/repo/blob/main/README.md\n      Reason: GitHub default branch changed from master to main\n```\n\n## How It Works\n\n1. **File Discovery**: Scans HTML files in the specified directory\n2. **Link Extraction**: Uses BeautifulSoup to extract all external links\n3. **Link Validation**: Checks each link with configurable timeout and redirect handling\n4. **AI Analysis**: Applies rule-based AI to suggest improvements\n5. 
**Reporting**: Creates detailed reports with actionable suggestions\n\n### Scanning Modes\n\n#### Full Mode (`mode: 'full'`)\n- Scans all HTML files in the target directory\n- Ideal for scheduled weekly scans\n- Comprehensive coverage of entire project\n\n#### Changed Mode (`mode: 'changed'`)\n- Only scans HTML files that changed in the current PR\n- Efficient for PR-triggered workflows\n- Falls back to full scan if no changes detected\n\n## Configuration\n\n### Silent Status Codes\n\nConfigure which HTTP status codes should be reported without failing:\n\n```yaml\nsilent-codes: '403,503,429,502'\n```\n\nCommon codes to consider:\n- `403`: Forbidden (often due to bot detection)\n- `503`: Service Unavailable (temporary outages)\n- `429`: Too Many Requests (rate limiting)\n- `502`: Bad Gateway (temporary server issues)\n\n### Performance Tuning\n\n```yaml\ntimeout: '30'        # Timeout per link in seconds\nmax-redirects: '5'   # Maximum redirects to follow\n```\n\n## Integration Examples\n\n### Replacing Lychee\n\n**Before (using lychee):**\n```yaml\n- name: Link Checker\n  uses: lycheeverse/lychee-action@v2\n  with:\n    fail: false\n    args: --accept 403,503 *.html\n```\n\n**After (using AI-powered link checker):**\n```yaml\n- name: AI-Powered Link Checker\n  uses: QuantEcon/action-link-checker@v1\n  with:\n    html-path: '.'\n    fail-on-broken: 'false'\n    silent-codes: '403,503'\n    ai-suggestions: 'true'\n    create-issue: 'true'\n```\n\n### MyST Markdown Projects\n\nFor Jupyter Book projects:\n\n```yaml\n- name: Build Jupyter Book\n  run: jupyter-book build lectures/\n\n- name: Check links in built documentation\n  uses: QuantEcon/action-link-checker@v1\n  with:\n    html-path: './lectures/_build/html'\n    mode: 'full'\n    ai-suggestions: 'true'\n```\n\n## Outputs\n\nUse action outputs in subsequent workflow steps:\n\n```yaml\n- name: Check links\n  id: link-check\n  uses: QuantEcon/action-link-checker@v1\n  with:\n    fail-on-broken: 'false'\n\n- name: 
Report results\n  run: |\n    echo \"Broken links: ${{ steps.link-check.outputs.broken-link-count }}\"\n    echo \"Redirects: ${{ steps.link-check.outputs.redirect-count }}\"\n    echo \"AI suggestions available: ${{ steps.link-check.outputs.ai-suggestions != '' }}\"\n```\n\n## Permissions\n\nRequired workflow permissions depend on features used:\n\n```yaml\npermissions:\n  contents: read          # Always required\n  issues: write          # For create-issue: 'true'\n  pull-requests: write   # For PR comments\n  actions: read          # For create-artifact: 'true'\n```\n\n## Inputs\n\n| Input | Description | Required | Default |\n|-------|-------------|----------|---------|\n| `html-path` | Path to HTML files directory | No | `./_build/html` |\n| `mode` | Scan mode: `full` or `changed` | No | `full` |\n| `silent-codes` | HTTP codes to silently report | No | `403,503` |\n| `fail-on-broken` | Fail workflow on broken links | No | `true` |\n| `ai-suggestions` | Enable AI-powered suggestions | No | `true` |\n| `create-issue` | Create GitHub issue for broken links | No | `false` |\n| `issue-title` | Title for created issues | No | `Broken Links Found in Documentation` |\n| `create-artifact` | Create workflow artifact | No | `false` |\n| `artifact-name` | Name for workflow artifact | No | `link-check-report` |\n| `notify` | Users to assign to created issue | No | `` |\n| `timeout` | Timeout per link (seconds) | No | `45` |\n| `max-redirects` | Maximum redirects to follow | No | `5` |\n\n## Outputs\n\n| Output | Description |\n|--------|-------------|\n| `broken-links-found` | Whether broken links were found |\n| `broken-link-count` | Number of broken links |\n| `redirect-count` | Number of redirects found |\n| `link-details` | Detailed broken link information |\n| `ai-suggestions` | AI-powered improvement suggestions |\n| `issue-url` | URL of created GitHub issue |\n| `artifact-path` | Path to created artifact file |\n\n## Best Practices\n\n1. 
**Weekly Scans**: Use scheduled workflows for comprehensive link checking\n2. **PR Validation**: Use `changed` mode for efficient PR validation\n3. **Status Code Configuration**: Adjust silent codes based on your links' typical behavior\n4. **AI Suggestions**: Review and apply AI suggestions to improve link quality\n5. **Issue Management**: Use automatic issue creation for tracking broken links\n6. **Performance**: Set appropriate timeouts based on your link destinations\n\n## Troubleshooting\n\n### Common Issues\n\n1. **Timeout Errors**: Increase the `timeout` value for slow-responding sites (the default is 45 seconds)\n2. **False Positives**: The action automatically detects major sites that block bots (Netflix, Amazon, etc.)\n3. **Rate Limiting**: Add `429` to `silent-codes` for rate-limited sites\n4. **Bot Blocking**: Legitimate sites that block automated requests are marked as silent instead of broken\n5. **Large Repositories**: Use `changed` mode for PR workflows\n\n### False Positive Mitigation\n\nIf legitimate links are being flagged as broken:\n\n1. **Check if it's a major site**: Netflix, Amazon, Facebook, etc. are automatically detected as likely bot-blocked\n2. **Increase timeout**: Use `timeout: '60'` for slower sites like tutorials or educational content\n3. **Add to silent codes**: If a site consistently returns specific error codes, add them to `silent-codes`\n4. **Review AI suggestions**: The action provides constructive fix suggestions rather than suggesting removal\n\n### Debug Output\n\nThe action provides detailed logging including:\n- Number of files scanned\n- Links found per file\n- Status codes and errors\n- AI suggestion reasoning\n\n## Migration from Lychee\n\nThis action can directly replace `lychee` workflows with enhanced functionality:\n\n1. Replace `lycheeverse/lychee-action` with this action\n2. Update input parameters (see the Replacing Lychee comparison above)\n3. Enable AI suggestions and issue creation (`ai-suggestions`, `create-issue`)\n4. 
Configure silent status codes as needed\n\nThe enhanced AI capabilities provide value beyond basic link checking by suggesting improvements and maintaining link quality over time.","funding_links":["https://github.com/sponsors/numfocus","https://numfocus.org/donate-to-quantecon"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantecon%2Faction-link-checker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantecon%2Faction-link-checker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantecon%2Faction-link-checker/lists"}