{"id":19591961,"url":"https://github.com/catalyst/moodle-tool_crawler","last_synced_at":"2026-02-28T21:32:17.785Z","repository":{"id":10191462,"uuid":"62562387","full_name":"catalyst/moodle-tool_crawler","owner":"catalyst","description":"A moodle link crawling robot, find broken, slow and oversized links","archived":false,"fork":false,"pushed_at":"2026-01-28T14:25:20.000Z","size":672,"stargazers_count":11,"open_issues_count":50,"forks_count":16,"subscribers_count":27,"default_branch":"MOODLE_310_STABLE","last_synced_at":"2026-01-28T15:36:38.154Z","etag":null,"topics":["crawler","moodle","plugin-moodle"],"latest_commit_sha":null,"homepage":"https://moodle.org/plugins/tool_crawler","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/catalyst.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2016-07-04T12:58:58.000Z","updated_at":"2026-01-28T14:30:24.000Z","dependencies_parsed_at":"2023-01-13T15:47:37.719Z","dependency_job_id":"86e50151-f1ba-40a2-aa3b-0d28a19cc51e","html_url":"https://github.com/catalyst/moodle-tool_crawler","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/catalyst/moodle-tool_crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catalyst%2Fmoodle-tool_crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catalyst%2Fmoodle-tool_crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/catalyst%2Fmoodle-tool_crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catalyst%2Fmoodle-tool_crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/catalyst","download_url":"https://codeload.github.com/catalyst/moodle-tool_crawler/tar.gz/refs/heads/MOODLE_310_STABLE","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catalyst%2Fmoodle-tool_crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29952265,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-28T18:42:55.706Z","status":"ssl_error","status_checked_at":"2026-02-28T18:42:48.811Z","response_time":90,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","moodle","plugin-moodle"],"created_at":"2024-11-11T08:32:13.358Z","updated_at":"2026-02-28T21:32:17.751Z","avatar_url":"https://github.com/catalyst.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![ci](https://github.com/catalyst/moodle-tool_crawler/actions/workflows/ci.yml/badge.svg?branch=MOODLE_310_STABLE)](https://github.com/catalyst/moodle-tool_crawler/actions/workflows/ci.yml?branch=MOODLE_310_STABLE)\n\n# moodle-tool_crawler\n\n* [What is this?](#what-is-this)\n* [How does it work?](#how-does-it-work)\n* 
[Branches](#branches)\n* [Installation](#installation)\n* [Configuration](#configuration)\n* [Testing](#testing)\n* [Debugging](#debugging)\n* [Reports](#reports)\n* [Support](#support)\n* [Warm thanks](#warm-thanks)\n\n# What is this?\n\nThis is a link checking robot, that crawls your Moodle site following links\nand reporting on links that are either broken or that link to very large\nfiles.\n\nhttps://moodle.org/plugins/tool_crawler\n\n# How does it work?\n\nIt is an admin tool plugin with a Moodle cron task. It logs into your Moodle\nvia curl effectively from outside Moodle. The cronjob scrapes each page,\nparses it and follows links. By using this architecture it will only find\nbroken links that actually matter to students.\n\nSince the plugin cronjob comes in from outside it needs to authenticate in Moodle.\n\n# Branches\n\n| Moodle version    | Branch                |\n| ----------------- | --------------------- |\n| Moodle 3.10+      | MOODLE_310_STABLE     |\n| Moodle 3.4 to 3.9 | master                |\n| Totara 12+        | master                |\n\n# Installation\n\nThe plugin has a dependency on the [moodle-auth_basic](https://moodle.org/plugins/auth_basic).\nTo install the dependency plugin as a git submodule:\n```\ngit submodule add git@github.com:catalyst/moodle-auth_basic.git auth/basic\n```\n\n\nInstall plugin moodle-tool_crawler as a git submodule:\n```\ngit submodule add git@github.com:catalyst/moodle-tool_crawler.git admin/tool/crawler\n```\n# Configuration\n\nWhen installing the plugins please keep in mind the official Moodle recommendations: [installing Moodle plugins](https://docs.moodle.org/32/en/Installing_add-ons)\n\n## Step 1\n\nLogin to Moodle after you have downloaded the plugin code with git. 
You will be\nforwarded to URL http://your_moodle_website.com/admin/index.php with the plugins check.\nThere you should see plugins \"Basic authentication\" and \"Link checker robot\".\n\nClick the button \"Upgrade Moodle database now\", which should start the plugin installation.\n\nNow you should see the page \"Upgrading to new version\" with the plugin installation\nstatuses and a \"Continue\" button.\n\n**Note! Plugin auth_basic is disabled by default after installation.\nYou will need to enable it manually from\nHome ► Site administration ► Plugins ► Authentication ► Manage authentication**\n\nAfter clicking \"Continue\" you will get to the page \"New settings - Link checker robot\".\nWhile you may leave other settings at their defaults, you might want to set up a custom bot username\nand make sure to change the bot password.\n\n**It is recommended that the bot user be kept with read-only access to all\nthe site pages you wish to crawl. You can give the robot read\ncapabilities similar to those real students have. Never give your bot user write capabilities.**\n\nIt can also be a good idea to give your robot some extra permissions, like visibility of hidden courses\nor activities, so it can crawl content which is being developed and will later be delivered to students.\nIf you are worried about load and total crawl time then you can filter out whole courses, e.g. last year's\narchived courses; see below for more details.\n\nAfter verifying all settings click \"Save changes\".\n\n## Step 2\n\nEnable the auth_basic plugin (if you haven't done that earlier) from\n\nHome ► Site administration ► Plugins ► Authentication ► Manage authentication\n\nNow navigate to URL http://your_moodle_website.com/admin/tool/crawler/index.php.\nIt will show some stats about the Link checker Robot.\n\nClick the \"Auto create\" button next to \"Bot user\". 
This actually creates the user\nwith the username and password you configured previously on the page\n\"New settings - Link checker robot\".\n\nOnce the bot user is created, the \"Bot user\" line in the status report should show \"Good\".\n\n## Disabling crawling of specific course categories\n\nThis is achieved by configuring proper security roles in Moodle and assigning\nthese roles to the robot user on the desired categories.\n\nImport role \"Robot\" from admin/tool/crawler/roles/robot.xml on\n\nSite administration ► Users ► Permissions ► Define roles ► Add a new role\n\nAdd this role to the \"Link checker robot\" user on\n\nSite administration ► Users ► Permissions ► Assign system roles.\n\nImport role \"Robot nofollow\" from file\nadmin/tool/crawler/roles/robotnofollow.xml on\n\nSite administration ► Users ► Permissions ► Define roles ► Add a new role.\n\nTo disable crawling of, say, \"Category ABC\", go to\n\nSite administration ► Courses ► Manage courses and categories ► Category ABC\n\nthen click on \"Assign roles\" in the left navigation menu.\nClick on role \"Robot nofollow\", click on user \"Link checker Robot\"\nunder \"Potential users\" and add it to \"Existing users\".\n\nThe above configuration applies role \"Robot\" on the whole Moodle site\nand lets the crawler access general content, while role \"Robot nofollow\" prohibits\nthe crawler from accessing the specific category.\n\nIn the same way it is possible to restrict the crawler from accessing other\nMoodle contexts such as courses, activities and blocks.\n\nThe same effect could be achieved even without role \"Robot nofollow\" by\nassigning role \"Robot\" on the contexts you want to be crawled. 
But\nusing the combination of two roles gives more flexibility.\n\n# Testing\n\n## Test basic authentication with curl\n\nExample in bash:\n\n```\ncurl -c /tmp/cookies -v -L --user moodlebot:moodlebot 'http://your_moodle_website.com/course/view.php?id=3'\n```\n\nThis command should log you in with the specified credentials via Basic HTTP Auth.\nIt will dump headers, requests and responses, and among the output you should\nbe able to see the line \"You are logged in as \".\n\nOnce Basic HTTP auth works, test running the robot task from the CLI:\n\n```\nphp admin/cli/scheduled_task.php --execute='\\tool_crawler\\task\\crawl_task'\nExecute scheduled task: Parallel crawling task (tool_crawler\\task\\crawl_task)\n... used 22 dbqueries\n... used 0.039698123931885 seconds\nScheduled task complete: Parallel crawling task (tool_crawler\\task\\crawl_task)\n```\n\nThis will create a batch of new adhoc crawl tasks in the mdl_task_adhoc table that\nwill run in parallel, depending on the crawl_task setting.\n\nYou can manually run the adhoc tasks from the CLI with:\n```\nphp admin/cli/adhoc_task.php --execute\nExecute adhoc task: tool_crawler\\task\\adhoc_crawl_task\n... used 5733 dbqueries\n... used 58.239180088043 seconds\nAdhoc task complete: tool_crawler\\task\\adhoc_crawl_task\n```\n\nIf this worked then it's a matter of sitting back and waiting for the\nrobot to do its thing. It works incrementally, spreading the load over many\ncron cycles; you can watch its progress in\n\n/admin/tool/crawler/report.php?report=queued\n\nand\n\n/admin/tool/crawler/report.php?report=recent\n\n# Debugging\n\nYou can also run the link crawler on a given page by passing its URL. If a crawl is still running, you might need to Reset Progress from Administration \u003e Reports \u003e Link crawler \u003e Robot status\n\n```\nphp admin/tool/crawler/cli/crawl-as.php --url=http://localhost/\n```\n\n# Reports\n\nFour new admin reports are available, showing the current crawl status, broken\nlinks and URLs, and slow links. 
They are available under:\n\nAdministration \u003e Reports \u003e Link checker\n\n# Support\n\nPlease raise any issues in GitHub:\n\nhttps://github.com/catalyst/moodle-tool_crawler/issues\n\nIf you need anything urgently and would like to sponsor its implementation, please\nemail me: [Brendan Heywood](mailto:brendan@catalyst-au.net).\n\n# Warm thanks\n\nThanks to Central Queensland University for sponsoring the initial creation of this plugin:\n\nhttps://www.cqu.edu.au/\n\nThis plugin was developed by Catalyst IT Australia:\n\nhttps://www.catalyst-au.net/\n\n\u003cimg alt=\"Catalyst IT\" src=\"https://cdn.rawgit.com/CatalystIT-AU/moodle-auth_saml2/master/pix/catalyst-logo.svg\" width=\"400\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatalyst%2Fmoodle-tool_crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcatalyst%2Fmoodle-tool_crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatalyst%2Fmoodle-tool_crawler/lists"}