{"id":19591945,"url":"https://github.com/catalyst/moodle-local_datacleaner","last_synced_at":"2025-07-25T18:33:21.667Z","repository":{"id":5997086,"uuid":"54431869","full_name":"catalyst/moodle-local_datacleaner","owner":"catalyst","description":"Reduce, filter, and anonymize moodle data for non-prod environments","archived":false,"fork":false,"pushed_at":"2025-07-18T03:14:38.000Z","size":3524,"stargazers_count":20,"open_issues_count":59,"forks_count":17,"subscribers_count":36,"default_branch":"MOODLE_311_STABLE","last_synced_at":"2025-07-18T07:17:58.705Z","etag":null,"topics":["anonymize","data-cleaning","datacleaner","moodle","php","plugin"],"latest_commit_sha":null,"homepage":"https://moodle.org/plugins/local_datacleaner","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/catalyst.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-03-22T00:07:25.000Z","updated_at":"2025-03-25T04:01:05.000Z","dependencies_parsed_at":"2023-02-19T12:15:52.653Z","dependency_job_id":"4f6a2385-7474-49f1-85cb-ab250d000858","html_url":"https://github.com/catalyst/moodle-local_datacleaner","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/catalyst/moodle-local_datacleaner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catalyst%2Fmoodle-local_datacleaner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catalyst%2Fmoodle-local_datacleaner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/catalyst%2Fmoodle-local_datacleaner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catalyst%2Fmoodle-local_datacleaner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/catalyst","download_url":"https://codeload.github.com/catalyst/moodle-local_datacleaner/tar.gz/refs/heads/MOODLE_311_STABLE","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catalyst%2Fmoodle-local_datacleaner/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267046026,"owners_count":24026897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-25T02:00:09.625Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymize","data-cleaning","datacleaner","moodle","php","plugin"],"created_at":"2024-11-11T08:32:08.782Z","updated_at":"2025-07-25T18:33:21.657Z","avatar_url":"https://github.com/catalyst.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Build Status](https://github.com/catalyst/moodle-local_datacleaner//actions/workflows/ci.yml/badge.svg?branch=MOODLE_310_STABLE)\n\n# DataCleaner Moodle Module\n\nMoodle DataCleaner is an anonymiser of your Moodle data.\n\n## Branches ##\n\nThe following maps the plugin version to use depending on your Moodle version.\n\n| Moodle version        | Branch             |\n| --------------------- | 
-------------------|\n| Moodle 3.11 and above | MOODLE_311_STABLE  |\n| Moodle 3.10           | MOODLE_310_STABLE  |\n| Moodle up to 3.9      | master             |\n\nThe following maps the plugin version to use depending on your Totara version.\n\n| Totara version    | Branch             |\n| ----------------- | -------------------|\n| Totara 13         | TOTARA_13          |\n\n## How it works\n\nStandard practice when hosting most applications, Moodle included, is to have\nvarious environments in a 'pipeline' leading to production at the end. A\ntypical flow might be `dev \u003e stage \u003e prod`, but there could be as many as\nyou want for various reasons, like load testing, penetration testing etc.\n\nTo test properly it's often useful to have real production data in these other\nenvironments, but there are downsides:\n\n* Production data can be massive. We don't need or want all of it, and\n  disk space can be a pain with multiple copies.\n* There may be sensitive data we don't want to expose to developers or\n  testers, e.g. personal data, grades, uploaded assignments etc.\n* Moodle is integrated with 3rd party systems and we don't want test systems\n  interacting with real systems, e.g. sending emails or touching assignments in\n  Turnitin, so we want to remove any API keys and other related config.\n\nSo we need a way to 'clean' the database after a refresh, to reduce the size of\nthe data, to remove anything sensitive, and to ensure it's not going to touch\nany other real system. This also needs to be configurable because every Moodle\ninstance has different needs and there is no one-size-fits-all approach. 
This\ncould be configured outside Moodle in the deployment tools, but over time we\nhave found the most flexible and easiest approach is to have this configuration\ninside Moodle itself, so our clients can directly make these decisions, and not\nbe exposed to any of the complexity of our internal processes around continuous\nintegration and deployment.\n\nPractically this means the cleaning configuration needs to be added into the\nproduction system (which initially sounds scary but isn't), then you refresh\nthe database to another environment where it can be washed. There are multiple\nlevels of safeguards in place to ensure this never gets run in production,\nwhich would of course be catastrophic:\n\n* It can only be run from the CLI. There is no GUI.\n* We store the hostname in the cleaning configuration data. If the hostname\n  matches production, DataCleaner will not run. If this data is missing then\n  it will not run.\n* Typically a refreshed database will be from a nightly snapshot and so the\n  data should be slightly stale. If a non-admin user has logged in recently,\n  that's a sign this Moodle is being used, and DataCleaner will not run.\n* If cron has run recently, DataCleaner will not run. This should only be run\n  on a data washing instance, where cron should not be needed.\n* It can only be run if 'local_datacleaner_allowexecution = true;'\n  has been added to config.php.\n\n## Installation\n\nThe simplest method of installing the plugin is to choose \"Download ZIP\" on the\nright hand side of the GitHub page. Once you've done this, unzip the\nDataCleaner code and copy it to the local/datacleaner directory within your\nMoodle codebase. 
On most modern Linux systems, this can be accomplished with:\n\n```sh\nunzip ./mdl-local_datacleaner-master.zip\ncp -r ./mdl-local_datacleaner-master \u003cyour_moodle_directory\u003e/local/datacleaner\n```\n\nOnce you've copied the plugin, you can finish the installation process by\nlogging into your Moodle site as an administrator and visiting the\n\"notifications\" page:\n\n`\u003cyour.moodle.url\u003e/admin/index.php`\n\nYour site should prompt you to upgrade.\n\n## Configuration\n\nOnce the installation process is complete, you'll be prompted to fill in some\nconfiguration details. Note that you MUST visit the DataCleaner config page to\nsave the current wwwroot, or the cleaner will not run later in the other\nenvironments.\n\n```php\n$CFG-\u003elocal_datacleaner_allowexecution = true;\n```\n\nYou have to add the config item above to your config.php in each environment where you\nwant the cleaner to run. DO NOT add that config setting to a Production environment!\n\nThere are multiple 'cleaners' which process different types of data in Moodle.\nEach one can be enabled individually and may have additional config settings.\n\nYou can find the DataCleaner configuration via the Moodle administration block:\n\n`Site Administration \u003e Plugins \u003e Local plugins \u003e Data cleaner`\n\n### Sub-plugin options\n\nEnable the sub-plugin options to clean the corresponding data area.\n\n#### Cleanup core:\n\nEnable this sub-plugin to clean core configuration settings.\n\n#### Remove config:\n\nEnable this sub-plugin to clean configuration settings. This has its own Settings page.\n\n#### Remove standard logs:\n\nEnable to truncate the standard log table.\n\n#### Remove users:\n\nThis will remove users who have not logged in for a specific number of days. This has its own Settings page.\n\n#### Remove courses:\n\nRemove courses older than a specific number of days and/or in specific categories. 
This has its own Settings page.\n\n#### Scramble user data:\n\nEnable this sub-plugin to anonymise user data. This has its own Settings page.\n\n#### Clean grades:\n\nEnable to delete grade history or replace with fake data. This has its own Settings page.\n\n#### Replace URLs:\n\nEnable to replace all occurrences of the production URL with another URL. This has its own Settings page.\n\n#### Cleanup sitedata:\n\nClean orphaned files or replace with a generic file for the specific file type.\n\n#### Cleanup email:\n\nWhen a suffix has been configured in the settings, this will append that value to all emails.\nThere is also a regular expression field that will ignore users when appending the suffix.\n\nThis also allows you to configure the following Moodle settings:\n - noemailever\n - divertallemailsto\n - divertallemailsexcept\n\n#### Environment matrix:\n\n**Notice**: A soft dependency on local_envbar is required for populating the available environments that can be configured.\n\nThis facilitates searching values in the {config} and {config_plugins} tables to allow setting those values. Useful for scrubbing API keys to prevent them calling home on a development environment.\n\nA CLI script exists to run the Environment matrix cleaner as a standalone operation.\n\n```sh\nsudo -u apache /usr/bin/php /\u003cyour_moodle_directory\u003e/local/datacleaner/environment_matrix/cli/matrix_replace.php --run\n```\n\nAn additional CLI flag, --reset, has been implemented.\n\nThis flag will purge all other saved environment configuration so that the new instance only has one set of environment data.\n\n## Running\n\nAfter installing and configuring DataCleaner, copy your database and optionally your site data to another Moodle instance.\n\nFrom here run the cli script. 
On most modern Linux systems, this can be accomplished with:\n\n```sh\nsudo -u apache /usr/bin/php /\u003cyour_moodle_directory\u003e/local/datacleaner/cli/clean.php --run\n```\n\nThere are protections in place which prevent accidentally running this on your production system - which would of course be catastrophic!\n\n### More options\n\nRun the cli script with --help for more options:\n\n```sh\nsudo -u apache /usr/bin/php /\u003cyour_moodle_directory\u003e/local/datacleaner/cli/clean.php --help\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatalyst%2Fmoodle-local_datacleaner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcatalyst%2Fmoodle-local_datacleaner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatalyst%2Fmoodle-local_datacleaner/lists"}