{"id":13753491,"url":"https://github.com/webfactory/zauberlehrling","last_synced_at":"2025-04-13T09:07:06.109Z","repository":{"id":15423186,"uuid":"78115147","full_name":"webfactory/zauberlehrling","owner":"webfactory","description":"Collection of tools and ideas for splitting up big monolithic PHP applications in smaller parts.","archived":false,"fork":false,"pushed_at":"2024-09-10T09:02:07.000Z","size":428,"stargazers_count":35,"open_issues_count":0,"forks_count":4,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-13T09:07:05.055Z","etag":null,"topics":["assets","composer","database","extraction","files","microservice","monolith","mysql","packages","php","tables"],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/webfactory.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-01-05T13:28:40.000Z","updated_at":"2024-09-10T09:02:38.000Z","dependencies_parsed_at":"2024-08-03T09:15:52.386Z","dependency_job_id":"0885ea9c-77ab-4df8-8b18-2f3c8cf4a674","html_url":"https://github.com/webfactory/zauberlehrling","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webfactory%2Fzauberlehrling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webfactory%2Fzauberlehrling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webfactory%2Fzauberlehrling/releases","manifests_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webfactory%2Fzauberlehrling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/webfactory","download_url":"https://codeload.github.com/webfactory/zauberlehrling/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248688569,"owners_count":21145766,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assets","composer","database","extraction","files","microservice","monolith","mysql","packages","php","tables"],"created_at":"2024-08-03T09:01:23.220Z","updated_at":"2025-04-13T09:07:06.087Z","avatar_url":"https://github.com/webfactory.png","language":"PHP","funding_links":[],"categories":["php"],"sub_categories":[],"readme":"zauberlehrling\n==============\n\nA collection of tools and ideas for splitting up a big monolithic PHP application in smaller parts, i.e. smaller\napplications and microservices. It contains console commands for identifying potentially unused PHP files, Composer\npackages, MySQL tables and public web assets.\n\nThe name \"zauberlehrling\" derives from [the famous poem by Johann Wolfgang von Goethe](https://en.wikipedia.org/wiki/The_Sorcerer%27s_Apprentice)\n(you may have also seen the iconic cartoon \"Fantasia\" by Walt Disney). In these tales, a sorcerer's apprentice splits up\na magical, out of control broom with an axe. 
Unfortunately for him, each piece has a life of its own and only\nmultiplies the problem.\n\n\nInstallation\n------------\n\n    git clone https://github.com/webfactory/zauberlehrling.git\n    cd zauberlehrling\n    composer install\n\nWhen asked for the database parameters, provide the information for your local database of the monolith. If your\nmonolith has no database or you don't want any help with it, keep the default parameters.\n\n\nSplitting up the monolith\n-------------------------\n\n### Your local development environment\n\nAt this point, you probably know your monolith way too well. You've fixed devious bugs and if you're brave/ruthless\nenough, you might even have added a feature. So I guess you've set up your local development environment already.\n\nJust a tip: During the split, you might wish to do several dumps of your production database. Consider\n[slimdump](https://github.com/webfactory/slimdump), which lets you store dump configurations. These configurations are really handy,\nas they can be shared among your coworkers and provide neat features. E.g. you can ignore more and more tables that\nturn out to be irrelevant for your extracted application; you can also ignore BLOB columns or dump only rows matching\ncertain conditions to speed up the dump process. And you can easily anonymize personal data to protect your\ncustomers.\n\n\n### Greenfield or Brownfield?\n\nThe answer to this question seems to depend mostly on the amount of code you want to reuse. If you know you want to\nreplace e.g. an old integrated messaging system with a shiny new microservice (i.e. a partial rewrite of the monolith),\nyou'll probably be fine with a greenfield project with your best and latest technology.\n\nBut if you just want to split up the monolith and you're afraid of hidden dependencies, or if you want to keep your\neffort down and rewrite only what's necessary: my guess is you'll be better off with a brownfield project. 
Clone the\nmonolith's repository to keep its history of commit messages. I find there is often much knowledge in these messages\nand linked ticket systems. Sometimes they're the only chance to get an understanding of the reasoning behind a particularly\ncrazy piece of code.\n\nThen, get rid of everything you don't need in your extracted application. The following chapters may help.\n\nAlso, my advice is to keep a separate local working copy of the monolith. Sooner or later you'll probably encounter an\nerror you cannot trace back to one of your refactorings, or you notice you've deleted too much and you cannot restore it\neasily from your VCS. In these cases, you'll be happy to have a quick look into the working monolith. \n\n\n### Determine used PHP files\n\nTo determine the used PHP files, I suggest writing black box tests for each use case of your application and collecting the\ncode coverage information during their execution.\n\nFor the black box tests, e.g. you could write [behat](http://behat.org/) tests for\n \n* requesting the homepage\n* logging in as a user\n* sending a search form and retrieving results\n* creating, editing and deleting an entity\n* requesting a page without proper permissions\n* ...\n\nDepending on your project you may want to assert different things. In my experience, the following assertions were often\nhelpful: \n\n* correct URL (i.e. the user is not being redirected e.g. due to authorization problems)\n* HTTP status code being 200 (i.e. the user got no fancy error page)\n* text content like \"x was saved in the database\" (to detect failures after form submission) - may seem brittle for a\n  test, but I don't expect that message to be changed while you're extracting your microservice.  \n\nNow for the code coverage part. Most frameworks provide a front controller, e.g. for Symfony it's\n```web/symfony-webapp.php```. 
If you have xdebug installed, you can write at the beginning of such a front controller:\n\n```php\nxdebug_start_code_coverage();\n```\n\nand at its end something like this:\n\n```php\n// $outputFile: path of the file that collects the names of all executed PHP files\n$filePointer = fopen($outputFile, 'ab');\n// the trailing PHP_EOL keeps the lists of consecutive requests from running together\nfwrite($filePointer, implode(PHP_EOL, array_keys(xdebug_get_code_coverage())) . PHP_EOL);\nfclose($filePointer);\n```\n\nNow, when you execute your behat tests, all executed PHP files will be written to ```$outputFile```. I don't recommend\nexecuting your unit tests now, as these tests could cover code never used in production.  \n\nYou're in no way restricted to xdebug for collecting your coverage. E.g., you could also do some Aspect Oriented Programming (AOP) magic, just remember it may have more advanced requirements than your monolith runtime environment can fulfill. Another idea is utilizing [sysdig](https://www.sysdig.org/) or some other form of file system monitoring.\n\nFile system monitoring tools can be tricky to use:\n\n- You have to make sure the file system accesses you wish to log actually happen. If the file system access is cached away by some shady component in your environment, you won't get all used files (false negatives).\n- Some tools like to index all files - e.g. for a desktop search or your IDE for static code analysis. You have to stop them from opening all files during your logging session or you will get too many results (false positives).\n\nBut if you manage to set everything up properly, file system monitoring tools have one big advantage: they're not restricted to logging executed PHP files, but can report accessed files of all sorts, e.g. configuration files. 
That improves the detection of (un)used packages.\n\nFor sysdig, you might want to try:\n\n    sudo sysdig -p \"%fd.name\" evt.type=open |grep \"/your/project/\" |grep -v \"/your/project/tmp/\" |grep -v \"/your/project/log/\" \u003e used-files.txt\n\n\nYou can consolidate this file (removing duplicates and sorting the list of file names) with\n\n    bin/console consolidate-used-files usedFiles\n\nwhere the ```usedFiles``` argument is the path to the file containing the list of used files. It will be overwritten\nwith its consolidated version.\n\n\n### Unused PHP files\n\n    bin/console show-unused-php-files [--pathToInspect=...] [--pathToOutput=...] [--pathToBlacklist=...] usedFiles\n\nWith this argument:\n\n* ```usedFiles```: Path to a file containing the list of used files (see [Determine used PHP files](#determine-used-php-files))\n\nand these options:\n\n* ```-p```, ```--pathToInspect```: Path to the directory to search for PHP files. If not set, it will be determined as\n  the common parent path of the used files.\n* ```-o```, ```--pathToOutput```: Path to the output file. If not set, it will be \"potentially-unused-files.txt\" next to\n  the file named in the usedFiles argument.\n* ```-b```, ```--pathToBlacklist```: Path to a file containing a blacklist of regular expressions to exclude from the\n  output. The blacklist may grow over time. At first, you might want to exclude temp directories and libraries. But as\n  you inspect the list of potentially unused files, you may notice some file definitely needed by your application,\n  although the usage is not detected by your tests. 
You can persist such insights in this blacklist.\n   \n  The file should contain one regular expression per line, e.g.:\n \n      #/var/www/my-project/features/.*# \n      #/var/www/my-project/tmp/.*# \n      #/var/www/my-project/vendor/.*# \n      #/var/www/my-project/file-only-used-in-production-environment.php# \n\n\n### Unused Composer packages\n\n    bin/console show-unused-composer-packages [--vendorDir=...] composerJson usedFiles\n\nWith these arguments:\n\n* ```composerJson```: path to the composer.json of the project to analyze \n* ```usedFiles```: path to a file containing the list of used files (see [Determine used PHP files](#determine-used-php-files))\n\nAnd these options:\n\n* ```-l```, ```--vendorDir```: path to the vendor directory of the project to analyze. Defaults to the directory of the composer.json + '/vendor'.\n* ```-b```, ```--pathToBlacklist```: Path to a file containing a blacklist of regular expressions to exclude from the output (see [Unused PHP files](#unused-php-files) for details).\n\n\n### Unused Public Assets\n\n    bin/console show-unused-public-assets [--regExpToFindFile=...] [--pathToOutput=...] [--pathToBlacklist=...] pathToPublic pathToLogFile\n\nWith these arguments:\n\n* ```pathToPublic```: Path to the public web root of your project.\n* ```pathToLogFile```: Path to the web server's access log file.\n\nAnd these options:\n\n* ```-r```, ```--regExpToFindFile```: Regular expression for the log file capturing the path of the accessed file as its first capture group. Defaults to ```#\"(?:get|post) ([a-z0-9\\_\\-\\.\\/]*)#i```.\n* ```-o```, ```--pathToOutput```: Path to the output file. 
If not set, it will be \"potentially-unused-public-assets.txt\" in the folder above the public web root.\n* ```-b```, ```--pathToBlacklist```: Path to a file containing a blacklist of regular expressions to exclude from the output (see [Unused PHP files](#unused-php-files) for details).\n\n\n### Unused MySQL Tables\n\nSo, you've cloned your code base, and you have probably copied your database as well. How do you find the unused tables? \n\nThe idea is analogous to the code coverage. First, enable logging in MySQL and possibly delete old log data, e.g. with\n\n```mysql\nSET global general_log = 1;\nSET global log_output = 'table';\nTRUNCATE mysql.general_log;\n```\n\nThen execute your tests for all use cases of your application. Afterwards, you can disable MySQL logging with\n\n```mysql\nSET global general_log = 0;\n```\n\nFinally, call the following console command:\n\n    bin/console show-unused-mysql-tables\n\n\nCredits, Copyright and License\n------------------------------\n\nThis bundle was started at webfactory GmbH, Bonn.\n\n- \u003chttp://www.webfactory.de\u003e\n- \u003chttp://twitter.com/webfactory\u003e\n\nCopyright 2016-2017 webfactory GmbH, Bonn. Code released under [the MIT license](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebfactory%2Fzauberlehrling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwebfactory%2Fzauberlehrling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebfactory%2Fzauberlehrling/lists"}