{"id":13414207,"url":"https://github.com/adamtornhill/code-maat","last_synced_at":"2025-05-13T19:18:04.350Z","repository":{"id":9956381,"uuid":"11977311","full_name":"adamtornhill/code-maat","owner":"adamtornhill","description":"A command line tool to mine and analyze data from version-control systems","archived":false,"fork":false,"pushed_at":"2024-11-18T10:05:49.000Z","size":1413,"stargazers_count":2454,"open_issues_count":12,"forks_count":227,"subscribers_count":79,"default_branch":"master","last_synced_at":"2025-04-28T10:52:55.621Z","etag":null,"topics":["behavioral-code-analysis","code-analysis-tool","repository-mining","technical-debt"],"latest_commit_sha":null,"homepage":"http://www.adamtornhill.com/code/codemaat.htm","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adamtornhill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-08-08T13:48:26.000Z","updated_at":"2025-04-24T10:24:58.000Z","dependencies_parsed_at":"2024-12-10T09:05:31.681Z","dependency_job_id":"45abb93a-d5e1-4fa9-a847-8358ce48338b","html_url":"https://github.com/adamtornhill/code-maat","commit_stats":{"total_commits":327,"total_committers":18,"mean_commits":"18.166666666666668","dds":0.1651376146788991,"last_synced_commit":"48297319c595fb749b0f1951461735d54413cbf4"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adamtornhill%2Fcode-maat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adamtornhill%2Fcode-maat/tags"
,"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adamtornhill%2Fcode-maat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adamtornhill%2Fcode-maat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adamtornhill","download_url":"https://codeload.github.com/adamtornhill/code-maat/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254010830,"owners_count":21999004,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["behavioral-code-analysis","code-analysis-tool","repository-mining","technical-debt"],"created_at":"2024-07-30T21:00:16.481Z","updated_at":"2025-05-13T19:18:04.269Z","avatar_url":"https://github.com/adamtornhill.png","language":"Clojure","funding_links":[],"categories":["Clojure","Version Control"],"sub_categories":["TypeScript / JavaScript"],"readme":"[![Build Status](https://travis-ci.org/adamtornhill/code-maat.png)](https://travis-ci.org/adamtornhill/code-maat)\n\n# Code Maat\n\nCode Maat is a command line tool used to mine and analyze data from version-control systems (VCS).\n\nCode Maat was developed to accompany the discussions in my books [Your Code as a Crime Scene](https://pragprog.com/titles/atcrime/your-code-as-a-crime-scene) and\n[Software Design X-Rays](https://pragprog.com/titles/atevol/software-design-x-rays).\n\nNote that the analyses have evolved into [CodeScene](https://codescene.io/), which automates all the analyses found in Code Maat and several new ones.\n\n## Code Maat 
Maintainability\n\n[![CodeScene Code Health](https://codescene.io/projects/87/status-badges/code-health)](https://codescene.io/projects/6410/jobs/latest-successful/results)\n\n[![CodeScene System Mastery](https://codescene.io/projects/87/status-badges/system-mastery)](https://codescene.io/projects/6410/jobs/latest-successful/results)\n\n## The ideas behind Code Maat\n\nTo understand large-scale software systems we need to look at their evolution. The history of our system provides us with data we cannot derive from a single snapshot of the source code. Instead, VCS data blends technical, social and organizational information along a temporal axis that lets us map out our interaction patterns in the code. Analyzing these patterns gives us early warnings on potential design issues and development bottlenecks, as well as suggesting new modularities based on actual interactions with the code. Addressing these issues saves costs, simplifies maintenance and lets us evolve our systems in the direction of how we actually work with the code.\n\n### About the name\n\nMaat was a goddess in ancient Egyptian myth. She was the one who gave us order out of the initial chaos in the universe. Code Maat hopes to continue the work of Maat, albeit on a smaller scale, by highlighting code with chaotic development practices and suggesting directions for future refactorings to bring order to it. Further, maat was used in ancient Egypt as a concept of truth. And metrics never lie (except when they do).\n\n## License\n\nCopyright © 2013 Adam Tornhill\n\nDistributed under the [GNU General Public License v3.0](http://www.gnu.org/licenses/gpl.html).\n\n## Usage\n\nCode Maat is written in Clojure. 
To build it from source, use [leiningen](https://github.com/technomancy/leiningen):\n\n\t   lein uberjar\n\nThe command above will create a standalone `jar` containing all the dependencies.\n\nWe also publish a pre-built executable JAR file as part of the [latest release](https://github.com/adamtornhill/code-maat/releases/tag/v1.0.2)\nthat you can download and run directly.\n\nYou can also build code-maat as a Docker image:\n\n      docker build -t code-maat-app .\n\nIf this fails on your Apple Silicon change \"clojure:alpine\" to \"clojure:latest\" in the first line of the Dockerfile.\n\nFinally, if you want to use Code Maat as a library, then add the following line to your leiningen dependencies:\n\n\t   [code-maat \"1.0.1\"]\n\n### Generating input data\n\nCode Maat operates on log files from version-control systems. The supported version-control systems are `git`, Mercurial (`hg`), `svn`, Perforce (`p4`), and Team Foundation Server (`tfs`). The log files are generated by using the version-control systems themselves as described in the following sections.\n\n#### Preparations\n\nTo analyze our VCS data we need to define a temporal period of interest. Over time, many design issues do get fixed and we don't want old data to interfere with our current analysis of the code. To limit the data Code Maat will consider, use one of the following flags depending on your version-control system:\n+ *git:* Use the `--after=\u003cdate\u003e` to specify the last date of interest. The `\u003cdate\u003e` is given as `YYYY-MM-DD`.\n+ *hg:* Use the `--date` switch to specify the last date of interest. 
The value is given as `\"\u003eYYYY-MM-DD\"`.\n+ *svn:* Use the `-r` option to specify a range of interest, for example `-r {20130820}:HEAD`.\n+ *p4:* Use the `-m` option to limit the log to the most recent changelists, for example `-m 1000`.\n+ *tfs:* Use the `/stopafter` option to specify the number of changesets, for example `/stopafter:1000`.\n\n#### Generate a Subversion log file using the following command:\n\n          svn log -v --xml \u003e logfile.log -r {YYYYmmDD}:HEAD\n\n#### Generate a git log file using the following command:\n\nThe first option is the legacy format used in Your Code as a Crime Scene. Use the `-c git` parse option when [Running Code Maat](#running-code-maat).\n\n          git log --pretty=format:'[%h] %aN %ad %s' --date=short --numstat --after=YYYY-MM-DD \u003e logfile.log\n\nThere's a second supported Git format as well. It's more tolerant and faster to parse, so please prefer it over the plain `git` format described above. Use the `-c git2` parse option when [Running Code Maat](#running-code-maat).\n\n          git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames --after=YYYY-MM-DD \u003e logfile.log\n\nMany codebases include third-party content or non-code artefacts, which might generate noise in the analyses.\nYou can exclude such content via git's pathspecs that limit paths on the command line.\nFor example, let's say you want to exclude everything in a `vendor/` folder. You would then append the following pattern to the `git log` commands above:\n\n           -- . \":(exclude)vendor/*\"\n\nTo exclude multiple folders, you just append more pathspecs:\n\n           -- . 
\":(exclude)vendor/\" \":(exclude)test/\"\n\n#### Generate a Mercurial log file using the following command:\n\n          hg log --template \"rev: {rev} author: {author} date: {date|shortdate} files:\\n{files %'{file}\\n'}\\n\" --date \"\u003eYYYY-MM-DD\"\n\n#### Generate a Perforce log file using the following command:\n\n          p4 changes -s submitted -m 5000 //depot/project/... | cut -d ' ' -f 2 | xargs -I commitid -n1 sh -c 'p4 describe -s commitid | grep -v \"^\\s*$\" \u0026\u0026 echo \"\"'\n\n#### Generate a TFS log file using the following command from a Developer command-prompt:\n###### Note:  The TFS CLI tool does not support custom date formatting.  The parser currently only supports the en-us default:  Friday, January 1, 2016 1:12:35 PM - you may need to adjust your system locale settings before using the following command.\n\n          tf hist /path/to/workspace /noprompt /format:detailed /recursive\n\n### Running Code Maat\n\nYou can run Code Maat directly from leiningen:\n\n    \t  lein run -l logfile.log -c \u003cvcs\u003e\n\nIf you've downloaded or built the standalone jar, then run it via java. 
Please note that you need at least Java 8 installed:\n\n     \t  java -jar code-maat-1.0.4-standalone.jar -l logfile.log -c \u003cvcs\u003e\n\nIf you've built a docker container, then you can run it as\n\n        docker run -v /home/xx/src/code-maat:/data -it code-maat-app -l /data/logfile.log -c \u003cvcs\u003e\n\nwhere the /home/xx/src/code-maat is the host's directory containing the file logfile.log.\n\nWhen invoked with `-h`, Code Maat prints its usage:\n\n             adam$ java -jar code-maat-1.0.4-standalone.jar\n             This is Code Maat, a program used to collect statistics from a VCS.\n             Version: 1.0-SNAPSHOT\n\n             Usage: program-name -l log-file [options]\n\n             Options:\n               -l, --log LOG                                         Log file with input data\n               -c, --version-control VCS                             Input vcs module type: supports svn, git, git2, hg, p4, or tfs\n               -a, --analysis ANALYSIS                      authors  The analysis to run (abs-churn, age, author-churn, authors, communication, coupling, entity-churn, entity-effort, entity-ownership, fragmentation, identity, main-dev, main-dev-by-revs, messages, refactoring-main-dev, revisions, soc, summary)\n              --input-encoding INPUT-ENCODING                        Specify an encoding other than UTF-8 for the log file\n               -r, --rows ROWS                                       Max rows in output\n               -g, --group GROUP                                     A file with a pre-defined set of layers. 
The data will be aggregated according to the group of layers.\n               -n, --min-revs MIN-REVS                      5        Minimum number of revisions to include an entity in the analysis\n               -m, --min-shared-revs MIN-SHARED-REVS        5        Minimum number of shared revisions to include an entity in the analysis\n               -i, --min-coupling MIN-COUPLING              30       Minimum degree of coupling (in percentage) to consider\n               -x, --max-coupling MAX-COUPLING              100      Maximum degree of coupling (in percentage) to consider\n               -s, --max-changeset-size MAX-CHANGESET-SIZE  30       Maximum number of modules in a change set if it shall be included in a coupling analysis\n               -e, --expression-to-match MATCH-EXPRESSION            A regex to match against commit messages. Used with -messages analyses\n               -t, --temporal-period TEMPORAL-PERIOD                 Instructs Code Maat to consider all commits during the same day as a single, logical commit\n               -d, --age-time-now AGE-TIME_NOW                       Specify a date as YYYY-MM-dd that counts as time zero when doing a code age analysis\n               -h, --help\n\n### Optional: specify an encoding\n\nBy default, Code Maat expects your log files to be UTF-8. If you use another encoding, override the default with `--input-encoding`, for example `--input-encoding UTF-16BE`.\n\n#### Generating a summary\n\nWhen starting out, I find it useful to get an overview of the mined data. 
With the `summary` analysis, Code Maat produces such an overview:\n\n   \t   java -jar code-maat-1.0.4-standalone.jar -l logfile.log -c git -a summary\n\nThe resulting output is in CSV format:\n\n              statistic,                 value\n              number-of-commits,           919\n              number-of-entities,          730\n              number-of-entities-changed, 3397\n              number-of-authors,            79\n\nIf you use the second Git format, just specify `git2` instead:\n\n   \t   java -jar code-maat-1.0.4-standalone.jar -l logfile2.log -c git2 -a summary\n\n#### Mining organizational metrics\n\nBy default, Code Maat runs an analysis on the number of authors per module. The authors analysis is based on the idea that the more developers work on a module, the larger the communication challenges. The analysis is invoked with the following command:\n\n   \t   java -jar code-maat-1.0.4-standalone.jar -l logfile.log -c git\n\nThe resulting output is in CSV format:\n\n              entity,         n-authors, n-revs\n              InfoUtils.java, 12,        60\n              BarChart.java,   7,        30\n              Page.java,       4,        27\n              ...\n\nIn the example above, the first column gives us the name of the module, the second the total number of distinct authors that have made commits on that module, and the third column gives us the total number of revisions of the module. Taken together, these metrics serve as predictors of defects and quality issues.\n\n#### Mining logical coupling\n\nLogical coupling refers to modules that tend to change together. Modules that are logically coupled have a hidden, implicit dependency between them such that a change to one of them leads to a predictable change in the coupled module. 
To analyze the logical coupling in a system, invoke Code Maat with the following arguments:\n\n              java -jar code-maat-1.0.4-standalone.jar -l logfile.log -c git -a coupling\n\nThe resulting output is in CSV format:\n\n              entity,          coupled,        degree,  average-revs\n              InfoUtils.java,  Page.java,      78,      44\n              InfoUtils.java,  BarChart.java,  62,      45\n              ...\n\nIn the example above, the first column (`entity`) gives us the name of the module, the second (`coupled`) gives us the name of a logically\ncoupled module, the third column (`degree`) gives us the coupling as a percentage (0-100), and finally `average-revs` gives us the average number of revisions\nof the two modules.\n\nTo interpret the data, consider the `InfoUtils.java` module in the example output above.\nThe coupling tells us that each time it's modified, there's a 78% risk/chance that we'll have to change our `Page.java` module too.\nSince there's probably no reason they should change together, the analysis points to a part of the code worth investigating as a potential target for a future refactoring.\n\n*Advanced*: the coupling analysis also supports `--verbose-results`. In verbose mode, the coupling analysis also includes the number of revisions for each coupled entity together\nwith the number of shared revisions. The main use cases for this option are a) building custom filters to reduce noise, or b) research studies.\n\n### Calculate code age\n\nThe change frequency of code is a factor that should (but rarely does) drive the evolution of a software architecture. In general, you want to stabilize as much code as possible. 
A failure to stabilize means that you need to maintain a working knowledge of those parts of the code for the life-time of the system.\n\nOne way to measure the stability of a software architecture is by a code age analysis:\n\n              java -jar code-maat-1.0.4-standalone.jar -l logfile.log -c git -a age\n\nThe `age` analysis grades each module based on the date of last change. The measurement unit is age in months. Here's how the result may look:\n\n              entity,age-months\n              src/code_maat/app/app.clj,2\n              project.clj,4\n              src/code_maat/parsers/perforce.clj,5\n              ...\n\nBy default, Code Maat uses the current date as starting point for a code age analysis. You specify a different start time with the command line argument `--age-time-now`.\n\nBy using the techniques from [Your Code as a Crime Scene](https://pragprog.com/book/atcrime/your-code-as-a-crime-scene) we visualize the system with each module marked-up by its age (the more `red`, the more recent changes to the code):\n\n![code age visualized](doc/imgs/code_age_sample.png).\n\n### Visualise your Results\n\nCode Maat doesn't include any visualizations itself. However, you do have a bunch of options.\n\nOne option is [CodeScene](http://www.empear.com/#empear-products) which is [free for open source](https://codescene.io/) and delivers all these analyses as a service. CodeScene is also available in an [on-premise version](https://codescene.com/pricing). CodeScene is a complete application with automated repository mining, visualizations, and is built around the ideas prototyped in Code Maat.\n\nI also present a whole suite of different visualization techniques and options in [Your Code as a Crime Scene](https://pragprog.com/book/atcrime/your-code-as-a-crime-scene), so do check out the book if you want to dive deeper. 
You can also look at some of the basic tools I've open sourced such as [Metrics Tree Map](https://github.com/adamtornhill/MetricsTreeMap):\n\n![coupling visualized](doc/imgs/tree_map_sample.png).\n\nYet another alternative is to save the generated CSV to a file and import it into a spreadsheet program such as OpenOffice or Excel. That allows us to generate charts such as the ones below:\n\n![coupling visualized](doc/imgs/coupling_sample.png).\n\n## Code churn measures\n\nCode churn is related to post-release defects. Modules with higher churn tend to have more defects. There are several different aspects of code churn. I intend to support several of them in Code Maat.\n\n### Absolute churn\n\nThe absolute code churn numbers are calculated with the `-a abs-churn` option. Note that the option is only available for `git`. The analysis will output a CSV table with the churn accumulated per date:\n\n             date,       added, deleted\n             2013-08-09,   259,      20\n             2013-08-19,   146,      77\n             2013-08-21,     5,       6\n             2013-08-20,   773,     121\n             2013-08-30,   349,     185\n             ...\n\nVisualizing the result allows us to spot general trends over time:\n\n![abs churn visualized](doc/imgs/abs_churn_sample.png).\n\n### Churn by author\n\nThe idea behind this analysis is to get an idea of the overall contributions by each individual. The analysis is invoked with the `-a author-churn` option. 
The result will be given as CSV:\n\n             author,        added, deleted\n             Adam Tornhill, 13826,    1670\n             Some One Else,   123,      80\n             Mr Petersen,       3,       3\n             ...\n\nAnd, of course, you wouldn't use this data for any performance evaluation; it wouldn't serve well (in case anything should be rewarded it would be a net deletion of code - there's too much of it in the world).\n\n### Churn by entity\n\nThe pre-release churn of a module is a good predictor of its number of post-release defects. Such an analysis is supported in Code Maat by the `-a entity-churn` option.\n\nNote: Some research suggests that relative churn measures are better, while others don't find any significant differences. The metrics calculated by Code Maat are absolute for now because it's easier to calculate. I'm likely to include support for relative churn too.\n\n## Ownership patterns\n\nOnce we have mined the organizational metrics described above, we may find we have multiple developers working on the same modules. How is their effort distributed? Does a particular module have a major developer or is everyone contributing a small piece? Let's find out by running the `-a entity-ownership` analysis. This analysis gives us the following output:\n\n             entity,               author,  added, deleted\n             analysis/authors.clj,    apt,    164,      98\n             analysis/authors.clj,    qew,     81,      10\n             analysis/authors.clj,     jt,     42,      32\n             analysis/entities.clj,   apt,     72,      24\n             ...\n\nAnother ownership view is to consider the effort spent by individual authors on the different entities in the system. This analysis is run by the `-a entity-effort` option. 
The analysis gives us the following table:\n\n             entity,                author, author-revs, total-revs\n             analysis/authors.clj,     apt,           5,         10\n             analysis/authors.clj,     qew,           3,         10\n             analysis/authors.clj,      jt,           1,         10\n             analysis/authors.clj,     apt,           1,         10\n             ...\n\nThis information may be a useful guide to find the right author to discuss functionality and potential refactorings with. Just note that the ownership metrics are sensitive to the same biases as the churn metrics; they're both heuristics, not absolute truths.\n\n## Temporal periods\n\nSometimes we'd like to find patterns that manifest themselves over multiple commits. Code Maat provides the `--temporal-period` switch that lets you consider all commits within a day as a logical change. Just provide the switch and add a digit - in the future that digit may even mean something. Right now the aggregation is limited to commits within a single day.\n\n## Architectural level analyses\n\nUsing the `-g` flag lets you specify a mapping from individual files to logical components. This feature makes it possible to\nscale the analyses to an architectural level and get hotspots, knowledge metrics, etc. on the level of sub-systems.\n\nThere are some sample mapping files in the `end_to_end` test folder, for\nexample [this one](https://github.com/adamtornhill/code-maat/blob/ebd2b757ae31510b5cf52d0e11fafa82a7e062d1/test/code_maat/end_to_end/regex-and-text-layers-definition.txt)\n\nThe format is `regex_pattern =\u003e logical_group_name`:\n\n```\nsrc/Features/Core      =\u003e Core\n^src\\/.*\\/.*Tests\\.cs$ =\u003e CS Tests\n```\n\nCode Maat takes everything that matches a regex and analyses it as a\nholistic whole by aggregating all file contributions for the matches.\n\n### Intermediate results\n\nCode Maat supports an `identity` analysis. 
By using this switch, Code Maat will output the intermediate parse result of the raw VCS file. This can be useful either as a debug aid or as input to other tools.\n\n### JVM options\n\nCode Maat uses the Incanter library. By default, Incanter will create an `awt frame`. You can suppress the frame by providing the following option to your `java` command: `-Djava.awt.headless=true`.\nCode Maat is quite memory hungry, particularly when working with larger change sets. Thus, I recommend specifying a larger heap size than the `JVM` defaults: `-Xmx4g`.\nNote that when running Code Maat through [leiningen](https://github.com/technomancy/leiningen), those options are already configured in the `project.clj` file.\n\n## Limitations\n\nThe current version of Code Maat processes all its content in memory. Thus, it may not scale to large input files (however, it depends a lot on the combination of parser and analysis). The recommendation is to limit the input by specifying a sensible start date (as discussed initially, you want to do that anyway to avoid confounds in the analysis).\n\n## Future directions\n\nIn future versions of Code Maat I plan to add more analysis methods such as code churn and developer patterns.\nI also plan on direct visualization support and a database backed analysis to allow processing of larger log files. Further, I plan to add a worked example. That example will be a case study of some well-known open source code. Until then, I hope you find Code Maat useful in its initial shape.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadamtornhill%2Fcode-maat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadamtornhill%2Fcode-maat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadamtornhill%2Fcode-maat/lists"}