{"id":13477290,"url":"https://github.com/erikbern/git-of-theseus","last_synced_at":"2025-05-14T14:09:17.070Z","repository":{"id":39708511,"uuid":"68070932","full_name":"erikbern/git-of-theseus","owner":"erikbern","description":"Analyze how a Git repo grows over time","archived":false,"fork":false,"pushed_at":"2023-11-25T17:04:17.000Z","size":2496,"stargazers_count":2748,"open_issues_count":20,"forks_count":89,"subscribers_count":25,"default_branch":"master","last_synced_at":"2025-05-12T22:43:09.371Z","etag":null,"topics":["author-statistics","git","python","repository-management"],"latest_commit_sha":null,"homepage":"https://erikbern.com/2016/12/05/the-half-life-of-code.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/erikbern.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-09-13T03:29:40.000Z","updated_at":"2025-05-12T16:20:20.000Z","dependencies_parsed_at":"2023-11-25T18:45:34.353Z","dependency_job_id":null,"html_url":"https://github.com/erikbern/git-of-theseus","commit_stats":{"total_commits":133,"total_committers":17,"mean_commits":7.823529411764706,"dds":0.5413533834586466,"last_synced_commit":"1d77f082a9b25fb3a0c541641722cd4836135362"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikbern%2Fgit-of-theseus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikbern%2Fgit-of-theseus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikbern%2Fgit-of-theseus/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikbern%2Fgit-of-theseus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/erikbern","download_url":"https://codeload.github.com/erikbern/git-of-theseus/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254160229,"owners_count":22024567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["author-statistics","git","python","repository-management"],"created_at":"2024-07-31T16:01:40.663Z","updated_at":"2025-05-14T14:09:12.060Z","avatar_url":"https://github.com/erikbern.png","language":"Python","funding_links":[],"categories":["Python","Git Tools","Git and Version Control Systems"],"sub_categories":["Repository management tools"],"readme":"[![pypi badge](https://img.shields.io/pypi/v/git-of-theseus.svg?style=flat)](https://pypi.python.org/pypi/git-of-theseus)\n\nSome scripts to analyze Git repos. Produces cool looking graphs like this (running it on [git](https://github.com/git/git) itself):\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-git.png)\n\nInstalling\n----------\n\nRun `pip install git-of-theseus`\n\nRunning\n-------\n\nFirst, you need to run `git-of-theseus-analyze \u003cpath to repo\u003e` (see `git-of-theseus-analyze --help` for a bunch of config). This will analyze a repository and might take quite some time.\n\nAfter that, you can generate plots! Some examples:\n\n1. 
Run `git-of-theseus-stack-plot cohorts.json` to create a stack plot showing the total amount of code broken down into cohorts (the year the code was added)\n1. Run `git-of-theseus-line-plot authors.json --normalize` to plot the percentage of code contributed by each of the top 20 authors\n1. Run `git-of-theseus-survival-plot survival.json` to plot how long lines of code survive\n\nYou can pass `--help` to any of these scripts to see its options.\n\nTo plot multiple repositories, run `git-of-theseus-analyze` separately for each project and store the data in separate directories using the `--outdir` flag. Then run `git-of-theseus-survival-plot \u003cfoo/survival.json\u003e \u003cbar/survival.json\u003e` (optionally with the `--exp-fit` flag to fit an exponential decay).\n\nHelp\n----\n\n`AttributeError: Unknown property labels` – upgrade matplotlib if you are seeing this: `pip install matplotlib --upgrade`\n\nSome pics\n---------\n\nSurvival of a line of code in a set of interesting repos:\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-projects-survival.png)\n\nThis curve is produced by the `git-of-theseus-survival-plot` script and shows the *percentage of lines in a commit that are still present after x years*. It aggregates over all commits, regardless of when they were made. So for *x=0* it includes all commits, whereas for *x\u003e0* only commits at least *x* years old are counted (for newer ones we would have to look into the future). 
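The censoring described above is what a Kaplan-Meier estimator handles. Below is a minimal Python sketch of the technique, not Git of Theseus's own code. It assumes each line of code is summarized as a hypothetical `(duration, removed)` pair. Here `duration` is the number of years from the line's commit until it was deleted (`removed=True`), or until the end of the analyzed history (`removed=False`, a censored line). At each removal age *t*, the running survival value is multiplied by the fraction of lines still under observation at *t* that were not removed at *t*. Lines from commits younger than *t* drop out of that count instead of being treated as removed, so the estimate never needs to look into the future.

```python
from collections import Counter


def kaplan_meier(observations):
    # observations: hypothetical list of (duration, removed) pairs, one per line of code.
    # duration: years from the line's commit until it was removed (removed=True),
    # or until the end of the analyzed history (removed=False, i.e. censored).
    removals = Counter(d for d, removed in observations if removed)
    survival = 1.0
    curve = [(0.0, 1.0)]
    for t in sorted(removals):
        # Lines still under observation at age t: neither removed nor censored before t.
        at_risk = sum(1 for d, _ in observations if d >= t)
        survival *= 1 - removals[t] / at_risk
        curve.append((t, survival))
    return curve


# Four lines: one removed after 1 year, one still present when history ended
# 2 years after its commit, and two observed for 3 years (one removed at year 3).
print(kaplan_meier([(1, True), (2, False), (3, True), (3, False)]))
# [(0.0, 1.0), (1, 0.75), (3, 0.375)]
```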
The survival curves are estimated using [Kaplan-Meier](https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator).\n\nYou can also add an exponential fit with the `--exp-fit` flag:\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-projects-survival-exp-fit.png)\n\nLinux – stack plot:\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-linux.png)\n\nThis curve is produced by the `git-of-theseus-stack-plot` script and shows the total number of lines in a repo broken down into cohorts by the year the code was added.\n\nNode – stack plot:\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-node.png)\n\nRails – stack plot:\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-rails.png)\n\nTensorFlow – stack plot:\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-tensorflow.png)\n\nRust – stack plot:\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-rust.png)\n\nPlotting other stuff\n--------------------\n\n`git-of-theseus-analyze` writes `exts.json`, `cohorts.json` and `authors.json`. Besides cohorts, you can run `git-of-theseus-stack-plot authors.json` to plot author statistics, or `git-of-theseus-stack-plot exts.json` to plot file extension statistics. For author statistics, you might want to create a [.mailmap](https://git-scm.com/docs/gitmailmap) file in the root directory of the repository to deduplicate authors who commit under several names or emails. 
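To picture what deduplication does to the author counts, here is a small Python sketch. It is a simplified illustration, not how git or Git of Theseus implement .mailmap support. The hypothetical `parse_mailmap` function only reads lines that start with a proper name (for example `Proper Name <commit@email>` or `Proper Name <proper@email> <commit@email>`) and takes the last email on the line as the one recorded in commits. The hypothetical `merge_authors` function then folds an assumed per-author tally of line counts onto those canonical names, comparing emails lower-cased.

```python
import re
from collections import Counter


def parse_mailmap(text):
    # Simplified reading of .mailmap: only lines that start with a proper name are used,
    # and the last <email> on the line is taken as the email recorded in commits.
    # Email-only lines (no proper name) are skipped in this sketch.
    # Returns {commit email (lower-cased): proper name}.
    mapping = {}
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and surrounding blanks
        emails = re.findall('<([^>]*)>', line)
        name = line.split('<', 1)[0].strip()
        if name and emails:
            mapping[emails[-1].lower()] = name
    return mapping


def merge_authors(line_counts, mapping):
    # line_counts: hypothetical {(author name, author email): line count} tally.
    merged = Counter()
    for (name, email), count in line_counts.items():
        merged[mapping.get(email.lower(), name)] += count
    return merged


MAILMAP = '''
Jane Doe <jane@example.com>
Jane Doe <jane@example.com> <jdoe@old-laptop>
'''
counts = {
    ('Jane Doe', 'jane@example.com'): 120,
    ('jdoe', 'jdoe@old-laptop'): 30,
    ('Sam Roe', 'sam@example.com'): 50,
}
print(merge_authors(counts, parse_mailmap(MAILMAP)))
# Counter({'Jane Doe': 150, 'Sam Roe': 50})
```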
If you need to create a .mailmap file, the following command lists the distinct author-email combinations in a repository:\n\nMac / Linux\n\n```shell\ngit log --pretty=format:\"%an %ae\" | sort | uniq\n```\n\nWindows PowerShell\n\n```powershell\ngit log --pretty=format:\"%an %ae\" | Sort-Object | Select-Object -Unique\n```\n\nFor instance, here are the author statistics for [Kubernetes](https://github.com/kubernetes/kubernetes):\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-kubernetes-authors.png)\n\nYou can also normalize the plot to 100%. Here are the normalized author statistics for Git:\n\n![git](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-git-authors-normalized.png)\n\nOther stuff\n-----------\n\n[Markovtsev Vadim](https://twitter.com/tmarkhor) implemented a very similar analysis, named [Hercules](https://github.com/src-d/hercules), that claims to be 20% to 6x faster than Git of Theseus. There's a great [blog post](https://web.archive.org/web/20180918135417/https://blog.sourced.tech/post/hercules.v4/) about all the complexity that goes into analyzing Git history.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferikbern%2Fgit-of-theseus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ferikbern%2Fgit-of-theseus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferikbern%2Fgit-of-theseus/lists"}