{"id":13843588,"url":"https://github.com/dreamyguy/gitlogg","last_synced_at":"2025-04-15T11:48:02.670Z","repository":{"id":7293727,"uuid":"8609527","full_name":"dreamyguy/gitlogg","owner":"dreamyguy","description":"💾 🧮 🤯 Parse the 'git log' of multiple repos to 'JSON'","archived":false,"fork":false,"pushed_at":"2023-07-07T15:03:40.000Z","size":17633,"stargazers_count":132,"open_issues_count":7,"forks_count":27,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-06T22:47:07.030Z","etag":null,"topics":["data-mining","git","git-log","json","json-parser","multiple-repositories","repository-mining","repository-utilities","statistics"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dreamyguy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2013-03-06T17:54:27.000Z","updated_at":"2025-01-14T21:18:48.000Z","dependencies_parsed_at":"2024-02-08T21:11:28.418Z","dependency_job_id":null,"html_url":"https://github.com/dreamyguy/gitlogg","commit_stats":{"total_commits":92,"total_committers":3,"mean_commits":"30.666666666666668","dds":0.04347826086956519,"last_synced_commit":"77419e4d7f8ed2efec485b7271900024e0c49d69"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreamyguy%2Fgitlogg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreamyguy%2Fgitlogg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreamyguy%2Fgitlogg/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreamyguy%2Fgitlogg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dreamyguy","download_url":"https://codeload.github.com/dreamyguy/gitlogg/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249066054,"owners_count":21207392,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-mining","git","git-log","json","json-parser","multiple-repositories","repository-mining","repository-utilities","statistics"],"created_at":"2024-08-04T17:02:14.804Z","updated_at":"2025-04-15T11:48:02.635Z","avatar_url":"https://github.com/dreamyguy.png","language":"JavaScript","readme":"![Gitlogg](https://raw.githubusercontent.com/dreamyguy/gitlogg/master/docs/gitlogg-icon-github.png \"Parse the 'git log' of one or several 'git' repositories into a sanitised and distributable 'JSON' file\")\n\n\u003e _Parse the 'git log' of one or several 'git' repositories into a sanitised and distributable 'JSON' file._\n\n[![MIT Licence](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/dreamyguy/gitlogg/blob/master/LICENSE) [![Data served by Gitlogg API](https://img.shields.io/badge/data_can_be_served_by-gitlogg--api-89336e.svg)](https://github.com/dreamyguy/gitlogg-api) [![Data served by Gitlogg API](https://img.shields.io/badge/data_can_be_rendered_by-gitinsight-89336e.svg)](https://github.com/dreamyguy/gitinsight)\n\n## Why?\n\n`git log` is a wonderful tool. 
However, its output can be not only surprisingly inconsistent, but also long and difficult to scan and distribute.\n\n**Gitlogg** sanitises the `git log` and outputs it to `JSON`, a format that can easily be consumed by other applications. As long as the repositories being scanned are kept up to date, **Gitlogg** will return fresh data every time it runs.\n\n#### **Gitlogg** addresses the following challenges:\n\n* `git log` can only be used on one repository at a time.\n* `git log` can't be easily consumed by other applications in its original format.\n* `git log` doesn't return **impact**, which is the cumulative change brought by a single commit. Very interesting graphs can be built with that data, as shown on [sidhree.com][1].\n* Fields that allow user input, like `subject`, need to be sanitised to be consumed.\n* File changes shown under `--stat` or `--shortstat` are currently not available as placeholders under `--pretty=format:\u003cstring\u003e`, and it is cumbersome to get commit logs to output neatly in single lines - with stats.\n* It is hard to retrieve commits made at a specific but generic moment, like \"11pm\"; at the \"27th minute\" of an hour; on a \"Sunday\"; in \"March\"; in \"GMT -5\"; at the \"53rd second of a minute\".\n* Some commits don't have stats, and that can cause the structure of the output to break, making it harder to distribute.\n\n#### Script execution feedback\n\n**Gitlogg** is not a very complex application, but I still made an effort to provide some feedback on what is happening under the hood. 
Below are some screenshots of dialogs one can expect to see while executing it:\n\n![Error 001](https://raw.githubusercontent.com/dreamyguy/gitlogg/master/docs/error-001.png \"'Error 001' message as on release v0.1.3\")\n\u003e **Øh nøes!** The path to the folder containing all repositories *does not exist!*\n\n![Error 002](https://raw.githubusercontent.com/dreamyguy/gitlogg/master/docs/error-002.png \"'Error 002' message as on release v0.1.3\")\n\u003e **Øh nøes!** The path to the folder containing all repositories *exists, but is empty!*\n\n![Success!](https://raw.githubusercontent.com/dreamyguy/gitlogg/master/docs/success.png \"Success messages as on release v0.1.6\")\n\u003e **Success!** `JSON` parsed, based on **9** different repositories with a total of **25,537** commits.\n\nNote that I've included two huge repos _(*react* \u0026 *react-native*, which had 7,813 \u0026 10,065 commits respectively at the time of writing)_ for the sake of demonstration. The resulting parsed `JSON` file has 715,040 lines. All of that was done in less than 25 seconds.\n\n_I have successfully compiled **`470`** repositories at once_ (all repos under the organization I work for). That run produced these figures:\n\n* `gitlogg.tmp` generated in `154s` (`~2.57mins`)\n* `JSON` output parsed in `2792ms`\n* `JSON` file size: `121.5MB`\n* Commits processed: `118,117`\n* Parsed `JSON` file, lines: `3,307,280`\n\n## Getting started\n\n**Gitlogg** requires [NodeJS][2] and [BabelJS][3].\n\n1. Install `NodeJS` (visit [their page][2] to find the right install for your system).\n2. Run `npm run setup`. 
That will:\n\n* Install `BabelJS` globally by running `npm install babel-cli -g`.\n* Install all the local dependencies, through `npm install`.\n* Create the directory that will hold all the repos to be parsed to `JSON` (only in **Simple Mode**).\n* Create the directories expected by the scripts that output files.\n\n## The `JSON` output\n\nThe output will look like this (first commit for **Font Awesome**):\n\n    [\n      {\n        \"repository\": \"Font-Awesome\",\n        \"commit_nr\": 1,\n        \"commit_hash\": \"7ed221e28df1745a20009329033ac690ef000575\",\n        \"author_name\": \"Dave Gandy\",\n        \"author_email\": \"dave@davegandy.com\",\n        \"author_date\": \"Fri Feb 17 09:27:26 2012 -0500\",\n        \"author_date_relative\": \"4 years, 3 months ago\",\n        \"author_date_unix_timestamp\": \"1329488846\",\n        \"author_date_iso_8601\": \"2012-02-17 09:27:26 -0500\",\n        \"subject\": \"first commit\",\n        \"subject_sanitized\": \"first-commit\",\n        \"stats\": \" 1 file changed, 0 insertions(+), 0 deletions(-)\",\n        \"time_hour\": 9,\n        \"time_minutes\": 27,\n        \"time_seconds\": 26,\n        \"time_gmt\": \"-0500\",\n        \"date_day_week\": \"Fri\",\n        \"date_month_day\": 17,\n        \"date_month_name\": \"Feb\",\n        \"date_month_number\": 2,\n        \"date_year\": \"2012\",\n        \"date_iso_8601\": \"2012-02-17\",\n        \"files_changed\": 1,\n        \"insertions\": 0,\n        \"deletions\": 0,\n        \"impact\": 0\n      },\n      {\n        (...)\n      },\n      {\n        (...)\n      }\n    ]\n\nNote that many `git log` fields were not printed here, but that's only because I've commented out some of them in the **gitlogg-parse-json.js** script. All the fields below are available. 
Fields marked with a `*` are either non-standard or not available as placeholders on `--pretty=format:\u003cstring\u003e`:\n\n    * repository\n    * commit_nr\n      commit_hash\n      commit_hash_abbreviated\n      tree_hash\n      tree_hash_abbreviated\n      parent_hashes\n      parent_hashes_abbreviated\n      author_name\n      author_name_mailmap\n      author_email\n      author_email_mailmap\n      author_date\n      author_date_RFC2822\n      author_date_relative\n      author_date_unix_timestamp\n      author_date_iso_8601\n      author_date_iso_8601_strict\n      committer_name\n      committer_name_mailmap\n      committer_email\n      committer_email_mailmap\n      committer_date\n      committer_date_RFC2822\n      committer_date_relative\n      committer_date_unix_timestamp\n      committer_date_iso_8601\n      committer_date_iso_8601_strict\n      ref_names\n      ref_names_no_wrapping\n      encoding\n      subject\n      subject_sanitized\n      commit_notes\n    * stats\n    * time_hour\n    * time_minutes\n    * time_seconds\n    * time_gmt\n    * date_day_week\n    * date_month_day\n    * date_month_name\n    * date_month_number\n    * date_year\n    * date_iso_8601\n    * files_changed\n    * insertions\n    * deletions\n    * impact\n\n## Creating the `JSON` file\n\nThere are two modes and they are basically the same, except that the **Simple Mode** doesn't require configuration. 
The **Advanced Mode** requires one to set the absolute path to the directory containing all the repositories you'd like to parse to a single `JSON` file.\n\n#### Simple Mode\n\nTo simplify the generation process to the point that no configuration is required, follow this directory structure:\n\n    gitlogg/          \u003c== This repository's root\n    ├── scripts/\n    │   ├── colors.sh\n    │   ├── gitlogg-generate-log.sh\n    │   ├── gitlogg-parse-json.js\n    │   └── gitlogg.sh\n    └── _repos/       \u003c== Copy/place/keep your repositories under the folder \"_repos/\"\n        ├── repo1\n        ├── repo2\n        ├── repo3\n        └── repo4\n\n1. Copy all the repositories you wish to parse to `JSON` to the `_repos/` folder, as shown above.\n\n2. Provided that you are within the `gitlogg` folder (this repo's root), run:\n\n        $ npm run gitlogg\n\n#### Advanced Mode\n\nTo generate the `JSON` file based on repositories in any other location, you'll have to define the path to the folder that contains all your repositories.\n\n1. Open [`gitlogg-generate-log.sh`](https://github.com/dreamyguy/gitlogg/blob/master/scripts/gitlogg-generate-log.sh#L4) with an editor of your choice and edit the `yourpath` variable:\n\n        # define the absolute path to the directory that contains all your repositories\n        yourpath=/absolute/system/path/to/directory/that/contains/all/your/repositories/\n\n_**Tip:** drag the folder that contains your repositories to a terminal window, and you'll get the absolute system path to that folder._\n\n2. Provided that you are within the `gitlogg` folder (this repo's root), run:\n\n        $ npm run gitlogg\n\n#### Parallel Processing\n\nThe parallel processing that was released in [v0.1.8](https://github.com/dreamyguy/gitlogg/tree/v0.1.8) had problems with `xargs` and was temporarily removed. 
The issue is being dealt with through [pull-request #16](https://github.com/dreamyguy/gitlogg/pull/16).\n\n## The parsed `JSON` file\n\n\u003e Two files will be generated when running `npm run gitlogg`: **`_tmp/gitlogg.tmp`** and **`_output/gitlogg.json`**.\n\n    gitlogg/                \u003c== This repository's root\n    ├── scripts/\n    │   ├── colors.sh\n    │   ├── gitlogg-generate-log.sh\n    │   ├── gitlogg-parse-json.js\n    │   └── gitlogg.sh\n    ├── _output/\n    │   └── gitlogg.json    \u003c== The parsed 'JSON', what we're all after. It's parsed from 'gitlogg.tmp'\n    └── _tmp/\n        └── gitlogg.tmp     \u003c== The processed 'git log'\n\nTwo files are necessary because of the nature of the script, which loops through all subdirectories and outputs the `git log` for all valid `git` repositories. Once that loop is done, a valid `JSON` file (`gitlogg.json`) is generated out of `gitlogg.tmp`.\n\n`gitlogg.tmp` is just a temporary file on which `gitlogg.json` is based. In case the parsing fails, `gitlogg.tmp` can come in handy for debugging.\n\n## Further Notes\n\n#### Debugging\n\nI've created error messages with suggested solutions, to help you get past the most common issues.\n\nHowever, `git log`'s output can break while it's being processed. That's most likely caused by fields that allow user input, like _commit messages_. These fields may contain characters (like `\\r`) that clash with those reserved for the generation of `gitlogg.tmp`, namely `\\n`.\n\nEfforts have been made to mitigate errors by sanitizing characters that have caused errors before, but it might still happen in some edge cases. If it does happen, have a look at the generated `gitlogg.tmp` and see where the expected structure (which is obvious) breaks. 
Once you have identified the line, have a closer look at the commit and look for an unusual character.\n\nPost an issue with a link to a _gist_ containing your broken `gitlogg.tmp` and I will try to reproduce the error.\n\n#### Documentation\n\nDocumentation is provided through:\n\n* Commit messages,\n* Commit comments,\n* Code comments,\n* `README.md` files, like this one.\n\nSome of the initial commits were done deliberately to show what one gets with short commands like `$ git log`. From that initial state, commits keep on introducing simplicity or complexity to the code, depending on the workflow. That in itself is a form of documentation. In other words, if you're really that interested in details, there are plenty to be had in the code itself and in its own progressive enhancement.\n\n#### License\n\n[MIT](LICENSE)\n\n#### Disclaimer\n\nThis project is by no means the smartest way to parse a `git log` to `JSON`, nor does it aim at becoming so. It is simply a _learn-by-doing_ project in which I experiment with commands available on OSX's Terminal and whatever else I find along the way.\n\n**Gitlogg** was built and tested on OSX. Though an effort has been made to make it cross-platform, there could be errors on other systems.\n\nIt's certainly not harmful to your repositories and it won't change any data in them. Having said that, it's served _raw_ and _'as is'_. You may get support, but don't expect it nor take it for granted.\n\n#### Known Issues\n\nThere are _no known issues_ at this point. The parallelization that was introduced in [v0.1.8](https://github.com/dreamyguy/gitlogg/tree/v0.1.8) had issues with `xargs`, so its introduction was temporarily reverted until the problem has been dealt with through [pull-request #16](https://github.com/dreamyguy/gitlogg/pull/16). 
[v0.1.9](https://github.com/dreamyguy/gitlogg/tree/v0.1.9) was released to revert those changes.\n\nThe [javascript](https://github.com/dreamyguy/gitlogg/tree/javascript) branch is a very fine piece of programming; you should definitely check it out. I haven't tested it extensively, but found a few issues, which are reported in the [issue tracker](https://github.com/dreamyguy/gitlogg/issues).\n\nThe current version [v0.2.1](https://github.com/dreamyguy/gitlogg/tree/v0.2.1) is still quite stable after all these years, with no known issues. Try it! :sparkles:\n\n#### Release History\n\n* 2018-07-12   [v0.2.1](https://github.com/dreamyguy/gitlogg/tree/v0.2.1) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.2.0...v0.2.1)\n  * Use `ȝ` instead of `\\0` when replacing `\\n` during the extraction of `git log`. `\\0` is not as reliable as it seemed.\n    * The main idea here is to use a character that occurs as seldom as possible - preferably never in `git` context.\n    * `ȝ` (Yogh) is an old English character. 
If that gives problems, I'll try `ƿ` (Wynn), another abandoned English char.\n* 2018-07-11   [v0.2.0](https://github.com/dreamyguy/gitlogg/tree/v0.2.0) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.9...v0.2.0)\n  * Improve console output readability\n  * Simplify `JSON` format.\n    * Reduce filesize of output `JSON`, in some scenarios quite dramatically\n    * Make it importable into `MongoDB`, which is what is being used on **gitlogg-api**\n  * Use `\\0` instead of `ò` when replacing `\\n` during the extraction of `git log`.\n    * The main idea here is to use a character that occurs as seldom as possible - preferably never in `git` context.\n* 2016-12-15   [v0.1.9](https://github.com/dreamyguy/gitlogg/tree/v0.1.9) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.8...v0.1.9)\n  * Remove parallelization of processes until the problem with `xargs` has been dealt with.\n* 2016-12-14   [v0.1.8](https://github.com/dreamyguy/gitlogg/tree/v0.1.8) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.7...v0.1.8)\n  * Parse `JSON` through a read/write stream, so we get around the 268MB `Node`'s buffer limitation.\n    * This limited the whole operation to a number between 173,500 and 174,000 commits.\n  * Parallelize the generation of `git log` for multiple repos, optionally passing number of processes as a CLI argument.\n  * Mitigate encoding problems caused by `ISO-8859-1` characters not being properly encoded to `UTF-8`.\n* 2016-11-21   [v0.1.7](https://github.com/dreamyguy/gitlogg/tree/v0.1.7) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.6...v0.1.7)\n  * Better readability for 'Release History'\n  * Correct url to logo, so it also renders outside Github\n  * Rename sub-folder 'gitlogg' to 'scripts' to avoid confusion\n  * Simplify initial setup and running of 'gitlogg'\n  * Set vars instead of hardcoding values\n  * Separate scripts from output files\n  * Introduce 'Debugging' as a 'Further 
Notes' item\n  * Tip on how to get the absolute system path to a directory\n  * Introduce 'View Changes' links under 'Release History'\n* 2016-11-19   [v0.1.6](https://github.com/dreamyguy/gitlogg/tree/v0.1.6) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.5...v0.1.6)\n  * Introduce `commit_nr`, a commit count within each repo\n  * Show how many repos are about to be processed on console\n  * Show what repo is being processed on console\n  * Replace carriage return with space\n* 2016-06-12   [v0.1.5](https://github.com/dreamyguy/gitlogg/tree/v0.1.5) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.4...v0.1.5)\n  * Introduce logo\n  * Correct wrong reference to 'yourpath'\n  * Output numbers instead of strings\n* 2016-05-23   [v0.1.4](https://github.com/dreamyguy/gitlogg/tree/v0.1.4) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.3...v0.1.4)\n  * Fix a bug that would break the output in some rare cases\n* 2016-05-21   [v0.1.3](https://github.com/dreamyguy/gitlogg/tree/v0.1.3) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.2...v0.1.3)\n  * Even better error handling\n* 2016-05-21   [v0.1.2](https://github.com/dreamyguy/gitlogg/tree/v0.1.2) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.1...v0.1.2)\n  * Better error handling\n* 2016-05-21   [v0.1.1](https://github.com/dreamyguy/gitlogg/tree/v0.1.1) - [View Changes](https://github.com/dreamyguy/gitlogg/compare/v0.1.0...v0.1.1)\n  * The 'gitlogg' release, the node-based JSON generation\n* 2016-05-20   [v0.1.0](https://github.com/dreamyguy/gitlogg/tree/v0.1.0)\n  * The 'git-log-to-json' release, now considered legacy\n\n-------------\n\n\u003e _Brought to you by [Wallace Sidhrée][1]._\n\n  [1]: http://sidhree.com/ \"Wallace Sidhrée\"\n  [2]: https://nodejs.org/en/ \"NodeJS\"\n  [3]: https://babeljs.io/ 
\"BabelJS\"\n","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdreamyguy%2Fgitlogg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdreamyguy%2Fgitlogg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdreamyguy%2Fgitlogg/lists"}