{"id":18078450,"url":"https://github.com/techgaun/git-internals","last_synced_at":"2026-03-05T16:51:21.186Z","repository":{"id":40778068,"uuid":"193165849","full_name":"techgaun/git-internals","owner":"techgaun","description":"An overview of git internals","archived":false,"fork":false,"pushed_at":"2019-06-23T03:24:30.000Z","size":28,"stargazers_count":27,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-07T01:37:17.521Z","etag":null,"topics":["git","git-internals","hacktoberfest","paylease","porcelain-commands"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/techgaun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-21T22:14:36.000Z","updated_at":"2024-10-28T01:10:32.000Z","dependencies_parsed_at":"2022-08-29T12:30:24.197Z","dependency_job_id":null,"html_url":"https://github.com/techgaun/git-internals","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/techgaun/git-internals","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/techgaun%2Fgit-internals","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/techgaun%2Fgit-internals/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/techgaun%2Fgit-internals/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/techgaun%2Fgit-internals/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/techgaun","download_url":"https://codeload.github.com/techgaun/git-int
ernals/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/techgaun%2Fgit-internals/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278708074,"owners_count":26031932,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-06T02:00:05.630Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["git","git-internals","hacktoberfest","paylease","porcelain-commands"],"created_at":"2024-10-31T12:14:07.910Z","updated_at":"2025-10-07T01:37:19.264Z","avatar_url":"https://github.com/techgaun.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# git-internals\n\n\u003e An overview of git internals\n\nThis repo consists of the talk given at PayLease's Show and Tell on 06/21/2019.\n\nGit has a content-addressable filesystem as the layer which acts as KV store in a way.\nYou give some content to git and git gives you a 40 character sha1 hash. 
You can then\nuse the sha1 hash in the future to talk with git about that content.\n\n## [Slides](slides.md) - Use with `mdp slides.md`\n\n## Walkthrough\n\n### Git aliases/Shell aliases/Global gitignore\n\n- [My gitconfig](https://github.com/techgaun/dotfiles/blob/79cad9d116bdff6d05a16806668df72bd50af3c0/.gitconfig#L11-L43)\n- [My global gitignore](https://github.com/techgaun/dotfiles/blob/79cad9d116bdff6d05a16806668df72bd50af3c0/.gitignore)\n- [My git shell\nalias](https://github.com/techgaun/dotfiles/blob/79cad9d116bdff6d05a16806668df72bd50af3c0/.bash_aliases#L94-L97) with\n[autocompletion](https://github.com/techgaun/dotfiles/blob/79cad9d116bdff6d05a16806668df72bd50af3c0/.bashrc.defaults#L14-L20)\n\n### .git directory\n\nWe create a git repository first and look at the initial tree structure of .git.\nGit repo is a directory with .git sub-directory with relevant metadata.\n\n```shell\n$ git init git_demo\nInitialized empty Git repository in /tmp/git_demo/.git/\n\n$ cd git_demo\n\n$ tree .git\n.git\n├── branches\n├── config\n├── description\n├── HEAD\n├── hooks\n│   ├── applypatch-msg.sample\n│   ├── commit-msg.sample\n│   ├── fsmonitor-watchman.sample\n│   ├── post-update.sample\n│   ├── pre-applypatch.sample\n│   ├── pre-commit.sample\n│   ├── prepare-commit-msg.sample\n│   ├── pre-push.sample\n│   ├── pre-rebase.sample\n│   ├── pre-receive.sample\n│   └── update.sample\n├── info\n│   └── exclude\n├── objects\n│   ├── info\n│   └── pack\n└── refs\n    ├── heads\n    └── tags\n\n9 directories, 15 files\n```\n\n- `.git/config` holds local git configuration that applies to the repo we are in.\n- `.git/description` holds description that is shown by gitweb.\n- `.git/HEAD` holds pointer/reference to what branch/tag/commit id we are at.\n- `.git/hooks` holds sample hooks initially and you can create your own.\n- `.git/info/exclude` holds repo level gitignore that doesn't go in repo's .gitignore.\n- `.git/objects` holds all kind of objects git stores.\n- `.git/refs` 
holds all kinds of references git makes use of (branch/tag/stash, etc.).\n- `.git/logs` doesn't exist initially but gets created as you travel through your git repo. It holds all the logs that\nshow up in the `git reflog` subcommand.\n- `.git/index` doesn't exist initially; once created, it holds information about the staging area.\n\n## [Git hooks](https://githooks.com/)\n\n- scripts that execute before or after certain events such as `commit`, `push`, `receive`, etc.\n- `pre-commit` - usage could be something like running lint or unit tests on the files changed. Git exits without making the commit\nif the `pre-commit` hook returns a non-zero exit code.\n- `post-receive` - usage could be for pushing code to production.\n- to enable a hook, overwrite or create one of the scripts in `.git/hooks` and make it executable.\n\n## Git plumbing vs porcelain commands\n\n- Most of the commands we use in our day-to-day interaction with git are porcelain commands that are much simpler to\nuse. Think of them as the frontend for git with a simplified interface.\n- There is another set of commands that are lower level and can be composed together to form the porcelain commands.\nThese commands are called plumbing commands.\n- As we explore further, we will look at some of the plumbing commands with examples.\n- Some of the plumbing commands we will look at are `hash-object`, `update-index`, `write-tree`, `commit-tree` and\n`cat-file`.\n\n## Git objects\n\n4 types of objects:\n\n- `blob` (binary large object) - the data we want git to store and version\n- `tree` - pointers to file names, contents \u0026 other trees. A git tree object creates the hierarchy between files and\ndirectories in a git repository.\n- `commit` - pointer to a tree together with some additional metadata (like author, commit message, committer, etc.). 
It\nrepresents a snapshot of the state of the repository.\n- `tag` - used for annotated tags; it contains the hash of the tagged object (usually commits are tagged).\n\n## Git references\n\n- names that point to sha1 hashes.\n- stored in directories inside `.git/refs`.\n- `heads` contains branch references.\n- `tags` contains tag references.\n- `remotes` contains remote-tracking references for the remotes added.\n\n## Git packfiles\n\n- Git stores objects on disk in the so-called loose object format initially.\n- It would be inefficient if git kept storing the entire content every time we make a change to a file.\n- Git occasionally packs up several of these loose objects into a single binary file called a packfile to save space and\nbe more efficient. This allows storing versions of objects in the form of deltas.\n\n## Git gc/reflog/fsck\n\n### gc\n\n- performs cleanup and optimizes the repository.\n- runs several housekeeping tasks such as compressing file revisions, removing unreachable objects, packing refs and pruning\nreflogs \u0026 stale working trees.\n- As it relates to packfiles, it gathers up loose objects \u0026 places them in packfiles. 
Also, it consolidates packfiles\ninto a single large packfile as necessary.\n- Auto gc happens once in a while as git deems necessary, for example when you try to push to remote.\n\n### reflog\n\n- git records the value of the repo's `HEAD` every time it changes; this record is called the `reflog`.\n- stored in the `.git/logs` directory.\n- can be useful to recover accidentally deleted branches.\n\n### fsck\n\n- Integrity check of your objects in the database.\n- Often tells us about dangling objects that can no longer be reached.\n- Could be potentially useful in cases when we don't have reflogs.\n\n## Working Example\n\nWe will run a series of commands and see how things work under the hood based on the information above.\nWe've already created the `git_demo` repository earlier while looking at the tree structure of the `.git` directory.\n\n```shell\n# lets create a simple text file\n$ echo \"Hello World\" \u003e readme.md\n\n# now we can see what the sha1 hash of readme.md would be according to git\n# the way it works is, a format of data is created as below and then sha1 hash for that is created\n# \u003ctype_of_object\u003e \u003csize_of_object\u003e\u003cnullbyte\u003e\u003ccontent_of_object\u003e | sha1_hash_function\n$ git hash-object readme.md\n557db03de997c86a4a028e1ebd3a1ceb225be238\n\n# we can replicate what git did by doing something like below:\n# blob is the type of object in this case\n# the size in the header is a byte count, so we use wc -c\n# as you can see below, the hash matches, simple ;)\n$ echo -e \"blob $(wc -c readme.md | cut -d' ' -f1)\\000$(cat readme.md)\" | sha1sum \n557db03de997c86a4a028e1ebd3a1ceb225be238  -\n\n# we ran hash-object which is a git plumbing command\n# however that doesn't add our content to object database until we instruct git to do so\n# next we will do that\n# open a new terminal (or tmux split) with the following command\n$ watch -n 0.5 tree .git\n\n# now we will hash the object and ask git to store it in git object database as well\n# when we do so, git will create new directory 
.git/objects/55\n# and create file with name `7db03de997c86a4a028e1ebd3a1ceb225be238`\n# first two characters of sha1 hash form the directory name and the rest form the filename\n# thats how git organizes objects in its objects database.\n$ git hash-object -w readme.md\n557db03de997c86a4a028e1ebd3a1ceb225be238\n\n# next we will cat the content of file\n$ cat .git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238 \nxK��OR04b�H����/�I�A�I\n\n# above we see some unrecognizable text, which is not what we saved\n# git saves our content with header + nullbyte + content as we saw earlier\n# we can use git cat-file plumbing command to look at the object we just created.\n# -p means pretty print the content of that object\n$ git cat-file -p 557db03de997c86a4a028e1ebd3a1ceb225be238\nHello World\n\n# and now lets look at the type of the file\n# -t means print the type of that object\n$ git cat-file -t 557db03de997c86a4a028e1ebd3a1ceb225be238\nblob\n\n# now the reason the raw content of object is some sort of gibberish\n# is because its stored after running zlib compression\n# lets see what it has in there\n# as we see next, it stores type of object (blob) and content length (12)\n# and actual content separated by nullbyte character.\n$ cat .git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238 | zlib-flate -uncompress\nblob 12Hello World\n\n# now lets see an example of creating a tree out of the object we added\n# we have added the object to git object database but it has no idea about\n# where and how that should exist in our repository\n# before we do that, lets look at our git status\n$ git status -s\n?? 
readme.md\n\n# so there's an untracked file which we will add to git's staging area\n# we normally do that via git add readme.md for example\n# this time, we will use git update-index plumbing command\n# which updates .git/index file (the file that holds staging info)\n$ git update-index --add readme.md\n\n# if we run git status, that matches with the fact that readme.md is now in staging area\n# the ?? from previous status has changed to A now :)\n$ git status -s\nA  readme.md\n\n# now we can take a look at our staging area more deeply\n# 100644 - 100 means blob (040 means tree) and 644 represents permission\n# permission in git looks like unix permissions except its much more limited\n$ git ls-files --stage\n100644 557db03de997c86a4a028e1ebd3a1ceb225be238 0\treadme.md\n\n# now we can create a new tree with the current state of index\n# note that current state of index has readme.md file in staging area\n# for this, we use git write-tree plumbing command\n# and we get a hash back\n$ git write-tree\n3a3aff7fa9639da674465c43fac565c1291f952b\n\n# we can use cat-file to look into the content \u0026 type of object that hash represents\n$ git cat-file -p 3a3aff7fa9639da674465c43fac565c1291f952b\n100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238\treadme.md\n\n$ git cat-file -t 3a3aff7fa9639da674465c43fac565c1291f952b\ntree\n\n# now that we have a tree object that holds a blob object we want to be committed\n# we can use git commit-tree plumbing command to create new commit object\n# with the tree object we just created\n$ echo \"initial commit\" | git commit-tree 3a3aff7fa9639da674465c43fac565c1291f952b\n86aa1cb0eec333b600d5b8c23c9c95d4983d5e6d\n\n# now lets look at the log of current branch\n# we should see new commit we just made\n# but for some reason, we don't :(\n$ git log --oneline\nfatal: your current branch 'master' does not have any commits yet\n\n# so where did that commit go then\n# if you search for that object in .git/objects, we do see 
.git/objects/86/aa1cb0eec333b600d5b8c23c9c95d4983d5e6d\n# then why didn't it show up on the git log?\n# lets see what data we have in that object\n$ cat .git/objects/86/aa1cb0eec333b600d5b8c23c9c95d4983d5e6d | zlib-flate -uncompress\ncommit 181tree 3a3aff7fa9639da674465c43fac565c1291f952b\nauthor techgaun \u003ccoolsamar207@gmail.com\u003e 1561256669 -0500\ncommitter techgaun \u003ccoolsamar207@gmail.com\u003e 1561256669 -0500\n\ninitial commit\n\n# so we have the data such as tree the commit object was created from,\n# author, committer and finally commit message\n# now we come back to the same question we had\n# why did the git log not show that commit?\n# the reason is that this commit is not associated to the current branch\n# we only created the commit object so far\n# now we can do that using git update-ref plumbing command\n# which updates .git/refs/heads/master file among other things\n# we could have done: echo 86aa1cb0eec333b600d5b8c23c9c95d4983d5e6d \u003e .git/refs/heads/master\n# but git does it in a safer way while handling other side effects as necessary\n$ git update-ref refs/heads/master 86aa1cb0eec333b600d5b8c23c9c95d4983d5e6d\n\n# now lets see what happens with git log\n# as you will see next, our commit is now part of master branch. Voila!\n# we just made a commit to git without using normal commands we are used to with\n$ git log --oneline\n86aa1cb (HEAD -\u003e master) initial commit\n\n# now lets create another commit with the earlier commit id as the parent\n# we repeat same stuff again, this time we create file with much larger content\n$ printf 'Hello World.%.0s' {1..1000} \u003e new.md\n\n# lets check the status real quick\n$ git status -s\n?? 
new.md\n\n# now lets add that file to staging area\n# note that we will skip hash-object this time\n# and the reason why it still works is because\n# update-index goes through the process of hashing all the objects\n# while adding them to the staging area\n$ git update-index --add new.md\n\n# and if we check status, we see its added to the staging area\n$ git status -s\nA  new.md\n\n$ git write-tree\nc4996cfea245445e4bdb0561bf18e29436568e58\n\n# now lets inspect that tree\n# we see that this tree contains complete snapshot of what we have in the git repo\n$ git cat-file -p c4996cfea245445e4bdb0561bf18e29436568e58\n100644 blob c7fc1d8f722cc984f6c90f4151de8b250eeb6343\tnew.md\n100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238\treadme.md\n\n# and now lets make commit object with our newly created tree\n# as you will see, we pass part of commit hash from first commit we made\n# as you can see, we only passed 7 first characters of hash\n# as long as git can resolve the part of hash into an object,\n# we can use such short partials of sha1 hash\n$ echo \"Added new file\" | git commit-tree c4996cfea245445e4bdb0561bf18e29436568e58 -p 86aa1cb\nfed6ba87e445db5175c628cfecbbd0b83526a54a\n\n# we can do cat-file on commit object as well\n# note the parent line in this case\n$ git cat-file -p fed6ba87e445db5175c628cfecbbd0b83526a54a\ntree c4996cfea245445e4bdb0561bf18e29436568e58\nparent 86aa1cb0eec333b600d5b8c23c9c95d4983d5e6d\nauthor techgaun \u003ccoolsamar207@gmail.com\u003e 1561258195 -0500\ncommitter techgaun \u003ccoolsamar207@gmail.com\u003e 1561258195 -0500\n\nAdded new file\n\n# also, lets look at the type of commit object with cat-file\n$ git cat-file -t fed6ba87e445db5175c628cfecbbd0b83526a54a\ncommit\n\n# finally, lets update master ref to this commit\n$ git update-ref refs/heads/master fed6ba87e445db5175c628cfecbbd0b83526a54a\n\n# and lets check the git log one more time\n# and we see things as expected\n$ git log --oneline\nfed6ba8 (HEAD -\u003e master) Added new 
file\n86aa1cb initial commit\n```\n\n## Other Examples\n\nWe will continue to operate on the above repository we created earlier\n\n### gc and packfile\n\n```shell\n# lets look at the size of .git/objects once\n# and as per the output below, we are at around 41K with our git object\n\n$ du -b .git/objects/\n4096\t.git/objects/pack\n4224\t.git/objects/86\n4096\t.git/objects/info\n4150\t.git/objects/3a\n4177\t.git/objects/c4\n4124\t.git/objects/55\n4227\t.git/objects/0a\n4255\t.git/objects/fe\n4207\t.git/objects/c7\n41652\t.git/objects/\n\n# and now lets look at the tree of .git directory after all the things we did\n# hooks directory is not shown here to preserve space\n$ tree .git\n.git\n├── branches\n├── config\n├── description\n├── HEAD\n├── hooks\n├── index\n├── info\n│   └── exclude\n├── logs\n│   ├── HEAD\n│   └── refs\n│       └── heads\n│           └── master\n├── objects\n│   ├── 0a\n│   │   └── 9c3e68d37858d478ad2692e01126e6851d1c93\n│   ├── 3a\n│   │   └── 3aff7fa9639da674465c43fac565c1291f952b\n│   ├── 55\n│   │   └── 7db03de997c86a4a028e1ebd3a1ceb225be238\n│   ├── 86\n│   │   └── aa1cb0eec333b600d5b8c23c9c95d4983d5e6d\n│   ├── c4\n│   │   └── 996cfea245445e4bdb0561bf18e29436568e58\n│   ├── c7\n│   │   └── fc1d8f722cc984f6c90f4151de8b250eeb6343\n│   ├── fe\n│   │   └── d6ba87e445db5175c628cfecbbd0b83526a54a\n│   ├── info\n│   └── pack\n└── refs\n    ├── heads\n    │   └── master\n    └── tags\n\n19 directories, 26 files\n\n# Now lets see if we can optimize our repo like git promises to by running gc\n$ git gc\n\n# And once again, lets see the size of .git/objects\n$ du -b .git/objects/\n5855\t.git/objects/pack\n4150\t.git/objects/info\n4227\t.git/objects/0a\n18328\t.git/objects/\n\n# many of the objects are gone as we see above\n# and our git object database is down to 18K\n# if we look at the tree of .git repo, it will be different now\n$ tree .git\n.git\n├── branches\n├── config\n├── description\n├── HEAD\n├── hooks\n├── index\n├── info\n│   ├── 
exclude\n│   └── refs\n├── logs\n│   ├── HEAD\n│   └── refs\n│       └── heads\n│           └── master\n├── objects\n│   ├── 0a\n│   │   └── 9c3e68d37858d478ad2692e01126e6851d1c93\n│   ├── info\n│   │   └── packs\n│   └── pack\n│       ├── pack-5dda0074f5c0745e99fad6c6d639ca69f009091e.idx\n│       └── pack-5dda0074f5c0745e99fad6c6d639ca69f009091e.pack\n├── packed-refs\n└── refs\n    ├── heads\n    └── tags\n\n13 directories, 24 files\n\n# As we see above, we have .git/objects/pack with two files .idx and .pack\n# git has optimized our repository and created packfile like we said earlier\n# there's git show-index command to which you can pipe .idx file\n# I leave that as homework for you to look into that and see what you will see in those\n```\n\n### reflog and fsck\n\n```shell\n# lets move master branch to the first commit\n$ git reset --hard 86aa1cb\nHEAD is now at 86aa1cb initial commit\n\n# now we don't have the top commit because none of the branches reach to that commit\n# if you come back at later point in time, you will not remember sha1 hash\n# which means we effectively lost that commit\n# imagine that it was not intended action, how could we recover?\n# given that our master branch had that commit at some point of time,\n# reflog records that meaning we can recover that commit\n$ git reflog\n86aa1cb (HEAD -\u003e master) HEAD@{0}: reset: moving to 86aa1cb\nfed6ba8 HEAD@{1}: reset: moving to HEAD\nfed6ba8 HEAD@{2}: \n86aa1cb (HEAD -\u003e master) HEAD@{3}:\n\n# here we see the sha1 hash of our commit fed6ba8\n# and now we can create new branch from that commit effectively saving us from disaster\n$ git branch master-recovered fed6ba8\n\n# lets checkout to the newly recovered branch\n$ git checkout master-recovered\n\n# now lets look at the log and we see that we have the lost commit back. 
Voila!\n$ git log --oneline\nfed6ba8 (HEAD -\u003e master-recovered) Added new file\n86aa1cb (master) initial commit\n\n# now imagine that our reflogs were gone\n# is there a possibility of recovery in such case?\n# maybe? fsck may help although its not straightforward, esp. in large repos\n# to mimic loss of reflog, lets delete .git/logs\n$ rm -rf .git/logs\n\n# and if we check our reflog, its empty\n$ git reflog\n\n# lets delete master-recovered branch once again\n$ git branch -D master-recovered\nDeleted branch master-recovered (was fed6ba8).\n\n# now lets run fsck with --full argument\n# --full is default in recent git versions which means you don't have to specify it anymore\n# this performs a full fledged database verification\n$ git fsck --full\nChecking object directories: 100% (256/256), done.\nChecking objects: 100% (6/6), done.\ndangling commit fed6ba87e445db5175c628cfecbbd0b83526a54a\n\n# and if you look above, we see dangling commit\n# which is the commit that got lost in oblivion\n# now that we know our dangling commit, we can recover that commit just like earlier\n$ git branch master-recovered-new fed6ba87e445db5175c628cfecbbd0b83526a54a\n```\n\n## Author\n\n- [techgaun](https://github.com/techgaun)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftechgaun%2Fgit-internals","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftechgaun%2Fgit-internals","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftechgaun%2Fgit-internals/lists"}