https://github.com/bragdond/rgit
A rust implementation of git
- Host: GitHub
- URL: https://github.com/bragdond/rgit
- Owner: BragdonD
- Created: 2023-08-02T17:26:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-08-03T10:16:17.000Z (over 1 year ago)
- Last Synced: 2023-08-03T11:32:31.382Z (over 1 year ago)
- Language: Rust
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Let's learn Git by building it (Part 1)
I intend to write this whole project in Rust to learn the language along the way.
## How does Git handle files and directories?
In Git, each file and directory is represented as a Git object. Git objects are stored in the `.git/objects` folder.
There are 2 main types of Git objects for this purpose: blobs and trees.

### Blobs
A blob is a git object that contains the content of a file.
### Trees
A tree is a Git object that contains the direct content of a directory: one entry per file or subdirectory, each pointing to the corresponding blob or tree.
### OID (Object IDentifier)
You might be wondering: how do I know which Git object represents which file or directory? Git creates a unique identifier for each Git object. This unique identifier is called the OID (Object IDentifier).
The OID is created by hashing a short header followed by the content of the Git object. The hashing algorithm used is SHA-1, which produces a 160-bit digest written as a 40-character hexadecimal string. This string is the OID of the Git object.
Let's take an example. Let's say we have a file named `foo.txt` with the following content (12 bytes, no trailing newline):
```txt
Hello World!
```

The OID of this file will be generated by using:
```
SHA-1("blob 12\0Hello World!")
```

So what is happening here?
- First, we write the type of the Git object: `blob` (it would be `tree` for a directory).
- Then, we add a space. (This is just the convention Git uses to separate the type from the length.)
- Then, we add the length of the content of the file in bytes, written in decimal: `12`.
- Then, we add a null character `\0`. This is used to separate the header from the content.
- Finally, we add the content of the file: `Hello World!`.

The SHA-1 algorithm will generate the following hash:
```
c57eff55ebc0c54973903af5f72bac72762cf4f4
```

So now we have a unique identifier for our file. Let's try to do that with Git.
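
As an aside before the Git commands, here is a minimal Rust sketch of the recipe above: build `blob <length>\0<content>`, hash it with SHA-1, and print the hex digest. To stay self-contained it uses a small hand-written SHA-1 on top of the standard library; that is an assumption of this sketch, not necessarily how this project does it, and a real implementation would more likely use a maintained crate. The names `sha1`, `to_hex` and `blob_oid` are made up for illustration.

```rust
/// Minimal SHA-1 (FIPS 180-4) using only the standard library.
/// Illustrative sketch: not optimized and not meant for production use.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for chunk in msg.chunks(64) {
        // Expand the 16 input words of this block into 80 schedule words.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for i in 0..80 {
            // Round function and constant change every 20 rounds.
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }

        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }

    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Lowercase hexadecimal rendering of a byte slice.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// OID of a blob: SHA-1 over "blob <length>\0" followed by the raw content.
fn blob_oid(content: &[u8]) -> String {
    let mut store = format!("blob {}\0", content.len()).into_bytes();
    store.extend_from_slice(content);
    to_hex(&sha1(&store))
}

fn main() {
    // 12 bytes, no trailing newline, as in the example above.
    println!("{}", blob_oid(b"Hello World!"));
}
```

If the header format is as described, the printed value should match the OID shown above. Note that the length in the header counts bytes, so a trailing newline in the file would change both the header and the resulting OID.
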
```bash
mkdir git-objects
cd git-objects
git init
# printf avoids the trailing newline that echo would add, keeping the content at 12 bytes
printf 'Hello World!' > foo.txt
git add foo.txt
```

Now, go inside the `.git/objects` folder. You may be wondering where the file named `c57eff55ebc0c54973903af5f72bac72762cf4f4` is. Having too many files in the same directory can slow down the file system, so Git spreads the objects across subdirectories.
Git uses the first 2 characters of the OID as a directory name and the remaining 38 characters as the file name. So in our case, the file will be stored in the `c5` directory with the name `7eff55ebc0c54973903af5f72bac72762cf4f4`.
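
The layout can be sketched as a small Rust helper that splits the OID after its first two characters. This is only an illustration: `object_path` and its `git_dir` parameter are hypothetical names, and the function assumes a 40-character hexadecimal SHA-1 OID.

```rust
use std::path::{Path, PathBuf};

/// Builds the path of a loose object from its OID.
/// Returns None if the OID is not 40 hexadecimal characters.
fn object_path(git_dir: &str, oid: &str) -> Option<PathBuf> {
    if oid.len() != 40 || !oid.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // First 2 characters: directory name. Remaining 38: file name.
    let (dir, file) = oid.split_at(2);
    Some(Path::new(git_dir).join("objects").join(dir).join(file))
}

fn main() {
    let oid = "c57eff55ebc0c54973903af5f72bac72762cf4f4";
    match object_path(".git", oid) {
        Some(path) => println!("{}", path.display()),
        None => println!("not a valid SHA-1 OID"),
    }
}
```
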
Let's try to see the content of the file.
```bash
cat .git/objects/c5/7eff55ebc0c54973903af5f72bac72762cf4f4
```

OK, all of this is fun, but so far we only have the name of the file, and SHA-1 cannot be reversed to recover the content from the hash. So how do we get the content of the file?
### Git objects content
Each Git object stores the exact content of the file or directory it represents. If you tried to display the object file earlier, you should have seen something unreadable. This is completely intended.
The goal is to store each file in a database so that it can be restored later. But storing gigabytes of raw data in single objects is costly, and splitting a file across multiple objects would break the OID concept we just explained.
To save space, Git compresses the header and the content with zlib, which produces binary output. This binary output is what is stored in the object file.
If you want to see the content of the file, you need to decompress the Git object. To do that, you can use the `git cat-file` command.
```bash
git cat-file -p c57eff55ebc0c54973903af5f72bac72762cf4f4
```

You should see the content of the file.
### Git objects header
Each git object has a header. The header is used to store the type of the git object and the length of the content of the git object.
To see the header of a git object, you can use the `git cat-file` command.
```bash
git cat-file -t c57eff55ebc0c54973903af5f72bac72762cf4f4
```

You should see the type of the git object: `blob`.
```bash
git cat-file -s c57eff55ebc0c54973903af5f72bac72762cf4f4
```

You should see the size of the content of the git object: `12`.
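
Once an object has been decompressed, the header can be separated from the content at the first null byte. The Rust sketch below assumes the decompressed bytes are already in memory (decompression itself is left out), and `parse_object` is a hypothetical name used only for illustration.

```rust
/// Splits decompressed object bytes into (type, size, content).
/// Returns None if the header is malformed or the size does not match.
fn parse_object(raw: &[u8]) -> Option<(&str, usize, &[u8])> {
    // The header ends at the first null byte.
    let nul = raw.iter().position(|&b| b == 0)?;
    let header = std::str::from_utf8(&raw[..nul]).ok()?;

    // The header has the form "<type> <size>".
    let (kind, size) = header.split_once(' ')?;
    let size: usize = size.parse().ok()?;

    let content = &raw[nul + 1..];
    if content.len() != size {
        return None;
    }
    Some((kind, size, content))
}

fn main() {
    // Same bytes as the SHA-1 input from the blob example.
    let raw = b"blob 12\0Hello World!";
    if let Some((kind, size, content)) = parse_object(raw) {
        println!("type: {}", kind);
        println!("size: {}", size);
        println!("content: {}", String::from_utf8_lossy(content));
    }
}
```

The type and size recovered here correspond to what `git cat-file -t` and `git cat-file -s` report.
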
#### Side note about SHA-1
The SHA-1 algorithm is no longer considered secure. It is possible to craft 2 different contents that generate the same SHA-1 hash. This is called a collision. This is why Git is moving to SHA-256 in its newer versions (2.x and later).