How Git Works Internally
What Lies Inside the .git Folder?

The .git folder: the whole repo lives here
When you run git init, you create a .git directory. Everything Git needs to track and reconstruct your project is inside that hidden folder:
.git/objects/ - object storage (blobs, trees, commits, tags). Objects are stored compressed and named by their hash.
.git/refs/ - named references: refs/heads/* (branches), refs/tags/* (tags), refs/remotes/* (remote-tracking branches).
.git/HEAD - a pointer to the current branch (for example ref: refs/heads/main) or directly to a commit hash in detached HEAD state.
.git/index - the staging area (also called the index): a binary file that maps paths to blob hashes and stores stat info used for fast diffs.
.git/packed-refs, .git/config, .git/logs/ and other housekeeping files.
.git/objects/pack/ - packfiles: compressed, delta-encoded collections of objects used to save space and speed up operations.
Think of .git as a small, efficient database that stores your project’s history and metadata — not a copy of your working files.
The three core object types
Git stores content as objects. The most important ones:
Blob - raw file contents. A blob holds only the file's data; the filename and mode live in the tree that references it. A blob is created when you stage a file.
Tree - a snapshot of a directory. It maps filenames (and metadata like mode) to blob hashes or to other tree hashes (subdirectories).
Commit - a snapshot of the project at a point in time. A commit contains:
a reference to a tree (the root snapshot),
zero or more parent commit hashes (none for the initial commit, one for normal commits, multiple for merges),
author/committer metadata and the commit message.
There is one more object type, the annotated tag object, but blobs, trees, and commits are the mental model you need.
Content-addressable storage & hashes
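Every object is identified by a cryptographic hash of its contents (historically SHA-1, with SHA-256 available as an alternative object format in newer Git versions). The hash is computed over a header giving the object type and the content size in bytes, a NUL byte, and then the content itself. For example, a file containing hello world and a trailing newline is hashed as blob 12\0hello world\n, which gives the ID 3b18e512.... Because the ID depends on content: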
The same file content always yields the same blob hash, in any repository, so Git stores identical content only once.
Any corruption or accidental change in an object changes its hash, and git fsck can detect this.
Commits are tamper-evident and verifiable because they reference tree hashes, which reference blob hashes, forming a hash chain you can check end to end.
This chaining is what makes it hard for Git history to be corrupted or altered without anyone noticing.
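The hashing rule above is small enough to reproduce outside Git. The following is a minimal Python sketch, assuming a SHA-1 repository; the function name git_object_id is made up for illustration and is not part of Git.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 ID Git would assign to an object with this content."""
    # Header is "<type> <size in bytes>" followed by a NUL byte, then the content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Same content in, same ID out, regardless of filename or repository.
print(git_object_id("blob", b"hello world\n"))
# → 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(git_object_id("blob", b""))
# → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

If Git is installed, the first value should match what echo "hello world" | git hash-object --stdin prints.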
What happens on git add
High-level flow when you run git add file.txt:
Git reads the file contents from your working directory.
Git creates a blob object from the file contents and writes it to .git/objects/ (zlib-compressed). You can replicate this step with plumbing:
git hash-object -w file.txt # writes blob and prints its hash
Git updates the index (.git/index) to record that file.txt now corresponds to that blob hash and stores stat info (mtime, size, mode) so it can cheaply tell later whether the file changed.
Important notes:
git add stages content, not filenames alone: the index maps filenames → blob hashes.
If the same file content already exists (same hash), Git reuses the blob; no duplicate content is stored.
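To make the blob-writing step concrete, here is a rough Python sketch of what git hash-object -w does for a loose object. It assumes SHA-1 and writes into a throwaway directory rather than a real .git/objects/; the helper name write_blob is hypothetical, and the sketch deliberately skips the index update, which uses a separate binary format.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_blob(objects_dir: Path, content: bytes) -> str:
    """Store content as a loose blob under objects_dir and return its ID."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
    oid = hashlib.sha1(store).hexdigest()
    # Loose objects live at <objects>/<first 2 hex chars>/<remaining 38>.
    path = objects_dir / oid[:2] / oid[2:]
    if not path.exists():  # same content means same ID, so nothing new to write
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return oid

with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    first = write_blob(objects, b"hello world\n")
    second = write_blob(objects, b"hello world\n")  # reuses the existing file
    stored = [p for p in objects.rglob("*") if p.is_file()]
    print(first == second, len(stored))  # → True 1
```

Staging the same content twice leaves a single object file on disk, which is the deduplication described in the notes above.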
What happens on git commit
When you run git commit -m "message" (assuming staged changes exist), Git:
Writes a tree object that represents the project state in the index. The tree references blobs (and subtrees) for every tracked file/directory. You can reproduce this with:
git write-tree # prints tree-hash
Creates a commit object that includes:
tree <tree-hash>
parent <parent-hash> (if any)
author/committer metadata
commit message
You can create a commit manually (plumbing) with:
echo "My commit" | git commit-tree <tree-hash> -p <parent-hash>
That prints the new commit hash.
Updates the branch ref that HEAD points to (for example refs/heads/main) to point to the new commit hash. This is what moves the branch forward.
HEAD itself either points to that ref (ref: refs/heads/main) or contains a hash when detached; in the detached case, the new commit hash is written to HEAD directly.
Optionally writes reflog entries in .git/logs/ to track ref changes.
So commits are lightweight pointers to trees (snapshots), and trees point to blobs (file contents).
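The tree and commit objects can be assembled by hand to see how the hashes chain together. This Python sketch assumes SHA-1, a flat directory of regular files (mode 100644, no subtrees), and a fixed, made-up author identity and timestamp; the function names are illustrative only and not Git APIs.

```python
import hashlib

def object_id(obj_type, body):
    """SHA-1 over '<type> <size>' + NUL + body, as a hex string."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Serialize {filename: blob_id} for a flat directory of regular files."""
    body = b""
    for name in sorted(entries):  # Git keeps tree entries sorted by name
        body += b"100644 " + name.encode("utf-8") + b"\x00"
        body += bytes.fromhex(entries[name])  # 20 raw bytes, not 40 hex chars
    return body

def commit_body(tree_id, parent_ids, ident, message):
    """Serialize a commit: headers, a blank line, then the message."""
    lines = ["tree " + tree_id]
    lines += ["parent " + p for p in parent_ids]
    lines += ["author " + ident, "committer " + ident, "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")

# Hypothetical identity and timestamp, fixed so the hashes are reproducible.
IDENT = "A U Thor <author@example.com> 1700000000 +0000"

blob_id = object_id("blob", b"hello world\n")
tree_id = object_id("tree", tree_body({"file.txt": blob_id}))
first = object_id("commit", commit_body(tree_id, [], IDENT, "First commit"))
second = object_id("commit", commit_body(tree_id, [first], IDENT, "Second commit"))
print(tree_id, first, second, sep="\n")
```

Because the second commit embeds the first commit's ID, and both embed the tree ID, changing any byte of file.txt would change every hash printed here.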
Visualizing the commit → tree → blob relationship
commit C (hash: ccccc)
├─ tree T (hash: ttttt)
│  ├─ file a.txt -> blob A (hash: aaaaa)
│  ├─ file b.txt -> blob B (hash: bbbbb)
│  └─ dir/ -> tree T2 (hash: t2t2t)
│     └─ file c.txt -> blob D (hash: ddddd)
└─ parent -> commit P (hash: ppppp)
Each arrow is a reference by hash. To inspect any object:
git cat-file -p <hash> # pretty-print the object
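For a loose object, the raw read behind that command is just a decompress and a header split. The Python sketch below assumes a loose (unpacked) SHA-1 object and uses a temporary directory in place of .git/objects/; read_loose_object is a hypothetical helper, and unlike git cat-file -p it does not pretty-print trees or look inside packfiles.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def read_loose_object(objects_dir, oid):
    """Return (type, body) for a loose object; packed objects are not handled."""
    raw = zlib.decompress((Path(objects_dir) / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\x00")  # header ends at the first NUL
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body

# Demo on a throwaway object store rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    store = b"blob 12\x00hello world\n"
    oid = hashlib.sha1(store).hexdigest()
    path = Path(tmp) / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True)
    path.write_bytes(zlib.compress(store))
    print(read_loose_object(tmp, oid))  # → ('blob', b'hello world\n')
```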
Branches, HEAD, and refs
A branch is just a file under .git/refs/heads/ that contains a commit hash (or, once refs have been packed, a line in .git/packed-refs). Example: .git/refs/heads/main might contain d34db33f....
HEAD points to either a branch ref (ref: refs/heads/main) or directly to a commit (detached HEAD).
Moving a branch is simply updating the file that stores the hash, which is why creating and moving branches is fast.
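Resolving HEAD to a commit therefore needs only a couple of file reads. The following Python sketch assumes the classic files-based ref layout and handles a single level of indirection plus the packed-refs fallback; resolve_head and the 40-character hash used in the demo are made up for illustration.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir):
    """Return (ref_name or None, commit hash or None) for a .git directory."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return None, head  # detached HEAD: the file holds a hash directly
    ref = head[len("ref: "):]
    ref_file = git_dir / ref
    if ref_file.is_file():
        return ref, ref_file.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip the header comment, peeled-tag lines ("^<hash>"), and blanks.
            if not line.strip() or line.startswith(("#", "^")):
                continue
            oid, name = line.split(" ", 1)
            if name.strip() == ref:
                return ref, oid
    return ref, None  # branch exists in name only (no commits yet)

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    fake_oid = "d34db33f" * 5  # placeholder hash, not a real commit
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text(fake_oid + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    print(resolve_head(git_dir))  # branch checked out
    (git_dir / "HEAD").write_text(fake_oid + "\n")
    print(resolve_head(git_dir))  # detached HEAD
```

Inside a real repository, git rev-parse HEAD performs this resolution (and handles more cases than this sketch does).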
Packs and garbage collection
Storing every object as a separate file becomes inefficient for big repos. Git periodically combines many loose objects into packfiles under .git/objects/pack/ (a pack-*.pack file holding the objects plus a pack-*.idx index for fast lookups). git gc (garbage collect) packs loose objects and removes unreachable objects older than a grace period (two weeks by default).
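A packfile starts with a fixed 12-byte header: the signature PACK, a version number, and the number of objects it holds, all in network byte order. The Python sketch below parses such a header from synthetic bytes; parse_pack_header is a hypothetical helper, and decoding the delta-compressed entries that follow the header is well beyond this example.

```python
import struct

def parse_pack_header(data: bytes):
    """Return (version, object_count) from the first 12 bytes of a packfile."""
    if len(data) < 12:
        raise ValueError("pack header is 12 bytes long")
    # 4-byte signature, then two unsigned 32-bit big-endian integers.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count

# Synthetic header for a version 2 pack claiming to hold 3 objects.
header = b"PACK" + struct.pack(">II", 2, 3)
print(parse_pack_header(header))  # → (2, 3)
```

On a real repository, git verify-pack -v run against a pack-*.idx file lists what a pack contains.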
How Git ensures integrity & consistency
Objects are content-addressed: their hash is derived from content, so changing content changes the hash.
Commits reference trees, trees reference blobs - this creates an immutable web of hashes. If any object is corrupted or tampered with, the hash chain breaks.
git fsck verifies object connectivity and integrity.
Reflogs and multiple copies (remotes) provide redundancy and recovery points.
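The per-object half of that integrity check can be sketched in a few lines of Python: decompress the stored bytes, rehash them, and compare against the name the object was stored under. This assumes SHA-1 and loose-object encoding; verify_object is a made-up name, and the real git fsck also checks object formats and connectivity between objects, which this sketch does not attempt.

```python
import hashlib
import zlib

def verify_object(expected_oid: str, compressed: bytes) -> bool:
    """True if the decompressed bytes still hash to the object's name."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False  # damaged or truncated zlib stream
    return hashlib.sha1(raw).hexdigest() == expected_oid

store = b"blob 12\x00hello world\n"
oid = hashlib.sha1(store).hexdigest()
good = zlib.compress(store)
tampered = zlib.compress(store.replace(b"hello", b"jello"))

print(verify_object(oid, good), verify_object(oid, tampered))  # → True False
```

A single changed byte is enough for the check to fail, which is why silent corruption or tampering is hard to hide.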




