The `.git` folder: the whole repo lives here

When you run git init, you create a .git directory. Everything Git needs to track and reconstruct your project is inside that hidden folder:

.git/objects/ - object storage (blobs, trees, commits, tags). Objects are stored compressed and named by their hash.
.git/refs/ - named references: refs/heads/* (branches), refs/tags/* (tags), refs/remotes/*.
.git/HEAD - a pointer to the current branch (for example ref: refs/heads/main) or directly to a commit hash in detached HEAD state.
.git/index - the staging area (also called the index): a binary file that maps paths to blob hashes and stores stat info used for fast diffs.
.git/packed-refs, .git/config, .git/logs/ and other housekeeping files.
.git/objects/pack/ - packfiles: compressed, delta-compressed collections of objects used to save space and speed operations.

Think of .git as a small, efficient database that stores your project’s history and metadata — not a copy of your working files.

The three core object types

Git stores content as objects. The most important ones:

Blob - raw file contents. A blob is the file data (no filename metadata). A blob is created when you stage a file.
Tree - a snapshot of a directory. It maps filenames (and metadata like mode) to blob hashes or to other tree hashes (subdirectories).
Commit - a snapshot of the project at a point in time. A commit contains:
- a reference to a tree (the root snapshot),
- zero or more parent commit hashes (one for normal commits, multiple for merges),
- author/committer metadata and the commit message.

There is one more object type, the annotated tag object, but blobs, trees, and commits are the mental model you need.

Content-addressable storage & hashes

Every object is identified by a cryptographic hash of its contents (SHA-1 by default; newer Git versions also support SHA-256 as an opt-in object format). The hash is computed over a header holding the object type and size, a NUL byte, and then the content, e.g. blob 12\0hello world\n. The result is a 40-character hex ID, here 3b18e512.... Because the ID depends on content:

The same file content stored in different repositories yields the same blob hash — Git can deduplicate automatically.
Any corruption or accidental change in an object changes its hash — git fsck can detect this.
Commits are safe and verifiable because they reference tree hashes, which reference blob hashes, forming a hash chain you can verify.

This chaining is what makes Git extremely robust.
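
To make the hashing rule concrete, here is a minimal Python sketch of how an object ID can be computed. It assumes the default SHA-1 object format; the function name `git_object_id` is made up for this illustration and is not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 ID Git would assign to an object with this content."""
    # The header is "<type> <size in bytes>" followed by a single NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    # The ID is the hash of header + content, not of the content alone.
    return hashlib.sha1(header + content).hexdigest()
```

Calling `git_object_id("blob", b"hello world\n")` should give the same ID that `echo 'hello world' | git hash-object --stdin` prints. Changing a single byte of the content gives a completely different ID.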

What happens on `git add`

High-level flow when you run git add file.txt:

Git reads the file contents from your working directory.
Git creates a blob object from the file contents and writes it to .git/objects/ (zlib compressed). You can replicate this step with plumbing:
```
 git hash-object -w file.txt   # writes blob and prints its hash
```
Git updates the index (.git/index) to record that file.txt now corresponds to that blob hash and stores stat info (mtime, size, mode) so it can cheaply tell later whether the file changed.

Important notes:

git add stages content, not filenames alone — the index maps filenames → blob-hash.
If the same file content already exists (same hash), Git reuses the blob; no duplicate content is stored.
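
The steps above can be sketched in Python. This is a simplified model, not Git's real code: `write_loose_blob` and `stage` are hypothetical names, SHA-1 is assumed, and a plain dict stands in for the index (the real .git/index is a binary file that also stores stat info).

```python
import hashlib
import os
import zlib


def write_loose_blob(git_dir: str, content: bytes) -> str:
    """Store content as a loose blob under git_dir/objects and return its ID."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
    oid = hashlib.sha1(store).hexdigest()
    # Loose objects live at objects/<first 2 hex chars>/<remaining 38 chars>.
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    if not os.path.exists(path):  # same content means same ID: nothing to write
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(store))
    return oid


def stage(index: dict, git_dir: str, path: str, content: bytes) -> None:
    """Toy stand-in for the index: record which blob ID a path maps to."""
    index[path] = write_loose_blob(git_dir, content)
```

Staging two different paths with identical content yields two index entries that point at one blob file, which is the deduplication described above.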

What happens on `git commit`

When you run git commit -m "message" (assuming staged changes exist), Git:

Writes a tree object that represents the project state in the index. The tree references blobs (and subtrees) for every tracked file/directory. You can reproduce this with:
```
 git write-tree   # prints tree-hash
```
Creates a commit object that includes:
- tree: <tree-hash>
- parent: <parent-hash> (if any)
- author/committer metadata
- commit message
  You can create a commit manually (plumbing) with:

    echo "My commit" | git commit-tree <tree-hash> -p <parent-hash>

That prints the new commit hash.

Updates the branch ref that HEAD points to (for example refs/heads/main) to point to the new commit hash.
- This is what moves the branch forward.
- HEAD itself either points to that ref (ref: refs/heads/main) or contains a hash when detached.
Optionally writes reflog entries in .git/logs/ to track ref changes.

So commits are lightweight pointers to trees (snapshots), and trees point to blobs (file contents).
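
As a rough illustration of what the tree and commit objects contain, the Python sketch below builds their raw bodies by hand. It assumes SHA-1 and a flat directory of regular, non-executable files (mode 100644). The helper names and the identity string used in the usage note are invented for the example.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash "<type> <size>\\0<body>" the way Git names its objects."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict) -> bytes:
    """Build a tree body from {file name: hex blob ID} (regular files only)."""
    body = b""
    for name in sorted(entries, key=lambda n: n.encode("utf-8")):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += b"100644 " + name.encode("utf-8") + b"\x00"
        body += bytes.fromhex(entries[name])
    return body


def commit_body(tree: str, parents: list, who: str, message: str) -> bytes:
    """Build a commit body; `who` is "Name <email> <unix-time> <timezone>"."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")
```

For example, `commit_body(object_id("tree", tree_body(entries)), [], "Ada <ada@example.com> 1700000000 +0000", "My commit")` produces the text of a root commit. Its own ID is then `object_id("commit", ...)` over those bytes, which is why changing any file changes the tree hash and therefore the commit hash.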

Visualizing the commit → tree → blob relationship

commit C2 (hash: c2c2c)
  ├─ tree T (hash: ttttt)
  │   ├─ file a.txt -> blob A (hash: aaaaa)
  │   ├─ file b.txt -> blob B (hash: bbbbb)
  │   └─ dir/ -> tree T2 (hash: t2t2t)
  │        └─ file c.txt -> blob C (hash: ccccc)
  └─ parent -> commit C1 (hash: c1c1c)

Each arrow is a reference by hash. To inspect any object:

git cat-file -p <hash>   # pretty-print the object
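
For a sense of what that command has to do for a loose object, here is a small Python sketch. It assumes SHA-1 (20-byte hashes inside trees) and does not handle packed objects. `read_object` and `parse_tree` are names chosen for this example.

```python
import zlib


def read_object(raw: bytes):
    """Decompress a loose object file's bytes and return (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode("ascii"), body


def parse_tree(body: bytes):
    """Yield (mode, name, hex_id) for each entry in a tree object's body."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        hex_id = body[nul + 1:nul + 21].hex()  # 20 raw bytes after the NUL
        yield mode, name, hex_id
        pos = nul + 21
```

Blob bodies are the file contents and commit bodies are plain text, so only trees need the extra parsing step.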

Branches, HEAD, and refs

A branch is just a file under .git/refs/heads/ that contains a commit hash. Example: .git/refs/heads/main might contain d34db33f.... (Refs that have been packed live as lines in .git/packed-refs instead.)
HEAD points to either a branch ref (ref: refs/heads/main) or directly to a commit (detached HEAD).
Moving a branch is simply updating the file that stores the hash, so creating, moving, and deleting branches is cheap no matter how large the project is.
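
The sketch below shows this lookup in Python under simplifying assumptions: it only reads loose ref files (no .git/packed-refs fallback), and it writes refs directly rather than using the lock files real Git uses. `resolve_head` and `move_branch` are hypothetical helper names.

```python
import os


def resolve_head(git_dir: str):
    """Return (ref name or None, commit hash or None) for HEAD."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return None, head  # detached HEAD: the file holds a hash directly
    ref = head[len("ref: "):]
    ref_path = os.path.join(git_dir, *ref.split("/"))
    if not os.path.exists(ref_path):
        return ref, None  # no commits yet, or the ref is in packed-refs
    with open(ref_path) as f:
        return ref, f.read().strip()


def move_branch(git_dir: str, ref: str, new_hash: str) -> None:
    """Point a branch at a commit by rewriting its one-line ref file."""
    ref_path = os.path.join(git_dir, *ref.split("/"))
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "w") as f:
        f.write(new_hash + "\n")
```

Note that `move_branch` touches one small file and no objects at all, whatever the size of the project.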

Packs and garbage collection

Storing every object as a separate file becomes inefficient for big repos. Git periodically combines many objects into packfiles under .git/objects/pack/ (files like pack-*.pack and pack-*.idx), storing similar objects as deltas against each other. git gc (garbage collect) packs loose objects and removes unreferenced objects older than a grace period (two weeks by default).
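
To see how much of a repository is still loose versus packed, you can walk .git/objects yourself. The Python sketch below is a rough approximation of part of what `git count-objects -v` reports; `count_objects` is a name made up for this example.

```python
import os
import re


def count_objects(git_dir: str):
    """Return (number of loose objects, number of packfiles)."""
    objects = os.path.join(git_dir, "objects")
    loose = 0
    packs = 0
    for entry in os.listdir(objects):
        full = os.path.join(objects, entry)
        if re.fullmatch(r"[0-9a-f]{2}", entry) and os.path.isdir(full):
            # Fan-out directory: each file inside is one loose object.
            loose += len(os.listdir(full))
        elif entry == "pack" and os.path.isdir(full):
            # Each packfile comes with a matching .idx; count only the .pack.
            packs = sum(1 for n in os.listdir(full) if n.endswith(".pack"))
    return loose, packs
```

After git gc you would expect the loose count to drop toward zero and the pack count to be small.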

How Git ensures integrity & consistency

Objects are content-addressed: their ID is a hash derived from content - changing content changes the hash.
Commits reference trees, trees reference blobs - this creates an immutable web of hashes. If any object is corrupted or tampered with, the hash chain breaks.
git fsck verifies object connectivity and integrity.
Reflogs and multiple copies (remotes) provide redundancy and recovery points.
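
The core integrity check is simple enough to sketch in a few lines of Python. This is only the idea behind it, not what git fsck actually runs: it assumes SHA-1, handles loose objects only, and skips connectivity checks. `verify_loose_object` is a hypothetical name.

```python
import hashlib
import os
import zlib


def verify_loose_object(git_dir: str, oid: str) -> bool:
    """Re-hash a loose object and check that it still matches its name."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        raw = f.read()
    try:
        data = zlib.decompress(raw)
    except zlib.error:
        return False  # damaged compressed stream
    # The decompressed bytes already include the "<type> <size>\0" header.
    return hashlib.sha1(data).hexdigest() == oid
```

Because every commit names its tree by hash and every tree names its blobs by hash, one altered byte anywhere makes this check fail for that object, and the object that should reference it no longer finds what it expects.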
