The Object Model
Everything you've learned so far - commits, branches, staging, merging - is built on top of a surprisingly simple storage system. Git is, at its core, a content-addressable filesystem: a key-value store where the key is a SHA-1 hash of the content and the value is the content itself. Understanding this layer explains why Git behaves the way it does and gives you the tools to inspect and repair repositories at the lowest level.
Content-Addressable Storage
The term content-addressable means that the address (name) of every piece of data is derived from the data itself. Git computes a SHA-1 hash of each object's content, and that hash becomes the object's identity. Two files with identical content produce the same hash and are stored once. Change a single byte and the hash - and therefore the identity - changes completely.
This has profound implications:
- Deduplication is automatic. If the same file appears in 1,000 commits, Git stores one copy.
- Integrity is verifiable. If any bit of a stored object changes (disk corruption, tampering), its content no longer matches its hash, and Git reports the mismatch when it checks the object (for example, with git fsck).
- History is tamper-evident. Since each commit's hash includes its parent hash, changing any commit changes every subsequent hash in the chain. You can't alter history silently.
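The link between content and identity is easy to observe from the command line. The following is a minimal sketch, assuming it runs in a scratch repository created with mktemp; the variable names are illustrative only.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Identical bytes always produce the identical object ID.
first=$(printf 'hello\n' | git hash-object --stdin)
second=$(printf 'hello\n' | git hash-object --stdin)

# Changing a single byte (h -> H) produces a completely different ID.
changed=$(printf 'Hello\n' | git hash-object --stdin)

echo "first:   $first"
echo "second:  $second"
echo "changed: $changed"
```

The first two IDs match and the third does not, even though only one character differs.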
SHA-1 and SHA-256
Git has historically used SHA-1 (160-bit, 40 hex characters). Practical collision attacks on SHA-1 have been demonstrated (SHAttered, 2017), so Git uses a hardened SHA-1 implementation that detects the known attack patterns. Git is transitioning to SHA-256 (256-bit, 64 hex characters): new repositories can opt in with git init --object-format=sha256, but interoperability with SHA-1 repositories is still limited and most repositories still use SHA-1.
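To check which hash function a repository uses, you can ask Git for its object format. This is a small sketch that assumes a Git version recent enough to support git rev-parse --show-object-format; the scratch repository and variable names are illustrative.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Reports the hash algorithm of the current repository: sha1 or sha256.
format=$(git rev-parse --show-object-format)

# Object IDs are 40 hex characters under SHA-1 and 64 under SHA-256.
id=$(printf 'hello\n' | git hash-object --stdin)
echo "$format uses ${#id} hex characters"
```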
The Four Object Types
Every object in Git's database is one of four types:
flowchart TD
C["commit<br/>snapshot + metadata"] --> T["tree<br/>directory listing"]
T --> B1["blob<br/>file content"]
T --> B2["blob<br/>file content"]
T --> T2["tree<br/>subdirectory"]
T2 --> B3["blob<br/>file content"]
TAG["tag<br/>named reference + message"] --> C
Blob (Binary Large Object)
A blob stores the content of a single file - nothing else. No filename, no permissions, no metadata. Just the raw bytes. Two files with identical content, regardless of their names or locations, produce the same blob.
# Hash a string as a blob
echo "Hello, Git" | git hash-object --stdin
# 41e40e5a20c7e8657a8a92e2ce0bfa39a9e0d40c
# Hash a file
git hash-object README.md
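Because a blob holds only bytes, the file's name and location play no part in its ID. A minimal sketch, assuming a scratch repository and two hypothetical files (notes.txt and docs/copy.md) with the same content:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Two files with different names and locations but identical bytes.
mkdir docs
printf 'same content\n' > notes.txt
printf 'same content\n' > docs/copy.md

# hash-object accepts several paths and prints one ID per line.
git hash-object notes.txt docs/copy.md

notes_id=$(git hash-object notes.txt)
copy_id=$(git hash-object docs/copy.md)
```

Both lines of output are the same ID, so Git would store a single blob for the two files.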
Tree
A tree represents a directory. Each entry records a permission mode, the hash of a blob (file) or another tree (subdirectory), and that entry's name.
Permission modes:
| Mode | Meaning |
|---|---|
| 100644 | Regular file |
| 100755 | Executable file |
| 120000 | Symbolic link |
| 040000 | Subdirectory (tree) |
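To see these entries in a real tree, you can pretty-print the root tree of a commit. The sketch below assumes a scratch repository with a placeholder identity and two hypothetical files; the hashes you see will depend on the content.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

printf 'hello\n' > README.md
mkdir src
printf 'print("hi")\n' > src/app.py
git add README.md src/app.py
git commit -q -m "Initial commit"

# HEAD^{tree} names the root tree of the latest commit.
# Each output line is: <mode> <type> <hash><TAB><name>
listing=$(git cat-file -p 'HEAD^{tree}')
echo "$listing"
```

README.md appears as a 100644 blob entry and src as a 040000 tree entry, which you can print in turn with git cat-file -p.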
Commit
A commit ties everything together. It references:
- A tree (the root directory snapshot)
- Zero or more parent commits
- Author and committer identity with timestamps
- A message
The first commit in a repository has no parent. A merge commit has two or more parents. Every other commit has exactly one parent.
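A commit object is stored as plain text, so these fields can be read directly. A minimal sketch, assuming a scratch repository with a placeholder identity and a hypothetical file.txt:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

printf 'v1\n' > file.txt
git add file.txt
git commit -q -m "Initial commit"
printf 'v2\n' > file.txt
git add file.txt
git commit -q -m "Second commit"

# Prints the tree, parent, author, committer, and message of the latest commit.
git cat-file -p HEAD

# Count "parent" header lines: 0 for the root commit, 1 for its child.
root_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ' || true)
head_parents=$(git cat-file -p HEAD | grep -c '^parent ' || true)
echo "root: $root_parents, HEAD: $head_parents"
```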
Annotated Tag
An annotated tag is a named reference to a commit (or any object) with additional metadata: a tagger identity, timestamp, and message. Unlike lightweight tags (which are just refs), annotated tags are full objects stored in the database.
# Create an annotated tag
git tag -a v1.0 -m "First stable release"
# Show the tag object
git cat-file -p v1.0
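The difference between the two kinds of tag shows up when you ask for the object type behind each name. A short sketch, assuming a scratch repository with a placeholder identity; v1.0-light is a hypothetical lightweight tag added for comparison.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
printf 'hello\n' > README.md
git add README.md
git commit -q -m "Initial commit"

git tag -a v1.0 -m "First stable release"    # annotated: creates a tag object
git tag v1.0-light                           # lightweight: only a ref

annotated_type=$(git cat-file -t v1.0)
light_type=$(git cat-file -t v1.0-light)
echo "v1.0 -> $annotated_type, v1.0-light -> $light_type"

# Peel the annotated tag to the commit it points at.
git rev-parse 'v1.0^{commit}'
```

The annotated tag resolves to a tag object, while the lightweight tag resolves straight to the commit.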
The .git/objects Directory
All objects are stored in .git/objects/. Git uses the first two characters of the hash as a directory name and the remaining 38 as the filename:
.git/objects/
├── 41/
│ └── e40e5a20c7e8657a8a92e2ce0bfa39a9e0d40c (a blob)
├── 8f/
│ └── a3c9b1d2e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8 (a tree)
├── e4/
│ └── f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3 (a commit)
├── info/
└── pack/
    ├── pack-abc123.idx
    └── pack-abc123.pack
Individual objects are called loose objects. As a repository grows, Git periodically packs loose objects into packfiles (.pack with an .idx index) for efficiency. Packfiles use delta compression - storing only the differences between similar objects. The Refs, the Reflog, and the DAG guide covers packfiles in depth.
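Each loose object is stored as a short header followed by the content, in the form type size\0content (for example, blob 6\0 followed by the six bytes of the file), and the result is compressed with zlib. The object's hash is computed over the uncompressed header and content, not over the compressed file.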
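You can confirm both the on-disk path and the header format by hand. This sketch assumes a SHA-1 repository and that the sha1sum utility from GNU coreutils is available (on macOS, shasum is the usual substitute); the variable names are illustrative.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Store a blob, then split its ID into directory (2 chars) and file name (rest).
id=$(printf 'hello\n' | git hash-object -w --stdin)
dir=$(printf '%s' "$id" | cut -c1-2)
file=$(printf '%s' "$id" | cut -c3-)
ls -l ".git/objects/$dir/$file"

# Recompute the ID by hand: SHA-1 over "blob 6\0" plus the 6 content bytes.
manual=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)
echo "git:    $id"
echo "manual: $manual"
```

The file on disk is zlib-compressed, so reading it with cat shows binary data; git cat-file -p decompresses it and strips the header for you.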
Plumbing Commands
Git has two categories of commands: porcelain (user-facing: commit, merge, push) and plumbing (low-level: hash-object, cat-file, write-tree). Plumbing commands let you interact directly with the object database.
git hash-object - Store Content
# Hash content from stdin (compute the hash only; nothing is stored)
echo "hello" | git hash-object --stdin
# ce013625030ba8dba906f756967f9e9ca394464a
# Hash and store into the object database
echo "hello" | git hash-object --stdin -w
# Hash a file
git hash-object README.md
git cat-file - Read Objects
# Show object type
git cat-file -t a1b2c3d
# blob, tree, commit, or tag
# Show object size
git cat-file -s a1b2c3d
# 42
# Pretty-print object content
git cat-file -p a1b2c3d
git ls-tree - List Tree Contents
# List the root tree of HEAD
git ls-tree HEAD
# List recursively (all files in all subdirectories)
git ls-tree -r HEAD
# List a specific directory
git ls-tree HEAD src/
git write-tree - Create a Tree from the Index
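git write-tree takes whatever is currently staged in the index, writes it to the object database as a tree, and prints the ID of the root tree. A minimal sketch, assuming a scratch repository and a hypothetical greeting.txt:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
printf 'hello\n' > greeting.txt

# Stage the file with plumbing instead of `git add`.
git update-index --add greeting.txt

# Write the index out as a tree object and capture its ID.
tree=$(git write-tree)
git cat-file -t "$tree"
git ls-tree "$tree"
```

No commit exists yet at this point: the tree is a standalone object until a commit references it.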
git commit-tree - Create a Commit Object
# Create a commit from a tree, with a parent and message
echo "My commit message" | git commit-tree <tree-hash> -p <parent-hash>
# Returns the hash of the new commit
Building a Commit with Plumbing Commands
This is the most illuminating exercise in the entire course. Instead of using git add and git commit, you'll create a commit entirely with low-level plumbing commands - the same operations Git performs internally.
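One way to walk through the exercise is sketched below. It is not necessarily the exact sequence the course intends: the scratch repository, the placeholder identity, the file name hello.txt, and the commit message are all assumptions made for illustration.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# 1. Write the file content into the object database as a blob.
blob=$(printf 'Hello, plumbing\n' | git hash-object -w --stdin)

# 2. Add an index entry that gives the blob a mode and a name.
git update-index --add --cacheinfo 100644,"$blob",hello.txt

# 3. Snapshot the index as a tree object.
tree=$(git write-tree)

# 4. Wrap the tree in a commit (no -p, so this is a root commit).
commit=$(echo "Commit built with plumbing" | git commit-tree "$tree")

# 5. Point the current branch at the new commit.
git update-ref HEAD "$commit"

git log --oneline
```

These commands never touch the working tree, so git status reports hello.txt as deleted until you run git checkout -- hello.txt to write it out from the index.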
Tracing the Object Graph
Every commit points to a tree, every tree points to blobs and subtrees, and everything is connected by SHA-1 hashes. You can trace the entire object graph starting from any commit:
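As a sketch of such a trace, the example below resolves a commit, its root tree, a subtree, and a blob, then prints the type of each. The repository layout (README.md and src/utils.py) and the identity are assumptions for illustration.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
mkdir src
printf 'hello\n' > README.md
printf 'def helper(): pass\n' > src/utils.py
git add README.md src/utils.py
git commit -q -m "Initial commit"

# Resolve each level of the graph to an object ID.
commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')
subtree=$(git rev-parse 'HEAD:src')
blob=$(git rev-parse 'HEAD:src/utils.py')

# Print the type and ID of every object along the path.
for id in "$commit" "$tree" "$subtree" "$blob"; do
    printf '%s %s\n' "$(git cat-file -t "$id")" "$id"
done

# The blob at the end of the path holds the file's bytes.
git cat-file -p "$blob"
```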
Object Graph Visualization
The relationships between objects form a directed acyclic graph (DAG). Here's what a small repository's object graph looks like:
flowchart TD
C2["commit: e4f5a6b<br/>Add utils module"] --> C1["commit: a1b2c3d<br/>Initial commit"]
C2 --> T2["tree: 8fa3c9b<br/>(root)"]
C1 --> T1["tree: c3b8bb1<br/>(root)"]
T1 --> B1["blob: d670460<br/>README.md content"]
T1 --> B2["blob: f1e2d3c<br/>app.py content v1"]
T2 --> B1
T2 --> B3["blob: a9b8c7d<br/>app.py content v2"]
T2 --> ST["tree: 7e8f9a0<br/>src/"]
ST --> B4["blob: 5d6e7f8<br/>utils.py content"]
Notice that B1 (README.md) is referenced by both trees - the file didn't change between commits, so Git reuses the same blob. This is content-addressable deduplication in action.
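The same reuse can be checked in a live repository by comparing blob IDs across commits. A minimal sketch, assuming a scratch repository with a placeholder identity and two hypothetical files, only one of which changes:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

printf 'docs\n' > README.md
printf 'v1\n' > app.py
git add README.md app.py
git commit -q -m "Initial commit"

printf 'v2\n' > app.py
git add app.py
git commit -q -m "Update app"

# <commit>:<path> resolves to the blob stored for that path in that commit.
readme_old=$(git rev-parse 'HEAD~1:README.md')
readme_new=$(git rev-parse 'HEAD:README.md')
app_old=$(git rev-parse 'HEAD~1:app.py')
app_new=$(git rev-parse 'HEAD:app.py')

echo "README.md: $readme_old -> $readme_new"
echo "app.py:    $app_old -> $app_new"
```

The README.md ID is identical in both commits, while app.py gets a new blob for its changed content.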
Further Reading
- Pro Git - Chapter 10.2: Git Objects - comprehensive walkthrough of blob, tree, commit, and tag objects
- Git Internals PDF (Scott Chacon) - deep dive into the object model
- Official git-cat-file documentation - inspecting objects
- Official git-hash-object documentation - creating objects