
Refs, the Reflog, and the DAG

The Object Model guide showed you that everything in Git is stored as objects identified by SHA-1 hashes. But nobody wants to type e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3 to reference a commit. References (refs) are human-readable names that point to objects. They're the layer that makes Git usable. This guide covers how refs work, how the reflog tracks every change, how commits form a directed acyclic graph, and how Git manages storage efficiency with garbage collection and packfiles.


References

A reference (ref) is a file that contains a SHA-1 hash pointing to a Git object - usually a commit. Branches, tags, and remote-tracking branches are all refs.

The .git/refs/ Directory

.git/refs/
├── heads/          # Local branches
│   ├── main
│   └── feature/auth
├── tags/           # Tags
│   ├── v1.0
│   └── v2.0
├── remotes/        # Remote-tracking branches
│   └── origin/
│       ├── main
│       └── feature/auth
└── stash           # Stash ref

Each file contains a single line: the 40-character hash of the commit (or tag object) it points to.

# Read a ref directly
cat .git/refs/heads/main
# e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3

# Or use the plumbing command
git rev-parse main
# e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3

Packed Refs

When a repository has many refs, Git packs them into a single file for efficiency:

cat .git/packed-refs
# pack-refs with: peeled fully-peeled sorted
e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3 refs/heads/main
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0 refs/tags/v1.0
^c9d0e1f2a3b4c5d6e7f8a9b0b1c2d3e4f5a6b7c8

Lines starting with ^ show the commit that an annotated tag points to (the "peeled" value). Loose refs in .git/refs/ take precedence over packed refs.
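
The effect of packing can be watched in a throwaway repository. This is a minimal sketch, assuming the default files ref backend (not reftable); the temporary directory and the "demo" identity are placeholders for illustration.

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first"

branch=$(git symbolic-ref --short HEAD)    # whatever the default branch is called
before=$(git rev-parse "refs/heads/$branch")
ls .git/refs/heads/                        # the branch exists as a loose file

# Pack every ref; loose files that were packed are removed afterwards.
git pack-refs --all
cat .git/packed-refs

# Resolution is unchanged: Git falls back to packed-refs when no loose file exists.
after=$(git rev-parse "refs/heads/$branch")
echo "$before -> $after"
```

The next commit on the branch recreates the loose file, which then shadows the stale packed entry.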

Plumbing Commands for Refs

# Update a ref to point to a commit
git update-ref refs/heads/new-branch a1b2c3d

# Delete a ref
git update-ref -d refs/heads/old-branch

# List all refs
git for-each-ref
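
As a rough end-to-end illustration, the sketch below creates, lists, and deletes a branch using only plumbing. It runs in a temporary repository; the branch name new-branch is taken from the commands above, and the "demo" identity is a placeholder.

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first"
tip=$(git rev-parse HEAD)

# Create a branch without porcelain: write the ref directly.
git update-ref refs/heads/new-branch "$tip"
created=$(git rev-parse refs/heads/new-branch)
git for-each-ref --format='%(refname) %(objectname:short)' refs/heads/

# Delete it again. Passing the expected old value makes the delete fail
# if the ref has moved in the meantime.
git update-ref -d refs/heads/new-branch "$tip"
```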

Symbolic References

Most refs contain a commit hash. A symbolic reference contains the name of another ref instead. The most important symbolic ref is HEAD.

HEAD is a symbolic ref that points to the current branch:

cat .git/HEAD
# ref: refs/heads/main

When you commit, Git:

  1. Reads HEAD to find the current branch (refs/heads/main)
  2. Creates the new commit with the current branch tip as parent
  3. Updates the branch ref to point to the new commit
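
The three commit steps can be reproduced by hand with plumbing. This is only a sketch of the idea, not what git commit literally runs: it reuses the parent's tree instead of building one from the index, and the repository and "demo" identity are placeholders.

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first"

# Step 1: read HEAD to find the current branch ref.
branch_ref=$(git symbolic-ref HEAD)
parent=$(git rev-parse "$branch_ref")

# Step 2: create a commit object whose parent is the current branch tip.
# (Reusing the parent's tree keeps the sketch short.)
tree=$(git rev-parse "$parent^{tree}")
new=$(git commit-tree "$tree" -p "$parent" -m "second")

# Step 3: move the branch ref, but only if it still points at $parent.
git update-ref "$branch_ref" "$new" "$parent"
git --no-pager log --oneline
```

HEAD itself is never rewritten here; it still names the branch, and the branch now names the new commit.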

When HEAD points directly to a commit hash (not a branch name), you're in detached HEAD state:

cat .git/HEAD
# e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3
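
One way to tell the two states apart in a script is to ask whether HEAD is symbolic. A small sketch in a temporary repository (the "demo" identity is a placeholder):

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first"

git symbolic-ref HEAD          # prints refs/heads/<branch>: HEAD is attached

git checkout -q --detach       # detach HEAD at the current commit
cat .git/HEAD                  # now a raw hash, no "ref:" prefix

# symbolic-ref -q exits non-zero (quietly) when HEAD is not symbolic.
if git symbolic-ref -q HEAD >/dev/null; then
  state=attached
else
  state=detached
fi
echo "$state"
```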

Other Special Refs

Git maintains several other special refs at the top level of .git/. Unlike HEAD, which is normally symbolic, these usually hold object hashes directly rather than the name of another ref.

Reference         Created by                     Contains
HEAD              Always present                 Current branch or commit
ORIG_HEAD         merge, rebase, reset           HEAD before the operation (for easy undo)
MERGE_HEAD        merge (during conflict)        The commit being merged in
FETCH_HEAD        fetch                          The tips of fetched branches
CHERRY_PICK_HEAD  cherry-pick (during conflict)  The commit being cherry-picked
REBASE_HEAD       rebase (during conflict)       The current commit being rebased

# Read/write symbolic refs with plumbing
git symbolic-ref HEAD
# refs/heads/main

git symbolic-ref HEAD refs/heads/feature/auth
# Now on feature/auth (don't do this normally - use git switch)

The Directed Acyclic Graph (DAG)

Commits in Git form a directed acyclic graph (DAG). Each commit points to its parent(s), creating directed edges. The graph is acyclic - you can never follow parent pointers and arrive back at the same commit.

What Makes It a DAG

  • Directed: Each edge goes one way - from child commit to parent commit
  • Acyclic: No cycles - you can't follow parent links and loop back
  • Graph (not a tree): Merge commits have multiple parents, creating diamond shapes

flowchart RL
    E["E (HEAD -> main)"] --> D
    D --> B
    D --> C
    C --> A
    B --> A
    A["A (root)"]

In this graph, D is a merge commit with two parents (B and C). Both B and C have A as their parent. The graph has a diamond shape - this can't happen in a simple tree.
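
The same diamond can be built in a scratch repository to inspect the parent pointers. This is a sketch: the branch name side and the "demo" identity are hypothetical, and the commits are empty so that only the graph shape matters.

```shell
set -eu
# Throwaway repository; the "demo" identity and the "side" branch are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m A
main=$(git symbolic-ref --short HEAD)
git branch side                         # side starts at A
git commit -q --allow-empty -m B        # B on the main line
git checkout -q side
git commit -q --allow-empty -m C        # C on the side branch
git checkout -q "$main"
git merge -q --no-ff -m D side          # D gets two parents: B and C
git commit -q --allow-empty -m E

git --no-pager log --graph --oneline

# rev-list --parents prints the commit followed by its parents.
words=$(git rev-list --parents -n 1 HEAD~1 | wc -w)
parents_of_d=$((words - 1))
total=$(git rev-list --count HEAD)
echo "D has $parents_of_d parents; $total commits reachable from HEAD"
```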

Reachability

A commit is reachable from a ref if you can get to it by following parent pointers. In the graph above, all commits are reachable from main (which points to E). A commit that no ref and no reflog entry can reach is eligible for garbage collection.

This is why deleting a branch can "lose" commits. If the branch was the only ref that could reach certain commits, those commits become unreachable. They still exist in the object database (and in the reflog for a time), but they're invisible to normal commands like git log.
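
A short sketch of that situation, in a temporary repository with a hypothetical branch named topic: after the branch is deleted, no ref contains the commit, yet the object is still in the database.

```shell
set -eu
# Throwaway repository; the "demo" identity and the "topic" branch are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b topic
git commit -q --allow-empty -m "only on topic"
lost=$(git rev-parse HEAD)

git checkout -q "$main"
git branch -q -D topic

# No ref can reach the commit any more ...
refs_containing=$(git for-each-ref --contains "$lost")
# ... but the object itself has not gone anywhere.
object_type=$(git cat-file -t "$lost")
echo "refs containing it: '${refs_containing}', object type: $object_type"
```

Because HEAD visited the commit, `git reflog` still lists it, which is what makes recovery with `git branch topic <hash>` possible.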


The Reflog

The reflog records every time a ref (branch tip or HEAD) changes. It's a per-repository, per-ref log of all movements. Reflog entries are local only - they're never pushed or shared.

Viewing the Reflog

# HEAD reflog (every checkout, commit, reset, rebase, etc.)
git reflog

# Reflog for a specific branch
git reflog show main

# With timestamps
git reflog --date=iso

# With relative dates
git reflog --date=relative

Reflog Entry Format

e5f6a7b HEAD@{0}: commit: Add error handling
c3d4e5f HEAD@{1}: checkout: moving from main to feature/auth
c3d4e5f HEAD@{2}: merge feature/search: Fast-forward
a1b2c3d HEAD@{3}: commit: Initial commit

Each entry records: the resulting hash, the ref position (HEAD@{n}), the operation type, and a description.

Using Reflog References

The @{n} syntax works in any Git command that accepts a revision:

# Show where HEAD was 3 moves ago
git show HEAD@{3}

# Diff between current state and 5 moves ago
git diff HEAD@{5} HEAD

# Create a branch at an old position
git branch recovery HEAD@{7}

# Time-based references
git diff HEAD@{yesterday} HEAD
git log HEAD@{2.weeks.ago}..HEAD
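
A typical use is undoing a hard reset. The sketch below throws a commit away and then finds it again through HEAD@{1}; the repository, the "demo" identity, and the recovery branch name are placeholders.

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
two=$(git rev-parse HEAD)

# Throw away the latest commit.
git reset -q --hard HEAD~1
git --no-pager reflog

# HEAD@{0} is the reset itself; HEAD@{1} is where HEAD was just before it.
recovered=$(git rev-parse 'HEAD@{1}')
git branch recovery 'HEAD@{1}'
echo "recovered $recovered"
```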

Reflog Expiration

Reflog entries don't live forever:

  • Entries expire after 90 days by default (gc.reflogExpire)
  • Entries for commits no longer reachable from the ref's current tip expire sooner, after 30 days by default (gc.reflogExpireUnreachable)

After expiration, git gc removes the entries. Configure with:

git config --global gc.reflogExpire 180.days
git config --global gc.reflogExpireUnreachable 90.days

Garbage Collection

Git's object database grows over time. Unreachable objects - commits orphaned by rebase, old trees from amended commits, blobs from files no longer in any tree - accumulate. Garbage collection (git gc) cleans them up.

What git gc Does

  1. Packs loose objects into packfiles (delta-compressed, more efficient)
  2. Removes unreachable objects once no reflog entry references them and they are older than the prune grace period (gc.pruneExpire, two weeks by default)
  3. Packs refs (consolidates loose ref files into packed-refs)
  4. Prunes the reflog (removes expired entries)

# Run garbage collection
git gc

# Aggressive GC (more thorough compression, slower)
git gc --aggressive

# See what would be pruned
git prune --dry-run

# Check for corruption and find unreachable objects
git fsck
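
The packing step is easy to observe with git count-objects. This is a sketch in a temporary repository (file name and "demo" identity are placeholders); the loose and packs helpers are defined here for illustration and are not Git commands.

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
printf 'hello\n' > file.txt
git add file.txt
git commit -q -m "add file"

# Helper functions (defined for this sketch) that read count-objects output.
loose() { git count-objects -v | awk '/^count:/ {print $2}'; }
packs() { git count-objects -v | awk '/^packs:/ {print $2}'; }

loose_before=$(loose)     # blob + tree + commit are loose files at this point
git gc -q
loose_after=$(loose)
packs_after=$(packs)
echo "loose: $loose_before -> $loose_after, packs: $packs_after"
```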

When GC Runs

Git runs gc --auto automatically after certain operations (like receiving a push). Auto GC does nothing unless there are more than roughly 6,700 loose objects (gc.auto) or more than 50 packfiles (gc.autoPackLimit).


Finding Unreachable Objects

git fsck (short for "file system check") examines the object database for integrity and reports dangling objects, meaning unreachable objects that nothing else points to:

# Check integrity and report dangling objects
git fsck

# List every unreachable object, not only the dangling ones
git fsck --unreachable

# Find and recover lost commits
git fsck --lost-found

--lost-found writes dangling objects into .git/lost-found/ (commits under commit/, everything else under other/). This is the last resort for recovering data that's been lost from the reflog.
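
A minimal sketch of recovery through lost-found: a blob is written to the object database without ever being referenced, and fsck then writes it back out. The repository, the "demo" identity, and the blob content are placeholders.

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first"

# Store a blob that no tree, commit, or ref ever points to.
blob=$(printf 'precious data\n' | git hash-object -w --stdin)

# fsck reports it as dangling and writes its content under lost-found/other/.
git fsck --lost-found
recovered=$(cat ".git/lost-found/other/$blob")
echo "$recovered"
```

For a dangling commit, the file under lost-found/commit/ holds the commit's hash rather than its content, so the next step is usually `git show <hash>` or `git branch <name> <hash>`.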


Packfiles

When you first create objects, Git stores them as individual loose objects - one zlib-compressed file per object in .git/objects/. As the repository grows, this becomes inefficient. Git uses packfiles to compress many objects into a single file.

How Packfiles Work

A packfile (.pack) stores objects sequentially, using delta compression. Instead of storing each version of a file as a full blob, Git stores one version in full (the base) and subsequent versions as deltas (differences) against the base.

Git typically uses the most recent version as the base (since that's what you check out most often) and stores older versions as deltas. This is the opposite of what you might expect - older versions take slightly longer to reconstruct.

Each packfile has an accompanying index file (.idx) that maps object hashes to their position in the packfile, enabling fast lookups.

# List packfiles
ls .git/objects/pack/

# Examine a packfile's contents
git verify-pack -v .git/objects/pack/pack-*.idx | head -20

# Repack the repository
git repack -a -d

# Count loose and packed objects
git count-objects -v
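
To tie these commands together, the sketch below commits two versions of a file, repacks, and then looks the newest blob up in the pack listing. File name, contents, and the "demo" identity are placeholders; with files this small Git may not bother storing a delta, so the sketch only checks that the object lives in the pack.

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

printf 'version 1\n' > notes.txt
git add notes.txt
git commit -q -m "v1"
printf 'version 1\nversion 2\n' > notes.txt
git commit -q -am "v2"
blob=$(git rev-parse HEAD:notes.txt)

# Move everything into a single pack and drop the redundant loose files.
git repack -a -d -q
ls .git/objects/pack/

# Each verify-pack line starts with: <hash> <type> <size> <size-in-pack> <offset>
listing=$(git verify-pack -v .git/objects/pack/pack-*.idx)
printf '%s\n' "$listing" | grep "^$blob"

# Objects are read the same way whether they are loose or packed.
content=$(git cat-file -p "$blob")
```

When an entry is stored as a delta, its verify-pack line carries two extra fields: the delta depth and the hash of the base object.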

Multi-Pack Index

Large repositories (especially those receiving frequent pushes) can accumulate many packfiles. The multi-pack index (MIDX, available since Git 2.20) creates a single index across all packfiles, speeding up object lookups:

# Generate multi-pack index
git multi-pack-index write

# Verify multi-pack index
git multi-pack-index verify
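
A small sketch of where the index ends up, assuming a Git version that ships the multi-pack-index command. The repository and "demo" identity are placeholders, and a single pack is enough to produce the file, even though the benefit only shows with many packs.

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
printf 'hello\n' > file.txt
git add file.txt
git commit -q -m "add file"

# The MIDX indexes packfiles, so make sure at least one exists.
git repack -a -d -q

git multi-pack-index write
git multi-pack-index verify

# The index is stored next to the packs it covers.
ls .git/objects/pack/
```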

git maintenance - Automatic Optimization

Git 2.29+ includes git maintenance for scheduling background optimization tasks:

# Register the current repo for maintenance
git maintenance register

# Run the enabled maintenance tasks now
git maintenance run

# Start a background maintenance scheduler
git maintenance start

# Stop the scheduler
git maintenance stop

Maintenance tasks include: gc (garbage collection), commit-graph (update commit graph file), prefetch (background fetch from remotes), loose-objects (pack loose objects), and incremental-repack (consolidate packfiles).
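
A single task can also be run on demand with --task, without registering the repository or touching global configuration. The sketch below runs only loose-objects in a temporary repository; it assumes a Git version that provides git maintenance, and the "demo" identity is a placeholder.

```shell
set -eu
# Throwaway repository; the "demo" identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
printf 'hello\n' > file.txt
git add file.txt
git commit -q -m "add file"

# Run only the loose-objects task: it batches loose objects into a packfile.
git maintenance run --task=loose-objects

ls .git/objects/pack/
pack_count=$(find .git/objects/pack -name '*.pack' | wc -l)
echo "packfiles: $pack_count"
```

The loose copies are not necessarily deleted in the same run; the task removes loose objects that already exist in a pack the next time it runs.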

