Transfer Protocols and Plumbing¶
When you run git fetch or git push, Git negotiates with a remote server to figure out which objects need to be transferred, packages them efficiently, and sends them over the wire. This guide covers how that transfer works at the protocol level, the different transport mechanisms, and clone strategies for large repositories.
How Transfers Work¶
Every transfer between Git repositories follows the same basic pattern:
- Discovery - the client and server exchange lists of refs (branches, tags) and their commit hashes
- Negotiation - they compare what each side has to determine the minimal set of objects to transfer
- Transfer - the sender packages the needed objects into a packfile and transmits it
- Update - the receiver integrates the objects and updates its refs
The two processes involved are git-upload-pack (runs on the server during fetch) and git-receive-pack (runs on the server during push).
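You can observe the discovery step on its own with git ls-remote, which asks the server's git-upload-pack for its ref advertisement without transferring any objects (this assumes a remote named origin is configured):
# List the refs the server advertises - the first step of every fetch
git ls-remote origin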
Transport Protocols¶
SSH Transport¶
The most common protocol for authenticated access. Git connects via SSH and runs git-upload-pack or git-receive-pack on the server:
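# Conceptually, a fetch over SSH boils down to something like this (host and repository path are placeholders)
ssh git@example.com "git-upload-pack 'user/repo.git'"
# A push invokes the receiving side instead
ssh git@example.com "git-receive-pack 'user/repo.git'"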
SSH handles authentication (key-based or password) and encryption. Git just pipes data through the SSH channel.
Smart HTTP¶
The standard for HTTPS access. The server runs a CGI program (or equivalent) that speaks Git's pack protocol over HTTP POST/GET:
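# The two request types behind a fetch over smart HTTP (server URL is a placeholder)
GET  https://example.com/user/repo.git/info/refs?service=git-upload-pack
POST https://example.com/user/repo.git/git-upload-pack
# Pushes use the receive-pack service instead
GET  https://example.com/user/repo.git/info/refs?service=git-receive-pack
POST https://example.com/user/repo.git/git-receive-pack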
Smart HTTP supports both read and write, authentication via HTTP headers (tokens, Basic auth), and works through proxies and firewalls. Most hosting platforms default to this.
Dumb HTTP¶
An older, read-only protocol where Git downloads objects individually over plain HTTP. It doesn't require any special server-side software - just a web server serving static files. Rarely used today because it's much slower (no pack negotiation).
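As a rough illustration, a dumb-HTTP fetch is just a series of static file downloads (the URL and object path here are placeholders):
# Ref list, then individual loose objects and pack indexes, fetched as plain files
curl https://example.com/repo.git/info/refs
curl https://example.com/repo.git/objects/ab/cdef0123...
curl https://example.com/repo.git/objects/info/packs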
Native Git Protocol¶
The git:// protocol is unauthenticated and unencrypted. It was used for fast, read-only access to public repositories. Most platforms no longer support it due to security concerns.
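For reference, a clone over the native protocol looks like this (the Git daemon listens on TCP port 9418; the host is a placeholder):
git clone git://example.com/user/repo.git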
Pack Negotiation¶
The most interesting part of the transfer is pack negotiation - how the client and server figure out which objects to send. This is what makes git fetch fast even for large repositories.
The Want/Have Exchange¶
During a fetch:
- The server sends a list of all its refs and their hashes
- The client identifies which commits it wants (remote refs it doesn't have)
- The client sends have lines - commits it already has
- The server uses the have list to find the common ancestor - the newest commit both sides share
- The server sends a packfile containing all objects reachable from the wanted commits but not from the common ancestors
CLIENT                                        SERVER
                <-- refs: main=abc123, feature=def456
want abc123     -->
want def456     -->
have 789fed     -->
have 456abc     -->
                <-- ACK 456abc (common ancestor found)
done            -->
                <-- <sends packfile>
The negotiation is optimized with the multi_ack capability: the server can acknowledge multiple common ancestors, allowing it to send a more precisely targeted packfile.
Protocol v2¶
Git protocol v2 (default since Git 2.26) improves on v1 with:
- Ref filtering - the server only sends refs the client asks about, not the entire ref list (huge improvement for repos with thousands of branches)
- Server capabilities - structured capability negotiation
- Stateless mode - better for HTTP-based transports
# Force protocol v2
git config --global protocol.version 2
# See protocol exchange
GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch origin 2>&1 | head -40
Watching the Protocol¶
You can observe Git's transfer protocol in action using trace environment variables:
# General trace output
GIT_TRACE=1 git fetch origin
# Packet-level protocol exchange
GIT_TRACE_PACKET=1 git fetch origin
# HTTP request/response details
GIT_CURL_VERBOSE=1 git fetch origin
# Performance timing for each operation
GIT_TRACE_PERFORMANCE=1 git fetch origin
# Combine multiple traces
GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch origin 2>&1 | less
Clone Strategies for Large Repositories¶
Not every clone needs the entire history. Git provides several strategies for faster clones of large repositories.
Shallow Clone¶
Downloads only recent history, not the full commit chain:
# Only the most recent commit
git clone --depth 1 https://github.com/user/large-repo.git
# Last 10 commits
git clone --depth 10 https://github.com/user/large-repo.git
# Deepen a shallow clone later
git fetch --deepen=50
# Convert shallow to full clone
git fetch --unshallow
Shallow clones are fast but limited. Some operations (like git log and git blame) only show the shallow history. You can't push from a shallow clone in some configurations.
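If you're unsure whether a repository was cloned shallow, Git can tell you directly (the second command just inspects the file Git uses to record the cut-off commits):
# Prints "true" in a shallow clone, "false" otherwise
git rev-parse --is-shallow-repository
# The boundary commits are recorded in .git/shallow
cat .git/shallow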
Partial Clone¶
Downloads commits and trees but skips large blobs until you actually need them (Git 2.22+):
# Skip all blobs (download on demand when you checkout)
git clone --filter=blob:none https://github.com/user/large-repo.git
# Skip blobs larger than 1MB
git clone --filter=blob:limit=1m https://github.com/user/large-repo.git
# Skip all trees (extremely minimal, mainly for CI)
git clone --filter=tree:0 https://github.com/user/large-repo.git
Partial clones maintain full history (all commits) but defer downloading file content until checkout or diff. Git fetches missing blobs transparently when needed. This is ideal for large repositories where you don't need every file's content upfront.
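A quick way to see this in a blob:none clone - the remote name origin is the default that git clone sets up:
# The clone marks the remote as a promisor and remembers the filter
git config remote.origin.promisor            # true
git config remote.origin.partialclonefilter  # blob:none
# Commands that need file content (checkout, diff, blame) fetch the missing blobs transparently
git checkout main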
Single-Branch Clone¶
Only downloads one branch:
git clone --single-branch https://github.com/user/repo.git
git clone --single-branch --branch develop https://github.com/user/repo.git
Sparse Checkout¶
After cloning, check out only a subset of files:
git clone --filter=blob:none --sparse https://github.com/user/monorepo.git
cd monorepo
git sparse-checkout set src/my-service tests/my-service
This combination (partial clone + sparse checkout) is the fastest way to work with a large monorepo when you only need a few directories.
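The sparse set can be adjusted later without re-cloning (the directory names below are just examples):
# Show which directories are currently checked out
git sparse-checkout list
# Add another directory to the working tree
git sparse-checkout add docs/my-service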
Git Bundle¶
git bundle creates a file containing Git objects and refs - a portable repository snapshot that can be transferred offline (USB drive, email, sneakernet). Useful when network access to a remote isn't available.
Creating a Bundle¶
# Bundle the entire repository
git bundle create repo.bundle --all
# Bundle a specific branch
git bundle create feature.bundle main
# Bundle only new commits since a known point
git bundle create update.bundle main ^v1.0
Using a Bundle¶
# Verify a bundle file
git bundle verify repo.bundle
# Clone from a bundle
git clone repo.bundle my-repo
# Fetch from a bundle into an existing repo
git fetch repo.bundle main:refs/remotes/bundle/main
git archive - Export Without .git¶
git archive creates a tar or zip archive of a tree without the .git directory - useful for creating release tarballs:
# Create a tar.gz of the current HEAD
git archive --format=tar.gz --prefix=project-v1.0/ HEAD > project-v1.0.tar.gz
# Create a zip of a specific tag
git archive --format=zip v1.0 > project-v1.0.zip
# Archive a specific directory
git archive HEAD src/ > src-only.tar
The --prefix option adds a directory prefix so the archive extracts into a named directory rather than the current directory.
Exercise¶
Further Reading¶
- Pro Git - Chapter 10.6: Transfer Protocols - how fetch and push work at the protocol level
- Git Protocol v2 Documentation - improvements over protocol v1
- Official git-bundle documentation - offline transfers
- Official git-archive documentation - creating release tarballs
- Partial Clone Documentation - filtering objects during clone and fetch