
Transfer Protocols and Plumbing

When you run git fetch or git push, Git negotiates with a remote server to figure out which objects need to be transferred, packages them efficiently, and sends them over the wire. This guide covers how that transfer works at the protocol level, the different transport mechanisms, and clone strategies for large repositories.


How Transfers Work

Every transfer between Git repositories follows the same basic pattern:

  1. Discovery - the client and server exchange lists of refs (branches, tags) and their commit hashes
  2. Negotiation - they compare what each side has to determine the minimal set of objects to transfer
  3. Transfer - the sender packages the needed objects into a packfile and transmits it
  4. Update - the receiver integrates the objects and updates its refs

The two processes involved are git-upload-pack (runs on the server during fetch) and git-receive-pack (runs on the server during push).
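
The discovery step can be observed in isolation with `git ls-remote`, which asks the server for its ref advertisement without fetching any objects. A minimal sketch using a throwaway local repository as the "server":

```shell
# Create a throwaway repository to act as the server
tmp=$(mktemp -d)
git init -q -b main "$tmp/server"
git -C "$tmp/server" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$tmp/server" tag v1.0

# Step 1 (discovery) only: list the server's refs and their hashes.
# No objects are transferred.
git ls-remote "$tmp/server"
# One line per ref: HEAD, refs/heads/main, refs/tags/v1.0
```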


Transport Protocols

SSH Transport

The most common protocol for authenticated access. Git connects via SSH and runs git-upload-pack or git-receive-pack on the server:

git@github.com:user/repo.git
ssh://git@github.com/user/repo.git

SSH handles authentication (key-based or password) and encryption. Git just pipes data through the SSH channel.
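
One common way to standardize on the SSH transport is Git's `url.<base>.insteadOf` rewriting, which transparently maps one URL prefix to another. A sketch using repo-local config (the hostname and repo name are illustrative):

```shell
# Rewrite HTTPS GitHub URLs to SSH transparently
# (configured per-repo here; add --global to apply everywhere)
tmp=$(mktemp -d)
git init -q "$tmp/repo"
git -C "$tmp/repo" config url."git@github.com:".insteadOf "https://github.com/"

# A remote added with the HTTPS URL will be contacted over SSH
git -C "$tmp/repo" remote add origin https://github.com/user/repo.git
git -C "$tmp/repo" remote get-url origin        # the URL as configured
git -C "$tmp/repo" ls-remote --get-url origin   # the effective, rewritten URL
```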

Smart HTTP

The standard for HTTPS access. The server runs a CGI program (or equivalent) that speaks Git's pack protocol over HTTP POST/GET:

https://github.com/user/repo.git

Smart HTTP supports both read and write, authentication via HTTP headers (tokens, Basic auth), and works through proxies and firewalls. Most hosting platforms default to this.

Dumb HTTP

An older, read-only protocol where Git downloads objects individually over plain HTTP. It doesn't require any special server-side software - just a web server serving static files. Rarely used today because it's much slower (no pack negotiation).
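
Dumb HTTP works because the repository publishes static index files that a plain client can fetch by URL. `git update-server-info` generates them (server setups typically run it from the `post-update` hook):

```shell
tmp=$(mktemp -d)
git init -q --bare "$tmp/repo.git"

# Generate the static files a dumb-HTTP client reads first:
# info/refs (the ref list) and objects/info/packs (available packfiles)
git -C "$tmp/repo.git" update-server-info

cat "$tmp/repo.git/info/refs"   # empty here: the bare repo has no refs yet
```

Any web server exposing this directory tree as static files is then a working dumb-HTTP remote.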

Native Git Protocol

git://github.com/user/repo.git

The git:// protocol is unauthenticated and unencrypted. It was used for fast, read-only access to public repositories. Most platforms no longer support it due to security concerns.


Pack Negotiation

The most interesting part of the transfer is pack negotiation - how the client and server figure out which objects to send. This is what makes git fetch fast even for large repositories.

The Want/Have Exchange

During a fetch:

  1. The server sends a list of all its refs and their hashes
  2. The client sends want lines for the commits it needs (remote refs it doesn't have)
  3. The client sends have lines - commits it already has
  4. The server uses the have list to find the common ancestor - the newest commit both sides share
  5. The server sends a packfile containing all objects reachable from the wanted commits but not from the common ancestors

CLIENT                          SERVER
                                refs: main=abc123, feature=def456
want abc123
want def456
have 789fed
have 456abc
                                ACK 456abc (common ancestor found)
                                <sends packfile>
done

The negotiation is optimized with multi-ack: the server can acknowledge multiple common ancestors, allowing it to send a more precisely targeted packfile.
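
The server's object selection in step 5 corresponds to a reachability computation you can run yourself: everything reachable from the wants but not from the haves. A sketch with a small local history:

```shell
tmp=$(mktemp -d)
git init -q -b main "$tmp/repo"
for i in 1 2 3; do
  echo "$i" > "$tmp/repo/file.txt"
  git -C "$tmp/repo" add file.txt
  git -C "$tmp/repo" -c user.name=demo -c user.email=demo@example.com \
      commit -q -m "commit $i"
done

# Suppose the client "has" HEAD~2 and "wants" HEAD: the packfile contains
# exactly the objects reachable from HEAD but not from HEAD~2
git -C "$tmp/repo" rev-list --objects HEAD ^HEAD~2
# Two commits, plus the trees and blobs they introduce
```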

Protocol v2

Git protocol v2 (default since Git 2.26) improves on v1 with:

  • Ref filtering - the server only sends refs the client asks about, not the entire ref list (huge improvement for repos with thousands of branches)
  • Server capabilities - structured capability negotiation
  • Stateless mode - better for HTTP-based transports

# Force protocol v2
git config --global protocol.version 2

# See protocol exchange
GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch origin 2>&1 | head -40

Watching the Protocol

You can observe Git's transfer protocol in action using trace environment variables:

# General trace output
GIT_TRACE=1 git fetch origin

# Packet-level protocol exchange
GIT_TRACE_PACKET=1 git fetch origin

# HTTP request/response details
GIT_CURL_VERBOSE=1 git fetch origin

# Performance timing for each operation
GIT_TRACE_PERFORMANCE=1 git fetch origin

# Combine multiple traces
GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch origin 2>&1 | less

Clone Strategies for Large Repositories

Not every clone needs the entire history. Git provides several strategies for faster clones of large repositories.

Shallow Clone

Downloads only recent history, not the full commit chain:

# Only the most recent commit
git clone --depth 1 https://github.com/user/large-repo.git

# Last 10 commits
git clone --depth 10 https://github.com/user/large-repo.git

# Deepen a shallow clone later
git fetch --deepen=50

# Convert shallow to full clone
git fetch --unshallow

Shallow clones are fast but limited. Some operations (like git log and git blame) only show the shallow history. You can't push from a shallow clone in some configurations.
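
A quick way to see the effect is a shallow clone of a local repository. One wrinkle: plain local-path clones bypass the transport and ignore `--depth`, so a `file://` URL is needed:

```shell
tmp=$(mktemp -d)
git init -q -b main "$tmp/src"
for i in 1 2 3 4 5; do
  git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.com \
      commit -q --allow-empty -m "commit $i"
done

# --depth needs a real transport; use file:// rather than a bare path
git clone -q --depth 1 "file://$tmp/src" "$tmp/shallow"

git -C "$tmp/shallow" rev-list --count HEAD   # 1: only the tip commit
test -f "$tmp/shallow/.git/shallow"           # marker file for shallow repos
```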

Partial Clone

Downloads commits and trees but skips large blobs until you actually need them (Git 2.22+):

# Skip all blobs (download on demand when you checkout)
git clone --filter=blob:none https://github.com/user/large-repo.git

# Skip blobs larger than 1MB
git clone --filter=blob:limit=1m https://github.com/user/large-repo.git

# Skip all trees (extremely minimal, mainly for CI)
git clone --filter=tree:0 https://github.com/user/large-repo.git

Partial clones maintain full history (all commits) but defer downloading file content until checkout or diff. Git fetches missing blobs transparently when needed. This is ideal for large repositories where you don't need every file's content upfront.
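
You can watch the deferred blobs directly. A sketch of a blobless clone against a local repository; note the serving side must opt in with `uploadpack.allowFilter`:

```shell
tmp=$(mktemp -d)
git init -q -b main "$tmp/src"
echo "content" > "$tmp/src/big.txt"
git -C "$tmp/src" add big.txt
git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add file"
# The serving repository must permit filtered fetches
git -C "$tmp/src" config uploadpack.allowFilter true

# Blobless clone; --no-checkout keeps the blobs unfetched so we can inspect
git clone -q --no-checkout --filter=blob:none "file://$tmp/src" "$tmp/partial"

# Promised-but-missing objects are printed with a leading "?"
git -C "$tmp/partial" rev-list --objects --missing=print HEAD | grep '^?'
```

Running `git checkout` in the partial clone would fetch the missing blobs on demand.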

Single-Branch Clone

Only downloads one branch:

git clone --single-branch https://github.com/user/repo.git
git clone --single-branch --branch develop https://github.com/user/repo.git
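
The difference shows up in the remote-tracking refs: a single-branch clone never fetches the other branches at all. A sketch:

```shell
tmp=$(mktemp -d)
git init -q -b main "$tmp/src"
git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "on main"
git -C "$tmp/src" branch develop

git clone -q --single-branch --branch develop "$tmp/src" "$tmp/one"

# Only origin/develop exists; origin/main was never fetched
git -C "$tmp/one" branch -r
```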

Sparse Checkout

After cloning, check out only a subset of files:

git clone --filter=blob:none --sparse https://github.com/user/monorepo.git
cd monorepo
git sparse-checkout set src/my-service tests/my-service

This combination (partial clone + sparse checkout) is the fastest way to work with a large monorepo when you only need a few directories.
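
A self-contained sketch of the sparse-checkout half, using a small local "monorepo" (directory names are illustrative; assumes a recent Git where cone mode is the default):

```shell
tmp=$(mktemp -d)
git init -q -b main "$tmp/mono"
mkdir -p "$tmp/mono/src/my-service" "$tmp/mono/src/other-service"
echo a > "$tmp/mono/src/my-service/main.go"
echo b > "$tmp/mono/src/other-service/main.go"
git -C "$tmp/mono" add .
git -C "$tmp/mono" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "two services"

# --sparse starts with only top-level files checked out
git clone -q --sparse "$tmp/mono" "$tmp/work"
git -C "$tmp/work" sparse-checkout set src/my-service

ls "$tmp/work/src"   # only my-service is materialized on disk
```

The full history is still present in `.git`; only the working tree is restricted.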


Git Bundle

git bundle creates a file containing Git objects and refs - a portable repository snapshot that can be transferred offline (USB drive, email, sneakernet). Useful when network access to a remote isn't available.

Creating a Bundle

# Bundle the entire repository
git bundle create repo.bundle --all

# Bundle a specific branch
git bundle create feature.bundle feature

# Bundle only new commits since a known point
git bundle create update.bundle main ^v1.0

Using a Bundle

# Verify a bundle file
git bundle verify repo.bundle

# Clone from a bundle
git clone repo.bundle my-repo

# Fetch from a bundle into an existing repo
git fetch repo.bundle main:refs/remotes/bundle/main
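
The commands above compose into a full offline round trip: bundle a repository, verify the file, and clone from it. A sketch:

```shell
tmp=$(mktemp -d)
git init -q -b main "$tmp/src"
git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "work to carry offline"

# Create the bundle, check it, then clone from it as if it were a remote
git -C "$tmp/src" bundle create "$tmp/repo.bundle" --all
git bundle verify "$tmp/repo.bundle"
git clone -q "$tmp/repo.bundle" "$tmp/restored"

# The restored clone has the same history as the original
git -C "$tmp/restored" log --oneline
```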

git archive - Export Without .git

git archive creates a tar or zip archive of a tree without the .git directory - useful for creating release tarballs:

# Create a tar.gz of the current HEAD
git archive --format=tar.gz --prefix=project-v1.0/ HEAD > project-v1.0.tar.gz

# Create a zip of a specific tag
git archive --format=zip v1.0 > project-v1.0.zip

# Archive a specific directory
git archive HEAD src/ > src-only.tar

The --prefix option adds a directory prefix so the archive extracts into a named directory rather than the current directory.
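
A quick end-to-end check of the prefix behavior, using a throwaway repository:

```shell
tmp=$(mktemp -d)
git init -q -b main "$tmp/proj"
echo "hello" > "$tmp/proj/README.md"
git -C "$tmp/proj" add README.md
git -C "$tmp/proj" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"

# Archive HEAD with a version prefix, then list the entries
git -C "$tmp/proj" archive --format=tar.gz --prefix=project-v1.0/ HEAD \
    > "$tmp/project-v1.0.tar.gz"
tar tzf "$tmp/project-v1.0.tar.gz"
# Every entry sits under project-v1.0/, and no .git directory is included
```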


