MongoDB

MongoDB is the most widely adopted document database. Instead of rows and columns, you work with flexible JSON-like documents that map naturally to objects in your application code. No rigid schema, no required joins, no impedance mismatch between your data layer and your programming language.

This guide covers the document model, CRUD operations, the aggregation pipeline, indexing, replica sets, sharding, and the mongosh shell.


The Document Model

MongoDB stores data as BSON (Binary JSON) documents - a binary-encoded superset of JSON that adds data types JSON lacks. A document is a set of key-value pairs, analogous to a row in a relational table but far more flexible.

A Simple Document

{
  "_id": ObjectId("65a1b2c3d4e5f6a7b8c9d0e1"),
  "name": "Jane Chen",
  "email": "jane@example.com",
  "age": 34,
  "roles": ["admin", "developer"],
  "address": {
    "street": "742 Evergreen Terrace",
    "city": "Portland",
    "state": "OR",
    "zip": "97201"
  },
  "created_at": ISODate("2024-01-15T09:30:00Z"),
  "active": true
}

This one document combines an array (roles), a nested sub-document (address), and typed values (ObjectId, ISODate) that plain JSON cannot express.

BSON Types

| Type | Example | Notes |
|---|---|---|
| String | "hello" | UTF-8 encoded |
| Int32 | NumberInt(42) | 32-bit signed integer |
| Int64 | NumberLong("9007199254740993") | 64-bit signed integer; pass large values as a string, because a numeric literal above 2^53 is rounded as a double first |
| Double | 3.14159 | 64-bit IEEE 754 floating point |
| Decimal128 | NumberDecimal("19.99") | 128-bit decimal - use for currency |
| Boolean | true / false | |
| Date | ISODate("2024-01-15T09:30:00Z") | UTC datetime, millisecond precision |
| ObjectId | ObjectId("65a1...") | 12-byte unique identifier |
| Array | ["a", "b", "c"] | Ordered list of any BSON types |
| Embedded Document | { "key": "value" } | Nested document |
| Binary | BinData(0, "base64...") | Arbitrary binary data |
| Null | null | Explicit null value |

Decimal128 for money

Never use Double for financial data. Floating-point arithmetic produces rounding errors (0.1 + 0.2 = 0.30000000000000004). Decimal128 provides exact decimal representation - use it for prices, balances, and any value where precision matters.
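The rounding problem is easy to reproduce in plain JavaScript, whose numbers are the same IEEE 754 doubles as BSON Double. The sketch below contrasts double arithmetic with exact integer arithmetic in whole cents. Note that toCents is a hypothetical helper written for this illustration (it handles only non-negative amounts) and is not a MongoDB API; in mongosh you would store such values with NumberDecimal("...") instead.

```javascript
// Doubles cannot represent 0.1 or 0.2 exactly, so the sum drifts.
const doubleSum = 0.1 + 0.2;
console.log(doubleSum === 0.3); // false

// Hypothetical helper: parse a non-negative decimal string into whole cents.
function toCents(amount) {
  const [whole, frac = ""] = amount.split(".");
  // Pad or truncate the fractional part to exactly two digits.
  return BigInt(whole) * 100n + BigInt((frac + "00").slice(0, 2));
}

// Integer arithmetic on cents stays exact.
const exactSum = toCents("0.10") + toCents("0.20");
console.log(exactSum === 30n); // true
```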

The _id Field

Every document must have an _id field that acts as the primary key. If you don't provide one, MongoDB generates an ObjectId automatically - a 12-byte value containing a timestamp (4 bytes), random value (5 bytes), and incrementing counter (3 bytes). ObjectIds are roughly time-ordered and globally unique without coordination.
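To make the 4 + 5 + 3 byte layout concrete, here is a small sketch that splits an ObjectId's hex string into its three parts. The function name parseObjectId is an assumption made for this example; in mongosh itself, ObjectId values already expose getTimestamp().

```javascript
// Split a 24-character ObjectId hex string into timestamp, random value, and counter.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  return {
    // First 4 bytes (8 hex chars): seconds since the Unix epoch.
    createdAt: new Date(parseInt(hex.slice(0, 8), 16) * 1000),
    // Next 5 bytes (10 hex chars): random value, kept as hex.
    random: hex.slice(8, 18),
    // Last 3 bytes (6 hex chars): incrementing counter.
    counter: parseInt(hex.slice(18), 16),
  };
}

const parts = parseObjectId("65a1b2c3d4e5f6a7b8c9d0e1");
console.log(parts.createdAt.toISOString());
console.log(parts.random, parts.counter);
```

Because the timestamp occupies the leading bytes, sorting by _id approximates sorting by creation time.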

A single BSON document cannot exceed 16 MB. If your data model approaches this limit, restructure it into separate documents or use GridFS for large files.


The mongosh Shell

mongosh is MongoDB's modern command-line shell - a JavaScript REPL with syntax highlighting, auto-completion, and full MongoDB API access.

Connecting

# Connect to localhost on default port 27017
mongosh

# Connect to a specific host and database
mongosh "mongodb://192.168.1.50:27017/myapp"

# Connect to a replica set
mongosh "mongodb://host1:27017,host2:27017,host3:27017/myapp?replicaSet=rs0"

# Connect with authentication
mongosh "mongodb://admin:password@localhost:27017/admin"

Essential Commands

show dbs              // List databases
use myapp             // Switch database (creates on first write)
show collections      // List collections in current database
db.users.stats()      // Collection statistics
db.users.help()       // Method help

Customizing with .mongoshrc.js

Create ~/.mongoshrc.js to define helper functions (and, if you like, a custom prompt) that load each time the shell starts:

// ~/.mongoshrc.js
const last = (coll, n = 5) => db[coll].find().sort({ _id: -1 }).limit(n);
const count = (coll, query = {}) => db[coll].countDocuments(query);

CRUD Operations

All operations target a single collection (the rough equivalent of a relational table).

Create: Inserting Documents

// Insert a single document
db.users.insertOne({ name: "Alice Rivera", email: "alice@example.com", age: 28, department: "Engineering" })

// Insert multiple documents
db.users.insertMany([
  { name: "Bob Park", email: "bob@example.com", age: 35, department: "Marketing" },
  { name: "Carol Okafor", email: "carol@example.com", age: 42, department: "Engineering" },
  { name: "Dave Singh", email: "dave@example.com", age: 31, department: "Sales" }
])

Both methods generate _id values automatically if omitted.

Read: Finding Documents

Pass a filter document to match records and an optional projection to control which fields come back.

db.users.find()                                             // all documents
db.users.findOne({ email: "alice@example.com" })            // single match
db.users.find({ department: "Engineering" })                // equality filter
db.users.find({ department: "Engineering" }, { name: 1, email: 1, _id: 0 })  // with projection

Query Operators

Query operators prefixed with $ handle comparisons beyond simple equality:

// Comparison
db.users.find({ age: { $gt: 30 } })                        // greater than
db.users.find({ age: { $gte: 28, $lte: 42 } })             // range

// Membership
db.users.find({ department: { $in: ["Engineering", "Sales"] } })

// Logical
db.users.find({ $and: [{ age: { $gt: 25 } }, { department: "Engineering" }] })
db.users.find({ $or: [{ department: "Engineering" }, { department: "Sales" }] })

// Pattern matching and existence
db.users.find({ name: { $regex: /^A/i } })                 // names starting with A or a (the i flag ignores case)
db.users.find({ phone: { $exists: true } })                 // has a phone field

Implicit $and

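When you specify multiple conditions in the same filter document, MongoDB treats them as an implicit $and. Writing { age: { $gt: 25 }, department: "Engineering" } is equivalent to using $and explicitly. Several operators on one field can also share a single sub-document, as in { age: { $gte: 28, $lte: 42 } }. You need an explicit $and only when the same key would have to appear more than once in the filter, for example two separate $or clauses, because a document cannot repeat a key.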
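One case where an explicit $and is needed is combining two $or clauses. The sketch below runs in plain JavaScript and shows why: an object literal keeps only the last value for a repeated key, so one clause would vanish before the filter ever reached the server. The field values are illustrative.

```javascript
// A repeated key in an object literal keeps only the LAST value,
// so the department clause below silently disappears.
const broken = {
  $or: [{ department: "Engineering" }, { department: "Sales" }],
  $or: [{ age: { $lt: 30 } }, { age: { $gt: 40 } }],
};
console.log(Object.keys(broken).length); // 1

// An explicit $and keeps both clauses intact.
const correct = {
  $and: [
    { $or: [{ department: "Engineering" }, { department: "Sales" }] },
    { $or: [{ age: { $lt: 30 } }, { age: { $gt: 40 } }] },
  ],
};
console.log(correct.$and.length); // 2
```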

Sorting, Limiting, and Skipping

db.users.find().sort({ age: -1, name: 1 })     // sort by age desc, name asc
db.users.find().limit(10)                       // limit results
db.users.find().skip(20).limit(10)              // pagination (page 3, 10 per page)
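The skip value for a page follows from (page - 1) * pageSize. A minimal sketch, with pageWindow as a hypothetical helper name:

```javascript
// Hypothetical helper: convert a 1-based page number into skip/limit values.
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

const { skip, limit } = pageWindow(3, 10);
console.log(skip, limit); // 20 10

// In mongosh, the values would be used like this:
// db.users.find().sort({ _id: 1 }).skip(skip).limit(limit)
```

Adding a sort keeps page boundaries stable between requests. skip() still has to walk past the skipped documents, so very deep pages get slower; filtering on the last seen sort key is a common alternative.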

Update: Modifying Documents

Update operators modify specific fields without replacing the entire document:

// Set or change fields
db.users.updateOne({ email: "alice@example.com" }, { $set: { age: 29, title: "Senior Engineer" } })

// Remove a field entirely
db.users.updateOne({ email: "bob@example.com" }, { $unset: { phone: "" } })

// Increment a numeric field
db.orders.updateOne({ _id: orderId }, { $inc: { quantity: 1 } })

// Add to / remove from an array
db.users.updateOne({ email: "alice@example.com" }, { $push: { roles: "team-lead" } })
db.users.updateOne({ email: "alice@example.com" }, { $pull: { roles: "junior" } })

// Update multiple documents
db.users.updateMany({ department: "Engineering" }, { $set: { building: "HQ-3" } })

Don't forget the operator

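In mongosh and current drivers, updateOne and updateMany reject an update document that contains no update operators. Whole-document replacement is the job of replaceOne (and was the implicit behavior of the legacy update() method): it swaps out every field except _id. That is rarely what you want for a partial change, so use update operators like $set, $inc, and $push.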
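Application code can catch an operator-less update document before it is sent. The guard below is a sketch; isOperatorUpdate is a hypothetical helper, and it deliberately ignores pipeline-style updates (arrays of stages).

```javascript
// Hypothetical guard: true only if every top-level key is an update operator.
function isOperatorUpdate(update) {
  const keys = Object.keys(update);
  return keys.length > 0 && keys.every((key) => key.startsWith("$"));
}

console.log(isOperatorUpdate({ $set: { age: 29 } })); // true
console.log(isOperatorUpdate({ age: 29 }));           // false
```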

Delete: Removing Documents

db.users.deleteOne({ email: "dave@example.com" })    // Delete one match
db.sessions.deleteMany({ expired: true })             // Delete all matches
db.temp_data.deleteMany({})                           // Delete everything (careful!)

The Aggregation Pipeline

The aggregation pipeline processes documents through a sequence of stages, each transforming data before passing it to the next - like a Unix pipeline for your database.

db.collection.aggregate([
  { $stage1: { ... } },
  { $stage2: { ... } },
  { $stage3: { ... } }
])

Pipeline Stages

$match - Filter Documents

Works like find but as a pipeline stage. Place $match as early as possible to reduce documents flowing through later stages.

{ $match: { status: "active", age: { $gte: 18 } } }

$group - Aggregate Values

Groups documents by a key and applies accumulator operators ($sum, $avg, $min, $max, $first, $last, $push, $addToSet):

{ $group: { _id: "$department", avgAge: { $avg: "$age" }, total: { $sum: 1 } } }
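To see what this stage computes, here is an in-memory JavaScript equivalent applied to the four users inserted earlier in this guide. It is only a sketch of the semantics, not of how the server executes $group; groupByDepartment is a made-up name.

```javascript
// Sample documents taken from the insert examples earlier in this guide.
const users = [
  { name: "Alice Rivera", age: 28, department: "Engineering" },
  { name: "Bob Park", age: 35, department: "Marketing" },
  { name: "Carol Okafor", age: 42, department: "Engineering" },
  { name: "Dave Singh", age: 31, department: "Sales" },
];

// In-memory equivalent of:
// { $group: { _id: "$department", avgAge: { $avg: "$age" }, total: { $sum: 1 } } }
function groupByDepartment(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    const bucket = buckets.get(doc.department) ?? { ageSum: 0, total: 0 };
    bucket.ageSum += doc.age;
    bucket.total += 1; // same idea as { $sum: 1 }
    buckets.set(doc.department, bucket);
  }
  // One output document per distinct department.
  return [...buckets].map(([department, b]) => ({
    _id: department,
    avgAge: b.ageSum / b.total,
    total: b.total,
  }));
}

console.log(groupByDepartment(users));
```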

$sort, $project, $limit, $skip

{ $sort: { total: -1 } }          // order results
{ $project: { name: 1, email: 1, fullName: { $concat: ["$firstName", " ", "$lastName"] } } }  // reshape (field paths, not the $first/$last accumulators)
{ $limit: 10 }                     // cap output
{ $skip: 20 }                      // offset

$lookup - Join Collections

Performs a left outer join against another collection:

{ $lookup: { from: "orders", localField: "_id", foreignField: "customer_id", as: "customer_orders" } }
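The left outer join semantics can be sketched in plain JavaScript: every input document is kept, and matches from the other collection land in an array field. The sample customers and orders below are made up for illustration.

```javascript
// Made-up sample data for illustration.
const customers = [
  { _id: 1, name: "Alice" },
  { _id: 2, name: "Bob" },
];
const orders = [
  { _id: 101, customer_id: 1, amount: 25 },
  { _id: 102, customer_id: 1, amount: 40 },
];

// In-memory equivalent of the $lookup stage above: match orders.customer_id
// against customers._id and attach the matches as customer_orders.
function lookupOrders(customerDocs, orderDocs) {
  return customerDocs.map((customer) => ({
    ...customer,
    customer_orders: orderDocs.filter((order) => order.customer_id === customer._id),
  }));
}

const joined = lookupOrders(customers, orders);
console.log(joined[0].customer_orders.length); // 2
console.log(joined[1].customer_orders.length); // 0 - unmatched customers are kept
```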

$unwind - Flatten Arrays

Deconstructs an array field, outputting one document per element:

{ $unwind: "$customer_orders" }
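An in-memory sketch of the default behavior, assuming the field is either an array or absent: each element yields one output document, and documents whose array is empty or missing drop out. The unwind function and the sample data are hypothetical.

```javascript
// In-memory equivalent of { $unwind: "$customer_orders" } with default options.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    // Missing or null fields are treated like an empty array and produce no output.
    (doc[field] ?? []).map((element) => ({ ...doc, [field]: element }))
  );
}

// Made-up sample data for illustration.
const joined = [
  { _id: 1, name: "Alice", customer_orders: [{ amount: 25 }, { amount: 40 }] },
  { _id: 2, name: "Bob", customer_orders: [] },
];

const flat = unwind(joined, "customer_orders");
console.log(flat.length);                    // 2
console.log(flat[1].customer_orders.amount); // 40
```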

Practical Example: Revenue by Category

db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$category", totalRevenue: { $sum: "$amount" }, orderCount: { $sum: 1 } } },
  { $sort: { totalRevenue: -1 } },
  { $limit: 5 },
  { $project: { category: "$_id", totalRevenue: { $round: ["$totalRevenue", 2] }, orderCount: 1, _id: 0 } }
])

Indexes

Without indexes, MongoDB performs a collection scan - reading every document. Indexes are B-tree structures that let MongoDB locate documents without examining the entire collection.

Single Field Indexes

db.users.createIndex({ email: 1 })          // ascending
db.users.createIndex({ created_at: -1 })    // descending (direction is irrelevant for a single-field index; MongoDB can walk it either way)

Compound Indexes

A compound index covers multiple fields. Field order matters - MongoDB uses the index for prefix queries (left to right) but not for queries that skip leading fields.

// Supports queries on: department alone, department+age, department+age+name
// Does NOT efficiently support queries on age alone or name alone
db.users.createIndex({ department: 1, age: -1, name: 1 })
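The prefix rule can be modeled in a few lines. This is a deliberately strict simplification: usesIndexPrefix (a hypothetical name) returns true only when every queried field belongs to a gap-free prefix of the index. The real query planner is more nuanced and can, for example, still use the department prefix for a query on department and name.

```javascript
// Field order of the compound index created above.
const indexFields = ["department", "age", "name"];

// Simplified model: do the queried fields form a gap-free prefix of the index?
function usesIndexPrefix(fields, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0) return false;
  // Count how many leading index fields the query covers without a gap.
  let covered = 0;
  while (covered < fields.length && wanted.has(fields[covered])) {
    covered += 1;
  }
  return covered === wanted.size;
}

console.log(usesIndexPrefix(indexFields, ["department"]));        // true
console.log(usesIndexPrefix(indexFields, ["department", "age"])); // true
console.log(usesIndexPrefix(indexFields, ["age"]));               // false
console.log(usesIndexPrefix(indexFields, ["name"]));              // false
```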

Unique Indexes

db.users.createIndex({ email: 1 }, { unique: true })  // duplicate values throw an error

Text Indexes

Text indexes support full-text search across string content. A collection can have at most one text index, but it can cover multiple fields:

db.articles.createIndex({ title: "text", body: "text" })
db.articles.find(
  { $text: { $search: "mongodb aggregation" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

Geospatial Indexes

2dsphere indexes support queries on GeoJSON data - finding documents near a point, within a polygon, or intersecting a geometry:

// Create a 2dsphere index
db.restaurants.createIndex({ location: "2dsphere" })

// Find restaurants within 2km of a point
db.restaurants.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-122.6750, 45.5120] },  // GeoJSON order: [longitude, latitude]
      $maxDistance: 2000  // meters
    }
  }
})

TTL Indexes

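TTL (Time to Live) indexes automatically delete documents once an indexed date field is older than a specified number of seconds - ideal for sessions, logs, or temporary records. A background task performs the removal roughly once a minute, so expired documents can linger briefly: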

db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 })  // expire after 24h
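The expiry moment is simply the indexed date plus expireAfterSeconds. A small sketch of that arithmetic, with ttlExpiry as a hypothetical helper:

```javascript
// Earliest moment a document becomes eligible for TTL deletion.
function ttlExpiry(createdAt, expireAfterSeconds) {
  return new Date(createdAt.getTime() + expireAfterSeconds * 1000);
}

const expiry = ttlExpiry(new Date("2024-01-15T09:30:00Z"), 86400);
console.log(expiry.toISOString()); // 2024-01-16T09:30:00.000Z
```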

Query Planning with explain()

Use explain() to understand query execution. Key fields in the output:

| Field | Meaning |
|---|---|
| winningPlan.stage | IXSCAN (index used) vs COLLSCAN (full scan) |
| totalKeysExamined | Number of index entries scanned |
| totalDocsExamined | Number of documents loaded |
| nReturned | Number of documents returned |
| executionTimeMillis | Total execution time |

db.users.find({ department: "Engineering", age: { $gt: 30 } }).explain("executionStats")

The goal: get totalDocsExamined as close to nReturned as possible.
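One way to track that goal is a documents-examined-per-document-returned ratio, where 1 is ideal. The sketch below uses made-up numbers in the shape of the executionStats fields listed above; docsExaminedPerReturned is a hypothetical helper, not part of explain().

```javascript
// Hypothetical executionStats values, made up for illustration.
const stats = { nReturned: 40, totalKeysExamined: 40, totalDocsExamined: 1000 };

// Ratio of documents loaded to documents returned; 1 is ideal.
function docsExaminedPerReturned(executionStats) {
  // Guard against division by zero when the query returns nothing.
  return executionStats.totalDocsExamined / Math.max(executionStats.nReturned, 1);
}

console.log(docsExaminedPerReturned(stats)); // 25
```

A ratio far above 1 suggests the filter is doing work that a better-fitting index could avoid.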


Replica Sets

A replica set is a group of MongoDB instances maintaining the same data for redundancy and high availability. Always run MongoDB as a replica set in production - it gives you automatic failover and read scaling.

Architecture

A replica set consists of:

  • Primary: Receives all write operations. There is exactly one primary at any time.
  • Secondary: Replicates data from the primary. Can serve read operations if configured. You typically have two or more secondaries.
  • Arbiter: Participates in elections but holds no data. Used when you need an odd number of voting members but don't want to store a third full copy of the data.

graph TB
    C[Application] --> P[Primary]
    P -->|Replication| S1[Secondary 1]
    P -->|Replication| S2[Secondary 2]
    P -.->|Heartbeat| A[Arbiter]
    S1 -.->|Heartbeat| P
    S2 -.->|Heartbeat| P
    A -.->|Heartbeat| P
    S1 -.->|Heartbeat| S2
    S1 -.->|Heartbeat| A
    S2 -.->|Heartbeat| A

Elections

If the primary becomes unreachable, the remaining members hold an election. A candidate needs votes from a majority of all voting members, so a 3-member set tolerates 1 failure and a 5-member set tolerates 2. A 2-member set that loses one member cannot elect a new primary (1 of 2 is not a majority), which is why you need at least 3 voting members. With default settings, failover, including the time to detect the failure, typically completes within about 12 seconds.
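The majority arithmetic behind those numbers can be written out directly. Both function names below are chosen for this sketch.

```javascript
// Votes needed to elect a primary: more than half of all voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Members that can fail while the rest can still form a majority.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)}`);
}
```

The loop also shows that a fourth member adds no fault tolerance over three, which is why odd member counts are the usual choice.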

Read Preferences

Read preference controls which members receive read operations:

| Mode | Behavior | Use Case |
|---|---|---|
| primary | All reads go to the primary | Default. Guaranteed latest data |
| primaryPreferred | Reads from primary; falls back to secondary if primary unavailable | Availability over consistency |
| secondary | Reads go to secondaries only | Offload analytics queries from primary |
| secondaryPreferred | Reads from secondary; falls back to primary | Analytics with fallback |
| nearest | Reads from the member with lowest network latency | Geographically distributed deployments |

// In a connection string
"mongodb://host1,host2,host3/myapp?readPreference=secondaryPreferred"

Stale reads from secondaries

Secondaries replicate asynchronously, so they may lag behind the primary. Reading from a secondary can return data that's a few seconds (or, under heavy load, longer) behind. If your application requires reading the data it just wrote, use primary read preference for those queries.

Write Concern

Write concern controls acknowledgment requirements before MongoDB confirms a write:

| Write Concern | Behavior |
|---|---|
| w: 1 | Acknowledged by the primary only (the default before MongoDB 5.0) |
| w: "majority" | Acknowledged by a majority of voting members (the implicit default since MongoDB 5.0 for most deployments) |
| w: 0 | No acknowledgment - fire and forget |

db.orders.insertOne({ item: "widget", qty: 5 }, { writeConcern: { w: "majority", wtimeout: 5000 } })

Use w: "majority" for critical data. Use w: 1 for high-throughput, loss-tolerant workloads.


Sharding

Sharding distributes data across multiple machines. When a single replica set can't handle your data volume or throughput, you split data across shards - each shard being its own replica set. Config servers store metadata about data placement, and mongos routers route queries to the correct shard(s). Your application connects to mongos, not directly to shards.

Shard Key Selection

The shard key determines how documents are distributed. A good shard key has high cardinality (many distinct values), even distribution (no hotspots), and supports query targeting (queries including the shard key go to one shard instead of all).

Hashed vs Ranged Sharding

| Strategy | How it works | Pros | Cons |
|---|---|---|---|
| Ranged | Documents with nearby key values go to the same shard | Efficient range queries | Hotspots if writes cluster at one end |
| Hashed | A hash of the key determines the shard | Even write distribution | Range queries hit all shards |

sh.enableSharding("myapp")
sh.shardCollection("myapp.orders", { customer_id: 1 })   // ranged
sh.shardCollection("myapp.events", { _id: "hashed" })    // hashed
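The hotspot trade-off can be simulated without a cluster. The sketch below routes 100 monotonically increasing keys to four shards under each strategy. Everything in it is an assumption made for illustration: the split points, the use of FNV-1a as a stand-in hash (MongoDB uses its own hash function), and the modulo placement (MongoDB actually assigns ranges of hash values to chunks).

```javascript
const SHARDS = 4;

// Assumed split points for the ranged strategy: shard 0 holds keys < 250,
// shard 1 holds 250-499, shard 2 holds 500-749, shard 3 holds the rest.
const splitPoints = [250, 500, 750];

function rangedShard(key) {
  const index = splitPoints.findIndex((point) => key < point);
  return index === -1 ? splitPoints.length : index;
}

// FNV-1a (32-bit), used here only as a stand-in hash function.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i += 1) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function hashedShard(key) {
  return fnv1a(String(key)) % SHARDS;
}

// Count how many keys each shard receives under a given routing function.
function distribution(keys, pickShard) {
  const counts = new Array(SHARDS).fill(0);
  for (const key of keys) {
    counts[pickShard(key)] += 1;
  }
  return counts;
}

// New, ever-increasing keys, such as timestamps or auto-incrementing IDs.
const newKeys = Array.from({ length: 100 }, (_, i) => 1000 + i);

console.log(distribution(newKeys, rangedShard)); // [ 0, 0, 0, 100 ] - one hot shard
console.log(distribution(newKeys, hashedShard)); // spread across all four shards
```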

Chunks and Balancing

MongoDB divides the shard key range into chunks (128 MB by default since MongoDB 6.0; 64 MB in earlier versions). The balancer migrates chunks between shards in the background to keep distribution even.

