
System Information

These tools help you understand what's running on a system - hardware details, resource usage, and what processes are doing. Essential for troubleshooting and capacity planning.


uname - Kernel and OS Info

uname prints system information about the kernel and OS.

uname              # kernel name (e.g., Linux)
uname -r           # kernel release (e.g., 5.15.0-91-generic)
uname -m           # machine architecture (e.g., x86_64, aarch64)
uname -n           # hostname
uname -a           # all fields in one line (plus kernel version and OS)

For distribution-specific info:

cat /etc/os-release       # distro name, version, ID
lsb_release -a            # if lsb_release is installed
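A common use of uname -m is mapping the kernel's architecture string to the label a package repository or download URL expects. A minimal sketch (the amd64/arm64 labels are illustrative conventions, not universal):

```shell
# Map uname -m output to a common package-architecture label
case "$(uname -m)" in
  x86_64)  arch=amd64 ;;
  aarch64) arch=arm64 ;;
  *)       arch=unknown ;;
esac
echo "$arch"
```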

uptime and Load Averages

uptime shows how long the system has been running and current load averages.

uptime
# 14:32:07 up 45 days, 3:12, 2 users, load average: 0.52, 0.78, 0.91

The three load average numbers represent the average number of processes demanding CPU time - runnable processes, plus (on Linux) those in uninterruptible sleep, typically waiting on disk I/O - over the last 1, 5, and 15 minutes.

How to interpret them:

  • On a single-CPU system, a load of 1.0 means the CPU is fully utilized. Above 1.0, processes are waiting.
  • On a 4-core system, a load of 4.0 means full utilization. Above 4.0, processes are queuing.

Divide load average by nproc to assess CPU pressure

A load average of 8.0 on a 2-core machine means severe overload, but on a 16-core machine it means the system is barely working. Always divide by the number of CPU cores (nproc) to get a meaningful ratio. Consistently above 1.0 per core indicates the system needs attention.


# Check number of cores
nproc

A rising 1-minute average with stable 15-minute average indicates a recent spike. A high 15-minute average means sustained load.
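The divide-by-cores rule is easy to script. A sketch that reads the 1-minute average straight from /proc/loadavg:

```shell
# Load per core: 1-minute average from /proc/loadavg divided by core count
read load _ < /proc/loadavg
awk -v l="$load" -v c="$(nproc)" 'BEGIN { printf "load per core: %.2f\n", l / c }'
```

A result consistently above 1.00 is the "needs attention" signal described above.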


free - Memory Usage

free shows RAM and swap usage.

free -h            # human-readable
free -m            # megabytes

Example output:

              total        used        free      shared  buff/cache   available
Mem:          15Gi        6.2Gi       1.8Gi       312Mi        7.5Gi        8.7Gi
Swap:          4Gi        0.1Gi       3.9Gi

Key columns:

  • total - physical RAM installed
  • used - memory actively used by processes
  • free - completely unused memory
  • buff/cache - memory used for filesystem buffers and page cache
  • available - memory that can be used for new processes (free + reclaimable cache)

Low 'free' memory is normal - check 'available' instead

Linux intentionally uses free RAM for disk caching, so the free column in free -h is often near zero on a healthy system. The available column is what matters - it shows how much memory can actually be used by new applications, including reclaimable cache. Only worry if available is low.


If swap is heavily used, the system is running low on RAM and performance will suffer.
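If you need the available figure by itself - for a monitoring script, say - it is the seventh field of the Mem: line in procps free output. A sketch assuming that layout:

```shell
# Print just the 'available' column (field 7 of the Mem: line) in MiB
free -m | awk '/^Mem:/ { print $7 " MiB available" }'
```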


lscpu - CPU Information

lscpu displays detailed CPU architecture information.

lscpu

Key fields:

  • Architecture - x86_64, aarch64, etc.
  • CPU(s) - total logical CPUs (cores x threads)
  • Core(s) per socket - physical cores per CPU
  • Socket(s) - number of physical CPUs
  • Model name - CPU model
  • CPU MHz - current clock speed

Sockets x Cores x Threads = Total logical CPUs

The formula Sockets x Cores per socket x Threads per core gives the total logical CPU count shown by nproc. A 2-socket server with 8 cores per socket and 2 threads per core has 32 logical CPUs. This number is what you compare load averages against.

The "threads per core" value is usually 1 (no hyperthreading) or 2 (hyperthreading/SMT enabled). Hyperthreading lets each physical core present itself as two logical CPUs by sharing execution resources. It helps workloads that mix CPU-bound and I/O-waiting threads, but it doesn't double performance - expect a 15-30% improvement at best for most workloads.

# Quick core count
nproc                      # number of processing units
nproc --all                # all installed (may differ if some are offline)
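The Sockets x Cores x Threads formula can be verified against lscpu's own fields. A sketch that assumes the English field labels shown above; the result should match nproc:

```shell
# Multiply Socket(s) x Core(s) per socket x Thread(s) per core
lscpu | awk -F: '
  /^Socket\(s\)/          { s = $2 }
  /^Core\(s\) per socket/ { c = $2 }
  /^Thread\(s\) per core/ { t = $2 }
  END { print s * c * t }'
```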

lsof - Open Files and Connections

lsof (list open files) shows which files, sockets, and pipes are in use by which processes. In Unix, everything is a file - including network connections.

lsof                             # all open files (very long output)
lsof -u ryan                     # files opened by a user
lsof -p 12345                    # files opened by a specific PID
lsof /var/log/syslog             # processes using a specific file
lsof +D /var/log                 # processes using files in a directory

Network connections:

lsof -i                          # all network connections
lsof -i :80                      # processes using port 80
lsof -i TCP                      # TCP connections only
lsof -i TCP:443                  # TCP connections on port 443
lsof -i @192.168.1.100           # connections to a specific host

Finding deleted files still held open:

lsof +L1                         # files with zero link count (deleted but open)

lsof +L1 finds deleted files still consuming disk space

When df shows a full disk but du can't account for all the space, deleted files still held open by running processes are often the cause. Run lsof +L1 to find them. The space is only freed when the process closes the file or is restarted.

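To see how much space those deleted-but-open files are actually holding, lsof's -F field output can be summed - a sketch (sudo is usually needed to see other users' processes, and a file held by several processes is counted once per holder):

```shell
# Sum the size fields (lines starting with 's') of deleted-but-open files
sudo lsof +L1 -F s 2>/dev/null |
  awk '/^s/ { total += substr($0, 2) } END { print total + 0, "bytes held by deleted files" }'
```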


vmstat - System Performance Snapshot

vmstat reports virtual memory, CPU, and I/O statistics.

vmstat's first line is an average since boot - ignore it

The first line of vmstat output shows averages since the system booted, not current values. Always use vmstat N (e.g., vmstat 1 5) to get real-time samples, and read from the second line onward for meaningful data.

vmstat                   # single snapshot
vmstat 5                 # update every 5 seconds
vmstat 5 10              # update every 5 seconds, 10 times
vmstat -S M              # show memory in megabytes

Example output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  10240 183264  45872 768432    0    0     5    12  125  230  8  2 89  1  0

Key columns:

  • r - processes waiting to run (high = CPU bottleneck)
  • b - processes in uninterruptible sleep (high = I/O bottleneck)
  • si/so - swap in/out (should be near zero)
  • bi/bo - blocks read/written to disk
  • us - user CPU time
  • sy - system CPU time
  • id - idle CPU time
  • wa - I/O wait time (high = disk bottleneck)

What values indicate problems:

  • wa > 10% - the CPU is spending significant time waiting for disk I/O. Investigate with iostat -x to find which disk is the bottleneck.
  • r consistently > number of CPUs - more processes want to run than you have CPUs. The system needs more CPU power or the workload needs optimization.
  • si/so > 0 - the system is actively swapping memory to/from disk. Even small amounts of swap activity cause noticeable slowdowns because disk is orders of magnitude slower than RAM. If you see sustained swapping, the system needs more RAM or a process is using too much.
  • b > 0 for extended periods - processes are blocked on I/O. Combined with high wa%, this points to a storage bottleneck.
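These thresholds can be watched automatically. A sketch that samples vmstat and flags high I/O wait, assuming the 17-column layout shown above (wa is field 16); it skips the two header lines and the since-boot first sample:

```shell
# Sample 5 times; flag any sample where I/O wait (field 16) exceeds 10%
vmstat 1 5 | awk 'NR > 3 && $16 > 10 { print "high iowait: " $16 "%" }'
```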

/proc and /sys

/proc and /sys virtual filesystem hierarchy showing key files and directories

The /proc filesystem is a virtual filesystem that exposes kernel and process information as files.

/proc and /sys are virtual filesystems - they don't exist on disk. The kernel generates their contents on the fly when you read them. /proc exposes process information and kernel internals: every running process gets a directory at /proc/<PID>/, and files like /proc/cpuinfo and /proc/meminfo provide system-wide stats. /sys is organized around the kernel's internal object model - devices, drivers, buses, and kernel subsystems. The key difference: /proc is older and somewhat disorganized (it mixes process info with hardware info), while /sys follows a clean hierarchy. New kernel features expose their interfaces through /sys.

cat /proc/cpuinfo          # CPU details
cat /proc/meminfo          # detailed memory info
cat /proc/version          # kernel version
cat /proc/uptime           # uptime in seconds
cat /proc/loadavg          # load averages
cat /proc/mounts           # mounted filesystems
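Tools like uptime are thin wrappers over these files. A sketch that reconstructs the "up X days" figure from the first field of /proc/uptime (seconds since boot):

```shell
# Convert seconds-since-boot into days, hours:minutes
awk '{ d = int($1 / 86400); h = int($1 % 86400 / 3600); m = int($1 % 3600 / 60);
       printf "up %d days, %d:%02d\n", d, h, m }' /proc/uptime
```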

Process-specific info lives in /proc/<PID>/:

cat /proc/1234/cmdline     # command that started the process
cat /proc/1234/status      # process status (memory, state, etc.)
ls -l /proc/1234/fd        # see what files a process has open
cat /proc/1234/environ     # environment variables (null-separated)

The /sys filesystem exposes kernel objects - devices, drivers, and configuration:

cat /sys/class/net/eth0/speed          # network interface speed
cat /sys/block/sda/size                # disk size in sectors
cat /sys/class/thermal/thermal_zone0/temp    # CPU temperature



dmesg - Kernel Messages

dmesg displays kernel ring buffer messages - hardware detection, driver loading, errors, and warnings.

dmesg -T timestamps may not be accurate across suspend/resume

The -T flag converts kernel timestamps to wall-clock time using the system's boot time as a reference. If the system has been suspended and resumed, these timestamps drift because the kernel timer doesn't advance during suspend. For accurate timestamps on systems that sleep, cross-reference with journalctl.

dmesg                      # all kernel messages
dmesg -T                   # human-readable timestamps
dmesg -l err,warn          # only errors and warnings
dmesg | tail -20           # recent messages
dmesg -w                   # follow new messages in real time

Useful for:

  • Diagnosing hardware issues (disk errors, USB detection)
  • Checking boot messages
  • Investigating OOM (out of memory) kills
  • Spotting driver errors

# Check for disk errors
dmesg -T | grep -i "error\|fail\|i/o"

# See USB device detection
dmesg -T | grep -i usb

# Find OOM killer activity
dmesg -T | grep -i "oom\|killed process"

Reading dmesg output for real problems. Here's an example of spotting a disk error:

dmesg -T | grep -i 'error\|fail\|i/o'
# [Mon Jan 15 14:23:01 2024] ata1.00: failed command: READ FPDMA QUEUED
# [Mon Jan 15 14:23:01 2024] ata1.00: status: { DRDY ERR }
# [Mon Jan 15 14:23:01 2024] ata1.00: error: { UNC }

The UNC (uncorrectable) error means the disk has a bad sector it can't recover from. Repeated disk errors like this mean the drive is failing and should be replaced. Another common scenario:

dmesg -T | grep -i 'oom\|killed process'
# [Mon Jan 15 15:45:22 2024] Out of memory: Killed process 3421 (java) total-vm:4096000kB

OOM killer events in dmesg indicate critical memory pressure

If dmesg shows "Out of memory: Killed process," the system exhausted both RAM and swap and the kernel's OOM killer chose a process to terminate. This is a last-resort measure that indicates the system needs more RAM, a larger swap file, or investigation into which process is consuming excessive memory.

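To see at a glance which processes the OOM killer has been targeting, the kill messages can be tallied by process name - a sketch assuming the standard "Killed process PID (name)" message format:

```shell
# Extract the process name from each OOM kill message and count occurrences
sudo dmesg 2>/dev/null |
  grep -oE 'Killed process [0-9]+ \([^)]+\)' |
  sed -E 's/.*\((.*)\)/\1/' | sort | uniq -c | sort -rn
```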


Putting It Together

System performance troubleshooting flowchart from uptime through vmstat, free, and df

When troubleshooting a slow or unresponsive system, check in this order:

# 1. What's the load? Is the system overloaded?
uptime

# 2. Is it CPU, memory, or I/O?
vmstat 1 5

# 3. Memory pressure?
free -h

# 4. What processes are consuming resources?
top    # or htop

# 5. Disk I/O issues?
iostat -x 1 5    # if sysstat is installed

# 6. Disk space?
df -h

# 7. Any kernel errors?
dmesg -T | tail -30

# 8. What's a specific process doing?
lsof -p <PID>
strace -p <PID>   # system calls (careful - high overhead)
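The first few steps of the checklist can be bundled into a one-shot snapshot script - a sketch (dmesg may need root, and exact tool availability varies by distro):

```shell
#!/bin/sh
# One-shot health snapshot following the checklist above
echo "== load ==";    uptime
echo "== memory ==";  free -h
echo "== disk ==";    df -h /
echo "== kernel ==";  dmesg -T 2>/dev/null | tail -5
```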
