
System Information

These tools help you understand what's running on a system - hardware details, resource usage, and what processes are doing. Essential for troubleshooting and capacity planning.


uname - Kernel and OS Info

uname prints system information about the kernel and OS.

uname              # kernel name (e.g., Linux)
uname -r           # kernel release (e.g., 5.15.0-91-generic)
uname -m           # machine architecture (e.g., x86_64, aarch64)
uname -n           # hostname
uname -a           # all fields in one line (plus kernel version and OS)

For distribution-specific info:

cat /etc/os-release       # distro name, version, ID
lsb_release -a            # if lsb_release is installed
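A common use of uname -m is mapping the kernel's architecture string to the label a package repository or download URL expects. A minimal sketch (the amd64/arm64 labels are illustrative conventions, not universal):

```shell
# Map uname -m output to a common package-architecture label
case "$(uname -m)" in
  x86_64)  arch=amd64 ;;
  aarch64) arch=arm64 ;;
  *)       arch=unknown ;;
esac
echo "$arch"
```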

uptime and Load Averages

uptime shows how long the system has been running and current load averages.

uptime
# 14:32:07 up 45 days, 3:12, 2 users, load average: 0.52, 0.78, 0.91

The three load average numbers represent the average number of processes demanding CPU time - runnable processes, plus (on Linux) those in uninterruptible sleep, typically waiting on disk I/O - over the last 1, 5, and 15 minutes.

How to interpret them:

  • On a single-CPU system, a load of 1.0 means the CPU is fully utilized. Above 1.0, processes are waiting.
  • On a 4-core system, a load of 4.0 means full utilization. Above 4.0, processes are queuing.

Divide load average by nproc to assess CPU pressure

A load average of 8.0 on a 2-core machine means severe overload, but on a 16-core machine it means the system is barely working. Always divide by the number of CPU cores (nproc) to get a meaningful ratio. Consistently above 1.0 per core indicates the system needs attention.


# Check number of cores
nproc

A rising 1-minute average with stable 15-minute average indicates a recent spike. A high 15-minute average means sustained load.
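The divide-by-cores rule is easy to script. A sketch that reads the 1-minute average straight from /proc/loadavg:

```shell
# Load per core: 1-minute average from /proc/loadavg divided by core count
read load _ < /proc/loadavg
awk -v l="$load" -v c="$(nproc)" 'BEGIN { printf "load per core: %.2f\n", l / c }'
```

A result consistently above 1.00 is the "needs attention" signal described above.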


free - Memory Usage

free shows RAM and swap usage.

free -h            # human-readable
free -m            # megabytes

Example output:

              total        used        free      shared  buff/cache   available
Mem:          15Gi        6.2Gi       1.8Gi       312Mi        7.5Gi        8.7Gi
Swap:          4Gi        0.1Gi       3.9Gi

Key columns:

  • total - physical RAM installed
  • used - memory actively used by processes
  • free - completely unused memory
  • buff/cache - memory used for filesystem buffers and page cache
  • available - memory that can be used for new processes (free + reclaimable cache)

Low 'free' memory is normal - check 'available' instead

Linux intentionally uses free RAM for disk caching, so the free column in free -h is often near zero on a healthy system. The available column is what matters - it shows how much memory can actually be used by new applications, including reclaimable cache. Only worry if available is low.


If swap is heavily used, the system is running low on RAM and performance will suffer.
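If you need the available figure by itself - for a monitoring script, say - it is the seventh field of the Mem: line in procps free output. A sketch assuming that layout:

```shell
# Print just the 'available' column (field 7 of the Mem: line) in MiB
free -m | awk '/^Mem:/ { print $7 " MiB available" }'
```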


lscpu - CPU Information

lscpu displays detailed CPU architecture information.

lscpu

Key fields:

  • Architecture - x86_64, aarch64, etc.
  • CPU(s) - total logical CPUs (cores x threads)
  • Core(s) per socket - physical cores per CPU
  • Socket(s) - number of physical CPUs
  • Model name - CPU model
  • CPU MHz - current clock speed

Sockets x Cores x Threads = Total logical CPUs

The formula Sockets x Cores per socket x Threads per core gives the total logical CPU count shown by nproc. A 2-socket server with 8 cores per socket and 2 threads per core has 32 logical CPUs. This number is what you compare load averages against.

The "threads per core" value is usually 1 (no hyperthreading) or 2 (hyperthreading/SMT enabled). Hyperthreading lets each physical core present itself as two logical CPUs by sharing execution resources. It helps workloads that mix CPU-bound and I/O-waiting threads, but it doesn't double performance - expect a 15-30% improvement at best for most workloads.

# Quick core count
nproc                      # number of processing units
nproc --all                # all installed (may differ if some are offline)
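The Sockets x Cores x Threads formula can be verified against lscpu's own fields. A sketch that assumes the English field labels shown above; the result should match nproc:

```shell
# Multiply Socket(s) x Core(s) per socket x Thread(s) per core
lscpu | awk -F: '
  /^Socket\(s\)/          { s = $2 }
  /^Core\(s\) per socket/ { c = $2 }
  /^Thread\(s\) per core/ { t = $2 }
  END { print s * c * t }'
```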

lsof - Open Files and Connections

lsof (list open files) shows which files, sockets, and pipes are in use by which processes. In Unix, everything is a file - including network connections.

lsof                             # all open files (very long output)
lsof -u ryan                     # files opened by a user
lsof -p 12345                    # files opened by a specific PID
lsof /var/log/syslog             # processes using a specific file
lsof +D /var/log                 # processes using files in a directory

Network connections:

lsof -i                          # all network connections
lsof -i :80                      # processes using port 80
lsof -i TCP                      # TCP connections only
lsof -i TCP:443                  # TCP connections on port 443
lsof -i @192.168.1.100           # connections to a specific host

Finding deleted files still held open:

lsof +L1                         # files with zero link count (deleted but open)

lsof +L1 finds deleted files still consuming disk space

When df shows a full disk but du can't account for all the space, deleted files still held open by running processes are often the cause. Run lsof +L1 to find them. The space is only freed when the process closes the file or is restarted.

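To see how much space those deleted-but-open files are actually holding, lsof's -F field output can be summed - a sketch (sudo is usually needed to see other users' processes, and a file held by several processes is counted once per holder):

```shell
# Sum the size fields (lines starting with 's') of deleted-but-open files
sudo lsof +L1 -F s 2>/dev/null |
  awk '/^s/ { total += substr($0, 2) } END { print total + 0, "bytes held by deleted files" }'
```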


vmstat - System Performance Snapshot

vmstat reports virtual memory, CPU, and I/O statistics.

vmstat's first line is an average since boot - ignore it

The first line of vmstat output shows averages since the system booted, not current values. Always use vmstat N (e.g., vmstat 1 5) to get real-time samples, and read from the second line onward for meaningful data.

vmstat                   # single snapshot
vmstat 5                 # update every 5 seconds
vmstat 5 10              # update every 5 seconds, 10 times
vmstat -S M              # show memory in megabytes

Example output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  10240 183264  45872 768432    0    0     5    12  125  230  8  2 89  1  0

Key columns:

  • r - processes waiting to run (high = CPU bottleneck)
  • b - processes in uninterruptible sleep (high = I/O bottleneck)
  • si/so - swap in/out (should be near zero)
  • bi/bo - blocks read/written to disk
  • us - user CPU time
  • sy - system CPU time
  • id - idle CPU time
  • wa - I/O wait time (high = disk bottleneck)

What values indicate problems:

  • wa > 10% - the CPU is spending significant time waiting for disk I/O. Investigate with iostat -x to find which disk is the bottleneck.
  • r consistently > number of CPUs - more processes want to run than you have CPUs. The system needs more CPU power or the workload needs optimization.
  • si/so > 0 - the system is actively swapping memory to/from disk. Even small amounts of swap activity cause noticeable slowdowns because disk is orders of magnitude slower than RAM. If you see sustained swapping, the system needs more RAM or a process is using too much.
  • b > 0 for extended periods - processes are blocked on I/O. Combined with high wa%, this points to a storage bottleneck.
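These thresholds can be watched automatically. A sketch that samples vmstat and flags high I/O wait, assuming the 17-column layout shown above (wa is field 16); it skips the two header lines and the since-boot first sample:

```shell
# Sample 5 times; flag any sample where I/O wait (field 16) exceeds 10%
vmstat 1 5 | awk 'NR > 3 && $16 > 10 { print "high iowait: " $16 "%" }'
```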

/proc and /sys

/proc and /sys virtual filesystem hierarchy showing key files and directories

The /proc filesystem is a virtual filesystem that exposes kernel and process information as files.

/proc and /sys are virtual filesystems - they don't exist on disk. The kernel generates their contents on the fly when you read them. /proc exposes process information and kernel internals: every running process gets a directory at /proc/<PID>/, and files like /proc/cpuinfo and /proc/meminfo provide system-wide stats. /sys is organized around the kernel's internal object model - devices, drivers, buses, and kernel subsystems. The key difference: /proc is older and somewhat disorganized (it mixes process info with hardware info), while /sys follows a clean hierarchy. New kernel features expose their interfaces through /sys.

cat /proc/cpuinfo          # CPU details
cat /proc/meminfo          # detailed memory info
cat /proc/version          # kernel version
cat /proc/uptime           # uptime in seconds
cat /proc/loadavg          # load averages
cat /proc/mounts           # mounted filesystems
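Tools like uptime are thin wrappers over these files. A sketch that reconstructs the "up X days" figure from the first field of /proc/uptime (seconds since boot):

```shell
# Convert seconds-since-boot into days, hours:minutes
awk '{ d = int($1 / 86400); h = int($1 % 86400 / 3600); m = int($1 % 3600 / 60);
       printf "up %d days, %d:%02d\n", d, h, m }' /proc/uptime
```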

Process-specific info lives in /proc/<PID>/:

cat /proc/1234/cmdline     # command that started the process
cat /proc/1234/status      # process status (memory, state, etc.)
ls -l /proc/1234/fd        # see what files a process has open
cat /proc/1234/environ     # environment variables (null-separated)

The /sys filesystem exposes kernel objects - devices, drivers, and configuration:

cat /sys/class/net/eth0/speed          # network interface speed
cat /sys/block/sda/size                # disk size in sectors
cat /sys/class/thermal/thermal_zone0/temp    # CPU temperature



dmesg - Kernel Messages

dmesg displays kernel ring buffer messages - hardware detection, driver loading, errors, and warnings.

dmesg -T timestamps may not be accurate across suspend/resume

The -T flag converts kernel timestamps to wall-clock time using the system's boot time as a reference. If the system has been suspended and resumed, these timestamps drift because the kernel timer doesn't advance during suspend. For accurate timestamps on systems that sleep, cross-reference with journalctl.

dmesg                      # all kernel messages
dmesg -T                   # human-readable timestamps
dmesg -l err,warn          # only errors and warnings
dmesg | tail -20           # recent messages
dmesg -w                   # follow new messages in real time

Useful for:

  • Diagnosing hardware issues (disk errors, USB detection)
  • Checking boot messages
  • Investigating OOM (out of memory) kills
  • Spotting driver errors

# Check for disk errors
dmesg -T | grep -i "error\|fail\|i/o"

# See USB device detection
dmesg -T | grep -i usb

# Find OOM killer activity
dmesg -T | grep -i "oom\|killed process"

Reading dmesg output for real problems. Here's an example of spotting a disk error:

dmesg -T | grep -i 'error\|fail\|i/o'
# [Mon Jan 15 14:23:01 2024] ata1.00: failed command: READ FPDMA QUEUED
# [Mon Jan 15 14:23:01 2024] ata1.00: status: { DRDY ERR }
# [Mon Jan 15 14:23:01 2024] ata1.00: error: { UNC }

The UNC (uncorrectable) error means the disk has a bad sector it can't recover from. Repeated disk errors like this mean the drive is failing and should be replaced. Another common scenario:

dmesg -T | grep -i 'oom\|killed process'
# [Mon Jan 15 15:45:22 2024] Out of memory: Killed process 3421 (java) total-vm:4096000kB

OOM killer events in dmesg indicate critical memory pressure

If dmesg shows "Out of memory: Killed process," the system exhausted both RAM and swap and the kernel's OOM killer chose a process to terminate. This is a last-resort measure that indicates the system needs more RAM, a larger swap file, or investigation into which process is consuming excessive memory.

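To see at a glance which processes the OOM killer has been targeting, the kill messages can be tallied by process name - a sketch assuming the standard "Killed process PID (name)" message format:

```shell
# Extract the process name from each OOM kill message and count occurrences
sudo dmesg 2>/dev/null |
  grep -oE 'Killed process [0-9]+ \([^)]+\)' |
  sed -E 's/.*\((.*)\)/\1/' | sort | uniq -c | sort -rn
```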


Putting It Together

System performance troubleshooting flowchart from uptime through vmstat, free, and df

When troubleshooting a slow or unresponsive system, check in this order:

# 1. What's the load? Is the system overloaded?
uptime

# 2. Is it CPU, memory, or I/O?
vmstat 1 5

# 3. Memory pressure?
free -h

# 4. What processes are consuming resources?
top    # or htop

# 5. Disk I/O issues?
iostat -x 1 5    # if sysstat is installed

# 6. Disk space?
df -h

# 7. Any kernel errors?
dmesg -T | tail -30

# 8. What's a specific process doing?
lsof -p <PID>
strace -p <PID>   # system calls (careful - high overhead)
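The first few steps of the checklist can be bundled into a one-shot snapshot script - a sketch (dmesg may need root, and exact tool availability varies by distro):

```shell
#!/bin/sh
# One-shot health snapshot following the checklist above
echo "== load ==";    uptime
echo "== memory ==";  free -h
echo "== disk ==";    df -h /
echo "== kernel ==";  dmesg -T 2>/dev/null | tail -5
```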
