Finding Files

The find command searches directory trees for files matching specified criteria. Combined with xargs, it forms a powerful pattern for batch operations on files.


find

Basic Usage

find /path/to/search -name "*.txt"     # find by filename pattern
find . -type f                          # all regular files
find . -type d                          # all directories
find /var/log -name "*.log"             # absolute path search

The general form is find [path] [expression]. If you omit the path, GNU find searches the current directory. POSIX requires at least one path, so write . explicitly in portable scripts.

Tests

By name:

find . -name "*.conf"          # case-sensitive name match
find . -iname "readme*"        # case-insensitive name match
find . -path "*/src/*.js"      # match against the full path

Use -iname for case-insensitive matching

File naming conventions vary across projects and platforms. Use -iname instead of -name when you're unsure about capitalization: find . -iname 'readme*' matches README.md, Readme.txt, and readme.rst all at once.
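
To see the difference for yourself, here is a minimal sketch that builds a scratch directory and counts matches for both tests. The file names are made up for the demo, and the counts assume a case-sensitive filesystem.

```shell
# Scratch directory with hypothetical file names, used only for this demo.
demo=$(mktemp -d)
touch "$demo/README.md" "$demo/Readme.txt" "$demo/readme.rst" "$demo/notes.txt"

# -name is case-sensitive: only readme.rst matches the lowercase pattern.
name_count=$(find "$demo" -type f -name 'readme*' | wc -l)

# -iname ignores case: all three readme variants match.
iname_count=$(find "$demo" -type f -iname 'readme*' | wc -l)

printf 'name matches: %s, iname matches: %s\n' "$name_count" "$iname_count"

rm -rf "$demo"
```

Quoting the pattern matters here: without the quotes, the shell would expand readme* before find ever sees it.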

By type:

| Flag | Type |
|------|------|
| -type f | Regular file |
| -type d | Directory |
| -type l | Symbolic link |
| -type b | Block device |
| -type c | Character device |
| -type p | Named pipe (FIFO) |
| -type s | Socket |

In practice, you'll use -type f (regular files) and -type d (directories) constantly, and -type l (symlinks) occasionally. The others are rare: -type b (block devices) for finding disk devices in /dev, -type c (character devices) for things like terminal devices and /dev/null, -type p (named pipes) when debugging inter-process communication, and -type s (sockets) when tracking down Unix domain sockets used by services like MySQL or Docker.
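
As a rough illustration, the sketch below creates one entry of each of the four types you can make without special privileges and counts them by type. The names are hypothetical, and it assumes mkfifo is available and the scratch filesystem supports symlinks and FIFOs.

```shell
# Scratch directory holding one regular file, one directory,
# one symlink, and one named pipe (hypothetical names).
demo=$(mktemp -d)
touch "$demo/regular.txt"
mkdir "$demo/subdir"
ln -s regular.txt "$demo/link"
mkfifo "$demo/pipe"

# -mindepth 1 keeps the scratch directory itself out of the -type d count.
files=$(find "$demo" -mindepth 1 -type f | wc -l)
dirs=$(find "$demo" -mindepth 1 -type d | wc -l)
links=$(find "$demo" -mindepth 1 -type l | wc -l)
fifos=$(find "$demo" -mindepth 1 -type p | wc -l)

printf 'files=%s dirs=%s links=%s fifos=%s\n' "$files" "$dirs" "$links" "$fifos"

rm -rf "$demo"
```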

By size:

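find . -size +10M              # larger than 10 MiB
find . -size -1024c            # smaller than 1 KiB (1024 bytes)
find . -size 100c              # exactly 100 bytes

Size suffixes: c (bytes), k (kibibytes, 1024 bytes), M (mebibytes), G (gibibytes). Without a suffix, the unit is 512-byte blocks. GNU find rounds sizes up to whole units before comparing, so -size -1k matches only empty files. Use a byte count such as -size -1024c when you mean "smaller than 1 KiB."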
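
One subtlety is worth checking on your own system: GNU find rounds a file's size up to whole units before comparing, which makes "less than" tests with k, M, or G stricter than they look. The sketch below assumes GNU find and uses made-up file names in a scratch directory to compare -size -1k with the byte-precise -size -1024c.

```shell
demo=$(mktemp -d)
: > "$demo/empty"                      # 0 bytes
printf 'hello' > "$demo/small"         # 5 bytes
head -c 2048 /dev/zero > "$demo/big"   # 2048 bytes

# With the k suffix, a 5-byte file rounds up to 1 unit, so it is not
# "less than 1k". Only the empty file matches.
under_1k=$(find "$demo" -type f -size -1k | wc -l)

# With the c suffix the unit is one byte, so nothing is rounded:
# both the empty file and the 5-byte file are under 1024 bytes.
under_1024c=$(find "$demo" -type f -size -1024c | wc -l)

printf 'under 1k: %s, under 1024c: %s\n' "$under_1k" "$under_1024c"

rm -rf "$demo"
```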

By time:

-mtime counts in 24-hour periods, not calendar days

find -mtime +7 means "more than 7 full 24-hour periods ago," not "more than 7 calendar days." A file modified 7.5 days ago has an mtime of 7 (truncated), so +7 won't match it - you'd need +6. For minute-level precision, use -mmin instead.

Ages are measured in whole 24-hour periods, with any fractional part discarded. +7 means "at least 8 full days ago", -1 means "within the last 24 hours", and 7 (no sign) means "between 7 and 8 days ago."

find . -mtime -7               # modified within the last 7 days
find . -mtime +30              # modified more than 30 days ago
find . -atime -1               # accessed within the last day
find . -ctime +90              # metadata changed more than 90 days ago
find . -newer reference.txt    # modified more recently than reference.txt

For minute-level precision, use -mmin, -amin, -cmin:

find . -mmin -60               # modified within the last 60 minutes
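
The truncation rule is easier to trust once you have watched it happen. This is a minimal sketch, assuming GNU touch (for the relative -d date strings) and GNU find. The file names are hypothetical, and the ages are given in hours so daylight-saving changes cannot skew them.

```shell
demo=$(mktemp -d)
touch -d '180 hours ago' "$demo/half"    # 7.5 days old, truncates to 7
touch -d '200 hours ago' "$demo/older"   # about 8.3 days old, truncates to 8
touch "$demo/fresh"                      # modified just now, age 0

plus7=$(find "$demo" -type f -mtime +7 | wc -l)    # only "older"
plus6=$(find "$demo" -type f -mtime +6 | wc -l)    # "half" and "older"
exact7=$(find "$demo" -type f -mtime 7 | wc -l)    # only "half"
recent=$(find "$demo" -type f -mtime -1 | wc -l)   # only "fresh"

printf '+7: %s  +6: %s  7: %s  -1: %s\n' "$plus7" "$plus6" "$exact7" "$recent"

rm -rf "$demo"
```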

By permissions:

find . -perm 644               # exactly 644
find . -perm -644              # at least these permissions (all specified bits set)
find . -perm /111              # any execute bit set (user, group, or other)
find . -perm -u+x              # user execute bit set

The three -perm modes correspond to different questions. -perm 644 (exact) asks 'are the permissions exactly 644?' - nothing more, nothing less. -perm -644 (dash prefix, all-bits) asks 'are at least these bits set?' - the file could have more permissions than specified. -perm /111 (slash prefix, any-bit) asks 'is any of these bits set?' - useful for finding anything executable. Think of exact as '=', dash as 'includes all of', and slash as 'includes any of'.
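
Here is a small sketch of the three modes side by side, using three hypothetical files with known permissions. It assumes GNU find (for the /mode syntax) and a filesystem that honors chmod.

```shell
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"
chmod 644 "$demo/a"
chmod 755 "$demo/b"
chmod 600 "$demo/c"

# Exact: only the file whose mode is exactly 644.
exact=$(find "$demo" -type f -perm 644 | wc -l)

# All bits: 644 and 755 both include every bit of 644; 600 does not.
all_bits=$(find "$demo" -type f -perm -644 | wc -l)

# Any bit: only 755 has at least one execute bit set.
any_exec=$(find "$demo" -type f -perm /111 | wc -l)

printf 'exact=%s all_bits=%s any_exec=%s\n' "$exact" "$all_bits" "$any_exec"

rm -rf "$demo"
```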

By owner:

find . -user root              # owned by root
find . -group www-data         # group is www-data
find . -nouser                 # files with no matching user in /etc/passwd
find . -nogroup                # files with no matching group

Depth Control

find . -maxdepth 1             # current directory only (no recursion)
find . -maxdepth 2             # at most 2 levels deep
find . -mindepth 1             # skip the starting directory itself
find . -mindepth 2 -maxdepth 3 # between 2 and 3 levels deep

Use -maxdepth 1 for current directory only

find . -maxdepth 1 -type f lists files in the current directory without recursing into subdirectories. This is often more reliable than ls parsing for scripts, and you can combine it with other find tests like -name or -mtime.
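
To make the depth numbers concrete, the following sketch builds a three-level tree with made-up names and counts regular files at different depth limits. The starting directory is depth 0, so its direct children are depth 1.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/top.txt"        # depth 1
touch "$demo/a/mid.txt"      # depth 2
touch "$demo/a/b/deep.txt"   # depth 3

top_only=$(find "$demo" -maxdepth 1 -type f | wc -l)    # top.txt
two_levels=$(find "$demo" -maxdepth 2 -type f | wc -l)  # top.txt, mid.txt
below_top=$(find "$demo" -mindepth 2 -type f | wc -l)   # mid.txt, deep.txt

printf 'maxdepth 1: %s, maxdepth 2: %s, mindepth 2: %s\n' \
  "$top_only" "$two_levels" "$below_top"

rm -rf "$demo"
```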

Logical Operators

[Figure: find expression evaluation chain, showing how tests are evaluated left to right with short-circuit logic]

find . -name "*.txt" -and -size +1M    # both conditions (-and is implicit)
find . -name "*.txt" -or -name "*.md"  # either condition
find . ! -name "*.tmp"                 # negation
find . \( -name "*.txt" -or -name "*.md" \) -and -mtime -7  # grouping

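Note that -and is the default operator between tests. When you write find . -name '*.txt' -size +1M, the -and is implicit - both conditions must be true. You only need to write -and explicitly for readability, or when combining it with -or and grouping. Because -and binds more tightly than -or, the parentheses in the last example are what make -mtime -7 apply to both name patterns rather than only to "*.md".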
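
Grouping matters most once an action appears in the expression, because the implicit -and takes precedence over -or. The sketch below uses hypothetical file names and the standard -print action to show how the same two name tests give different results with and without parentheses.

```shell
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.md" "$demo/c.tmp"

# Parsed as: -name '*.txt'  -or  ( -name '*.md' -and -print ).
# a.txt satisfies the left side, so the right side (and its -print)
# is skipped. Only b.md is printed.
ungrouped=$(find "$demo" -name '*.txt' -o -name '*.md' -print | wc -l)

# With parentheses, -print applies to either name match.
grouped=$(find "$demo" \( -name '*.txt' -o -name '*.md' \) -print | wc -l)

# Negation: every regular file except the .tmp one.
negated=$(find "$demo" -type f ! -name '*.tmp' | wc -l)

printf 'ungrouped=%s grouped=%s negated=%s\n' "$ungrouped" "$grouped" "$negated"

rm -rf "$demo"
```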

Actions

-exec (per file):

find . -name "*.tmp" -exec rm {} \;

The {} is replaced with each filename. The \; marks the end of the command. One rm process runs per file.

-exec (batching):

find . -name "*.tmp" -exec rm {} +

The + passes as many filenames as possible to a single command invocation. This is much faster when operating on many files.

-exec {} + doesn't work with commands needing a single filename

The + terminator batches multiple filenames into one command invocation, so the command sees all files as arguments at once, and {} must be the last argument before the +. Commands like mv, which expect the destination as the final argument, need \; (one invocation per file) or xargs -I {} for placeholder substitution.

The performance difference between \; and + is significant. With \;, find spawns a new process for every single file. If you're operating on 1000 files, that's 1000 separate rm processes. With +, find passes as many filenames as will fit on one command line, so 1000 files might be handled in a single rm invocation. The limit on how many arguments + can batch is determined by ARG_MAX (the kernel's limit on the combined size of arguments and environment, typically 2 MiB on modern Linux). You can check it with getconf ARG_MAX. For very large file sets, + will make multiple invocations as needed to stay within this limit.
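
You can count the invocations directly. In this sketch, a small sh -c helper (an illustration device, not something find requires) prints one line each time it is started, so the number of output lines equals the number of processes spawned. The .tmp file names are hypothetical.

```shell
demo=$(mktemp -d)
touch "$demo/1.tmp" "$demo/2.tmp" "$demo/3.tmp" "$demo/4.tmp" "$demo/5.tmp"

# \; starts the helper once per file: five lines of output.
per_file=$(find "$demo" -name '*.tmp' \
  -exec sh -c 'echo "got $# file(s)"' _ {} \; | wc -l)

# + hands all five names to a single helper process: one line of output.
batched=$(find "$demo" -name '*.tmp' \
  -exec sh -c 'echo "got $# file(s)"' _ {} + | wc -l)

printf 'invocations with \\; = %s, with + = %s\n' "$per_file" "$batched"

# The batching ceiling on this machine, in bytes.
getconf ARG_MAX

rm -rf "$demo"
```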

-delete:

find . -name "*.tmp" -delete

Built-in deletion - faster than -exec rm. Note that -delete implies -depth (processes files before their parent directories).

-delete implies -depth processing order

When you use -delete, find automatically enables -depth mode, processing directory contents before the directory itself. This changes the order in which other expressions see files. If your command combines -delete with -prune, they will conflict - -prune has to act on a directory before find descends into it, but -depth visits each directory only after its contents, so -prune has no effect.
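
A cautious workflow, sketched here under the assumption of GNU find and with made-up file names, is to run the expression without -delete first and only then add it. The second half shows why the implied -depth order is useful: directories emptied along the way can be removed in the same pass, innermost first.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/cache/old"
touch "$demo/cache/old/a.tmp" "$demo/cache/b.tmp" "$demo/keep.txt"

# Step 1: preview. Same tests, no -delete, nothing is removed.
preview=$(find "$demo" -name '*.tmp' | wc -l)

# Step 2: the same expression with -delete appended at the end.
find "$demo" -name '*.tmp' -delete

# Step 3: remove directories that are now empty. Because -delete implies
# -depth, cache/old is removed before cache is tested, so cache is
# empty by the time find reaches it. -mindepth 1 protects $demo itself.
find "$demo" -mindepth 1 -type d -empty -delete

remaining=$(find "$demo" -mindepth 1 | wc -l)   # only keep.txt is left
printf 'previewed=%s remaining=%s\n' "$preview" "$remaining"

rm -rf "$demo"
```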

-print0:

find . -name "*.txt" -print0

Separates results with null characters instead of newlines. This handles filenames containing spaces, newlines, or other special characters. Pair with xargs -0.

Always pair -print0 with xargs -0

The null byte is the only byte that cannot appear in a Unix path. Piping -print0 into xargs -0 is therefore the only fully safe way to pass filenames through a pipe. Regular newline-delimited output breaks on the (legal) filename my\nfile.txt.
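
The sketch below creates a single file whose name contains a newline and counts how many "records" each output format produces. It assumes GNU find and a tr that understands the \0 escape; the file name is the one from the paragraph above.

```shell
demo=$(mktemp -d)
# One file, with a literal newline in the middle of its name.
touch "$demo/$(printf 'my\nfile.txt')"

# Newline-delimited output: the single file shows up as two lines.
newline_records=$(find "$demo" -type f -print | wc -l)

# Null-delimited output: keep only the NUL bytes and count them.
# One terminator means one record.
null_records=$(find "$demo" -type f -print0 | tr -cd '\0' | wc -c)

printf 'newline records: %s, null records: %s\n' "$newline_records" "$null_records"

rm -rf "$demo"
```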

Practical Examples

# Delete files older than 30 days
find /tmp -type f -mtime +30 -delete

# Find large files
find / -type f -size +100M 2>/dev/null

# Find and chmod directories
find . -type d -exec chmod 755 {} +

# Find empty files and directories
find . -empty

# Find zero-byte files only
find . -type f -empty

# Find broken symlinks
find . -xtype l

# Find files modified today
find . -daystart -mtime -1

# Find setuid programs
find / -perm -4000 -type f 2>/dev/null

xargs

xargs reads items from standard input and passes them as arguments to a command. It bridges the gap between commands that produce output (like find) and commands that expect arguments.

Basic Usage

echo "file1 file2 file3" | xargs rm
# equivalent to: rm file1 file2 file3

Safe Filename Handling

By default, xargs splits its input on whitespace and also interprets quotes and backslashes, so it breaks on filenames that contain spaces, quotes, or newlines. Use null-delimited input instead:

find . -name "*.txt" -print0 | xargs -0 rm
find . -name "*.log" -print0 | xargs -0 grep "error"

The -print0 / -0 pair is the standard pattern for safely processing arbitrary filenames.
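
To check what the receiving command actually sees, this sketch pipes one file name containing a space through xargs both ways and has a small sh -c helper report its argument count. The file name is hypothetical, and the sketch assumes the scratch directory path itself contains no whitespace.

```shell
demo=$(mktemp -d)
touch "$demo/my report.txt"

# Default splitting: the one name arrives as two arguments.
split_args=$(find "$demo" -name '*.txt' | xargs sh -c 'echo $#' _)

# Null-delimited: the name arrives intact as a single argument.
safe_args=$(find "$demo" -name '*.txt' -print0 | xargs -0 sh -c 'echo $#' _)

printf 'default: %s argument(s), with -0: %s argument(s)\n' "$split_args" "$safe_args"

rm -rf "$demo"
```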

Placeholder Substitution

Use -I {} to control where arguments are placed:

find . -name "*.bak" -print0 | xargs -0 -I {} mv {} /tmp/backups/

With -I, xargs runs the command once per input line (not batched).
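
A harmless way to preview what -I will run is to put echo in front of the command. This sketch feeds three made-up names through xargs and counts the resulting command lines. Nothing is moved.

```shell
# Three hypothetical input names, one per line.
commands=$(printf '%s\n' one.bak two.bak three.bak |
  xargs -I {} echo "mv {} /tmp/backups/")

# One generated command per input line.
runs=$(printf '%s\n' "$commands" | wc -l)
first=$(printf '%s\n' "$commands" | head -n 1)

printf '%s\n' "$commands"
```

Each input line produces its own command, which is why -I is slower than batching for large inputs.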

Argument Batching

Control how many arguments are passed at once:

echo "1 2 3 4 5 6" | xargs -n 2 echo
# runs: echo 1 2
# runs: echo 3 4
# runs: echo 5 6

Parallel Execution

Use -P to run multiple processes in parallel:

find . -name "*.png" -print0 | xargs -0 -P 4 -I {} convert {} -resize 50% {}

This runs up to 4 convert processes at a time.

xargs -P with output-producing commands causes interleaved output

When using xargs -P for parallel execution, output from concurrent processes can mix together unpredictably. This is safe for silent operations like gzip or chmod, but produces garbled results for commands like grep or wc. For parallel output, consider GNU parallel which buffers output per job.

A few more things to be aware of with parallel xargs. Choosing a -P value: a good starting point is the number of CPU cores (nproc), but for I/O-bound tasks you can often go higher. Safety: parallel execution is safe when each invocation operates on independent files. It's risky when operations have side effects that interact - for example, parallel appends to the same log file can interleave and produce garbled output.
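
One way to sidestep interleaving without extra tools is to have each job write to its own output file and combine the results afterwards. This is only a sketch of that idea: the .log inputs and the .count suffix are made up, and nproc is assumed to come from GNU coreutils (with a fallback of 2 workers if it is missing).

```shell
demo=$(mktemp -d)
printf 'alpha\n' > "$demo/a.log"
printf 'beta\nbeta\n' > "$demo/b.log"
printf 'gamma\ngamma\ngamma\n' > "$demo/c.log"

# Start from the core count, as suggested above.
workers=$(nproc 2>/dev/null || echo 2)

# Each job counts the lines of one file and writes the result next to it,
# so no two jobs ever share an output stream.
find "$demo" -name '*.log' -print0 |
  xargs -0 -P "$workers" -I {} sh -c 'wc -l < "$1" > "$1.count"' _ {}

# Combine the per-job results once every job has finished.
total=$(cat "$demo"/*.count | awk '{ sum += $1 } END { print sum }')
printf 'total lines: %s\n' "$total"

rm -rf "$demo"
```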

Confirmation

Use -p to prompt before each execution:

find . -name "*.tmp" | xargs -p rm
# rm file1.tmp file2.tmp?...y

Practical Examples

# Delete files that contain a pattern (-Z makes GNU grep emit null-terminated names)
grep -rlZ "deprecated" src/ | xargs -0 rm

# Count lines in all Python files
find . -name "*.py" -print0 | xargs -0 wc -l

# Compress files in parallel
find . -name "*.log" -print0 | xargs -0 -P 4 gzip

# Search for a pattern in files found by find
find . -name "*.conf" -print0 | xargs -0 grep -l "listen"

# Rename files (find instead of parsing ls, so unusual names survive)
find . -maxdepth 1 -name "*.jpeg" -print0 | xargs -0 -I {} bash -c 'mv "$1" "${1%.jpeg}.jpg"' _ {}

find -exec vs xargs

Both can run commands on found files. The differences:

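| Feature | find -exec {} \; | find -exec {} + | find \| xargs |
|---------|------------------|-----------------|---------------|
| Process per file | Yes | No (batched) | No (batched) |
| Handles special filenames | Yes | Yes | Only with -print0 \| xargs -0 |
| Parallel execution | No | No | Yes (-P) |
| Speed | Slowest | Fast | Fast |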

For most tasks, find -exec {} + is the simplest safe option. Use xargs when you need parallel execution or more control over argument handling.

