Text Processing and One-Liners¶
Perl on the Command Line¶
Version: 1.1 Year: 2026
Copyright Notice¶
Copyright (c) 2025-2026 Ryan Thomas Robson / Robworks Software LLC. Licensed under CC BY-NC-ND 4.0. You may share this material for non-commercial purposes with attribution, but you may not distribute modified versions.
Perl was built for text processing. Before it was a web language, before CPAN existed, Perl was a tool for extracting information from files and generating reports. That heritage lives on in the command-line flags that turn Perl into a filter, a transformer, and a replacement for entire sed/awk pipelines - all in a single line. Every one-liner maps to a real script, and every script can be compressed into a one-liner. Understanding both directions makes you faster at solving text problems.
Command-Line Flags¶
The perl interpreter accepts flags that change how it reads input and executes code. These flags are the foundation of one-liner programming.
-e: Execute Code¶
The -e flag runs a string of Perl code directly from the command line:
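perl -e 'print "Hello, World!\n"'
perl -e 'print 2 ** 10, "\n"'             # quick arithmetic: prints 1024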
-n: Implicit Read Loop¶
The -n flag wraps your code in a while (<>) { ... } loop. Each line from STDIN or named files is read into $_ automatically:
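# Print only lines containing ERROR - a grep replacement
perl -ne 'print if /ERROR/' app.log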
The implicit loop reads every line. Your code runs once per line. If your code does not print anything, nothing is output - you are a filter that discards by default.
-p: Read Loop with Print¶
The -p flag works like -n but automatically prints $_ after each iteration. Your code transforms lines; the flag handles the output:
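# Replace every tab with a comma; -p prints each transformed line
perl -pe 's/\t/,/g' data.tsv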
The difference between -n and -p is simple: -n requires you to print explicitly, -p prints for you.
-l: Automatic Line Handling¶
The -l flag strips the trailing newline from each input line and adds a newline to each print output:
perl -ne 'print length($_)' file.txt # without -l: includes \n in count
perl -lne 'print length($_)' file.txt # with -l: accurate character count
-a and -F: Autosplit Mode¶
The -a flag splits each input line on whitespace into the @F array (zero-indexed). The -F flag changes the split delimiter:
perl -lane 'print $F[1]' data.txt # split on whitespace
perl -F: -lane 'print "$F[0] -> $F[5]"' /etc/passwd # split on colons
perl -F, -lane 'print $F[2]' data.csv # split on commas
-i: In-Place Editing¶
The -i flag edits files in place. With -i.bak, Perl saves the original as a backup before writing the modified version:
perl -i.bak -pe 's/old/new/g' config.txt # backup to config.txt.bak
perl -i -pe 's/old/new/g' config.txt # no backup (dangerous)
Always Use a Backup Extension
Running -i without a backup extension is irreversible. Always use -i.bak until you have verified the transformation is correct.
Flag Pipeline¶
The following diagram shows how common flags combine to process text:
flowchart LR
A[Input\nSTDIN or files] --> B{-l flag?}
B -->|Yes| C[chomp each line]
B -->|No| D[raw line with \\n]
C --> E{-a flag?}
D --> E
E -->|Yes| F["split into @F\n(using -F pattern)"]
E -->|No| G["$_ = current line"]
F --> G
G --> H[Your -e code runs]
H --> I{-p flag?}
I -->|Yes| J[print $_ automatically]
I -->|No| K[print only if your\ncode calls print]
J --> L{-i flag?}
K --> L
L -->|Yes| M[Write to file in place]
L -->|No| N[Write to STDOUT]
$_ and @F in One-Liners¶
Two variables dominate one-liner programming: $_ and @F.
$_ Is Your Line¶
With -n or -p, $_ holds the current line. Perl defaults to $_ for regex matching, print, chomp, length, and dozens of other functions:
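# All of these operate on $_ implicitly
perl -ne 'print if /error/' app.log       # the regex matches against $_
perl -lne 'print length' file.txt         # length() defaults to $_
perl -lne 'print uc' file.txt             # so does uc()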
@F Is Your Row¶
With -a, @F holds the fields of the current line after splitting. It uses zero-based indexing: $F[0] is the first field, $F[-1] is the last:
# /etc/passwd: user:pass:uid:gid:gecos:home:shell
perl -F: -lane 'print "$F[0] uses $F[-1]"' /etc/passwd
# Modify @F and reconstruct the line
perl -F: -lane '$F[6] = "/bin/bash"; print join(":", @F)' /etc/passwd
Field Processing¶
Field processing is where Perl one-liners replace awk. The -a and -F flags handle splitting; your code handles selection, transformation, and output.
Selecting and Filtering Fields¶
# Print columns 1 and 3 from whitespace-delimited data
perl -lane 'print "$F[0] $F[2]"'
# Print all fields except the first
perl -lane 'print join(" ", @F[1..$#F])'
# Print lines where the third field exceeds 100
perl -lane 'print if $F[2] > 100'
Transforming and Rearranging¶
# Multiply the second field by 1.1 (10% increase)
perl -lane '$F[1] *= 1.1; print join("\t", @F)'
# Uppercase the first field, leave the rest
perl -lane '$F[0] = uc($F[0]); print join(" ", @F)'
# Swap first and second columns
perl -lane 'print "$F[1] $F[0] @F[2..$#F]"'
In-Place Editing¶
The -i flag transforms Perl from a filter into an editor. Combined with -p, it reads a file, applies your transformation, and writes the result back.
Basic In-Place Edit¶
# Replace all instances of "localhost" with "0.0.0.0"
perl -i.bak -pe 's/localhost/0.0.0.0/g' nginx.conf
This creates nginx.conf.bak (the original) and writes the modified content to nginx.conf.
Multiple Files and Safe Workflow¶
The safe workflow: run without -i first to preview, then add -i.bak to apply, then diff to verify, then delete backups:
perl -pe 's/DEBUG/INFO/g' app.conf # preview
perl -i.bak -pe 's/DEBUG/INFO/g' app.conf # apply
diff app.conf app.conf.bak # verify
rm app.conf.bak # clean up
BEGIN and END Blocks¶
BEGIN and END blocks run before and after the main loop, respectively. In one-liners, they handle initialization and summary output.
BEGIN: Setup Before the Loop¶
# Print a header before processing
perl -lane 'BEGIN { print "Name\tScore" } print "$F[0]\t$F[1]"' grades.txt
BEGIN runs once, before the first line is read. Use it for headers, variable initialization, or loading modules.
END: Summary After the Loop¶
# Sum the second column and print the total
perl -lane '$sum += $F[1]; END { print "Total: $sum" }' sales.txt
# Count matching lines
perl -ne '$count++ if /ERROR/; END { print "Errors: $count\n" }' app.log
END runs once, after the last line has been processed. Use it for totals, averages, and summary reports.
Combining BEGIN and END¶
# Full report: header, data, footer
perl -lane 'BEGIN { print "USER\tSHELL" } print "$F[0]\t$F[-1]"; END { print "Total: $." }' /etc/passwd
The $. Variable in END Blocks
$. holds the current line number. In an END block, it holds the line number of the last line read - effectively the total line count.
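For example, $. on its own gives a wc -l equivalent:
# Count lines: in END, $. holds the total line count
perl -ne 'END { print "$. lines\n" }' file.txt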
Multi-Line Processing¶
Not every problem fits the line-by-line model. The -0 flag changes how Perl defines a "record":
# Paragraph mode (-00): records separated by blank lines
perl -00 -ne 'print if /keyword/' document.txt
# Slurp mode (-0777): read entire file at once
perl -0777 -ne 'print scalar(() = /error/gi), " errors\n"' app.log
# Multi-line substitution: remove C-style comments
perl -0777 -pe 's{/\*.*?\*/}{}gs' source.c
# Custom record separator
perl -ne 'BEGIN { $/ = "---\n" } chomp; print "RECORD: $_\n" if length' data.txt
In the comment-removal example, the /s modifier makes . match newlines, so .*? can span line boundaries.
Log Parsing Patterns¶
Log files are where Perl one-liners earn their keep. The combination of regex, field splitting, and aggregation handles most log analysis tasks without a dedicated tool.
Counting and Ranking¶
# Count HTTP status codes
perl -lane '$c{$F[8]}++; END { print "$_: $c{$_}" for sort keys %c }' access.log
# Top 10 IPs by request count
perl -lane '$ip{$F[0]}++; END { print "$ip{$_} $_" for (sort { $ip{$b} <=> $ip{$a} } keys %ip)[0..9] }' access.log
Timestamps and Aggregation¶
# Print the first and last timestamp from a log
perl -ne '
    if (/\[(\d{2}\/\w+\/\d{4}:\d{2}:\d{2}:\d{2})/) { $first //= $1; $last = $1 }
    END { print "First: $first\nLast: $last\n" }
' access.log
# Filter log lines between 09:00 and 10:00
perl -ne 'print if m{\[.*/\d{4}:09:\d{2}:\d{2}}' access.log
# Average response time (last field, in microseconds)
perl -lane '$sum += $F[-1]; $n++; END { printf "Avg: %.2f ms (%d reqs)\n", $sum/$n/1000, $n }' access.log
CSV and TSV Manipulation¶
For simple CSV (no embedded commas or quotes), Perl one-liners are fast and effective. For complex CSV with quoting, use the Text::CSV module instead.
Basic CSV Operations¶
# Extract the third column from a CSV
perl -F, -lane 'print $F[2]' data.csv
# Convert CSV to TSV
perl -F, -lane 'print join("\t", @F)' data.csv
# Convert TSV to CSV
perl -F'\t' -lane 'print join(",", @F)' data.tsv
# Skip the header line
perl -F, -lane 'print $F[2] if $. > 1' data.csv
Modifying Columns and Aggregating¶
# Add a row number as the first column
perl -F, -lane 'print join(",", $., @F)' data.csv
# Remove the second column (index 1)
perl -F, -lane 'splice(@F, 1, 1); print join(",", @F)' data.csv
# Print rows where column 3 exceeds a threshold
perl -F, -lane 'print if $F[2] > 1000' sales.csv
# Group by column 1, sum column 2
perl -F, -lane '$t{$F[0]} += $F[1]; END { print "$_: $t{$_}" for sort keys %t }' expenses.csv
CSV Edge Cases
Simple comma splitting fails when fields contain commas, newlines, or quotes. A field like "Smith, Jr." splits incorrectly with -F,. For production CSV parsing, use perl -MText::CSV -e '...' or write a proper script with Text::CSV.
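A minimal sketch of that approach (assuming Text::CSV is installed; data.csv is a placeholder filename): parse each line with the module, then index into the fields.
# Extract the third column, handling quoted fields with embedded commas
perl -MText::CSV -lne '
    BEGIN { $csv = Text::CSV->new({ binary => 1 }) }
    print +($csv->fields)[2] if $csv->parse($_)
' data.csv
Because parse() works line by line, fields containing embedded newlines still require the getline() interface on a filehandle - at that point, write a script.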
Building One-Liners Incrementally¶
Complex one-liners grow through small, testable steps: inspect the data, split fields, add the filter, aggregate results, format output.
Example: Finding the Busiest Hour¶
# Step 1: See the data
head -1 access.log
# 192.168.1.50 - - [15/Jan/2025:09:23:45 +0000] "GET /index.html HTTP/1.1" 200 2326
# Step 2: Extract the hour
perl -ne 'print "$1\n" if /\[.*?:(\d{2}):\d{2}:\d{2}/' access.log | head
# Step 3: Aggregate by hour
perl -ne '$h{$1}++ if /\[.*?:(\d{2}):\d{2}:\d{2}/; END { print "$_: $h{$_}\n" for sort keys %h }' access.log
# Step 4: Sort by count (most requests first)
perl -ne '$h{$1}++ if /\[.*?:(\d{2}):\d{2}:\d{2}/; END { printf "%s:00 - %d requests\n", $_, $h{$_} for sort { $h{$b} <=> $h{$a} } keys %h }' access.log
Each step is independently testable. If step 3 gives wrong numbers, you debug step 2's regex.
Converting Between One-Liners and Scripts¶
One-Liner to Script¶
This one-liner counts the total number of words in its input:
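perl -lne '$w += scalar(split); END { print $w }' file.txt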
The equivalent script:
#!/usr/bin/env perl
use strict;
use warnings;
my $w = 0;
while (<>) {
    chomp;                  # -l flag
    $w += scalar(split);    # -a not used, but split works on $_
}
print "$w\n";               # END block
The mapping:
| Flag | Script equivalent |
|---|---|
| -n | while (<>) { ... } |
| -p | while (<>) { ... } continue { print } |
| -l | chomp on input, "\n" on output |
| -a | @F = split at the start of the loop |
| -F: | @F = split /:/, $_ |
| -i.bak | Open input, rename, open output |
| BEGIN {} | Code before the loop |
| END {} | Code after the loop |
Script to One-Liner¶
Start with the loop body. Strip variable declarations. Use $_ defaults. Collapse to one line:
# Script version: while (<>) { chomp; my @f = split /,/; print join(",", @f), "\n" if $f[2] > 100 }
# One-liner:
perl -F, -lane 'print join(",", @F) if $F[2] > 100'
When to Convert
Keep it as a one-liner if it fits in a single readable line (roughly 80-120 characters). Convert to a script when it needs error handling, multiple data sources, or will be run by others.
Comparison with sed and awk¶
sed vs Perl¶
sed excels at simple substitutions. Perl handles everything sed does, plus complex logic:
| Task | sed | Perl |
|---|---|---|
| Substitution | sed 's/old/new/g' | perl -pe 's/old/new/g' |
| Delete lines | sed '/pattern/d' | perl -ne 'print unless /pattern/' |
| Line range | sed -n '5,10p' | perl -ne 'print if 5..10' |
| Conditional sub | Awkward | perl -pe 's/old/new/ if /context/' |
| Math on captures | Not possible | perl -pe 's/(\d+)/$1*2/ge' |
The /e modifier - evaluating the replacement as code - is where Perl surpasses sed entirely.
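For instance, a replacement can run code on what it matched (report.txt is a placeholder):
# Rewrite byte counts as kilobytes by computing the replacement
perl -pe 's{(\d+) bytes}{sprintf "%.1f KB", $1/1024}ge' report.txt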
awk vs Perl¶
awk is a field-oriented processor. Perl's -a flag provides the same model with full language access:
| Task | awk | Perl |
|---|---|---|
| Print field 2 | awk '{print $2}' | perl -lane 'print $F[1]' |
| Sum a column | awk '{s+=$3} END{print s}' | perl -lane '$s+=$F[2]; END{print $s}' |
| Field separator | awk -F: | perl -F: |
The key difference: awk fields start at 1 ($1 is the first field, $0 is the entire line), while Perl's @F starts at 0 ($F[0] is the first field, $_ is the entire line).
When to Use Which¶
Use sed for simple substitutions on one file. Use awk for quick column extraction. Use Perl when you need conditional logic, computed replacements, multiple operations in one pass, or code that will grow into a script.
Further Reading¶
- perlrun - complete reference for Perl command-line flags and environment variables
- perlvar - special variables including $_, @F, $., and $/
- Perl One-Liners Cookbook - Peteris Krumins' collection of practical one-liners with explanations
- Minimal Perl - Tim Maher's guide to Perl as a Unix command-line tool
- perlre - regex reference for the patterns used throughout one-liners
- Text::CSV documentation - robust CSV parsing for when simple splitting is not enough