awk - Pattern Scanning and Processing Language
awk is a powerful text processing tool for working with structured data, especially column-based formats like CSV, TSV, logs, and tabular output.
Basic Syntax
awk 'pattern { action }' file
awk '{ action }' file # No pattern = all lines
awk 'pattern' file # No action = print matching lines
Built-in Variables
Field Variables
$0 # Entire line
$1 # First field
$2 # Second field
$NF # Last field
$(NF-1) # Second to last field
Special Variables
NR # Current line number (total across all files)
NF # Number of fields in current line
FNR # Line number in current file
FS # Field separator (default: whitespace)
OFS # Output field separator (default: space)
RS # Record separator (default: newline)
ORS # Output record separator (default: newline)
FILENAME # Current filename
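A quick sketch tying several of these together (the file name is a placeholder):
# Print each line's number, its field count, and its last field
awk '{ print "line " NR ": " NF " fields, last = " $NF }' file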
Print and Fields
Basic Printing
# Print entire line
awk '{ print }' file
awk '{ print $0 }' file
# Print specific fields
awk '{ print $1 }' file # First field
awk '{ print $1, $3 }' file # First and third fields
awk '{ print $1 "\t" $3 }' file # With tab separator
# Print with custom separator
awk '{ print $1, $2 }' OFS=", " file
Field Separators
# Comma-separated (CSV)
awk -F',' '{ print $1, $3 }' file.csv
# Tab-separated
awk -F'\t' '{ print $1 }' file.tsv
# Multiple character separator
awk -F': ' '{ print $2 }' file
# Regular expression separator
awk -F'[,:]' '{ print $1 }' file
# Set both input and output separator
awk -F',' '{ print $1, $2 }' OFS='|' file
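Separators can also be set inside a BEGIN block instead of on the command line; this sketch is equivalent to the -F/OFS form above:
# Read CSV, write pipe-separated output
awk 'BEGIN { FS = ","; OFS = "|" } { print $1, $2 }' file.csv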
Pattern Matching
Regular Expression Patterns
# Lines matching pattern
awk '/pattern/ { print }' file
awk '/ERROR/' file # Shorthand
# Specific field matches
awk '$1 == "error"' file # First field equals "error"
awk '$3 ~ /pattern/' file # Third field contains pattern
awk '$2 !~ /pattern/' file # Second field doesn't contain pattern
Comparison Operators
awk '$1 == "value"' file # Equal
awk '$1 != "value"' file # Not equal
awk '$3 > 100' file # Greater than
awk '$3 >= 100' file # Greater or equal
awk '$3 < 100' file # Less than
awk '$3 <= 100' file # Less or equal
Logical Operators
awk '$1 == "error" && $3 > 100' file # AND
awk '$1 == "error" || $1 == "warn"' file # OR
awk '!($1 == "error")' file # NOT
Range Patterns
# Lines between two patterns
awk '/START/,/END/' file
# Lines 10 to 20
awk 'NR==10,NR==20' file
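Range patterns include the boundary lines themselves; a flag variable is one way to exclude them (a sketch assuming literal START/END markers):
# Print lines strictly between START and END
awk '/START/ { inblock = 1; next } /END/ { inblock = 0 } inblock' file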
BEGIN and END Blocks
# Execute before processing
awk 'BEGIN { print "Header" } { print $1 }' file
# Execute after processing
awk '{ sum += $1 } END { print sum }' file
# Both
awk 'BEGIN { print "Start" } { print } END { print "Done" }' file
Common Patterns
Print Specific Lines
# First line
awk 'NR==1' file
# Last line
awk 'END { print }' file
# Lines 5-10
awk 'NR>=5 && NR<=10' file
# Every other line (even-numbered lines)
awk 'NR % 2 == 0' file
# Skip first line (header)
awk 'NR > 1' file
Column Operations
Sum Column
awk '{ sum += $3 } END { print sum }' file
Average
awk '{ sum += $3; count++ } END { print sum/count }' file
Max/Min
# Max (seed from the first record so negative values work)
awk 'NR == 1 { max = $3 } $3 > max { max = $3 } END { print max }' file
# Min (same idea; avoids an arbitrary sentinel like 999999)
awk 'NR == 1 { min = $3 } $3 < min { min = $3 } END { print min }' file
Count
# Count lines
awk 'END { print NR }' file
# Count matching lines
awk '/pattern/ { count++ } END { print count }' file
# Count unique values (length() on an array is a gawk extension)
awk '{ a[$1]++ } END { print length(a) }' file
Formatting Output
Aligned Columns
awk '{ printf "%-10s %-20s %5d\n", $1, $2, $3 }' file
printf format:
- %-10s - Left-aligned string, width 10
- %20s - Right-aligned string, width 20
- %5d - Integer, width 5
- %.2f - Float with 2 decimal places
Add Line Numbers
awk '{ print NR, $0 }' file
Add Custom Formatting
awk '{ print "Name: " $1 ", Age: " $2 }' file
Arrays and Associative Arrays
Basic Arrays
# Count occurrences
awk '{ count[$1]++ } END { for (word in count) print word, count[word] }' file
# Deduplicate: print only the first occurrence of each value in $1
awk '!seen[$1]++' file
# Group by first field, sum second field
awk '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' file
Multi-dimensional Indexing
# Use SUBSEP (default: \034)
awk '{ count[$1,$2]++ } END { for (key in count) print key, count[key] }' file
# Or create custom separator
awk '{ count[$1 "-" $2]++ } END { for (key in count) print key, count[key] }' file
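To recover the original indices from a SUBSEP key, split the key back apart:
# Split the combined key on SUBSEP and print the parts separately
awk '{ count[$1,$2]++ } END { for (key in count) { split(key, p, SUBSEP); print p[1], p[2], count[key] } }' file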
String Functions
length($1) # Length of field
substr($1, start, len) # Substring
tolower($1) # Convert to lowercase
toupper($1) # Convert to uppercase
split($0, arr, ",") # Split into array
gsub(/old/, "new", $1) # Global substitute in field
sub(/old/, "new", $1) # Substitute first match in field
index($1, "text") # Position of substring (0 if not found)
match($1, /regex/) # Position of regex match (0 if none); sets RSTART and RLENGTH
Examples
# Convert first field to uppercase
awk '{ $1 = toupper($1); print }' file
# Extract substring
awk '{ print substr($1, 1, 3) }' file
# Replace in field
awk '{ gsub(/old/, "new", $1); print }' file
# Print if field contains substring
awk 'index($1, "text") > 0' file
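split() is useful when a single field itself holds a sub-delimited list; a sketch assuming comma-separated values inside $2:
# Print each comma-separated piece of $2 on its own line
awk '{ n = split($2, parts, ","); for (i = 1; i <= n; i++) print parts[i] }' file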
Math Functions
int(x) # Integer part
sqrt(x) # Square root
exp(x) # Exponential
log(x) # Natural logarithm
sin(x) # Sine
cos(x) # Cosine
atan2(y, x) # Arctangent
rand() # Random number in [0,1)
srand(x) # Seed random (no argument: seed from the time of day)
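Since rand() returns a value in [0,1), a random integer in 1-10 is usually built like this:
awk 'BEGIN { srand(); print int(rand() * 10) + 1 }'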
Conditional Statements
if-else
awk '{ if ($3 > 100) print "High"; else print "Low" }' file
# Multi-line
awk '{
  if ($3 > 100)
    print $1, "High"
  else if ($3 > 50)
    print $1, "Medium"
  else
    print $1, "Low"
}' file
Ternary Operator
awk '{ print ($3 > 100) ? "High" : "Low" }' file
Loops
for Loop
# Iterate over array
awk '{ for (i = 1; i <= NF; i++) print $i }' file
# With counter
awk 'BEGIN { for (i = 1; i <= 10; i++) print i }'
while Loop
awk '{ i = 1; while (i <= NF) { print $i; i++ } }' file
Practical Examples
CSV Processing
Parse CSV
awk -F',' '{ print $1, $3 }' file.csv
Add header
awk 'BEGIN { print "Name,Age,City" } { print }' file.csv
Filter rows
awk -F',' '$3 > 25' file.csv
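Note that -F',' splits inside quoted fields. GNU awk's FPAT variable describes what a field looks like instead of what separates fields; this pattern for simple quoted CSV is the one given in the gawk manual:
# gawk only: a field is either a bare token or a "quoted string"
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $1, $3 }' file.csv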
Log Analysis
Count by status code
awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' access.log
Filter by time range
# Lexicographic string comparison on the timestamp field; only valid while the format sorts chronologically
awk '$4 >= "[01/Jan/2024" && $4 <= "[31/Jan/2024"' access.log
Sum response times
awk '{ sum += $10; count++ } END { print "Avg:", sum/count }' access.log
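Counting in awk and sorting outside it is a common combination; a sketch assuming the client IP is the first field of a combined-format log:
# Top client IPs by request count
awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' access.log | sort -rn | head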
Data Transformation
Swap columns
awk '{ print $2, $1, $3 }' file
Remove duplicates
awk '!seen[$0]++' file
Transpose data
# Assumes every row has the same number of fields
awk '{ for (i = 1; i <= NF; i++) a[i,NR] = $i }
END {
  for (i = 1; i <= NF; i++) {
    for (j = 1; j <= NR; j++)
      printf "%s ", a[i,j]
    print ""
  }
}' file
JSON-like Output
# Print a separating comma before every entry except the first
awk 'BEGIN { print "[" }
NR > 1 { print "," }
{ printf "  {\"name\": \"%s\", \"age\": %d}", $1, $2 }
END { print "\n]" }' file
Statistical Analysis
Calculate standard deviation (population form)
awk '{
sum += $1
sumsq += $1 * $1
}
END {
mean = sum / NR
variance = (sumsq / NR) - (mean * mean)
print "Mean:", mean
print "StdDev:", sqrt(variance)
}' file
Working with Multiple Files
# Process multiple files
awk '{ print FILENAME, $0 }' file1 file2
# Different actions per file
awk 'FNR==1 { print "File:", FILENAME } { print }' file1 file2
# Compare files
awk 'NR==FNR { a[$1]=$2; next } { print $1, a[$1], $2 }' file1 file2
Pattern explained:
- NR==FNR - True only while reading the first file
- next - Skip to the next record
- The second block therefore only processes the second file
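The same NR==FNR idiom gives a set-difference check; this sketch prints lines of file2 whose first field never appears in file1:
awk 'NR==FNR { seen[$1]; next } !($1 in seen)' file1 file2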
Debugging
# Print line number and content
awk '{ print NR, $0 }' file
# Print number of fields
awk '{ print NF, $0 }' file
# Debug variables
awk '{ print "NR=" NR, "NF=" NF, "Line:", $0 }' file
Command Line Options
-F # Field separator
-v var=value # Set variable
-f script.awk # Read awk script from file
-W version # Show version (implementation-specific; GNU awk also accepts --version)
Using Variables
awk -v threshold=100 '$3 > threshold' file
awk -v name="John" '$1 == name' file
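-v is also the safe way to hand shell variables to awk, rather than splicing them into the quoted program:
threshold=100
awk -v t="$threshold" '$3 > t' file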
Script Files
# script.awk
BEGIN { FS = "," }
{
if ($3 > 100)
print $1, "High"
}
# Run it
awk -f script.awk file.csv
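A script file can also be made directly executable with a shebang (the path assumes awk lives at /usr/bin/awk):
#!/usr/bin/awk -f
BEGIN { FS = "," }
{ if ($3 > 100) print $1, "High" }
# Then: chmod +x script.awk && ./script.awk file.csv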
Integration with Other Tools
With grep
grep "ERROR" log.txt | awk '{ print $1, $4 }'
With sort
awk '{ print $2, $1 }' file | sort -n
With uniq
awk '{ print $1 }' file | sort | uniq -c
Pipe to awk
ps aux | awk 'NR > 1 { print $1, $11 }'
ls -l | awk '{ sum += $5 } END { print sum }'
Tips and Best Practices
- Use single quotes around the program to avoid shell interpretation: awk '...'
- Test patterns first before adding actions
- Break complex scripts into multiple lines for readability
- Use BEGIN to initialize variables and set separators
- Use END for final calculations and summaries
- Remember NF is the number of fields; use $NF for the last field
- Arrays don't need declaration; they're created on first use
- Field assignment updates $0: $1 = "new"; print changes the whole line
Common Pitfalls
String vs Number Comparison
# String comparison
awk '$1 == "100"' file
# Numeric comparison (unquoted number)
awk '$1 == 100' file
# Adding 0 forces numeric context explicitly
awk '$1 + 0 == 100' file
Empty Fields
# Check for empty field
awk '$3 == ""' file
awk 'length($3) == 0' file
Modifying Fields
# This updates $0
awk '{ $1 = toupper($1); print }' file
# OFS is used when reconstructing $0
awk 'BEGIN { OFS = "," } { $1 = $1; print }' file
Resources
- GNU awk manual: https://www.gnu.org/software/gawk/manual/
- awk tutorial: https://www.grymoire.com/Unix/Awk.html
- One-liners: https://catonmat.net/awk-one-liners-explained-part-one