awk - Pattern Scanning and Processing Language

awk is a powerful text processing tool for working with structured data, especially column-based formats like CSV, TSV, logs, and tabular output.

Basic Syntax

awk 'pattern { action }' file
awk '{ action }' file              # No pattern = all lines
awk 'pattern' file                 # No action = print matching lines

Built-in Variables

Field Variables

$0          # Entire line
$1          # First field
$2          # Second field
$NF         # Last field
$(NF-1)     # Second to last field

Special Variables

NR          # Current line number (total across all files)
NF          # Number of fields in current line
FNR         # Line number in current file
FS          # Field separator (default: whitespace)
OFS         # Output field separator (default: space)
RS          # Record separator (default: newline)
ORS         # Output record separator (default: newline)
FILENAME    # Current filename
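
Printing several of these together makes their behavior visible, especially across multiple files (file1 and file2 are placeholders):

# FNR resets for each file; NR keeps counting
awk '{ print FILENAME, "FNR=" FNR, "NR=" NR, "NF=" NF }' file1 file2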

Basic Printing

# Print entire line
awk '{ print }' file
awk '{ print $0 }' file

# Print specific fields
awk '{ print $1 }' file            # First field
awk '{ print $1, $3 }' file        # First and third fields
awk '{ print $1 "\t" $3 }' file    # With tab separator

# Print with custom separator
awk '{ print $1, $2 }' OFS=", " file

Field Separators

# Comma-separated (CSV)
awk -F',' '{ print $1, $3 }' file.csv

# Tab-separated
awk -F'\t' '{ print $1 }' file.tsv

# Multi-character separator (FS longer than one character is treated as a regex)
awk -F': ' '{ print $2 }' file

# Regular expression separator
awk -F'[,:]' '{ print $1 }' file

# Set both input and output separator
awk -F',' '{ print $1, $2 }' OFS='|' file

Pattern Matching

Regular Expression Patterns

# Lines matching pattern
awk '/pattern/ { print }' file
awk '/ERROR/' file                 # Shorthand

# Specific field matches
awk '$1 == "error"' file           # First field equals "error"
awk '$3 ~ /pattern/' file          # Third field contains pattern
awk '$2 !~ /pattern/' file         # Second field doesn't contain pattern

Comparison Operators

awk '$1 == "value"' file           # Equal
awk '$1 != "value"' file           # Not equal
awk '$3 > 100' file                # Greater than
awk '$3 >= 100' file               # Greater or equal
awk '$3 < 100' file                # Less than
awk '$3 <= 100' file               # Less or equal

Logical Operators

awk '$1 == "error" && $3 > 100' file    # AND
awk '$1 == "error" || $1 == "warn"' file # OR
awk '!($1 == "error")' file             # NOT

Range Patterns

# Lines between two patterns
awk '/START/,/END/' file

# Lines 10 to 20
awk 'NR==10,NR==20' file
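
Range patterns include the boundary lines themselves. To print only the lines between the markers, the usual idiom is a flag variable (a minimal sketch):

# Exclude the START and END lines from the output
awk '/END/ { flag = 0 } flag { print } /START/ { flag = 1 }' file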

BEGIN and END Blocks

# Execute before processing
awk 'BEGIN { print "Header" } { print $1 }' file

# Execute after processing
awk '{ sum += $1 } END { print sum }' file

# Both
awk 'BEGIN { print "Start" } { print } END { print "Done" }' file

Common Patterns

# First line
awk 'NR==1' file

# Last line (most awks keep $0 in END; the second form is fully portable)
awk 'END { print }' file
awk '{ last = $0 } END { print last }' file

# Lines 5-10
awk 'NR>=5 && NR<=10' file

# Even-numbered lines (use NR % 2 == 1 for odd)
awk 'NR % 2 == 0' file

# Skip first line (header)
awk 'NR > 1' file

Column Operations

Sum Column

awk '{ sum += $3 } END { print sum }' file

Average

awk '{ sum += $3; count++ } END { print sum/count }' file

Max/Min

# Max (seed from the first record so negative values are handled)
awk 'NR == 1 { max = $3 } $3 > max { max = $3 } END { print max }' file

# Min
awk 'NR == 1 { min = $3 } $3 < min { min = $3 } END { print min }' file

Count

# Count lines
awk 'END { print NR }' file

# Count matching lines
awk '/pattern/ { count++ } END { print count }' file

# Count unique values (length() on an array is a gawk extension)
awk '{ a[$1]++ } END { print length(a) }' file
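
Because length() on an array is a gawk extension, a portable POSIX version counts the keys in a loop:

awk '{ a[$1]++ } END { n = 0; for (k in a) n++; print n }' file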

Formatting Output

Aligned Columns

awk '{ printf "%-10s %-20s %5d\n", $1, $2, $3 }' file

printf format:

  • %-10s - Left-aligned string, width 10
  • %20s - Right-aligned string, width 20
  • %5d - Integer, width 5
  • %.2f - Float with 2 decimal places
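
A worked example with hypothetical input shows the widths in action:

echo "alice 3.14159" | awk '{ printf "%-10s %.2f\n", $1, $2 }'
# Output: "alice      3.14" (name padded to 10 columns, float rounded to 2 places)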

Add Line Numbers

awk '{ print NR, $0 }' file

Add Custom Formatting

awk '{ print "Name: " $1 ", Age: " $2 }' file

Arrays and Associative Arrays

Basic Arrays

# Count occurrences
awk '{ count[$1]++ } END { for (word in count) print word, count[word] }' file

# Keep only the first occurrence of each value in field 1
awk '!seen[$1]++' file

# Group by first field, sum second field
awk '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' file
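
A concrete illustration with hypothetical input (note that for-in iteration order over an array is unspecified):

# Input:        Output:
# apple 3       apple 8
# banana 2      banana 2
# apple 5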

Multi-dimensional Indexing

# Use SUBSEP (default: \034)
awk '{ count[$1,$2]++ } END { for (key in count) print key, count[key] }' file

# Or create custom separator
awk '{ count[$1 "-" $2]++ } END { for (key in count) print key, count[key] }' file

String Functions

length($1)              # Length of field
substr($1, start, len)  # Substring
tolower($1)             # Convert to lowercase
toupper($1)             # Convert to uppercase
split($0, arr, ",")     # Split into array
gsub(/old/, "new", $1)  # Global substitute in field
sub(/old/, "new", $1)   # First substitute in field
index($1, "text")       # Position of substring (0 if not found)
match($1, /regex/)      # Position of regex match; sets RSTART and RLENGTH (0 if none)

Examples

# Convert first field to uppercase
awk '{ $1 = toupper($1); print }' file

# Extract substring
awk '{ print substr($1, 1, 3) }' file

# Replace in field
awk '{ gsub(/old/, "new", $1); print }' file

# Print if field contains substring
awk 'index($1, "text") > 0' file
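
match() pairs naturally with RSTART and RLENGTH to extract the matched text, for example the first number on each line:

awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }' file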

Math Functions

int(x)          # Integer part
sqrt(x)         # Square root
exp(x)          # Exponential
log(x)          # Natural logarithm
sin(x)          # Sine
cos(x)          # Cosine
atan2(y, x)     # Arctangent
rand()          # Random number r, 0 <= r < 1
srand(x)        # Seed random (no argument: seed from the current time)
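
A typical combination is rand() with int() to generate random integers, seeding once in BEGIN:

# Print a random integer from 1 to 6
awk 'BEGIN { srand(); print int(rand() * 6) + 1 }'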

Conditional Statements

if-else

awk '{ if ($3 > 100) print "High"; else print "Low" }' file

# Multi-line
awk '{
    if ($3 > 100)
        print $1, "High"
    else if ($3 > 50)
        print $1, "Medium"
    else
        print $1, "Low"
}' file

Ternary Operator

awk '{ print ($3 > 100 ? "High" : "Low") }' file   # parenthesize the whole expression

Loops

for Loop

# Iterate over array
awk '{ for (i = 1; i <= NF; i++) print $i }' file

# With counter
awk 'BEGIN { for (i = 1; i <= 10; i++) print i }'

while Loop

awk '{ i = 1; while (i <= NF) { print $i; i++ } }' file
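
Field loops also make reordering easy, e.g. printing each line's fields in reverse:

awk '{ for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? OFS : ORS) }' file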

Practical Examples

CSV Processing

Parse CSV

awk -F',' '{ print $1, $3 }' file.csv

Add header

awk 'BEGIN { print "Name,Age,City" } { print }' file.csv

Filter rows

awk -F',' '$3 > 25' file.csv
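
Note that -F',' splits on every comma, including commas inside quoted fields. gawk's FPAT variable (not in POSIX awk) defines fields by content instead:

# gawk only: a field is an unquoted run or a quoted string
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $1, $3 }' file.csv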

Log Analysis

Count by status code

awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' access.log
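
Piping the counts through sort ranks them (field $9 assumes the common combined log format):

awk '{ count[$9]++ } END { for (c in count) print count[c], c }' access.log | sort -rn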

Filter by time range

# Comparison is lexicographic, so this is only reliable within a single month/year
awk '$4 >= "[01/Jan/2024" && $4 <= "[31/Jan/2024"' access.log

Sum response times

awk '{ sum += $10; count++ } END { print "Avg:", sum/count }' access.log

Data Transformation

Swap columns

awk '{ print $2, $1, $3 }' file

Remove duplicates

awk '!seen[$0]++' file

Transpose data

# Assumes rectangular data (every row has the same number of fields)
awk '{ for (i = 1; i <= NF; i++) a[i,NR] = $i }
     END { for (i = 1; i <= NF; i++) {
         for (j = 1; j <= NR; j++)
             printf "%s ", a[i,j]
         print ""
     }
}' file

JSON-like Output

awk 'BEGIN { print "[" }
     { printf "  {\"name\": \"%s\", \"age\": %d}", $1, $2 }
     NR < total { print "," }
     END { print "\n]" }' file

Statistical Analysis

Calculate standard deviation

awk '{
    sum += $1
    sumsq += $1 * $1
}
END {
    mean = sum / NR
    variance = (sumsq / NR) - (mean * mean)    # population variance
    print "Mean:", mean
    print "StdDev:", sqrt(variance)
}' file

Working with Multiple Files

# Process multiple files
awk '{ print FILENAME, $0 }' file1 file2

# Different actions per file
awk 'FNR==1 { print "File:", FILENAME } { print }' file1 file2

# Compare files
awk 'NR==FNR { a[$1]=$2; next } { print $1, a[$1], $2 }' file1 file2

Pattern explained:

  • NR==FNR - True only for first file
  • next - Skip to next record
  • Second block processes second file
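
The same idiom filters one file by keys from another, e.g. printing only the lines of file2 whose first field appears in file1:

# Build a key set from file1, then test membership while reading file2
awk 'NR==FNR { keys[$1]; next } $1 in keys' file1 file2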

Debugging

# Print line number and content
awk '{ print NR, $0 }' file

# Print number of fields
awk '{ print NF, $0 }' file

# Debug variables
awk '{ print "NR=" NR, "NF=" NF, "Line:", $0 }' file

Command Line Options

-F fs         # Field separator
-v var=value  # Set variable
-f script.awk # Read awk script from file
-W version    # Show version (gawk also accepts --version)

Using Variables

awk -v threshold=100 '$3 > threshold' file
awk -v name="John" '$1 == name' file
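
-v is also the safe way to pass a shell variable into awk ($LIMIT is a hypothetical shell variable here):

awk -v limit="$LIMIT" '$3 > limit' file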

Script Files

# script.awk
BEGIN { FS = "," }
{
    if ($3 > 100)
        print $1, "High"
}

# Run it
awk -f script.awk file.csv

Integration with Other Tools

With grep

grep "ERROR" log.txt | awk '{ print $1, $4 }'

With sort

awk '{ print $2, $1 }' file | sort -n

With uniq

awk '{ print $1 }' file | sort | uniq -c

Pipe to awk

ps aux | awk 'NR > 1 { print $1, $11 }'
ls -l | awk '{ sum += $5 } END { print sum }'

Tips and Best Practices

  1. Use single quotes to avoid shell interpretation: awk '...'
  2. Test patterns first before adding actions
  3. Break complex scripts into multiple lines for readability
  4. Use BEGIN to initialize variables and set separators
  5. Use END for final calculations and summaries
  6. Remember NF is the number of fields; use $NF for the last field
  7. Arrays don't need declaration - they're created on first use
  8. Field assignment rebuilds $0: setting $1 = "new" and printing changes the whole line

Common Pitfalls

String vs Number Comparison

# String comparison (quoted constant forces string context)
awk '$1 == "100"' file

# Numeric comparison
awk '$1 == 100' file               # numeric if $1 looks like a number
awk '$1 + 0 == 100' file           # adding 0 always forces numeric context

Empty Fields

# Check for empty field
awk '$3 == ""' file
awk 'length($3) == 0' file
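
$3 is also empty when the line has fewer than three fields; NF distinguishes a missing field from a present-but-empty one:

awk -F',' 'NF >= 3 && $3 == ""' file   # field 3 exists but is empty
awk -F',' 'NF < 3' file                # field 3 is missing entirely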

Modifying Fields

# This updates $0
awk '{ $1 = toupper($1); print }' file

# OFS is used when reconstructing $0
awk 'BEGIN { OFS = "," } { $1 = $1; print }' file
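
The $1 = $1 trick is a common way to convert between delimiters, e.g. CSV to TSV (assuming no quoted commas):

awk 'BEGIN { FS = ","; OFS = "\t" } { $1 = $1; print }' file.csv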

Resources