Common Machine Learning Notation Standards

Data and Dimensions

m - Number of training examples
n - Number of features/dimensions
X - Input matrix/feature matrix (usually ∈ ℝ^(m×n))
x - Single input vector or feature
x^(i) - The i-th training example
x_j - The j-th feature
x_j^(i) - The j-th feature of the i-th example
y - Output/target/label vector
y^(i) - Label for the i-th training example
ŷ (y-hat) - Predicted output
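
As a small illustration of the indexing above, the sketch below stores a made-up dataset as a list of rows. The values and the helper names (example, feature) are assumptions chosen for illustration. Note that the notation is 1-based, while Python indexing is 0-based.

```python
# Hypothetical toy dataset: m = 3 examples, n = 2 features (values are made up).
X = [
    [1.0, 2.0],   # x^(1)
    [3.0, 4.0],   # x^(2)
    [5.0, 6.0],   # x^(3)
]
y = [0, 1, 1]     # y^(i) is the label for x^(i)

m = len(X)        # number of training examples
n = len(X[0])     # number of features


def example(i):
    """Return x^(i), using the 1-based index from the notation."""
    return X[i - 1]


def feature(i, j):
    """Return x_j^(i): the j-th feature of the i-th example (both 1-based)."""
    return X[i - 1][j - 1]


print(m, n)            # 3 2
print(example(2))      # [3.0, 4.0]
print(feature(3, 1))   # 5.0
```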

Model Parameters

θ (theta) - Parameter vector (common in classical ML)
w or W - Weight matrix or vector
b - Bias term
β (beta) - Regression coefficients (statistics convention)
α (alpha) - Learning rate
λ (lambda) - Regularisation parameter
ε (epsilon) - Small value for numerical stability or error threshold

Functions and Operations

h(x) or h_θ(x) - Hypothesis function
f(x) - General function or true underlying function
g(x) - Often an activation function
J(θ) - Cost/objective function
L - Loss function (for a single example)
ℓ - Alternative loss notation
∇ (nabla) - Gradient operator
σ (sigma) - Sigmoid function or standard deviation
Σ - Summation
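
To show how h_θ(x), J(θ) and ∇ fit together, here is a minimal sketch that assumes a linear hypothesis and a squared-error cost. Both are choices made for illustration; the notation applies equally to other models and losses. The toy data is made up.

```python
def h(theta, x):
    """Linear hypothesis h_theta(x) = theta^T x (x carries a leading 1 for the intercept)."""
    return sum(t * x_j for t, x_j in zip(theta, x))


def J(theta, X, y):
    """Squared-error cost: J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(X)
    return sum((h(theta, x_i) - y_i) ** 2 for x_i, y_i in zip(X, y)) / (2 * m)


def grad_J(theta, X, y):
    """Gradient of J: (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x^(i)."""
    m = len(X)
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        err = h(theta, x_i) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += err * x_ij / m
    return grad


# Made-up data generated by y = 1 + 2x; the first column is the constant 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]

print(J([1.0, 2.0], X, y))   # 0.0 at the parameters that generated the data
print(grad_J([0.0, 0.0], X, y))
```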

Neural Network Specific

L - Number of layers
n^[l] - Number of units in layer l
W^[l] - Weight matrix for layer l
b^[l] - Bias vector for layer l
a^[l] - Activations for layer l
z^[l] - Pre-activation values for layer l
δ (delta) - Error term in backpropagation
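
The layer notation can be read as a forward pass: z^[l] = W^[l] a^[l-1] + b^[l] and a^[l] = g(z^[l]), with a^[0] = x. The sketch below assumes a fully connected network with a sigmoid activation in every layer; the weights, the dict-by-layer storage and the function name forward are hypothetical.

```python
import math


def sigmoid(z):
    """Sigmoid activation, used here as g."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b, g=sigmoid):
    """Return a^[L] given input x and dicts W, b keyed by layer index l = 1..L."""
    a = x                      # a^[0] is the input vector
    L = len(W)                 # number of layers
    for l in range(1, L + 1):
        # z^[l] = W^[l] a^[l-1] + b^[l], computed one unit (row) at a time
        z = [sum(w * a_prev for w, a_prev in zip(row, a)) + b_i
             for row, b_i in zip(W[l], b[l])]
        a = [g(z_i) for z_i in z]   # a^[l] = g(z^[l])
    return a


# Made-up network: n^[0] = 2 inputs, n^[1] = 2 hidden units, n^[2] = 1 output.
W = {1: [[1.0, -1.0], [-1.0, 1.0]], 2: [[1.0, 1.0]]}
b = {1: [0.0, 0.0], 2: [-1.0]}

output = forward([1.0, 1.0], W, b)
print(output)   # [0.5]
```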

Probability and Statistics

P(x) - Probability of x
p(x) - Probability density/mass function
𝔼[X] - Expected value of X
μ (mu) - Mean
σ² - Variance
Σ - Covariance matrix (context-dependent)
𝒩(μ, σ²) - Normal distribution
~ - “Distributed as” (e.g., X ~ 𝒩(0,1))
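
A short sketch of X ~ 𝒩(μ, σ²) in practice, using only Python's standard library. The values of μ and σ, the seed and the sample size are arbitrary choices for illustration. One detail worth noting: the notation is parameterised by the variance σ², whereas random.gauss takes the standard deviation σ.

```python
import random
import statistics

mu, sigma = 2.0, 3.0           # so sigma^2 = 9.0
rng = random.Random(0)         # fixed seed so the run is repeatable

# Draw samples of X ~ N(mu, sigma^2); gauss() expects sigma, not sigma^2.
samples = [rng.gauss(mu, sigma) for _ in range(20_000)]

sample_mean = statistics.fmean(samples)      # estimate of E[X] = mu
sample_var = statistics.pvariance(samples)   # estimate of sigma^2

print(sample_mean, sample_var)  # close to 2.0 and 9.0, not exact
```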

Optimization

t - Time step or iteration number
η (eta) - Learning rate (alternative to α)
ρ (rho) - Momentum coefficient or decay rate (e.g., of the moving averages in RMSProp and Adadelta)
γ (gamma) - Discount factor (reinforcement learning)
∂ - Partial derivative
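
These symbols typically meet in the gradient descent update θ_(t+1) = θ_t - η ∂J/∂θ. The sketch below applies it to a hypothetical one-dimensional objective J(θ) = (θ - 3)²; the objective, the learning rate and the iteration count are assumptions for illustration.

```python
def J(theta):
    """Hypothetical objective with its minimum at theta = 3."""
    return (theta - 3.0) ** 2


def dJ(theta):
    """Derivative dJ/dtheta of the objective above."""
    return 2.0 * (theta - 3.0)


eta = 0.1      # learning rate
theta = 0.0    # initial parameter value

for t in range(100):                  # t indexes the iteration
    theta = theta - eta * dJ(theta)   # theta_(t+1) = theta_t - eta * dJ/dtheta

print(theta)   # approaches 3.0
```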

Evaluation and Splits

X_train, X_test, X_val - Training, test, validation sets
k - Number of clusters (k-means) or folds (k-fold CV)
K - Number of classes in classification
TP, FP, TN, FN - True/False Positives/Negatives
ACC - Accuracy
MSE - Mean Squared Error
RMSE - Root Mean Squared Error
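
For reference, the metrics above can be computed as follows. This is a minimal sketch assuming binary labels in {0, 1} with 1 as the positive class; the function names and sample values are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    TP = sum(1 for t, p in pairs if t == 1 and p == 1)
    FP = sum(1 for t, p in pairs if t == 0 and p == 1)
    TN = sum(1 for t, p in pairs if t == 0 and p == 0)
    FN = sum(1 for t, p in pairs if t == 1 and p == 0)
    return TP, FP, TN, FN


def accuracy(TP, FP, TN, FN):
    """ACC = (TP + TN) / (TP + FP + TN + FN)."""
    return (TP + TN) / (TP + FP + TN + FN)


def mse(y, y_hat):
    """MSE = (1/m) * sum_i (y_hat^(i) - y^(i))^2."""
    return sum((yh - yi) ** 2 for yi, yh in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """RMSE = sqrt(MSE)."""
    return math.sqrt(mse(y, y_hat))


counts = confusion_counts([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(counts)                                   # (2, 1, 2, 1)
print(accuracy(*counts))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```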

Matrix Operations

X^T - Transpose of X
X^{-1} - Inverse of X
⊙ - Element-wise multiplication (Hadamard product)
· or ⟨·,·⟩ - Dot (inner) product
||x|| - Norm of x (often L2 unless specified)
I - Identity matrix
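
A brief sketch of three of these operations on plain Python lists. The function names are hypothetical, and the norm shown is the L2 norm.

```python
import math


def hadamard(x, y):
    """Element-wise product x ⊙ y."""
    return [a * b for a, b in zip(x, y)]


def dot(x, y):
    """Dot product x · y, the sum of the element-wise products."""
    return sum(a * b for a, b in zip(x, y))


def l2_norm(x):
    """L2 norm ||x|| = sqrt(x · x)."""
    return math.sqrt(dot(x, x))


def transpose(X):
    """Transpose X^T of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


print(hadamard([1, 2, 3], [4, 5, 6]))   # [4, 10, 18]
print(dot([1, 2, 3], [4, 5, 6]))        # 32
print(l2_norm([3.0, 4.0]))              # 5.0
print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```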

Special Notation

1{condition} - Indicator function (1 if true, 0 if false)
argmax - Argument that maximizes
argmin - Argument that minimizes
log - Natural logarithm (unless a base is specified)
exp - Exponential function
𝟙 - Vector of ones
𝟎 - Vector of zeros
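
The indicator and argmax/argmin notation translate directly into code. In this sketch the function names and values are hypothetical, and the returned indices are 0-based, unlike the 1-based convention common in written notation.

```python
def indicator(condition):
    """1{condition}: 1 if the condition is true, 0 otherwise."""
    return 1 if condition else 0


def argmax(values):
    """Index at which values is largest (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def argmin(values):
    """Index at which values is smallest (first one on ties)."""
    return min(range(len(values)), key=lambda i: values[i])


scores = [0.1, 0.7, 0.2]      # made-up class scores
print(argmax(scores))         # 1
print(argmin(scores))         # 0

# Fraction of correct predictions written as a mean of indicators 1{y_hat = y}.
y = [1, 0, 1, 1]
y_hat = [1, 1, 1, 0]
print(sum(indicator(p == t) for p, t in zip(y_hat, y)) / len(y))   # 0.5
```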

Indexing Conventions

i - Typically indexes training examples (1 to m)
j - Typically indexes features (1 to n)
k - Typically indexes output classes
l - Typically indexes layers in neural networks
t - Typically indexes time steps

Set Notation

𝒟 - Dataset
ℝ - Real numbers
ℝ^n - n-dimensional real space
∈ - “Element of”
⊂ - “Subset of”
