Multiple Linear Regression

Introduction

Multiple Linear Regression is a fundamental statistical and machine learning technique used to model the relationship between multiple independent variables (features) and a single dependent variable (target). Unlike simple linear regression, which uses only one predictor, multiple linear regression allows us to understand how several factors simultaneously influence an outcome. This makes it invaluable for real-world applications where phenomena are typically influenced by multiple variables—from predicting house prices based on size, location, and age, to estimating sales based on advertising spend across different channels.

Mathematical Representation

There are two equivalent ways to express the multiple linear regression model:

Scalar Form

f_{w,b}(x) = w_1x_1 + w_2x_2 + \ldots + w_nx_n + b

In this representation:

  • Each input feature (x₁, x₂, …, xₙ) represents a different predictor variable
  • Each weight (w₁, w₂, …, wₙ) represents the strength and direction of that feature’s influence on the prediction
  • The bias term b (also called the intercept) represents the baseline prediction when all features equal zero
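To make the scalar form concrete, here is a minimal Python sketch that computes a single prediction with an explicit loop. The weights, bias, and feature values are hypothetical placeholders chosen for illustration, not values from any real dataset.

```python
def predict_scalar(x, w, b):
    """Compute f_{w,b}(x) = w_1*x_1 + ... + w_n*x_n + b with an explicit loop."""
    total = b  # start from the bias (intercept)
    for w_j, x_j in zip(w, x):
        total += w_j * x_j  # add each feature's weighted contribution
    return total

w = [0.39, 18.75, -2.13]   # hypothetical learned weights
b = 80.0                   # hypothetical bias term
x = [1200, 3, 40]          # one example's feature values
print(predict_scalar(x, w, b))
```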

Vector Form

The same model can be written more compactly using vector notation:

f_{w,b}(\vec{x}) = \vec{w} \cdot \vec{x} + b

Understanding the Arrow Notation: The arrows above w and x (\vec{w} and \vec{x}) indicate that these are vectors rather than scalar values. In mathematical notation, the arrow distinguishes:

  • \vec{w} = [w_1, w_2, w_3, \ldots, w_n] - a vector containing all the model weights
  • \vec{x} = [x_1, x_2, x_3, \ldots, x_n] - a vector containing all the input features

The dot product (\vec{w} \cdot \vec{x}) computes the sum of element-wise multiplications: w_1x_1 + w_2x_2 + w_3x_3 + \ldots + w_nx_n, providing a compact way to express the same computation.
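A short NumPy sketch (reusing the same hypothetical numbers as above) shows that the dot product is exactly the sum of element-wise products:

```python
import numpy as np

w = np.array([0.39, 18.75, -2.13])   # hypothetical weight vector
x = np.array([1200.0, 3.0, 40.0])    # hypothetical feature vector
b = 80.0

# Element-wise multiply then sum, spelled out...
manual = np.sum(w * x) + b
# ...matches the dot product used in the vector form.
vectorized = np.dot(w, x) + b

assert np.isclose(manual, vectorized)
print(vectorized)
```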

Model Parameters and Interpretation

The notation clearly identifies two types of components:

  1. Parameters of the model (\vec{w} and b): These are the values learned during the training process through optimization algorithms like gradient descent. They represent the model’s “knowledge” about how features relate to the target (see the fitted example after this list).

  2. Input features (\vec{x}): These are the observed variables for which we want to make predictions. In practice, these could be measurements, counts, or any quantifiable characteristics.
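The distinction between learned parameters and given inputs is easy to see in a small fitting example. The sketch below uses scikit-learn's LinearRegression on a tiny synthetic dataset; note that this estimator fits by ordinary least squares rather than gradient descent, but the roles of \vec{w}, b, and \vec{x} are the same.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny synthetic dataset: 4 examples, 2 features each (values are made up).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([8.0, 7.0, 18.0, 17.0])

model = LinearRegression().fit(X, y)
print(model.coef_)                  # learned weight vector w
print(model.intercept_)             # learned bias b
print(model.predict([[5.0, 5.0]]))  # prediction for new input features x
```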

Practical Significance

The vector notation isn’t just mathematical elegance—it enables efficient computation through:

  • Matrix operations that can process multiple data points simultaneously
  • Optimized linear algebra libraries (like BLAS, LAPACK)
  • GPU acceleration for large-scale datasets
  • Cleaner, more maintainable code implementation
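For example, stacking the examples as rows of a matrix X lets a single matrix-vector product predict for every example at once. The sketch below uses synthetic data and a hypothetical weight vector purely to illustrate the idea.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))           # 10,000 examples, 5 features each (synthetic)
w = np.array([0.5, -1.2, 3.0, 0.0, 2.1])   # hypothetical weight vector
b = 0.7

# One matrix-vector product predicts for every row of X at once,
# instead of looping over examples in Python.
predictions = X @ w + b
print(predictions.shape)  # (10000,)
```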

This mathematical framework forms the foundation for understanding not just multiple linear regression, but also more complex models like neural networks, where similar linear transformations are combined with non-linear activation functions to model complex patterns in data.
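As a glimpse of that connection, here is a toy sketch (with random, untrained weights) of a single fully connected neural-network layer: the same linear transformation as above, followed by a ReLU non-linearity.

```python
import numpy as np

def dense_layer(X, W, b):
    """A single fully connected layer: linear transform followed by a ReLU."""
    return np.maximum(0.0, X @ W + b)  # ReLU introduces the non-linearity

X = np.random.default_rng(1).normal(size=(4, 3))  # 4 examples, 3 features
W = np.random.default_rng(2).normal(size=(3, 2))  # weights mapping 3 features to 2 units
b = np.zeros(2)
print(dense_layer(X, W, b).shape)  # (4, 2)
```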