Gaussian Distribution

The Gaussian distribution, also widely known as the normal distribution, is the most extensively studied probability distribution for continuous-valued random variables. It is characterised by its distinctive bell-shaped curve, which represents a symmetric distribution where most observations cluster around the central peak.

Fundamental Characteristics

The distribution is defined by two primary parameters:

  • Mean (μ): Describes the centre or average of the distribution.
  • Standard Deviation (σ): Describes the width, spread, or concentration of data points around the mean.

A key property of the Gaussian distribution is the “68-95-99.7” rule. This states that approximately 68% of the values fall within one standard deviation (1σ) of the mean, 95% within 2σ, and 99.7% within 3σ. Additionally, the distribution is normalised, meaning the total area under the probability density curve is always equal to 1.
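
This rule can be checked numerically. The short sketch below (a minimal example using scipy.stats.norm; the loop simply covers the three cases mentioned above) computes the probability mass within 1, 2, and 3 standard deviations of the mean.

```python
from scipy.stats import norm

# Probability mass within k standard deviations of the mean for a
# standard normal; the same fractions hold for any mu and sigma.
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sigma: {p:.4f}")
# Prints roughly 0.6827, 0.9545, 0.9973
```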

Mathematical Representation

For a single variable, the probability density function is expressed as: f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
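
As a sanity check, here is a minimal NumPy/SciPy sketch (the helper name gaussian_pdf and the parameter values are illustrative, not from the original text) that implements this formula, confirms it agrees with scipy.stats.norm.pdf, and verifies that the area under the curve is approximately 1.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Univariate Gaussian density, following the formula above."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

x = np.linspace(-10.0, 10.0, 10001)
mu, sigma = 1.5, 2.0

# Agrees with SciPy's reference implementation.
assert np.allclose(gaussian_pdf(x, mu, sigma), norm.pdf(x, loc=mu, scale=sigma))

# Numerically, the density integrates to (approximately) 1.
area = np.sum(gaussian_pdf(x, mu, sigma)) * (x[1] - x[0])
print(f"area under the curve: {area:.4f}")
```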

In higher dimensions, it is referred to as the multivariate Gaussian distribution, characterised by a mean vector μ and a covariance matrix Σ. A specific version where the mean is zero and the covariance is the identity matrix is known as the standard normal distribution.
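
The sketch below (the 2-D mean vector and covariance matrix are arbitrary values chosen for illustration) evaluates a multivariate Gaussian density with scipy.stats.multivariate_normal and shows the standard normal as the special case μ = 0, Σ = I.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-D parameters (chosen arbitrarily for this example).
mu = np.array([1.0, -2.0])          # mean vector
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])      # covariance matrix (symmetric, positive definite)

mvn = multivariate_normal(mean=mu, cov=Sigma)
print(mvn.pdf([1.0, -2.0]))         # density at the mean

# The standard normal distribution is the special case mu = 0, Sigma = I.
standard = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))
print(standard.pdf([0.0, 0.0]))     # 1 / (2*pi) ≈ 0.1592
```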

Importance in Machine Learning

Gaussians are fundamental to machine learning for several reasons:

  • Central Limit Theorem: The distribution arises naturally because, by the central limit theorem, the sum (or average) of many independent and identically distributed random variables with finite variance tends towards a Gaussian, regardless of their original distribution.
  • Computational Convenience: Gaussian distributions possess high mathematical tractability. For instance, the marginals and conditionals of joint Gaussian distributions are themselves Gaussians. Furthermore, any linear or affine transformation of a Gaussian random variable results in another Gaussian distribution (see the sketch after this list).
  • Noise Modelling: In regression tasks, it is common to assume that observation uncertainty can be explained by independent Gaussian noise with zero mean.
  • Statistical Assumptions: Many statistical methods and models, such as Gaussian mixture models for density estimation or certain types of linear regression, rely on the assumption that data is normally distributed.
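
To make the closure under affine transformations concrete, the NumPy sketch below (the matrix A, offset b, and sample size are arbitrary choices for illustration) draws samples from a 2-D Gaussian, applies y = Ax + b, and checks that the empirical mean and covariance match the theoretical values Aμ + b and AΣAᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample from a 2-D Gaussian with an arbitrary mean and covariance.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x = rng.multivariate_normal(mu, Sigma, size=200_000)

# Affine transformation y = A x + b.
A = np.array([[1.0, 2.0],
              [0.5, -1.0]])
b = np.array([3.0, 0.0])
y = x @ A.T + b

# y is again Gaussian, with mean A mu + b and covariance A Sigma A^T;
# the sample statistics should agree closely with these values.
print("empirical mean:", y.mean(axis=0))
print("theoretical mean:", A @ mu + b)
print("empirical covariance:\n", np.cov(y, rowvar=False))
print("theoretical covariance:\n", A @ Sigma @ A.T)
```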

Analogy: You can think of a Gaussian distribution like a sand pile created by pouring sand through a funnel. Most of the sand grains land in a tall heap directly under the funnel (the mean), while fewer grains bounce further away. The result is a smooth, predictable hill where the slopes taper off equally on both sides, creating a “bell” shape that represents where most of the data is expected to lie.
