Definition of Logistic Regression

Logistic regression is a statistical method used for binary classification problems - predicting outcomes that fall into one of two categories (like yes/no, pass/fail, or spam/not spam).

Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that an instance belongs to a particular class. It does this by applying the logistic (sigmoid) function to a linear combination of input features, which transforms any real-valued number into a value between 0 and 1.

The core equation is:

  • p = 1 / (1 + e^(-z))
  • where z = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ

Key characteristics:

  • Output interpretation: The output represents the probability of belonging to the positive class (typically coded as 1)
  • Decision boundary: Usually set at 0.5 - predictions above this threshold are classified as one class, below as the other
  • Training method: Parameters are typically estimated using maximum likelihood estimation rather than least squares
  • Assumptions: Assumes a linear relationship between the log-odds of the outcome and the predictors

Logistic regression can be extended to multi-class classification (called multinomial or softmax regression) and remains widely used in fields like medicine, marketing, and machine learning due to its interpretability - the coefficients directly indicate how each feature affects the odds of the outcome.