Definition of Logistic Regression
Logistic regression is a statistical method used for binary classification problems - predicting outcomes that fall into one of two categories (like yes/no, pass/fail, or spam/not spam).
Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that an instance belongs to a particular class. It does this by applying the logistic (sigmoid) function to a linear combination of input features, which transforms any real-valued number into a value between 0 and 1.
The core equation is:
- p = 1 / (1 + e^(-z))
- where z = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ
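As a minimal sketch of the equation above, the following Python computes p for a single instance. The names `sigmoid` and `predict_proba` and the coefficient values are illustrative assumptions, not part of any particular library.

```python
import math
from typing import Sequence

def sigmoid(z: float) -> float:
    """Logistic function p = 1 / (1 + e^(-z)), branched to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # algebraically the same value, safe when z is very negative
    return ez / (1.0 + ez)

def predict_proba(betas: Sequence[float], x: Sequence[float]) -> float:
    """Return p for one instance; betas[0] is the intercept β₀."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return sigmoid(z)

# Hypothetical coefficients and feature values, chosen only for illustration.
betas = [-1.0, 0.5, 2.0]
x = [2.0, 0.0]
p = predict_proba(betas, x)  # z = -1.0 + 0.5*2.0 + 2.0*0.0 = 0.0
print(p)  # 0.5
```

Because z is 0 here, the instance sits exactly where p = 0.5; larger z pushes p toward 1 and smaller z pushes it toward 0.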
Key characteristics:
- Output interpretation: The output represents the probability of belonging to the positive class (typically coded as 1)
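- Decision boundary: The classification threshold is usually set at 0.5 - instances with a predicted probability at or above it are assigned to the positive class, the rest to the negative class. Since p = 0.5 exactly when z = 0, the boundary itself is linear in the features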
- Training method: Parameters are typically estimated using maximum likelihood estimation rather than least squares
- Assumptions: The log-odds of the outcome, ln(p / (1 - p)), are a linear function of the predictors
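To make the training and thresholding points above concrete, here is a rough sketch that fits a one-feature model by maximizing the log-likelihood with plain gradient ascent, then classifies at 0.5. The data, the learning rate, the step count, and the function names are all made up for illustration. Real libraries generally use more sophisticated optimizers and often add regularization.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(b0, b1, xs, ys):
    """Sum of y*log(p) + (1 - y)*log(1 - p) over the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(xs, ys, lr=0.1, steps=5000):
    """Maximize the log-likelihood by gradient ascent (single feature)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the log-likelihood: residuals (y - p), weighted by each input.
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def classify(b0, b1, x, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(b0 + b1 * x) >= threshold else 0

# Made-up toy data: hours studied vs. pass (1) / fail (0).
# The classes overlap on purpose so that the maximum likelihood estimate is finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit(xs, ys)
print(log_likelihood(0.0, 0.0, xs, ys), log_likelihood(b0, b1, xs, ys))
print(classify(b0, b1, 0.5), classify(b0, b1, 4.0))
```

The fitted parameters should give a higher log-likelihood than the all-zero starting point. Note that with perfectly separable data the likelihood has no finite maximum, which is one reason regularization is common in practice.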
Logistic regression can be extended to multi-class classification (called multinomial or softmax regression). It remains widely used in fields like medicine, marketing, and machine learning because it is interpretable: each coefficient βᵢ is the change in the log-odds of the outcome for a one-unit increase in xᵢ with the other features held fixed, so e^βᵢ is the corresponding odds ratio.
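The link between coefficients and odds can be checked numerically. In the sketch below, the model parameters are hypothetical values rather than a fitted result. It shows that raising the feature by one unit multiplies the odds p / (1 - p) by e^β₁.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

# Hypothetical one-feature model; the values are for illustration only.
b0, b1 = -3.0, 0.8

p_at_2 = sigmoid(b0 + b1 * 2.0)
p_at_3 = sigmoid(b0 + b1 * 3.0)  # same model, feature increased by one unit

ratio = odds(p_at_3) / odds(p_at_2)
print(ratio, math.exp(b1))  # the two agree up to floating-point rounding
```

The ratio is the same at any starting value of the feature, which is why e^β is reported as a single odds ratio per predictor.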