Softmax
Softmax is a function that takes a list of numbers and turns them into probabilities. It’s how neural networks answer “which option is most likely?”
The Problem Softmax Solves
Imagine a neural network trying to classify an image as a cat, dog, or bird. The network outputs raw scores called logits:
- Cat: 3.0
- Dog: 1.0
- Bird: 0.0
These numbers are hard to interpret. Is 3.0 good? How confident is the network? Softmax converts them into percentages that sum to 100%:
- Cat: 84.4%
- Dog: 11.4%
- Bird: 4.2%
Now it’s clear: the network thinks it’s probably a cat.
How It Works
Softmax does two things:
- Makes everything positive using the exponential function (eˣ)
- Normalises so everything adds up to 1 (100%)
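Here is a minimal sketch of those two steps in Python, using only the standard library (the `softmax` name and the example scores are just for illustration):

```python
import math

def softmax(scores):
    # Step 1: make everything positive with the exponential function
    exps = [math.exp(s) for s in scores]
    # Step 2: normalise so the values add up to 1
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([3.0, 1.0, 0.0]))  # ≈ [0.844, 0.114, 0.042]
```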
Step by Step
Starting with scores [3.0, 1.0, 0.0]:
| Animal | Score | Exponential | Probability |
|---|---|---|---|
| Cat | 3.0 | 20.1 | 20.1 ÷ 23.8 ≈ 84.4% |
| Dog | 1.0 | 2.7 | 2.7 ÷ 23.8 ≈ 11.4% |
| Bird | 0.0 | 1.0 | 1.0 ÷ 23.8 ≈ 4.2% |
| Total | | 23.8 | 100% |
The formula: divide each exponential by the sum of all exponentials.
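The same arithmetic as the table, written in a vectorised form (this sketch assumes NumPy is available; the variable names are illustrative):

```python
import numpy as np

scores = np.array([3.0, 1.0, 0.0])   # Cat, Dog, Bird

exps = np.exp(scores)                # exponentials: ≈ [20.1, 2.7, 1.0]
probs = exps / exps.sum()            # divide each by the total (≈ 23.8)

print(np.round(exps, 1))             # [20.1  2.7  1. ]
print(np.round(probs, 3))            # [0.844 0.114 0.042]
```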
Why Use Exponentials?
The exponential function (eˣ) has useful properties:
- Always positive: You can’t have negative probability
- Amplifies differences: Bigger scores get much bigger after eˣ, making the winner stand out (see the quick check after this list)
- Smooth: Small changes in input create small changes in output (important for learning)
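A quick check of the first two properties, using Python's math module on the example scores:

```python
import math

# "Amplifies differences": a 2-point gap in the scores becomes a ~7.4x gap
# after exponentiation, so the top score ends up with most of the probability.
print(math.exp(3.0) / math.exp(1.0))   # ≈ 7.39 (this is e^2)

# "Always positive": even a very negative score maps to a small positive number.
print(math.exp(-4.0))                  # ≈ 0.018
```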
Softmax vs “Hard” Max
Think of softmax as a “soft” version of picking a winner:
Scores: [3.0, 1.0, 0.0]
Hard max: [1, 0, 0] ← Winner takes all
Softmax: [0.84, 0.11, 0.04] ← Winner gets most, but others get some
Hard max says “definitely cat.” Softmax says “probably cat, but maybe dog or bird.”
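One way to see the difference in code (a sketch assuming NumPy; `np.argmax` picks the hard winner):

```python
import numpy as np

scores = np.array([3.0, 1.0, 0.0])

# "Hard" max: a one-hot vector where the winner takes all
hard = np.zeros_like(scores)
hard[np.argmax(scores)] = 1.0

# Softmax: the winner gets most of the probability, the others keep some
soft = np.exp(scores) / np.exp(scores).sum()

print(hard)               # [1. 0. 0.]
print(np.round(soft, 2))  # [0.84 0.11 0.04]
```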
Temperature: Adjusting Confidence
You can make softmax more or less confident using temperature:
| Temperature | Result |
|---|---|
| Low (0.5) | More confident: [98.0%, 1.8%, 0.2%] |
| Normal (1.0) | Standard: [84.4%, 11.4%, 4.2%] |
| High (2.0) | Less confident: [62.9%, 23.1%, 14.0%] |
- Low temperature → More decisive (approaches hard max)
- High temperature → More uncertain (approaches equal probabilities)
This is useful when you want AI to be more creative (high temp) or more predictable (low temp).
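A small sketch of temperature scaling (dividing the scores by the temperature before applying softmax); the function name is just for illustration, and NumPy is assumed:

```python
import numpy as np

def softmax_with_temperature(scores, temperature=1.0):
    # Lower temperature sharpens the distribution, higher temperature flattens it.
    scaled = np.asarray(scores) / temperature
    exps = np.exp(scaled)
    return exps / exps.sum()

scores = [3.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(t, np.round(softmax_with_temperature(scores, t), 3))
# 0.5 [0.98  0.018 0.002]
# 1.0 [0.844 0.114 0.042]
# 2.0 [0.629 0.231 0.14 ]
```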
Where Softmax Is Used
- Image classification: “Is this a cat, dog, or bird?”
- Language models: “What’s the next word in this sentence?”
- Game AI: “Which move should I make?” (see Reinforcement Learning)
- Recommendation systems: “Which video should I suggest?”
Key Takeaways
- Softmax converts raw scores into probabilities (0-100%)
- All probabilities sum to exactly 100%
- Higher scores get higher probabilities
- It’s “soft” because every option keeps some probability
- Temperature controls how confident the output is
See Also
- Reinforcement Learning — uses softmax for choosing actions
- Deep Q Networks (DQN) — uses temperature to balance exploration