Softmax
Softmax is a function that takes a list of numbers and turns them into probabilities. It’s how neural networks answer “which option is most likely?”
The Problem Softmax Solves
Imagine a neural network trying to classify an image as a cat, dog, or bird. The network outputs raw scores called logits:
- Cat: 3.0
- Dog: 1.0
- Bird: 0.0
These numbers are hard to interpret. Is 3.0 good? How confident is the network? Softmax converts them into percentages that sum to 100%:
- Cat: 84.4%
- Dog: 11.4%
- Bird: 4.2%
Now it’s clear: the network thinks it’s probably a cat.
How It Works
Softmax does two things:
- Makes everything positive using the exponential function (eˣ)
- Normalises so everything adds up to 1 (100%)
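Here is a minimal sketch of those two steps in Python, using only the standard library (the `softmax` name and the example scores are just for illustration):

```python
import math

def softmax(scores):
    # Step 1: make everything positive with the exponential function
    exps = [math.exp(s) for s in scores]
    # Step 2: normalise so the values add up to 1
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([3.0, 1.0, 0.0]))  # ≈ [0.844, 0.114, 0.042]
```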
Step by Step
Starting with scores [3.0, 1.0, 0.0]:
| Animal | Score | Exponential | Probability |
|---|---|---|---|
| Cat | 3.0 | 20.1 | 20.1 ÷ 23.8 ≈ 84.4% |
| Dog | 1.0 | 2.7 | 2.7 ÷ 23.8 ≈ 11.4% |
| Bird | 0.0 | 1.0 | 1.0 ÷ 23.8 ≈ 4.2% |
| Total | | 23.8 | 100% |
The formula: divide each exponential by the sum of all exponentials.
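The same arithmetic as the table, written in a vectorised form (this sketch assumes NumPy is available; the variable names are illustrative):

```python
import numpy as np

scores = np.array([3.0, 1.0, 0.0])   # Cat, Dog, Bird

exps = np.exp(scores)                # exponentials: ≈ [20.1, 2.7, 1.0]
probs = exps / exps.sum()            # divide each by the total (≈ 23.8)

print(np.round(exps, 1))             # [20.1  2.7  1. ]
print(np.round(probs, 3))            # [0.844 0.114 0.042]
```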
Why Use Exponentials?
The exponential function (eˣ) has useful properties:
- Always positive: You can’t have negative probability
- Amplifies differences: Bigger scores get much bigger after eˣ, making the winner stand out (see the quick check after this list)
- Smooth: Small changes in input create small changes in output (important for learning)
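A quick check of the first two properties, using Python's math module on the example scores:

```python
import math

# "Amplifies differences": a 2-point gap in the scores becomes a ~7.4x gap
# after exponentiation, so the top score ends up with most of the probability.
print(math.exp(3.0) / math.exp(1.0))   # ≈ 7.39 (this is e^2)

# "Always positive": even a very negative score maps to a small positive number.
print(math.exp(-4.0))                  # ≈ 0.018
```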
Softmax vs “Hard” Max
Think of softmax as a “soft” version of picking a winner:
Scores: [3.0, 1.0, 0.0]
Hard max: [1, 0, 0] ← Winner takes all
Softmax: [0.84, 0.11, 0.04] ← Winner gets most, but others get some
Hard max says “definitely cat.” Softmax says “probably cat, but maybe dog or bird.”
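One way to see the difference in code (a sketch assuming NumPy; `np.argmax` picks the hard winner):

```python
import numpy as np

scores = np.array([3.0, 1.0, 0.0])

# "Hard" max: a one-hot vector where the winner takes all
hard = np.zeros_like(scores)
hard[np.argmax(scores)] = 1.0

# Softmax: the winner gets most of the probability, the others keep some
soft = np.exp(scores) / np.exp(scores).sum()

print(hard)               # [1. 0. 0.]
print(np.round(soft, 2))  # [0.84 0.11 0.04]
```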
Temperature: Adjusting Confidence
You can make softmax more or less confident using temperature:
| Temperature | Result |
|---|---|
| Low (0.5) | More confident: [98.0%, 1.8%, 0.2%] |
| Normal (1.0) | Standard: [84.4%, 11.4%, 4.2%] |
| High (2.0) | Less confident: [62.9%, 23.1%, 14.0%] |
- Low temperature → More decisive (approaches hard max)
- High temperature → More uncertain (approaches equal probabilities)
This is useful when you want AI to be more creative (high temp) or more predictable (low temp).
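A small sketch of temperature scaling (dividing the scores by the temperature before applying softmax); the function name is just for illustration, and NumPy is assumed:

```python
import numpy as np

def softmax_with_temperature(scores, temperature=1.0):
    # Lower temperature sharpens the distribution, higher temperature flattens it.
    scaled = np.asarray(scores) / temperature
    exps = np.exp(scaled)
    return exps / exps.sum()

scores = [3.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(t, np.round(softmax_with_temperature(scores, t), 3))
# 0.5 [0.98  0.018 0.002]
# 1.0 [0.844 0.114 0.042]
# 2.0 [0.629 0.231 0.14 ]
```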
Where Softmax Is Used
- Image classification: “Is this a cat, dog, or bird?”
- Language models: “What’s the next word in this sentence?”
- Game AI: “Which move should I make?” (see Reinforcement Learning)
- Recommendation systems: “Which video should I suggest?”
Key Takeaways
- Softmax converts raw scores into probabilities (0-100%)
- All probabilities sum to exactly 100%
- Higher scores get higher probabilities
- It’s “soft” because every option keeps some probability
- Temperature controls how confident the output is
See Also
- Reinforcement Learning — uses softmax for choosing actions
- Deep Q Networks (DQN) — uses temperature to balance exploration