Parameters vs Inputs
Parameters vs Inputs: The Core Distinction
Inputs are the data that flows through your model to get predictions. Parameters are the internal values that define how your model transforms inputs into outputs. Think of inputs as the “questions” and parameters as the “knowledge” your model uses to answer them.
Key Differences
Inputs (x)
- Change with every prediction - Each example has different input values
- Come from the outside world - The data you want predictions for
- Not learned - They’re given to you
- Flow through the model - Pass through the function to produce output
- Examples: pixel values of an image, words in a sentence, temperature readings
Parameters (w, b, θ)
- Stay fixed during prediction - Same values used for all examples
- Learned during training - The model discovers these values
- Define the model's behavior - They ARE the model
- Updated to minimize loss - Adjusted through optimization
- Examples: weights, biases, convolution filters
Simple Analogy
Think of a recipe:
- Inputs = ingredients (flour, eggs, milk)
- Parameters = quantities and instructions (2 cups, 350°F, mix for 3 minutes)
- Output = the final dish
The same recipe (parameters) can be applied to different ingredients (inputs). Training is like perfecting the recipe through trial and error.
In Mathematical Terms
- Model: f(x) = wx + b
- x = input (changes for each example)
- w, b = parameters (fixed after training)
- When x = 5: f(5) = w·5 + b
- When x = 10: f(10) = w·10 + b
- Same w and b, different x
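In code, a minimal sketch of this might look like the following; the specific values w = 2.0 and b = 1.0 are made-up stand-ins for parameters a model would normally learn:

```python
# Minimal sketch of f(x) = w*x + b with fixed (made-up) parameter values.
w, b = 2.0, 1.0          # parameters: fixed once training is done

def f(x):
    return w * x + b     # same w and b for every input

print(f(5))    # input x = 5  -> 11.0
print(f(10))   # input x = 10 -> 21.0
```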
During Training vs Inference
Training Phase:
- Inputs: training examples flow through
- Parameters: continuously updated to improve predictions
- Process: input → current parameters → prediction → compare to target → update parameters
Inference Phase:
- Inputs: new examples flow through
- Parameters: frozen at their trained values
- Process: input → fixed parameters → prediction
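A rough sketch of this split, assuming a tiny linear model, made-up training data, and an arbitrary learning rate:

```python
# --- Training phase: inputs flow through, parameters get updated ---
xs = [1.0, 2.0, 3.0, 4.0]        # training inputs (toy data, assumed)
ys = [3.0, 5.0, 7.0, 9.0]        # targets consistent with y = 2x + 1

w, b = 0.0, 0.0                   # parameters start at arbitrary values
lr = 0.01                         # learning rate (illustrative)

for step in range(1000):
    for x, y in zip(xs, ys):
        pred = w * x + b          # input -> current parameters -> prediction
        err = pred - y            # compare to target
        w -= lr * err * x         # update parameters (gradient of squared error)
        b -= lr * err

# --- Inference phase: parameters are frozen, only inputs change ---
def predict(x):
    return w * x + b              # fixed w and b; x is new data

print(predict(10.0))              # roughly 21.0 with the learned parameters
```

The only things gradient descent touches are w and b; the inputs xs are never modified.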
In Neural Networks
This scales up dramatically:
Inputs:
- Image: millions of pixel values
- Text: sequence of word tokens
- Audio: waveform samples
Parameters:
- Modern language model: billions of weights
- Each layer has its own weight matrices and biases
- All learned during training, fixed during use
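To see where such counts come from, here is a small sketch that tallies the weights and biases of a hypothetical fully connected network; the layer sizes are arbitrary, not taken from any particular model:

```python
# Count parameters in a small fully connected network (layer sizes are made up).
layer_sizes = [784, 128, 64, 10]   # e.g. flattened 28x28 image -> 10 classes

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = n_in * n_out          # one weight matrix per layer
    biases = n_out                  # one bias per output unit
    total += weights + biases

print(total)   # 109386 learned values for this toy network
```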
Why This Distinction Matters
Generalization - Parameters capture patterns that work across many different inputs. A good model learns parameters that handle inputs it’s never seen before.
Transfer Learning - You can take learned parameters from one task and apply them to new inputs from a related task.
Model Size - When we say “GPT-3 has 175 billion parameters,” we’re counting the learned values, not the inputs it processes.
Optimization - Gradient descent updates parameters, not inputs. We’re solving for the best parameters given fixed training inputs.
Common Notation Patterns
- Inputs: x, X, x^(i), input, features, data
- Parameters: w, W, b, θ (theta), β (beta), weights, coefficients
- Outputs: ŷ, predictions (the model's output); y usually denotes the target value the prediction is compared against
A Practical Example
Consider predicting house prices:
Inputs (per house):
- Square footage: 2000
- Bedrooms: 3
- Location: downtown (encoded as 1 for downtown, 0 otherwise)
Parameters (same for all houses):
- Weight for sq ft: 150
- Weight for bedrooms: 10000
- Weight for downtown: 50000
- Bias: 100000
Calculation: Price = 150(2000) + 10000(3) + 50000(1) + 100000 = $480,000
Different house (inputs) → same parameters → different price
The model’s “knowledge” about how features relate to price is encoded in the parameters, while the specific house details are the inputs.
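As a sketch in code (the second house's values are invented purely to show the same parameters being reused on different inputs):

```python
# The house-price example as a weighted sum: parameters are shared, inputs vary.
weights = {"sqft": 150, "bedrooms": 10_000, "downtown": 50_000}   # parameters
bias = 100_000                                                    # parameter

def predict_price(house):
    return sum(weights[k] * house[k] for k in weights) + bias

house_a = {"sqft": 2000, "bedrooms": 3, "downtown": 1}   # inputs for one house
house_b = {"sqft": 1500, "bedrooms": 2, "downtown": 0}   # different inputs (made up)

print(predict_price(house_a))   # 480000 -- matches the calculation above
print(predict_price(house_b))   # 345000 -- same parameters, different price
```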