Parameters vs Inputs
Parameters vs Inputs: The Core Distinction
Inputs are the data that flows through your model to get predictions. Parameters are the internal values that define how your model transforms inputs into outputs. Think of inputs as the “questions” and parameters as the “knowledge” your model uses to answer them.
Key Differences
Inputs (x)
- Change with every prediction - Each example has different input values
- Come from the outside world - The data you want predictions for
- Not learned - They’re given to you
- Flow through the model - Pass through the function to produce output
- Examples: pixel values of an image, words in a sentence, temperature readings
Parameters (w, b, θ)
- Stay fixed during prediction - Same values used for all examples
- Learned during training - The model discovers these values
- Define the model's behavior - They ARE the model
- Updated to minimize loss - Adjusted through optimization
- Examples: weights, biases, convolution filters
Simple Analogy
Think of a recipe:
- Inputs = ingredients (flour, eggs, milk)
- Parameters = quantities and instructions (2 cups, 350°F, mix for 3 minutes)
- Output = the final dish
The same recipe (parameters) can be applied to different ingredients (inputs). Training is like perfecting the recipe through trial and error.
In Mathematical Terms
- Model: f(x) = wx + b
- x = input (changes for each example)
- w, b = parameters (fixed after training)
- When x = 5: f(5) = w·5 + b
- When x = 10: f(10) = w·10 + b
- Same w and b, different x
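In code, a minimal sketch of this might look like the following; the specific values w = 2.0 and b = 1.0 are made-up stand-ins for parameters a model would normally learn:

```python
# Minimal sketch of f(x) = w*x + b with fixed (made-up) parameter values.
w, b = 2.0, 1.0          # parameters: fixed once training is done

def f(x):
    return w * x + b     # same w and b for every input

print(f(5))    # input x = 5  -> 11.0
print(f(10))   # input x = 10 -> 21.0
```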
During Training vs Inference
Training Phase:
- Inputs: training examples flow through
- Parameters: continuously updated to improve predictions
- Process: input → current parameters → prediction → compare to target → update parameters
Inference Phase:
- Inputs: new examples flow through
- Parameters: frozen at their trained values
- Process: input → fixed parameters → prediction
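A rough sketch of this split, assuming a tiny linear model, made-up training data, and an arbitrary learning rate:

```python
# --- Training phase: inputs flow through, parameters get updated ---
xs = [1.0, 2.0, 3.0, 4.0]        # training inputs (toy data, assumed)
ys = [3.0, 5.0, 7.0, 9.0]        # targets consistent with y = 2x + 1

w, b = 0.0, 0.0                   # parameters start at arbitrary values
lr = 0.01                         # learning rate (illustrative)

for step in range(1000):
    for x, y in zip(xs, ys):
        pred = w * x + b          # input -> current parameters -> prediction
        err = pred - y            # compare to target
        w -= lr * err * x         # update parameters (gradient of squared error)
        b -= lr * err

# --- Inference phase: parameters are frozen, only inputs change ---
def predict(x):
    return w * x + b              # fixed w and b; x is new data

print(predict(10.0))              # roughly 21.0 with the learned parameters
```

The only things gradient descent touches are w and b; the inputs xs are never modified.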
In Neural Networks
This scales up dramatically:
Inputs:
- Image: millions of pixel values
- Text: sequence of word tokens
- Audio: waveform samples
Parameters:
- Modern language model: billions of weights
- Each layer has its own weight matrices and biases
- All learned during training, fixed during use
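To see where such counts come from, here is a small sketch that tallies the weights and biases of a hypothetical fully connected network; the layer sizes are arbitrary, not taken from any particular model:

```python
# Count parameters in a small fully connected network (layer sizes are made up).
layer_sizes = [784, 128, 64, 10]   # e.g. flattened 28x28 image -> 10 classes

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = n_in * n_out          # one weight matrix per layer
    biases = n_out                  # one bias per output unit
    total += weights + biases

print(total)   # 109386 learned values for this toy network
```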
Why This Distinction Matters
Generalization - Parameters capture patterns that work across many different inputs. A good model learns parameters that handle inputs it’s never seen before.
Transfer Learning - You can take learned parameters from one task and apply them to new inputs from a related task.
Model Size - When we say “GPT-3 has 175 billion parameters,” we’re counting the learned values, not the inputs it processes.
Optimization - Gradient descent updates parameters, not inputs. We’re solving for the best parameters given fixed training inputs.
Common Notation Patterns
- Inputs: x, X, x^(i), input, features, data
- Parameters: w, W, b, θ (theta), β (beta), weights, coefficients
- Outputs: ŷ, predictions (the model's output); y usually denotes the target value the prediction is compared against
A Practical Example
Consider predicting house prices:
Inputs (per house):
- Square footage: 2000
- Bedrooms: 3
- Location: downtown (encoded as 1 for downtown, 0 otherwise)
Parameters (same for all houses):
- Weight for sq ft: 150
- Weight for bedrooms: 10000
- Weight for downtown: 50000
- Bias: 100000
Calculation: Price = 150(2000) + 10000(3) + 50000(1) + 100000 = $480,000
Different house (inputs) → same parameters → different price
The model’s “knowledge” about how features relate to price is encoded in the parameters, while the specific house details are the inputs.
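As a sketch in code (the second house's values are invented purely to show the same parameters being reused on different inputs):

```python
# The house-price example as a weighted sum: parameters are shared, inputs vary.
weights = {"sqft": 150, "bedrooms": 10_000, "downtown": 50_000}   # parameters
bias = 100_000                                                    # parameter

def predict_price(house):
    return sum(weights[k] * house[k] for k in weights) + bias

house_a = {"sqft": 2000, "bedrooms": 3, "downtown": 1}   # inputs for one house
house_b = {"sqft": 1500, "bedrooms": 2, "downtown": 0}   # different inputs (made up)

print(predict_price(house_a))   # 480000 -- matches the calculation above
print(predict_price(house_b))   # 345000 -- same parameters, different price
```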