Statistics and AI

A model can fit the past beautifully and still predict badly.

This prototype introduces regression and overfitting by comparing training error with test error. Learners can change the model complexity (polynomial degree), the noise level and the number of training points.

The aim is to make a central idea in statistics and machine learning visible: fitting data is not the same as understanding the pattern.

Regression lab

Fit a model and watch generalisation change

The blue points are training data used to fit the model. The orange points are test data used to ask whether the model generalises beyond the data it saw.
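The split between training and test points can be sketched in a few lines of Python. This is only an illustration, not the lab's own code: the curved pattern in `make_data`, the input range, the seed and the size of the test set are assumptions chosen for the example, and NumPy's `Polynomial.fit` stands in for whatever fitting routine the prototype uses.

```python
import numpy as np
from numpy.polynomial import Polynomial

def make_data(n, noise, rng):
    """Sample n points from a hypothetical curved pattern plus Gaussian noise."""
    x = rng.uniform(-3.0, 3.0, size=n)
    y = np.sin(x) + 0.5 * x + rng.normal(0.0, noise, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a fitted model on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

rng = np.random.default_rng(0)
x_train, y_train = make_data(12, 1.6, rng)  # "blue" points: used to fit
x_test, y_test = make_data(50, 1.6, rng)    # "orange" points: never seen while fitting

# Least-squares straight line (polynomial degree 1), fitted on training data only.
model = Polynomial.fit(x_train, y_train, deg=1)

train_mse = mse(model, x_train, y_train)
test_mse = mse(model, x_test, y_test)
print(f"training MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The test points play no part in the fit; they are only used afterwards to score the fitted line.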

Model degree

Training MSE: 1.594

Test MSE: 1.253

Generalisation gap: -0.34

Controls

Polynomial degree: 1

Training points: 12

Noise level: 1.6

Model behaviour

A straight line has low flexibility. It may underfit if the real pattern curves.

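The generalisation gap is the test MSE minus the training MSE. A large positive gap means the model fits the training data much better than the test data, which is a warning sign of overfitting. A small negative gap can occur by chance, especially with few points, when the test sample happens to be easier to fit than the training sample.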
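As a quick check on the readout, the gap can be computed by hand. The sketch below assumes the gap is test MSE minus training MSE, which matches the numbers displayed; the function name `generalisation_gap` is made up for this example.

```python
def generalisation_gap(train_mse, test_mse):
    """Test error minus training error; positive means worse on unseen data."""
    return test_mse - train_mse

# Values copied from the readout (degree 1, 12 training points, noise 1.6).
gap = generalisation_gap(train_mse=1.594, test_mse=1.253)
print(f"gap = {gap:.2f}")  # prints "gap = -0.34"
```

Here the gap is slightly negative: the straight line happens to score a little better on the test points than on the training points, so there is no sign of overfitting in this configuration.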

What this shows

Regression is about fitting a model to data, but a good fit to the training data is not the same as a useful model.

Overfitting happens when a model follows noise rather than the underlying pattern.
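One way to see this outside the lab is to sweep the polynomial degree on a fixed dataset. The sketch below is an assumption-laden illustration: the underlying pattern, seed and sample sizes are invented for the example. For least-squares fits the training error cannot go up as the degree increases, because each higher-degree model contains the lower-degree ones; the test error has no such guarantee.

```python
import numpy as np
from numpy.polynomial import Polynomial

def make_data(n, noise, rng):
    """Hypothetical curved pattern plus Gaussian noise (an assumption)."""
    x = rng.uniform(-3.0, 3.0, size=n)
    y = np.sin(x) + 0.5 * x + rng.normal(0.0, noise, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a fitted model on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

rng = np.random.default_rng(1)
x_train, y_train = make_data(12, 1.6, rng)
x_test, y_test = make_data(200, 1.6, rng)

degrees = list(range(1, 9))
train_errors, test_errors = [], []
for d in degrees:
    # Fit on training data only, then score on both sets.
    model = Polynomial.fit(x_train, y_train, deg=d)
    train_errors.append(mse(model, x_train, y_train))
    test_errors.append(mse(model, x_test, y_test))
    print(f"degree {d}: train {train_errors[-1]:.3f}, test {test_errors[-1]:.3f}")
```

Reading down the printed table, the training column only shrinks, while the test column shows whether the extra flexibility is capturing pattern or noise.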

This is one reason machine learning uses test data, validation data and careful model selection.
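A minimal sketch of that workflow, under invented assumptions (the data-generating pattern, the 30/30/30 split and the candidate degrees 1 to 8 are all choices made for this example): fit each candidate on the training part, choose the degree with the lowest validation error, and only then look at the test part.

```python
import numpy as np
from numpy.polynomial import Polynomial

def make_data(n, noise, rng):
    """Hypothetical curved pattern plus Gaussian noise (an assumption)."""
    x = rng.uniform(-3.0, 3.0, size=n)
    y = np.sin(x) + 0.5 * x + rng.normal(0.0, noise, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a fitted model on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

rng = np.random.default_rng(2)
x, y = make_data(90, 1.6, rng)

# Shuffle once, then cut into three disjoint parts: fit, choose, report.
order = rng.permutation(len(x))
train_idx, val_idx, test_idx = order[:30], order[30:60], order[60:]

# Score every candidate degree on the validation part.
val_errors = {}
for d in range(1, 9):
    candidate = Polynomial.fit(x[train_idx], y[train_idx], deg=d)
    val_errors[d] = mse(candidate, x[val_idx], y[val_idx])

# Pick the degree with the lowest validation error, then report on the test part.
best_degree = min(val_errors, key=val_errors.get)
best_model = Polynomial.fit(x[train_idx], y[train_idx], deg=best_degree)
test_mse = mse(best_model, x[test_idx], y[test_idx])
print(f"chosen degree = {best_degree}, test MSE = {test_mse:.3f}")
```

Keeping the test part out of the selection step is the design choice that matters: once a dataset has been used to choose the degree, its error is no longer an honest estimate of performance on new data.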

Guided tasks

Increase the degree. When does the training error improve but the test error get worse?

Add more training points. Does this make it harder for the model to overfit?

Increase the noise. Why does the best model become harder to choose?
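The second task can also be explored numerically by averaging the gap over many random datasets. This is a sketch under assumptions, not the lab's method: the pattern, the degree of 6, the number of trials and the function name `average_gap` are all invented here, and the gap is taken as test MSE minus training MSE.

```python
import numpy as np
from numpy.polynomial import Polynomial

def make_data(n, noise, rng):
    """Hypothetical curved pattern plus Gaussian noise (an assumption)."""
    x = rng.uniform(-3.0, 3.0, size=n)
    y = np.sin(x) + 0.5 * x + rng.normal(0.0, noise, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a fitted model on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

def average_gap(n_train, degree, noise, trials=200, seed=0):
    """Mean of (test MSE - training MSE) over repeated random datasets."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(trials):
        x_tr, y_tr = make_data(n_train, noise, rng)
        x_te, y_te = make_data(200, noise, rng)
        model = Polynomial.fit(x_tr, y_tr, deg=degree)
        gaps.append(mse(model, x_te, y_te) - mse(model, x_tr, y_tr))
    return float(np.mean(gaps))

small = average_gap(n_train=12, degree=6, noise=1.6)
large = average_gap(n_train=200, degree=6, noise=1.6)
print(f"average gap with 12 points:  {small:.3f}")
print(f"average gap with 200 points: {large:.3f}")
```

Changing the `noise` argument while holding the other settings fixed gives a similar way to probe the third task.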

Joy in the process

The point is not just to finish. It is to notice, test and return.

These tools are invitations to explore. A good mistake, a surprising pattern or a question you cannot yet answer is part of the work, not a failure of it.

The challenge is deliberate: the site should support thinking, not remove the need for it.

Before changing a setting, pause and predict what you think will happen.

Change one thing at a time. What stayed the same, and what changed?

Try to create a surprising case, a broken case, or a beautiful pattern.

Ask what this connects to outside the page: maps, movement, nature, systems or decisions.

Reset, then try again with a new question in mind.

Future extensions

This can grow into a broader modelling and machine learning lab.

Add train-validation-test splits and model selection.

Add regularisation to show how complexity can be controlled.

Add editable data points so learners can create outliers.

Add classification and decision boundary examples as a follow-up.
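As a rough preview of the regularisation extension listed above, ridge regression is one common way to control complexity: it adds a penalty on the size of the polynomial coefficients. The sketch below is an assumption about how such an extension might look, not a description of the site's plans; `ridge_polyfit`, the penalty strengths and the data pattern are hypothetical.

```python
import numpy as np

def ridge_polyfit(x, y, degree, lam):
    """Ridge-penalised polynomial fit; returns coefficients, lowest power first.

    Assumes x is already scaled to [-1, 1]. The intercept is not penalised.
    """
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    penalty = np.eye(degree + 1)
    penalty[0, 0] = 0.0  # leave the intercept free
    # Solve the penalised normal equations (X^T X + lam * P) w = X^T y.
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=12)
# Hypothetical noisy pattern, chosen only for this example.
y = np.sin(3.0 * x) + 1.5 * x + rng.normal(0.0, 1.6, size=12)

lambdas = (1e-3, 1e-1, 10.0)
sizes = []
for lam in lambdas:
    w = ridge_polyfit(x, y, degree=8, lam=lam)
    sizes.append(float(np.sum(w[1:] ** 2)))  # squared size of penalised coefficients
    print(f"lambda = {lam:g}: sum of squared coefficients = {sizes[-1]:.3f}")
```

A larger penalty shrinks the higher-order coefficients, so a degree 8 polynomial behaves more like a smoother, lower-complexity curve without the degree itself changing.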