Machine Learning

“all models are wrong”  – George Box

“but one can be justified” – HPGM Lab

How important is nearness for uncertainty quantification?
If we take a look at the figure below, we can see that a function passing through the points can be computed simply by doing a polynomial regression of any degree. And, to our surprise, every one of them can be justified, given the complexity of the model.
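To make this concrete, here is a minimal sketch showing that polynomials of several degrees all pass acceptably (or, at high enough degree, exactly) through the same small set of points. The points and degrees are illustrative assumptions, not the data behind the figure:

```python
import numpy as np

# Hypothetical points; the actual data behind the figure is not shown here.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])

for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, degree)                  # least-squares polynomial fit
    residual = np.abs(np.polyval(coeffs, x) - y).max() # worst miss at the points
    print(f"degree {degree}: max |residual| at the points = {residual:.3f}")
```

With six points, a degree-5 polynomial interpolates them exactly; lower degrees trade exactness for simplicity, and nothing in the data alone tells us which trade-off is right.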

But given a choice among particular fits (characterised by degree n), which one do we go for? The answers may differ. Perhaps a degree n = 1, 2, etc. is "okay" for some paper; but in a real scenario, modelling demands learning from the data. The figure shows how Gaussian Process Regression (GPR) tackles this fitting problem by modelling the similarity of the points with a Euclidean-distance measure, generating adequate functions that honour the given points. The nearer the points are, the more similar their contributions to the fitted function.
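A minimal sketch of the idea, assuming the common squared-exponential (RBF) kernel: the covariance between two points decays with their Euclidean distance, so nearby points contribute similarly to the fitted function. The length scale here is an illustrative parameter, not one taken from the figure:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Covariance between two sets of 1D points, based on Euclidean distance."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

x = np.array([0.0, 0.1, 2.0])
print(rbf_kernel(x, x))  # near pair (0.0, 0.1) -> covariance close to 1;
                         # distant pair (0.0, 2.0) -> covariance close to 0
```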
Here is a simple 1D function fitted using GPR.

[Figure: GPR_1D]

Top (L->R): a) collection of points (black stars) through which we want to fit a 1D function; b) with no control points, GPR produces functions which are not of much use (the prior); c) uncertainty is quantified as the standard deviation among the generated functions.

Bottom (L->R): a) the same collection of points and a fitted 1D function; b) the control points are now used in generating the functions (the posterior); c) uncertainty shrinks between points that lie close together, but as we move away from the control points it grows as a function of distance.
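For readers who want to reproduce the gist of the figure, here is a from-scratch sketch of GPR in numpy: sample functions from the prior (no control points), condition on the control points to get the posterior, and read off the posterior standard deviation as the uncertainty. The data values, kernel choice, and length scale are all assumptions for illustration:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel on 1D inputs (assumed length scale ls).
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

rng = np.random.default_rng(0)
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical control points
y_train = np.sin(x_train)                           # hypothetical observed values
x_test = np.linspace(-1.0, 6.0, 100)

# Prior: zero mean, RBF covariance; samples wander freely (top row, panel b).
K_ss = rbf(x_test, x_test) + 1e-8 * np.eye(len(x_test))
prior_samples = rng.multivariate_normal(np.zeros(len(x_test)), K_ss, size=3)
print(f"spread of 3 prior samples (mean std): {prior_samples.std(axis=0).mean():.3f}")

# Posterior: condition on the control points (bottom row, panel b).
K = rbf(x_train, x_train) + 1e-8 * np.eye(len(x_train))
K_s = rbf(x_train, x_test)
K_inv = np.linalg.inv(K)
mu = K_s.T @ K_inv @ y_train                     # posterior mean (the fitted function)
cov = K_ss - K_s.T @ K_inv @ K_s                 # posterior covariance
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # uncertainty (panel c)

# Uncertainty is small near the control points and grows with distance.
print(f"std at x=2.5 (between points): {std[np.argmin(np.abs(x_test - 2.5))]:.3f}")
print(f"std at x=-1.0 (far away):      {std[np.argmin(np.abs(x_test + 1.0))]:.3f}")
```

The printed standard deviation is small between control points and larger away from them, matching panel c) of the bottom row: nearness to observed data is exactly what drives the uncertainty down.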