May 23, 2019

Intuitive Explanation to Bias-Variance Tradeoff


An important concept when studying Machine Learning is understanding Bias and Variance. In this blog we will try to build an intuition for these concepts. But before that, let's see what Underfitting, Rightfitting, and Overfitting mean. The image below illustrates the three scenarios.

Underfitting occurs when the model or the algorithm does not fit the data well enough.
Overfitting occurs when the model or the algorithm fits the training data really well (by fitting a squiggly line) but fails to capture trends on the unseen/test data.
Rightfitting is the sweet spot between the two. Here, we find the line that represents both our training and testing data well.

Let's take the first diagram in the above image. It's clearly evident that a method like Linear Regression will not capture this data distribution, because a straight line does not have the flexibility to fit curves, resulting in a high fitting error on the training data. On the other hand, if we fit a squiggly line to the data (shown in the second diagram), the line passes tightly through all the points, resulting in almost zero fitting error on the training data.
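Since the image isn't reproduced here, a small numerical sketch can stand in for it. Below is a minimal illustration, assuming a noisy sine curve as the hypothetical data distribution and NumPy's `polyfit` as the fitting routine: a straight line (degree-1 polynomial) versus a squiggly high-degree polynomial, compared by their training error.

```python
import numpy as np

# Hypothetical data: a noisy sine curve stands in for the image's distribution.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)

def train_error(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return np.mean((y - pred) ** 2)

err_line = train_error(1)       # straight line: cannot bend to follow the curve
err_squiggly = train_error(12)  # high-degree polynomial: hugs every point

print(f"straight line MSE: {err_line:.4f}, squiggly MSE: {err_squiggly:.4f}")
```

The exact numbers depend on the noise, but the squiggly fit's training error comes out far smaller than the straight line's, matching the picture described above.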

Now, what if we displace each data point up or down along the vertical axis by some distance? The straight line will still capture the distribution with roughly the same error as before. But the squiggly fit's error will spike compared to the near-zero error it had earlier, and will keep growing as the displacement increases.
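This thought experiment can be checked numerically. The sketch below again assumes a noisy sine curve as hypothetical data, keeps both fitted lines fixed, displaces every point vertically, and compares how much each model's error grows relative to where it started:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)

# Fit both models once, on the original points.
line = np.polyfit(x, y, 1)        # straight line
squiggly = np.polyfit(x, y, 12)   # squiggly high-degree polynomial

def mse(coeffs, targets):
    return np.mean((np.polyval(coeffs, x) - targets) ** 2)

# Displace every point vertically, as in the thought experiment.
y_shifted = y + rng.normal(0, 0.3, size=y.shape)

# Relative growth in error after the displacement:
line_ratio = mse(line, y_shifted) / mse(line, y)
squiggly_ratio = mse(squiggly, y_shifted) / mse(squiggly, y)
print(f"line error grew {line_ratio:.1f}x, squiggly error grew {squiggly_ratio:.1f}x")
```

The straight line's error grows only modestly, while the squiggly line's error explodes relative to its near-zero starting point.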

The squiggly line has low bias, since it is flexible enough to adapt to the distribution of the training data. On the other hand, it has high variance, because even a small change in the data, or unseen data, produces a large error.

The straight line has relatively high bias, since it cannot fully capture the distribution of the training data. On the other hand, it has relatively low variance, because it produces a similar error on unseen data as it does on the training data.

In the case of overfitting, the training error is very low while the testing error is very high: low bias and high variance. On the other side, in the case of underfitting, both the training error and the testing error are large: high bias and relatively low variance, since the model's predictions barely change when the data changes.
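These two regimes can be made concrete with a small simulation, again assuming a noisy sine curve as a hypothetical ground truth. Refitting each model on many freshly drawn datasets lets us estimate the (squared) bias and the variance of its predictions at a fixed point:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 20)
true = np.sin(2 * np.pi * x)   # hypothetical ground-truth curve
x0 = 0.31                      # fixed query point at which we inspect predictions
true_at_x0 = np.sin(2 * np.pi * x0)

def predictions_at_x0(degree, n_datasets=200):
    """Refit on many fresh noisy datasets; collect the predictions at x0."""
    preds = []
    for _ in range(n_datasets):
        y = true + rng.normal(0, 0.1, size=x.shape)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x0))
    return np.array(preds)

for degree, name in [(1, "straight line"), (12, "squiggly")]:
    p = predictions_at_x0(degree)
    bias_sq = (p.mean() - true_at_x0) ** 2  # how far off the average fit is
    variance = p.var()                      # how much the fit jitters across datasets
    print(f"{name}: bias^2 = {bias_sq:.4f}, variance = {variance:.5f}")
```

The straight line shows large bias and small variance; the squiggly fit shows the opposite, which is exactly the tradeoff described above.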

Ideally, we would want an algorithm to sit somewhere between these two states, having low bias (low error on the training data) and low variance (low error on the testing data). There are various ways to achieve this - ensemble methods, regularization, etc. We will read about them in future blogs.
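As a tiny preview of regularization, here is a hypothetical ridge-regression sketch using the closed-form solution directly (no particular library's API is assumed): adding a penalty on the weights shrinks the coefficients of the squiggly fit, trading a little bias for a reduction in variance.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)

# Degree-12 polynomial features: the "squiggly" model as a linear model.
X = np.vander(x, 13, increasing=True)

# Plain least squares: the unregularized squiggly fit.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Closed-form ridge: w = (X^T X + lam * I)^{-1} X^T y.
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print(f"||w_ols|| = {np.linalg.norm(w_ols):.2f}, "
      f"||w_ridge|| = {np.linalg.norm(w_ridge):.2f}")
```

The penalty pulls the coefficients toward zero, so the ridge fit is a tamer curve than the unregularized one - one of the standard ways of moving an overfitting model back toward the sweet spot.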

Feel free to share your thoughts - Thanks