July 12, 2024


Building useful machine learning models

Overfitting causes your model to miss its target. Photo by engin akyurt on Unsplash


In this post, I will share four practical ways you can avoid overfitting when building machine learning (ML) models and why they are effective.

Overfitting is an undesirable condition that occurs when a model is fitted so closely to the training data that it becomes unable to generalize to new examples, that is, unable to give accurate predictions on previously unseen data.

Let’s face it, an overfitted model is of little practical use.

The most common way to detect overfitting is to compare the model’s performance on the training and test datasets. An overfitted model performs significantly worse on the test set than on the training set.
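To make this check concrete, here is a minimal sketch in Python using scikit-learn. It assumes a synthetic dataset from `make_classification` in place of real data and an unconstrained decision tree as the model; the 0.1 gap threshold is an arbitrary choice for illustration, not a standard cutoff.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained decision tree can keep splitting until it memorizes the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
gap = train_acc - test_acc

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")

# A large positive gap is the usual warning sign of overfitting.
# The 0.1 threshold is an arbitrary choice for this sketch.
if gap > 0.1:
    print("Large train/test gap: the model is likely overfitting.")
```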

4 Effective Ways to Prevent Overfitting

Image by author

To build useful ML models, we need to tackle overfitting effectively. Here are some pragmatic ways to avoid overfitting your models:

1. Adding More Data

Yes, increasing the number of training examples is a reliable way to reduce overfitting. If the training set does not properly represent the real-world patterns in the overall data, your model is likely to fit too closely to the limited data, leaving it unable to generalize to new data it has not been trained on.

This can be achieved by collecting new samples (which might be difficult in some applications) or by data augmentation (creating modified variations of existing samples). For example, in image classification tasks, data augmentation has been found to significantly improve model performance.
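As a rough illustration of augmentation, the sketch below extends a small batch of grayscale images with mirrored and slightly noisy copies using NumPy. The helper `augment_images`, the noise level, and the assumption that a horizontal flip does not change an image’s class are all hypothetical choices for this example; real pipelines typically use a library’s augmentation tools and pick transformations that suit the task.

```python
import numpy as np


def augment_images(images, labels, noise_std=0.05, seed=0):
    """Return the original images plus flipped and noisy copies (hypothetical helper).

    images: array of shape (n, height, width) with pixel values in [0, 1].
    labels: array of shape (n,); each augmented copy keeps its original label.
    """
    rng = np.random.default_rng(seed)
    # Mirror each image left-to-right (assumes this does not change the class).
    flipped = images[:, :, ::-1]
    # Add small Gaussian noise, then clip back into the valid pixel range.
    noisy = np.clip(images + rng.normal(0.0, noise_std, size=images.shape), 0.0, 1.0)
    aug_images = np.concatenate([images, flipped, noisy], axis=0)
    aug_labels = np.concatenate([labels, labels, labels], axis=0)
    return aug_images, aug_labels


# Usage with random stand-in data: 10 images of 8x8 pixels.
rng = np.random.default_rng(42)
images = rng.random((10, 8, 8))
labels = rng.integers(0, 2, size=10)
aug_images, aug_labels = augment_images(images, labels)
print(aug_images.shape, aug_labels.shape)  # (30, 8, 8) (30,)
```

The training set triples in size here, but the new samples are only as useful as the transformations are realistic for the task.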

Why it works: With more data, a model of a given complexity can memorize a smaller fraction of the training samples, so it is pushed toward the general patterns shared across samples and tends to generalize better.
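One way to see this effect is a learning curve, which tracks training and validation scores as the training set grows. This sketch uses scikit-learn’s `learning_curve` on synthetic data (an assumption standing in for a real dataset). An unconstrained decision tree fits its training data perfectly, so the pattern to look for is validation accuracy rising, and the gap shrinking, as more samples are added.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5, random_state=0)

# Fit on 5 growing fractions of the training folds; score with 5-fold cross-validation.
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="accuracy",
)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    # The train/validation gap is a rough indicator of how much the model overfits.
    print(f"{int(n):5d} samples  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")
```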

2. Tuning Hyperparameters

Complex ML models are typically more accurate than simpler ones, but they are also more susceptible to overfitting.

In this context, ML models become more complex when they have more parameters (for example, deeper trees or more hidden layers) or the longer they are trained. Hyperparameters, such as the maximum tree depth or the number of training epochs, control the learning process and cannot be learned from the data during training. They are external to the model and must be set before training begins.

How this is done depends on the algorithm type, but it comes down to choosing hyperparameter values that keep model complexity in check. Some common examples include:

  • Tree methods (Decision trees, Random forests, LGBM, XGBoost): reducing the maximum depth of each tree (which also caps the number of leaves per tree).
  • Neural networks: early stopping (ending the training process earlier) and reducing the number of hidden layers or the number of neurons per layer.
  • K nearest neighbors (KNN): increasing the value of K.
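For tree methods, a common way to pick the maximum depth is a cross-validated search. Here is a minimal sketch with scikit-learn’s `GridSearchCV` and `DecisionTreeClassifier`, assuming synthetic data and an arbitrary grid of candidate depths; the same pattern applies to other hyperparameters, such as K in KNN.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate depths; None lets the tree grow until its leaves are pure.
param_grid = {"max_depth": [2, 3, 4, 6, 8, 12, None]}

# 5-fold cross-validation on the training data picks the depth that scores best
# on held-out folds, so the test set stays untouched until the final check.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_tree = search.best_estimator_  # refit on the full training set by default
print("best max_depth:", search.best_params_["max_depth"])
print(f"train accuracy: {best_tree.score(X_train, y_train):.3f}")
print(f"test accuracy:  {best_tree.score(X_test, y_test):.3f}")
```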

Why it works: Tuning these hyperparameters reduces model complexity, which reduces the chance of overfitting. For instance, stopping the training process earlier results in a simpler model. In the case of KNN, increasing the value of K reduces the model’s sensitivity to local patterns because each prediction is based on more neighbors. This makes the model more generalizable to new datasets and hence less overfitted.
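The effect of K can be observed directly. In this sketch, the data is synthetic and about 10% of labels are assigned at random to simulate noise (both assumptions for illustration). K=1 reproduces the training labels perfectly, since every training point is its own nearest neighbor, while larger values give up some training accuracy in exchange for smoother decisions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data; flip_y=0.1 assigns random labels to about 10% of samples.
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

results = {}
for k in [1, 5, 15, 45]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Score on the training set and on held-out data for each K.
    results[k] = (knn.score(X_train, y_train), knn.score(X_test, y_test))
    print(f"K={k:2d}  train={results[k][0]:.3f}  test={results[k][1]:.3f}")
```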

3. Reducing the Number of Features

Overfitting may result from having a limited amount of data with a lot of features. However, reducing the number of features can cause a loss of vital information and must be done carefully.

One idea is to manually remove features with low predictive power during exploratory data analysis. Features that are more strongly correlated with the target variable tend to have more predictive power, although correlation only captures linear relationships. Predictive power can also be estimated using feature importance scores and mutual information.
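Mutual information can also be used to rank and filter features automatically. The sketch below assumes scikit-learn’s `SelectKBest` with `mutual_info_classif`, applied to synthetic data where only 4 of 20 features are informative. In practice, the number of features to keep (`k`) would itself be tuned rather than fixed.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# 20 features, of which only 4 carry information about the target (synthetic data).
X, y = make_classification(n_samples=500, n_features=20, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=0)

# Score each feature by its estimated mutual information with the target
# (random_state fixed for reproducible estimates) and keep the 4 best.
selector = SelectKBest(score_func=partial(mutual_info_classif, random_state=0), k=4)
X_reduced = selector.fit_transform(X, y)

print("MI scores:", selector.scores_.round(3))
print("kept feature indices:", selector.get_support(indices=True))
print("shape before/after:", X.shape, X_reduced.shape)
```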

In addition, techniques such as Principal Component Analysis (PCA) may be used to reduce the dimensionality of the data. With PCA, the number of principal components can be chosen so that only those contributing significant information (explained variance) are kept. However, each component is a linear combination of the original features, so PCA results in a loss of interpretability.
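A common way to apply this is to ask PCA for just enough components to explain a target share of the variance. This sketch assumes scikit-learn’s `PCA` on the digits dataset bundled with the library, standardized first, with a 95% threshold chosen arbitrarily for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)  # 1797 images, 64 pixel features each
X_scaled = StandardScaler().fit_transform(X)

# A float n_components in (0, 1) keeps the smallest number of components whose
# cumulative explained variance reaches that fraction (here 95%).
pca = PCA(n_components=0.95, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)

print("original features:", X.shape[1])
print("components kept:  ", pca.n_components_)
print(f"variance explained: {pca.explained_variance_ratio_.sum():.3f}")
```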

Why it works: Using fewer features reduces model complexity, which helps to prevent overfitting.

4. Regularization

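This method constrains the complexity of an ML model by adding a penalty on the size of its parameters (weights) to the training loss, which discourages the model from relying on features with little or no contribution to the learning process.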
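The post does not give a formula, but a standard way to write a regularized training objective (notation assumed here) adds a penalty term $\Omega$, weighted by a strength $\lambda \ge 0$, to the average loss $\ell$ over $n$ training examples, where $\mathbf{w} = (w_1, \dots, w_p)$ are the model’s weights. The L1 and L2 penalties correspond to the LASSO and Ridge variants:

$$
\min_{\mathbf{w}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(y_i, f(\mathbf{x}_i; \mathbf{w})\big) + \lambda \, \Omega(\mathbf{w})
$$

$$
\Omega_{\text{L1}}(\mathbf{w}) = \sum_{j=1}^{p} |w_j|, \qquad \Omega_{\text{L2}}(\mathbf{w}) = \sum_{j=1}^{p} w_j^{2}
$$

Setting $\lambda = 0$ recovers the unregularized model; larger values push the weights toward zero and make the model simpler.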

Some regularization types include L1 (least absolute shrinkage and selection operator, or LASSO), L2 (Ridge), and Dropout, which is specific to neural networks.

Whereas LASSO can shrink the coefficients of less important features exactly to zero, effectively eliminating those features, Ridge shrinks them toward zero without eliminating them.
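The difference is easy to see on synthetic data. In this sketch, the target depends only on the first 3 of 10 features, and scikit-learn’s `Lasso` and `Ridge` are fit with penalty strengths (`alpha`) picked arbitrarily for illustration. Lasso typically sets the noise coefficients exactly to zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))

# Only the first 3 features influence the target; the other 7 are pure noise.
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty

print("Lasso:", lasso.coef_.round(3))
print("Ridge:", ridge.coef_.round(3))
print("coefficients exactly zero -> Lasso:", int(np.sum(lasso.coef_ == 0.0)),
      " Ridge:", int(np.sum(ridge.coef_ == 0.0)))
```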

Why it works: Regularization limits model complexity, which reduces the chance of overfitting.
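Dropout works differently from the L1 and L2 penalties: rather than adding a term to the loss, it randomly switches off units during training so the network cannot rely too heavily on any single one. Below is a minimal NumPy sketch of the common "inverted dropout" formulation applied to a stand-in batch of activations. The function name and the 0.5 drop probability are illustrative choices, and deep learning frameworks provide their own dropout layers.

```python
import numpy as np


def dropout(activations, drop_prob, training, rng):
    """Inverted dropout applied to a layer's activations (illustrative helper).

    During training, each unit is zeroed with probability drop_prob and the
    surviving units are scaled by 1 / (1 - drop_prob), so the expected value of
    each activation is unchanged. At inference time the input passes through as-is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    # Boolean mask: True keeps a unit, False drops it.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
h = np.ones((4, 6))  # stand-in activations: batch of 4, 6 hidden units
h_train = dropout(h, drop_prob=0.5, training=True, rng=rng)
h_eval = dropout(h, drop_prob=0.5, training=False, rng=rng)
print(h_train)  # every entry is either 0.0 or 2.0
print(h_eval)   # unchanged
```

Scaling the surviving units during training, rather than at inference, means the trained network can be used as-is at prediction time.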

4 Effective Ways to Prevent Overfitting and Why They Work Republished from Source https://towardsdatascience.com/4-effective-ways-to-prevent-overfitting-and-why-they-work-f6e3b98aefda?source=rss—-7f60cf5620c9—4 via https://towardsdatascience.com/feed

