July 12, 2024


The art of learning through data

Photo by Markus Winkler on Unsplash


This article covers the three main units of machine learning (ML) modeling that you should master as a data scientist or machine learning engineer. ML modeling, the art of learning through data, is an important step in the data science project life cycle and perhaps the most interesting for data practitioners.


We learn every day by perception, visual inspection, and hearing. In addition, we make decisions about tomorrow from our previous experiences. Machine learning is a branch of artificial intelligence that mimics the human ability to learn by uncovering data patterns, that is, relationships between features and the target variable.

Features are the independent variables that describe a given observation or data point, whereas the target variable is the dependent variable we want to model, usually in order to make predictions.
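To make the distinction concrete, here is a minimal sketch in plain Python. The house data, the column names, and the variable names X and y are all made up for illustration; X and y simply follow a common naming habit for features and target.

```python
# Hypothetical observations: each row describes one house.
rows = [
    {"area_m2": 50, "rooms": 2, "price": 150_000},
    {"area_m2": 80, "rooms": 3, "price": 240_000},
    {"area_m2": 120, "rooms": 4, "price": 330_000},
]

feature_names = ["area_m2", "rooms"]  # independent variables
target_name = "price"                 # dependent variable we want to predict

# X holds one list of feature values per observation; y holds the matching targets.
X = [[row[name] for name in feature_names] for row in rows]
y = [row[target_name] for row in rows]
```

Each entry of X lines up with the entry of y at the same position, which is the layout most ML libraries expect.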

Additionally, the three main types of machine learning are highlighted below:

  1. Supervised learning: The algorithm is trained on examples that include the target variable.
  2. Unsupervised learning: The algorithm is trained on the features only, with no target provided in the examples.
  3. Reinforcement learning: The algorithm learns by interacting with an environment and receiving positive or negative rewards for its actions.
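The difference between the first two types can be sketched with a toy example in plain Python. The data and both helper functions are hypothetical stand-ins written for this illustration (a 1-nearest-neighbor rule and a two-group clustering loop), not library code. Reinforcement learning is left out because it needs an environment to interact with.

```python
# Toy data: one feature (hours studied) per observation.
X = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
y = ["fail", "fail", "fail", "pass", "pass", "pass"]  # target, used only when supervised


def nearest_neighbor_predict(X_train, y_train, x_new):
    """Supervised: use labelled examples to predict the label of x_new."""
    distances = [abs(x - x_new) for x in X_train]
    return y_train[distances.index(min(distances))]


def two_means(X, n_iter=10):
    """Unsupervised: split 1-D points into two groups without using any labels."""
    c0, c1 = min(X), max(X)  # start the two centers at the extremes
    for _ in range(n_iter):
        g0 = [x for x in X if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in X if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return [0 if abs(x - c0) <= abs(x - c1) else 1 for x in X]


supervised_prediction = nearest_neighbor_predict(X, y, 7.5)  # needs X and y
cluster_ids = two_means(X)                                   # needs X only
```

The supervised function cannot run without y, while the clustering function never sees it; that is the whole distinction in miniature.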

Applications of ML modeling span academic fields and industries, from discovering novel protein structures to predicting weather and energy demand. ML also offers great value to businesses in areas such as fraud detection, sales prediction, and customer segmentation.

Three Main Components of Machine Learning Modeling

Main components of ML modeling (image by author)

Now, let’s delve into the three main units of ML modeling.


Data is perhaps the most important component of ML modeling. It is collected based on the business problem to be solved, then preprocessed and explored before it is used for modeling. Data quality depends strongly on the exploration and preprocessing steps, and the original data can be further enriched before modeling.

Data quality is the most important factor that affects predictive performance in ML modeling.

In previous articles, I covered the critical components of data preprocessing, namely data integration, cleaning, and transformation. I have also discussed the pitfalls to avoid during data exploration. The links can be found below:

Data integration:

Data cleaning:

Data transformation:

Data exploration:
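As a rough illustration of the three preprocessing steps named above (integration, cleaning, and transformation), here is a small sketch in plain Python. The two data sources, their fields, and the min_max_scale helper are hypothetical; real projects would typically use a dataframe library, and the right cleaning strategy depends on the problem.

```python
# Two hypothetical data sources keyed by customer id.
customers = {1: {"age": 34}, 2: {"age": None}, 3: {"age": 52}}
orders = {1: {"spend": 120.0}, 2: {"spend": 80.0}, 3: {"spend": 200.0}}

# Integration: join the two sources on the shared key.
merged = [{"id": k, **customers[k], **orders[k]} for k in customers if k in orders]

# Cleaning: drop rows that contain a missing value (one of several possible strategies).
clean = [row for row in merged if all(v is not None for v in row.values())]


# Transformation: min-max scale a numeric column to the [0, 1] range.
def min_max_scale(values):
    lo, hi = min(values), max(values)
    # Assumes hi > lo; a constant column would need special handling.
    return [(v - lo) / (hi - lo) for v in values]


scaled_spend = min_max_scale([row["spend"] for row in clean])
```

Dropping the row with the missing age is the simplest choice here; imputing a value instead is equally common and keeps more data.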


The algorithm is the component of ML modeling that is fitted to the data to learn the patterns and relationships between the features and the target variable. Many ML algorithms exist, and they differ in how they learn from data. Examples include decision trees, random forests, k-means, DBSCAN, and neural networks.

In data science practice, different algorithms are considered when solving a given problem. This process is experimental and highly problem-dependent.

Scikit-learn is one of the most widely used Python libraries for implementing ML algorithms. Other popular Python libraries for ML modeling include LightGBM, XGBoost, TensorFlow, Keras, PyTorch, and Lightning.

The fit method is the main characteristic function of an algorithm: in scikit-learn, for example, every estimator is trained by calling fit on the data.
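To show what fitting means, here is a toy estimator written in plain Python. The class SimpleLinearRegression and its attribute names are hypothetical and written for this illustration; it is not scikit-learn code, although returning self from fit and ending learned attributes with an underscore imitate scikit-learn's conventions.

```python
class SimpleLinearRegression:
    """Toy one-feature least-squares regressor (illustrative only)."""

    def fit(self, X, y):
        n = len(X)
        mean_x = sum(X) / n
        mean_y = sum(y) / n
        # Closed-form ordinary least squares for a single feature.
        cov_xy = sum((x - mean_x) * (t - mean_y) for x, t in zip(X, y))
        var_x = sum((x - mean_x) ** 2 for x in X)
        self.slope_ = cov_xy / var_x
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self  # returning self allows call chaining


X = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

algorithm = SimpleLinearRegression()  # unfitted: no learned parameters yet
model = algorithm.fit(X, y)           # fitted: slope_ and intercept_ now exist
```

Before fit is called the object has no slope or intercept at all; the learned parameters exist only because of the data.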


An ML model is an object that results from fitting an algorithm to the data. It has been trained to identify data patterns and can make predictions where necessary.

In practice, an experiment is conducted using different values of an algorithm's hyperparameters to create several models, which are then evaluated using different metrics. The “best” model is chosen and deployed to production.
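A minimal sketch of such an experiment, in plain Python, might look like the following. The data, the k-nearest-neighbors helper, and the choice of accuracy as the only metric are assumptions made for illustration; the hyperparameter here is k, the number of neighbors that vote.

```python
def knn_predict(X_train, y_train, x_new, k):
    """Predict by majority vote among the k nearest training points (1-D feature)."""
    neighbors = sorted(zip(X_train, y_train), key=lambda pair: abs(pair[0] - x_new))[:k]
    votes = [label for _, label in neighbors]
    return max(set(votes), key=votes.count)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical training data with two noisy labels (at x = 3 and x = 7).
X_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_train = [0, 0, 1, 0, 1, 1, 0, 1]

# Held-out validation data used only for evaluation.
X_val = [1.2, 2.6, 6.6, 7.9]
y_val = [0, 0, 1, 1]

# One "model" per hyperparameter value; odd k avoids tied votes with two classes.
scores = {}
for k in (1, 3, 5):
    predictions = [knn_predict(X_train, y_train, x, k) for x in X_val]
    scores[k] = accuracy(y_val, predictions)

# Pick the value with the highest validation accuracy (the first one wins a tie).
best_k = max(scores, key=scores.get)
```

With k = 1 the model copies the noisy labels next to some validation points, so larger values of k score better on this data. On a real problem the ranking could easily differ, which is why the search is run as an experiment.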

The processes involved in creating an ML model are outside the scope of this discussion and will be covered in a different article.

The predict method is the main characteristic function in many ML models.
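As a small illustration of prediction, the sketch below fits a toy classifier and then calls predict on values it has not seen. ThresholdClassifier is a hypothetical class written for this example, and it assumes that class 1 has larger feature values than class 0.

```python
class ThresholdClassifier:
    """Toy one-feature classifier: predicts 1 when the feature exceeds a learned threshold."""

    def fit(self, X, y):
        class_0 = [x for x, label in zip(X, y) if label == 0]
        class_1 = [x for x, label in zip(X, y) if label == 1]
        # Learn the midpoint between the two class means as the decision threshold.
        # Assumption: class 1 has the larger feature values.
        mean_0 = sum(class_0) / len(class_0)
        mean_1 = sum(class_1) / len(class_1)
        self.threshold_ = (mean_0 + mean_1) / 2
        return self

    def predict(self, X):
        # Apply the learned threshold to new, unseen feature values.
        return [1 if x > self.threshold_ else 0 for x in X]


model = ThresholdClassifier().fit([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], [0, 0, 0, 1, 1, 1])
predictions = model.predict([2.5, 6.0])
```

Note that predict takes features only; the target is what the model returns.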


In this article, we covered the main components of machine learning modeling: data, algorithm, and model. We also highlighted the different types of machine learning and some fields of application.

I hope you enjoyed this article. Until next time, cheers!

Three Main Components of Machine Learning Modeling Republished from Source https://towardsdatascience.com/three-main-components-of-machine-learning-modeling-c9186658f7fe?source=rss—-7f60cf5620c9—4 via https://towardsdatascience.com/feed



