
From Random Forests to Gradient Boosting: A Journey Through Ensemble Learning

  • July 25, 2023
  • Posted by: Kulbir Singh
  • Category: Artificial Intelligence, Data Science

Ensemble Learning

Ensemble learning is a powerful machine learning concept that involves combining several models to produce predictions that are more robust than those of the individual models. The idea is to reduce bias and variance, making the combined model more flexible and less sensitive to the quirks of any particular training sample.

For example, imagine you’re trying to predict the weather for tomorrow. Instead of relying on a single forecast model, you use several models (like a decision tree, logistic regression, and a neural network). Each model makes its own prediction, and the final prediction is based on the majority vote of all models. This is the essence of ensemble learning.
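To make this concrete, here is a minimal sketch of a majority-vote ensemble built with scikit-learn's VotingClassifier. The synthetic dataset, the choice of base models, and the hyperparameters below are illustrative assumptions, not part of the original example.

```python
# A minimal majority-vote ensemble sketch (synthetic data and models are illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier

# Synthetic binary classification data standing in for the "weather" example.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different base models, combined by hard (majority) voting.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=42)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```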

Two popular ensemble methods are bagging and boosting:

Bagging: This method involves training several models in parallel, each on a random subset of the data. The final prediction is based on the majority vote of all models. A common example of bagging is the Random Forest algorithm.

Boosting: This method involves training several models sequentially, where each model learns from the mistakes of the previous one. The final prediction is a weighted vote of all models. Examples of boosting algorithms include AdaBoost and Gradient Boosting.

Bagging

Bagging methods work best on high variance, low bias models (i.e., complex models). This is because bagging methods aim to decrease the model’s variance without increasing the bias. Therefore, bagging methods are usually applied to complex models which tend to have high variance.

Bagging is used when the goal is to reduce the variance of a decision tree classifier. The objective is to create several subsets of data from the original dataset (sampled with replacement), train a decision tree on each subset, and then aggregate the outputs of all the trees to make a final decision. The aggregation usually takes the form of voting for classification problems and averaging for regression problems. Because the models are trained independently and in parallel, bagging is a simple way to create diverse models: there is no sequential dependence between them and no learning rate to tune.
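As an illustration, here is a minimal bagging sketch using scikit-learn's BaggingClassifier, which draws bootstrap subsets and aggregates the trees by voting. The dataset and parameter values are assumptions made for the example.

```python
# A minimal bagging sketch: bootstrap subsets of the data, one tree per subset,
# majority vote over the trees (data and parameters are illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # complex, high-variance base model
                                         # ("estimator" in scikit-learn >= 1.2;
                                         #  older versions use "base_estimator")
    n_estimators=100,                    # number of bootstrap subsets / trees
    bootstrap=True,                      # sample with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))
```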

Algorithms that fall under bagging include:

  1. Random Forest

Random Forest

Random Forest is an ensemble learning method that operates by constructing multiple decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. In addition to training each tree on a bootstrapped sample of the data, Random Forest considers only a random subset of features at each split, which further decorrelates the trees.

Here’s a step-by-step breakdown of how Random Forest works:

  1. Step 1: Select n random subsets from the training set (sampled with replacement).
  2. Step 2: Train n decision trees, one on each random subset.
  3. Step 3: Each individual tree independently predicts every record in the test set.
  4. Step 4: The final prediction is the class that receives the majority of votes from all the decision trees (or the average of the trees' outputs for regression).
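These steps are exactly what scikit-learn's RandomForestClassifier does under the hood. A minimal sketch follows; the data and hyperparameters are assumed for illustration.

```python
# Random Forest sketch: n bootstrapped trees, majority vote at prediction time.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

forest = RandomForestClassifier(
    n_estimators=200,      # Steps 1-2: number of random subsets / trees
    max_features="sqrt",   # random subset of features considered at each split
    bootstrap=True,        # subsets are sampled with replacement
    random_state=1,
)
forest.fit(X_train, y_train)

# Steps 3-4: each tree votes; the majority class is returned.
print("Random Forest accuracy:", forest.score(X_test, y_test))
```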

Boosting

Boosting methods typically work best on high bias, low variance models (i.e., simple models). Boosting methods aim to decrease the model’s bias without impacting its variance. Therefore, boosting methods are usually applied on simple models which tend to have high bias.

Boosting is used when the goal is primarily to reduce bias (and, to some extent, variance). Like bagging, boosting builds an ensemble of trees stage by stage, but the key difference is that the trees are not fit on completely resampled versions of the original data. Instead, each tree is fit on a modified version of the data, where the modification at each step is determined by the performance of the previously grown trees. Boosting tends to give higher accuracy than bagging, but it is also more prone to overfitting the training data.

Algorithms that fall under boosting include:

  1. AdaBoost (Adaptive Boosting)
  2. Gradient Boosting

 AdaBoost (Adaptive Boosting)

AdaBoost, short for Adaptive Boosting, is a boosting algorithm most commonly used for classification. It is called 'Adaptive' because it adapts to the data it is currently training on: examples that earlier models misclassified are given more weight in later rounds.

Here’s a step-by-step breakdown of how AdaBoost works:

  1. Step 1: Initialize the weights of the data points. If the training set has 100 data points, each point's initial weight is 1/100 = 0.01.
  2. Step 2: Train a decision tree (typically a shallow one, such as a stump) on the weighted data.
  3. Step 3: Calculate the weighted error rate (e) of the decision tree. This is the fraction of wrong predictions, where each wrong prediction counts in proportion to its data point's weight: the higher the weight, the more that error contributes to e.
  4. Step 4: Calculate this decision tree's weight in the ensemble: weight of the tree = learning rate * log((1 - e) / e). The higher a tree's weighted error rate, the less decision power it is given during the later voting; the lower its error rate, the more decision power it gets.
  5. Step 5: Update the weights of the wrongly classified points. If the model got a data point right, its weight stays the same; if the model got it wrong, the new weight = old weight * np.exp(weight of this tree). The weights are then renormalized so they sum to 1.
  6. Step 6: Repeat Steps 2-5 until the number of trees we set to train is reached.
  7. Step 7: Make the final prediction. AdaBoost predicts by multiplying each tree's prediction by that tree's weight, adding up the results, and taking the class with the largest weighted vote.
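The steps above translate almost line for line into code. Here is a minimal from-scratch sketch using decision stumps from scikit-learn, with binary labels encoded as -1/+1; the synthetic dataset, number of trees, and learning rate are assumptions made for the example.

```python
# A from-scratch AdaBoost sketch following the steps above (labels in {-1, +1}).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=7)
y = np.where(y == 1, 1, -1)              # encode labels as -1 / +1

n_trees, learning_rate = 50, 1.0
weights = np.full(len(X), 1 / len(X))    # Step 1: equal initial weights
trees, tree_weights = [], []

for _ in range(n_trees):
    # Step 2: train a shallow tree (stump) on the weighted data
    tree = DecisionTreeClassifier(max_depth=1)
    tree.fit(X, y, sample_weight=weights)
    pred = tree.predict(X)

    # Step 3: weighted error rate e
    e = np.sum(weights[pred != y]) / np.sum(weights)
    e = np.clip(e, 1e-10, 1 - 1e-10)     # avoid division by zero / log(0)

    # Step 4: this tree's weight in the ensemble
    alpha = learning_rate * np.log((1 - e) / e)

    # Step 5: increase the weights of misclassified points, then renormalize
    weights = np.where(pred != y, weights * np.exp(alpha), weights)
    weights /= weights.sum()

    trees.append(tree)
    tree_weights.append(alpha)

# Step 7: weighted vote of all trees
def adaboost_predict(X_new):
    scores = sum(a * t.predict(X_new) for a, t in zip(tree_weights, trees))
    return np.sign(scores)

print("Training accuracy:", np.mean(adaboost_predict(X) == y))
```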

Gradient Boosting

Gradient Boosting is another boosting model. Unlike AdaBoost, which tweaks the instance weights at every iteration, this method tries to fit each new predictor to the residual errors made by the previous predictor.

Here’s a step-by-step breakdown of how Gradient Boosting works:

  1. Step 1: Train a decision tree on the data.
  2. Step 2: Apply the decision tree just trained to make predictions.
  3. Step 3: Calculate the residuals of this decision tree and save the residual errors as the new target y.
  4. Step 4: Repeat Steps 1-3, each time fitting the next tree to the current residuals, until the number of trees we set to train is reached.
  5. Step 5: Make the final prediction. Gradient Boosting predicts by simply adding up the predictions of all the trees.
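The same recipe in code: a minimal from-scratch gradient boosting sketch for regression, where each tree is fit to the residuals of the running prediction. The shrinkage (learning rate), tree depth, and synthetic data are assumptions for illustration, and the model starts from a prediction of zero rather than the usual mean of y.

```python
# A from-scratch gradient boosting sketch for regression (squared error),
# following the steps above; data and hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=3)

n_trees, learning_rate = 100, 0.1
prediction = np.zeros(len(y))            # running sum of tree predictions
residual = y.copy()                      # initially the residual is y itself
trees = []

for _ in range(n_trees):
    # Steps 1-2: fit a small tree to the current residuals and predict
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residual)
    update = tree.predict(X)

    # Step 3: shrink the update and recompute the residuals (the new y)
    prediction += learning_rate * update
    residual = y - prediction
    trees.append(tree)

# Step 5: final prediction = sum of (shrunken) tree predictions
def gb_predict(X_new):
    return sum(learning_rate * t.predict(X_new) for t in trees)

print("Training RMSE:", np.sqrt(np.mean((gb_predict(X) - y) ** 2)))
```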

In conclusion, ensemble methods like Random Forest, AdaBoost, and Gradient Boosting are powerful tools for improving the performance of machine learning models. They combine the predictions of multiple models to produce a final prediction that is more robust and accurate.


Author: Kulbir Singh

I am an analytics and data science professional with over two decades of experience in IT, specializing in leveraging data for strategic decision-making and actionable insights. Proficient in AI and experienced across healthcare, retail, and finance, I have led impactful projects that improved healthcare quality and reduced costs. Recognized with international achievements and multiple awards, I founded AIBoard (https://aiboard.io/), where I author educational articles and courses on AI. I hold a Master's in Computer Science with a specialization in Data Science from the University of Illinois at Urbana-Champaign, and I drive innovation, mentor teams, and contribute to AI and healthcare advancement through publications and speaking engagements. In addition to my professional work, I am active in multiple IT communities, contribute as a blogger and educator, and serve on the judging committee for the Globee Awards.
