Deep learning vs. XGBoost: Rivalry revealed

In most data science contests, the top scorers tend to use one of two algorithms: XGBoost or deep learning. However, we all know that no single algorithm works across all types of data sets. So how do we choose, and what do we choose? It’s always a trade-off based on the outcome you are looking for.

A quick background

Before we evaluate XGBoost and deep learning, here’s a quick background on the bias-variance trade-off. Bias is the error introduced by wrong (biased) assumptions in the learning algorithm, which leads to a systematic difference between the predicted result and the actual one. High bias can lead to ‘underfitting’, where the model misses critical and relevant information that affects the targeted output.

Variance is the error caused by sensitivity to even the smallest fluctuations in the training set: if a model picks up irrelevant information from a small set of data points, it will make wrong predictions on new data points. In contrast to underfitting, this situation is called ‘overfitting’.
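To make underfitting and overfitting concrete, here is a minimal Python sketch on made-up data. The data generator (y = 2x plus noise), the three toy models and all function names are assumptions chosen for illustration; none of them come from a library or from a real contest.

```python
import random

def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n, rng):
    """Hypothetical data: y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

def fit_constant(xs, ys):
    """High bias: ignores x and always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    """High variance: memorizes the training set and returns the
    target of the single nearest training point, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

rng = random.Random(0)
train_x, train_y = make_data(30, rng)
test_x, test_y = make_data(200, rng)

results = {}
for name, fit in [("constant", fit_constant),
                  ("line", fit_line),
                  ("nearest", fit_nearest)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:9s} train MSE={results[name][0]:.3f}  "
          f"test MSE={results[name][1]:.3f}")
```

On data like this, the constant model typically shows a high error on both sets (underfitting), while the nearest-neighbour model scores a perfect zero on the training set but does noticeably worse on unseen points (overfitting).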


In short, a model with low bias that is no more complex than the data warrants (and so has low variance) will be more reliable. With that as the backdrop, let’s evaluate the two algorithms.

How does XGBoost work?

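XGBoost is a powerful gradient boosting algorithm that works well on both basic and more complex prediction problems. It builds a highly accurate model by combining many relatively weak and inaccurate ones, usually shallow decision trees. In other words, it creates a strong learner by adding one weak learner per iteration, each chosen to correct the errors of the ensemble built so far.

After each weak learner is added, the next one concentrates on what the ensemble still gets wrong. In classic boosting (AdaBoost) this is done by reweighting the data: misclassified examples gain weight and correctly classified examples lose weight, which affects future classification. XGBoost, being a gradient boosting method, gets a similar effect by fitting each new tree to the gradient of the loss with respect to the current predictions, so poorly predicted examples influence the next tree the most.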

Looking at it mathematically, let us define an objective function that measures the performance of the model:

$$\text{obj}(\theta) = L(\theta) + \Omega(\theta)$$

where L is the training loss function and Ω is the regularization term. The training loss measures how predictive our model is on the training data, while the regularization term penalizes model complexity.
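As a rough illustration of how the two terms of such an objective combine, here is a small Python sketch. It assumes a squared-error training loss and the per-tree complexity used in the XGBoost documentation, Ω(f) = γT + ½λ Σ w_j² for a tree with T leaves and leaf weights w_j. The function names and the toy numbers are hypothetical.

```python
def training_loss(y_true, y_pred):
    """Training loss L: squared error summed over examples
    (an assumed choice; other losses plug in the same way)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

def tree_complexity(leaf_weights, gamma=1.0, lam=1.0):
    """Regularization for one tree:
    Omega(f) = gamma * T + 0.5 * lambda * sum(w_j ** 2),
    where T is the number of leaves and w_j are the leaf weights."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(y_true, y_pred, trees, gamma=1.0, lam=1.0):
    """obj = L + Omega, with Omega summed over all trees in the model."""
    penalty = sum(tree_complexity(t, gamma, lam) for t in trees)
    return training_loss(y_true, y_pred) + penalty

# Toy numbers: three examples and a model made of two small trees,
# each tree described only by its leaf weights.
y_true = [1.0, 0.0, 2.0]
y_pred = [0.5, 0.0, 1.0]
trees = [[0.5, -0.5], [1.0, 0.0, -1.0]]

print(objective(y_true, y_pred, trees))  # 1.25 (loss) + 6.25 (penalty) = 7.5
```

A model with more leaves or larger leaf weights pays a higher penalty, so it has to earn that extra complexity through a lower training loss.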

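The prediction scores of the individual trees are summed up to get the final score. So, we can write our model in the form

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}$$

where K is the number of trees, f_k is a function in the functional space F, and F is the set of all possible CARTs (classification and regression trees). Therefore, our objective to optimize can be written as

$$\text{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

where l is the loss on a single training example and n is the number of training examples.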
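We train the model additively: we keep what has been learned so far fixed and add one new tree at a time. Writing the prediction for example i at step t as ŷ_i^(t), we get:

$$\hat{y}_i^{(0)} = 0, \qquad \hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$

At each step, we add the tree f_t that minimizes our objective function:

$$\text{obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{constant}$$

where l is the loss on a single training example, Ω is the regularization term, and the constant collects the regularization of the trees already added.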
This way, XGBoost builds a strong learner out of weak learners, improving accuracy with each tree it adds.
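The idea of adding one tree at a time can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in, not XGBoost itself: it assumes a squared-error loss (so each new tree is fit to the current residuals, which are the negative gradient of that loss), uses one-split trees ("stumps") on a single feature, and leaves out the regularization term and the second-order information that XGBoost uses. All names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a weak learner) to the residuals
    by trying every midpoint between consecutive distinct x values."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value

def boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Additive training: start from a prediction of 0 and repeatedly
    add a stump fitted to what the current ensemble still gets wrong."""
    trees = []
    preds = [0.0] * len(xs)
    history = []  # training MSE after each added tree
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def model(x):
        return sum(learning_rate * t(x) for t in trees)

    return model, history

# Toy data: a curve that no single stump can fit well.
xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]

model, history = boost(xs, ys)
print(f"training MSE after 1 tree:   {history[0]:.4f}")
print(f"training MSE after {len(history)} trees: {history[-1]:.4f}")
```

Each stump on its own is a poor model of the curve, but the training error keeps falling as more of them are added. The learning rate shrinks each tree's contribution, a common way to make the ensemble less prone to chasing noise.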

How does deep learning work?

Deep nets, which are built on the principle of neural networks, power applications such as automatic face detection and self-driving cars. A neural network (NN) is loosely modelled on the working of the human nervous system.

A neural network has three types of layers:

  • Input Layer: The training observations are fed through these neurons.
  • Hidden Layers: These are the intermediate layers between input and output which help the Neural Network learn the complicated relationships involved in data.
  • Output Layer: The final output is produced here from the output of the last hidden layer. For example, in a classification problem with 5 classes, the output layer will have 5 neurons.
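As an illustration of how the three kinds of layers fit together, here is a minimal forward pass in plain Python for the 5-class case mentioned above. The layer sizes (4 inputs, 3 hidden neurons), the ReLU and softmax activations, the random untrained weights and all function names are assumptions made for the sketch.

```python
import math
import random

def linear(inputs, weights, biases):
    """Weighted sum plus bias for every neuron in a layer.
    Each row of `weights` holds the weights of one neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Hidden-layer activation (an assumed choice)."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turns the output layer's scores into class probabilities."""
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_inputs, n_neurons, rng):
    """Untrained layer with random weights and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    return weights, [0.0] * n_neurons

rng = random.Random(42)
hidden_w, hidden_b = random_layer(4, 3, rng)   # input (4) -> hidden (3)
output_w, output_b = random_layer(3, 5, rng)   # hidden (3) -> output (5)

def forward(observation):
    hidden = relu(linear(observation, hidden_w, hidden_b))
    return softmax(linear(hidden, output_w, output_b))

probabilities = forward([0.2, -1.0, 0.5, 3.0])
print(len(probabilities), "class probabilities, summing to",
      round(sum(probabilities), 6))
```

Because the weights are random, the probabilities mean nothing yet; training is what adjusts them so that the right class gets the highest value.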

This is how a neuron works:

  • x1, x2, …, xN: Inputs to the neuron. These can either be the actual observations from the input layer or intermediate values from one of the hidden layers.
  • x0: Bias unit. This is a constant value added to the input of the activation function. It works like an intercept term and typically has the value +1.
  • w0, w1, w2, …, wN: Weights on each input. Note that even the bias unit carries a weight.
  • a: Output of the neuron, calculated as:

$$a = f\left(\sum_{i=0}^{N} w_i x_i\right)$$

* Here “f” is known as the activation function.
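The neuron described above can be written as a short Python function. The sigmoid is used as the activation function f purely as an assumption for this sketch (the text does not fix a particular f), and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """One common choice of activation function f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, activation=sigmoid):
    """Compute a = f(w0*x0 + w1*x1 + ... + wN*xN) with bias unit x0 = +1.

    inputs:  x1..xN
    weights: w0..wN (one more than inputs; w0 belongs to the bias unit)
    """
    if len(weights) != len(inputs) + 1:
        raise ValueError("expected one weight per input plus a bias weight")
    extended = [1.0] + list(inputs)  # prepend the bias unit x0 = +1
    z = sum(w * x for w, x in zip(weights, extended))
    return activation(z)

# Weighted sum: 0.5*1 + 1.0*2.0 + 2.5*(-1.0) = 0.0, and sigmoid(0) = 0.5
print(neuron_output([2.0, -1.0], [0.5, 1.0, 2.5]))  # 0.5
```

Swapping in a different activation only means passing another function as `activation`; the weighted sum stays the same.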


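After each iteration, the network updates the weights of each neuron to reduce the error. The gradient of the error with respect to every weight is computed with the chain rule, working backwards from the output layer. This is known as backpropagation.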
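The weight update can be sketched for the simplest possible case, a single sigmoid neuron, where the chain rule has only three links. Real backpropagation repeats the same step layer by layer through the whole network. The squared-error measure, the logical OR toy data, the learning rate and all names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, inputs):
    """weights[0] is the bias weight w0; the rest pair up with the inputs."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], inputs)))

def total_error(weights, data):
    """E = 0.5 * sum of (a - y)^2 over all examples."""
    return 0.5 * sum((predict(weights, x) - y) ** 2 for x, y in data)

def train(data, epochs=5000, learning_rate=1.0):
    weights = [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        gradients = [0.0] * len(weights)
        for inputs, target in data:
            a = predict(weights, inputs)
            # Chain rule: dE/dw_i = dE/da * da/dz * dz/dw_i
            #   dE/da   = a - target
            #   da/dz   = a * (1 - a)   (derivative of the sigmoid)
            #   dz/dw_i = x_i           (x0 = 1 for the bias weight)
            delta = (a - target) * a * (1.0 - a)
            gradients[0] += delta
            for i, x in enumerate(inputs, start=1):
                gradients[i] += delta * x
        # Step against the gradient to reduce the error.
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    return weights

# Toy task: learn the logical OR of two inputs.
or_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

trained = train(or_data)
for inputs, target in or_data:
    print(inputs, "target", target, "prediction",
          round(predict(trained, inputs), 3))
```

The error starts at 0.5 with all weights at zero and shrinks as the updates accumulate, which is the "rectifying" of errors the text refers to.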

XGBoost or Deep Net?

Which model to pick depends on your data set and its type.

XGBoost is fast and works well on smaller, structured data sets with fewer variables. Its parameters are comparatively easy to tune, and it can sometimes be used for text mining problems as well. On noisy real-world data sets, however, it can perform very badly, because boosting has a high chance of fitting the noise. It also tends to predict poorly when the number of independent variables is large, and its accuracy may not keep improving as the data scales, compared with algorithms such as random forest or SVM.

Deep net, on the other hand, can often improve accuracy dramatically compared to most algorithms. It works on both structured and unstructured data, in tasks such as text mining, image recognition and distinguishing sound waves. But tuning the parameters of a deep net takes a lot of time and experience; without them, it may lead to bad results.

To sum up, if you have fewer data points and not much noise, go for XGBoost. If you are working on large “enough” data sets, you can bet on deep net.

Looking forward to hearing about your experiences and experiments.

About the Author

Radhakrishnan is curious about algorithms and modelling, proficient in Python and an ardent data science buff. He enjoys food, driving, travelling and spending time with books.