How to #3: Tree Optimization for XGBoost

Adam Davis
5 min read · Jan 16, 2023
Henri Matisse. The Codomas (Les Codomas), 1943

“Exactitude is not truth” — Henri Matisse

How do we know that XGBoost is training correctly? Is there a way to visualize what it is scoring as it trains? The short answer is yes.

Visualizing training and test accuracy, AUC, or F-scores during training and testing is usually associated with neural networks and frameworks like TensorFlow. There it is expected, and knowing the number of iterations (epochs) is very helpful: we can tell whether our model needs more training or whether it would still predict effectively with fewer epochs.

If no maximum number of estimators (trees) is specified, the algorithm defaults to 100. We can set the number of estimators ourselves, which can help break through possible local minima or maxima. There is also an early-stopping option: if the score does not improve for a specified number of rounds, training terminates. Either approach would work in this situation, depending on what is needed. Here we are trying to maximize the AUC (area under the curve).
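As a rough sketch, the two options look something like this (the parameter values are illustrative, and depending on the XGBoost version early_stopping_rounds may need to be passed to fit() instead of the constructor):

```python
from xgboost import XGBClassifier

# Option 1: cap the number of trees ourselves instead of the default 100.
model = XGBClassifier(n_estimators=1000, eval_metric="auc")

# Option 2: keep a high cap, but stop early if the evaluation AUC
# has not improved for 50 consecutive rounds.
model = XGBClassifier(
    n_estimators=1000,
    eval_metric="auc",
    early_stopping_rounds=50,
)
```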

Setting up the training:

We first do a standard split of our pre-processed data, holding out 30% for testing. The model is trained only on the 70% we allotted for training. Both the training and test sets are passed as evaluation sets so that the metric is visible across all 1,000 estimators we have chosen, as sketched below. During training, XGBoost prints the evaluation metric for each set at every boosting round.
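A minimal sketch of that setup, assuming X and y hold the pre-processed features and binary labels (these names are placeholders, not from the original data):

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# 70/30 train/test split of the pre-processed data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

model = XGBClassifier(n_estimators=1000, eval_metric="auc")

# Passing both sets via eval_set makes XGBoost report the AUC for each
# of them after every tree is added.
model.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_test, y_test)],
    verbose=True,
)
```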

validation_0-auc is the AUC on the training data and validation_1-auc is the AUC on the test data, each evaluated at every tree during training. The results of this training can be indexed as well.
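The fitted model keeps this history, and retrieving it (continuing from the snippet above) might look like this:

```python
# The evaluation history follows the eval_set order:
# "validation_0" is the training set, "validation_1" the test set.
results = model.evals_result()
train_auc = results["validation_0"]["auc"]
test_auc = results["validation_1"]["auc"]
print(test_auc[:5])  # test-set AUC for the first five trees
```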

The resulting plot of the training cycle of the 1,000 estimators:
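One way to produce such a plot from the lists retrieved above (a sketch using matplotlib, not necessarily the original plotting code):

```python
import matplotlib.pyplot as plt

# AUC history for both evaluation sets across the 1,000 trees.
plt.plot(train_auc, label="Train AUC (validation_0)")
plt.plot(test_auc, label="Test AUC (validation_1)")
plt.xlabel("Number of trees")
plt.ylabel("AUC")
plt.legend()
plt.show()
```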

It is visible that more estimators do not mean a better AUC; in this case, the fewer the better. Because the training cycle can be indexed, the maximum AUC on our test set can be located directly:
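A sketch of locating it, reusing the test_auc list from above:

```python
import numpy as np

best_index = int(np.argmax(test_auc))  # tree index with the best test AUC
best_n_trees = best_index + 1          # number of trees to keep
print(f"Best test AUC {test_auc[best_index]:.4f} with {best_n_trees} trees")
```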

Training is as follows:
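Retraining with just that many trees might look like the following, again building on the earlier names:

```python
from xgboost import XGBClassifier

# Refit using only the number of trees found above (seven in this run).
model = XGBClassifier(n_estimators=best_n_trees, eval_metric="auc")
model.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_test, y_test)],
    verbose=True,
)
```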

Only seven trees were needed to train our model effectively. We can then view the metrics through a confusion matrix:
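A sketch of computing it with scikit-learn (the exact presentation in the original may differ):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_pred = model.predict(X_test)  # uses the default 0.5 probability threshold
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F-score: ", f1_score(y_test, y_pred))
```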

Accuracy could of course be better, though it is not always the best metric for binary classification; in this case we will use the F-score as our target metric. Most out-of-the-box algorithms use a 0.5 probability threshold to decide whether an observation belongs to the positive class in a binary classification problem. It is possible to affect the predictions by tuning this probability threshold as well.

Tuning the probability threshold:

The probability threshold is checked 1,000 times between 0 and 1 in steps of 0.001. Each metric (AUC, accuracy, and F-score) is tabulated at every step so that the threshold producing the best value of the chosen metric can be selected.
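A sketch of that sweep, tabulating accuracy and F-score at each candidate threshold (variable names carried over from the earlier snippets):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Probability of the positive class for every test observation.
y_proba = model.predict_proba(X_test)[:, 1]

scores = []
for threshold in np.arange(0.001, 1.0, 0.001):
    y_pred = (y_proba >= threshold).astype(int)
    scores.append((
        threshold,
        accuracy_score(y_test, y_pred),
        f1_score(y_test, y_pred, zero_division=0),
    ))

# Threshold that maximizes the F-score.
best_threshold, best_acc, best_f1 = max(scores, key=lambda row: row[2])
print(f"Best F-score {best_f1:.4f} at threshold {best_threshold:.3f}")
```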

Maximum of each metric

In this case we will choose a threshold of 0.267 to maximize our F-score. The following code applies it and displays the resulting confusion matrix:
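A sketch of what that code might look like, again assuming y_proba and y_test from above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, f1_score

chosen_threshold = 0.267  # threshold that maximized the F-score in the sweep
y_pred_tuned = (y_proba >= chosen_threshold).astype(int)

print("F-score:", f1_score(y_test, y_pred_tuned))
ConfusionMatrixDisplay.from_predictions(y_test, y_pred_tuned)
plt.show()
```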

This improves the model: the F-score rises by almost two full percentage points. We can see that the negative class was predicted almost perfectly while the positive class sits at about 84% accuracy. This is a good starting point for thinking about a direction to proceed and how the model can be used. Ideally we would like 100% accuracy, but that is highly unlikely. Since we want to maximize our positive cases, it matters most that the actual positive cases were predicted at almost 100%; a majority of the actual negative cases, however, were predicted incorrectly. This is still an improvement: without this model, essentially all cases would be counted as positive. We can accept some negative cases alongside our positive cases as long as the model is able to predict the negative cases with high accuracy. The metrics are up to the user and can be adjusted according to what saves the most resources.
