Mixed Effects Machine Learning for High-Cardinality Categorical Variables, Part II: A Demo of the GPBoost Library

A demo of GPBoost in Python & R using real-world data

**Illustration of high-cardinality categorical data**: box plots and raw data (red points) of the response variable for different levels of a categorical variable. Image by author.

High-cardinality categorical variables are variables for which the number of different levels is large relative to the sample size of a data set. In Part I of this series, we empirically compared different machine learning methods and found that random effects are an effective tool for handling high-cardinality categorical variables, with the GPBoost algorithm [Sigrist, 2022, 2023] having the highest prediction accuracy. In this article, we demonstrate how the GPBoost algorithm, which combines tree-boosting with random effects, can be applied with the Python and R packages of the GPBoost library. GPBoost version 1.2.1 is used in this demo.

Table of contents

∘ 1 Introduction
∘ 2 Data: description, loading, and sample split
∘ 3 Training a GPBoost model
∘ 4 Choosing tuning parameters
∘ 5 Prediction
∘ 6 Interpretation
∘ 7 Further modeling options
· · 7.1 Interaction between categorical variables and other predictor variables
· · 7.2 (Generalized) linear mixed effects models
∘ 8 Conclusion and references

Applying a GPBoost model involves the following main steps:

1. Define a GPModel in which one specifies the following:
   - A random effects model: grouped random effects via group_data and/or Gaussian processes via gp_coords
   - The likelihood (= distribution of the response variable conditional on fixed and random effects)
2. Create a Dataset containing the response variable (label) and fixed effects predictor variables (data)
3. Choose tuning parameters, e.g., using the function gpb.grid.search.tune.parameters
4. Train the model
5. Make predictions and/or interpret the trained model