Mixed Effects Machine Learning for High-Cardinality Categorical Variables, Part II: A Demo of the GPBoost Library

A demo of GPBoost in Python & R using real-world data

Illustration of high-cardinality categorical data: box plots and raw data (red points) of the response variable for different levels of a categorical variable. Image by author.

High-cardinality categorical variables are variables for which the number of different levels is large relative to the sample size of a data set. In Part I of this series, we empirically compared different machine learning methods and found that random effects are an effective tool for handling high-cardinality categorical variables, with the GPBoost algorithm [Sigrist, 2022, 2023] having the highest prediction accuracy. In this article, we demonstrate how the GPBoost algorithm, which combines tree-boosting with random effects, can be applied with the Python and R packages of the GPBoost library. GPBoost version 1.2.1 is used in this demo.

Table of contents

1 Introduction
2 Data: description, loading, and sample split
3 Training a GPBoost model
4 Choosing tuning parameters
5 Prediction
6 Interpretation
7 Further modeling options
· · 7.1 Interaction between categorical variables and other predictor variables
· · 7.2 (Generalized) linear mixed effects models
8 Conclusion and references

Applying a GPBoost model involves the following main steps:

  1. Define a GPModel in which one specifies the following:
    - A random effects model: grouped random effects via group_data and/or Gaussian processes via gp_coords
    - The likelihood (= distribution of the response variable conditional on fixed and random effects)
  2. Create a Dataset containing the response variable (label) and fixed effects predictor variables (data)
  3. Choose tuning parameters, e.g., using the function
  4. Train the model
  5. Make predictions and/or interpret the trained model

In the following, we go through these points step by step.
