Title: | eXtreme RuleFit |
---|---|
Description: | An implementation of the RuleFit algorithm as described in Friedman & Popescu (2008) <doi:10.1214/07-AOAS148>. eXtreme Gradient Boosting ('XGBoost') is used to build rules, and 'glmnet' is used to fit a sparse linear model on the raw and rule features. The result is a model that learns similarly to a tree ensemble, while often offering improved interpretability and achieving improved scoring runtime in live applications. Several algorithms for reducing rule complexity are provided, most notably hyperrectangle de-overlapping. All algorithms scale to several million rows and support sparse representations to handle tens of thousands of dimensions. |
Authors: | Karl Holub [aut, cre] |
Maintainer: | Karl Holub <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.2 |
Built: | 2024-11-22 03:51:07 UTC |
Source: | https://github.com/holub008/xrf |
Produce rules & coefficients for the RuleFit model
## S3 method for class 'xrf'
coef(object, lambda = "lambda.min", ...)
object | an object of class "xrf" |
lambda | the lasso penalty parameter to be applied as in 'glmnet' |
... | ignored arguments |
m <- xrf(Petal.Length ~ ., iris, xgb_control = list(nrounds = 2, max_depth = 2), family = 'gaussian')
linear_model_coefficients <- coef(m, lambda = 'lambda.1se')
Generate the design matrix from an eXtreme RuleFit model
## S3 method for class 'xrf'
model.matrix(object, data, sparse = TRUE, ...)
object | an object of class "xrf" |
data | data to generate design matrix from |
sparse | a logical indicating whether a sparse design matrix should be used |
... | ignored arguments |
m <- xrf(Petal.Length ~ ., iris, xgb_control = list(nrounds = 2, max_depth = 2), family = 'gaussian')
design <- model.matrix(m, iris, sparse = FALSE)
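The sparse argument can be explored with a short sketch like the one below. It assumes the xrf package is installed and only uses calls documented on this page; the variable names design_sparse and design_dense are illustrative. Based on the package description, the columns are expected to hold the raw features plus the rule features, though the exact column layout is not documented here.

```r
library(xrf)

set.seed(42)  # the cross-validated glmnet fit inside xrf() is random
m <- xrf(Petal.Length ~ ., iris,
         xgb_control = list(nrounds = 2, max_depth = 2),
         family = 'gaussian')

# Build the same design matrix in sparse and dense form
design_sparse <- model.matrix(m, iris, sparse = TRUE)
design_dense <- model.matrix(m, iris, sparse = FALSE)

# Both should have one row per observation in the input data
dim(design_sparse)
dim(design_dense)

# Inspect the storage class of each representation
class(design_sparse)
class(design_dense)
```

A sparse representation is mainly useful when the data has many columns that are mostly zero; on a small dense data set such as iris, either setting should be workable.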
Draw predictions from a RuleFit xrf model
## S3 method for class 'xrf'
predict(
  object,
  newdata,
  sparse = TRUE,
  lambda = "lambda.min",
  type = "response",
  ...
)
object | an object of class "xrf" |
newdata | data to predict on |
sparse | a logical indicating whether a sparse design matrix should be used |
lambda | the lasso penalty parameter to be applied |
type | the type of predicted value produced |
... | ignored arguments |
m <- xrf(Petal.Length ~ ., iris, xgb_control = list(nrounds = 2, max_depth = 2), family = 'gaussian')
predictions <- predict(m, iris)
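As a rough illustration of how the lambda argument affects predictions, the sketch below compares in-sample error at two penalty settings. It assumes the xrf package is installed and that 'lambda.1se' is accepted by predict in the same way it is by coef; rmse() is a helper defined here for illustration and is not part of xrf.

```r
library(xrf)

set.seed(42)  # the cross-validated glmnet fit inside xrf() is random
m <- xrf(Petal.Length ~ ., iris,
         xgb_control = list(nrounds = 2, max_depth = 2),
         family = 'gaussian')

# Predictions at the two cross-validated penalties, flattened to plain vectors
pred_min <- as.numeric(predict(m, iris, lambda = 'lambda.min'))
pred_1se <- as.numeric(predict(m, iris, lambda = 'lambda.1se'))

# Root mean squared error (illustrative helper, not an xrf function)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

rmse(iris$Petal.Length, pred_min)
rmse(iris$Petal.Length, pred_1se)
```

These are in-sample errors on the training data, so they are only a sanity check; a held-out set would be needed for an honest estimate.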
Print an eXtreme RuleFit model
## S3 method for class 'xrf'
print(x, ...)
x | an object of class "xrf" |
... | ignored arguments |
m <- xrf(Petal.Length ~ ., iris, xgb_control = list(nrounds = 2, max_depth = 2), family = 'gaussian')
print(m)
Summarize an eXtreme RuleFit model
## S3 method for class 'xrf'
summary(object, ...)
object | an object of class "xrf" |
... | ignored arguments |
m <- xrf(Petal.Length ~ ., iris, xgb_control = list(nrounds = 2, max_depth = 2), family = 'gaussian')
summary(m)
S3 method for building an "eXtreme RuleFit" model.
See xrf.formula for the preferred entry point.
xrf(object, ...)
object | an object describing the model to be fit |
... | additional arguments |
m <- xrf(Petal.Length ~ ., iris, xgb_control = list(nrounds = 2, max_depth = 2), family = 'gaussian')
See Friedman & Popescu (2008) for a description of the general RuleFit algorithm. This method uses XGBoost to fit a tree ensemble, extracts a ruleset as the conjunction of tree traversals, and fits a sparse linear model to the resulting feature set (including the original feature set) using glmnet.
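To make "the conjunction of tree traversals" concrete, here is a minimal base R sketch of how a single rule could become a binary feature. The split conditions and thresholds are hypothetical, chosen for illustration rather than taken from a fitted XGBoost model, and this is not the package's internal implementation.

```r
# Two hypothetical split conditions, as might be read off one root-to-leaf
# path of a depth-2 tree (thresholds are illustrative only)
cond_a <- iris$Petal.Width < 1.0
cond_b <- iris$Sepal.Length >= 5.0

# A rule is the logical AND of the conditions along the path,
# encoded as a 0/1 indicator per row
rule_1 <- as.integer(cond_a & cond_b)

# The rule is appended as an extra column next to the raw features;
# a sparse linear model would then be fit on this expanded feature set
features <- cbind(
  iris[, c("Sepal.Length", "Sepal.Width", "Petal.Width")],
  rule_1 = rule_1
)
head(features)
```

Each path through each tree would contribute one such indicator column, which is why a sparse linear model is a natural fit for selecting among them.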
## S3 method for class 'formula'
xrf(
  object,
  data,
  family,
  xgb_control = list(nrounds = 100, max_depth = 3),
  glm_control = list(type.measure = "deviance", nfolds = 5),
  sparse = TRUE,
  prefit_xgb = NULL,
  deoverlap = FALSE,
  ...
)
object | a formula prescribing the features to use in the model. Transformation of the response variable is not supported. When using transformations on the input features (not suggested in general), set sparse = FALSE |
data | a data frame with columns corresponding to the formula |
family | the family of the fitted model. One of 'gaussian', 'binomial', 'multinomial' |
xgb_control | a list of parameters for xgboost. Must supply an nrounds argument |
glm_control | a list of parameters for the glmnet fit. Must supply type.measure and nfolds arguments (for the lambda cross-validation) |
sparse | whether a sparse design matrix should be used |
prefit_xgb | an xgboost model (of class xgb.Booster) to be used instead of the model that xrf would otherwise fit |
deoverlap | if TRUE, the tree-derived rules are de-overlapped, so that the resulting rule set contains no overlapping rules |
... | ignored arguments |
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.
m <- xrf(Petal.Length ~ ., iris, xgb_control = list(nrounds = 2, max_depth = 2), family = 'gaussian')
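The control arguments and the deoverlap flag can also be set explicitly. The following is a sketch under the assumption that the xrf package is installed; the restriction to numeric predictors and the variable name m_deoverlap are choices made for this illustration, not requirements stated in the documentation.

```r
library(xrf)

set.seed(42)  # the cross-validated glmnet fit inside xrf() is random

# Fit with de-overlapped rules, spelling out both control lists.
# nrounds is required in xgb_control; type.measure and nfolds are
# required in glm_control.
m_deoverlap <- xrf(
  Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width,
  iris,
  family = 'gaussian',
  xgb_control = list(nrounds = 2, max_depth = 2),
  glm_control = list(type.measure = 'deviance', nfolds = 5),
  deoverlap = TRUE
)

print(m_deoverlap)
```

The small nrounds and max_depth values keep the example fast; the documented defaults (nrounds = 100, max_depth = 3) would generate many more candidate rules.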