---
title: "Implementing tree-based models in `caret`"
author: "Germán Rosati"
output: html_notebook
---

## Objectives

- Introduce the main concepts behind estimating tree-based models (CART, bagging, random forest)
- Show their implementation in `caret`
## The problem

We continue with our central problem: predicting income from the main occupation (`p21`) in the EPH for the second quarter of 2015. This time, however, we want to see whether we can predict non-response. That is, we will train a model that estimates how likely it is that a person does NOT report their income.

We have already preprocessed the data, so we are ready to start.

The first thing to do is load the libraries we will work with:


```{r, message=FALSE}
library(caret)
library(tidyverse)
library(rpart)
```


Then we load the data and tidy up some of the labels:


```{r}
load('../data/EPH_2015_II.RData')

# Recode survey codes as labelled factors
data$pp03i <- factor(data$pp03i, labels=c('1-SI', '2-No', '9-NS'))

data$intensi <- factor(data$intensi, labels=c('1-Sub_dem', '2-SO_no_dem',
                                              '3-Ocup.pleno', '4-Sobreoc',
                                              '5-No trabajo', '9-NS'))

data$pp07a <- factor(data$pp07a, labels=c('0-NC',
                                          '1-Menos de un mes',
                                          '2-1 a 3 meses',
                                          '3-3 a 6 meses',
                                          '4-6 a 12 meses',
                                          '5-12 a 60 meses',
                                          '6-Más de 60 meses',
                                          '9-NS'))

# Target: was labour income imputed (i.e. not reported)?
data <- data %>%
        mutate(imp_inglab1=factor(imp_inglab1, labels=c('non_miss','miss')))
```

Our target variable is now the indicator `imp_inglab1`. We therefore drop `p21`.

```{r}
df_train <- data %>%
        select(-p21)
```

The first thing we will do is create a data partition:

```{r}
set.seed(1234)
tr_index <- createDataPartition(y=df_train$imp_inglab1,
                                p=0.8,
                                list=FALSE)
```

And we generate the two datasets:

```{r}
train <- df_train %>%
        slice(tr_index)

# equivalent in base R: df_train[tr_index, ]

test <- df_train %>%
        slice(-tr_index)

# equivalent in base R: df_train[-tr_index, ]
```
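`createDataPartition()` draws a *stratified* sample, so both partitions preserve the class balance of `imp_inglab1`. A minimal base-R sketch of the same idea on toy data (the vectors below are illustrative, not the EPH data):

```{r}
set.seed(1234)

# Toy outcome with an ~80/20 class imbalance, like non-response data
y <- factor(rep(c("non_miss", "miss"), times = c(800, 200)))

# Stratified 80% sample: draw 80% of the indices *within each class*
idx <- unlist(lapply(split(seq_along(y), y), function(i) {
  sample(i, size = floor(0.8 * length(i)))
}), use.names = FALSE)

# Both splits keep the original class proportions
prop.table(table(y[idx]))
prop.table(table(y[-idx]))
```

An unstratified `sample()` would drift from the 80/20 balance, which matters when, as here, the minority class is the one we care about.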


## Training models (`train()`)
### CART - Classification and Regression Trees

Let's start by training a few simple trees to get a feel for the process. To train a model without tuning hyperparameters, we need to define a `trainControl` object with `method='none'`.


```{r}
fitControl <- trainControl(method = "none", classProbs = FALSE)
```

Now we can train a shallow tree... say, depth 3.

```{r}
# Deliberately trained on the full dataset for now (more on this below)
cart_tune <- train(imp_inglab1 ~ ., 
                 data = df_train, 
                 method = "rpart2", 
                 trControl = fitControl,
                 tuneGrid = data.frame(maxdepth=3),
                 control = rpart.control(minsplit = 1,
                                         minbucket = 1,
                                         cp=0.00000001)
)
```

We can plot it the ugly way:

```{r}
plot(cart_tune$finalModel)
text(cart_tune$finalModel, pretty=1)
```

Or the pretty way:

```{r}
library(rpart.plot)
```

```{r}
rpart.plot(cart_tune$finalModel)
```

Let's check this tree's performance:

```{r}
table(df_train$imp_inglab1)

table(predict(cart_tune, df_train) , df_train$imp_inglab1)
```

What do you conclude from this?
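The metrics behind `caret`'s summaries can be computed by hand from a cross-tab like the one above. A base-R sketch on a made-up 2x2 table (the counts are invented for illustration, not the EPH results):

```{r}
# Rows = predicted class, columns = observed class, as in table(pred, obs)
cm <- matrix(c(1900, 480,    # predicted non_miss
                 10,  30),   # predicted miss
             nrow = 2, byrow = TRUE,
             dimnames = list(pred = c("non_miss", "miss"),
                             obs  = c("non_miss", "miss")))

accuracy    <- sum(diag(cm)) / sum(cm)                      # 1930 / 2420
recall_miss <- cm["miss", "miss"] / sum(cm[, "miss"])       # 30 / 510
spec_miss   <- cm["non_miss", "non_miss"] / sum(cm[, "non_miss"])

# Accuracy looks decent (~0.80) while recall on 'miss' is terrible (~0.06):
# with imbalanced classes, accuracy alone is a misleading summary
round(c(accuracy = accuracy, recall_miss = recall_miss,
        specificity = spec_miss), 3)
```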

Now train a second, more complex tree: `maxdepth=10`.

```{r}
cart_tune <- train(imp_inglab1 ~ . , 
                 data = df_train, 
                 method = "rpart2", 
                 trControl = fitControl,
                 tuneGrid = data.frame(maxdepth=10),
                 control = rpart.control(cp=0.0001)
)
```

```{r}
rpart.plot(cart_tune$finalModel)
```

```{r}
table(predict(cart_tune, df_train) , df_train$imp_inglab1)
```

So far we have been cheating: we trained and evaluated on the same data. Let's now tune the depth parameter properly.
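To see why in-sample evaluation is misleading, here is a tiny base-R sketch (toy data, nothing from the EPH): a "model" that simply memorizes its training labels looks perfect on the rows it saw and performs at chance on new rows.

```{r}
set.seed(42)

# Pure-noise problem: the id carries no information about the label
x <- seq_len(200)                        # a unique id per observation
y <- sample(c("a", "b"), 200, replace = TRUE)

# A "model" that memorizes the training labels by id
memorized <- setNames(y[1:100], x[1:100])
predict_memo <- function(ids) {
  out <- memorized[as.character(ids)]
  out[is.na(out)] <- "a"                 # unseen ids: guess a constant
  unname(out)
}

mean(predict_memo(x[1:100]) == y[1:100])      # training accuracy: 1
mean(predict_memo(x[101:200]) == y[101:200])  # test accuracy: ~0.5
```

An overly deep tree with a tiny `cp` behaves much like this memorizer, which is why we need held-out data (or cross-validation) to judge it.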


### Setting up the evaluation partition

First, we fix the random seed (to make our results reproducible):


```{r}
set.seed(789)
```

We can use the `createFolds()` function to generate the fold indices. Here, with `returnTrain=TRUE`, each element of the resulting list contains the *training* indices for one of the `k=5` folds.

```{r}
cv_index <- createFolds(y = train$imp_inglab1,
                        k=5,
                        list=TRUE,
                        returnTrain=TRUE)
```
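To make the structure of `cv_index` concrete, an unstratified base-R sketch of the same kind of object on a toy sample (the sizes are illustrative):

```{r}
set.seed(789)
n <- 20
k <- 5

# Assign each row to one of k folds at random
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold, the training indices are every row NOT in that fold
cv_index_sketch <- lapply(seq_len(k), function(f) which(fold_id != f))

lengths(cv_index_sketch)  # each element holds n - n/k = 16 training rows
```

Each row appears in exactly `k - 1` of the training sets and is held out exactly once, which is what lets every observation contribute to the validation estimate.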

Finally, we specify the resampling design through the `trainControl()` function:

```{r}
fitControl <- trainControl(
        index=cv_index,
        method="cv",
        number=5
        )
```


And we define the grid of `maxdepth` values to explore:

```{r}
grid <- expand.grid(maxdepth=c(1, 2, 4, 8, 10, 15, 20))
```

And we train the model again:

```{r warning=FALSE}
cart_tune <- train(imp_inglab1 ~ . , 
                 data = train, 
                 method = "rpart2", 
                 trControl = fitControl,
                 tuneGrid = grid,
                 control = rpart.control(cp=0.000001)
)

cart_tune
```


## Selecting the best model

Once hyperparameter tuning is done, we can pick the best model and retrain it on the full training set. The winner turns out to be a tree that looks too complex (`maxdepth=15`), so instead we will choose a somewhat more interpretable one:

```{r}
# No trControl here, so caret falls back to its default resampling
# (bootstrap) to report metrics for the chosen depth
cart_final <- train(imp_inglab1 ~ . , 
                 data = train, 
                 method = "rpart2", 
                 tuneGrid = data.frame(maxdepth=6),
                 control = rpart.control(cp=0.000001)
)

cart_final
```



We can visualize it:

```{r fig.height=12, fig.width=20}
rpart.plot(cart_final$finalModel)
```

And we generate the final predictions on the test set:

```{r}
y_preds <- predict(cart_final, test)
y_preds
```

We build our confusion matrix:

```{r}
confusionMatrix(y_preds, test$imp_inglab1)
```

What can we say about this decision tree? How does it work?

