abline(a = 0, b = 1)

# AUC from the ROCR performance object
auc_perf = performance(pred, measure = "auc")
auc_value = auc_perf@y.values[[1]]
auc_value
```
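The `@y.values` slot access above pulls out the AUC that ROCR has already computed. As a sanity check, the same quantity can be approximated from any set of (fpr, tpr) points by trapezoidal integration; `trapezoid_auc` below is a hypothetical base-R helper for illustration, not part of ROCR:

```r
# Trapezoidal approximation of AUC from ROC curve points (hypothetical helper)
trapezoid_auc <- function(fpr, tpr) {
  o <- order(fpr)                     # ROC points must be sorted by fpr
  fpr <- fpr[o]; tpr <- tpr[o]
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# A perfect classifier's ROC passes through (0, 1):
trapezoid_auc(c(0, 0, 1), c(0, 1, 1))   # 1
# The chance diagonal gives 0.5:
trapezoid_auc(c(0, 1), c(0, 1))         # 0.5
```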
Score the model with the testing data. How accurate are the tree’s predictions?
Repeat part (a), but use the Gini coefficient as the splitting index. How does the new tree compare to the previous one?
Gini(Q) = 1 - sum(p^2); the splitting criterion chooses the split that most reduces this impurity.

0 - all observations belong to one class (a pure node)

1 - all classes are equally likely (the upper bound as the number of classes grows)

For the binary case with class probability x:

$1 - x^{2} - (1 - x)^{2}$
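The bounds above can be checked numerically; `gini_impurity` below is a hypothetical base-R helper written for this note, not an rpart function:

```r
# Gini impurity of a label vector: 1 minus the sum of squared class proportions
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

gini_impurity(c(1, 1, 1, 1))   # pure node: 0
gini_impurity(c(0, 0, 1, 1))   # balanced binary: 0.5, i.e. 1 - x^2 - (1 - x)^2 at x = 0.5
```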
```{r}
pred_test = predict(tree, test_df, type = "class")

conf_mat_test = table(Actual = test_df$MYDEPV, Predicted = pred_test)
conf_mat_test
# Per-class accuracy on the test set (diagonal over row totals)
print(diag(conf_mat_test) / rowSums(conf_mat_test))

# Refit with the Gini splitting index
tree_gini = rpart(
  MYDEPV ~ Price + Income + Age,
  data = train_df,
  method = "class",
  parms = list(split = "gini")
)

printcp(tree_gini)

rpart.plot(
  tree_gini,
  type = 1,
  extra = 106,
  fallen.leaves = TRUE
)
```
One way to prune a tree is according to the complexity parameter associated with the smallest cross-validation error. Prune the new tree in this way using the “prune” function. Which features were actually used in the pruned tree? Why were certain variables not used?
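The lookup used below indexes the tree's `cptable` by the row with the smallest cross-validation error (`xerror`). On a toy stand-in matrix with invented numbers, the idiom works like this:

```r
# Toy stand-in for tree_gini$cptable (numbers invented for illustration)
cptable <- cbind(CP     = c(0.20, 0.05, 0.01),
                 nsplit = c(0, 1, 2),
                 xerror = c(1.00, 0.80, 0.85))
best_cp <- cptable[which.min(cptable[, "xerror"]), "CP"]
best_cp   # 0.05: row 2 has the lowest cross-validation error
```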
```{r}
best_cp <- tree_gini$cptable[which.min(tree_gini$cptable[, "xerror"]), "CP"]
best_cp

pruned_tree = prune(tree_gini, cp = best_cp)

printcp(pruned_tree)

rpart.plot(pruned_tree)
```
Create the confusion matrix for the new model, and compare the performance of the model before and after pruning.
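The `diag(...) / rowSums(...)` idiom used for scoring returns per-class accuracy (each class's recall). On a toy 2x2 confusion matrix with invented counts:

```r
# Toy confusion matrix (invented counts): rows = actual, columns = predicted
cm <- matrix(c(40, 10,
                5, 45),
             nrow = 2, byrow = TRUE,
             dimnames = list(Actual = c("0", "1"), Predicted = c("0", "1")))
diag(cm) / rowSums(cm)   # class "0": 40/50 = 0.8; class "1": 45/50 = 0.9
```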
```{r}
pruned_pred = predict(pruned_tree, test_df, type = "class")
pruned_conf_mat = table(Actual = test_df$MYDEPV, Predicted = pruned_pred)
pruned_conf_mat
print(diag(pruned_conf_mat) / rowSums(pruned_conf_mat))
```