commit 65218abfb1 (parent 417326498e)
2026-02-13 14:03:28 +03:00
159 changed files with 2577567 additions and 2553 deletions


@ -108,7 +108,8 @@ plot(perf)
abline(a = 0, b = 1)
auc_perf = performance(pred, measure = "auc")
auc_value = auc_perf@y.values[[1]]
auc_value
```
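The hunk above stores the AUC slot of the ROCR `performance` object in a named `auc_value`. The number in that slot is the usual rank-based (Wilcoxon) AUC; a base-R sketch on made-up labels and scores, with no ROCR dependency:

```r
# AUC as the Wilcoxon rank statistic, on made-up scores/labels
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)

r <- rank(scores)
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)
auc <- (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc  # 0.75: 3 of the 4 (negative, positive) pairs are ranked correctly
```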
Score the model with the testing data. How accurate are the tree's predictions?
Repeat part (a), but use the Gini coefficient as the splitting index. How does the new tree compare to the previous one?
@ -118,44 +119,5 @@ Repeat part (a), but set the splitting index to the Gini coefficient splitting i
Gini(Q) = 1 - sum(p^2) — the split that maximizes the decrease in impurity is chosen
0 — all samples belong to one class (pure node)
maximum (0.5 for two classes) — all classes equiprobable
For a binary node with class-1 share $x$: $\mathrm{Gini}(x) = 1 - x^2 - (1 - x)^2$
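A quick check of this curve in base R (the sample points are arbitrary):

```r
# binary Gini impurity as a function of the class-1 share x
gini <- function(x) 1 - x^2 - (1 - x)^2
gini(c(0, 0.25, 0.5, 1))  # 0.000 0.375 0.500 0.000 — peak of 0.5 at x = 0.5
```

Both pure nodes (x = 0 and x = 1) score 0, and the equiprobable split x = 0.5 hits the binary maximum of 0.5.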
```{r}
pred_test = predict(tree, test_df, type = "class")
conf_mat_test = table(Actual = test_df$MYDEPV, Predicted = pred_test)
conf_mat_test
# per-class accuracy: diagonal counts over row (actual-class) totals
print(diag(conf_mat_test) / rowSums(conf_mat_test))
tree_gini = rpart(
MYDEPV ~ Price + Income + Age,
data = train_df,
method = "class",
parms = list(split = "gini")
)
printcp(tree_gini)
rpart.plot(
  tree_gini,
  type = 1,
  extra = 106,
  fallen.leaves = TRUE
)
```
One way to prune a tree is according to the complexity parameter associated with the smallest cross-validation error. Prune the new tree in this way using the “prune” function. Which features were actually used in the pruned tree? Why were certain variables not used?
```{r}
best_cp <- tree_gini$cptable[which.min(tree_gini$cptable[, "xerror"]), "CP"]
best_cp
pruned_tree = prune(tree_gini, cp = best_cp)
printcp(pruned_tree)
rpart.plot(pruned_tree)
```
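To answer which features survive pruning, the split variables can be read off the tree's `frame$var` column. A self-contained sketch of the same min-`xerror` rule on rpart's bundled `kyphosis` data (the dataset and seed are illustrative, not the assignment's data):

```r
library(rpart)  # recommended package shipped with R

set.seed(42)  # xerror comes from cross-validation, so fix the seed
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# same pruning rule as above: cp with the smallest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

# split variables actually used; "<leaf>" rows are terminal nodes
used <- setdiff(unique(as.character(pruned$frame$var)), "<leaf>")
used
```

Variables absent from `used` never produced a split that justified its complexity cost, so pruning drops them.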
Create the confusion matrix for the new model, and compare the performance of the model before and after pruning.
```{r}
pruned_pred = predict(pruned_tree, test_df, type = "class")
pruned_conf_mat = table(Actual = test_df$MYDEPV, Predicted = pruned_pred)
pruned_conf_mat
# per-class accuracy for the pruned model
print(diag(pruned_conf_mat) / rowSums(pruned_conf_mat))
```
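For a single before/after number, overall accuracy (diagonal total over grand total) is convenient; a minimal sketch with a made-up 2x2 matrix — in the session above, the same `accuracy()` would be applied to `conf_mat_test` and `pruned_conf_mat`:

```r
# overall accuracy: correct predictions (diagonal) over all predictions
accuracy <- function(m) sum(diag(m)) / sum(m)

m_toy <- matrix(c(50, 10, 5, 35), nrow = 2, byrow = TRUE)  # made-up counts
accuracy(m_toy)  # (50 + 35) / 100 = 0.85
```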