commit 65218abfb1 (parent 417326498e)
2026-02-13 14:03:28 +03:00
159 changed files with 2577567 additions and 2553 deletions


@ -108,7 +108,8 @@ plot(perf)
abline(a = 0, b = 1)
auc_perf = performance(pred, measure = "auc")
auc_value = auc_perf@y.values[[1]]
auc_value
```
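The hunk above stores the AUC slot of the ROCR `performance` object in a named `auc_value`. The number in that slot is the usual rank-based (Wilcoxon) AUC; a base-R sketch on made-up labels and scores, with no ROCR dependency:

```r
# AUC as the Wilcoxon rank statistic, on made-up scores/labels
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)

r <- rank(scores)
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)
auc <- (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc  # 0.75: 3 of the 4 (negative, positive) pairs are ranked correctly
```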
Score the model with the testing data. How accurate are the tree's predictions?
Repeat part (a), but use the Gini coefficient as the splitting index. How does the new tree compare to the previous one?
@ -118,44 +119,5 @@ Repeat part (a), but set the splitting index to the Gini coefficient splitting i
Gini(Q) = 1 - sum(p^2) — the split that maximizes the decrease in impurity is chosen
0 — all samples belong to one class (pure node)
maximum (0.5 for two classes) — all classes equiprobable
For a binary node with class-1 share $x$: $\mathrm{Gini}(x) = 1 - x^2 - (1 - x)^2$
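A quick check of this curve in base R (the sample points are arbitrary):

```r
# binary Gini impurity as a function of the class-1 share x
gini <- function(x) 1 - x^2 - (1 - x)^2
gini(c(0, 0.25, 0.5, 1))  # 0.000 0.375 0.500 0.000 — peak of 0.5 at x = 0.5
```

Both pure nodes (x = 0 and x = 1) score 0, and the equiprobable split x = 0.5 hits the binary maximum of 0.5.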
```{r}
pred_test = predict(tree, test_df, type = "class")
conf_mat_test = table(Actual = test_df$MYDEPV, Predicted = pred_test)
conf_mat_test
# per-class accuracy: diagonal counts over row (actual-class) totals
print(diag(conf_mat_test) / rowSums(conf_mat_test))
tree_gini = rpart(
MYDEPV ~ Price + Income + Age,
data = train_df,
method = "class",
parms = list(split = "gini")
)
printcp(tree_gini)
rpart.plot(
  tree_gini,
  type = 1,
  extra = 106,
  fallen.leaves = TRUE
)
```
One way to prune a tree is according to the complexity parameter associated with the smallest cross-validation error. Prune the new tree in this way using the “prune” function. Which features were actually used in the pruned tree? Why were certain variables not used?
```{r}
best_cp <- tree_gini$cptable[which.min(tree_gini$cptable[, "xerror"]), "CP"]
best_cp
pruned_tree = prune(tree_gini, cp = best_cp)
printcp(pruned_tree)
rpart.plot(pruned_tree)
```
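To answer which features survive pruning, the split variables can be read off the tree's `frame$var` column. A self-contained sketch of the same min-`xerror` rule on rpart's bundled `kyphosis` data (the dataset and seed are illustrative, not the assignment's data):

```r
library(rpart)  # recommended package shipped with R

set.seed(42)  # xerror comes from cross-validation, so fix the seed
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# same pruning rule as above: cp with the smallest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

# split variables actually used; "<leaf>" rows are terminal nodes
used <- setdiff(unique(as.character(pruned$frame$var)), "<leaf>")
used
```

Variables absent from `used` never produced a split that justified its complexity cost, so pruning drops them.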
Create the confusion matrix for the new model, and compare the performance of the model before and after pruning.
```{r}
pruned_pred = predict(pruned_tree, test_df, type = "class")
pruned_conf_mat = table(Actual = test_df$MYDEPV, Predicted = pruned_pred)
pruned_conf_mat
# per-class accuracy for the pruned model
print(diag(pruned_conf_mat) / rowSums(pruned_conf_mat))
```
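For a single before/after number, overall accuracy (diagonal total over grand total) is convenient; a minimal sketch with a made-up 2x2 matrix — in the session above, the same `accuracy()` would be applied to `conf_mat_test` and `pruned_conf_mat`:

```r
# overall accuracy: correct predictions (diagonal) over all predictions
accuracy <- function(m) sum(diag(m)) / sum(m)

m_toy <- matrix(c(50, 10, 5, 35), nrow = 2, byrow = TRUE)  # made-up counts
accuracy(m_toy)  # (50 + 35) / 100 = 0.85
```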