2026-02-13 14:03:28 +03:00
parent 417326498e
commit 65218abfb1
159 changed files with 2577567 additions and 2553 deletions

ds/25-1/r/10.Rmd
---
title: "Lab10: Time Series"
author: "Vladislav Litvinov <vlad@sek1ro>"
output:
  pdf_document:
    toc: TRUE
---
Plotting the quarterly earnings-per-share (EPS) data set.
```{r}
setwd('/home/sek1ro/git/public/lab/ds/25-1/r')      # machine-specific working directory
jj = scan("jj.dat")                                 # read the raw EPS values
jj_ts = ts(jj, start = c(1960, 1), frequency = 4)   # quarterly series starting 1960 Q1
jj_ts
plot(jj_ts, ylab = "EPS", xlab = "Year")
```
In order to fit an ARIMA model, the time series first needs to be transformed to remove any trend. Plot the difference $x_t - x_{t-1}$ for all $t > 0$. Has this difference adequately detrended the series? Does the variability of the EPS appear constant over time? Why does constant variance matter?
```{r}
jj_diff = diff(jj_ts)
plot(jj_diff, xlab = "Year", ylab = "EPS diff")
```
Plot the log10 of the quarterly EPS vs. time, and plot the difference $\log_{10}(x_t) - \log_{10}(x_{t-1})$ for all $t > 0$. Has this adequately detrended the series? Has the variability of the differenced log10(EPS) become more constant?
```{r}
log_jj = log10(jj_ts)
log_jj_diff = diff(log_jj)
plot(log_jj, xlab = "Year", ylab = "log10(EPS)")
plot(log_jj_diff, xlab = "Year", ylab = "log10(EPS) diff")
```
Treating the differenced log10 of the EPS series as a stationary series, plot the ACF and PACF of this series. What possible ARIMA models would you consider and why?
ACF(k) = Corr(x[t], x[t-k]): the autocorrelation function, which shows how strongly the series correlates with its own lagged values.
PACF, the partial autocorrelation function, shows the direct relationship between x[t] and x[t-k] after the influence of all intermediate values is removed; it equals the last coefficient in an AR(k) regression:
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_k x_{t-k} + \varepsilon_t, \qquad \mathrm{PACF}(k) = \phi_k$$
In an ARMA(p, q) model:
p is the AR part (past values); its order is suggested by where the PACF cuts off.
q is the MA part (past forecast errors); its order is suggested by where the ACF cuts off.
In an ARIMA(p, d, q) model, d is the I part: the number of differences applied before fitting the ARMA model.
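These cut-off patterns can be checked against theoretical values with `stats::ARMAacf`; the coefficients below are illustrative, not fitted to the jj data.

```{r}
# Theoretical ACF/PACF for toy processes (coefficients chosen for illustration)
ar1_acf  <- ARMAacf(ar = 0.7, lag.max = 5)               # AR(1): ACF decays geometrically
ar1_pacf <- ARMAacf(ar = 0.7, lag.max = 5, pacf = TRUE)  # AR(1): PACF cuts off after lag 1
ma1_acf  <- ARMAacf(ma = 0.5, lag.max = 5)               # MA(1): ACF cuts off after lag 1
round(ar1_acf, 3)   # 1, 0.7, 0.49, 0.343, ...
round(ar1_pacf, 3)  # 0.7, 0, 0, 0, 0
round(ma1_acf, 3)   # 1, 0.4, 0, 0, 0, 0
```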
```{r}
acf(log_jj_diff, lag.max = 20)
ar(log_jj_diff)
pacf(log_jj_diff, lag.max = 20)
```
Run the proposed ARIMA models from part d and compare the results. Identify an appropriate model. Justify your choice.
The idea: AIC balances
goodness of fit (the better the model describes the data, the lower the error), and
model complexity (the more parameters, the higher the risk of overfitting):
$$\mathrm{AIC} = 2k - 2\ln(L)$$
We prefer a higher likelihood $L$ and fewer parameters $k$, i.e. a lower AIC.
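As a sanity check on the formula, AIC can be recomputed by hand from `logLik` on any fitted model; the toy regression below is only illustrative.

```{r}
# Recompute AIC = 2k - 2 ln(L) by hand on a toy linear model
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
m <- lm(y ~ x)
k <- attr(logLik(m), "df")               # parameters counted by R (slope, intercept, sigma)
manual_aic <- 2 * k - 2 * as.numeric(logLik(m))
all.equal(manual_aic, AIC(m))            # TRUE
```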
Why is the choice of natural log or log base 10 in Problem 4.8 somewhat irrelevant to the transformation and the analysis? Because $\log_{10}(x) = \ln(x)/\ln(10)$, the two transforms differ only by a constant factor, which rescales the series but leaves its correlation structure and the selected model orders unchanged.
Why is the value of the ACF for lag 0 equal to one? Because ACF(0) = Corr(x[t], x[t]), the correlation of the series with itself, which is 1 by definition.
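The lag-0 property can be verified directly: `acf` stores the lag-0 value first, and it is always exactly 1.

```{r}
# Lag 0 compares the series with itself, so the correlation is exactly 1
a <- acf(rnorm(100), plot = FALSE)
a$acf[1]   # first entry is lag 0 -> 1
```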
```{r}
library(forecast)
# helper: fit an ARIMA of the given order to the differenced log series
fit_model = function(order) {
  Arima(log_jj_diff, order = order)
}
# note: log_jj_diff is already differenced once, so d = 1 here differences it a second time
models <- list(
  "1, 0, 1" = fit_model(c(1,0,1)),
  "1, 1, 1" = fit_model(c(1,1,1)),
  "1, 0, 5" = fit_model(c(1,0,5)),
  "1, 1, 5" = fit_model(c(1,1,5))
)
print(models[["1, 0, 5"]])  # double brackets extract the model itself, not a one-element list
aic_values <- sapply(models, AIC)  # compare the candidates by AIC
print(aic_values)
```
Chosen model: ARIMA(1, 0, 5), based on the AIC comparison above.
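One way to back up the choice is a Ljung-Box test on the residuals of the chosen model; a sketch, assuming `forecast` and `log_jj_diff` from the chunks above (the lag and `fitdf = p + q = 6` are illustrative choices):

```{r}
# A large p-value is consistent with white-noise residuals,
# i.e. the model has captured the autocorrelation structure
best <- Arima(log_jj_diff, order = c(1, 0, 5))
Box.test(residuals(best), lag = 10, type = "Ljung-Box", fitdf = 6)
```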
```{r}
n = 10000
phi4 = c(-0.18)                              # AR(1) coefficient
AR <- arima.sim(n = n, list(ar = phi4[1]))   # simulate an AR(1) process
plot(AR, main = "AR series")
acf(AR, main = "ACF AR")     # decays gradually for an AR process
pacf(AR, main = "PACF AR")   # cuts off after lag 1
theta4 <- c(-0.65, -0.22, -0.28, 1, -0.4)    # MA(5) coefficients
MA <- arima.sim(n = n, list(ma = theta4))    # simulate an MA(5) process
plot(MA, main = "MA series")
acf(MA, main = "ACF MA")     # cuts off after lag 5
pacf(MA, main = "PACF MA")   # decays gradually for an MA process
```
```{r}
fit <- auto.arima(jj_ts)   # let forecast::auto.arima pick (p, d, q) automatically
summary(fit)
forecasted_values <- forecast(fit, h = 20)  # 20 quarters (5 years) ahead
plot(forecasted_values)
```