---
title: "Lab10: Time Series"
author: "Vladislav Litvinov <vlad@sek1ro>"
output:
  pdf_document:
    toc: TRUE
---
Plotting the data set
```{r}
setwd('/home/sek1ro/git/public/lab/ds/25-1/r')
jj = scan("jj.dat")                               # quarterly earnings per share (EPS)
jj_ts = ts(jj, start = c(1960, 1), frequency = 4) # quarterly series starting 1960 Q1
jj_ts
plot(jj_ts, ylab = "EPS", xlab = "Year")
```
In order to fit an ARIMA model, the time series needs to be transformed to remove any trend. Plot the difference of $x_t$ and $x_{t-1}$ for all $t > 0$. Has this difference adequately detrended the series? Does the variability of the EPS appear constant over time? Why does constant variance matter?
```{r}
jj_diff = diff(jj_ts)
plot(jj_diff, xlab = "Year", ylab = "EPS diff")
```
Plot the $\log_{10}$ of the quarterly EPS vs. time, and plot the difference of $\log_{10}(x_t)$ and $\log_{10}(x_{t-1})$ for all $t > 0$. Has this adequately detrended the series? Has the variability of the differenced $\log_{10}(\text{EPS})$ become more constant?
```{r}
log_jj = log10(jj_ts)
log_jj_diff = diff(log_jj)
plot(log_jj, xlab = "Year", ylab = "log10(EPS)")
plot(log_jj_diff, xlab = "Year", ylab = "log10(EPS) diff")
```
Treating the differenced log10 of the EPS series as a stationary series, plot the ACF and PACF of this series. What possible ARIMA models would you consider and why?
$\mathrm{ACF}(k) = \mathrm{Corr}(x_t, x_{t-k})$ is the autocorrelation function: it measures how strongly the series correlates with itself at lag $k$.
The PACF (partial autocorrelation function) measures the direct relationship between $x_t$ and $x_{t-k}$ after the influence of all intermediate values between $t$ and $t-k$ has been removed; it is the last coefficient in the AR($k$) regression
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_k x_{t-k} + \varepsilon_t,$$
so $\mathrm{PACF}(k) = \phi_k$.
For an ARMA(p, q) model:
p is the AR part (regression on previous values), identified from the PACF;
q is the MA part (regression on previous forecast errors), identified from the ACF.
For ARIMA(p, d, q),
d is the I (integration) part: the number of differences applied to make the series stationary.
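As a quick sanity check of the definitions above (a minimal sketch on a simulated AR(1) series; the seed and the coefficient 0.6 are arbitrary choices, not from the lab data), the lag-1 PACF coincides with the lag-1 ACF, since there are no intermediate values to remove at lag 1:

```{r}
set.seed(1)
x <- arima.sim(n = 5000, list(ar = 0.6))   # simulated AR(1), phi = 0.6 (arbitrary)
a1 <- acf(x, plot = FALSE)$acf[2]          # sample ACF at lag 1 (index 1 holds lag 0)
p1 <- pacf(x, plot = FALSE)$acf[1]         # sample PACF at lag 1
c(a1, p1)                                  # the two values coincide
```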
```{r}
acf(log_jj_diff, lag.max = 20)
ar(log_jj_diff)   # Yule-Walker fit; selects the AR order by AIC
pacf(log_jj_diff, lag.max = 20)
```
Run the proposed ARIMA models from part d and compare the results. Identify an appropriate model. Justify your choice.
The idea: AIC balances
goodness of fit (the better the model describes the data, the higher the likelihood $L$) against
model complexity (the more parameters $k$, the higher the risk of overfitting):
$$\mathrm{AIC} = 2k - 2\ln(L).$$
Larger $L$ and smaller $k$ both lower the AIC, so the model with the smallest AIC is preferred.
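To make the formula concrete (a minimal sketch; the simulated series and the AR(1) order are arbitrary, not part of the lab data), the value reported by `AIC()` for an `arima` fit can be reproduced directly from the log-likelihood and the parameter count, where $k$ counts the estimated coefficients plus the innovation variance:

```{r}
set.seed(42)
y <- arima.sim(n = 200, list(ar = 0.5))  # toy series (arbitrary)
fit <- arima(y, order = c(1, 0, 0))
k <- length(fit$coef) + 1                # ar1 and intercept, plus sigma^2
2 * k - 2 * fit$loglik                   # manual AIC
AIC(fit)                                 # matches the manual value
```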
Why is the choice of natural log or log base 10 in Problem 4.8 somewhat irrelevant to the transformation and the analysis?
Why is the value of the ACF for lag 0 equal to one?
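Both questions reduce to short identities (a sketch of the reasoning, not part of the original answers). For the choice of base, the change-of-base formula
$$\log_{10}(x_t) = \frac{\ln(x_t)}{\ln(10)}$$
means the two transforms, and hence their differences, differ only by the constant factor $1/\ln(10)$; a constant rescaling changes neither the shape of the series nor any correlation, so the analysis is unaffected. For lag 0,
$$\mathrm{ACF}(0) = \frac{\mathrm{Cov}(x_t, x_t)}{\mathrm{Var}(x_t)} = \frac{\mathrm{Var}(x_t)}{\mathrm{Var}(x_t)} = 1,$$
since any series is perfectly correlated with itself.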
```{r}
library(forecast)
fit_model = function(order) {
  Arima(log_jj_diff, order = order)
}
models <- list(
  "1, 0, 1" = fit_model(c(1, 0, 1)),
  "1, 1, 1" = fit_model(c(1, 1, 1)),  # d = 1 differences the already-differenced series again
  "1, 0, 5" = fit_model(c(1, 0, 5)),
  "1, 1, 5" = fit_model(c(1, 1, 5))
)
print(models[["1, 0, 5"]])            # [[ ]] extracts the fit itself, not a one-element list
aic_values <- sapply(models, AIC)
print(aic_values)
```
By AIC, ARIMA(1, 0, 5) is the preferred model.
```{r}
n = 10000
phi4 = c(-0.18)                            # AR(1) coefficient
AR <- arima.sim(n = n, list(ar = phi4[1]))
plot(AR, main = "AR series")
acf(AR, main = "ACF AR")                   # tails off gradually for an AR process
pacf(AR, main = "PACF AR")                 # cuts off after lag 1 for AR(1)
theta4 <- c(-0.65, -0.22, -0.28, 1, -0.4)  # MA(5) coefficients
MA <- arima.sim(n = n, list(ma = theta4))
plot(MA, main = "MA series")
acf(MA, main = "ACF MA")                   # cuts off after lag 5 for MA(5)
pacf(MA, main = "PACF MA")                 # tails off gradually for an MA process
```
```{r}
fit <- auto.arima(jj_ts)
summary(fit)
forecasted_values <- forecast(fit, h = 20)
plot(forecasted_values)
```