---
title: "Lab10: Time Series"
author: "Vladislav Litvinov <vlad@sek1ro>"
output:
  pdf_document:
    toc: true
---

Load the quarterly EPS (earnings per share) data and plot it as a time series.

```{r}
# Machine-specific path; adjust to wherever jj.dat lives
setwd('/home/sek1ro/git/public/lab/ds/25-1/r')
# Read the raw EPS values and wrap them in a quarterly ts object starting in 1960 Q1
jj = scan("jj.dat")
jj_ts = ts(jj, start = c(1960, 1), frequency = 4)

jj_ts

plot(jj_ts, ylab = "EPS", xlab = "Year")
```

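The `start` and `frequency` arguments decide how `ts()` labels each observation. A small sketch on hypothetical values `1:8` (not the EPS data) shows the time index and the quarter that R assigns to each point:

```{r}
# Hypothetical toy series: 8 quarterly values starting in 1960 Q1
toy_ts <- ts(1:8, start = c(1960, 1), frequency = 4)

# Time index: 1960.00, 1960.25, ..., 1961.75 (one step = 1/frequency of a year)
time(toy_ts)

# Position within the year: quarters 1, 2, 3, 4, 1, 2, 3, 4
cycle(toy_ts)
```
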
To fit an ARIMA model, the time series first needs to be transformed to remove any trend. Plot the difference $x_t - x_{t-1}$ for all $t > 0$. Has this difference adequately detrended the series? Does the variability of the EPS appear constant over time? Why does constant variance matter?

```{r}
# First differences x[t] - x[t-1]; the result is one observation shorter
jj_diff = diff(jj_ts)

plot(jj_diff, xlab = "Year", ylab = "EPS diff")
```

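One way to put a number on the constant-variability question is to compare the spread of the differenced series across sub-periods. A sketch, assuming R's built-in `datasets::JohnsonJohnson` (quarterly Johnson & Johnson earnings per share, 1960 to 1980) is the same data as `jj.dat`, with arbitrarily chosen 5-year periods:

```{r}
# Assumption: the built-in JohnsonJohnson series stands in for jj.dat
jj_builtin_diff <- diff(datasets::JohnsonJohnson)

# Assign each differenced value to a 5-year period: [1960,1965), ..., [1980,1985)
period <- cut(as.numeric(time(jj_builtin_diff)),
              breaks = seq(1960, 1985, by = 5), right = FALSE)

# Standard deviation of the first differences within each period
period_sd <- tapply(as.numeric(jj_builtin_diff), period, sd)
round(period_sd, 3)
```

A spread that grows with the level of the series is the usual sign that a variance-stabilising transform such as the log is worth trying.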
Plot $\log_{10}$ of the quarterly EPS vs. time, and plot the difference $\log_{10}(x_t) - \log_{10}(x_{t-1})$ for all $t > 0$. Has this adequately detrended the series? Has the variability of the differenced log10(EPS) become more constant?

```{r}
# Log transform, then first differences of the log series
log_jj = log10(jj_ts)
log_jj_diff = diff(log_jj)

plot(log_jj, xlab = "Year", ylab = "log10(EPS)")
plot(log_jj_diff, xlab = "Year", ylab = "log10(EPS) diff")
```

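Why the log difference helps: $\log_{10}(x_t) - \log_{10}(x_{t-1}) = \log_{10}(x_t / x_{t-1})$ depends only on the growth ratio, not on the level of the series. A sketch on a hypothetical series that grows by exactly 5% per quarter (not the EPS data):

```{r}
# Hypothetical series with constant 5% growth per quarter
growth_ts <- ts(100 * 1.05^(0:19), start = c(1960, 1), frequency = 4)

# Raw differences grow with the level of the series
d_raw <- diff(growth_ts)
# Log differences are constant: log10(1.05) at every step
d_log <- diff(log10(growth_ts))

range(d_raw)
range(d_log)
```

Real EPS growth is not this regular, but when changes are roughly proportional to the level, the log difference removes most of the trend and evens out the spread.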
Treating the differenced log10 of the EPS series as a stationary series, plot the ACF and PACF of this series. What possible ARIMA models would you consider, and why?

ACF (autocorrelation function): $\mathrm{ACF}(k) = \mathrm{Corr}(x_t, x_{t-k})$. It shows how strongly the time series is correlated with its own values $k$ steps earlier.

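The sample ACF can be computed directly from that definition. A sketch on a simulated AR(1) series (hypothetical data; `sample_acf` is a helper written here, not a library function), checked against `stats::acf()`, which also centres on the mean and divides by the lag-0 sum of squares:

```{r}
set.seed(1)
ar_sim <- as.numeric(arima.sim(n = 200, list(ar = 0.6)))

# Hypothetical helper: sample autocorrelation at lag k,
# sum((x[t] - mean) * (x[t-k] - mean)) / sum((x[t] - mean)^2)
sample_acf <- function(x, k) {
  n <- length(x)
  xc <- x - mean(x)
  sum(xc[(k + 1):n] * xc[1:(n - k)]) / sum(xc^2)
}

acf_manual <- sapply(0:5, function(k) sample_acf(ar_sim, k))
acf_builtin <- as.numeric(acf(ar_sim, lag.max = 5, plot = FALSE)$acf)
rbind(acf_manual, acf_builtin)
```
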
PACF (partial autocorrelation function): shows the direct relationship between $x_t$ and $x_{t-k}$ after removing the influence of all intermediate values $x_{t-1}, \dots, x_{t-k+1}$. It equals the last coefficient of an AR(k) regression:

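$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_k x_{t-k} + \varepsilon_t$$

$$\mathrm{PACF}(k) = \phi_k$$

The coefficients are re-estimated for each order $k$, so $\phi_k$ is the last coefficient of the AR(k) fit.
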
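In R, the sample PACF reported by `pacf()` coincides (up to rounding) with the last coefficient of a Yule-Walker AR(k) fit, since both come from the Durbin-Levinson recursion on the sample autocovariances; an ordinary least-squares AR(k) regression gives close but not identical values. A sketch on a simulated AR(2) series (hypothetical data):

```{r}
set.seed(42)
ar2_sim <- arima.sim(n = 500, list(ar = c(0.5, -0.3)))
k_max <- 5

# Sample PACF as computed by stats::pacf()
pacf_builtin <- as.numeric(pacf(ar2_sim, lag.max = k_max, plot = FALSE)$acf)

# Last coefficient phi_k of a Yule-Walker AR(k) fit, for k = 1..k_max
pacf_from_ar <- sapply(1:k_max, function(k) {
  fit <- ar(ar2_sim, aic = FALSE, order.max = k, method = "yule-walker")
  fit$ar[k]
})

rbind(pacf_builtin, pacf_from_ar)
```
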
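ARMA(p, q):

- $p$: order of the AR (autoregressive) part, the number of previous values of the series in the model; identified from the PACF, which for a pure AR(p) process cuts off after lag $p$.
- $q$: order of the MA (moving-average) part, the number of previous forecast errors in the model; identified from the ACF, which for a pure MA(q) process cuts off after lag $q$.

ARIMA(p, d, q):

- $d$: order of the I (integrated) part, the number of differences applied to the series before the ARMA(p, q) part is fitted.
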
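A sketch of what $d = 1$ means in practice, on a simulated series (hypothetical data): fitting ARIMA(1, 1, 0) to a series should give essentially the same AR coefficient as fitting a zero-mean AR(1) to its first differences. The two estimates need not agree to the last digit, because `stats::arima()` handles the differenced part of the likelihood with an approximate diffuse prior (its `kappa` argument).

```{r}
set.seed(7)
# Hypothetical I(1) series: cumulative sum of an AR(1) process
ar1_steps <- arima.sim(n = 500, list(ar = 0.6))
level_series <- cumsum(as.numeric(ar1_steps))

# d = 1 inside the model (no mean term is fitted when d > 0)
fit_d1 <- arima(level_series, order = c(1, 1, 0))
# Differencing by hand, then a zero-mean AR(1)
fit_on_diff <- arima(diff(level_series), order = c(1, 0, 0), include.mean = FALSE)

c(d1 = coef(fit_d1)[["ar1"]], on_diff = coef(fit_on_diff)[["ar1"]])
```

By the same logic, the `c(1, 1, 1)` and `c(1, 1, 5)` candidates fitted to `log_jj_diff` in the model comparison difference the log EPS a second time.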
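```{r}
# On a quarterly ts, lags are shown in years (lag 1.0 = 4 quarters)
# Sample ACF: a cut-off after lag q suggests an MA(q) component
acf(log_jj_diff, lag.max = 20)
# ar() picks the AR order by AIC (Yule-Walker estimation by default)
ar(log_jj_diff)
# Sample PACF: a cut-off after lag p suggests an AR(p) component
pacf(log_jj_diff, lag.max = 20)
```
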
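The dashed lines drawn by `plot.acf` are the approximate 95% bounds $\pm 1.96/\sqrt{n}$ for white noise, so a spike outside them is the usual signal to consider that lag. A sketch that lists which lags cross the bound, assuming the built-in `datasets::JohnsonJohnson` series matches `jj.dat`:

```{r}
# Assumption: the built-in JohnsonJohnson series stands in for jj.dat
jj_log_diff_builtin <- diff(log10(datasets::JohnsonJohnson))

# Sample ACF for lags 1..8 (drop lag 0)
acf_vals <- acf(jj_log_diff_builtin, lag.max = 8, plot = FALSE)$acf[-1]
# Approximate 95% white-noise bound, as drawn by plot.acf
bound <- qnorm(0.975) / sqrt(length(jj_log_diff_builtin))

data.frame(lag_quarters = 1:8,
           acf = round(acf_vals, 3),
           outside_bound = abs(acf_vals) > bound)
```
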
Run the proposed ARIMA models from part d and compare the results. Identify an appropriate model. Justify your choice.

The models are compared with AIC, which balances:

- goodness of fit (the better the model describes the data, the lower the error);
- model complexity (the more parameters, the higher the risk of overfitting).

$$\mathrm{AIC} = 2k - 2\ln(L)$$

where $k$ is the number of estimated parameters and $L$ is the maximised likelihood. A larger $L$ and a smaller $k$ both lower the AIC, so the candidate with the lowest AIC is preferred.

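A sketch that recomputes AIC from the formula for one candidate and checks it against `AIC()`. The model is an ARMA(1, 1) with a mean, fitted with `stats::arima()` to `datasets::JohnsonJohnson` (assumed to match `jj.dat`); here $k$ counts the AR, MA and intercept coefficients plus the innovation variance $\sigma^2$:

```{r}
# Assumption: the built-in JohnsonJohnson series stands in for jj.dat
aic_demo_series <- diff(log10(datasets::JohnsonJohnson))
fit_arma11 <- arima(aic_demo_series, order = c(1, 0, 1))

ll_arma11 <- logLik(fit_arma11)
n_par <- attr(ll_arma11, "df")   # ar1, ma1, intercept and sigma^2
aic_manual <- 2 * n_par - 2 * as.numeric(ll_arma11)

c(manual = aic_manual, builtin = AIC(fit_arma11))
```
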
Why is the choice of natural log or log base 10 in Problem 4.8 somewhat irrelevant to the transformation and the analysis?

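Since $\log_{10}(x) = \ln(x) / \ln(10)$, the two differenced log series differ only by the constant factor $\ln(10)$, and autocorrelations do not change under rescaling, so the ACF, the PACF and the identified ARIMA orders are the same. A sketch, assuming the built-in `datasets::JohnsonJohnson` series matches `jj.dat`:

```{r}
# Assumption: the built-in JohnsonJohnson series stands in for jj.dat
jj_builtin <- datasets::JohnsonJohnson
d_ln  <- diff(log(jj_builtin))    # natural log
d_l10 <- diff(log10(jj_builtin))  # log base 10

# The natural-log differences equal ln(10) times the log10 differences (up to rounding)
all.equal(as.numeric(d_ln), log(10) * as.numeric(d_l10))

# Identical sample ACFs; the PACF is a function of the ACF, so it matches too
acf_ln  <- acf(d_ln,  lag.max = 12, plot = FALSE)$acf
acf_l10 <- acf(d_l10, lag.max = 12, plot = FALSE)$acf
all.equal(acf_ln, acf_l10)
```
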
Why is the value of the ACF for lag 0 equal to one?

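A one-line argument from the ACF definition, writing $\gamma(k) = \mathrm{Cov}(x_t, x_{t-k})$ for the autocovariance:

$$\mathrm{ACF}(0) = \mathrm{Corr}(x_t, x_t) = \frac{\gamma(0)}{\gamma(0)} = \frac{\mathrm{Var}(x_t)}{\mathrm{Var}(x_t)} = 1$$

The sample ACF uses the same ratio with sample autocovariances, so $\hat\rho(0) = \hat\gamma(0) / \hat\gamma(0) = 1$ as well.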
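```{r}
library(forecast)

# Fit an ARIMA model of the given (p, d, q) order to the differenced log10 EPS
fit_model = function(order) {
  Arima(log_jj_diff, order = order)
}

# Candidate models; d = 1 here differences log_jj_diff a second time
models <- list(
  "1, 0, 1" = fit_model(c(1, 0, 1)),
  "1, 1, 1" = fit_model(c(1, 1, 1)),
  "1, 0, 5" = fit_model(c(1, 0, 5)),
  "1, 1, 5" = fit_model(c(1, 1, 5))
)

# [[ ]] extracts the fitted model itself rather than a one-element list
print(models[["1, 0, 5"]])

# Lower AIC is better; AIC values are only comparable between models with
# the same d, because the d = 1 fits are computed on differently differenced data
aic_values <- sapply(models, AIC)
print(aic_values)
```
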
Chosen model: ARIMA(1, 0, 5).

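A common extra justification for a chosen model is a residual check: if the model captures the autocorrelation, its residuals should look like white noise. A sketch using the Ljung-Box test from `stats::Box.test()`, fitting ARIMA(1, 0, 5) with `stats::arima()` to `datasets::JohnsonJohnson` (assumed to match `jj.dat`); `fitdf = 6` subtracts the $p + q = 6$ estimated ARMA coefficients from the degrees of freedom, and the lag of 12 is an arbitrary choice:

```{r}
# Assumption: the built-in JohnsonJohnson series stands in for jj.dat
lb_series <- diff(log10(datasets::JohnsonJohnson))
fit_105 <- arima(lb_series, order = c(1, 0, 5))

# Ljung-Box test on the residuals; fitdf = p + q = 1 + 5
lb <- Box.test(residuals(fit_105), lag = 12, type = "Ljung-Box", fitdf = 6)
lb
# A large p-value means no evidence of remaining autocorrelation up to lag 12
```
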
```{r}
n = 10000
set.seed(1)  # reproducible simulation

# AR(1) with phi = -0.18: the PACF should cut off after lag 1
phi4 = c(-0.18)
AR <- arima.sim(n = n, list(ar = phi4[1]))

plot(AR, main = "AR series")
acf(AR, main = "ACF AR")
pacf(AR, main = "PACF AR")

# MA(5) with the coefficients below: the ACF should cut off after lag 5
theta4 <- c(-0.65, -0.22, -0.28, 1, -0.4)
MA <- arima.sim(n = n, list(ma = theta4))

plot(MA, main = "MA series")
acf(MA, main = "ACF MA")
pacf(MA, main = "PACF MA")
```

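The simulated plots can be compared with the theoretical ACF and PACF, which `stats::ARMAacf()` computes from the coefficients. A sketch with the same AR(1) and MA(5) coefficient values as the simulation (copied into new names so nothing in the document is overwritten):

```{r}
phi4_demo   <- -0.18
theta4_demo <- c(-0.65, -0.22, -0.28, 1, -0.4)

# Theoretical ACF of the MA(5) process for lags 0..10: zero beyond lag q = 5
ma_acf <- ARMAacf(ma = theta4_demo, lag.max = 10)
# Theoretical PACF of the AR(1) process for lags 1..10: zero beyond lag p = 1
ar_pacf <- ARMAacf(ar = phi4_demo, lag.max = 10, pacf = TRUE)

round(ma_acf, 3)
round(ar_pacf, 3)
```
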
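```{r}
# Let auto.arima() choose the orders (seasonal terms included) on the raw quarterly series
fit <- auto.arima(jj_ts)
summary(fit)

# Forecast 20 quarters (5 years) ahead, with prediction intervals
forecasted_values <- forecast(fit, h = 20)
plot(forecasted_values)
```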