Time Series Foundations

←Back to Tech Tree

inventorycoverage

Time Series Foundations #

Probability & StatisticsDifficulty: ★★★☆☆Depth: 10Unlocks: 3

Stationarity, autocorrelation and partial autocorrelation functions. AR, MA, ARMA, ARIMA models. Box-Jenkins methodology.

Prerequisites (2) #

Covariance and Correlation6 atomsLinear Regression6 atoms

Unlocks (1) #

State-Space Modelslvl 4

Referenced by (6) #

Where this concept shows up in the operating-finance and personal-finance graphs.

From Business (6) #

[Cash FlowBusiness

Cash flow is inherently a time series - sequential inflows and outflows where timing, seasonality, and autocorrelation matter. Stationarity concepts explain why a single-period snapshot misleads about liquidity.](/business/cash-flow/)[real estateBusiness

RE market cycles, Case-Shiller index methodology, and mean-reversion analysis all require stationarity testing and ARIMA-family models](/business/real-estate/)[Value StreamBusiness

Determining whether a value stream appreciates or decays requires modeling its trajectory over time; stationarity tests distinguish trending streams (appreciating or decaying) from mean-reverting ones, and autocorrelation reveals whether past performance compounds into future value](/business/value-stream/)[ARRBusiness

ARR is inherently a time-indexed metric whose trajectory, trend decomposition, and forecasting rely on time series concepts like stationarity and autoregressive modeling.](/business/arr/)[Depreciating AssetBusiness

Stationarity is the core concept - a verifier is built assuming a fixed rule distribution, but non-stationarity (rules change over time) is exactly what causes the verifier to depreciate, requiring re-estimation against the new regime](/business/depreciating-asset/)[CFABusiness

CFA quantitative methods covers autoregressive models and stationarity for financial return series and economic forecasting](/business/cfa/)

Advanced Learning Details

Graph Position #

162

Depth Cost

3

Fan-Out (ROI)

1

Bottleneck Score

10

Chain Length

Many real-world datasets are sequences in time — from daily stock prices to hourly sensor readings — and understanding their time-dependent patterns lets you forecast, detect anomalies, and make decisions with confidence.

TL;DR:

Time series foundations give you tools to tell whether a sequence is stable over time, measure how past values influence future ones (autocorrelation and partial autocorrelation), build AR/MA/ARMA/ARIMA models, and apply the Box–Jenkins cycle to identify, estimate, and validate models for forecasting.

What Is Time Series Foundations? #

Time series analysis studies observations indexed by time: {Xt:t∈Z}{X_t: t \in \mathbb{Z}}{Xt​:t∈Z}, where XtX_tXt​ is the value at time ttt. The foundations of time series cover three core concepts: stationarity (is the process stable over time?), dependence (how does Xt−kX_{t-k}Xt−k​ affect XtX_tXt​?), and parsimonious models that capture that dependence (AR, MA, ARMA, ARIMA). These let you move from raw data to forecasts and inference.

Why care? Because many applied problems are sequential: forecasting electricity load, modeling GDP, or controlling processes in engineering. A model that respects time-dependence is usually far more accurate than treating observations as iid. The prerequisites you already know — Covariance and Correlation (measuring linear relation) and Linear Regression (estimating linear coefficients via least squares) — are used heavily: sample autocovariances extend covariance to lags, and regression ideas underpin estimation of autoregressive parameters.

Stationarity: Intuitively, a stationary time series 'behaves the same' through time. We commonly use weak (covariance) stationarity: 1) constant mean E[Xt]=μ\mathbb{E}[X_t]=\muE[Xt​]=μ for all ttt, 2) autocovariance γ(s,t)=Cov⁡(Xs,Xt)\gamma(s,t)=\operatorname{Cov}(X_s,X_t)γ(s,t)=Cov(Xs​,Xt​) depends only on lag ∣t−s∣|t-s|∣t−s∣, i.e. γ(h)=Cov⁡(Xt,Xt−h)\gamma(h)=\operatorname{Cov}(X_{t},X_{t-h})γ(h)=Cov(Xt​,Xt−h​). Strict stationarity (distribution invariance under time shifts) is stronger but weaker in practice because many models are only second-order.

Concrete example: Random walk vs AR(1)

Var⁡(Xt)=σ21−ϕ2.\operatorname{Var}(X_t)=\frac{\sigma^2}{1-\phi^2}.Var(Xt​)=1−ϕ2σ2​.

Numeric example: if ϕ=0.6\phi=0.6ϕ=0.6 and σ2=1\sigma^2=1σ2=1, then Var⁡(Xt)=1/(1−0.36)=1/0.64=1.5625.\operatorname{Var}(X_t)=1/(1-0.36)=1/0.64=1.5625.Var(Xt​)=1/(1−0.36)=1/0.64=1.5625. This shows a finite, time-invariant variance.

Autocorrelation and autocovariance: the autocovariance at lag kkk is

γk=Cov⁡(Xt,Xt−k),\gamma_k = \operatorname{Cov}(X_t,X_{t-k}),γk​=Cov(Xt​,Xt−k​),

and the autocorrelation is

ρk=γkγ0,\rho_k = \frac{\gamma_k}{\gamma_0},ρk​=γ0​γk​​,

where γ0=Var⁡(Xt)\gamma_0=\operatorname{Var}(X_t)γ0​=Var(Xt​). These generalize the concepts from Covariance and Correlation. A quick sample calculation: suppose observed series [2,4,6,8][2,4,6,8][2,4,6,8]. The sample mean Xˉ=5\bar X=5Xˉ=5. The sample lag-1 autocovariance (using denominator nnn for simplicity) is

γ^1=14∑t=24(Xt−Xˉ)(Xt−1−Xˉ).\hat\gamma_1 = \frac{1}{4}\sum_{t=2}^4 (X_t-\bar X)(X_{t-1}-\bar X).γ^​1​=41​t=2∑4​(Xt​−Xˉ)(Xt−1​−Xˉ).

Compute terms: (4−5)(2−5)=(−1)(−3)=3(4-5)(2-5)=(-1)(-3)=3(4−5)(2−5)=(−1)(−3)=3, (6−5)(4−5)=(1)(−1)=−1(6-5)(4-5)=(1)(-1)=-1(6−5)(4−5)=(1)(−1)=−1, (8−5)(6−5)=(3)(1)=3(8-5)(6-5)=(3)(1)=3(8−5)(6−5)=(3)(1)=3. Sum = 3-1+3=5. So γ^1=5/4=1.25\hat\gamma_1 = 5/4 = 1.25γ^​1​=5/4=1.25. The sample variance γ^0=14[(2−5)2+(4−5)2+(6−5)2+(8−5)2]=(9+1+1+9)/4=20/4=5\hat\gamma_0 = \frac{1}{4}[(2-5)^2+(4-5)^2+(6-5)^2+(8-5)^2] = (9+1+1+9)/4 = 20/4 = 5γ^​0​=41​[(2−5)2+(4−5)2+(6−5)2+(8−5)2]=(9+1+1+9)/4=20/4=5. Thus sample autocorrelation at lag 1 is ρ^1=1.25/5=0.25\hat\rho_1 = 1.25/5 = 0.25ρ^​1​=1.25/5=0.25.

Partial Autocorrelation (PACF): the PACF at lag kkk, denoted ϕkk\phi_{kk}ϕkk​ or αk\alpha_kαk​, measures the correlation between XtX_tXt​ and Xt−kX_{t-k}Xt−k​ after removing linear effects of intermediate lags $1\ldots k-1$. A simple way to compute PACF is to fit the linear regression

Xt=β0+β1Xt−1+⋯+βkXt−k+εtX_t = \beta_0 + \beta_1 X_{t-1} + \dots + \beta_k X_{t-k} + \varepsilon_tXt​=β0​+β1​Xt−1​+⋯+βk​Xt−k​+εt​

and take the coefficient on Xt−kX_{t-k}Xt−k​. This uses Linear Regression skills directly. Numeric example: for an AR(1) with coefficient ϕ=0.6\phi=0.6ϕ=0.6, PACF at lag 1 is $0.6$ and at higher lags it is (theoretically) 0.

Model types (intuition):

These building blocks let you represent many time series parsimoniously and prepare for forecasting. The rest of the lesson develops the main mechanical tools: ACF/PACF, estimation via Yule–Walker and least squares, and the Box–Jenkins cycle for practical modeling.

Core Mechanic 1: Autocorrelation and Partial Autocorrelation (ACF & PACF) #

ACF and PACF are the primary diagnostic tools for identifying the dependence structure in a stationary time series. They are analogous to the correlation and partial correlation concepts from Covariance and Correlation and use Linear Regression for PACF computations.

Definitions and sample formulas

γ^k=1n∑t=k+1n(xt−xˉ)(xt−k−xˉ).\hat\gamma_k = \frac{1}{n}\sum_{t=k+1}^{n} (x_t-\bar x)(x_{t-k}-\bar x).γ^​k​=n1​t=k+1∑n​(xt​−xˉ)(xt−k​−xˉ).

Example: with [2,4,6,8][2,4,6,8][2,4,6,8] we computed γ^1=1.25\hat\gamma_1=1.25γ^​1​=1.25 and γ^0=5\hat\gamma_0=5γ^​0​=5, so ρ^1=0.25\hat\rho_1=0.25ρ^​1​=0.25 (see Section 1). If you prefer the unbiased denominator n−kn-kn−k replace $1/n$ with $1/(n-k)$; both are used in practice.

Properties of ACF for simple models (critical identification clues)

Thus the ACF of AR(1) decays exponentially, while the ACF of MA(1) cuts off after lag 1. This distinction is foundational for identification.

Partial Autocorrelation Function (PACF)

Xt=β0+β1Xt−1+⋯+βkXt−k+ut,X_t = \beta_0 + \beta_1 X_{t-1} + \dots + \beta_k X_{t-k} + u_t,Xt​=β0​+β1​Xt−1​+⋯+βk​Xt−k​+ut​,

then αk=βk\alpha_k = \beta_kαk​=βk​ from that regression. For stationary AR(p), the PACF cuts off after lag ppp (i.e., αk=0\alpha_k=0αk​=0 for k>pk>pk>p). For MA(q), the PACF decays gradually.

Computational methods

Small numeric demonstration of PACF via regression: Data: x=[1.2,0.9,1.1,1.4,1.3]x=[1.2, 0.9, 1.1, 1.4, 1.3]x=[1.2,0.9,1.1,1.4,1.3]. Compute xˉ=1.18\bar x=1.18xˉ=1.18. To estimate PACF at lag 2 (i.e., coefficient on Xt−2X_{t-2}Xt−2​ when regressing on lags 1 and 2), form the design matrix for t=3..5:

X = [[x_2,x_1],[x_3,x_2],[x_4,x_3]] = [[0.9,1.2],[1.1,0.9],[1.4,1.1]].

Response y = [x_3,x_4,x_5] = [1.1,1.4,1.3].

Run a small OLS: β=(XTX)−1XTy\beta=(X^TX)^{-1}X^T yβ=(XTX)−1XTy. Numeric calculation (sketch): XTX=(0.92+1.12+1.420.9⋅1.2+1.1⋅0.9+1.4⋅1.1same1.22+0.92+1.12)X^TX = \begin{pmatrix}0.9^2+1.1^2+1.4^2 & 0.9\cdot1.2+1.1\cdot0.9+1.4\cdot1.1 \ same & 1.2^2+0.9^2+1.1^2\end{pmatrix}XTX=(0.92+1.12+1.42same​0.9⋅1.2+1.1⋅0.9+1.4⋅1.11.22+0.92+1.12​). Compute numbers: $0.81+1.21+1.96=3.98$, off-diagonal $1.08+0.99+1.54=3.61$, lower-right $1.44+0.81+1.21=3.46$. Solve to obtain coefficients; the lag-2 coefficient is the PACF estimate. This step explicitly uses Linear Regression.

Interpreting ACF/PACF plots

Statistical significance: For large n, sample autocorrelations under white noise are approximately N(0,1/n)N(0,1/n)N(0,1/n). A rough 95% confidence band is ±1.96/n\pm 1.96/\sqrt{n}±1.96/n​. Example: if n=100n=100n=100, the band is ±0.196\pm 0.196±0.196. Any sample ACF outside that band suggests significant autocorrelation.

Thus ACF and PACF, combined with numerical rules and plots, are your first discovery tools in the Box–Jenkins cycle (coming later). They depend directly on Covariance and Correlation ideas and on Linear Regression for PACF estimation.

Core Mechanic 2: AR, MA, ARMA, ARIMA Models and Estimation #

We now define the principal parametric families and show core formulas and estimation strategies. These models are linear in either past values or past shocks and are the workhorses of time series forecasting.

Autoregressive (AR) models

Xt=ϕ1Xt−1+ϕ2Xt−2+⋯+ϕpXt−p+εt,X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t,Xt​=ϕ1​Xt−1​+ϕ2​Xt−2​+⋯+ϕp​Xt−p​+εt​,

where εt\varepsilon_tεt​ is white noise with mean 0 and variance σ2\sigma^2σ2. This is analogous to regressing XtX_tXt​ on its past p values (Linear Regression). To fit AR(p), one can use OLS on lagged regressors when the series is mean zero or include an intercept if needed. Another method uses the Yule–Walker equations derived from autocovariances.

Yule–Walker equations (for AR(p))

The theoretical autocovariances satisfy:

(γ0γ1⋮γp−1)=(1ϕ1…ϕp−1ϕ11…ϕp−2⋮⋱ϕp−1…ϕ11)(σ2 0 ⋮)\begin{pmatrix}\gamma_0 \ \gamma_1 \ \vdots \ \gamma_{p-1}\end{pmatrix} = \begin{pmatrix}1 & \phi_1 & \dots & \phi_{p-1} \\phi_1 & 1 & \dots & \phi_{p-2} \\vdots & & \ddots & \\phi_{p-1} & \dots & \phi_1 & 1\end{pmatrix} \begin{pmatrix}\sigma^2 \\ 0 \\ \vdots\end{pmatrix}​γ0​γ1​⋮γp−1​​​=​1ϕ1​⋮ϕp−1​​ϕ1​1…​……⋱ϕ1​​ϕp−1​ϕp−2​1​​​σ2 0 ⋮​​

A practical way is to solve the linear system

(γ1γ2 ⋮γp)=(γ0γ1…γp−1γ1γ0…γp−2⋮⋱γp−1…γ0)(ϕ1ϕ2 ⋮ϕp).\begin{pmatrix}\gamma_1 \\gamma_2 \\ \vdots \\gamma_p\end{pmatrix} = \begin{pmatrix}\gamma_0 & \gamma_1 & \dots & \gamma_{p-1} \\gamma_1 & \gamma_0 & \dots & \gamma_{p-2} \\vdots & & \ddots & \\gamma_{p-1} & \dots & \gamma_0\end{pmatrix} \begin{pmatrix}\phi_1 \\phi_2 \\ \vdots \\phi_p\end{pmatrix}.​γ1​γ2​ ⋮γp​​​=​γ0​γ1​⋮γp−1​​γ1​γ0​…​……⋱γ0​​γp−1​γp−2​​​​ϕ1​ϕ2​ ⋮ϕp​​​.

Use sample autocovariances to get Yule–Walker estimators. Example: AR(1) Yule–Walker reduces to γ1=ϕγ0\gamma_1 = \phi \gamma_0γ1​=ϕγ0​, so ϕ=γ1/γ0\phi = \gamma_1/\gamma_0ϕ=γ1​/γ0​. Numerically, using the sample values from Section 1 where γ^1=1.25\hat\gamma_1=1.25γ^​1​=1.25 and γ^0=5\hat\gamma_0=5γ^​0​=5, the Yule–Walker estimate is ϕ^=1.25/5=0.25\hat\phi = 1.25/5 = 0.25ϕ^​=1.25/5=0.25.

Moving Average (MA) models

Xt=εt+θ1εt−1+⋯+θqεt−q.X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}.Xt​=εt​+θ1​εt−1​+⋯+θq​εt−q​.

The parameters θj\theta_jθj​ multiply the unobserved past shocks εt−j\varepsilon_{t-j}εt−j​, making direct linear regression impossible. Estimation typically proceeds via maximum likelihood or nonlinear least squares. However, the MA autocovariance has closed-form expressions in terms of θ\thetaθ and σ2\sigma^2σ2, which you can equate to sample autocovariances for method-of-moments estimation in small orders. Example: for MA(1) ρ1=θ/(1+θ2)\rho_1 = \theta/(1+\theta^2)ρ1​=θ/(1+θ2). If sample ρ^1=0.4\hat\rho_1=0.4ρ^​1​=0.4, solving $0.4 = \theta/(1+\theta^2)$ yields $0.4+0.4\theta^2=\theta \Rightarrow 0.4\theta^2 - \theta + 0.4=0.Solvenumerically:discriminant. Solve numerically: discriminant .Solvenumerically:discriminant=1-4\cdot0.4^2=1-0.64=0.36,so, so ,so\theta=(1\pm 0.6)/(0.8).Positiveroot. Positive root .Positiveroot(1-0.6)/0.8=0.4/0.8=0.5or or or(1+0.6)/0.8=1.6/0.8=2;invertibilityconstraintusuallyselects; invertibility constraint usually selects ;invertibilityconstraintusuallyselects|\theta|<1,so, so ,so\theta=0.5$.

ARMA(p,q) models

Xt=ϕ1Xt−1+⋯+ϕpXt−p+εt+θ1εt−1+⋯+θqεt−q.X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}.Xt​=ϕ1​Xt−1​+⋯+ϕp​Xt−p​+εt​+θ1​εt−1​+⋯+θq​εt−q​.

Estimation is typically via maximum likelihood (often numerically executed) or conditional least squares.

ARIMA(p,d,q): dealing with nonstationarity

Forecasting with AR models (closed-form)

\hat X_{t+h|t} = \phi^h X_t.$$ Numeric example: $\phi=0.6$, $X_t=10$, the 3-step forecast is $0.6^3\cdot 10 = 0.216\cdot 10 = 2.16.$

Estimation practicalities and model selection

Identifiability and invertibility

Concrete example illustrating estimation difference: Suppose sample ACF shows strong decay and PACF cuts after 2 lags: this suggests AR(2). Using sample autocovariances γ^1,γ^2\hat\gamma_1,\hat\gamma_2γ^​1​,γ^​2​ we solve Yule–Walker linear system to get ϕ^1,ϕ^2\hat\phi_1,\hat\phi_2ϕ^​1​,ϕ^​2​ and compute residuals ε^t\hat\varepsilon_tε^t​ then estimate σ^2\hat\sigma^2σ^2 as mean squared residual. This combines Covariance and Correlation (autocovariances) and Linear Regression (residual analysis).

In short, AR/MA/ARMA/ARIMA provide interpretable, mathematically tractable families; Yule–Walker and OLS give simple estimation for AR models, while MA parameters often need likelihood-based methods.

Applications and Connections: Box–Jenkins Methodology and Real-World Use #

The Box–Jenkins (BJ) methodology is a structured four-step practical workflow for building ARIMA-class models. It turns the theoretical tools (ACF/PACF, AR/MA definitions, stationarity tests) into an applied recipe. Each step references ideas you've seen in previous sections and relies on Covariance and Correlation and Linear Regression for computations.

Box–Jenkins steps

  1. Identification: Use plots of the time series, ACF, and PACF to decide whether differencing is needed and to propose model orders p and q. Look for patterns: slow ACF decay suggests nonstationarity or AR behavior; ACF cutoffs help detect MA terms. Example: If ACF decays slowly and first differences look stationary, set d=1 and then examine the ACF/PACF of differenced series.

  2. Estimation: Fit candidate ARIMA(p,d,q) models via maximum likelihood or conditional least squares; estimate parameters ϕ,θ,σ2\phi,\theta,\sigma^2ϕ,θ,σ2. For AR parts you might use Yule–Walker to get initial estimates (a closed-form start) and then refine via ML. Example: Fit ARIMA(1,1,1) to data; the differencing makes the series stationary before estimation.

  3. Diagnostic checking: Analyze residuals ε^t\hat\varepsilon_tε^t​ from the fitted model. Residual ACF should show no significant autocorrelation (randomness). Perform Ljung–Box test for joint autocorrelation up to lag m. Residuals should be approximately uncorrelated with constant variance and mean zero. If diagnostics fail, go back to step 1 and refine model orders.

  4. Forecasting: With a validated model, compute forecasts and forecast intervals. ARIMA models allow iterative computation of point forecasts and mean squared error estimates for prediction intervals.

Concrete real-world examples and use cases

Seasonality and SARIMA: When periodic patterns exist, include seasonal terms. For seasonal period s, a SARIMA model includes factors like (1−Φ1Bs−… )(1 - \Phi_1 B^s - \dots )(1−Φ1​Bs−…) for seasonal AR and seasonal differencing (1−Bs)D(1-B^s)^D(1−Bs)D. Numeric example: for monthly data with annual seasonality s=12, seasonal differencing with D=1 is Yt=(1−B12)Xt=Xt−Xt−12Y_t = (1-B^{12})X_t = X_t - X_{t-12}Yt​=(1−B12)Xt​=Xt​−Xt−12​.

Model evaluation and selection

Connections to machine learning and further topics

Practical pitfalls and tips

A short end-to-end numeric illustration

Suppose you have monthly sales series and see a linear upward trend. First difference to remove trend (d=1). The differenced series shows ACF cutting off at lag 1 and PACF decaying: candidate ARIMA(0,1,1). Fit MA(1) on differenced data; estimate θ≈0.5\theta\approx0.5θ≈0.5 and residuals look white. Forecast by integrating predicted differences forward to recover level forecasts. This is the Box–Jenkins loop in action.

In summary, Box–Jenkins ties together stationarity tests, ACF/PACF-guided identification, estimation (Yule–Walker, OLS, ML), and diagnostic checking to produce reliable forecasts. The concrete algebra and numerical examples throughout this lesson show how concepts from Covariance and Correlation and Linear Regression are repurposed to handle temporal dependence and forecasting tasks.

Worked Examples (3) #

Compute sample ACF for a small series #

You observe the series x = [2, 4, 6, 8]. Compute the sample mean, sample autocovariance at lags 0 and 1 using denominator n=4, and the sample autocorrelation at lag 1.

  1. Compute sample mean: xˉ=(2+4+6+8)/4=20/4=5\bar x = (2+4+6+8)/4 = 20/4 = 5xˉ=(2+4+6+8)/4=20/4=5.
  2. Compute sample autocovariance at lag 0: γ^0=14∑t=14(xt−xˉ)2\hat\gamma_0 = \frac{1}{4}\sum_{t=1}^4 (x_t-\bar x)^2γ^​0​=41​∑t=14​(xt​−xˉ)2. Differences squared: (2-5)^2=9, (4-5)^2=1, (6-5)^2=1, (8-5)^2=9. Sum = 20. So γ^0=20/4=5\hat\gamma_0 = 20/4 = 5γ^​0​=20/4=5.
  3. Compute sample autocovariance at lag 1: γ^1=14∑t=24(xt−xˉ)(xt−1−xˉ)\hat\gamma_1 = \frac{1}{4}\sum_{t=2}^4 (x_t-\bar x)(x_{t-1}-\bar x)γ^​1​=41​∑t=24​(xt​−xˉ)(xt−1​−xˉ). Terms: (4-5)(2-5)=(-1)(-3)=3, (6-5)(4-5)=(1)(-1)=-1, (8-5)(6-5)=(3)(1)=3. Sum = 5. So γ^1=5/4=1.25\hat\gamma_1 = 5/4 = 1.25γ^​1​=5/4=1.25.
  4. Compute sample autocorrelation at lag 1: ρ^1=γ^1/γ^0=1.25/5=0.25\hat\rho_1 = \hat\gamma_1 / \hat\gamma_0 = 1.25/5 = 0.25ρ^​1​=γ^​1​/γ^​0​=1.25/5=0.25.
  5. Summarize: xˉ=5\bar x=5xˉ=5, γ^0=5\hat\gamma_0=5γ^​0​=5, γ^1=1.25\hat\gamma_1=1.25γ^​1​=1.25, ρ^1=0.25\hat\rho_1=0.25ρ^​1​=0.25.

Insight: This example shows directly how the sample autocovariance and autocorrelation are computed from data and connects to the Covariance and Correlation prerequisite: lagged pairs play the same role as paired variables in a standard correlation computation.

Identify AR(1) vs MA(1) from theoretical ACF #

Consider two processes: (A) AR(1) with ϕ=0.6\phi=0.6ϕ=0.6 and noise variance σ2=1\sigma^2=1σ2=1, (B) MA(1) with θ=0.5\theta=0.5θ=0.5 and noise variance σ2=1\sigma^2=1σ2=1. Compute theoretical ACF at lags 1,2,3 for both and use these to argue which pattern each would show on an ACF plot.

  1. AR(1) theoretical ACF: ρk=ϕk\rho_k = \phi^kρk​=ϕk.

    Compute numeric values with ϕ=0.6\phi=0.6ϕ=0.6: ρ1=0.6\rho_1=0.6ρ1​=0.6, ρ2=0.62=0.36\rho_2=0.6^2=0.36ρ2​=0.62=0.36, ρ3=0.63=0.216\rho_3=0.6^3=0.216ρ3​=0.63=0.216.

  2. MA(1) theoretical ACF: ρ1=θ/(1+θ2)\rho_1 = \theta/(1+\theta^2)ρ1​=θ/(1+θ2) and ρk=0\rho_k=0ρk​=0 for k>1k>1k>1.

    Compute numeric values with θ=0.5\theta=0.5θ=0.5: ρ1=0.5/(1+0.25)=0.5/1.25=0.4\rho_1=0.5/(1+0.25)=0.5/1.25=0.4ρ1​=0.5/(1+0.25)=0.5/1.25=0.4, ρ2=0\rho_2=0ρ2​=0, ρ3=0\rho_3=0ρ3​=0.

  3. Interpretation: AR(1) ACF decays exponentially (0.6, 0.36, 0.216, ...). MA(1) ACF has a single non-zero lag (0.4 at lag 1) then zeros. So on a sample ACF plot, AR(1) would show gradually decreasing bars while MA(1) would show a significant bar at lag 1 and non-significant bars beyond.

  4. Thus, if you observe ACF cutting off after lag 1, MA(1) is likely; if it decays, AR(1) is likely. PACF patterns provide complementary evidence.

Insight: This demonstrates the identification rule-of-thumb used in Box–Jenkins: ACF decay vs cut-off helps distinguish AR vs MA behavior. The numeric computations show concrete numbers you would expect in plots.

Yule–Walker estimation for AR(1) #

Using the sample autocovariances from worked example 1 (γ^0=5\hat\gamma_0=5γ^​0​=5, γ^1=1.25\hat\gamma_1=1.25γ^​1​=1.25), estimate the AR(1) coefficient ϕ\phiϕ via Yule–Walker and compute the implied variance σ2\sigma^2σ2 of residuals (using σ^2=γ^0−ϕ^γ^1\hat\sigma^2 = \hat\gamma_0 - \hat\phi\hat\gamma_1σ^2=γ^​0​−ϕ^​γ^​1​).

  1. Yule–Walker for AR(1) gives ϕ=γ1/γ0\phi = \gamma_1 / \gamma_0ϕ=γ1​/γ0​. Substitute sample values: ϕ^=1.25/5=0.25\hat\phi = 1.25 / 5 = 0.25ϕ^​=1.25/5=0.25.
  2. Compute residual variance estimate: σ^2=γ^0−ϕ^γ^1\hat\sigma^2 = \hat\gamma_0 - \hat\phi \hat\gamma_1σ^2=γ^​0​−ϕ^​γ^​1​ (this follows from decomposition of variance for AR(1)). Plug numbers: σ^2=5−0.25⋅1.25\hat\sigma^2 = 5 - 0.25 \cdot 1.25σ^2=5−0.25⋅1.25.
  3. Compute $0.25\cdot 1.25 = 0.3125,so, so ,so\hat\sigma^2 = 5 - 0.3125 = 4.6875$.
  4. Thus estimated AR(1) model is Xt=0.25Xt−1+εtX_t = 0.25 X_{t-1} + \varepsilon_tXt​=0.25Xt−1​+εt​ with estimated noise variance $4.6875$.
  5. Check interpretation: the relatively small sample and short series make estimates noisy; in practice we use longer series and refine via maximum likelihood.

Insight: This example shows how Yule–Walker turns autocovariances into AR coefficients. It leverages the link between autocovariance structure and AR parameters and provides a direct, algebraic estimator without optimization.

Key Takeaways #

Common Mistakes #

Practice #

easy

Easy: Given a stationary AR(1) process X_t = 0.8 X_{t-1} + ε_t with Var(ε_t)=1, compute the theoretical autocorrelation ρ_1 and ρ_3, and the stationary variance Var(X_t).

Hint: Use ρ_k = φ^k and Var(X_t)=σ^2/(1-φ^2).

Show solution

ρ_1 = 0.8, ρ_3 = 0.8^3 = 0.512. Var(X_t) = 1/(1-0.8^2)=1/(1-0.64)=1/0.36 ≈ 2.7778.

medium

Medium: You observe sample autocorrelations \hatρ1=0.45 and \hatρ2=0.2 from a long series (n large). Using the Yule–Walker equations for AR(2), solve approximately for φ1 and φ2 by solving the linear system [\hatρ1,\hatρ2]^T = [[1,\hatρ1],[\hatρ1,1]][φ1,φ2]^T.

Hint: Solve the 2×2 linear system: invert the matrix [[1,\hatρ1],[\hatρ1,1]] and multiply by [\hatρ1,\hatρ2].

Show solution

Matrix A = [[1,0.45],[0.45,1]]. Determinant = 1-0.45^2 = 1-0.2025 = 0.7975. Inverse = (1/0.7975)[[1,-0.45],[-0.45,1]]. Multiply by vector r=[0.45,0.2]: φ = (1/0.7975)[10.45 -0.450.2 , -0.450.45 +10.2] = (1/0.7975)[0.45-0.09, -0.2025+0.2] = (1/0.7975)[0.36, -0.0025] ≈ [0.4517, -0.0031]. So φ1≈0.452, φ2≈-0.003.

hard

Hard: Suppose you have monthly data with a strong annual seasonality (period s=12) and a linear upward trend. Outline and justify a Box–Jenkins modeling plan: preprocessing steps, identification choices (including possible seasonal orders), and how you would check residuals. Be explicit about differencing and SARIMA components.

Hint: Remove trend via differencing (first difference). Consider seasonal differencing (1-B^12). After differencing, examine ACF/PACF at seasonal lags (12,24) and nonseasonal lags to propose (p,d,q)(P,D,Q)_12. Use residual ACF/Ljung–Box to check.

Show solution

Plan: 1) Take first difference to remove linear trend: Y_t = X_t - X_{t-1} (d=1). 2) Check seasonality: compute ACF of Y_t; if strong spikes at lags 12,24,... take seasonal difference (D=1): Z_t=(1-B^12)Y_t = (1-B)(1-B^{12})X_t with total differencing d=1,D=1. 3) On stationary Z_t, plot ACF/PACF. If ACF shows spike at lag 12 and PACF decays, consider seasonal MA term Q=1; if PACF spikes at lag 12 and ACF decays, consider seasonal AR P=1. For nonseasonal structure, use ACF/PACF at low lags to suggest p and q. For example candidate SARIMA(p,1,q)(P,1,Q)_12. 4) Fit candidate models (e.g., SARIMA(0,1,1)(0,1,1)_12), estimate parameters via ML, and inspect residuals: plot residual ACF (should be within ±1.96/√n), perform Ljung–Box test up to lag 24 or more to test joint correlation, check residual histogram/QQ-plot for normality assumption if needed. 5) If diagnostics fail, iterate: try different p,q,P,Q or additional differencing cautiously. 6) Validate via out-of-sample forecast performance. This plan explicitly uses seasonal differencing and SARIMA components and relies on ACF/PACF identification and diagnostic checks.

Connections #

Looking back: This lesson reuses core ideas from your prerequisites. From Covariance and Correlation we extended covariance to autocovariance γk=Cov(Xt,Xt−k)\gamma_k=\mathrm{Cov}(X_t,X_{t-k})γk​=Cov(Xt​,Xt−k​) and defined autocorrelation ρk\rho_kρk​, which are the numerical heart of the ACF. From Linear Regression we borrowed the idea of projecting onto explanatory variables: PACF at lag k is the coefficient on Xt−kX_{t-k}Xt−k​ when regressing XtX_tXt​ on lags 1..k, and OLS/residual calculations underpin AR estimation and diagnostics. Looking forward: mastering these foundations enables work on forecasting (point and interval forecasts), state-space models and the Kalman filter (used in engineering and econometrics), volatility modeling (GARCH models in finance), seasonal modeling (SARIMA), and modern probabilistic sequence models. Many machine learning sequence methods (RNNs, LSTMs) often require careful feature engineering and stationarity-aware preprocessing that derive from the ideas covered here. Specific downstream concepts that require this foundation include: Kalman filtering for state-space inference, GARCH for volatility clustering (which presumes understanding of serial dependence), and structural time series for decomposing trend/seasonality — all use ACF/PACF intuition, differencing/ARMA parametrizations, and residual diagnostics taught in this lesson.

Quality: pending (0.0/5)

← back to treebrowse all →