June 25, 2024


A study of the Augmented Dickey-Fuller (ADF) test from a weird example

Photo by Jan Huber on Unsplash

Stationarity is one of the most fundamental concepts in time series analysis. Generally, a stationary series has properties that make it much easier to model with various statistical methods. The Augmented Dickey-Fuller (ADF) test is probably the most widely used approach for checking stationarity.

There are tons of articles online on this topic. I won’t waste your time on the basic intro, such as the definition of stationarity, how to do ADF tests, etc. In this post, I will share my journey of exploring the ADF test after encountering a strange case in an application.

The learning path I will show is typical for learning data science. First, we think we understand a tool or a concept, when really we have only memorized the term. Then, when we apply it to actual data, unexpected, challenging problems push us to investigate more and understand further.

Again, I’ll share my code on GitHub; please find the link in the reference section at the end.

The start of an unexpected journey

I was working with some time series the other day. Figure 1 shows one of them. There is no doubt that an upward trend exists, and the variance also changes over time. With such a clear visualization, I didn’t think I needed to test for stationarity. For some reason I don’t remember, I still ran the ADF test. Surprisingly, the p-value was almost 0, which implies I should reject the null hypothesis and conclude the series is stationary.

Figure 1. Time series with a trend (Image by the Author)

That’s odd. The test result seemed wrong, and I wanted to investigate what was going on behind the ADF test. The first step was to replicate the issue with synthetic data, which I generated with the following code. It mimics the slow-trending behaviour but not the seasonality.

There are 20,000 observations in Figure 1, implying the trend goes up extremely slowly. I created a time series with a small slope of 0.0005, so the mean rises from around 0 to 0.5 over 1,000 observations. Then let’s test it with the adfuller() function from statsmodels.tsa.stattools, using the default parameters. The p-value is 0.01, and the “problem” happens again. Figure 2 shows the result. You may ignore the title and focus on the upward trend; I’ll explain why we have p-values from three different ADF tests later.

Figure 2. Synthetic time series with ADF test result (Image by the Author)

The math behind the DF test

We must dig deeper to see what precisely the ADF test is doing. It turns out the math behind it is not complicated. First, the ADF test is just an advanced version of the Dickey-Fuller (DF) test. There are three main versions of the DF test (from Wikipedia):

Version 1: Test for a unit root: ∆yᵢ = δyᵢ₋₁ + uᵢ

Version 2: Test for a unit root with constant: ∆yᵢ = a₀ + δyᵢ₋₁ + uᵢ

Version 3: Test for a unit root with constant and deterministic trend: ∆yᵢ = a₀ + a₁t + δyᵢ₋₁ + uᵢ

In each version, the null hypothesis is that there is a unit root, δ=0.

The Statsmodels package supports all three versions with the parameter “regression”.

For version 1, regression is ‘n’ (no constant, no trend).

For version 2, regression is ‘c’ (constant only); this is the default setting.

For version 3, regression is ‘ct’ (constant and trend).

I reran the test with the three different settings; below are the new results.

For version 1, the p-value is 0.09. We should not reject the Null hypothesis.

For version 2, the p-value is 0.01. This is the “problem” we ran into, because it is the default setting.

For version 3, the p-value is 0.00. That is expected because the time series is indeed stationary with a deterministic trend.

So for this example data, if we test it with version 1 (regression=‘n’), we won’t say it’s stationary, which suggests we probably shouldn’t rely on the default setting. But you may also wonder: why did the constant term make such a big difference here? Let’s dig deeper.

From the DF test to linear regression

Based on the definitions above, the DF test is just a linear regression. Figure 3 shows all the points used in that regression: the Y axis is ∆yᵢ, the X axis is yᵢ₋₁, and uᵢ is the residual. Version 1 fits a line without an intercept (no constant); version 2 fits a line with an intercept (constant).

Figure 3. ∆yᵢ and yᵢ₋₁ (Image by the Author)

Scikit-learn’s LinearRegression supports both options through the fit_intercept parameter. Figure 4 below shows the two fitted lines. As you can see, the line with the intercept fits better than the line without it, and the R-squared scores confirm this clearly. Also note that the orange line’s slope is smaller than the blue line’s; in other words, the slope of the orange line is closer to 0.
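A minimal sketch of that comparison (the toy series and its noise level are my own choices): pair each ∆yᵢ with yᵢ₋₁ and fit once with an intercept and once without.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Trend-stationary toy series, for illustration only.
y = 0.0005 * np.arange(1000) + rng.normal(scale=0.1, size=1000)

# The DF regression pairs: X = y_{i-1}, target = delta y_i.
X = y[:-1].reshape(-1, 1)
dy = np.diff(y)

with_c = LinearRegression(fit_intercept=True).fit(X, dy)
no_c = LinearRegression(fit_intercept=False).fit(X, dy)

print("slope with intercept:   ", with_c.coef_[0])
print("slope without intercept:", no_c.coef_[0])
print("R^2 with intercept:   ", with_c.score(X, dy))
print("R^2 without intercept:", no_c.score(X, dy))
```

The fit with an intercept can never score worse: the no-intercept line is just a constrained special case of the same model family.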

Figure 4. Linear regression result (Image by the Author)

We can also reason about it visually: the points are not centred around (0, 0), so the fitted regression line should not pass through (0, 0), and the intercept should be greater than 0. Because the series starts with a mean of 0, ∆y should on average be greater than 0, so the overall mean increases. If we force the regression line through (0, 0), it underfits the data, and the pull towards (0, 0) pushes the slope closer to 0.

We have seen how including the intercept affects the linear regression fit. But why does the way the line is fitted impact the ADF test result, and where does the p-value come from?

From linear regression to p-value

This is where things get a bit complicated. The final p-value of the DF test does not come from the coefficient p-values of the linear regression. Instead, the test statistic follows a specific distribution known as the Dickey-Fuller table, and MacKinnon’s approximation is used to turn the statistic into a p-value. You may find the details in the statsmodels source code.

The null hypothesis is δ=0, i.e., we test whether the slope of the fitted line is 0. We don’t need to go into the details of how the p-value is computed. The logic chain connecting the p-value and the slope (δ in the linear regression, not the trend slope) is as follows:

Generally, if the slope is far from 0, the p-value is small, so we are more likely to reject the null hypothesis and conclude there is no unit root (stationary). If the slope is 0 or very close to 0, the p-value is high, so we are more likely to fail to reject the null hypothesis and conclude there is a unit root (non-stationary). For the second scenario, Wikipedia notes, “The tests have low statistical power in that they often cannot distinguish between true unit-root processes (δ = 0) and near unit-root processes (δ is close to 0)”. That’s why we had the problem in the first place: we are dealing with a near-unit-root process, where version 1 finds a unit root and version 2 does not.

Why does version 1 work for the above example?

Version 1 works for the data in Figure 2 because forcing the line through the origin pushes the fitted slope closer to 0, which makes the p-value higher.

However, we cannot use version 1 as the default setting. There are two cases for version 1 (forcing the line to go through (0,0) ):

Case 1: (0, 0) is close to the data points. If the line has to go through (0, 0), it becomes flatter, and the slope moves closer to 0. Figure 4 demonstrates this case. Note that the demonstration in Figure 4 fits only one variable, yᵢ₋₁; the actual ADF test fits more lag variables.

Case 2: (0, 0) is far away from the data points. If the line must go through (0, 0), the fit essentially fails: the slope comes out as 0, meaning we cannot find a linear relationship between ∆yᵢ and yᵢ₋₁ such that the line passes through (0, 0) and covers most of the data points. Therefore, the test result is biased towards finding a unit root.

Figure 5 below shows an example where the version 1 test fails to reject the null hypothesis (p-value 0.6) even though the data is stationary with a mean of 10. Figure 6 explains the reason: we cannot find a good line without an intercept (the R-squared is 0), so the slope of the fitted line is 0 (∆yᵢ does not depend on yᵢ₋₁).

Figure 5. Version 1 failed to recognize a stationary time series (Image By the Author)
Figure 6. Linear regression fails to find a line without intercept (passing (0,0)) (Image by the Author)

From the DF test to the ADF test

Now that we understand the DF test is a linear regression and how its p-value is derived from that regression, let’s move on to the ADF test. Its formula adds lagged differences of the series to the regression:

∆yᵢ = a₀ + a₁t + δyᵢ₋₁ + β₁∆yᵢ₋₁ + … + βₚ∆yᵢ₋ₚ + uᵢ

Again, it is linear regression; the “Augmented” part just means we fit the additional lagged-difference coefficients.

The statsmodels package can show a detailed summary of the ADF test. Figure 7 is the result.

Figure 7. ADF test with a detailed summary (Image by the Author)

We see “OLS Regression” (the default solver for the linear regression) and 17 coefficients. I didn’t specify the max lag, so the test tries lags up to a number based on the length of the time series, which here is 17.

The const (intercept) is fitted as well. The value is 0.0596.

Let’s try to implement the linear regression part of the ADF test using Scikit-learn. Figure 8 is the code and output.

Figure 8. ADF test (just the linear regression part) with the Scikit-learn (Image by Author)

The intercept is 0.0596, and the other coefficients are the same as in Figure 7. The linear regression in Scikit-learn is plain OLS; we are doing the same thing, so it is no surprise that the results are identical.

The end of the journey

After figuring out how to set the parameter, I tested the original time series from Figure 1 using version 1 (regression=‘n’) and got a p-value of 0.08, suggesting it is not stationary. Note that the data in Figure 1 is zero-mean, so you can imagine that (0, 0) is close to the data points (yᵢ₋₁, ∆yᵢ); using the version 1 test helps here.

Because the trend slope in Figure 1 is tiny, we can also resample the time series with a step, which increases the per-step slope. For instance, if I test every 4th value (value[::4]), it no longer passes the ADF test with the default setting (the p-value is 0.17 for regression=‘c’).

Problem solved.


Don’t trust ADF results blindly. Visualization is your friend.

The ADF test boils down to a simple linear regression, and the statsmodels implementation uses OLS to solve it. It then uses the Dickey–Fuller table (via MacKinnon’s approximation) to derive the p-value for the null hypothesis that the coefficient of the first lag variable in the fitted regression is 0.

The ADF test has limited power for near unit-root processes (δ close to 0).

We need to choose the proper ADF version accordingly. For instance, when you see a steady trend and want to test for ‘trend stationarity,’ select ‘ct’ as the parameter. If you want to catch a slow trend in a signal whose mean is supposed to be 0, as in Figures 1 and 2, you may need to select ‘n’ to avoid the impact of fitting the intercept. Statsmodels also supports a quadratic trend with the parameter ‘ctt,’ which can be a good choice in some cases. If you want to dig further, please refer to “Dealing with uncertainty about including the intercept and deterministic time trend terms.”

I hope you have learned something about the ADF test.

Have fun with your time series!

Contact me on LinkedIn.

PS: I have experience and passion for time series data. If you like this article, you may be interested in my other posts about time series.


Notebook file on GitHub

Why Is This Trending Time Series Stationary?

Originally published in Towards Data Science.