时间序列分析之趋势判断

时间序列分析之趋势判断

时间序列分析之趋势判断

判断时间序列数据是上升还是下降是我们常见的问题。比如某个股票在过去一年整体趋势是上升还是下降。我们可以通过画图的方式直接观测出上升还是下降。但每次观测图片非常的麻烦，有没有一些数学方法进行检验？

方案一：直接统计法

假设A序列有N个数。则：

∑

(

∣

−

∣

)

S=\sum_{m=1}^{n}{(|A_m-A_{m-1}|)}

$S = \sum_{m = 1}^{n} (∣ A_{m} - A_{m - 1} ∣)$

当序列单调时：

∣

−

∣

S=|A_n-A_0|

$S = ∣ A_{n} - A_{0} ∣$ ，否则

∣

−

∣

S>|A_n-A_0|

$S > ∣ A_{n} - A_{0} ∣$ 。

求

∣

−

∣

\frac{|A_n-A_0|}{S}

$\frac{∣ A _{n} - A _{0} ∣}{S}$ 可以大概标识震荡的幅度，你可以设置阈值来判断即可。

该方法存在的问题，又有取得差值的绝对值，所以无法确定时上升还是下降。但是可以确切的知道震荡的幅度。

方案二：斜率法

斜率方法比较的简单，核心是使用最小二乘法将序列拟合成直线。后根据直线的斜率k去取得序列的走势。例如：

0.1763

k>0.1763

$k > 0.1763$ 表示上升

−

0.1763

k<-0.1763

$k < - 0.1763$ 表示下降

其他，则表示平稳或震荡

上面的判断中的0.1763 实际含义为：

∘

0.1763

tan 10^ \circ=0.1763

$t an 1 0^{\circ} = 0.1763$ ，即直线的倾斜角度，你也可以使用其他的值进行替换。

最后，如果想要评定震荡幅度情况，则可以通过残差平方和SSE进行评估。

import numpy as np


def trendline(index, data, order=1):
    coeffs = np.polyfit(index, list(data), order)
    slope = coeffs[-2]
    return float(slope)


index = [1, 2, 3, 4]
List = [1043, 6582, 5452, 7571]
resultent = trendline(index, List)
print(resultent)

1845.3999999999999

如果返回的是正数则为增长，如果返回的是负数，则为下降，如果为0则表示没有趋势。

方案三：Cox-Stuart趋势检验

Cox-Stuart是一种不依赖趋势结构的快速判断趋势是否存在的方法。Cox-Stuart趋势存在性检验的理论基础是符号检验。它的检验思想是：直接考虑数据的变化趋势，若数据有上升趋势，那么排在后面的数据的值要比排在前面的数据的值显著的大，反之，若数据有下降趋势，那么排在后面的数据的值要比排在前面的数据的值显著的小，利用前后两个时期不同数据的差值正负来判断数据总的变化趋势。为保证数对同分布，前后两个数的间隔应固定。这就意味着将数据一分为二，自然形成前后数对。Cox-Staut提出最优的拆分点是数列中位于中间位置的数。

把每一个观测值与相隔n/2得另一个观测值进行配对比较，因此大致会产生n/2个数据对。然后看增长得对子和减少得对子各有多少的数量来判断总的趋势。

检验步骤：可以把每一个观察值和相隔大约n/2的另一个观察值配对比较，因此大约有n/2个对子。然后看增长的对子和减少的对子多少来判断总的趋势。

具体做法：用每一对数据中的两个元素做差，利用差值的符号来衡量是递增还是递减，一般使用

−

N=X_i-X_{i+c}

$N = X_{i} - X_{i + c}$ 。令S+为差值为正的数目，S-为插值为负的数目。当S+的数目多，即正号太多时有下降趋势，否则，有增长趋势。且当插值为0时，不计数。在没有趋势的零假设下，数据服从二项分布

(

0.5

)

B(N,0.5)

$B (N, 0.5)$ 。

import scipy.stats as stats


def cos_staut(list_c, debug=True):
    lst = list_c.copy()
    raw_len = len(lst)
    if raw_len % 2 == 1:
        del lst[int((raw_len - 1) / 2)]
    c = int(len(lst) / 2)
    n_pos = n_neg = 0
    for i in range(c):
        diff = lst[i + c] - lst[i]
        if diff > 0:
            n_pos += 1
        elif diff < 0:
            n_neg += 1
        else:
            continue
    num = n_pos + n_neg
    k = min(n_pos, n_neg)
    p_value = 2 * stats.binom.cdf(k, num, 0.5)
    if debug:
        print("fall:%i, rise:%i, p-value:%f" % (n_neg, n_pos, p_value))
    print(n_pos)
    if n_pos > n_neg and p_value < 0.05:
        return "increasing"
    elif n_neg > n_pos and p_value < 0.05:
        return "decreasing"
    else:
        return "no trend"


cos_staut(
    [
        206,
        223,
        235,
        264,
        229,
        217,
        188,
        204,
        182,
        230,
        223,
        227,
        242,
        238,
        207,
        208,
        216,
        233,
        233,
        274,
        234,
        227,
        221,
        214,
        226,
        228,
        235,
        237,
        243,
        240,
        231,
        210,
    ]
)

fall:2, rise:14, p-value:0.004181
14





'increasing'

方案四：Mann-Kendall趋势检验法

https://vsp.pnnl.gov/help/Vsample/Design_Trend_Mann_Kendall.htm

Mann-Kendall(MK)检验(test)(Mann 1945, Kendall 1975, Gilbert 1987) 的目的是统计评估我们所感兴趣的变量，随着时间变化，是否有单调上升或下降的趋势。单调上升（下降）的趋势意味着该变量随时间增加（减少），但此趋势可能是、也可能不是线性的。MK test可替代参数线性回归分析——线性回归可检验线性拟合直线的斜率是否不为零。回归分析要求拟合回归线的残差是正态分布的，MK检验不需要这种假设，MK检验是非参数检验（不要求服从任何分布-distribution free）

以下这些假设是MK检验的基础：

当没有趋势时，随时间获得的数据是独立同分布的。独立的假设是说数据随着时间不是连续相关的。
所获得的时间序列上的数据代表了采样时的真实条件。（样本具有代表性）
样本的采集、处理和测量方法提供了总体样本中的无偏且具有代表性的观测值。

MK检验不要求数据是正态分布，也不要求变化趋势——如果存在的话——是线性的。如果有缺失值或者值低于一个或多个检测限制，是可以计算MK检测的，但检测性能会受到不利影响。独立性假设要求样本之间的时间足够大，这样在不同时间收集的测量值之间不存在相关性。

MK检验是检验是否拒绝零假设(null hypothesis:

H_0

$H_{0}$ )，并接受替代假设（alternative hypothesis:

H_a

$H_{a}$ ）：

H_0

$H_{0}$ ：没有单调趋势

H_a

$H_{a}$ ：存在单调趋势

最初的假设是：

H_0

$H_{0}$ 为真，在拒绝

H_0

$H_{0}$ 并接受

H_a

$H_{a}$ 之前，数据必须要超出合理怀疑——要到达一定的置信度。MK检验的流程：

1、将数据按采集时间列出：

…

x_1,x_2,\dots,x_n

$x_{1}, x_{2}, \dots, x_{n}$ ，即分别在时间

…

1,2,\dots,n

$1, 2, \dots, n$ 得到的数据。

2、确定所有

(

−

)

\frac{n(n-1)}{2}

$\frac{n ( n - 1 )}{2}$ 个

−

x_j-x_k

$x_{j} - x_{k}$ 差值的符号，其中

j > k

$j > k$ ，即

−

…

−

…

−

…

−

x_2-x_1,x_3-x_1,\dots,x_n-x_1,x_3-x_2,x_4-x_2,\dots,x_n-x_2,\dots,x_n-x_{n-1}

$x_{2} - x_{1}, x_{3} - x_{1}, \dots, x_{n} - x_{1}, x_{3} - x_{2}, x_{4} - x_{2}, \dots, x_{n} - x_{2}, \dots, x_{n} - x_{n - 1}$

3、令

(

−

)

sgn(x_j−x_k)

$s g n (x_{j} - x_{k})$ 作为指示函数，依据

−

x_j−x_k

$x_{j} - x_{k}$ 的正负号取值为1,0或-1

4、计算

∑

−

∑

(

−

)

S=\sum_{k=1}^{n-1}\sum_{j=k+1}^{n}{sgn(x_j-x_k)}

$S = \sum_{k = 1}^{n - 1} \sum_{j = k + 1}^{n} s g n (x_{j} - x_{k})$ 。即差值为正的数量减去差值为负的数量。如果S是一个正数，那么后一部分的观测值相比之前的观测值会趋向于变大；如果S是一个负数，那么后一部分的观测值相比之前的观测值会趋向于变小。

5、如果

≤

n\le10

$n \leq 10$ ，依据Gilbert (1987, page 209, Section 16.4.1)中所描述，要在概率表 (Gilbert 1987, Table A18, page 272) 中查找S。如果此概率小于

\alpha

$α$ (认为没有趋势时的截止概率)，那就拒绝零假设，认为趋势存在。如果在概率表中找不到n(存在结数据——tied data values——会发生此情况)，就用表中远离0的下一个值。比如S=12，如果概率表中没有S=12，那么就用S=13来处理也是一样的。如果

n > 10

$n > 10$ ，则依以下步骤6-10来判断有无趋势。这里遵循的是Gilbert (1987, page 211, Section 16.4.2)中的程序。

6、计算S的方差如下：VAR(S)=

(

)

[

(

−

)

(

)

−

∑

−

(

−

)

(

)

]

VAR(S)=\frac{1}{18}[n(n-1)(2n+5)-\sum_{p-1}^{g}t_p(t_p-1)(2t_p+5)]

$V A R (S) = \frac{1}{18} [n (n - 1) (2 n + 5) - \sum_{p - 1}^{g} t_{p} (t_{p} - 1) (2 t_{p} + 5)]$ 。其中g是结组（tied groups）的数量，也就是重复数据的数量，

t_p

$t_{p}$ 是第

$p$ 组的观测值的数量。例如：在观测值的时间序列

{23, 24, 29, 6, 29, 24, 24, 29, 23}

$23, 24, 29, 6, 29, 24, 24, 29, 23$ 中有

g = 3

$g = 3$ 个结组，相应地，对于结值(tiied value)23有

t_1=2

$t_{1} = 2$ 、结值24有

t_2=3

$t_{2} = 3$ 、结值29有

t_3=3

$t_{3} = 3$ 。当因为有相等值或未检测到而出现结时，

(

)

VAR(S)

$V A R (S)$ 可以通过Helsel (2005, p. 191)中的结修正方法来调整。

7、计算MK检验统计量

Z_{MK}

$Z_{M K}$ :

{

−

(

)

(

)

Z_{MK}=\left\{ \begin{matrix} \frac{S-1}{\sqrt{VAR(S)}} ~~~~ if~ S>0 \\ 0~~~~ if ~S=0 \\ \frac{S+1}{\sqrt{VAR(S)}} ~~~~ if~ S<0 \end{matrix} \right.

$Z_{M K} = ⎩ ⎨ ⎧ \frac{S - 1}{V A R ( S )} i f S > 0 0 i f S = 0 \frac{S + 1}{V A R ( S )} i f S < 0$

正（负）值

Z_{MK}

$Z_{M K}$ 为表示数据倾向于随时间增加（减少）。

8、设想我们要测试零假设。

H_0

$H_{0}$ (没有单调趋势)对比替代假设

H_a

$H_{a}$ (有单调增趋势)，其1型错误率为

alpha

$a lp ha$ ，

0.50

0<\alpha<0.50

$0 < α < 0.50$ （注意

\alpha

$α$ 是MK检验错误地拒绝了零假设时可容忍的概率——即MK检验拒绝了零假设是错误地，但这个事情发生概率是

\alpha

$α$ ，我们可以容忍这个错误）。如果

≥

−

Z_{MK}\geq Z_1-\alpha

$Z_{M K} \geq Z_{1} - α$ ，就拒绝零假设

H_0

$H_{0}$ ，接受替代假设

H_a

$H_{a}$ ，其中

−

Z_1-\alpha

$Z_{1} - α$ 是标准正态分布的

100

(

−

)

100(1-\alpha)^{th}

$100 (1 - α)^{t h}$ 百分位。

9、测试上面的

H_0

$H_{0}$ 与

H_a

$H_{a}$ （有单调递减趋势），其1型错误率为

\alpha

$α$ ，

0.5

0<\alpha<0.5

$0 < α < 0.5$ ，如果

≤

−

Z_{MK}\leq -Z_{1-\alpha}

$Z_{M K} \leq - Z_{1 - α}$ ，就拒绝零假设

H_0

$H_{0}$ ，接受替代假设

H_a

$H_{a}$

10、测试上面的

H_0

$H_{0}$ 与

H_a

$H_{a}$ （有单调递增或递减趋势），其1型错误率为

\alpha

$α$ ，

0.5

0<\alpha<0.5

$0 < α < 0.5$ ，如果

∣

≤

−

|Z_{MK}|\leq -Z_{1-\frac{\alpha}{2}}

$∣ Z_{M K} ∣ \leq - Z_{1 - \frac{α}{2}}$ ，就拒绝零假设

H_0

$H_{0}$ ，接受替代假设

H_a

$H_{a}$ ，其中竖线代表绝对值。

import math

from scipy.stats import mstats, norm


def mk_test(x, alpha=0.05):
    """
    This function is derived from code originally posted by Sat Kumar Tomer
    (satkumartomer@gmail.com)
    See also: http://vsp.pnnl.gov/help/Vsample/Design_Trend_Mann_Kendall.htm
    The purpose of the Mann-Kendall (MK) test (Mann 1945, Kendall 1975, Gilbert
    1987) is to statistically assess if there is a monotonic upward or downward
    trend of the variable of interest over time. A monotonic upward (downward)
    trend means that the variable consistently increases (decreases) through
    time, but the trend may or may not be linear. The MK test can be used in
    place of a parametric linear regression analysis, which can be used to test
    if the slope of the estimated linear regression line is different from
    zero. The regression analysis requires that the residuals from the fitted
    regression line be normally distributed; an assumption not required by the
    MK test, that is, the MK test is a non-parametric (distribution-free) test.
    Hirsch, Slack and Smith (1982, page 107) indicate that the MK test is best
    viewed as an exploratory analysis and is most appropriately used to
    identify stations where changes are significant or of large magnitude and
    to quantify these findings.
    Input:
        x:   a vector of data
        alpha: significance level (0.05 default)
    Output:
        trend: tells the trend (increasing, decreasing or no trend)
        h: True (if trend is present) or False (if trend is absence)
        p: p value of the significance test
        z: normalized test statistics
    Examples
    --------
      >>> x = np.random.rand(100)
      >>> trend,h,p,z = mk_test(x,0.05)
    """
    n = len(x)

    # calculate S
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            s += np.sign(x[j] - x[k])

    # calculate the unique data
    unique_x, tp = np.unique(x, return_counts=True)
    g = len(unique_x)

    # calculate the var(s)
    if n == g:  # there is no tie
        var_s = (n * (n - 1) * (2 * n + 5)) / 18
    else:  # there are some ties in data
        var_s = (n * (n - 1) * (2 * n + 5) - np.sum(tp * (tp - 1) * (2 * tp + 5))) / 18

    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:  # s == 0:
        z = 0

    # calculate the p_value
    p = 2 * (1 - norm.cdf(abs(z)))  # two tail test
    h = abs(z) > norm.ppf(1 - alpha / 2)

    if (z < 0) and h:
        trend = "decreasing"
    elif (z > 0) and h:
        trend = "increasing"
    else:
        trend = "no trend"

    return trend


mk_test(
    [
        206,
        223,
        235,
        264,
        229,
        217,
        188,
        204,
        182,
        230,
        223,
        227,
        242,
        238,
        207,
        208,
        216,
        233,
        233,
        274,
        234,
        227,
        221,
        214,
        226,
        228,
        235,
        237,
        243,
        240,
        231,
        210,
    ]
)

'no trend'