Markov switching dynamic regression models

reference:

  • example: https://www.statsmodels.org/dev/examples/notebooks/generated/markov_regression.html
  • doc: https://www.statsmodels.org/dev/generated/statsmodels.tsa.regime_switching.markov_regression.MarkovRegression.html

Steps

  1. download market data using yfinance: download S&P 500 ('^GSPC')
  2. calculate the 20-day max return (i.e. the target in a supervised learning problem):
    • for each date T, calculate the max price change over the next 20 trading days: price_change = (max{close price in T+1 to T+20} - {close price on T}) / {close price on T}
  3. create exogenous variables: lagged dependent variable
    • lag 1 of target variable
    • lag 20 of target variable
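
The forward-looking window in step 2 and the lag in step 3 can be sketched on a toy price series. This is a minimal sketch under assumptions: the helper name `forward_max_return`, the 3-day window, and the prices are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import pandas as pd

def forward_max_return(close: pd.Series, window: int) -> pd.Series:
    """Percent change from the close on T to the max close in T+1..T+window."""
    # rolling(window).max() at row T+window covers rows T+1..T+window;
    # shifting by -window aligns that value with row T.
    future_max = close.rolling(window).max().shift(-window)
    return 100 * (future_max - close) / close

# Hypothetical toy prices (illustration only)
close = pd.Series([100.0, 102.0, 101.0, 105.0, 103.0, 104.0])
target = forward_max_return(close, window=3)
lag1 = target.shift(1)  # previous row's target

print(pd.DataFrame({"close": close, "target": target, "lag1": lag1}))
```

The last `window` rows have no complete forward window, so their target is NaN. With a forward-looking target, lag 1 of the target on date T still depends on prices after T, which matters if the lags are meant to be known on date T.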

Markov switching parameters

  • endog: the endogenous (dependent) variable, i.e. the target (20-day max return)
  • k_regimes: The number of regimes.
  • trend: Whether or not to include a trend. Default is an intercept.
    • include an intercept: trend='c'
    • include time trend: trend='t'
    • include an intercept and time trend: trend='ct'
    • no trend: trend='n'
  • exog: exogenous regressors
  • switching_trend: whether or not all trend coefficients are switching across regimes. Default is True.
  • switching_exog: whether or not all regression coefficients are switching across regimes. Default is True.
  • switching_variance: Whether or not there is regime-specific heteroskedasticity, i.e. whether or not the error term has a switching variance. Default is False.

Summary

  1. switching intercept: 2 regimes vs. 5 regimes
    • set k_regimes as 2 or 5 and leave the rest as default
  2. switching intercept and lagged dependent variable:
    • k_regimes = 3
    • lag 1 and lag 20 as exog variables
import numpy as np
import pandas as pd
import statsmodels.api as sm

from datetime import datetime, timedelta
import yfinance as yf  # to download stock price data

Download S&P 500 price data

ticker = '^GSPC'
cur_data = yf.Ticker(ticker)
hist = cur_data.history(period="max")
print(ticker, hist.shape, hist.index.min(), hist.index.max())
^GSPC (19720, 7) 1927-12-30 00:00:00 2021-11-05 00:00:00
df=hist[hist.index>='2000-01-01'].copy(deep=True)
df.head()
Open High Low Close Volume Dividends Stock Splits
Date
2000-01-03 1469.250000 1478.000000 1438.359985 1455.219971 931800000 0 0
2000-01-04 1455.219971 1455.219971 1397.430054 1399.420044 1009000000 0 0
2000-01-05 1399.420044 1413.270020 1377.680054 1402.109985 1085500000 0 0
2000-01-06 1402.109985 1411.900024 1392.099976 1403.449951 1092300000 0 0
2000-01-07 1403.449951 1441.469971 1400.729980 1441.469971 1225200000 0 0

Calculate max return in next 20 trading days

# get the max close over the 20-trading-day window
price_col = 'Close'
roll_len=20
new_col = 'next_20day_max'
target_list = []

df.sort_index(ascending=True, inplace=True)
df.head(3)
Open High Low Close Volume Dividends Stock Splits
Date
2000-01-03 1469.250000 1478.000000 1438.359985 1455.219971 931800000 0 0
2000-01-04 1455.219971 1455.219971 1397.430054 1399.420044 1009000000 0 0
2000-01-05 1399.420044 1413.270020 1377.680054 1402.109985 1085500000 0 0
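# NOTE: shift(1) followed by rolling(roll_len).max() gives the max close over T-20..T-1,
# i.e. a trailing window rather than the T+1..T+20 window described in Steps;
# a forward window would be df[[price_col]].rolling(roll_len).max().shift(-roll_len)
df_next20dmax=df[[price_col]].shift(1).rolling(roll_len).max()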
df_next20dmax.columns=[new_col]
df = df.merge(df_next20dmax, right_index=True, left_index=True, how='inner')

df.dropna(how='any', inplace=True)
df['target']= 100*(df[new_col]-df[price_col])/df[price_col]  
df.head(3)
Open High Low Close Volume Dividends Stock Splits next_20day_max target
Date
2000-02-01 1394.459961 1412.489990 1384.790039 1409.280029 981000000 0 0 1465.150024 3.964435
2000-02-02 1409.280029 1420.609985 1403.489990 1409.119995 1038600000 0 0 1465.150024 3.976243
2000-02-03 1409.119995 1425.780029 1398.520020 1424.969971 1146500000 0 0 1465.150024 2.819712
df['lag1']=df['target'].shift(1)
df['lag20']=df['target'].shift(20)
df.dropna(how='any', inplace=True)
df.head(3)
Open High Low Close Volume Dividends Stock Splits next_20day_max target lag1 lag20
Date
2000-03-01 1366.420044 1383.459961 1366.420044 1379.189941 1274100000 0 0 1441.719971 4.533823 5.510745 3.964435
2000-03-02 1379.189941 1386.560059 1370.349976 1381.760010 1198600000 0 0 1441.719971 4.339390 4.533823 3.976243
2000-03-03 1381.760010 1410.880005 1381.760010 1409.170044 1150300000 0 0 1441.719971 2.309865 4.339390 2.819712
df['target'].plot.line(figsize=(12, 4))
[Figure: line plot of target by Date]

Markov switching with switching intercept: 2 regimes

  • set k_regimes=2: assuming 2 regimes
  • leave the rest as default
# Fit the model
# (a switching mean is the default of the MarkovRegression model)
markov_reg = sm.tsa.MarkovRegression(df['target'], k_regimes=2)
res_target = markov_reg.fit()
res_target.summary()
Markov Switching Model Results
Dep. Variable: target No. Observations: 5458
Model: MarkovRegression Log Likelihood -13468.861
Date: Sat, 06 Nov 2021 AIC 26947.723
Time: 21:37:15 BIC 26980.747
Sample: 0 HQIC 26959.246
- 5458
Covariance Type: approx
Regime 0 parameters
coef std err z P>|z| [0.025 0.975]
const 1.6017 0.042 38.541 0.000 1.520 1.683
Regime 1 parameters
coef std err z P>|z| [0.025 0.975]
const 13.3001 0.237 56.154 0.000 12.836 13.764
Non-switching parameters
coef std err z P>|z| [0.025 0.975]
sigma2 7.5238 0.147 51.137 0.000 7.235 7.812
Regime transition parameters
coef std err z P>|z| [0.025 0.975]
p[0->0] 0.9946 0.001 923.936 0.000 0.992 0.997
p[1->0] 0.0701 0.014 5.182 0.000 0.044 0.097



Warnings:
[1] Covariance matrix calculated using numerical (complex-step) differentiation.
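
The information criteria in the summary above can be reproduced from the log-likelihood and the number of estimated parameters, which helps when comparing fits with different numbers of regimes. A minimal sketch, assuming the standard AIC, BIC, and HQIC definitions and counting five parameters for this model (two transition probabilities, two regime constants, one sigma2):

```python
import math

llf = -13468.861   # Log Likelihood from the summary above
nobs = 5458        # No. Observations
# Parameter count: 2 transition probabilities, 2 regime constants, 1 sigma2
k_params = 2 + 2 + 1

aic = 2 * k_params - 2 * llf
bic = k_params * math.log(nobs) - 2 * llf
hqic = 2 * k_params * math.log(math.log(nobs)) - 2 * llf

print(round(aic, 3), round(bic, 3), round(hqic, 3))
```

The results agree with the reported AIC, BIC, and HQIC up to rounding of the log-likelihood.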

  • note: when P>|z| is not small (conventionally, above 0.05), we fail to reject the null hypothesis that the coefficient is zero.
  • From the summary output, the mean of the target in the first regime (the "low regime") is estimated to be 1.6, whereas in the "high regime" it is 13.3. Below we plot the smoothed probabilities of being in each regime.
res_target.smoothed_marginal_probabilities[[0]].plot(
    title="Probability of being in the low regime", figsize=(12, 3)
)
res_target.smoothed_marginal_probabilities[[1]].plot(
    title="Probability of being in the high regime", figsize=(12, 3)
)
[Figure: smoothed probability of being in the low regime]

[Figure: smoothed probability of being in the high regime]

  • From the estimated transition matrix we can calculate the expected duration of a low regime versus a high regime.
print(res_target.expected_durations)
[185.00635239  14.26299654]
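
The expected durations follow directly from the transition probabilities: the expected duration of regime i is 1 / (1 - p[i->i]). A minimal sketch using the rounded probabilities printed in the 2-regime summary (the variable names are illustrative, and the results differ slightly from the values above because the inputs are rounded to four decimals):

```python
# Transition probabilities reported in the 2-regime summary
p00 = 0.9946  # p[0->0]
p10 = 0.0701  # p[1->0]

# Each row of the transition matrix sums to 1, which gives the unreported entries
transition = [
    [p00, 1 - p00],  # from regime 0: stay in 0, switch to 1
    [p10, 1 - p10],  # from regime 1: switch to 0, stay in 1
]

# Expected duration of regime i is 1 / (1 - p[i->i])
durations = [1 / (1 - transition[i][i]) for i in range(2)]
print(durations)
```

With one observation per trading day, the low regime is expected to last roughly 185 trading days and the high regime roughly 14.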

Markov switching with switching intercept: 5 regimes

  • set k_regimes=5: assuming 5 regimes
  • leave the rest as default
# Fit the model
# (a switching mean is the default of the MarkovRegression model)
markov_reg = sm.tsa.MarkovRegression(df['target'], k_regimes=5)
res_target = markov_reg.fit()
res_target.summary()
Markov Switching Model Results
Dep. Variable: target No. Observations: 5458
Model: MarkovRegression Log Likelihood -10388.194
Date: Sat, 06 Nov 2021 AIC 20828.388
Time: 21:39:57 BIC 21000.114
Sample: 0 HQIC 20888.309
- 5458
Covariance Type: approx
Regime 0 parameters
coef std err z P>|z| [0.025 0.975]
const 0.5066 0.027 18.914 0.000 0.454 0.559
Regime 1 parameters
coef std err z P>|z| [0.025 0.975]
const 3.5746 0.072 49.828 0.000 3.434 3.715
Regime 2 parameters
coef std err z P>|z| [0.025 0.975]
const 7.7889 0.112 69.659 0.000 7.570 8.008
Regime 3 parameters
coef std err z P>|z| [0.025 0.975]
const 14.8677 0.140 106.019 0.000 14.593 15.143
Regime 4 parameters
coef std err z P>|z| [0.025 0.975]
const 30.9587 0.212 145.718 0.000 30.542 31.375
Non-switching parameters
coef std err z P>|z| [0.025 0.975]
sigma2 1.7108 0.036 47.337 0.000 1.640 1.782
Regime transition parameters
coef std err z P>|z| [0.025 0.975]
p[0->0] 0.9708 nan nan nan nan nan
p[1->0] 0.0780 0.001 55.775 0.000 0.075 0.081
p[2->0] 0.0213 nan nan nan nan nan
p[3->0] 0.0128 0.012 1.109 0.268 -0.010 0.035
p[4->0] 8.241e-06 0.000 0.019 0.985 -0.001 0.001
p[0->1] 0.0278 0.001 24.099 0.000 0.026 0.030
p[1->1] 0.8680 0.001 1313.738 0.000 0.867 0.869
p[2->1] 0.1157 0.005 24.987 0.000 0.107 0.125
p[3->1] 0.0137 0.012 1.183 0.237 -0.009 0.036
p[4->1] 3.511e-06 0.001 0.004 0.997 -0.002 0.002
p[0->2] 0.0013 0.002 0.563 0.574 -0.003 0.006
p[1->2] 0.0540 nan nan nan nan nan
p[2->2] 0.8007 0.027 30.029 0.000 0.748 0.853
p[3->2] 0.1635 6.57e-10 2.49e+08 0.000 0.164 0.164
p[4->2] 1.364e-06 0.000 0.003 0.998 -0.001 0.001
p[0->3] 1.016e-05 nan nan nan nan nan
p[1->3] 6.434e-05 0.002 0.028 0.978 -0.004 0.005
p[2->3] 0.0623 5.27e-07 1.18e+05 0.000 0.062 0.062
p[3->3] 0.8100 4.42e-09 1.83e+08 0.000 0.810 0.810
p[4->3] 8.906e-07 4.43e-07 2.009 0.045 2.16e-08 1.76e-06



Warnings:
[1] Covariance matrix calculated using numerical (complex-step) differentiation.

for i in range(5):
    res_target.smoothed_marginal_probabilities[[i]].plot(
        title=f"Probability of being in the {i} regime", figsize=(12, 3)
    )

[Figures: smoothed probability of being in each of regimes 0 to 4]

print(res_target.expected_durations)
[3.42706644e+01 7.57564599e+00 5.01865546e+00 5.26380977e+00
 7.13942479e+04]

Markov switching with switching intercept and exogenous variables

  • set k_regimes=3: assuming 3 regimes
  • lag 1 and lag20 as exogenous variables
  • Because these models can be difficult to estimate, the 3-regime fits with a switching variance further below search over starting parameters, using 50 random search repetitions (search_reps=50); the first fit uses the default starting parameters.
# Fit the model
# (a switching mean is the default of the MarkovRegression model)
markov_reg = sm.tsa.MarkovRegression(df['target'], k_regimes=3, exog=df[['lag1', 'lag20']])
res_target = markov_reg.fit()
res_target.summary()
Markov Switching Model Results
Dep. Variable: target No. Observations: 5458
Model: MarkovRegression Log Likelihood -8402.891
Date: Sat, 06 Nov 2021 AIC 16837.782
Time: 21:40:27 BIC 16943.459
Sample: 0 HQIC 16874.656
- 5458
Covariance Type: approx
Regime 0 parameters
coef std err z P>|z| [0.025 0.975]
const 0.2457 0.030 8.124 0.000 0.186 0.305
x1 0.7263 0.006 121.947 0.000 0.715 0.738
x2 -0.0624 0.006 -10.622 0.000 -0.074 -0.051
Regime 1 parameters
coef std err z P>|z| [0.025 0.975]
const 0.0278 0.034 0.821 0.412 -0.039 0.094
x1 0.9996 0.006 171.072 0.000 0.988 1.011
x2 0.0788 0.008 9.287 0.000 0.062 0.095
Regime 2 parameters
coef std err z P>|z| [0.025 0.975]
const 0.3858 0.118 3.274 0.001 0.155 0.617
x1 1.3538 0.014 98.518 0.000 1.327 1.381
x2 0.0428 0.016 2.653 0.008 0.011 0.074
Non-switching parameters
coef std err z P>|z| [0.025 0.975]
sigma2 0.8872 0.021 42.126 0.000 0.846 0.928
Regime transition parameters
coef std err z P>|z| [0.025 0.975]
p[0->0] 0.5232 0.039 13.336 0.000 0.446 0.600
p[1->0] 0.4120 0.029 14.022 0.000 0.354 0.470
p[2->0] 0.3652 0.055 6.611 0.000 0.257 0.473
p[0->1] 0.4031 0.041 9.779 0.000 0.322 0.484
p[1->1] 0.4723 0.034 14.089 0.000 0.407 0.538
p[2->1] 0.5848 0.057 10.222 0.000 0.473 0.697



Warnings:
[1] Covariance matrix calculated using numerical (complex-step) differentiation.
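
The P>|z| column can be reproduced from the z column, which makes the hypothesis-test reading of the table concrete. A minimal sketch, assuming the reported z statistics are compared against a standard normal distribution; the helper name `two_sided_p` is hypothetical:

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

z_const_regime1 = 0.821  # z for const in Regime 1, from the summary above
p = two_sided_p(z_const_regime1)
print(round(p, 3))
```

For the Regime 1 constant the p-value is well above 0.05, so the null hypothesis that this intercept is zero is not rejected.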

for i in range(3):
    res_target.smoothed_marginal_probabilities[[i]].plot(
        title=f"Probability of being in the {i} regime", figsize=(12, 3)
    )

[Figures: smoothed probability of being in each of regimes 0 to 2]

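# Fix the seed so the random search over starting parameters is reproducible
np.random.seed(5678)

# 3 regimes with a switching intercept, lag 20 as the only exog, and regime-specific variance
markov_reg = sm.tsa.MarkovRegression(df['target'], k_regimes=3,
                                     trend="c",  # one of {'n', 'c', 't', 'ct'}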
                                     #switching_trend=False, 
                                     #switching_exog=False,
                                     switching_variance=True, 
                                     exog=df[['lag20']]
                                    )
res_target = markov_reg.fit(search_reps=50, method='bfgs')
res_target.summary()
Markov Switching Model Results
Dep. Variable: target No. Observations: 5458
Model: MarkovRegression Log Likelihood -9471.406
Date: Sat, 06 Nov 2021 AIC 18972.812
Time: 21:40:58 BIC 19071.885
Sample: 0 HQIC 19007.382
- 5458
Covariance Type: approx
Regime 0 parameters
coef std err z P>|z| [0.025 0.975]
const 0.2896 0.018 16.545 0.000 0.255 0.324
x1 -0.0239 0.005 -4.748 0.000 -0.034 -0.014
sigma2 0.4468 0.018 25.439 0.000 0.412 0.481
Regime 1 parameters
coef std err z P>|z| [0.025 0.975]
const 2.9133 0.061 47.784 0.000 2.794 3.033
x1 -0.0071 0.011 -0.663 0.507 -0.028 0.014
sigma2 1.7139 0.106 16.204 0.000 1.507 1.921
Regime 2 parameters
coef std err z P>|z| [0.025 0.975]
const 8.5583 0.306 27.989 0.000 7.959 9.158
x1 0.3187 0.043 7.347 0.000 0.234 0.404
sigma2 36.3329 1.987 18.285 0.000 32.438 40.227
Regime transition parameters
coef std err z P>|z| [0.025 0.975]
p[0->0] 0.9523 nan nan nan nan nan
p[1->0] 0.0785 0.007 11.245 0.000 0.065 0.092
p[2->0] 0.0029 0.003 1.114 0.265 -0.002 0.008
p[0->1] 0.0477 0.002 30.042 0.000 0.045 0.051
p[1->1] 0.8910 0.008 106.474 0.000 0.875 0.907
p[2->1] 0.0673 0.010 6.811 0.000 0.048 0.087



Warnings:
[1] Covariance matrix calculated using numerical (complex-step) differentiation.

for i in range(3):
    res_target.smoothed_marginal_probabilities[[i]].plot(
        title=f"Probability of being in the {i} regime", figsize=(12, 3)
    )

[Figures: smoothed probability of being in each of regimes 0 to 2]

np.random.seed(5678)

# Same specification, refit without the last 100 observations
markov_reg = sm.tsa.MarkovRegression(df['target'].iloc[:-100], k_regimes=3,
                                     trend="c",  # one of {'n', 'c', 't', 'ct'}
                                     #switching_trend=False, 
                                     #switching_exog=False,
                                     switching_variance=True, 
                                     exog=df[['lag20']].iloc[:-100]
                                    )
res_target = markov_reg.fit(search_reps=50, method='bfgs')
res_target.summary()
Markov Switching Model Results
Dep. Variable: target No. Observations: 5358
Model: MarkovRegression Log Likelihood -9347.917
Date: Sat, 06 Nov 2021 AIC 18725.834
Time: 21:41:29 BIC 18824.629
Sample: 0 HQIC 18760.339
- 5358
Covariance Type: approx
Regime 0 parameters
coef std err z P>|z| [0.025 0.975]
const 0.2965 0.018 16.736 0.000 0.262 0.331
x1 -0.0233 0.005 -4.849 0.000 -0.033 -0.014
sigma2 0.4539 0.016 28.239 0.000 0.422 0.485
Regime 1 parameters
coef std err z P>|z| [0.025 0.975]
const 2.9293 0.064 46.106 0.000 2.805 3.054
x1 -0.0064 0.011 -0.601 0.548 -0.027 0.015
sigma2 1.7220 0.111 15.491 0.000 1.504 1.940
Regime 2 parameters
coef std err z P>|z| [0.025 0.975]
const 8.5903 0.308 27.899 0.000 7.987 9.194
x1 0.3182 0.043 7.435 0.000 0.234 0.402
sigma2 36.4287 1.965 18.542 0.000 32.578 40.279
Regime transition parameters
coef std err z P>|z| [0.025 0.975]
p[0->0] 0.9522 3.01e-05 3.16e+04 0.000 0.952 0.952
p[1->0] 0.0784 0.007 11.100 0.000 0.065 0.092
p[2->0] 0.0029 0.003 1.146 0.252 -0.002 0.008
p[0->1] 0.0478 9.76e-06 4895.569 0.000 0.048 0.048
p[1->1] 0.8905 0.009 104.534 0.000 0.874 0.907
p[2->1] 0.0675 0.010 6.828 0.000 0.048 0.087



Warnings:
[1] Covariance matrix calculated using numerical (complex-step) differentiation.