Equity Market Crisis Regime Prediction using Machine Learning GBDT

Part I: Introduction

This project builds an equity crisis regime classifier using gradient-boosted decision trees (GBDT), with a data pipeline that combines:

  • Yahoo Finance market data (via yfinance)
  • FRED macro/financial conditions series (via fredapi)
  • A reproducible feature engineering layer (changes, volatility, z-scores, Sharpe ratios, RSI, interactions)
  • A binary crisis vs normal target derived from S&P 500 dynamics

The project is organized into the following parts:

  • Part I: Introduction
  • Part II: Data Preparation
  • Part III: Exploratory Data Analysis
  • Part IV: Feature Selection, Hyperparameter Tuning (LightGBM)
  • Part V: Model Evaluation and Interpretation (LightGBM)
  • Part VI: SVM and Neural Networks (MLP and 1D-CNN)
  • Part VII: Compare GBDT Models: XGBoost and LightGBM
  • Part VIII: Deployment of LightGBM Models (end-to-end process)

Define the target variable

Assume there are two regimes for equity markets:

  • a normal regime where an asset manager should be long to benefit from the long bias of equity markets.
  • a crisis regime, where an asset manager should either reduce equity exposure or, if the strategy is long-short, sell equities short.

Binary classification target:

  • crisis regime (encoded as 1): S&P 500 returns are below the 5th percentile of historical returns, computed on the training data set.
  • normal regime: encoded as 0
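
The labelling rule above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `label_crisis`, the use of daily returns, and the synthetic data are all assumptions, since the return horizon is not stated here.

```python
import numpy as np
import pandas as pd


def label_crisis(returns: pd.Series, train_end: str, pct: float = 5.0):
    """Return (labels, threshold), where 1 = crisis and 0 = normal."""
    train = returns.loc[:train_end].dropna()
    # The threshold is estimated on the training window only, so no
    # information from later dates leaks into the labels.
    threshold = float(np.percentile(train, pct))
    # Missing returns compare as False and are therefore labelled 0.
    labels = (returns < threshold).astype(int)
    return labels, threshold


# Synthetic daily returns stand in for real S&P 500 data (assumption).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2015-01-01", periods=1000)
returns = pd.Series(rng.normal(0.0003, 0.01, size=1000), index=dates)

# The first 700 observations play the role of the training set.
labels, threshold = label_crisis(returns, train_end=str(dates[699].date()))
print(f"threshold={threshold:.4f}")
print(f"crisis share in training window={labels.iloc[:700].mean():.3f}")
```

By construction about 5% of the training observations are labelled as crisis; the share in later data can differ, because the threshold is frozen at its training-set value.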

Initial (raw) data and feature engineering

The ~150 data series are grouped into the following categories:

  • Risk aversion metrics include the implied volatilities of equities and of G10 and emerging-market currencies, high-yield corporate bond credit spreads, and the shape of the VIX forward curve, defined as the ratio of spot VIX to the three-month VIX forward. These indicators characterize the liquidity conditions of financial assets and the accessibility of funding, two complementary measures of risk appetite.
  • Financial metrics include the one-month, six-month and one-year growth of earnings per share, price/earnings and price/sales for each equity index. These indicators predict the earnings and sales growth cycle, while providing insight into changes in valuation multiples.
  • Macroeconomic indicators consist of the Citigroup Economic Surprise indices for the main economic zones (US, EU, Japan, Emerging, Worldwide). These indicators convey the cycle of positive or negative economic surprises on a daily basis.
  • Changes in US yields (10-year yield, 2-year yield, 10-year breakeven, US LIBOR) over the same horizons: one month, six months and one year. A change in yields may reflect the business cycle, the inflation cycle, or the monetary stance of the Federal Reserve.
  • The steepness of the US yield curve, computed as the difference between the government bond yield and the short-term LIBOR rate at two maturities (10 years and 2 years). This indicator is a well-known predictor of the economic cycle because it measures the spread between long-term and short-term rates.
  • Technical indicators comprise the put/call ratio (as provided by the CBOE) and market breadth (the percentage of individual stocks above their respective 200-day moving average) for the six equity indices and the MSCI World ACWI. The put/call ratio may reflect extreme optimism or pessimism in the investors’ consensus, while market breadth characterizes the unweighted average participation of individual stocks among the global equity indices.
  • Technical indicators from various asset classes are analyzed:
    • Excess returns of six equity indices, BCOM Energy and Industrial Metals, the FX Emerging Bloomberg Index Excess Return (reflecting the aggregate evolution of 8 emerging currencies vs. the dollar), and the dollar index as computed by ICE US. Returns are computed over the same time horizons as before (one month, six months and one year).
    • Historical volatilities, computed over horizons of 10, 20 and 30 days.
    • Distance to the 250-day and 500-day moving averages.
    • Sharpe Ratios of all the above-mentioned assets, evaluated over horizons of 6 months and 1 year.
  • Cyclical commodities, the dollar index and emerging currencies are often leading indicators of the economic cycle. Furthermore, cyclical asset returns and volatilities may be used either procyclically or countercyclically to predict an incoming crisis.
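
Several of the per-asset technical indicators listed above (trailing returns, historical volatilities, distance to moving averages, Sharpe ratios) can be sketched with pandas. This is an illustrative sketch under stated assumptions, not the project's implementation: the function name `series_features`, the column names, the 21/126/252 trading-day approximations for one month, six months and one year, the 252-day annualization factor, and the zero risk-free rate (plain rather than excess returns) are all assumptions.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor


def series_features(price: pd.Series) -> pd.DataFrame:
    """Compute a few of the listed technical features for one price series."""
    ret = price.pct_change()
    out = pd.DataFrame(index=price.index)
    # Trailing returns over roughly one month, six months and one year.
    for label, n in {"1m": 21, "6m": 126, "1y": 252}.items():
        out[f"ret_{label}"] = price.pct_change(n)
    # Annualized historical volatility over 10, 20 and 30 days.
    for n in (10, 20, 30):
        out[f"vol_{n}d"] = ret.rolling(n).std() * np.sqrt(TRADING_DAYS)
    # Relative distance to the 250-day and 500-day moving averages.
    for n in (250, 500):
        out[f"dist_ma_{n}d"] = price / price.rolling(n).mean() - 1.0
    # Sharpe ratio over 6 months and 1 year (risk-free rate assumed zero).
    for label, n in {"6m": 126, "1y": 252}.items():
        mean, std = ret.rolling(n).mean(), ret.rolling(n).std()
        out[f"sharpe_{label}"] = mean / std * np.sqrt(TRADING_DAYS)
    return out


# Synthetic price path used only to exercise the function.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2018-01-01", periods=600)
price = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 600))), index=dates
)
features = series_features(price)
print(features.dropna().tail(3).round(3))
```

Every feature uses only trailing windows, so each row depends on data available up to that date. Rows before the longest window (500 days) is filled contain missing values.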

102 features are engineered from the 150+ data series

  • 102 features are computed for each of the 150+ series, giving well over 10,000 candidate features (102 × 150 = 15,300) as input to the feature selection process.
  • These features are used to predict the crash probability in the equity markets.
  • These features capture the universal behaviors documented in (Kahneman 2011), namely herding and trending behavior, cross-market contagion, leverage procyclicality, and so on.
  • They also contain a mix of fundamental and technical indicators to capture the two main approaches used in the asset management industry.
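
The way per-series transforms multiply into a wide candidate matrix can be sketched as below. The four entries in `TRANSFORMS` are hypothetical stand-ins, because the actual 102 transforms are not listed here; the `series__feature` column naming and the function name `build_feature_matrix` are also assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the 102 per-series transforms.
TRANSFORMS = {
    "chg_21d": lambda s: s.diff(21),
    "vol_20d": lambda s: s.diff().rolling(20).std(),
    "z_252d": lambda s: (s - s.rolling(252).mean()) / s.rolling(252).std(),
    "ma_gap_250d": lambda s: s / s.rolling(250).mean() - 1.0,
}


def build_feature_matrix(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply every transform to every raw series; one column per pair."""
    columns = {
        f"{series}__{name}": func(raw[series])
        for series in raw.columns
        for name, func in TRANSFORMS.items()
    }
    return pd.DataFrame(columns, index=raw.index)


# Three synthetic series stand in for the 150+ raw data series.
rng = np.random.default_rng(2)
dates = pd.bdate_range("2019-01-01", periods=400)
raw = pd.DataFrame(
    100 + np.cumsum(rng.normal(0, 1, size=(400, 3)), axis=0),
    index=dates,
    columns=["series_a", "series_b", "series_c"],
)
X = build_feature_matrix(raw)
print(X.shape)    # (400, 12): 3 series x 4 transforms
print(102 * 150)  # 15300 candidate columns at the scale described above
```

With this many candidate columns relative to the number of crisis observations, a feature selection step before model fitting is what keeps the problem tractable.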

Data used in this project

The datasets described in the papers cannot be fully obtained from publicly available sources, so this project uses:

  • market data downloaded from Yahoo Finance using the yfinance package
  • economic data downloaded from FRED (fred.stlouisfed.org)
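
A minimal sketch of how the two sources might be pulled and aligned is shown below. The helper names (`download_market`, `download_macro`, `align_to_market`), the example tickers and series IDs, and the forward-fill alignment are assumptions for illustration, not the project's actual pipeline. The download helpers need network access, and FRED requires an API key; the layout of the frame returned by `yf.download` can vary between yfinance versions.

```python
import pandas as pd


def download_market(tickers, start):
    """Daily closing prices from Yahoo Finance, one column per ticker."""
    import yfinance as yf  # imported lazily; requires network access

    return yf.download(tickers, start=start, auto_adjust=True)["Close"]


def download_macro(series_ids, start, api_key):
    """FRED series as a DataFrame, one column per series ID."""
    from fredapi import Fred  # imported lazily; requires a FRED API key

    fred = Fred(api_key=api_key)
    return pd.DataFrame(
        {sid: fred.get_series(sid, observation_start=start) for sid in series_ids}
    )


def align_to_market(market: pd.DataFrame, macro: pd.DataFrame) -> pd.DataFrame:
    """Join macro data onto market dates, carrying the last value forward."""
    all_dates = market.index.union(macro.index)
    macro_daily = macro.reindex(all_dates).ffill()
    return market.join(macro_daily.reindex(market.index))


# Offline demonstration with tiny synthetic frames.
market = pd.DataFrame(
    {"^GSPC": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
macro = pd.DataFrame(
    {"DGS10": [3.9, 4.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-04"]),
)
panel = align_to_market(market, macro)
print(panel)

# Live usage (not run here):
# market = download_market(["^GSPC", "^VIX"], start="2000-01-01")
# macro = download_macro(["DGS10", "DGS2"], start="2000-01-01",
#                        api_key="YOUR_FRED_API_KEY")
```

Forward-filling keys each macro value to its observation date. It does not model publication lags, so a production pipeline may need to shift series by their release delay to avoid look-ahead.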


© 2026 A W. Quantitative Research
