Anomaly Detection: Isolation Forest, VAE, Ensemble¶

Applies Salesforce Merlion to generate anomaly scores on a market-derived time series using three unsupervised detectors:

  • Isolation Forest
  • VAE (Variational Autoencoder)
  • Ensemble (Isolation Forest + VAE via DetectorEnsemble)

The goal is not labeled anomaly detection; instead, this analysis test whether anomaly scores carry information about forward upside potential.

Data & target¶

  • Data source: yfinance
  • Asset: S&P 500 (^GSPC)
  • Start date: 2000-01-01
  • Test split: last 400 observations

Forward 20-day max return target¶

For each date $$t$$ with close $$C_t$$:

$$ target_t = 100 \times \frac{\max(C_{t+1}, \dots, C_{t+20}) - C_t}{C_t} $$

Implementation detail: the max is computed over $$[t+1, t+20]$$ (starting tomorrow) to avoid look-ahead leakage.


Method¶

Models (Merlion)¶

  • IsolationForest(IsolationForestConfig())
  • VAE(VAEConfig())
  • DetectorEnsemble(...) over both detectors, using AggregateAlarms(alm_threshold=4)

Training & scoring¶

  • Train on the target series as a univariate TimeSeries (no anomaly labels).
  • Training anomaly scores are produced during .train(...).
  • Test anomaly scores are produced via .get_anomaly_score(test_data).

Evaluation (post-hoc)¶

This notebook treats the anomaly score as a candidate signal and reports:

  • Correlation between $$target_t$$ and each anomaly score on train
  • Correlation on test
  • Train → test correlation drop as a rough generalization check
In [1]:
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Data acquisition
import yfinance as yf

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
!pip install salesforce-merlion --quiet
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 3.5 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 10.9 MB/s eta 0:00:0000:0100:01
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.0/18.0 MB 76.1 MB/s eta 0:00:00:00:0100:01
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 2.26.0 requires google-cloud-bigquery-storage<3.0.0,>=2.30.0, which is not installed.
cesium 0.12.4 requires numpy<3.0,>=2.0, but you have numpy 1.26.4 which is incompatible.
google-colab 1.0.0 requires jupyter-server==2.14.0, but you have jupyter-server 2.12.5 which is incompatible.
google-colab 1.0.0 requires requests==2.32.4, but you have requests 2.32.5 which is incompatible.
dopamine-rl 4.1.2 requires gymnasium>=1.0.0, but you have gymnasium 0.29.0 which is incompatible.
jaxlib 0.7.2 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.
thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible.
opencv-contrib-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
opencv-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
jax 0.7.2 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.
cudf-cu12 25.6.0 requires pyarrow<20.0.0a0,>=14.0.0; platform_machine == "x86_64", but you have pyarrow 22.0.0 which is incompatible.
gradio 5.49.1 requires pydantic<2.12,>=2.0, but you have pydantic 2.12.5 which is incompatible.
bigframes 2.26.0 requires rich<14,>=12.4.4, but you have rich 14.2.0 which is incompatible.
opencv-python-headless 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
pytensor 2.35.1 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.
pylibcudf-cu12 25.6.0 requires pyarrow<20.0.0a0,>=14.0.0; platform_machine == "x86_64", but you have pyarrow 22.0.0 which is incompatible.
In [3]:
# Merlion
from merlion.utils import TimeSeries
from merlion.models.anomaly.isolation_forest import IsolationForest, IsolationForestConfig
from merlion.models.anomaly.vae import VAE, VAEConfig
from merlion.models.ensemble.anomaly import DetectorEnsemble, DetectorEnsembleConfig
from merlion.post_process.threshold import AggregateAlarms

sns.set_style('whitegrid')
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (15, 7.7)
plt.rcParams['font.size'] = 11
In [4]:
# Set random seed for reproducibility
np.random.seed(2570)

CONFIG = {
    'ticker': '^GSPC',
    'start_date': '2000-01-01',
    'lookforward_window': 20,
    'test_size': 400,
}
In [5]:
def download_stock_data(ticker: str, start_date: str) -> pd.DataFrame:
    try:
        print(f"Downloading data for {ticker}...")
        stock = yf.Ticker(ticker)
        hist = stock.history(period="max")
        
        if hist.empty:
            raise ValueError(f"No data retrieved for {ticker}")
        
        hist.index = hist.index.date
        hist.index = pd.to_datetime(hist.index)
        df = hist[hist.index >= start_date].copy()
        print(f"Downloaded {len(df)} records from {df.index.min()} to {df.index.max()}")
        
        return df
    
    except Exception as e:
        print(f"Error downloading data: {str(e)}")
        raise


def calculate_forward_max_return(df: pd.DataFrame, 
                                 price_col: str = 'Close', 
                                 window: int = 20) -> pd.DataFrame:
    df = df.copy()
    
    # Fixed bug: Correct forward-looking calculation
    # Calculate rolling max of next 'window' days, starting from tomorrow
    future_max = []
    for i in range(len(df)):
        if i + window >= len(df):
            future_max.append(np.nan)
        else:
            future_max.append(df[price_col].iloc[i+1:i+window+1].max())
    
    df['future_max'] = future_max
    df['target'] = 100 * (df['future_max'] - df[price_col]) / df[price_col]
    
    df = df.dropna(subset=['target'])
    
    print(f"Calculated forward {window}-day max return")
    print(f"Target range: [{df['target'].min():.2f}%, {df['target'].max():.2f}%]")
    
    return df


def create_train_test_split(df: pd.DataFrame, 
                            test_size: int,
                            target_col: str = 'target') -> tuple:
    train_df = df[[target_col]].iloc[:-test_size].copy()
    test_df = df[[target_col]].iloc[-test_size:].copy()
    
    train_data = TimeSeries.from_pd(train_df)
    test_data = TimeSeries.from_pd(test_df)
    
    print(f"\nData Split:")
    print(f"Training: {len(train_df)} samples ({train_df.index.min()} to {train_df.index.max()})")
    print(f"Testing: {len(test_df)} samples ({test_df.index.min()} to {test_df.index.max()})")
    
    return train_data, test_data, train_df, test_df
In [6]:
def train_models(train_data: TimeSeries) -> dict:
    print("\nInitializing models...")
    
    iso_forest_config = IsolationForestConfig()
    iso_forest_model = IsolationForest(iso_forest_config)
    
    vae_config = VAEConfig()
    vae_model = VAE(vae_config)
    
    en_config = DetectorEnsembleConfig(threshold=AggregateAlarms(alm_threshold=4))
    en_model = DetectorEnsemble(config=en_config, models=[iso_forest_model, vae_model])
    
    print("\nTraining models...")
    
    iso_forest_train_score = iso_forest_model.train(train_data=train_data, anomaly_labels=None)
    print("Isolation Forest trained")
    
    vae_train_score = vae_model.train(train_data=train_data, anomaly_labels=None)
    print("VAE trained")
    
    en_train_score = en_model.train(train_data=train_data, anomaly_labels=None)
    print("Ensemble trained")
    
    return {
        'iso_forest': {'model': iso_forest_model, 'train_score': iso_forest_train_score},
        'vae': {'model': vae_model, 'train_score': vae_train_score},
        'ensemble': {'model': en_model, 'train_score': en_train_score}
    }
In [7]:
def evaluate_models(models: dict, 
                    train_df: pd.DataFrame, 
                    test_data: TimeSeries,
                    test_df: pd.DataFrame) -> tuple:
    print("\nEvaluating models...")
    
    # Combine training scores - FIXED: Use index alignment instead of .values
    df_train_scores = train_df.copy()
    
    # Get training scores as DataFrames with proper indices
    iso_forest_train_df = models['iso_forest']['train_score'].to_pd()
    iso_forest_train_df.columns = ['iso_forest_score']
    
    vae_train_df = models['vae']['train_score'].to_pd()
    vae_train_df.columns = ['vae_score']
    
    ensemble_train_df = models['ensemble']['train_score'].to_pd()
    ensemble_train_df.columns = ['ensemble_score']
    
    # Align by index to avoid length mismatch
    df_train_scores = pd.concat([
        df_train_scores,
        iso_forest_train_df,
        vae_train_df,
        ensemble_train_df
    ], axis=1)
    
    # Get test predictions
    df_test_scores = test_df.copy()
    
    # Isolation Forest
    if_test_scores = models['iso_forest']['model'].get_anomaly_score(test_data)
    if_test_df = if_test_scores.to_pd()
    if_test_df.columns = ['iso_forest_score']
    
    # VAE
    vae_test_scores = models['vae']['model'].get_anomaly_score(test_data)
    vae_test_df = vae_test_scores.to_pd()
    vae_test_df.columns = ['vae_score']
    
    # Ensemble
    en_test_scores = models['ensemble']['model'].get_anomaly_score(test_data)
    en_test_df = en_test_scores.to_pd()
    en_test_df.columns = ['ensemble_score']
    
    # Align by index
    df_test_scores = pd.concat([
        df_test_scores,
        if_test_df,
        vae_test_df,
        en_test_df
    ], axis=1)
    
    print(f"Model evaluation complete - Train shape: {df_train_scores.shape}, Test shape: {df_test_scores.shape}")
    
    return df_train_scores, df_test_scores
In [8]:
def print_correlation_analysis(df_train_scores: pd.DataFrame, 
                               df_test_scores: pd.DataFrame):
    print("\n" + "="*70)
    print("CORRELATION ANALYSIS")
    print("="*70)
    
    score_cols = ['iso_forest_score', 'vae_score', 'ensemble_score']
    
    print("\nTraining Data Correlations with Target:")
    for col in score_cols:
        corr = df_train_scores['target'].corr(df_train_scores[col])
        print(f"  {col:20s}: {corr:.4f}")
    
    print("\nTesting Data Correlations with Target:")
    for col in score_cols:
        corr = df_test_scores['target'].corr(df_test_scores[col])
        print(f"  {col:20s}: {corr:.4f}")
    
    print("\nPerformance Drop (Train -> Test):")
    for col in score_cols:
        train_corr = df_train_scores['target'].corr(df_train_scores[col])
        test_corr = df_test_scores['target'].corr(df_test_scores[col])
        drop = train_corr - test_corr
        drop_pct = (drop / train_corr) * 100 if train_corr != 0 else 0
        print(f"  {col:20s}: {drop:+.4f} ({drop_pct:+.1f}%)")
In [10]:
def create_visualization(df_train_scores: pd.DataFrame,
                         df_test_scores: pd.DataFrame):
    """
    Create comprehensive visualizations using Matplotlib and Seaborn.
    """
    print(f"\nCreating visualizations...")
    
    # Set Seaborn style for better aesthetics
    import seaborn as sns
    sns.set_style("whitegrid")
    sns.set_palette("colorblind")
    
    # Create figure with subplots
    fig, axes = plt.subplots(4, 2, figsize=(16, 16))
    fig.suptitle('Merlion Anomaly Detection Results', fontsize=16, fontweight='bold')
    
    # Adjust layout to prevent overlap
    plt.subplots_adjust(hspace=0.4, wspace=0.3)
    
    # 1. Training Data: Target vs Anomaly Scores
    ax1 = axes[0, 0]
    ax1.plot(df_train_scores.index, df_train_scores['target'], 
             color='black', linewidth=2, label='Target (20-day Max Return %)')
    ax1.set_ylabel('Target Return (%)', fontsize=10)
    ax1.set_title('Training Data: Target Returns', fontsize=12, fontweight='bold')
    ax1.legend(loc='upper left')
    ax1.grid(True, alpha=0.3)
    ax1.tick_params(axis='x', rotation=45)
    
    # 2. Training Data: Anomaly Scores
    ax2 = axes[0, 1]
    ax2.plot(df_train_scores.index, df_train_scores['iso_forest_score'], 
             color='blue', linewidth=1, label='Isolation Forest')
    ax2.plot(df_train_scores.index, df_train_scores['vae_score'], 
             color='red', linewidth=1, label='VAE')
    ax2.plot(df_train_scores.index, df_train_scores['ensemble_score'], 
             color='green', linewidth=1, label='Ensemble')
    ax2.set_ylabel('Anomaly Score', fontsize=10)
    ax2.set_title('Training Data: Anomaly Scores', fontsize=12, fontweight='bold')
    ax2.legend(loc='upper left')
    ax2.grid(True, alpha=0.3)
    ax2.tick_params(axis='x', rotation=45)
    
    # 3. Testing Data: Target vs Anomaly Scores
    ax3 = axes[1, 0]
    ax3.plot(df_test_scores.index, df_test_scores['target'], 
             color='black', linewidth=2, label='Target (20-day Max Return %)')
    ax3.set_ylabel('Target Return (%)', fontsize=10)
    ax3.set_title('Testing Data: Target Returns', fontsize=12, fontweight='bold')
    ax3.legend(loc='upper left')
    ax3.grid(True, alpha=0.3)
    ax3.tick_params(axis='x', rotation=45)
    
    # 4. Testing Data: Anomaly Scores
    ax4 = axes[1, 1]
    ax4.plot(df_test_scores.index, df_test_scores['iso_forest_score'], 
             color='blue', linewidth=1, label='Isolation Forest')
    ax4.plot(df_test_scores.index, df_test_scores['vae_score'], 
             color='red', linewidth=1, label='VAE')
    ax4.plot(df_test_scores.index, df_test_scores['ensemble_score'], 
             color='green', linewidth=1, label='Ensemble')
    ax4.set_ylabel('Anomaly Score', fontsize=10)
    ax4.set_title('Testing Data: Anomaly Scores', fontsize=12, fontweight='bold')
    ax4.legend(loc='upper left')
    ax4.grid(True, alpha=0.3)
    ax4.tick_params(axis='x', rotation=45)
    
    # 5. Correlation Heatmap: Training Data
    ax5 = axes[2, 0]
    train_corr = df_train_scores[['target', 'iso_forest_score', 'vae_score', 'ensemble_score']].corr()
    sns.heatmap(train_corr, annot=True, fmt='.3f', cmap='coolwarm', 
                center=0, ax=ax5, cbar_kws={'label': 'Correlation'})
    ax5.set_title('Training Data: Correlation Matrix', fontsize=12, fontweight='bold')
    
    # 6. Correlation Heatmap: Testing Data
    ax6 = axes[2, 1]
    test_corr = df_test_scores[['target', 'iso_forest_score', 'vae_score', 'ensemble_score']].corr()
    sns.heatmap(test_corr, annot=True, fmt='.3f', cmap='coolwarm', 
                center=0, ax=ax6, cbar_kws={'label': 'Correlation'})
    ax6.set_title('Testing Data: Correlation Matrix', fontsize=12, fontweight='bold')
    
    # 7. Distribution of Anomaly Scores: Training Data
    ax7 = axes[3, 0]
    sns.kdeplot(df_train_scores['iso_forest_score'], ax=ax7, label='Isolation Forest', color='blue')
    sns.kdeplot(df_train_scores['vae_score'], ax=ax7, label='VAE', color='red')
    sns.kdeplot(df_train_scores['ensemble_score'], ax=ax7, label='Ensemble', color='green')
    ax7.set_xlabel('Anomaly Score', fontsize=10)
    ax7.set_ylabel('Density', fontsize=10)
    ax7.set_title('Training Data: Score Distributions', fontsize=12, fontweight='bold')
    ax7.legend()
    ax7.grid(True, alpha=0.3)
    
    # 8. Distribution of Anomaly Scores: Testing Data
    ax8 = axes[3, 1]
    sns.kdeplot(df_test_scores['iso_forest_score'], ax=ax8, label='Isolation Forest', color='blue')
    sns.kdeplot(df_test_scores['vae_score'], ax=ax8, label='VAE', color='red')
    sns.kdeplot(df_test_scores['ensemble_score'], ax=ax8, label='Ensemble', color='green')
    ax8.set_xlabel('Anomaly Score', fontsize=10)
    ax8.set_ylabel('Density', fontsize=10)
    ax8.set_title('Testing Data: Score Distributions', fontsize=12, fontweight='bold')
    ax8.legend()
    ax8.grid(True, alpha=0.3)
    
    plt.tight_layout()    
    plt.show()
In [11]:
print("\n" + "="*70)
print("MERLION ANOMALY DETECTION PIPELINE")
print("="*70)

df = download_stock_data(CONFIG['ticker'], CONFIG['start_date'])

df = calculate_forward_max_return(
    df, 
    price_col='Close', 
    window=CONFIG['lookforward_window']
)

train_data, test_data, train_df, test_df = create_train_test_split(
    df, 
    CONFIG['test_size']
)

models = train_models(train_data)

df_train_scores, df_test_scores = evaluate_models(
    models, 
    train_df, 
    test_data, 
    test_df
)

print_correlation_analysis(df_train_scores, df_test_scores)

create_visualization(
    df_train_scores, 
    df_test_scores
)
======================================================================
MERLION ANOMALY DETECTION PIPELINE
======================================================================
Downloading data for ^GSPC...
Downloaded 6546 records from 2000-01-03 00:00:00 to 2026-01-12 00:00:00
Calculated forward 20-day max return
Target range: [-5.27%, 28.48%]

Data Split:
Training: 6126 samples (2000-01-03 00:00:00 to 2024-05-08 00:00:00)
Testing: 400 samples (2024-05-09 00:00:00 to 2025-12-11 00:00:00)

Initializing models...

Training models...
Isolation Forest trained
 |========================================| 100.0% Complete, Loss 0.0010
VAE trained
 |========================================| 100.0% Complete, Loss 0.0010
Ensemble trained

Evaluating models...
Model evaluation complete - Train shape: (6126, 4), Test shape: (400, 4)

======================================================================
CORRELATION ANALYSIS
======================================================================

Training Data Correlations with Target:
  iso_forest_score    : 0.3515
  vae_score           : 0.5082
  ensemble_score      : 0.4330

Testing Data Correlations with Target:
  iso_forest_score    : 0.3874
  vae_score           : 0.5242
  ensemble_score      : 0.4530

Performance Drop (Train -> Test):
  iso_forest_score    : -0.0359 (-10.2%)
  vae_score           : -0.0160 (-3.2%)
  ensemble_score      : -0.0199 (-4.6%)

Creating visualizations...
No description has been provided for this image