Anomaly Detection: Isolation Forest, VAE, Ensemble

This notebook applies Salesforce Merlion to generate anomaly scores on a market-derived time series using three unsupervised detectors: Isolation Forest, a variational autoencoder (VAE), and an ensemble of the two.

The goal is not labeled anomaly detection; instead, this analysis tests whether anomaly scores carry information about forward upside potential.

Data & target

For each date \(t\) with close \(C_t\):

\[\text{target}_t = 100 \times \frac{\max(C_{t+1}, \dots, C_{t+20}) - C_t}{C_t}\]

Implementation detail: the max is computed over \([t+1, t+20]\), starting with the next trading day, so \(C_t\) itself never enters the window. This avoids look-ahead leakage.
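The target definition above can be sketched in a few lines. This is a dependency-free illustration, not the notebook's actual code: the function name `forward_max_return`, the `horizon` parameter, and the choice to return `None` for dates without a complete forward window are all assumptions made for the example.

```python
from typing import List, Optional


def forward_max_return(closes: List[float], horizon: int = 20) -> List[Optional[float]]:
    """Percent move from closes[t] to the highest close in days t+1..t+horizon."""
    targets: List[Optional[float]] = []
    for t, c_t in enumerate(closes):
        # The window starts at t+1, so the close on day t is excluded.
        window = closes[t + 1 : t + 1 + horizon]
        if len(window) < horizon:
            # Assumed policy: no target when the forward window is incomplete.
            targets.append(None)
        else:
            targets.append(100.0 * (max(window) - c_t) / c_t)
    return targets


# Toy series with a 2-day horizon so the windows are easy to check by hand.
toy_closes = [100.0, 102.0, 99.0, 105.0, 101.0]
toy_targets = forward_max_return(toy_closes, horizon=2)
print(toy_targets)
```

With this policy the last `horizon` dates have no target, so they would need to be dropped before any comparison against anomaly scores.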

Detectors

IsolationForest(IsolationForestConfig())
VAE(VAEConfig())
DetectorEnsemble(...) over both detectors, using AggregateAlarms(alm_threshold=4)

This notebook treats the anomaly score as a candidate signal and reports:

VAE is the strongest single model in this run: it shows the highest correlation with the forward \(20\)-day max return target on both the training and testing splits (relative to Isolation Forest and the ensemble).
Scale matters when comparing anomaly scores: VAE produces consistently larger-magnitude anomaly scores than the other methods. This likely reflects different score calibration/normalization across detectors rather than “more anomalies” in an absolute sense.
Practical implication: if you want a single unsupervised score that best tracks the forward-move proxy used here, VAE is the best candidate of these three. For downstream use, consider standardizing scores (e.g., with a rolling-window z-score) before thresholding or combining them with other signals.
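The two operations these findings rely on, correlating a score with the target and standardizing scores over a rolling window, can be sketched as follows. This is a minimal stdlib-only illustration under stated assumptions: the source does not say which correlation measure was used, so Pearson is assumed here; the helper names `pearson` and `rolling_zscore`, the trailing window that excludes the current point, and the population standard deviation are also choices made for the example.

```python
import math
from typing import List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rolling_zscore(scores: List[float], window: int) -> List[Optional[float]]:
    """Z-score each point against the `window` points that precede it."""
    out: List[Optional[float]] = []
    for i, score in enumerate(scores):
        if i < window:
            out.append(None)  # not enough history yet
            continue
        past = scores[i - window : i]  # trailing window, current point excluded
        mean = sum(past) / window
        std = math.sqrt(sum((s - mean) ** 2 for s in past) / window)
        out.append((score - mean) / std if std > 0 else None)
    return out


# Two hypothetical detectors emitting the same pattern on different scales.
small_scale = [1.0, 2.0, 3.0, 2.0, 1.0, 5.0]
large_scale = [10.0 * s for s in small_scale]

z_small = rolling_zscore(small_scale, window=3)
z_large = rolling_zscore(large_scale, window=3)
print(z_small)
print(z_large)
print(pearson(small_scale, large_scale))
```

After standardization the two toy series agree up to floating-point error, which is the property that makes a shared threshold meaningful across detectors with different raw score magnitudes. Using only past values in the window keeps the standardized score usable at time \(t\) without information from later dates.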