Time-Series Anomaly Detection in 2026: From Classical Methods to Foundation Models

Last updated: July 19, 2026

By kongastral

Published April 3, 2026 · Updated July 19, 2026 · 23 min read

Summary

What this post covers: The full landscape of time-series anomaly detection in 2026, from classical statistical methods through transformer architectures to zero-shot foundation models like TimesFM, Chronos, and MOMENT, with practical guidance on choosing the right model.

Key insights:

Time-series anomaly detection is uniquely hard because “anomalous” is context-dependent, labels are scarce (often less than 0.01% of data), normal behavior drifts over time, and the most dangerous anomalies often manifest only as subtle multivariate correlations.
Foundation models pre-trained on 100B+ time points (TimesFM, Chronos) deliver competitive zero-shot anomaly detection without any per-dataset training, collapsing time-to-deployment from weeks to hours.
Classical methods (Isolation Forest, Matrix Profile, seasonal decomposition) remain surprisingly competitive and should always be benchmarked as baselines before reaching for deep learning.
Different anomaly types (point, contextual, collective, trend, shapelet) require different model architectures; no single model wins across all five categories.
The field is now shifting from detection alone toward integrated detect-explain-remediate systems combining LLMs, multimodal foundation models, and edge deployment of distilled detectors.

Main topics: Why Time-Series Anomaly Detection Is Harder Than Often Assumed, A Taxonomy of Time-Series Anomalies, Classical Approaches: Where It All Started, The Deep Learning Revolution in Anomaly Detection, Transformer-Based Models: The Current Best, Foundation Models for Time Series: The 2025-2026 Frontier, Benchmarks and Real-World Performance, Practical Guide: Choosing the Right Model for the Problem, Implementation: Building an Anomaly Detection Pipeline, Where the Field Is Heading, References.

On 19 July 2024, a faulty content update from CrowdStrike caused 8.5 million Windows machines to crash simultaneously, producing the largest IT outage in history. Airlines grounded flights, hospitals postponed surgeries, and banks froze transactions. The total economic damage exceeded 10 billion USD. The root cause was a single faulty configuration file pushed to production. An anomaly detection system monitoring the deployment’s telemetry (CPU spikes, crash rates, memory patterns) could plausibly have flagged the cascading failure within seconds and triggered an automatic rollback before more than 0.1% of those machines were affected.

The benefit is not hypothetical. Companies such as Netflix, Uber, and Meta operate real-time anomaly detection systems that identify precisely these patterns: sudden deviations in request latency, error rates, transaction volumes, or system metrics indicating that a problem has arisen before users notice it. The difference between detection in 30 seconds and detection in 30 minutes can be the difference between a minor incident and a high-profile failure.

Time-series anomaly detection—the task of identifying unusual patterns in sequential, timestamped data—has undergone substantial transformation over the past three years. Classical statistical methods that served practitioners for decades are now being augmented, and in some cases replaced, by deep learning architectures, transformer-based models, and, most recently, pre-trained foundation models that can detect anomalies in time series they have never encountered before, without any task-specific training. The pace of innovation has been notable, and the gap between research results and production performance is narrowing rapidly.

This guide surveys the full landscape, from classical approaches that remain surprisingly competitive, through the deep learning developments of 2020 to 2024, to the foundation model frontier of 2025 and 2026. For practitioners building anomaly detection for infrastructure monitoring, financial fraud detection, predictive maintenance, or healthcare, understanding these models—their strengths, limitations, and practical trade-offs—is essential.

Why Time-Series Anomaly Detection Is Harder Than Often Assumed

Detecting anomalies in tabular data is relatively straightforward: a transaction of 50,000 USD when the customer’s average is 200 USD is clearly unusual. Time-series anomaly detection is fundamentally harder because the definition of “unusual” depends on temporal context: patterns that are normal at one time may be anomalous at another.

Consider server CPU usage. A spike to 95% utilisation at 3 AM may be entirely normal—it is when the batch processing job runs. The same spike at 3 PM, when only light API traffic is expected, may indicate a runaway process or a denial-of-service attack. A gradual drift from a 40% baseline to 60% over six weeks may indicate a memory leak that will eventually cause a crash. Each of these requires the detection system to understand not only the current value but also its relationship to seasonal patterns, trends, and the broader temporal context.

The challenges fall into several categories:

Rarity of labelled anomalies. In most real-world datasets, anomalies represent less than 1% of observations and often less than 0.01%. Supervised learning approaches struggle because the classes are so imbalanced. Most current best methods therefore operate in unsupervised or semi-supervised settings, learning the structure of normal behaviour and flagging deviations.

Concept drift. The definition of “normal” changes over time. A system that learned normal patterns from January data may flag entirely healthy February patterns as anomalous if the business has grown, the user base has shifted, or the infrastructure has been upgraded. Models must adapt to evolving baselines without losing sensitivity to genuine anomalies.

Multivariate dependencies. Modern systems generate hundreds or thousands of metrics simultaneously. An anomaly may not be visible in any single metric—CPU appears normal, memory appears normal, disk I/O appears normal—yet the simultaneous combination of all three at slightly elevated levels indicates an emerging problem. Capturing these inter-metric correlations is where deep learning approaches surpass classical univariate methods.

Key Takeaway: Time-series anomaly detection is difficult because “anomalous” is context-dependent, labelled data is scarce, normal behaviour evolves, and the most consequential anomalies often manifest only as subtle correlations across multiple variables. Models that handle all four challenges simultaneously are rare, which accounts for the continued rapid advancement of the field.

A Taxonomy of Time-Series Anomalies

Before selecting a model, a practitioner must identify the type of anomaly under consideration. Different model architectures perform differently across anomaly types:

Anomaly Type	Description	Example	Best Detection Approach
Point anomaly	A single observation far from expected	Sudden CPU spike to 100%	Statistical thresholds, Isolation Forest
Contextual anomaly	Normal value in wrong context	High traffic at 4 AM (normally low)	Seasonal decomposition, LSTM, Transformer
Collective anomaly	A sequence of observations anomalous together	Sustained elevated error rate for 10 minutes	Sliding-window models, sequence-to-sequence
Trend anomaly	Gradual shift from expected trajectory	Memory usage growing 2% weekly (leak)	Change-point detection, trend decomposition
Shapelet anomaly	Unusual pattern shape in a subsequence	Abnormal ECG waveform morphology	Matrix Profile, deep autoencoders
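The difference between a point anomaly and a contextual anomaly can be made concrete with a small sketch. It assumes hourly data and compares a global z-score with a z-score computed per hour of day. The synthetic traffic curve, the injected value, and the function names are illustrative assumptions, not part of any library.

```python
import numpy as np

def global_zscores(values):
    """Z-score of each point against the whole series."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def hourly_zscores(values, hours):
    """Z-score of each point against points from the same hour of day."""
    values = np.asarray(values, dtype=float)
    hours = np.asarray(hours)
    scores = np.zeros_like(values)
    for h in np.unique(hours):
        mask = hours == h
        mu, sigma = values[mask].mean(), values[mask].std()
        scores[mask] = (values[mask] - mu) / (sigma if sigma > 0 else 1.0)
    return scores

# Synthetic traffic: 14 days of hourly data, low at night, high by day
rng = np.random.default_rng(0)
hours = np.tile(np.arange(24), 14)
traffic = 100 + 80 * np.sin((hours - 6) / 24 * 2 * np.pi) + rng.normal(0, 5, hours.size)

# Inject daytime-level traffic at 4 AM on the last day:
# an ordinary value overall, but unusual for that hour
idx = 13 * 24 + 4
traffic[idx] = 170.0

print("global z-score:", round(float(global_zscores(traffic)[idx]), 2))
print("hourly z-score:", round(float(hourly_zscores(traffic, hours)[idx]), 2))
```

The global score stays unremarkable because 170 is a common daytime value, while the hour-of-day score stands out. This is the reason contextual anomalies call for seasonality-aware methods.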

Classical Approaches: Where It All Started

Before deep learning, time-series anomaly detection relied on statistical methods that remain relevant and surprisingly competitive for many use cases. Understanding these foundations is essential: they serve as baselines, they are interpretable, and they run efficiently without GPU infrastructure.

Statistical and Decomposition Methods

STL Decomposition with Residual Thresholding. Seasonal-Trend decomposition using LOESS (STL) separates a time series into trend, seasonal, and residual components. Anomalies are identified by flagging residuals that exceed a threshold (typically three standard deviations). The method is simple, interpretable, and handles seasonality well, which makes it well suited to business metrics such as daily active users or hourly revenue.
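The residual-thresholding idea can be sketched without a dedicated library. The example below is not STL: a centred moving average and per-phase medians stand in for the LOESS-based decomposition, and the function name, the synthetic series, and the MAD-based spread estimate are assumptions made for illustration.

```python
import numpy as np

def seasonal_residual_anomalies(values, period, n_sigmas=3.0):
    """Flag points whose residual exceeds n_sigmas robust standard deviations."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # Trend: centred moving average over one full period (edge-padded, so the
    # first and last half-period are less reliable than the interior).
    left = period // 2
    padded = np.pad(values, (left, period - 1 - left), mode="edge")
    trend = np.convolve(padded, np.ones(period) / period, mode="valid")
    # Seasonal: median detrended value at each phase of the period.
    detrended = values - trend
    phases = np.arange(n) % period
    phase_medians = np.array(
        [np.median(detrended[phases == p]) for p in range(period)]
    )
    residual = detrended - phase_medians[phases]
    # Robust spread via the median absolute deviation (MAD).
    centre = np.median(residual)
    sigma = 1.4826 * np.median(np.abs(residual - centre))
    flags = np.abs(residual - centre) > n_sigmas * max(sigma, 1e-12)
    return flags, residual

# Synthetic hourly metric: slow trend + daily cycle + noise, with one spike
rng = np.random.default_rng(1)
t = np.arange(240)
series = 50 + 0.05 * t + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)
series[100] += 50

flags, residual = seasonal_residual_anomalies(series, period=24)
print("largest residual at index", int(np.argmax(np.abs(residual))))
```

In production, a real STL implementation (for example the one in statsmodels) would normally replace the moving-average step; the thresholding logic stays the same.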

ARIMA-based Detection. AutoRegressive Integrated Moving Average models forecast the next value based on historical patterns. Observations that deviate significantly from the forecast are flagged. ARIMA performs well for stationary series with clear autoregressive structure but struggles with complex multi-seasonal patterns or nonlinear dynamics.

Exponential Smoothing State Space Models (ETS). Similar in spirit to ARIMA but using exponential weighting of past observations. The Holt-Winters variant handles both trend and seasonality and remains a standard tool in production monitoring systems.
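As a rough illustration of forecast-based detection with exponential weighting, the sketch below uses simple exponential smoothing, the most basic member of the ETS family. It has no trend or seasonal terms, so it is not Holt-Winters. The function name, the warm-up length, and the choice to skip updates on flagged points are assumptions of this sketch.

```python
def ses_anomalies(values, alpha=0.3, n_sigmas=3.0, warmup=10):
    """Flag points far from a one-step-ahead exponentially smoothed forecast."""
    level = float(values[0])   # current smoothed level, used as the forecast
    var = 0.0                  # exponentially weighted squared forecast error
    flags = [False]
    for i, x in enumerate(values[1:], start=1):
        error = x - level
        std = var ** 0.5
        is_anomaly = i >= warmup and std > 0 and abs(error) > n_sigmas * std
        flags.append(is_anomaly)
        if not is_anomaly:
            # Update only on normal points so outliers do not contaminate the
            # baseline. The trade-off: a genuine level shift keeps alerting
            # until the detector is reset or the rule is relaxed.
            level += alpha * error
            var = (1 - alpha) * var + alpha * error ** 2
    return flags

# A small repeating pattern around 10 with one injected spike
values = [10 + 0.5 * ((i % 4) - 1.5) for i in range(40)]
values[30] = 25.0

flags = ses_anomalies(values)
print([i for i, flagged in enumerate(flags) if flagged])
```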

Isolation Forest and Tree-Based Methods

Isolation Forest (Liu et al., 2008) takes a distinctly different approach. Instead of building a model of normal behaviour and looking for deviations, it directly identifies anomalies by measuring how easy they are to isolate. Anomalous points, being different from the majority, require fewer random partitions to separate from the rest of the data. Isolation Forest is fast, scales well to high-dimensional data, and handles multivariate anomaly detection naturally.

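from sklearn.ensemble import IsolationForest
import numpy as np
import pandas as pd

# Create windowed features from a raw time series.
# Row j summarises the `window` points ending at index j + window - 1.
def create_features(series, window=24):
    # Convert first so positional indexing also works for a pandas Series
    values = np.asarray(series, dtype=float)
    x = np.arange(window)
    features = []
    # len(values) + 1 so that the most recent window is scored too
    for i in range(window, len(values) + 1):
        window_data = values[i - window:i]
        features.append({
            'mean': np.mean(window_data),
            'std': np.std(window_data),
            'min': np.min(window_data),
            'max': np.max(window_data),
            'last': window_data[-1],
            'trend': np.polyfit(x, window_data, 1)[0],  # slope of a linear fit
        })
    return pd.DataFrame(features)

# Fit Isolation Forest
# cpu_usage_series: your 1-D series of CPU readings (placeholder, not defined here)
features = create_features(cpu_usage_series, window=24)
# contamination is the expected share of anomalies; tune it for your data
model = IsolationForest(contamination=0.01, random_state=42)
predictions = model.fit_predict(features)
# -1 = anomaly, 1 = normal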

Matrix Profile: Subsequence Analysis

Matrix Profile (Yeh et al., 2016) computes the distance between every subsequence in a time series and its nearest neighbour, producing a profile of how distinctive each subsequence is. Subsequences with high matrix profile values (those whose nearest neighbour lies unusually far away) are anomalous. Matrix Profile is particularly effective at detecting shapelet anomalies (unusual pattern shapes), and exact algorithms keep it tractable: the original STAMP algorithm computes the full matrix profile in O(n² log n) time, and its successor STOMP reduces this to O(n²).

The Python library stumpy provides production-grade Matrix Profile implementations and remains one of the more underused tools in the anomaly detection practitioner’s repertoire.
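To show what the matrix profile actually measures, here is a brute-force sketch in NumPy. It is deliberately naive (far slower than the exact algorithms that stumpy implements) and is meant only to illustrate the definition; the function names, the exclusion-zone width, and the synthetic series are assumptions.

```python
import numpy as np

def znorm(x):
    """Z-normalise a subsequence so that shape, not level, is compared."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def naive_matrix_profile(values, m):
    """Distance from each length-m subsequence to its nearest neighbour."""
    values = np.asarray(values, dtype=float)
    n_sub = values.size - m + 1
    subs = np.array([znorm(values[i:i + m]) for i in range(n_sub)])
    profile = np.full(n_sub, np.inf)
    exclusion = m // 2  # ignore trivial matches that overlap the query
    for i in range(n_sub):
        dists = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - exclusion), min(n_sub, i + exclusion + 1)
        dists[lo:hi] = np.inf
        profile[i] = dists.min()
    return profile

# A clean periodic signal with one flattened stretch (an unusual shape)
t = np.arange(200)
series = np.sin(2 * np.pi * t / 20)
series[100:110] = 0.0

profile = naive_matrix_profile(series, m=20)
discord = int(np.argmax(profile))  # start index of the most unusual subsequence
print("discord starts at index", discord)
```

The subsequence with the highest profile value (the "discord") overlaps the flattened stretch, because no other window in the series has a similar shape.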

The Deep Learning Revolution in Anomaly Detection

From approximately 2019 onward, deep learning models began reporting consistently stronger results than classical methods on complex, multivariate anomaly detection benchmarks. The central insight is that deep neural networks can learn nonlinear temporal patterns that linear statistical models cannot represent.

LSTM Autoencoders: The First Deep Success

The LSTM Autoencoder architecture, consisting of an encoder that compresses a time-series window into a latent representation followed by a decoder that reconstructs the original window, became the first widely adopted deep learning approach for time-series anomaly detection. The model learns to reconstruct normal patterns during training. At inference, windows with high reconstruction error are flagged as anomalous, since the model has not learned to reconstruct those patterns.

LSTM Autoencoders handle temporal dependencies (the LSTM component) and learn expected patterns (the autoencoder objective) simultaneously. They were the standard deep approach from approximately 2019 to 2022 and remain effective for many applications.

import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features, hidden_size=64, n_layers=2):
        super().__init__()
        self.encoder = nn.LSTM(
            n_features, hidden_size, n_layers, batch_first=True
        )
        self.decoder = nn.LSTM(
            hidden_size, hidden_size, n_layers, batch_first=True
        )
        self.output_layer = nn.Linear(hidden_size, n_features)

    def forward(self, x):
        # Encode: compress the sequence
        _, (hidden, cell) = self.encoder(x)

        # Decode: reconstruct the sequence
        seq_len = x.size(1)
        decoder_input = hidden[-1].unsqueeze(1).repeat(1, seq_len, 1)
        decoder_out, _ = self.decoder(decoder_input)
        reconstruction = self.output_layer(decoder_out)

        return reconstruction

# Anomaly score = reconstruction error (MSE per window)
# High reconstruction error → anomaly

GDN and GNN-Based Methods: Modelling Inter-Metric Relationships

Graph Deviation Network (GDN) (Deng and Hooi, 2021) introduced an elegant solution for multivariate anomaly detection: model the relationships between sensors and metrics as a graph, in which each node is a time series and edges represent learned dependencies. When a metric deviates from what the graph structure predicts based on its neighbours’ values, it is flagged as anomalous.

GDN’s principal advantage is its ability to identify anomalies that are not visible in individual metrics but manifest as broken inter-metric correlations. For example, in a server cluster, CPU and memory usage typically correlate. If CPU spikes while memory does not, or vice versa, GDN detects the correlation violation, even when both values lie individually within normal ranges.
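The "broken correlation" idea can be illustrated without a graph neural network. The sketch below is not GDN: a single linear regression between two metrics stands in for the learned graph, and all names and numbers are illustrative assumptions. It shows a reading where CPU and memory each look normal on their own, yet their combination violates the learned relationship.

```python
import numpy as np

# Historical data in which memory usage tracks CPU usage (synthetic)
rng = np.random.default_rng(42)
cpu_train = rng.uniform(20, 80, 500)
mem_train = 0.5 * cpu_train + 20 + rng.normal(0, 1.5, 500)

# Learn the normal relationship: predict memory from CPU
slope, intercept = np.polyfit(cpu_train, mem_train, 1)
resid_std = np.std(mem_train - (slope * cpu_train + intercept))

def marginal_z(value, reference):
    """How unusual a value is when each metric is viewed on its own."""
    return (value - reference.mean()) / reference.std()

def relation_score(cpu, mem):
    """How far memory deviates from what CPU predicts, in residual std units."""
    return abs(mem - (slope * cpu + intercept)) / resid_std

# High-ish CPU with low-ish memory: both inside their usual ranges
cpu_now, mem_now = 75.0, 30.0
print("CPU z-score:   ", round(float(marginal_z(cpu_now, cpu_train)), 2))
print("memory z-score:", round(float(marginal_z(mem_now, mem_train)), 2))
print("relation score:", round(float(relation_score(cpu_now, mem_now)), 2))
```

GDN generalises this by learning which metrics predict which others and by using attention over graph neighbours rather than one fixed linear fit.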

USAD: Unsupervised Anomaly Detection

USAD (Audibert et al., 2020) combines autoencoders with adversarial training. Two autoencoders share a single encoder but have separate decoders: the first is trained to reconstruct the input well enough to fool the second, while the second is trained to tell real windows apart from the first autoencoder’s reconstructions. This adversarial scheme pushes the model to amplify the reconstruction error of anomalous inputs, improving detection accuracy relative to standard autoencoders. USAD is fast to train, performs well on multivariate data, and has become a popular baseline in academic benchmarks.

Transformer-Based Models: The Current Best

The transformer architecture, originally designed for natural language processing, has proven highly effective for time-series analysis. Its self-attention mechanism captures long-range dependencies in sequences without the vanishing gradient problems that limit RNNs and LSTMs. Several transformer-based models have set new state-of-the-art results on anomaly detection benchmarks.

Anomaly Transformer (ICLR 2022)

Anomaly Transformer (Xu et al., 2022) rests on one observation about attention patterns. For every time point the model computes two distributions: a “prior-association”, a learnable Gaussian kernel that concentrates on adjacent points, and a “series-association”, the ordinary learned self-attention weights over the whole series. Normal points can build associations across the whole series, so their series-association differs markedly from the local prior. Anomalies are rare and struggle to associate with anything beyond their immediate neighbours, so their series-association collapses towards the prior. The Association Discrepancy metric, a symmetrised KL divergence between the two distributions, captures this difference: it is large for normal points and small for anomalies.

The model achieved leading results on six benchmark datasets at the time of publication and remains among the strongest methods for unsupervised multivariate anomaly detection. Its principal contribution, weighting reconstruction error by attention-pattern discrepancy rather than relying on reconstruction error alone, represents a conceptual advance over prior autoencoder-based approaches.
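A minimal numerical sketch of the association discrepancy may help. As described in the Anomaly Transformer paper, the discrepancy is a symmetrised KL divergence between the prior-association and the series-association of each point, anomalies tend to show a smaller discrepancy than normal points, and the final score multiplies a softmax over the negated discrepancy by the reconstruction error. The sketch assumes a single attention layer, a fixed kernel width (the paper learns it), and hand-made attention weights in place of a trained model.

```python
import numpy as np

def gaussian_prior(length, sigma):
    """Row-normalised Gaussian kernel over the distance |i - j|."""
    idx = np.arange(length)
    dist = np.abs(idx[:, None] - idx[None, :])
    kernel = np.exp(-dist ** 2 / (2 * sigma ** 2))
    return kernel / kernel.sum(axis=1, keepdims=True)

def association_discrepancy(prior, series, eps=1e-12):
    """Symmetrised KL divergence between prior and series rows."""
    p, s = prior + eps, series + eps
    kl_ps = np.sum(p * np.log(p / s), axis=1)
    kl_sp = np.sum(s * np.log(s / p), axis=1)
    return kl_ps + kl_sp

def anomaly_scores(prior, series, recon_error):
    """Softmax over negative discrepancy, times reconstruction error."""
    disc = association_discrepancy(prior, series)
    weights = np.exp(-disc - np.max(-disc))  # numerically stable softmax
    weights /= weights.sum()
    return weights * recon_error

length = 50
prior = gaussian_prior(length, sigma=2.0)

# Hand-made series-association: normal points attend broadly (uniform rows),
# while point 25 attends only to its neighbours, as an anomaly would.
series = np.full((length, length), 1.0 / length)
series[25] = prior[25]

disc = association_discrepancy(prior, series)
scores = anomaly_scores(prior, series, recon_error=np.ones(length))
print("smallest discrepancy at index", int(np.argmin(disc)))
print("highest anomaly score at index", int(np.argmax(scores)))
```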

DCdetector: Dual-Attention Contrastive Learning (KDD 2023)

DCdetector (Yang et al., 2023) builds on the association discrepancy idea with a purely contrastive learning framework. It splits each series into patches and creates two representations of the same input, one from a “patch-wise” attention view (relationships between patches) and one from an “in-patch” attention view (relationships between points inside a patch). Training maximises agreement between the two views; normal points yield consistent representations, whereas anomalies yield inconsistent ones, and that inconsistency serves as the anomaly score. DCdetector achieved new state-of-the-art results on multiple benchmarks, improving on Anomaly Transformer’s F1 scores by 2 to 5 points on several datasets.

TimesNet: From Temporal to Spatial (ICLR 2023)

TimesNet (Wu et al., 2023) takes a creative approach: it transforms 1D time-series data into 2D representations by reshaping each period (daily, weekly, and so on) into a 2D image-like tensor, and then applies 2D convolutional neural networks to capture both intra-period and inter-period patterns simultaneously. This transformation allows TimesNet to use the feature extraction capabilities of CNNs, originally developed for computer vision, on temporal data.
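The 1D-to-2D step can be sketched in a few lines. The example below estimates the dominant period with an FFT and folds the series so that each row is one period. TimesNet does this for the top-k frequencies and feeds the result to 2D convolution blocks; the sketch covers only a single period and the reshape, and its function names are assumptions.

```python
import numpy as np

def dominant_period(values):
    """Estimate the strongest period from the FFT amplitude spectrum."""
    values = np.asarray(values, dtype=float)
    spectrum = np.abs(np.fft.rfft(values - values.mean()))
    spectrum[0] = 0.0                      # ignore the constant component
    freq = max(int(np.argmax(spectrum)), 1)
    return values.size // freq

def fold_to_2d(values, period):
    """Reshape a 1D series into rows of one period each (tail is truncated)."""
    values = np.asarray(values, dtype=float)
    n_rows = values.size // period
    return values[: n_rows * period].reshape(n_rows, period)

# Ten days of hourly data with a daily cycle
rng = np.random.default_rng(0)
t = np.arange(240)
series = 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, t.size)

period = dominant_period(series)
grid = fold_to_2d(series, period)
print("period:", period, "grid shape:", grid.shape)
```

Moving along a row of `grid` follows variation within one period, and moving down a column compares the same phase across periods. That layout is what lets 2D convolutions see both kinds of pattern at once.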

TimesNet is a general-purpose time-series model (it handles forecasting, classification, and anomaly detection), and its multi-task capability makes it a strong choice for teams that require a single architecture for multiple analytical needs.

Model	Year	Core Idea	Strengths	Limitations
LSTM Autoencoder	2019	Reconstruct normal patterns	Simple, well-understood	Limited long-range context
GDN	2021	Graph-based inter-metric modeling	Catches correlation anomalies	Complex graph construction
Anomaly Transformer	2022	Attention association discrepancy	Strong benchmark results	Computationally expensive
TimesNet	2023	1D→2D transformation + CNN	Multi-task capable	Assumes periodic structure
DCdetector	2023	Dual-attention contrastive learning	SOTA on multiple benchmarks	Requires careful tuning

Foundation Models for Time Series: The 2025-2026 Frontier

The most consequential development in time-series analysis over the past two years has been the emergence of foundation models: large, pre-trained models capable of performing time-series tasks, including anomaly detection, on data they have never previously seen, without task-specific training. This is the same paradigm shift that GPT brought to language and CLIP brought to vision: train once on a large and diverse corpus, then apply the model to arbitrary downstream tasks via fine-tuning or zero-shot inference.

TimesFM (Google, 2024)

TimesFM (Time Series Foundation Model), developed by Google Research, was pre-trained on approximately 100 billion time points from diverse sources, including financial markets, weather stations, energy consumption, web traffic, and synthetic data. At 200 million parameters, TimesFM is designed as a decoder-only transformer that generates point forecasts. Anomaly detection is achieved by flagging observations that deviate significantly from the model’s zero-shot forecast.

TimesFM’s notable property is that it produces competitive forecasts, and therefore competitive anomaly detection, without exposure to the user’s specific data during training. A practitioner provides a time series, the model generates a forecast based on patterns learned from 100 billion diverse time points, and the actuals are compared against the forecasts. This zero-shot capability removes the need for per-dataset model training and substantially reduces time-to-deployment for new monitoring use cases.

Chronos (Amazon, 2024)

Chronos (Ansari et al., 2024), from Amazon, takes an innovative approach: it tokenises time-series values into discrete bins (analogous to how language models tokenise words) and then applies a standard language model architecture (T5) to the tokenised sequence. This allows Chronos to use production-proven language model architectures and training procedures for time-series tasks.
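The tokenisation step can be sketched as scaling followed by uniform binning. The sketch below scales a series by its mean absolute value and maps each scaled value to a bin index, then inverts the mapping through bin centres. The bin count and range used here are illustrative assumptions, and the function names are hypothetical rather than part of the Chronos library.

```python
import numpy as np

def tokenize(values, n_bins=1024, low=-15.0, high=15.0):
    """Scale by mean absolute value, then map to uniform bin indices."""
    values = np.asarray(values, dtype=float)
    scale = np.mean(np.abs(values))
    scale = scale if scale > 0 else 1.0
    edges = np.linspace(low, high, n_bins + 1)
    tokens = np.clip(np.digitize(values / scale, edges) - 1, 0, n_bins - 1)
    return tokens, scale

def detokenize(tokens, scale, n_bins=1024, low=-15.0, high=15.0):
    """Map bin indices back to values via bin centres and the saved scale."""
    edges = np.linspace(low, high, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[tokens] * scale

values = np.array([10.0, 12.0, 11.0, 50.0, 9.0])
tokens, scale = tokenize(values)
recovered = detokenize(tokens, scale)

print("tokens:", tokens.tolist())
print("max round-trip error:", float(np.max(np.abs(recovered - values))))
```

Once the series is a sequence of integer tokens, a language-model architecture can be trained on it with an ordinary cross-entropy objective, and sampled tokens can be mapped back to numeric forecasts.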

Chronos offers multiple model sizes (Tiny: 8M, Mini: 20M, Small: 46M, Base: 200M, Large: 710M parameters) and performs well in zero-shot evaluations. For anomaly detection, the approach is forecast-based: Chronos generates probabilistic forecasts, and observations falling outside the prediction intervals are flagged as anomalous.

import torch
from chronos import ChronosPipeline

# Load pre-trained Chronos model
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-base",
    device_map="auto",
    torch_dtype=torch.float32,
)

# Generate probabilistic forecast (zero-shot, no training needed)
# historical_data: your 1-D history (placeholder, not defined here)
context = torch.tensor(historical_data, dtype=torch.float32)
forecast = pipeline.predict(
    context,
    prediction_length=24,  # Forecast next 24 steps
    num_samples=100,       # Generate 100 forecast samples
)  # shape: [num_series, num_samples, prediction_length]

# Anomaly detection via prediction intervals.
# Tensor.median(dim=...) returns (values, indices);
# Tensor.quantile(...) returns a plain tensor, so it has no .values to unpack.
median_forecast = forecast.median(dim=1).values
lower_bound = forecast.quantile(0.025, dim=1)  # 2.5th percentile
upper_bound = forecast.quantile(0.975, dim=1)  # 97.5th percentile

# Points outside the 95% prediction interval are anomalies
# actual_values: the 24 observed values for the forecast horizon (placeholder)
actuals = torch.as_tensor(actual_values, dtype=torch.float32)
anomalies = (actuals < lower_bound) | (actuals > upper_bound)

MOMENT (CMU, 2024)

MOMENT (Goswami et al., 2024), short for Multi-task Open-source pre-trained Model for Every Time series, is a family of models designed for multiple time-series tasks, including anomaly detection, classification, forecasting, and imputation. Unlike TimesFM and Chronos, which approach anomaly detection indirectly through forecasting, MOMENT scores anomalies directly through reconstruction, reusing the objective it was pre-trained with.

MOMENT uses a masked reconstruction objective. During pre-training, random patches of the time series are masked, and the model learns to reconstruct them. For anomaly detection, the reconstruction error at each time step serves as the anomaly score. Observations that the model finds difficult to reconstruct from context—because they deviate from patterns learned across its substantial pre-training dataset—receive high anomaly scores.
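The scoring logic of masked reconstruction can be sketched with a stand-in reconstructor. In the example below, linear interpolation plays the role of the pre-trained model, which is a drastic simplification. Masking each patch in turn at scoring time is also an assumption of this sketch, not a description of MOMENT's inference procedure, and the function names are hypothetical.

```python
import numpy as np

def interpolate_masked(values, mask):
    """Stand-in reconstructor: fill masked points by linear interpolation."""
    idx = np.arange(values.size)
    recon = values.copy()
    recon[mask] = np.interp(idx[mask], idx[~mask], values[~mask])
    return recon

def masked_reconstruction_scores(values, patch_len, reconstruct):
    """Hide one patch at a time and score it by squared reconstruction error."""
    values = np.asarray(values, dtype=float)
    scores = np.zeros_like(values)
    for start in range(0, values.size, patch_len):
        mask = np.zeros(values.size, dtype=bool)
        mask[start:start + patch_len] = True
        recon = reconstruct(values, mask)
        scores[mask] = (values[mask] - recon[mask]) ** 2
    return scores

# A smooth ramp with one point that does not fit its context
series = 0.1 * np.arange(80)
series[37] += 5.0

scores = masked_reconstruction_scores(series, patch_len=8, reconstruct=interpolate_masked)
print("highest score at index", int(np.argmax(scores)))
```

Swapping `interpolate_masked` for a call to a real pre-trained model keeps the rest of the scoring loop unchanged.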

MOMENT is open source, available on Hugging Face, and supports fine-tuning for domain-specific applications. Its anomaly detection performance is competitive with specialised models trained on the target dataset, despite MOMENT requiring no task-specific training.

Timer and TimeGPT: Commercial and Research Alternatives

TimeGPT (Nixtla, 2024) is a commercially available foundation model with an API-based interface. Users send time-series data to the API and receive forecasts and anomaly scores without managing any model infrastructure. TimeGPT is attractive for teams that wish to access foundation model capabilities without the complexity of model deployment, though it requires sending data to an external service, which may be unacceptable for sensitive applications.

Timer (Liu et al., 2024), from Tsinghua University, is a generative pre-trained transformer for time series that unifies multiple analytical tasks. It uses an autoregressive next-token prediction objective (analogous to GPT) on tokenised time-series data, and can perform anomaly detection, forecasting, and imputation in a single framework.

Foundation Model	Origin	Parameters	Open Source	Anomaly Approach	Key Advantage
TimesFM	Google	200M	Yes	Forecast-based	Large pre-training corpus (100B points)
Chronos	Amazon	8M-710M	Yes	Probabilistic forecast	Multiple sizes, LLM architecture
MOMENT	CMU	40M-385M	Yes	Masked reconstruction	Reconstruction-based scoring, multi-task
TimeGPT	Nixtla	Undisclosed	No (API)	Forecast-based	Zero infrastructure, API-ready
Timer	Tsinghua	67M	Yes	Autoregressive	GPT-style unified framework

Tip: Foundation models perform particularly well when anomaly detection must be deployed quickly on new, unseen time series without first collecting training data. If abundant historical data with labelled anomalies is available for the relevant domain, a fine-tuned specialised model (such as Anomaly Transformer or DCdetector) may still outperform zero-shot foundation models. The appropriate choice depends on whether the principal constraint is labelled-data availability or model performance ceiling.

Benchmarks and Real-World Performance

The academic community evaluates anomaly detection models on several standard benchmark datasets. Understanding these benchmarks, and their limitations, helps calibrate expectations for real-world performance.

Dataset	Domain	Dimensions	Anomaly %	Key Challenge
SMD	Server Machines	38	~4.2%	Multi-entity, diverse patterns
MSL	NASA Spacecraft	55	~10.7%	Telemetry with complex physics
SMAP	NASA Soil Moisture	25	~13.1%	Sensor noise, gradual drifts
SWaT	Water Treatment Plant	51	~12.1%	Cyber-physical attacks, subtle
PSM	eBay Server Metrics	25	~27.8%	High anomaly rate, noisy labels

Caution: A 2022 paper by Kim et al. (“Towards a Rigorous Evaluation of Time-Series Anomaly Detection”) demonstrated that many published benchmark results are inflated by methodology issues, particularly the use of point-adjust (PA) metrics, which count every point in a labelled anomaly segment as detected if the model flags any single point in it, however late. Under stricter metrics, the performance gap between methods narrows considerably, and some classical methods perform comparably with deep models. Models should always be evaluated on the practitioner’s own data using metrics that reflect operational requirements, including detection latency and the false positive rate at a target recall.
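The inflation caused by point adjustment is easy to reproduce. The sketch below implements the adjustment as it is commonly described (if any point in a labelled segment is flagged, the whole segment counts as detected) and compares F1 before and after. The helper names and the toy data are assumptions.

```python
def point_adjust(predictions, labels):
    """Mark a whole labelled segment as detected if any point in it is flagged."""
    adjusted = list(predictions)
    i, n = 0, len(labels)
    while i < n:
        if labels[i]:
            j = i
            while j < n and labels[j]:
                j += 1                      # j is one past the segment end
            if any(predictions[i:j]):
                adjusted[i:j] = [True] * (j - i)
            i = j
        else:
            i += 1
    return adjusted

def f1_score(predictions, labels):
    """Plain point-wise F1."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum(l and not p for p, l in zip(predictions, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# One 10-point anomaly segment; the detector fires only on its final point
labels = [5 <= i <= 14 for i in range(20)]
predictions = [i == 14 for i in range(20)]

raw_f1 = f1_score(predictions, labels)
adjusted_f1 = f1_score(point_adjust(predictions, labels), labels)
print(f"raw F1 = {raw_f1:.3f}, point-adjusted F1 = {adjusted_f1:.3f}")
```

A detector that reacts nine steps late goes from an F1 of about 0.18 to a perfect 1.0 under point adjustment, which is why latency-aware metrics matter for operational use.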

Practical Guide: Choosing the Right Model for the Problem

With so many available models, selection can be challenging. The following decision framework draws on real-world constraints:

Decision Framework

Is labelled anomaly data available?

Yes (100 or more labelled anomalies): Fine-tune a supervised or semi-supervised model. Consider fine-tuning MOMENT or training DCdetector with the labels guiding threshold selection.
No: Use unsupervised methods. Proceed to the next question.

Is the deployment new, with no historical training data?

Yes: Use a foundation model (Chronos, TimesFM, or MOMENT) in zero-shot mode. Competitive detection is available immediately without training.
No (ample historical data): Train a specialised model for best performance. Proceed to the next question.

Is the problem univariate or multivariate?

Univariate (single metric): STL decomposition with thresholding is difficult to beat for simplicity and interpretability. For higher accuracy, use Matrix Profile or an LSTM autoencoder.
Multivariate (many correlated metrics): Use Anomaly Transformer, DCdetector, or GDN to capture inter-metric correlations.

What are the latency requirements?

Real time (sub-second): Avoid transformer models at inference. Use Isolation Forest, streaming Matrix Profile (via STUMPY), or lightweight LSTM models.
Near real time (seconds to minutes): Any model is feasible with appropriate infrastructure.
Batch (hourly or daily): Prioritise accuracy over speed. Use the most capable model available.
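The decision framework above can be written down as a small function, which is handy for documenting why a team chose a given detector. Checking the latency constraint first is one possible reading of the framework, not the only one; the function name and return format are assumptions.

```python
def recommend_detectors(labelled_anomalies, has_history, multivariate, latency="batch"):
    """Return candidate detectors for a scenario.

    latency is one of 'real_time', 'near_real_time', or 'batch'.
    """
    # A sub-second budget rules out heavy transformer inference
    if latency == "real_time":
        return ["Isolation Forest", "streaming Matrix Profile", "lightweight LSTM"]
    # Enough labels: let them guide fine-tuning and threshold selection
    if labelled_anomalies >= 100:
        return ["fine-tuned MOMENT", "DCdetector with label-guided thresholds"]
    # New deployment without history: zero-shot foundation models
    if not has_history:
        return ["Chronos (zero-shot)", "TimesFM (zero-shot)", "MOMENT (zero-shot)"]
    # Ample history: choose by dimensionality
    if multivariate:
        return ["Anomaly Transformer", "DCdetector", "GDN"]
    return ["STL + thresholding", "Matrix Profile", "LSTM autoencoder"]

# A brand-new service with no labels and no history, checked hourly
print(recommend_detectors(labelled_anomalies=0, has_history=False, multivariate=True))
```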

Implementation: Building an Anomaly Detection Pipeline

A production anomaly detection system involves more than the model alone. The full pipeline architecture is as follows:

# Complete anomaly detection pipeline with Chronos
import torch
import numpy as np
from chronos import ChronosPipeline
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnomalyResult:
    timestamp: str
    value: float
    expected: float
    lower_bound: float
    upper_bound: float
    anomaly_score: float
    is_anomaly: bool

class TimeSeriesAnomalyDetector:
    def __init__(
        self,
        model_name: str = "amazon/chronos-t5-small",
        context_length: int = 512,
        prediction_length: int = 1,
        confidence_level: float = 0.95,
    ):
        self.pipeline = ChronosPipeline.from_pretrained(
            model_name,
            device_map="auto",
            torch_dtype=torch.float32,
        )
        self.context_length = context_length
        self.prediction_length = prediction_length
        self.alpha = 1 - confidence_level

    def detect(
        self,
        history: np.ndarray,
        actual_value: float,
        timestamp: str,
    ) -> AnomalyResult:
        """Detect if actual_value is anomalous given history."""
        # Use last context_length points
        context = torch.tensor(
            history[-self.context_length:]
        ).unsqueeze(0).float()

        # Generate probabilistic forecast
        forecast = self.pipeline.predict(
            context,
            prediction_length=self.prediction_length,
            num_samples=200,
        )

        # Extract prediction intervals.
        # median(dim=...) returns (values, indices); quantile(...) returns a
        # plain tensor, so it is indexed directly.
        median = forecast.median(dim=1).values[0, 0].item()
        lower = forecast.quantile(
            self.alpha / 2, dim=1
        )[0, 0].item()
        upper = forecast.quantile(
            1 - self.alpha / 2, dim=1
        )[0, 0].item()

        # Calculate anomaly score (normalized deviation)
        interval_width = upper - lower
        if interval_width > 0:
            score = abs(actual_value - median) / interval_width
        else:
            score = abs(actual_value - median)

        is_anomaly = actual_value < lower or actual_value > upper

        return AnomalyResult(
            timestamp=timestamp,
            value=actual_value,
            expected=median,
            lower_bound=lower,
            upper_bound=upper,
            anomaly_score=score,
            is_anomaly=is_anomaly,
        )

# Usage (cpu_usage_last_7_days and current_cpu_reading are placeholders for your own data)
detector = TimeSeriesAnomalyDetector()
result = detector.detect(
    history=cpu_usage_last_7_days,
    actual_value=current_cpu_reading,
    timestamp="2026-04-03T08:15:00Z",
)

if result.is_anomaly:
    print(f"ANOMALY at {result.timestamp}: "
          f"value={result.value:.1f}, "
          f"expected={result.expected:.1f} "
          f"[{result.lower_bound:.1f}, {result.upper_bound:.1f}]")

Pipeline components beyond the model itself include:

Data preprocessing. Handle missing values (forward-fill or interpolation), normalise scales across metrics, and align timestamps across data sources.
Threshold calibration. Use a validation period of known-normal data to calibrate anomaly thresholds. A threshold set too low produces a flood of false positives; one set too high misses real incidents.
Suppression and deduplication. A single incident may trigger dozens of anomaly alerts across correlated metrics. Group alerts by time window and root cause to avoid alert fatigue.
Feedback loop. Operators who acknowledge or dismiss alerts provide implicit labels. This data should be fed back into the model as a fine-tuning signal to improve detection over time.
Seasonal awareness. Explicitly model known business cycles (daily patterns, weekend effects, holiday traffic shifts) to reduce false positives during expected but unusual periods.
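Two of these components, threshold calibration and alert deduplication, are simple enough to sketch directly. The example assumes anomaly scores where higher means more unusual and alert timestamps in seconds; the function names, the target false positive rate, and the five-minute grouping gap are illustrative assumptions.

```python
def calibrate_threshold(normal_scores, target_fpr=0.01):
    """Pick a threshold so that at most about target_fpr of known-normal scores exceed it."""
    ordered = sorted(normal_scores)
    index = min(len(ordered) - 1, int((1 - target_fpr) * len(ordered)))
    return ordered[index]

def group_alerts(alert_times, max_gap_seconds=300):
    """Merge alerts separated by at most max_gap_seconds into one incident."""
    groups = []
    for t in sorted(alert_times):
        if groups and t - groups[-1][-1] <= max_gap_seconds:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups

# Calibrate on scores from a validation period believed to be normal
validation_scores = list(range(100))
threshold = calibrate_threshold(validation_scores, target_fpr=0.05)
false_positive_rate = sum(s > threshold for s in validation_scores) / len(validation_scores)
print("threshold:", threshold, "validation FPR:", false_positive_rate)

# Six raw alerts collapse into three incidents
incidents = group_alerts([0, 60, 120, 1000, 1100, 5000])
print("incidents:", incidents)
```

Grouping by time alone is the simplest form of deduplication. Grouping by root cause, as described above, additionally needs metadata such as service or host identifiers.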

Where the Field Is Heading

Time-series anomaly detection is at an inflection point. The convergence of foundation models, transformer architectures, and practical tooling is making it possible to deploy sophisticated anomaly detection systems with substantially less effort than was the case even two years ago. Whereas a 2022 deployment required collecting domain-specific training data, training a specialised model, and calibrating thresholds through iterative experimentation, a 2026 deployment can begin with a zero-shot foundation model that delivers competitive performance from day one and improves with domain-specific fine-tuning.

Several trends will shape the next two to three years:

Multimodal foundation models that jointly reason over time-series metrics, log messages, and trace data are emerging from research laboratories. An anomaly detection system that can correlate a latency spike with a specific error message in the application logs and a deployment event in the change management system would substantially reduce mean time to diagnosis, not merely detection.

LLM-augmented anomaly explanation represents a further frontier. Current systems indicate that something is anomalous but rarely explain why. Integrating LLMs that can explain anomaly detections in natural language (“CPU spiked to 95% at 3:14 PM, coinciding with a deployment of version 2.4.1 to the payment service; the historical pattern suggests a connection between this deployment and similar spikes”) would close the gap between detection and remediation.

Edge deployment of lightweight anomaly detection models is becoming practical as foundation model distillation techniques improve. Running a compact anomaly detector directly on IoT devices, industrial sensors, or network routers, without round-tripping data to a cloud service, enables real-time detection with lower latency and improved data privacy.

The field has moved from the question “can anomalies be detected automatically?” (yes, reliably, since the late 2010s) to “can anomalies be detected without per-dataset training?” (yes, with foundation models, since 2024). The current frontier is whether anomalies can be detected, explained, and accompanied by suggested remediation, all in real time. That question is being actively answered, and the pace of progress suggests it will not remain open for long.

References

Xu, Jiehui, et al. “Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy.” ICLR 2022.
Yang, Yiyuan, et al. “DCdetector: Dual Attention Contrastive Representation Learning for Time Series Anomaly Detection.” KDD 2023.
Wu, Haixu, et al. “TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis.” ICLR 2023.
Ansari, Abdul Fatir, et al. “Chronos: Learning the Language of Time Series.” arXiv:2403.07815, 2024.
Das, Abhimanyu, et al. “A Decoder-Only Foundation Model for Time-Series Forecasting.” (TimesFM) ICML 2024.
Goswami, Mononito, et al. “MOMENT: A Family of Open Time-Series Foundation Models.” ICML 2024.
Deng, Ailin, and Bryan Hooi. “Graph Neural Network-Based Anomaly Detection in Multivariate Time Series.” AAAI 2021.
Audibert, Julien, et al. “USAD: UnSupervised Anomaly Detection on Multivariate Time Series.” KDD 2020.
Kim, Siwon, et al. “Towards a Rigorous Evaluation of Time-Series Anomaly Detection.” AAAI 2022.
Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. “Isolation Forest.” ICDM 2008.
Yeh, Chin-Chia Michael, et al. “Matrix Profile I: All Pairs Similarity Joins for Time Series.” ICDM 2016.
Time-Series-Library (THU)—Unified framework for time-series models including anomaly detection
Amazon Chronos GitHub Repository
MOMENT GitHub Repository
