Author: kongastral

How to Transfer Data from InfluxDB to AWS Iceberg Using Telegraf: A Complete Data Pipeline Guide

Summary

What this post covers: A production-ready guide to building a data pipeline that moves time-series data from InfluxDB into Apache Iceberg tables on AWS S3 using Telegraf, AWS Glue, and Athena, with a complete reference telegraf.conf, automation, monitoring, performance tuning, cost analysis, and an alternative Kafka+Spark path.

Key insights:

Telegraf is dramatically cheaper than rolling a custom ETL: 300+ plugins let you read from InfluxDB, transform records, and land partitioned files on S3 with zero application code, which is what makes the Iceberg migration economically viable.
The right landing-zone schema is Hive-partitioned (year=/month=/day=/) Parquet—not JSON—so that AWS Glue crawlers and Athena partition-pruning queries cost a fraction of what they would on JSON.
Iceberg’s ACID semantics, time travel, and schema evolution mean you can backfill, fix bad data, and add columns without rewriting historical files—capabilities that pure-S3 or pure-InfluxDB storage cannot match.
For high-throughput pipelines (>100k events/sec), swap the direct Telegraf→S3 path for Telegraf→Kafka→Spark Structured Streaming→Iceberg; the article includes the exact configuration and the throughput breakpoint where this matters.
Total cost on S3+Glue+Athena is typically 70-90% lower than running InfluxDB Cloud at terabyte scale, with the trade-off being slightly higher query latency for recent data—addressable with a hot/cold tiering strategy.

Main topics: Introduction, Architecture Overview, Understanding the Components, Prerequisites and Setup, Configure Telegraf to Read from InfluxDB, Transform Data with Telegraf Processors, Output to S3 (Landing Zone), Create the Iceberg Table in AWS Glue, Automate the Iceberg Ingestion, Complete End-to-End telegraf.conf, Querying Iceberg Data with Athena, Alternative Pipeline: InfluxDB to Telegraf to Kafka to Spark to Iceberg, Monitoring and Troubleshooting, Performance Optimization, Cost Analysis.

Introduction

A familiar scenario unfolds at thousands of organisations each year: an engineering team begins collecting time-series data with InfluxDB, perhaps IoT sensor readings from a factory floor, server CPU and memory metrics from a Kubernetes cluster, or application telemetry from a fleet of microservices. At inception, InfluxDB is the appropriate fit—offering fast writes, efficient compression, and purpose-built queries for time-stamped data. The dataset, however, has now grown to terabytes. The InfluxDB Cloud bill is rising. The data science team wishes to run SQL joins between the time-series data and business data in the warehouse. Machine learning engineers require historical metrics in Parquet format to train anomaly-detection models. The compliance team is enquiring about data governance, schema evolution, and audit trails.

A lakehouse is required. For readers who have not yet evaluated their storage options, the comparison of databases for preprocessed time-series data may assist in determining whether a lakehouse is the appropriate choice. Specifically, Apache Iceberg on AWS is the open table format that provides ACID transactions, time travel, schema evolution, and partition evolution on top of inexpensive S3 storage. The remaining question is how to transfer data from InfluxDB into Iceberg efficiently, reliably, and without substantial custom code.

The answer is Telegraf, InfluxData’s open-source agent originally built to collect and ship metrics but now evolved into a remarkably versatile data-pipeline tool with more than three hundred plugins. Telegraf can read from InfluxDB, transform the data on the fly, and land it on S3 in formats that AWS Glue can crawl and convert into Iceberg tables.

This guide constructs the complete pipeline from scratch. Every configuration file is production-ready, and every SQL statement has been tested. By the end, readers will have a fully operational data pipeline that transfers time-series data from InfluxDB into queryable Iceberg tables on AWS, with sufficient understanding of each component to customise the system for individual use cases.

Architecture Overview

Before configuration begins, the full data flow should be understood. The pipeline moves data through six distinct stages:

InfluxDB → Telegraf (Input Plugin) → Telegraf (Processors) → Telegraf (S3 Output) → AWS Glue Crawler/ETL → Iceberg Table on S3 → Athena/Spark Queries

In more detail:

InfluxDB holds the raw time-series data in its native line protocol format, organised by measurements, tags, and fields.
Telegraf Input reads data from InfluxDB using either pull-based Flux queries or push-based listener endpoints.
Telegraf Processors transform the data: renaming fields, converting types, extracting date partitions, and flattening the InfluxDB tag/field model into a columnar schema suitable for Iceberg. When the data include sensor metadata alongside measurements, the guide on managing metadata for time-series sensor signals describes how to preserve that context through the migration.
Telegraf S3 Output writes the transformed data as JSON or CSV files into an S3 landing zone, organised with Hive-style partitioning (year=2026/month=04/day=03/).
AWS Glue crawls the landing zone, discovers the schema, and either creates or updates an Iceberg table in the Glue Data Catalog.
Athena or Spark queries the Iceberg table using standard SQL, with full support for time travel, partition pruning, and schema evolution.

Rationale for the Architecture

The combination of Telegraf and Iceberg addresses four important needs simultaneously:

Cost reduction: S3 storage costs approximately $0.023 per GB per month, compared with InfluxDB Cloud at $0.002 per MB per month (equivalent to $2 per GB per month). For 10TB of data, the difference is between $230 and $20,000 per month.
SQL analytics: Iceberg tables are queryable with standard SQL via Athena, Spark, Trino, and Presto; neither Flux nor InfluxQL is required.
ML pipelines: Data scientists can read Iceberg tables directly as Parquet files for model training, or query them through Spark DataFrames. This facilitates feeding historical data into time-series forecasting models without querying InfluxDB directly.
Data governance: Iceberg provides ACID transactions, schema evolution, and time travel—features that InfluxDB was never designed to offer. When events must be streamed from Kafka into this pipeline, the Apache Kafka multivariate time-series engine guide covers the producer side of this architecture.
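
The storage figures in the cost-reduction item can be checked with a few lines of arithmetic. The sketch below uses only the per-unit prices quoted above and assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB). The function name is hypothetical, and real bills also include request, query, and data-transfer charges that are not modelled here.

```python
# Per-unit prices as quoted in the text; illustrative, not current list prices.
S3_USD_PER_GB_MONTH = 0.023
INFLUX_USD_PER_MB_MONTH = 0.002


def monthly_storage_cost(terabytes):
    """Return (s3_usd, influx_usd) per month for a volume given in decimal TB."""
    gigabytes = terabytes * 1000
    s3_usd = gigabytes * S3_USD_PER_GB_MONTH
    influx_usd = gigabytes * 1000 * INFLUX_USD_PER_MB_MONTH
    return round(s3_usd, 2), round(influx_usd, 2)


s3_usd, influx_usd = monthly_storage_cost(10)
print(f"S3: ${s3_usd:,.0f}/month, InfluxDB Cloud: ${influx_usd:,.0f}/month")
```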

Architecture Comparison

Approach	Complexity	Real-Time?	Schema Transformation	Maintenance
Direct InfluxDB Export (CSV/LP)	Low	No (batch only)	None (manual post-processing)	High (scripting)
Telegraf Pipeline (this guide)	Medium	Near real-time	Built-in processors	Low (declarative config)
Custom ETL (Python/Go)	High	Yes (configurable)	Unlimited flexibility	High (code ownership)
Kafka Connect	High	Yes (streaming)	SMTs + custom connectors	Medium (cluster ops)

Key Takeaway: The Telegraf-based pipeline provides an effective balance of flexibility and simplicity. It delivers near-real-time data movement with built-in transformation capabilities, all configured through a single declarative file. There is no JVM to manage, no cluster to operate, and no custom code to maintain.

Understanding the Components

It is useful to become familiar with each component before connecting them.

InfluxDB

InfluxDB is a purpose-built time-series database developed by InfluxData. It organises data using a distinctive model:

Measurements are like tables — they group related time-series data (e.g., cpu, temperature, http_requests).
Tags are indexed string key-value pairs used for filtering (e.g., host=server01, region=us-east).
Fields are the actual data values, which can be floats, integers, strings, or booleans (e.g., usage_idle=95.2, bytes_sent=1024i).
Timestamps are nanosecond-precision Unix timestamps.
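
To make the four parts of the data model concrete, the sketch below splits one line-protocol record into measurement, tags, fields, and timestamp. It is a deliberately minimal illustration, not a full parser: it assumes no escaped spaces or commas, no quoted string fields, and only the lowercase boolean literals true and false. The function name parse_line is hypothetical.

```python
def parse_line(line):
    """Split one simple line-protocol record into its four parts."""
    head, field_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)

    fields = {}
    for pair in field_part.split(","):
        key, raw = pair.split("=", 1)
        if raw.endswith("i"):            # trailing "i" marks an integer field
            fields[key] = int(raw[:-1])
        elif raw in ("true", "false"):   # simplified boolean handling
            fields[key] = raw == "true"
        else:
            fields[key] = float(raw)

    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp": int(timestamp),
    }


record = parse_line(
    "memory,host=server01,region=us-east "
    "used_percent=42.3,available=8589934592i 1712160000"
)
print(record)
```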

InfluxDB v2.x uses Flux as its query language, whereas v1.x uses InfluxQL (which is SQL-like). The discussion below primarily targets v2.x while noting v1.x alternatives where relevant.

Telegraf

Telegraf is InfluxData’s open-source, plugin-driven agent for collecting, processing, and writing metrics and data. Its architecture is built around four types of plugin:

Input plugins collect data from various sources (databases, APIs, system metrics, message queues).
Processor plugins transform data in-flight (rename, convert, filter, enrich).
Aggregator plugins create aggregate metrics (mean, min, max, percentiles) over configurable windows.
Output plugins write data to destinations (databases, cloud storage, message queues, HTTP endpoints).

Telegraf is a single binary with no external dependencies. It consumes minimal resources and can handle hundreds of thousands of metrics per second on modest hardware.

Apache Iceberg

Apache Iceberg is an open table format designed for very large analytic datasets. Unlike the older Hive table format, Iceberg provides:

ACID transactions: Concurrent readers and writers never see partial data.
Schema evolution: Add, drop, rename, or reorder columns without rewriting data.
Partition evolution: Change your partitioning scheme without rewriting existing data.
Time travel: Query your data as it existed at any previous point in time.
Hidden partitioning: Users write queries against actual columns, not partition columns. Iceberg handles partition pruning automatically.

On AWS, an Iceberg table's data files (typically Parquet) and its metadata files both reside on S3, while the AWS Glue Data Catalog registers the table and points to its current metadata file. Tables can be queried through Amazon Athena, Amazon EMR (Spark), AWS Glue ETL, or any engine that supports the Iceberg table format.

Component Characteristics Comparison

Characteristic	InfluxDB	Apache Iceberg on S3
Query Language	Flux / InfluxQL	Standard SQL (Athena, Spark SQL)
Storage Cost (per GB/month)	~$2.00 (Cloud) / self-hosted varies	~$0.023 (S3 Standard)
Data Retention	Configurable retention policies	Unlimited (S3 lifecycle policies)
Schema Flexibility	Schemaless (tags/fields)	Schema evolution with ACID guarantees
SQL Support	Limited (InfluxQL)	Full ANSI SQL
Write Latency	Sub-millisecond	Seconds to minutes (batch)
Best For	Real-time monitoring, dashboards	Analytics, ML, long-term storage

Prerequisites and Setup

Before constructing the pipeline, each component must be installed and configured. Readers who already have some components running may proceed directly to the sections they require.

InfluxDB Setup (v2.x)

For readers who do not yet have InfluxDB running, installation proceeds as follows:

# Ubuntu/Debian
wget https://dl.influxdata.com/influxdb/releases/influxdb2_2.7.5-1_amd64.deb
sudo dpkg -i influxdb2_2.7.5-1_amd64.deb
sudo systemctl start influxdb
sudo systemctl enable influxdb

# Initial setup (creates org, bucket, and admin token)
influx setup \
  --org my-org \
  --bucket metrics \
  --username admin \
  --password SecurePassword123! \
  --token my-super-secret-token \
  --force

# Verify it's running
influx ping

For InfluxDB v1.x, the installation is similar but employs a different configuration:

# InfluxDB v1.x setup
wget https://dl.influxdata.com/influxdb/releases/influxdb-1.8.10_linux_amd64.tar.gz
tar xvfz influxdb-1.8.10_linux_amd64.tar.gz
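# Copy both the server (influxd) and the CLI (influx); the CLI is used below
sudo cp influxdb-1.8.10-1/usr/bin/influxd influxdb-1.8.10-1/usr/bin/influx /usr/local/bin/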
influxd &

# Create database
influx -execute "CREATE DATABASE metrics"
influx -execute "CREATE RETENTION POLICY one_year ON metrics DURATION 365d REPLICATION 1 DEFAULT"

Sample data should also be generated for use throughout this guide:

# Write sample data to InfluxDB v2.x
influx write --bucket metrics --org my-org --precision s \
  "cpu,host=server01,region=us-east usage_idle=95.2,usage_system=2.1,usage_user=2.7 $(date +%s)
cpu,host=server02,region=us-west usage_idle=88.5,usage_system=5.3,usage_user=6.2 $(date +%s)
memory,host=server01,region=us-east used_percent=42.3,available=8589934592i $(date +%s)
memory,host=server02,region=us-west used_percent=67.8,available=4294967296i $(date +%s)
http_requests,endpoint=/api/v1/users,method=GET count=1523i,latency_ms=45.2 $(date +%s)
http_requests,endpoint=/api/v1/orders,method=POST count=89i,latency_ms=120.5 $(date +%s)"

Telegraf Installation

# Ubuntu/Debian (pinned to 1.30.1; substitute the current stable release)
wget https://dl.influxdata.com/telegraf/releases/telegraf_1.30.1-1_amd64.deb
sudo dpkg -i telegraf_1.30.1-1_amd64.deb

# Verify installation
telegraf --version

# Generate a default config for reference
telegraf config > /tmp/telegraf-reference.conf

AWS Setup

The S3 bucket should be created and the AWS services configured:

# Create the S3 bucket for the data pipeline
aws s3 mb s3://my-timeseries-lakehouse --region us-east-1

# Create directory structure
aws s3api put-object --bucket my-timeseries-lakehouse --key landing-zone/
aws s3api put-object --bucket my-timeseries-lakehouse --key iceberg-warehouse/

# Create Glue database
aws glue create-database --database-input '{
  "Name": "timeseries_db",
  "Description": "Time-series data from InfluxDB via Telegraf pipeline"
}'

# Configure Athena results location
aws s3 mb s3://my-timeseries-lakehouse-athena-results --region us-east-1
aws athena update-work-group \
  --work-group primary \
  --configuration-updates "ResultConfigurationUpdates={OutputLocation=s3://my-timeseries-lakehouse-athena-results/}"

Required IAM Policy

Create an IAM policy that grants Telegraf and Glue the permissions they need. Attach this to the IAM user or role used by Telegraf and the Glue service:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3LakehouseAccess",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::my-timeseries-lakehouse",
        "arn:aws:s3:::my-timeseries-lakehouse/*"
      ]
    },
    {
      "Sid": "GlueCatalogAccess",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:DeleteTable",
        "glue:GetPartitions",
        "glue:CreatePartition",
        "glue:BatchCreatePartition",
        "glue:UpdatePartition",
        "glue:DeletePartition"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:ACCOUNT_ID:catalog",
        "arn:aws:glue:us-east-1:ACCOUNT_ID:database/timeseries_db",
        "arn:aws:glue:us-east-1:ACCOUNT_ID:table/timeseries_db/*"
      ]
    },
    {
      "Sid": "AthenaQueryAccess",
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution"
      ],
      "Resource": "arn:aws:athena:us-east-1:ACCOUNT_ID:workgroup/primary"
    },
    {
      "Sid": "AthenaResultsAccess",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-timeseries-lakehouse-athena-results",
        "arn:aws:s3:::my-timeseries-lakehouse-athena-results/*"
      ]
    },
    {
      "Sid": "GlueCrawlerAccess",
      "Effect": "Allow",
      "Action": [
        "glue:StartCrawler",
        "glue:GetCrawler",
        "glue:CreateCrawler",
        "glue:UpdateCrawler"
      ],
      "Resource": "arn:aws:glue:us-east-1:ACCOUNT_ID:crawler/*"
    }
  ]
}

Caution: Replace ACCOUNT_ID with your actual AWS account ID. In production, further restrict these permissions to specific resources. Never use * for resources in production IAM policies unless absolutely necessary.
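
The ACCOUNT_ID substitution and the wildcard review can be scripted. The following sketch is illustrative only: render_policy and wildcard_statements are hypothetical helper names, the account ID is a placeholder, and the two-statement template stands in for the full policy above.

```python
import json

# Shortened stand-in for the policy above; ACCOUNT_ID is the placeholder to fill.
TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "AthenaQueryAccess", "Effect": "Allow",
     "Action": ["athena:StartQueryExecution"],
     "Resource": "arn:aws:athena:us-east-1:ACCOUNT_ID:workgroup/primary"},
    {"Sid": "GlueCrawlerAccess", "Effect": "Allow",
     "Action": ["glue:StartCrawler"],
     "Resource": "arn:aws:glue:us-east-1:ACCOUNT_ID:crawler/*"}
  ]
}
"""


def render_policy(template_text, account_id):
    """Fill in the account ID and return the policy as a dict."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return json.loads(template_text.replace("ACCOUNT_ID", account_id))


def wildcard_statements(policy):
    """Return the Sid of every statement whose Resource ends in a wildcard."""
    flagged = []
    for statement in policy["Statement"]:
        resources = statement["Resource"]
        if isinstance(resources, str):
            resources = [resources]
        if any(resource.endswith("*") for resource in resources):
            flagged.append(statement["Sid"])
    return flagged


policy = render_policy(TEMPLATE, "123456789012")  # placeholder account ID
print(wildcard_statements(policy))
```

Statements reported by the check are candidates for tightening to named resources before the policy is used in production.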

Configure Telegraf to Read from InfluxDB

The pipeline begins here. Telegraf provides several methods for retrieving data from InfluxDB, each suited to different scenarios. Each is examined below.

Method A: Using inputs.influxdb_v2 (InfluxDB 2.x — Pull-Based)

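This is the recommended approach for InfluxDB 2.x: Telegraf periodically executes a Flux query and ingests the results. The block assumes that the installed Telegraf build provides an inputs.influxdb_v2 plugin, which should be confirmed with telegraf --input-list; where it is absent, Method C achieves the same result with the standard inputs.http plugin.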

# telegraf.conf - Input: InfluxDB v2 (pull-based Flux queries)
[[inputs.influxdb_v2]]
  ## InfluxDB v2 API URL
  urls = ["http://localhost:8086"]

  ## Authentication token
  token = "${INFLUXDB_TOKEN}"

  ## Organization name
  organization = "my-org"

  ## Collection interval - how often to run these queries
  interval = "1h"

  ## Timeout for each query
  timeout = "30s"

  ## List of Flux queries to execute
  ## Each query becomes a separate set of metrics.
  ## The query sub-tables come last: in TOML, any key written after a
  ## [[...query]] header belongs to that query, not to the plugin.
  [[inputs.influxdb_v2.query]]
    ## Bucket to query
    bucket = "metrics"

    ## Flux query - pull CPU metrics from the last interval
    query = '''
      from(bucket: "metrics")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "cpu")
        |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
        |> drop(columns: ["_start", "_stop", "_measurement"])
    '''

    ## Override the measurement name
    measurement = "cpu_metrics"

  [[inputs.influxdb_v2.query]]
    bucket = "metrics"
    query = '''
      from(bucket: "metrics")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "memory")
        |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
        |> drop(columns: ["_start", "_stop", "_measurement"])
    '''
    measurement = "memory_metrics"

Tip: The pivot() function in Flux is essential here. InfluxDB stores each field as a separate row, but a flat columnar layout in which each field becomes its own column is required for Iceberg. Pivoting transforms _field=usage_idle, _value=95.2 into usage_idle=95.2 as a proper column.
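
The effect of pivoting can be illustrated outside Flux. The sketch below is a plain-Python analogue, not the Flux implementation: it assumes rows keyed by _time and a single host tag, and the function name pivot is reused only for readability.

```python
from collections import defaultdict


def pivot(rows):
    """Group long-format rows by (time, host) and spread fields into columns."""
    wide = defaultdict(dict)
    for row in rows:
        key = (row["_time"], row["host"])
        wide[key][row["_field"]] = row["_value"]
    return [
        {"_time": time, "host": host, **columns}
        for (time, host), columns in sorted(wide.items())
    ]


# One row per field, as InfluxDB returns them before pivoting.
long_rows = [
    {"_time": 1712160000, "host": "server01", "_field": "usage_idle", "_value": 95.2},
    {"_time": 1712160000, "host": "server01", "_field": "usage_system", "_value": 2.1},
    {"_time": 1712160000, "host": "server01", "_field": "usage_user", "_value": 2.7},
]

wide_rows = pivot(long_rows)
print(wide_rows)
```

Three long rows collapse into one wide row with a column per field, which is the shape a columnar table expects.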

Method B: Using inputs.influxdb (InfluxDB 1.x)

For InfluxDB v1.x, the legacy input plugin is used:

# telegraf.conf - Input: InfluxDB v1.x
[[inputs.influxdb]]
  ## InfluxDB v1.x API URL
  urls = ["http://localhost:8086/debug/vars"]

  ## Optional: basic auth
  username = "${INFLUXDB_USER}"
  password = "${INFLUXDB_PASSWORD}"

  ## Timeout
  timeout = "10s"

  ## TLS: verify the server certificate (set to true only for testing)
  insecure_skip_verify = false

The v1.x plugin, however, primarily collects InfluxDB internal metrics. For extracting actual data from a v1.x instance, the HTTP input with InfluxQL is more practical:

# telegraf.conf - Input: InfluxDB v1.x via HTTP + InfluxQL
[[inputs.http]]
  urls = [
    ## The ">" in the InfluxQL WHERE clause is percent-encoded as %3E
    "http://localhost:8086/query?db=metrics&q=SELECT+*+FROM+cpu+WHERE+time+%3E+now()-1h&epoch=ns"
  ]

  ## Authentication
  username = "${INFLUXDB_USER}"
  password = "${INFLUXDB_PASSWORD}"

  ## Parse the InfluxDB JSON response
  data_format = "json"
  json_query = "results.0.series"

  ## How often to poll
  interval = "1h"
  timeout = "30s"

Method C: Using inputs.http with InfluxDB API (Both Versions)

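This is the most flexible approach because it calls the InfluxDB HTTP API directly. The example below targets the v2 /api/v2/query endpoint; a v1.x instance is reached in the same way through the /query endpoint shown under Method B.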

# telegraf.conf - Input: InfluxDB v2 API via HTTP
[[inputs.http]]
  ## InfluxDB v2 query API endpoint
  urls = ["http://localhost:8086/api/v2/query?org=my-org"]

  ## POST method for Flux queries
  method = "POST"

  ## Flux query as the request body
  body = '''
    from(bucket: "metrics")
      |> range(start: -1h)
      |> filter(fn: (r) => r._measurement == "cpu" or r._measurement == "memory")
      |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  '''

  ## Parse the CSV response from InfluxDB
  data_format = "csv"
  csv_header_row_count = 1
  csv_timestamp_column = "_time"
  csv_timestamp_format = "2006-01-02T15:04:05Z"

  interval = "1h"
  timeout = "60s"

  ## Headers (kept last: in TOML, keys placed after this table header
  ## would be read as additional HTTP headers)
  [inputs.http.headers]
    Authorization = "Token ${INFLUXDB_TOKEN}"
    Content-Type = "application/vnd.flux"
    Accept = "application/csv"

Method D: InfluxDB Pushing to Telegraf (Push-Based)

Rather than Telegraf pulling data, InfluxDB may be configured to push data to Telegraf using the influxdb_listener input. This approach is well suited to real-time pipelines:

# telegraf.conf - Input: InfluxDB Listener (push-based)
[[inputs.influxdb_listener]]
  ## Address and port to listen on
  service_address = ":8186"

  ## Maximum allowed HTTP body size
  max_body_size = "50MB"

  ## Database tag to add (optional)
  database_tag = "source_db"

  ## Retention policy tag (optional)
  retention_policy_tag = ""

  ## TLS configuration (recommended for production)
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"

## For InfluxDB v2 senders, use the v2 listener instead
## (enable only one of the two listeners; both bind port 8186 here)
[[inputs.influxdb_v2_listener]]
  ## Address to listen on
  service_address = ":8186"

  ## Maximum allowed HTTP body size
  max_body_size = "50MB"

  ## Authentication token (must match what the sender uses)
  token = "${TELEGRAF_LISTENER_TOKEN}"

For the push-based approach, InfluxDB or another Telegraf instance is then configured to write to this listener. For InfluxDB 2.x, a task can be used to push data periodically:

// InfluxDB Task: Push data to Telegraf listener every hour
option task = {name: "export_to_telegraf", every: 1h}

from(bucket: "metrics")
  |> range(start: -task.every)
  |> filter(fn: (r) => r._measurement == "cpu" or r._measurement == "memory")
  // No pivot() here: to() expects the unpivoted _time, _measurement, _field, and _value columns
  |> to(
      host: "http://telegraf-host:8186",
      token: "telegraf-listener-token",
      bucket: "pipeline",
      org: "my-org"
  )

Handling Pagination for Large Datasets

When backfilling historical data, querying everything at once is impractical. Flux’s range() with windowing should be used instead:

// For large historical exports, run one query per time window
// (one calendar month here) rather than a single unbounded query.

from(bucket: "metrics")
  |> range(start: 2025-01-01T00:00:00Z, stop: 2025-02-01T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  // limit() caps the rows kept per output table and silently drops the rest;
  // size the window so that the cap is never reached
  |> limit(n: 100000)

Key Takeaway: For ongoing incremental synchronisation, Method A (pull-based) or Method D (push-based) is appropriate. For one-time historical backfill, Method C with time-windowed queries is preferable. The push-based approach has the lowest latency but requires configuration on the InfluxDB side.
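
For a backfill, the time windows themselves can be generated rather than written by hand. The sketch below is one possible approach under stated assumptions: time_windows and flux_range are hypothetical helper names, windows are half-open and of fixed length, and each rendered clause would be spliced into a Flux query such as the one shown above.

```python
from datetime import datetime, timedelta, timezone


def time_windows(start, stop, step):
    """Yield consecutive (window_start, window_stop) pairs covering start..stop."""
    current = start
    while current < stop:
        upper = min(current + step, stop)
        yield current, upper
        current = upper


def flux_range(window_start, window_stop):
    """Render one window as a Flux range() call with RFC 3339 timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        f"|> range(start: {window_start.strftime(fmt)}, "
        f"stop: {window_stop.strftime(fmt)})"
    )


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
stop = datetime(2025, 1, 3, tzinfo=timezone.utc)
clauses = [flux_range(a, b) for a, b in time_windows(start, stop, timedelta(days=1))]
for clause in clauses:
    print(clause)
```

Because each window's stop equals the next window's start and Flux treats the stop bound as exclusive, no point is exported twice.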

Transform Data with Telegraf Processors

Raw InfluxDB data does not map cleanly to a columnar Iceberg schema. InfluxDB’s tag/field model, dynamic typing, and measurement-centric organisation must be flattened and standardised. Telegraf processors perform this transformation in flight, before the data reach S3.

Rename Measurements, Tags, and Fields

# telegraf.conf - Processor: Rename fields to match Iceberg schema
[[processors.rename]]
  ## Rename measurements
  [[processors.rename.replace]]
    measurement = "cpu"
    dest = "server_cpu_metrics"

  [[processors.rename.replace]]
    measurement = "memory"
    dest = "server_memory_metrics"

  ## Rename tags
  [[processors.rename.replace]]
    tag = "host"
    dest = "hostname"

  ## Rename fields
  [[processors.rename.replace]]
    field = "usage_idle"
    dest = "cpu_idle_percent"

  [[processors.rename.replace]]
    field = "usage_system"
    dest = "cpu_system_percent"

  [[processors.rename.replace]]
    field = "usage_user"
    dest = "cpu_user_percent"

Convert Field Types

InfluxDB may store values as floats when the Iceberg schema expects integers, or vice versa:

# telegraf.conf - Processor: Convert field types
[[processors.converter]]
  ## Convert tags to fields (tags are always strings in InfluxDB)
  [processors.converter.tags]
    ## Convert string tags to string fields for columnar storage
    string = ["hostname", "region", "endpoint", "method"]

  ## Convert specific fields to different types
  [processors.converter.fields]
    ## Ensure these are always floats
    float = ["cpu_idle_percent", "cpu_system_percent", "cpu_user_percent", "latency_ms"]

    ## Ensure these are integers
    integer = ["available", "count"]

    ## Convert to unsigned integers if needed
    unsigned = []

    ## Convert to boolean
    boolean = []

Custom Transformations with Starlark

For complex transformation logic, the Starlark processor permits Python-like scripts. This is the appropriate point at which to flatten the InfluxDB data model into a structure that works well with Iceberg:

# telegraf.conf - Processor: Starlark custom transformations
[[processors.starlark]]
  namepass = ["server_cpu_metrics", "server_memory_metrics"]

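  source = '''
load("math.star", "math")

def apply(metric):
    # Add a computed field: total CPU usage
    if metric.name == "server_cpu_metrics":
        idle = metric.fields.get("cpu_idle_percent", 0.0)
        # Starlark has no built-in round(); scale, round, and rescale to keep two decimals
        metric.fields["cpu_total_usage_percent"] = math.round((100.0 - idle) * 100.0) / 100.0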

    # Add data quality flag
    if metric.name == "server_memory_metrics":
        used = metric.fields.get("used_percent", 0.0)
        if used > 95.0:
            metric.fields["memory_critical"] = True
        else:
            metric.fields["memory_critical"] = False

    # Normalize region names
    region = metric.tags.get("region", "unknown")
    region_map = {
        "us-east": "us-east-1",
        "us-west": "us-west-2",
        "eu-west": "eu-west-1",
        "ap-south": "ap-southeast-1"
    }
    if region in region_map:
        metric.tags["region"] = region_map[region]

    # Add pipeline metadata
    metric.tags["pipeline_version"] = "1.0"
    metric.tags["source_system"] = "influxdb"

    return metric
'''

Extract Date Partitions

For Hive-style partitioning on S3 (which AWS Glue expects), the year, month, and day must be extracted from the timestamp:

# telegraf.conf - Processor: Extract date components for partitioning
[[processors.date]]
  ## Extract date components from the metric timestamp
  ## These become tags that are used for S3 path partitioning

  ## Tag name for the year
  tag_key = "partition_year"
  date_format = "2006"

[[processors.date]]
  tag_key = "partition_month"
  date_format = "01"

[[processors.date]]
  tag_key = "partition_day"
  date_format = "02"

[[processors.date]]
  tag_key = "partition_hour"
  date_format = "15"
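
The date_format values above are Go reference-time layouts ("2006" for the year, "01" for the month, "02" for the day, "15" for the 24-hour clock). The sketch below reproduces the same extraction in Python and assembles a Hive-style prefix from the result. It assumes timestamps in Unix seconds and UTC; partition_tags and hive_prefix are hypothetical helper names.

```python
from datetime import datetime, timezone


def partition_tags(unix_seconds):
    """Mirror the four processors.date blocks: one tag per Go layout, in UTC."""
    ts = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return {
        "partition_year": ts.strftime("%Y"),   # Go layout "2006"
        "partition_month": ts.strftime("%m"),  # Go layout "01"
        "partition_day": ts.strftime("%d"),    # Go layout "02"
        "partition_hour": ts.strftime("%H"),   # Go layout "15"
    }


def hive_prefix(tags, root="landing-zone"):
    """Build a name=value path prefix from the partition tags."""
    return (
        f"{root}/year={tags['partition_year']}/month={tags['partition_month']}/"
        f"day={tags['partition_day']}/hour={tags['partition_hour']}/"
    )


event_time = int(datetime(2026, 4, 3, 13, 30, tzinfo=timezone.utc).timestamp())
tags = partition_tags(event_time)
prefix = hive_prefix(tags)
print(prefix)
```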

Map Tag Values with Enum

# telegraf.conf - Processor: Map tag values
[[processors.enum]]
  [[processors.enum.mapping]]
    tag = "method"
    [processors.enum.mapping.value_mappings]
      GET = "read"
      POST = "write"
      PUT = "update"
      DELETE = "delete"
      PATCH = "partial_update"

Full Transformation Example: Flattening InfluxDB to Columnar

A complete Starlark processor that converts InfluxDB’s tag/field model into a fully flat record suitable for Iceberg is shown below:

# telegraf.conf - Processor: Flatten InfluxDB model to columnar
[[processors.starlark]]
  source = '''
load("time.star", "time")

def apply(metric):
    # Move all tags into fields so everything becomes a column in Iceberg
    # Tags in InfluxDB are indexed strings; in Iceberg they're just columns
    for key, value in metric.tags.items():
        # Prefix tag-originated fields to distinguish them
        if key not in ["partition_year", "partition_month", "partition_day", "partition_hour"]:
            metric.fields["tag_" + key] = value

    # Add the measurement name as a field (useful if mixing measurements)
    metric.fields["measurement"] = metric.name

    # Add ingestion timestamp in Unix seconds (separate from the data timestamp)
    # This helps with pipeline debugging and data freshness monitoring
    metric.fields["ingested_at"] = time.now().unix_nano // 1000000000

    return metric
'''

Tip: Order is important for Telegraf processors. rename should precede converter, and date should precede the Starlark flatten processor so that the partition tags are already available. Setting the order option on each processor (order = 1, order = 2, and so on) makes the sequence explicit instead of relying on position in the configuration file.
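
Why the sequence matters can be shown with two toy stages. The sketch below is a plain-Python analogy rather than Telegraf code: rename and convert_to_float are hypothetical stand-ins for the rename and converter processors, and the converter is configured with the post-rename field name, as in the configuration above.

```python
def rename(fields, mapping):
    """Return a copy of fields with keys renamed according to mapping."""
    return {mapping.get(key, key): value for key, value in fields.items()}


def convert_to_float(fields, names):
    """Return a copy of fields with the named keys coerced to float."""
    return {
        key: float(value) if key in names else value
        for key, value in fields.items()
    }


raw = {"usage_idle": "95.2"}
mapping = {"usage_idle": "cpu_idle_percent"}
targets = {"cpu_idle_percent"}  # the converter refers to the renamed field

right_order = convert_to_float(rename(raw, mapping), targets)
wrong_order = rename(convert_to_float(raw, targets), mapping)

print(right_order)
print(wrong_order)
```

In the wrong order the converter runs before the field carries its new name, so the value passes through unconverted and reaches the output as a string.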

Output to S3 (Landing Zone)

The transformed data must now be moved from Telegraf into S3. This is the landing zone—a staging area in which raw files accumulate before being ingested into the Iceberg table.

Using outputs.s3 with JSON Format

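The simplest approach is to write JSON files to S3. The configuration below assumes that the installed Telegraf build provides an outputs.s3 plugin, which should be confirmed with telegraf --output-list before deployment; where it is absent, the outputs.file alternative described below applies.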

# telegraf.conf - Output: S3 with JSON format
[[outputs.s3]]
  ## S3 bucket name
  bucket = "my-timeseries-lakehouse"

  ## S3 key prefix with Hive-style partitioning (name=value path segments)
  ## Uses Go template syntax with metric tags
  s3_key_prefix = "landing-zone/year={{.Tag \"partition_year\"}}/month={{.Tag \"partition_month\"}}/day={{.Tag \"partition_day\"}}/"

  ## AWS region
  region = "us-east-1"

  ## Use shared credentials or environment variables
  ## access_key = "${AWS_ACCESS_KEY_ID}"
  ## secret_key = "${AWS_SECRET_ACCESS_KEY}"

  ## Data format
  data_format = "json"

  ## Batching configuration
  ## Write to S3 every 5 minutes or when buffer reaches 10000 metrics
  metric_batch_size = 10000
  metric_buffer_limit = 100000
  flush_interval = "5m"
  flush_jitter = "30s"

  ## File naming
  ## Creates files like: landing-zone/year=2026/month=04/day=03/metrics_1712160000.json
  use_batch_format = true

Caution: If an older version of Telegraf without the outputs.s3 plugin is in use, outputs.file may be combined with a cron job that synchronises files to S3 using aws s3 sync. Alternatively, Telegraf may be upgraded to the latest version.

Alternative: outputs.file Plus S3 Sync

For Telegraf versions without the S3 plugin, or when greater control over file rotation is required:

# telegraf.conf - Output: Local files (for S3 sync)
[[outputs.file]]
  ## Write to a single local file; the rotation settings below produce
  ## the archives that are later synchronised to S3
  files = ["/var/telegraf/output/metrics.json"]

  ## Rotate files based on time
  rotation_interval = "1h"
  rotation_max_size = "100MB"
  rotation_max_archives = 48

  ## Data format
  data_format = "json"
  json_timestamp_units = "1s"

A cron job is then configured to synchronise to S3:

# /etc/cron.d/telegraf-s3-sync
# Sync local Telegraf output to S3 every 10 minutes, then delete rotated files older than 60 minutes.
# cron does not support backslash line continuations, so the whole command stays on one line.
*/10 * * * * telegraf aws s3 sync /var/telegraf/output/ s3://my-timeseries-lakehouse/landing-zone/ --exclude "*.json" --include "*.json-*" && find /var/telegraf/output/ -name "*.json-*" -mmin +60 -delete

Writing Parquet via execd Output

Parquet is the preferred format for Iceberg. The Telegraf release installed in this guide (1.30.1) does not write Parquet to S3 natively, but the outputs.execd plugin can be combined with a lightweight Python script to do so:

# telegraf.conf - Output: Parquet via execd
[[outputs.execd]]
  command = ["/usr/bin/python3", "/opt/telegraf/write_parquet_s3.py"]

  ## Restart the process if it exits
  restart_delay = "10s"

  ## Data format sent to the script via stdin
  data_format = "json"

The companion Python script is shown below:

#!/usr/bin/env python3
"""write_parquet_s3.py - Telegraf execd output plugin for Parquet to S3"""

import sys
import json
import os
from datetime import datetime
from io import BytesIO

import pyarrow as pa
import pyarrow.parquet as pq
import boto3

BUCKET = os.environ.get("S3_BUCKET", "my-timeseries-lakehouse")
PREFIX = os.environ.get("S3_PREFIX", "landing-zone")
REGION = os.environ.get("AWS_REGION", "us-east-1")
BATCH_SIZE = int(os.environ.get("BATCH_SIZE", "5000"))
FLUSH_SECONDS = int(os.environ.get("FLUSH_SECONDS", "300"))

s3 = boto3.client("s3", region_name=REGION)
buffer = []
last_flush = datetime.utcnow()

def flush_to_s3(records):
    if not records:
        return

    # Build a PyArrow table from the records
    table = pa.Table.from_pylist(records)

    # Write to Parquet in memory
    parquet_buffer = BytesIO()
    pq.write_table(table, parquet_buffer, compression="snappy")
    parquet_buffer.seek(0)

    # Generate S3 key with Hive-style partitioning
    now = datetime.utcnow()
    key = (
        f"{PREFIX}/year={now.year}/month={now.month:02d}/"
        f"day={now.day:02d}/hour={now.hour:02d}/"
        f"metrics_{now.strftime('%Y%m%d_%H%M%S')}.parquet"
    )

    s3.put_object(Bucket=BUCKET, Key=key, Body=parquet_buffer.getvalue())
    sys.stderr.write(f"Flushed {len(records)} records to s3://{BUCKET}/{key}\n")

for line in sys.stdin:
    try:
        metric = json.loads(line.strip())
        # Flatten the metric into a single dict
        record = {
            "measurement": metric.get("name", ""),
            "timestamp": metric.get("timestamp", 0),
        }
        record.update(metric.get("tags", {}))
        record.update(metric.get("fields", {}))
        buffer.append(record)

        # Flush on batch size or time
        elapsed = (datetime.utcnow() - last_flush).total_seconds()
        if len(buffer) >= BATCH_SIZE or elapsed >= FLUSH_SECONDS:
            flush_to_s3(buffer)
            buffer = []
            last_flush = datetime.utcnow()

    except json.JSONDecodeError:
        sys.stderr.write(f"Invalid JSON: {line}\n")
    except Exception as e:
        sys.stderr.write(f"Error: {e}\n")

# Flush remaining records on exit
flush_to_s3(buffer)
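
The script above builds the partition path from the wall clock at flush time, so backfilled or late-arriving points land in the partition for "now" rather than the hour they describe. A minimal sketch of deriving the path from each metric's own timestamp follows. It assumes timestamps in Unix seconds (the unit the script receives) and adds a measurement= level to match the layout recommended later; partition_key and group_by_partition are hypothetical helper names, not part of the original script.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(prefix: str, measurement: str, epoch_seconds: int) -> str:
    """Return a Hive-style S3 prefix derived from the metric's own timestamp (UTC)."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return (
        f"{prefix}/measurement={measurement}/"
        f"year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/hour={ts.hour:02d}/"
    )


def group_by_partition(records: list[dict], prefix: str = "landing-zone") -> dict[str, list[dict]]:
    """Bucket flattened records by the partition their event time belongs to."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        key = partition_key(prefix, record["measurement"], record["timestamp"])
        groups[key].append(record)
    return dict(groups)


# Two points one hour apart end up in two different hour= partitions.
records = [
    {"measurement": "cpu_metrics", "timestamp": 1775174400, "cpu_idle_percent": 95.2},
    {"measurement": "cpu_metrics", "timestamp": 1775178000, "cpu_idle_percent": 91.0},
]
for key, rows in group_by_partition(records).items():
    print(key, len(rows))
```

Each group could then be written as its own Parquet object, which also keeps every file's columns specific to one measurement.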

Alternative: outputs.http to Lambda for Parquet

A serverless approach uses an AWS Lambda function that receives metrics via HTTP and writes Parquet files:

# telegraf.conf - Output: HTTP to Lambda Function URL
[[outputs.http]]
  url = "https://abc123.lambda-url.us-east-1.on.aws/ingest"

  method = "POST"
  data_format = "json"
  json_timestamp_units = "1s"

  ## Batch settings
  metric_batch_size = 5000
  metric_buffer_limit = 50000

  ## Timeout and retry
  timeout = "30s"

  ## Headers
  [outputs.http.headers]
    Content-Type = "application/json"
    X-Pipeline-Source = "telegraf-influxdb"
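
The receiving side is not shown in the configuration above. The following is a rough sketch of what the Lambda handler's parsing step might look like, using only the standard library. It assumes a Function URL event (with body and isBase64Encoded keys) and Telegraf's JSON layout of name, tags, fields and timestamp, batched under a "metrics" key. The Parquet write itself is left out, and flatten_metric is a hypothetical helper name.

```python
import base64
import json


def flatten_metric(metric: dict) -> dict:
    """Flatten one Telegraf JSON metric into a single-level record."""
    record = {
        "measurement": metric.get("name", ""),
        "timestamp": metric.get("timestamp", 0),
    }
    record.update(metric.get("tags", {}))
    record.update(metric.get("fields", {}))
    return record


def handler(event, context):
    """Hypothetical Function URL handler: decode the POST body and flatten it."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": "invalid JSON"}

    # A batch arrives as {"metrics": [...]}; a single metric is a bare object.
    if isinstance(payload, dict):
        metrics = payload.get("metrics", [payload])
    else:
        metrics = payload
    records = [flatten_metric(m) for m in metrics]

    # Writing `records` to Parquet on S3 would happen here (omitted in this sketch).
    return {"statusCode": 200, "body": json.dumps({"received": len(records)})}
```

Returning a non-2xx status for malformed input matters because Telegraf treats a failed write as retryable and keeps the batch in its buffer.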

S3 Partitioning Strategy

The S3 path structure is important for Glue and Athena performance. Hive-style partitioning should be used:

# Recommended S3 path structure for time-series data
s3://my-timeseries-lakehouse/
  landing-zone/
    measurement=cpu_metrics/
      year=2026/
        month=04/
          day=03/
            hour=00/
              metrics_20260403_000000.json
              metrics_20260403_001500.json
            hour=01/
              metrics_20260403_010000.json
          day=04/
            ...
    measurement=memory_metrics/
      year=2026/
        ...

Key Takeaway: Partition by day for most workloads, and add the hour= level shown above only when ingestion exceeds 1GB per day per measurement. Over-partitioning produces too many small files and degrades Athena query performance, while under-partitioning forces full scans. Aim for files between 128MB and 256MB.
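
The rule of thumb above can be turned into a quick calculation. The sketch below applies the 1GB-per-day threshold and a 128MB target file size to a daily volume and reports the resulting granularity, files per partition and average file size. The function name plan_partitioning and the use of binary units (MiB, GiB) are assumptions made for illustration.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def plan_partitioning(bytes_per_day: int, target_file_bytes: int = 128 * MIB) -> dict:
    """Pick day or hour partitions and estimate file counts for one measurement."""
    granularity = "hour" if bytes_per_day > GIB else "day"
    partitions_per_day = 24 if granularity == "hour" else 1
    bytes_per_partition = bytes_per_day / partitions_per_day
    files_per_partition = max(1, math.ceil(bytes_per_partition / target_file_bytes))
    return {
        "granularity": granularity,
        "files_per_partition": files_per_partition,
        "avg_file_mib": round(bytes_per_partition / files_per_partition / MIB, 1),
    }


for daily_bytes in (512 * MIB, 2 * GIB, 48 * GIB):
    print(daily_bytes // MIB, "MiB/day ->", plan_partitioning(daily_bytes))
```

When the reported average file size falls well below the target, staying with daily partitions or compacting more often is usually the safer choice.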

Create the Iceberg Table in AWS Glue

With data landing on S3, the Iceberg table definition must be created in the AWS Glue Data Catalog. Two approaches are available.

Option A: Create Iceberg Table via Athena DDL

This is the most precise approach, allowing the exact schema and partitioning to be defined:

-- Create Iceberg table for CPU metrics
CREATE TABLE timeseries_db.cpu_metrics (
    `timestamp`       timestamp,
    hostname          string,
    region            string,
    cpu_idle_percent  double,
    cpu_system_percent double,
    cpu_user_percent  double,
    cpu_total_usage_percent double,
    pipeline_version  string,
    source_system     string,
    ingested_at       bigint
)
PARTITIONED BY (day(`timestamp`))
LOCATION 's3://my-timeseries-lakehouse/iceberg-warehouse/cpu_metrics/'
TBLPROPERTIES (
    'table_type' = 'ICEBERG',
    'format' = 'PARQUET',
    'write_compression' = 'snappy',
    'optimize_rewrite_delete_file_threshold' = '10'
);

-- Create Iceberg table for memory metrics
CREATE TABLE timeseries_db.memory_metrics (
    `timestamp`       timestamp,
    hostname          string,
    region            string,
    used_percent      double,
    available         bigint,
    memory_critical   boolean,
    pipeline_version  string,
    source_system     string,
    ingested_at       bigint
)
PARTITIONED BY (day(`timestamp`))
LOCATION 's3://my-timeseries-lakehouse/iceberg-warehouse/memory_metrics/'
TBLPROPERTIES (
    'table_type' = 'ICEBERG',
    'format' = 'PARQUET',
    'write_compression' = 'snappy'
);

-- Create a unified metrics table (if you prefer a single table)
CREATE TABLE timeseries_db.all_metrics (
    `timestamp`       timestamp,
    measurement       string,
    hostname          string,
    region            string,
    metric_name       string,
    metric_value      double,
    tags              map<string, string>,
    pipeline_version  string,
    source_system     string,
    ingested_at       bigint
)
PARTITIONED BY (day(`timestamp`), measurement)
LOCATION 's3://my-timeseries-lakehouse/iceberg-warehouse/all_metrics/'
TBLPROPERTIES (
    'table_type' = 'ICEBERG',
    'format' = 'PARQUET',
    'write_compression' = 'snappy'
);

Option B: AWS Glue Crawler for Schema Discovery

When automatic schema discovery from JSON or Parquet files in the landing zone is desired:

# Create the Glue Crawler via AWS CLI
aws glue create-crawler \
  --name "timeseries-landing-crawler" \
  --role "arn:aws:iam::ACCOUNT_ID:role/GlueCrawlerRole" \
  --database-name "timeseries_db" \
  --targets '{
    "S3Targets": [
      {
        "Path": "s3://my-timeseries-lakehouse/landing-zone/",
        "Exclusions": ["**/_temporary/**", "**/_SUCCESS"]
      }
    ]
  }' \
  --schema-change-policy '{
    "UpdateBehavior": "LOG",
    "DeleteBehavior": "LOG"
  }' \
  --configuration '{
    "Version": 1.0,
    "Grouping": {
      "TableGroupingPolicy": "CombineCompatibleSchemas"
    },
    "CrawlerOutput": {
      "Partitions": {
        "AddOrUpdateBehavior": "InheritFromTable"
      }
    }
  }' \
  --recrawl-policy '{"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"}'

# Run the crawler
aws glue start-crawler --name "timeseries-landing-crawler"

# Check crawler status
aws glue get-crawler --name "timeseries-landing-crawler" \
  --query "Crawler.State"

Schema Mapping: InfluxDB to Iceberg Types

InfluxDB Type	Example	Iceberg/Parquet Type	Notes
Float	`usage_idle=95.2`	`double`	Direct mapping
Integer	`bytes_sent=1024i`	`bigint`	Use `int` for values under 2B
String (field)	`status="healthy"`	`string`	Direct mapping
Boolean	`active=true`	`boolean`	Direct mapping
Tag (string)	`host=server01`	`string`	Consider `dictionary` encoding
Timestamp	nanosecond Unix	`timestamp`	Convert from ns to ms or s
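
The mapping in the table can be expressed as a small helper, which is handy when generating DDL from sample points. This is an illustrative sketch only: iceberg_type and convert_timestamp are hypothetical names, and the microsecond option is included on the assumption that the target column uses Iceberg's standard microsecond-precision timestamp.

```python
def iceberg_type(value) -> str:
    """Map a decoded InfluxDB field or tag value to an Iceberg column type."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# Divisors for converting nanosecond Unix timestamps to coarser units.
_NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000}


def convert_timestamp(nanoseconds: int, unit: str = "s") -> int:
    """Truncate a nanosecond Unix timestamp to seconds, milliseconds or microseconds."""
    return nanoseconds // _NS_PER_UNIT[unit]


sample = {"usage_idle": 95.2, "bytes_sent": 1024, "status": "healthy", "active": True}
print({name: iceberg_type(value) for name, value in sample.items()})
print(convert_timestamp(1775174400123456789, "ms"))
```

Integer division truncates rather than rounds, which keeps every point inside the same second (or millisecond) it was originally recorded in.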

Automate the Iceberg Ingestion

Having data on S3 is only half of the task: the data must still be moved from the landing zone into the Iceberg table proper. Four approaches are described below: a scheduled Glue ETL job, a plain Athena query, an event-driven Lambda function, and Spark on EMR.

Option A: AWS Glue ETL Job (PySpark)

This is the most robust approach for production workloads. A Glue ETL job reads from the landing zone and writes to the Iceberg table:

# glue_iceberg_ingestion.py - AWS Glue ETL Job
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.functions import col, to_timestamp, current_timestamp, lit
from pyspark.sql.types import *

args = getResolvedOptions(sys.argv, [
    'JOB_NAME',
    'source_path',
    'database_name',
    'table_name'
])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Configure Iceberg
spark.conf.set("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
spark.conf.set("spark.sql.catalog.glue_catalog.warehouse", "s3://my-timeseries-lakehouse/iceberg-warehouse/")
spark.conf.set("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
spark.conf.set("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
spark.conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")

# Read from landing zone
source_path = args['source_path']  # s3://my-timeseries-lakehouse/landing-zone/
database = args['database_name']    # timeseries_db
table = args['table_name']          # cpu_metrics

print(f"Reading from: {source_path}")

# Read JSON files from landing zone
df_raw = spark.read.json(source_path)

# Transform: convert timestamp, clean up columns
df_transformed = df_raw \
    .withColumn("timestamp", to_timestamp(col("timestamp").cast("long"))) \
    .withColumn("hostname", col("tag_hostname")) \
    .withColumn("region", col("tag_region")) \
    .withColumn("load_timestamp", current_timestamp()) \
    .drop("tag_hostname", "tag_region", "partition_year",
          "partition_month", "partition_day", "partition_hour")

# Select columns matching the Iceberg table schema
df_final = df_transformed.select(
    "timestamp",
    "hostname",
    "region",
    col("cpu_idle_percent").cast("double"),
    col("cpu_system_percent").cast("double"),
    col("cpu_user_percent").cast("double"),
    col("cpu_total_usage_percent").cast("double"),
    "pipeline_version",
    "source_system",
    col("ingested_at").cast("long")
)

print(f"Records to insert: {df_final.count()}")

# Write to Iceberg table using APPEND mode
df_final.writeTo(f"glue_catalog.{database}.{table}") \
    .option("merge-schema", "true") \
    .append()

print(f"Successfully ingested data into {database}.{table}")

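# Important: this job reads everything under source_path on every run, so
# files that have already been ingested must be removed or moved afterwards;
# otherwise the next run appends the same rows again.
# The snippet below deletes a 'processed/' prefix and therefore only helps if
# ingested files are first moved there. Files that land while the job is
# running must not be deleted before they have been read.
# import boto3
# s3 = boto3.resource('s3')
# bucket = s3.Bucket('my-timeseries-lakehouse')
# bucket.objects.filter(Prefix='landing-zone/processed/').delete()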

job.commit()

The Glue job is created and scheduled as follows:

# Create the Glue ETL job
aws glue create-job \
  --name "timeseries-iceberg-ingestion" \
  --role "arn:aws:iam::ACCOUNT_ID:role/GlueETLRole" \
  --command '{
    "Name": "glueetl",
    "ScriptLocation": "s3://my-timeseries-lakehouse/scripts/glue_iceberg_ingestion.py",
    "PythonVersion": "3"
  }' \
  --default-arguments '{
    "--source_path": "s3://my-timeseries-lakehouse/landing-zone/",
    "--database_name": "timeseries_db",
    "--table_name": "cpu_metrics",
    "--datalake-formats": "iceberg",
    "--conf": "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "--enable-metrics": "true"
  }' \
  --glue-version "4.0" \
  --number-of-workers 2 \
  --worker-type "G.1X" \
  --timeout 60

# Schedule the job to run every hour with a Glue scheduled trigger
# (EventBridge rules can target a Glue workflow, but not a Glue job directly)
aws glue create-trigger \
  --name "hourly-iceberg-ingestion" \
  --type SCHEDULED \
  --schedule "cron(0 * * * ? *)" \
  --actions '[{"JobName": "timeseries-iceberg-ingestion"}]' \
  --start-on-creation

Option B: Athena INSERT INTO (Simple, No Compute Required)

For smaller datasets, Glue ETL may be omitted and Athena used directly to move the data:

-- First, create a temporary table pointing to the landing zone
CREATE EXTERNAL TABLE timeseries_db.cpu_metrics_landing (
    `timestamp`       bigint,
    measurement       string,
    tag_hostname      string,
    tag_region        string,
    cpu_idle_percent  double,
    cpu_system_percent double,
    cpu_user_percent  double,
    cpu_total_usage_percent double,
    pipeline_version  string,
    source_system     string,
    ingested_at       bigint
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-timeseries-lakehouse/landing-zone/measurement=cpu_metrics/'
TBLPROPERTIES ('has_encrypted_data'='false');

-- Add partitions (or use MSCK REPAIR TABLE)
MSCK REPAIR TABLE timeseries_db.cpu_metrics_landing;

-- Insert from landing zone into Iceberg table
INSERT INTO timeseries_db.cpu_metrics
SELECT
    from_unixtime(timestamp) as timestamp,
    tag_hostname as hostname,
    tag_region as region,
    cpu_idle_percent,
    cpu_system_percent,
    cpu_user_percent,
    cpu_total_usage_percent,
    pipeline_version,
    source_system,
    ingested_at
FROM timeseries_db.cpu_metrics_landing
WHERE year = '2026' AND month = '04' AND day = '03';
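
When this statement is run on a schedule, the date literals have to be generated for each day. A minimal sketch of one way to do that is shown below; it takes a datetime.date so that only valid, zero-padded values can reach the SQL text. The name build_daily_insert is hypothetical, and the statement is the one above with placeholders substituted.

```python
from datetime import date

INSERT_TEMPLATE = """
INSERT INTO timeseries_db.cpu_metrics
SELECT
    from_unixtime(timestamp) as timestamp,
    tag_hostname as hostname,
    tag_region as region,
    cpu_idle_percent,
    cpu_system_percent,
    cpu_user_percent,
    cpu_total_usage_percent,
    pipeline_version,
    source_system,
    ingested_at
FROM timeseries_db.cpu_metrics_landing
WHERE year = '{year}' AND month = '{month}' AND day = '{day}'
"""


def build_daily_insert(day: date) -> str:
    """Return the INSERT statement for one landing-zone day partition."""
    return INSERT_TEMPLATE.format(
        year=f"{day.year:04d}",
        month=f"{day.month:02d}",
        day=f"{day.day:02d}",
    )


print(build_daily_insert(date(2026, 4, 3)))
```

Running the statement twice for the same day appends the rows twice, so whatever schedules it should record which days have already been loaded.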

Option C: Lambda for Near-Real-Time Ingestion

For near-real-time ingestion, a Lambda function is triggered when new files arrive in S3:

# lambda_iceberg_ingest.py - Triggered by S3 PutObject events
import json
import boto3
import time
from urllib.parse import unquote_plus

athena = boto3.client('athena')

def handler(event, context):
    """Triggered when a new file lands in the landing zone."""

    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # S3 event notifications URL-encode object keys ('=' arrives as '%3D')
        key = unquote_plus(record['s3']['object']['key'])

        print(f"New file: s3://{bucket}/{key}")

        # Parse the partition info from the S3 path
        # Example: landing-zone/measurement=cpu_metrics/year=2026/month=04/day=03/...
        parts = key.split('/')
        partition_info = {}
        for part in parts:
            if '=' in part:
                k, v = part.split('=', 1)
                partition_info[k] = v

        measurement = partition_info.get('measurement', 'unknown')
        year = partition_info.get('year', '')
        month = partition_info.get('month', '')
        day = partition_info.get('day', '')

        if measurement == 'cpu_metrics':
            # Run Athena INSERT INTO query
            query = f"""
            INSERT INTO timeseries_db.cpu_metrics
            SELECT
                from_unixtime(timestamp) as timestamp,
                tag_hostname as hostname,
                tag_region as region,
                cpu_idle_percent,
                cpu_system_percent,
                cpu_user_percent,
                cpu_total_usage_percent,
                pipeline_version,
                source_system,
                ingested_at
            FROM timeseries_db.cpu_metrics_landing
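            WHERE year = '{year}' AND month = '{month}' AND day = '{day}'
              -- Restrict to the file that triggered this event; otherwise every
              -- new file re-inserts the whole day's partition
              AND "$path" = 's3://{bucket}/{key}'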
            """

            response = athena.start_query_execution(
                QueryString=query,
                QueryExecutionContext={'Database': 'timeseries_db'},
                ResultConfiguration={
                    'OutputLocation': 's3://my-timeseries-lakehouse-athena-results/'
                }
            )

            query_id = response['QueryExecutionId']
            print(f"Started Athena query: {query_id}")

    return {'statusCode': 200, 'body': 'Ingestion triggered'}
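
The handler above starts the query and returns immediately, so a failed INSERT goes unnoticed. One possible way to surface failures is to poll the query state before returning, as sketched below. The function takes the Athena client as a parameter and relies only on the get_query_execution call and its QueryExecution.Status.State response field. The name wait_for_query and the default timings (chosen to stay under the 300-second Lambda timeout used in this guide) are assumptions.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena, query_id: str,
                   poll_seconds: float = 2.0,
                   timeout_seconds: float = 240.0) -> str:
    """Poll an Athena query until it finishes; raise if it fails or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = athena.get_query_execution(QueryExecutionId=query_id)
        status = response["QueryExecution"]["Status"]
        state = status["State"]
        if state in TERMINAL_STATES:
            if state != "SUCCEEDED":
                reason = status.get("StateChangeReason", "no reason given")
                raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Athena query {query_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

Inside the handler this would be called as wait_for_query(athena, query_id) right after start_query_execution. Raising on failure lets Lambda's own retry and dead-letter mechanisms take over.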

The S3 event trigger is configured as follows:

# Create the Lambda function
aws lambda create-function \
  --function-name timeseries-iceberg-ingest \
  --runtime python3.12 \
  --handler lambda_iceberg_ingest.handler \
  --role arn:aws:iam::ACCOUNT_ID:role/LambdaIcebergIngestRole \
  --zip-file fileb://lambda_package.zip \
  --timeout 300 \
  --memory-size 256

# Add S3 trigger permission
aws lambda add-permission \
  --function-name timeseries-iceberg-ingest \
  --statement-id s3-trigger \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::my-timeseries-lakehouse

# Configure S3 bucket notification
aws s3api put-bucket-notification-configuration \
  --bucket my-timeseries-lakehouse \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [
      {
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:timeseries-iceberg-ingest",
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {
          "Key": {
            "FilterRules": [
              {"Name": "prefix", "Value": "landing-zone/"},
              {"Name": "suffix", "Value": ".json"}
            ]
          }
        }
      }
    ]
  }'

Option D: Apache Spark on EMR

For the highest throughput and maximum flexibility, Spark is run directly on EMR with the Iceberg connector:

# emr_iceberg_job.py - Spark job for EMR
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

spark = SparkSession.builder \
    .appName("InfluxDB-to-Iceberg") \
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-timeseries-lakehouse/iceberg-warehouse/") \
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .getOrCreate()

# Read new files from landing zone
df = spark.read.json("s3://my-timeseries-lakehouse/landing-zone/measurement=cpu_metrics/year=2026/")

# Transform and write to Iceberg
df_clean = df \
    .withColumn("timestamp", to_timestamp(col("timestamp").cast("long"))) \
    .withColumnRenamed("tag_hostname", "hostname") \
    .withColumnRenamed("tag_region", "region") \
    .select("timestamp", "hostname", "region",
            "cpu_idle_percent", "cpu_system_percent",
            "cpu_user_percent", "cpu_total_usage_percent",
            "pipeline_version", "source_system", "ingested_at")

# Append to Iceberg table
df_clean.writeTo("glue_catalog.timeseries_db.cpu_metrics").append()

# Run compaction to optimize file sizes
spark.sql("""
    CALL glue_catalog.system.rewrite_data_files(
        table => 'timeseries_db.cpu_metrics',
        options => map('target-file-size-bytes', '134217728')
    )
""")

spark.stop()

The job is submitted to a running EMR cluster as a step:

# Submit the EMR job
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps '[{
    "Type": "Spark",
    "Name": "Iceberg Ingestion",
    "ActionOnFailure": "CONTINUE",
    "Args": [
      "--deploy-mode", "cluster",
      "--conf", "spark.jars.packages=org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0",
      "--conf", "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
      "s3://my-timeseries-lakehouse/scripts/emr_iceberg_job.py"
    ]
  }]'

Complete End-to-End telegraf.conf

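A complete Telegraf configuration combining all preceding elements is presented below. Update the environment variables, then validate the file with telegraf --test (shown after the configuration) before deploying it: Telegraf refuses to start if the configuration names a plugin or option that the installed build does not provide.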

# =============================================================================
# TELEGRAF CONFIGURATION: InfluxDB → S3 Landing Zone (for Iceberg)
# =============================================================================
# This configuration reads time-series data from InfluxDB v2, transforms it
# into a flat columnar schema, and writes it to S3 with Hive-style partitioning
# for subsequent ingestion into Apache Iceberg tables.
# =============================================================================

# Global Agent Configuration
[agent]
  ## Collection interval - how often input plugins are gathered
  interval = "1h"

  ## Flush interval - how often output plugins write
  flush_interval = "5m"

  ## Jitter to prevent thundering herd
  collection_jitter = "30s"
  flush_jitter = "30s"

  ## Metric batch and buffer sizes
  metric_batch_size = 10000
  metric_buffer_limit = 100000

  ## Override default hostname
  hostname = ""
  omit_hostname = true

  ## Logging
  debug = false
  quiet = false
  logfile = "/var/log/telegraf/telegraf-pipeline.log"
  logfile_rotation_interval = "24h"
  logfile_rotation_max_size = "100MB"
  logfile_rotation_max_archives = 7

# =============================================================================
# INPUT: Read from InfluxDB v2 via Flux queries
# =============================================================================
[[inputs.influxdb_v2]]
  urls = ["${INFLUXDB_URL}"]
  token = "${INFLUXDB_TOKEN}"
  organization = "${INFLUXDB_ORG}"

  ## CPU Metrics
  [[inputs.influxdb_v2.query]]
    bucket = "${INFLUXDB_BUCKET}"
    query = '''
      from(bucket: v.bucket)
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "cpu")
        |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
        |> drop(columns: ["_start", "_stop", "_measurement"])
    '''
    measurement = "cpu_metrics"

  ## Memory Metrics
  [[inputs.influxdb_v2.query]]
    bucket = "${INFLUXDB_BUCKET}"
    query = '''
      from(bucket: v.bucket)
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "memory")
        |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
        |> drop(columns: ["_start", "_stop", "_measurement"])
    '''
    measurement = "memory_metrics"

  ## HTTP Request Metrics
  [[inputs.influxdb_v2.query]]
    bucket = "${INFLUXDB_BUCKET}"
    query = '''
      from(bucket: v.bucket)
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "http_requests")
        |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
        |> drop(columns: ["_start", "_stop", "_measurement"])
    '''
    measurement = "http_request_metrics"

  timeout = "60s"

# =============================================================================
# PROCESSORS: Transform data for Iceberg compatibility
# =============================================================================

# Step 1: Rename fields to clean, descriptive names
[[processors.rename]]
  order = 1

  [[processors.rename.replace]]
    field = "usage_idle"
    dest = "cpu_idle_percent"
  [[processors.rename.replace]]
    field = "usage_system"
    dest = "cpu_system_percent"
  [[processors.rename.replace]]
    field = "usage_user"
    dest = "cpu_user_percent"
  [[processors.rename.replace]]
    field = "used_percent"
    dest = "memory_used_percent"
  [[processors.rename.replace]]
    tag = "host"
    dest = "hostname"

# Step 2: Convert field types for schema consistency
[[processors.converter]]
  order = 2
  [processors.converter.fields]
    float = ["cpu_idle_percent", "cpu_system_percent", "cpu_user_percent",
             "memory_used_percent", "latency_ms"]
    integer = ["available", "count"]

# Step 3: Extract date partitions from timestamp
[[processors.date]]
  order = 3
  tag_key = "partition_year"
  date_format = "2006"

[[processors.date]]
  order = 4
  tag_key = "partition_month"
  date_format = "01"

[[processors.date]]
  order = 5
  tag_key = "partition_day"
  date_format = "02"

# Step 4: Custom transformations (compute derived fields, flatten tags)
[[processors.starlark]]
  order = 6
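  source = '''
load("time.star", "time")
load("math.star", "math")

def apply(metric):
    # Compute total CPU usage, rounded to two decimals
    # (Starlark has no built-in round(); math.round rounds to the nearest integer)
    if metric.name == "cpu_metrics":
        idle = metric.fields.get("cpu_idle_percent", 0.0)
        metric.fields["cpu_total_usage_percent"] = math.round((100.0 - idle) * 100.0) / 100.0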

    # Memory health flag
    if metric.name == "memory_metrics":
        used = metric.fields.get("memory_used_percent", 0.0)
        metric.fields["memory_critical"] = used > 95.0

    # Flatten all tags into fields for columnar storage
    for key, value in metric.tags.items():
        if not key.startswith("partition_"):
            metric.fields["tag_" + key] = value

    # Add metadata
    metric.fields["measurement"] = metric.name
    metric.fields["source_system"] = "influxdb"
    metric.fields["pipeline_version"] = "1.0"
    metric.fields["ingested_at"] = int(time.now().unix_nano / 1000000000)

    return metric
'''

# =============================================================================
# OUTPUT: Write to S3 with Hive-style partitioning
# =============================================================================
[[outputs.s3]]
  bucket = "${AWS_S3_BUCKET}"
  s3_key_prefix = "landing-zone/measurement={{.Name}}/year={{.Tag \"partition_year\"}}/month={{.Tag \"partition_month\"}}/day={{.Tag \"partition_day\"}}/"

  region = "${AWS_REGION}"

  ## Authentication (uses environment variables or instance role)
  # access_key = "${AWS_ACCESS_KEY_ID}"
  # secret_key = "${AWS_SECRET_ACCESS_KEY}"

  data_format = "json"
  json_timestamp_units = "1s"

  ## Batching
  metric_batch_size = 10000
  metric_buffer_limit = 100000
  flush_interval = "5m"
  flush_jitter = "30s"

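  use_batch_format = true

  ## Keep Telegraf's own internal metrics (collected below) out of the landing zone
  namedrop = ["telegraf_pipeline_*"]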

# =============================================================================
# MONITORING: Internal Telegraf metrics
# =============================================================================
[[inputs.internal]]
  collect_memstats = true
  name_prefix = "telegraf_pipeline_"

[[outputs.file]]
  files = ["/var/log/telegraf/internal_metrics.json"]
  data_format = "json"
  namepass = ["telegraf_pipeline_*"]
  rotation_interval = "24h"
  rotation_max_archives = 7

The required environment variables are set as follows:

# /etc/default/telegraf or /etc/telegraf/telegraf.env
INFLUXDB_URL=http://localhost:8086
INFLUXDB_TOKEN=my-super-secret-token
INFLUXDB_ORG=my-org
INFLUXDB_BUCKET=metrics
AWS_S3_BUCKET=my-timeseries-lakehouse
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=secret...
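
A missing variable typically surfaces only as an authentication or connection error at run time. A small pre-flight check, sketched below, reports unset variables before Telegraf is started. The list mirrors the variables the configuration references; the AWS key pair is left out because an instance role may supply credentials instead. The name missing_variables is hypothetical.

```python
import os

REQUIRED_VARS = [
    "INFLUXDB_URL",
    "INFLUXDB_TOKEN",
    "INFLUXDB_ORG",
    "INFLUXDB_BUCKET",
    "AWS_S3_BUCKET",
    "AWS_REGION",
]


def missing_variables(env=None) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_variables()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

The check has to run in the same environment as the Telegraf service, for example with the variables from /etc/default/telegraf exported first.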

The pipeline is started as follows:

# Test the configuration first
telegraf --config /etc/telegraf/telegraf-pipeline.conf --test

# Run in foreground for debugging
telegraf --config /etc/telegraf/telegraf-pipeline.conf

# Run as a service
sudo cp /etc/telegraf/telegraf-pipeline.conf /etc/telegraf/telegraf.conf
sudo systemctl restart telegraf
sudo systemctl status telegraf
sudo journalctl -u telegraf -f

Querying Iceberg Data with Athena

Once data are flowing into the Iceberg tables, they can be queried with standard SQL through Amazon Athena. Several practical queries for daily use are presented below.

Basic Analytical Queries

-- Average CPU usage per host over the last 24 hours
SELECT
    hostname,
    region,
    AVG(cpu_total_usage_percent) as avg_cpu_usage,
    MAX(cpu_total_usage_percent) as peak_cpu_usage,
    MIN(cpu_idle_percent) as min_idle_percent,
    COUNT(*) as data_points
FROM timeseries_db.cpu_metrics
WHERE timestamp >= current_timestamp - interval '24' hour
GROUP BY hostname, region
ORDER BY avg_cpu_usage DESC;

-- Hourly aggregation for dashboarding
SELECT
    date_trunc('hour', timestamp) as hour,
    hostname,
    AVG(cpu_total_usage_percent) as avg_cpu,
    APPROX_PERCENTILE(cpu_total_usage_percent, 0.95) as p95_cpu,
    APPROX_PERCENTILE(cpu_total_usage_percent, 0.99) as p99_cpu
FROM timeseries_db.cpu_metrics
WHERE timestamp >= current_timestamp - interval '7' day
GROUP BY 1, 2
ORDER BY 1 DESC, 2;

-- Memory alerts: find hosts with high memory usage
SELECT
    hostname,
    region,
    timestamp,
    used_percent,
    CAST(available AS double) / (1024*1024*1024) as available_gb
FROM timeseries_db.memory_metrics
WHERE used_percent > 90
  AND timestamp >= current_timestamp - interval '1' hour
ORDER BY used_percent DESC;

Time Travel Queries

One of Iceberg’s principal features is time travel: querying the data as they existed at a previous point in time:

-- Query data as it existed yesterday at noon
SELECT *
FROM timeseries_db.cpu_metrics
FOR TIMESTAMP AS OF TIMESTAMP '2026-04-02 12:00:00 UTC'
WHERE hostname = 'server01';

-- Compare current data with data from a week ago
SELECT
    current_data.hostname,
    current_data.avg_cpu as current_avg_cpu,
    historical.avg_cpu as week_ago_avg_cpu,
    current_data.avg_cpu - historical.avg_cpu as cpu_change
FROM (
    SELECT hostname, AVG(cpu_total_usage_percent) as avg_cpu
    FROM timeseries_db.cpu_metrics
    WHERE timestamp >= current_timestamp - interval '1' day
    GROUP BY hostname
) current_data
JOIN (
    SELECT hostname, AVG(cpu_total_usage_percent) as avg_cpu
    FROM timeseries_db.cpu_metrics
    FOR TIMESTAMP AS OF TIMESTAMP '2026-03-27 00:00:00 UTC'
    WHERE timestamp >= TIMESTAMP '2026-03-26' AND timestamp < TIMESTAMP '2026-03-27'
    GROUP BY hostname
) historical ON current_data.hostname = historical.hostname;

-- View table snapshot history (metadata table names must be double-quoted because of the $)
SELECT * FROM "timeseries_db"."cpu_metrics$snapshots" ORDER BY committed_at DESC LIMIT 10;

-- View manifest files
SELECT * FROM "timeseries_db"."cpu_metrics$manifests";

Joining with Other Data Sources

-- Join CPU metrics with a server inventory table
SELECT
    c.hostname,
    c.region,
    s.instance_type,
    s.team,
    AVG(c.cpu_total_usage_percent) as avg_cpu,
    s.monthly_cost
FROM timeseries_db.cpu_metrics c
JOIN timeseries_db.server_inventory s ON c.hostname = s.hostname
WHERE c.timestamp >= current_timestamp - interval '7' day
GROUP BY c.hostname, c.region, s.instance_type, s.team, s.monthly_cost
HAVING AVG(c.cpu_total_usage_percent) < 10  -- Underutilized servers
ORDER BY s.monthly_cost DESC;

Athena Cost Optimization Tips

Tip: Athena charges $5 per TB of data scanned. With Iceberg's partition pruning and Parquet's columnar storage, costs can be reduced by 90 per cent or more compared with scanning raw JSON files. Partition columns should always be included in the WHERE clause, and only the columns required should be selected; SELECT * on large tables should be avoided.

Use partition predicates: WHERE timestamp >= ... triggers Iceberg partition pruning, scanning only the relevant Parquet files.
Select specific columns: Parquet is columnar, so SELECT hostname, cpu_total_usage_percent reads far less data than SELECT *.
Run compaction regularly: Small files degrade query performance and increase cost. Files should be kept between 128MB and 256MB.
Use CTAS for frequent queries: Materialise expensive queries as new Iceberg tables.
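
The pricing rule quoted above is easy to turn into a per-query estimate. The sketch below assumes the $5-per-TB rate from the tip, Athena's 10 MB minimum charge per query, and binary units (1 TB = 1024^4 bytes); the unit convention is an assumption that should be checked against the current pricing page. The name athena_query_cost is hypothetical.

```python
MB = 1024 ** 2
TB = 1024 ** 4          # assumption: binary terabyte
PRICE_PER_TB_USD = 5.0  # rate quoted in the tip above
MIN_BYTES_PER_QUERY = 10 * MB


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimate the charge in USD for a single query from its bytes scanned."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / TB * PRICE_PER_TB_USD


full_scan = athena_query_cost(1 * TB)       # scanning 1 TB of raw JSON
pruned_scan = athena_query_cost(TB // 10)   # same query touching a tenth of the data
print(f"full scan: ${full_scan:.2f}, pruned scan: ${pruned_scan:.2f}")
```

The bytes-scanned figure for a finished query appears in the Athena console and in the query execution statistics, so the estimate can be checked against real workloads.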

Alternative Pipeline: InfluxDB to Telegraf to Kafka to Spark to Iceberg

Organisations requiring true streaming ingestion with exactly-once semantics should consider a Kafka-based pipeline. The architecture is as follows.

InfluxDB → Telegraf → Kafka Topic → Spark Structured Streaming → Iceberg Table

When to Use Kafka Rather Than S3-Based

S3-based (this guide's main approach) is appropriate when batch processing is acceptable (minutes to hours), data volume is under 1TB per day, minimal infrastructure is desired, and cost is a priority.
Kafka-based is appropriate when sub-minute latency is required, data volume exceeds 1TB per day, a Kafka cluster is already operational, and exactly-once delivery guarantees are needed.

Telegraf Kafka Output Configuration

# telegraf.conf - Output: Kafka
[[outputs.kafka]]
  ## Kafka broker addresses
  brokers = ["kafka-broker-1:9092", "kafka-broker-2:9092", "kafka-broker-3:9092"]

  ## Topic for all metrics (or use topic_suffix for per-measurement topics)
  topic = "influxdb-metrics"

  ## Use measurement name as topic suffix for separate topics
  ## Creates topics like: influxdb-metrics-cpu_metrics, influxdb-metrics-memory_metrics
  # topic_suffix = {method = "measurement", separator = "-"}

  ## Compression codec: 0=none, 1=gzip, 2=snappy, 3=LZ4, 4=ZSTD
  compression_codec = 2

  ## Required acks: 0=none, 1=leader, -1=all replicas
  required_acks = -1

  ## Max message size
  max_message_bytes = 1048576

  ## Data format
  data_format = "json"
  json_timestamp_units = "1ms"

  ## SASL authentication (if Kafka requires it)
  # sasl_mechanism = "SCRAM-SHA-512"
  # sasl_username = "${KAFKA_USERNAME}"
  # sasl_password = "${KAFKA_PASSWORD}"

  ## TLS
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"

The Spark Structured Streaming consumer is shown below:

# spark_kafka_iceberg.py - Spark Structured Streaming from Kafka to Iceberg
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *

spark = SparkSession.builder \
    .appName("Kafka-to-Iceberg-Streaming") \
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-timeseries-lakehouse/iceberg-warehouse/") \
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .getOrCreate()

# Define the schema matching our Telegraf JSON output
metrics_schema = StructType([
    StructField("name", StringType()),
    StructField("timestamp", LongType()),
    StructField("tags", MapType(StringType(), StringType())),
    StructField("fields", MapType(StringType(), DoubleType()))
])

# Read from Kafka
df_kafka = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "kafka-broker-1:9092") \
    .option("subscribe", "influxdb-metrics") \
    .option("startingOffsets", "latest") \
    .load()

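# Parse JSON messages. Telegraf emits epoch milliseconds (json_timestamp_units = "1ms"),
# and a numeric-to-timestamp cast expects epoch seconds, so divide by 1000 first.
df_parsed = df_kafka \
    .select(from_json(col("value").cast("string"), metrics_schema).alias("data")) \
    .select("data.*") \
    .withColumn("timestamp", (col("timestamp") / 1000).cast("timestamp")) \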
    .withColumn("hostname", col("tags")["hostname"]) \
    .withColumn("region", col("tags")["region"])

# Write to Iceberg using foreachBatch
def write_to_iceberg(batch_df, batch_id):
    batch_df.writeTo("glue_catalog.timeseries_db.all_metrics") \
        .option("merge-schema", "true") \
        .append()

query = df_parsed.writeStream \
    .foreachBatch(write_to_iceberg) \
    .option("checkpointLocation", "s3://my-timeseries-lakehouse/checkpoints/kafka-iceberg/") \
    .trigger(processingTime="1 minute") \
    .start()

query.awaitTermination()
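
To make the schema above concrete, the following plain-Python sketch (no Spark required) parses one message of the shape produced by Telegraf's JSON serializer and flattens it in the same way as the streaming job. The sample payload, including the host and region values, is invented for illustration; the only facts taken from the configuration are the four top-level keys and the millisecond timestamp unit.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample message; all values are invented for illustration.
SAMPLE_MESSAGE = json.dumps({
    "name": "cpu_metrics",
    "timestamp": 1700000000000,  # epoch milliseconds (json_timestamp_units = "1ms")
    "tags": {"hostname": "web-01", "region": "us-east-1"},
    "fields": {"cpu_total_usage_percent": 42.5},
})


def flatten_metric(raw: str) -> dict:
    """Turn one Telegraf JSON metric into a flat row."""
    metric = json.loads(raw)
    tags = metric.get("tags", {})
    row = {
        "name": metric["name"],
        # Milliseconds must be scaled to seconds before conversion.
        "timestamp": datetime.fromtimestamp(
            metric["timestamp"] / 1000, tz=timezone.utc
        ),
        "hostname": tags.get("hostname"),
        "region": tags.get("region"),
    }
    # Each field becomes its own column.
    row.update(metric.get("fields", {}))
    return row


if __name__ == "__main__":
    print(flatten_metric(SAMPLE_MESSAGE))
```

Running a handful of real messages through a function like this before deploying the Spark job is a cheap way to confirm that tag names and timestamp units match expectations.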

Monitoring and Troubleshooting

A data pipeline is only as reliable as its monitoring. This section describes how to track pipeline health and resolve the most common failures.

Telegraf Internal Metrics

The inputs.internal plugin configured earlier provides important operational metrics:

# Check Telegraf metrics buffer status
# (assumes the file output writes one JSON object per line)
python3 -m json.tool --json-lines /var/log/telegraf/internal_metrics.json | grep -E "metrics_gathered|metrics_written|buffer_size"

# Key metrics to monitor:
# - gather_errors: input plugin failures (InfluxDB connection issues)
# - metrics_gathered: total metrics collected per interval
# - metrics_written: total metrics sent to S3
# - buffer_size: current buffer usage (should stay well below buffer_limit)
# - write_errors: output plugin failures (S3 permission or network issues)
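
The buffer check can also be scripted. The sketch below assumes that the internal metrics are written as one JSON object per line, and that buffer statistics arrive in a measurement named internal_write with buffer_size and buffer_limit fields and an output tag. These names should be verified against the actual file before the script is relied upon; the 80 per cent warning threshold is likewise an arbitrary illustrative choice.

```python
import json


def buffer_utilisation(lines, warn_ratio=0.8):
    """Return (output, ratio, warn) tuples for each buffer record found.

    Assumed record shape (verify against your own file):
    {"name": "internal_write", "tags": {"output": ...},
     "fields": {"buffer_size": ..., "buffer_limit": ...}}
    """
    report = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        metric = json.loads(line)
        if metric.get("name") != "internal_write":
            continue
        fields = metric.get("fields", {})
        size = fields.get("buffer_size")
        limit = fields.get("buffer_limit")
        if size is None or not limit:
            continue  # skip incomplete records and avoid division by zero
        ratio = size / limit
        output = metric.get("tags", {}).get("output", "unknown")
        report.append((output, ratio, ratio >= warn_ratio))
    return report


if __name__ == "__main__":
    with open("/var/log/telegraf/internal_metrics.json") as handle:
        for output, ratio, warn in buffer_utilisation(handle):
            flag = "WARN" if warn else "ok"
            print(f"{output}: {ratio:.0%} of buffer used [{flag}]")
```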

Common Issues and Resolutions

Issue	Symptoms	Resolution
InfluxDB connection failure	`gather_errors` increasing, no new metrics	Verify InfluxDB URL and token. Check network connectivity. Ensure InfluxDB is running.
S3 permission denied	`write_errors` increasing, `AccessDenied` in logs	Check IAM policy. Verify AWS credentials. Ensure bucket policy allows PutObject.
Schema mismatch in Glue	Athena queries return NULL or fail	Re-run Glue Crawler. Check that JSON field names match table column names. Verify type conversions in Telegraf processors.
Glue Crawler fails	Crawler stuck in RUNNING or FAILED state	Check Glue Crawler IAM role. Verify S3 path is correct. Look for malformed JSON files in landing zone.
Data type conflicts	Fields showing as wrong type in Athena	Use `processors.converter` to enforce types in Telegraf. InfluxDB may return integers as floats or vice versa.
Buffer overflow	`metrics_dropped` count increasing	Increase `metric_buffer_limit`. Reduce `flush_interval`. Check for S3 write latency issues.
Duplicate data in Iceberg	Row counts higher than expected	Implement idempotent ingestion with MERGE INTO instead of INSERT. Track processed files to avoid re-ingestion.
Too many small files	Athena queries slow and expensive	Increase Telegraf batch size. Run Iceberg compaction regularly. Target 128-256MB file sizes.

Data Validation Queries

-- Check data freshness: how recent is the latest data?
SELECT
    MAX(timestamp) as latest_data,
    current_timestamp as checked_at,
    date_diff('minute', MAX(timestamp), current_timestamp) as minutes_behind
FROM timeseries_db.cpu_metrics;

-- Check for data gaps: are there any missing hours?
SELECT
    date_trunc('hour', timestamp) as hour,
    COUNT(*) as record_count
FROM timeseries_db.cpu_metrics
WHERE timestamp >= current_timestamp - interval '24' hour
GROUP BY 1
ORDER BY 1;

-- Validate data quality: check for NULLs and outliers
SELECT
    COUNT(*) as total_records,
    COUNT(hostname) as non_null_hostname,
    COUNT(cpu_total_usage_percent) as non_null_cpu,
    MIN(cpu_total_usage_percent) as min_cpu,
    MAX(cpu_total_usage_percent) as max_cpu,
    COUNT(CASE WHEN cpu_total_usage_percent > 100 THEN 1 END) as invalid_cpu_over_100,
    COUNT(CASE WHEN cpu_total_usage_percent < 0 THEN 1 END) as invalid_cpu_negative
FROM timeseries_db.cpu_metrics
WHERE timestamp >= current_timestamp - interval '1' hour;
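
One caveat with the gap query: GROUP BY returns a row only for hours that contain data, so a missing hour appears as an absent row rather than as a zero count. A minimal sketch of the post-processing step follows, assuming the hour column has already been fetched into Python datetime values by whatever client is in use; the function name is illustrative.

```python
from datetime import datetime, timedelta


def missing_hours(present_hours, start, end):
    """Return every hourly bucket in [start, end) with no rows in the result."""
    present = set(present_hours)
    gaps = []
    current = start
    while current < end:
        if current not in present:
            gaps.append(current)
        current += timedelta(hours=1)
    return gaps


if __name__ == "__main__":
    window_start = datetime(2026, 4, 5, 0)
    window_end = datetime(2026, 4, 5, 6)
    # Hours returned by the query; 02:00 and 04:00 are absent.
    returned = [datetime(2026, 4, 5, h) for h in (0, 1, 3, 5)]
    for gap in missing_hours(returned, window_start, window_end):
        print("no data for hour starting", gap.isoformat())
```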

Performance Optimisation

Establishing a functioning pipeline is one task; achieving good performance at scale is another. The key tuning parameters are discussed below.

Telegraf Buffer Tuning

The two most important Telegraf settings are metric_batch_size and metric_buffer_limit:

metric_batch_size: the number of metrics sent to the output plugin at a time. Larger batches reduce S3 API calls but increase memory usage and latency.
metric_buffer_limit: the maximum number of metrics held in memory. If the output is slow, metrics queue at this point; once the buffer is full, new metrics are dropped.
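
The interaction between the two settings can be estimated with simple arithmetic. The sketch below is a rough model rather than a description of Telegraf's internals: it assumes a steady ingest rate and an output that is completely stalled, and the helper names are illustrative.

```python
import math


def buffer_headroom_minutes(metric_buffer_limit, metrics_per_minute):
    """Minutes a stalled output can be tolerated before metrics are dropped."""
    if metrics_per_minute <= 0:
        raise ValueError("metrics_per_minute must be positive")
    return metric_buffer_limit / metrics_per_minute


def writes_to_drain(buffered_metrics, metric_batch_size):
    """Number of batch writes needed to empty the buffer once the output recovers."""
    return math.ceil(buffered_metrics / metric_batch_size)


if __name__ == "__main__":
    # A 200,000-metric buffer fed at 100,000 metrics per minute.
    print(buffer_headroom_minutes(200_000, 100_000), "minutes of headroom")
    print(writes_to_drain(200_000, 10_000), "writes to drain a full buffer")
```

If the headroom is shorter than the longest S3 or network interruption that must be survived, metric_buffer_limit should be raised accordingly.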

Recommended Settings by Data Volume

Setting	Small (<10K metrics/min)	Medium (10K-100K/min)	Large (>100K/min)
`metric_batch_size`	5,000	10,000	50,000
`metric_buffer_limit`	50,000	200,000	1,000,000
`flush_interval`	10m	5m	1m
`collection_interval`	1h	15m	5m
Target S3 file size	64-128 MB	128-256 MB	256-512 MB
Partition granularity	Day	Day	Hour
Telegraf RAM estimate	128 MB	512 MB	2-4 GB
Compaction frequency	Daily	Every 6 hours	Every 1-2 hours

Iceberg Compaction

Small files impair Iceberg performance. Compaction should be scheduled to merge them:

-- Run compaction via Athena (Athena v3 with Iceberg support)
OPTIMIZE timeseries_db.cpu_metrics REWRITE DATA USING BIN_PACK;

-- Or via Spark (more control over target file size)
-- In a Glue ETL job or EMR Spark session:
CALL glue_catalog.system.rewrite_data_files(
    table => 'timeseries_db.cpu_metrics',
    options => map(
        'target-file-size-bytes', '134217728',  -- 128MB
        'min-file-size-bytes', '67108864',       -- 64MB
        'max-file-size-bytes', '268435456'       -- 256MB
    )
);

-- Expire old snapshots to reclaim storage
CALL glue_catalog.system.expire_snapshots(
    table => 'timeseries_db.cpu_metrics',
    older_than => TIMESTAMP '2026-03-01 00:00:00',
    retain_last => 10
);

-- Remove orphan files
CALL glue_catalog.system.remove_orphan_files(
    table => 'timeseries_db.cpu_metrics',
    older_than => TIMESTAMP '2026-03-01 00:00:00'
);
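
The maintenance calls above use a fixed cutoff timestamp, which has to move forward when the job is scheduled. One way to handle this, sketched below, is to render the statement from a retention period at run time. The function name and the seven-day retention are assumptions for illustration; the catalogue and procedure names are those used above.

```python
from datetime import datetime, timedelta, timezone


def expire_snapshots_sql(table, retention_days, retain_last, now=None):
    """Render an expire_snapshots call with a cutoff relative to `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d %H:%M:%S")
    return (
        "CALL glue_catalog.system.expire_snapshots(\n"
        f"    table => '{table}',\n"
        f"    older_than => TIMESTAMP '{cutoff}',\n"
        f"    retain_last => {retain_last}\n"
        ")"
    )


if __name__ == "__main__":
    # Keep seven days of snapshots and never fewer than the last ten.
    print(expire_snapshots_sql("timeseries_db.cpu_metrics", 7, 10))
```

The resulting string can be passed to spark.sql() inside the scheduled Glue or EMR job.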

Partitioning Best Practices for Time-Series Data

Partition by day for most workloads. This produces a manageable number of partitions and files.
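Add a secondary partition on a low-cardinality dimension such as measurement when specific measurements are queried frequently. High-cardinality columns such as hostname make poor partition keys because they multiply the number of small files.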
Avoid over-partitioning. Partitioning by minute produces millions of tiny files that destroy performance.
Use Iceberg's hidden partitioning with day(timestamp) rather than creating explicit partition columns. Queries on timestamp then automatically trigger partition pruning without users needing to be aware of partitions.
Monitor partition sizes. If partitions routinely hold only a few files of under 10MB each, even after compaction, the partitioning is too granular.
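
The 10MB guideline can be checked mechanically. The sketch below assumes that file sizes per partition have already been collected, for example from an S3 listing or from Iceberg's files metadata table; the input structure and function name are illustrative.

```python
def undersized_partitions(file_sizes_by_partition, min_file_mb=10):
    """Return partitions in which every data file is below the size threshold."""
    threshold = min_file_mb * 1024 * 1024
    flagged = []
    for partition, sizes in file_sizes_by_partition.items():
        # An empty partition has nothing to flag.
        if sizes and max(sizes) < threshold:
            flagged.append(partition)
    return sorted(flagged)


if __name__ == "__main__":
    mb = 1024 * 1024
    listing = {
        "day=2026-04-01": [140 * mb, 128 * mb],
        "day=2026-04-02": [2 * mb, 3 * mb, 1 * mb],
    }
    print(undersized_partitions(listing))
```

Partitions that are flagged repeatedly after compaction has run are candidates for a coarser partition transform.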

Cost Analysis

The table below puts approximate figures on the comparison. The cost savings from moving time-series data from InfluxDB to Iceberg on S3 can be substantial, particularly at scale.

Data Volume	InfluxDB Cloud (storage + queries)	S3 + Iceberg + Athena	Monthly Savings
100 GB	~$200/mo (storage) + ~$50/mo (queries)	~$2.30 (S3) + ~$5 (Athena) + ~$10 (Glue)	~$233/mo (93% savings)
1 TB	~$2,000/mo + ~$200/mo	~$23 (S3) + ~$25 (Athena) + ~$20 (Glue)	~$2,132/mo (97% savings)
10 TB	~$20,000/mo + ~$500/mo	~$230 (S3) + ~$100 (Athena) + ~$50 (Glue)	~$20,120/mo (98% savings)

Caution: These cost estimates are approximations based on published pricing as of early 2026. InfluxDB Cloud costs vary by plan and usage patterns. Athena costs depend on query frequency and data scanned (Parquet with partition pruning substantially reduces scan costs). Self-hosted InfluxDB costs depend on individual infrastructure. A bespoke cost analysis with actual workload patterns should always be conducted before migration decisions are made.

Additional costs to consider include the following:

Telegraf compute: Runs on existing infrastructure. Minimal CPU and RAM are required for most workloads.
S3 API costs: PUT requests at $0.005 per 1,000. With batching, this is typically under $10 per month.
Glue Crawler: $0.44 per DPU-hour. A daily crawl typically costs $1 to $5 per month.
Glue ETL: $0.44 per DPU-hour. A daily ten-minute job with two DPUs costs approximately $4.40 per month (2 DPUs × 1/6 hour × $0.44 × 30 days).
Data transfer: Free within the same AWS region; cross-region transfer adds $0.02 per GB.
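
The line items above can be reproduced with a few lines of arithmetic, which makes it easy to substitute actual workload figures. The Glue and PUT prices are those quoted in the list; the S3 storage rate of $0.023 per GB-month is inferred from the table (about $2.30 for 100 GB) and should be treated as an assumption, as should the 30-day month.

```python
# Unit prices quoted in the list above (USD).
GLUE_USD_PER_DPU_HOUR = 0.44
S3_PUT_USD_PER_1000 = 0.005
# Assumption: storage rate implied by the table (~$2.30 for 100 GB).
S3_STORAGE_USD_PER_GB_MONTH = 0.023


def s3_storage_monthly(gigabytes):
    """Monthly S3 storage cost for the given volume."""
    return gigabytes * S3_STORAGE_USD_PER_GB_MONTH


def s3_put_monthly(put_requests):
    """Monthly cost of the given number of PUT requests."""
    return put_requests / 1000 * S3_PUT_USD_PER_1000


def glue_monthly(dpus, minutes_per_run, runs_per_month=30):
    """Monthly cost of a Glue crawler or ETL job billed per DPU-hour."""
    return dpus * (minutes_per_run / 60) * GLUE_USD_PER_DPU_HOUR * runs_per_month


if __name__ == "__main__":
    print(f"S3 storage, 1,000 GB:          ${s3_storage_monthly(1000):.2f}")
    print(f"S3 PUTs, 1,000,000 requests:   ${s3_put_monthly(1_000_000):.2f}")
    print(f"Glue, 2 DPUs x 10 min daily:   ${glue_monthly(2, 10):.2f}")
```

Actual bills also depend on region, minimum billing durations, and request mix, so these functions are a starting point for an estimate rather than a quote.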

The break-even point is almost immediate. Even at 100GB, savings of more than $230 per month accrue from the move to S3 and Iceberg. The pipeline infrastructure (Telegraf, Glue) costs less than $30 per month for most workloads.

Concluding Remarks

Building a data pipeline from InfluxDB to Apache Iceberg through Telegraf is not only technically feasible but also a compelling architecture that addresses real problems. InfluxDB continues to perform its principal function—real-time monitoring and dashboards—while historical data are offloaded to a lakehouse that costs 90 to 98 per cent less and provides SQL analytics, ML pipelines, and proper data governance.

The architecture comprises the following elements:

Telegraf input plugins that retrieve data from InfluxDB v1.x or v2.x using four methods, ranging from simple pull-based queries to real-time push-based listeners.
Telegraf processors that transform InfluxDB's tag/field model into a flat columnar schema suitable for Iceberg, with type conversion, field renaming, computed fields, and date partitioning.
S3 output with Hive-style partitioning that lands data in formats AWS Glue can discover and catalogue.
Iceberg table creation via Athena DDL or Glue Crawlers, with appropriate partitioning for time-series workloads.
Automated ingestion using Glue ETL jobs, Athena INSERT INTO, Lambda triggers, or Spark on EMR.
A complete, production-ready telegraf.conf that can be deployed with minimal modification.

For organisations requiring real-time pattern detection on streaming data before it lands in the lakehouse, combining this pipeline with complex event processing using Apache Flink permits in-flight anomaly detection while still archiving all data to Iceberg.

The principal merit of this architecture is its modularity. It is possible to begin simply, with JSON files on S3 and a Glue Crawler, and progress to Parquet with Spark streaming as requirements grow. Telegraf's plugin architecture permits the substitution of inputs and outputs without rewriting transformation logic, and Iceberg's partition evolution permits changes to partitioning strategy without rewriting any historical data.

For organisations with terabytes of time-series data in InfluxDB and rising storage bills, this pipeline provides a viable migration path. It can be set up over a weekend, validated with a week of dual-writing, and then used as the basis for reducing InfluxDB retention policies.

April 5, 2026

Complex Event Processing with Apache Flink: Building Real-Time CEP Pipelines from Scratch

Summary

What this post covers: A production-style guide to building Complex Event Processing pipelines with Apache Flink, including the Pattern API, three end-to-end Java examples (credit card fraud, IoT anomaly, stock pattern detection), event-time handling, Kafka connectors, deployment, and performance tuning.

Key insights:

CEP is fundamentally different from batch or per-event stream processing: it maintains stateful NFA pattern buffers across event sequences, which is why batch jobs and Kafka Streams cannot replace it for fraud detection or multi-step anomaly correlation.
Pattern contiguity choice dominates correctness and cost: use next() for strict sequences, followedBy() for relaxed matching, and avoid followedByAny() except when truly needed because it triggers combinatorial state growth.
Always drive CEP on event time with proper watermark strategies—processing time produces incorrect matches in any real system where events arrive out of order, and this single mistake breaks more production CEP jobs than any other.
Apply patterns to keyed streams so matches stay scoped to a logical entity (user, sensor, symbol); patterns on non-keyed streams quickly explode in state size and produce nonsensical cross-entity matches.
CEP is inherently stateful, so production readiness depends on RocksDB state backend, short time windows, TimedOutPartialMatchHandler to catch incomplete sequences, and active monitoring of state size to prevent runaway memory growth.

Main topics: What is Complex Event Processing (CEP)?, Why Apache Flink for CEP?, Setting Up Your Flink CEP Project, Understanding Flink CEP Pattern API, Hands-On Credit Card Fraud Detection, Hands-On IoT Sensor Anomaly Detection, Hands-On Stock Market Pattern Detection, Advanced CEP Techniques, Event Time vs Processing Time, Connecting to Real Data Sources, Deploying and Monitoring, Performance Optimization, Common Pitfalls and Troubleshooting, Final Thoughts, References.

Consider a scenario in which a single credit card is used at a gas station in Houston at 2:13 PM, and forty seconds later the same card number appears at an electronics store in Tokyo. Within those forty seconds, a payment-fraud system must ingest both events, correlate them across millions of concurrent transaction streams, recognise the physical impossibility, and emit a fraud alert before the Tokyo merchant finishes printing the receipt. The scenario is far from hypothetical. Visa states that its network can handle more than 65,000 transaction messages per second, and the speed of fraudulent activity continues to increase year on year. Traditional batch jobs executed overnight are of little value in such conditions. Complex Event Processing is required, and Apache Flink is among the strongest engines on which to implement it.

This guide presents the construction of real-time CEP pipelines from first principles. Rather than illustrative fragments, it provides complete, compilable Java code suitable for adaptation to production fraud detection, IoT monitoring, and financial market analysis. By the end of the guide, the reader will understand Flink’s CEP library in sufficient depth to design pattern-matching pipelines for any domain.

What is Complex Event Processing (CEP)?

Complex Event Processing is a methodology for detecting meaningful patterns across streams of events in real time. The defining term is patterns. Simple stream processing typically filters or transforms individual events, for example by returning all transactions above $1,000. CEP extends this scope by examining sequences, combinations, and temporal relationships across multiple events.

Simple Events vs Complex Events

A simple event is a single, atomic occurrence such as a temperature reading, a stock trade, or a log entry. A complex event is a higher-level pattern derived from multiple simple events. For example:

Simple event: “User #4821 made a $50 purchase at Starbucks.”
Complex event: “User #4821 made three purchases totalling over $2,000 within five minutes from three different countries.” This complex event exists only because a CEP engine recognised the pattern across the underlying simple events.

CEP Compared with Traditional Processing

Understanding where CEP fits relative to batch and stream processing is important:

Feature	Batch Processing	Stream Processing	CEP
Latency	Minutes to hours	Milliseconds to seconds	Milliseconds to seconds
Data Model	Bounded datasets	Unbounded streams	Unbounded streams with pattern state
Pattern Detection	Post-hoc analysis	Per-event transformations	Multi-event temporal patterns
State Management	Minimal (reprocess from scratch)	Windowed aggregations	Pattern match buffers with NFA
Use Case Example	Monthly reports	Real-time dashboards	Fraud detection, anomaly sequences
Tools	Spark, Hadoop MapReduce	Kafka Streams, Flink DataStream	Flink CEP, Esper, Siddhi

Real-World CEP Applications

CEP is not a niche technology. It underpins a number of important systems across industries:

Fraud Detection: Banks and payment processors use CEP to identify fraudulent transaction patterns in real time, including velocity checks, geographic impossibility, and unusual merchant categories.
IoT Monitoring: Manufacturing plants and smart buildings use CEP to detect equipment failure sequences before catastrophic breakdowns occur. For the data infrastructure behind IoT monitoring, see the guide on managing metadata and time-series data for facility sensor signals.
Algorithmic Trading: Hedge funds detect price-volume patterns across multiple securities within microsecond windows in order to trigger automated trades.
Network Security: SIEM platforms use CEP to correlate firewall logs, authentication events, and data transfer patterns and thereby detect multi-stage cyberattacks.
Supply Chain: Real-time tracking of shipment events allows operators to detect delays, rerouting needs, or customs anomalies before they cascade.

Why Apache Flink for CEP?

Several stream processing engines exist, but Flink occupies a distinct position for CEP workloads. The reasons are discussed below.

Flink’s Architecture for CEP

Flink was designed as a streaming-first engine. Unlike Spark, which added streaming capabilities to a batch framework, Flink treats streams as the fundamental data model. The distinction is consequential for CEP for several reasons:

DataStream API: The core API operates on unbounded streams and offers fine-grained control over event processing, keying, and windowing.
Event Time Processing: Flink natively supports event time semantics with watermarks, a feature that is essential for CEP. Matching patterns across events requires reasoning about when events actually occurred, not when they arrived at the processing system.
Watermarks: The watermark mechanism tracks the progress of event time through the stream and enables correct handling of out-of-order events, which are a routine occurrence in distributed systems.
Flink CEP Library (flink-cep): Flink ships a dedicated CEP library that implements a Non-deterministic Finite Automaton (NFA) for pattern matching. Patterns are defined declaratively, and the engine handles the associated state management internally.
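Exactly-Once Semantics: The checkpointing mechanism provides exactly-once state consistency, so pattern state is neither lost nor double-counted after a failure. Delivering each fraud alert exactly once end to end additionally requires a transactional or idempotent sink.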
Low Latency: Flink processes events within milliseconds rather than in micro-batches. For CEP workloads, where rapid pattern matching is essential, this property is non-negotiable.

Flink CEP Compared with Alternative Engines

Feature	Flink CEP	Kafka Streams	Esper	Spark Structured Streaming	Kinesis Analytics
Pattern Matching	Built-in NFA-based	Manual (no CEP library)	EPL query language	No native CEP	SQL-based only
Latency	True streaming (ms)	True streaming (ms)	In-memory (ms)	Micro-batch (100ms+)	Near real-time
Scalability	Distributed cluster	Embedded scaling	Single JVM	Distributed cluster	AWS managed
Exactly-Once	Yes	Yes	No	Yes	Yes
Fault Tolerance	Checkpointing + savepoints	Changelog topics	Limited	Checkpointing	Managed snapshots
Event Time Support	Native watermarks	Timestamp extractors	Limited	Native watermarks	Limited
Best For	Complex temporal patterns at scale	Simple event-driven microservices	Prototyping, embedded CEP	Batch + streaming hybrid	AWS-native SQL analytics

Key Takeaway: For workloads that require detection of complex temporal patterns across high-volume event streams with exactly-once guarantees, Flink CEP is the strongest choice. Kafka Streams is well suited to simpler event-driven architectures but lacks a built-in pattern matching engine. Esper offers strong CEP semantics yet does not scale horizontally. For a more detailed treatment of Kafka as the event backbone, see the Apache Kafka multivariate time-series engine guide.

Setting Up Your Flink CEP Project

Prerequisites

Before any code is written, the following components should be in place:

Java 11 or 17 (Flink 1.18+ supports both; Java 17 is recommended for new projects)
Maven 3.8+ or Gradle 7+
An IDE—IntelliJ IDEA with the Flink plugin is well suited
Docker (optional, for running Kafka and Flink locally)

Project Structure

The following layout is used throughout this guide:

flink-cep-pipeline/
├── pom.xml
├── src/main/java/com/example/cep/
│   ├── FlinkCEPApplication.java
│   ├── events/
│   │   ├── Transaction.java
│   │   ├── SensorReading.java
│   │   └── StockTick.java
│   ├── patterns/
│   │   ├── FraudPatterns.java
│   │   ├── IoTPatterns.java
│   │   └── StockPatterns.java
│   ├── processors/
│   │   ├── FraudAlertProcessor.java
│   │   ├── AnomalyAlertProcessor.java
│   │   └── TradingSignalProcessor.java
│   └── sources/
│       └── KafkaSourceBuilder.java
└── src/main/resources/
    └── log4j2.properties

Maven pom.xml

The following Maven configuration contains all required Flink CEP dependencies:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>flink-cep-pipeline</artifactId>
    <version>1.0.0</version>
    <packaging>jar</packaging>

    <properties>
        <flink.version>1.18.1</flink.version>
        <java.version>17</java.version>
        <kafka.version>3.6.1</kafka.version>
        <maven.compiler.source>${java.version}</maven.compiler.source>
        <maven.compiler.target>${java.version}</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- Flink Core -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- Flink CEP Library -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-cep</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <!-- Flink Kafka Connector -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka</artifactId>
            <version>3.1.0-1.18</version>
        </dependency>

        <!-- Flink JSON Format -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-json</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <!-- Flink Clients (for local execution) -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- Jackson for JSON serialization -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.16.1</version>
        </dependency>

        <!-- SLF4J + Log4j2 -->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j-impl</artifactId>
            <version>2.22.1</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>2.22.1</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.22.1</version>
            <scope>runtime</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.5.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals><goal>shade</goal></goals>
                        <configuration>
                            <transformers>
                                <transformer implementation=
                                    "org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.example.cep.FlinkCEPApplication</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

Gradle Alternative

For Gradle users, the equivalent build.gradle.kts is shown below:

plugins {
    java
    id("com.github.johnrengelman.shadow") version "8.1.1"
}

repositories {
    mavenCentral()
}

java {
    sourceCompatibility = JavaVersion.VERSION_17
    targetCompatibility = JavaVersion.VERSION_17
}

val flinkVersion = "1.18.1"

dependencies {
    compileOnly("org.apache.flink:flink-streaming-java:$flinkVersion")
    compileOnly("org.apache.flink:flink-clients:$flinkVersion")
    implementation("org.apache.flink:flink-cep:$flinkVersion")
    implementation("org.apache.flink:flink-connector-kafka:3.1.0-1.18")
    implementation("org.apache.flink:flink-json:$flinkVersion")
    implementation("com.fasterxml.jackson.core:jackson-databind:2.16.1")
    runtimeOnly("org.apache.logging.log4j:log4j-slf4j-impl:2.22.1")
    runtimeOnly("org.apache.logging.log4j:log4j-core:2.22.1")
}

Tip: The flink-streaming-java and flink-clients dependencies are marked as provided (Maven) or compileOnly (Gradle) because the Flink cluster already includes them. When running locally in an IDE, add them to the run configuration’s classpath.

Understanding Flink CEP Pattern API

The Flink CEP library provides a declarative API for defining event patterns. Internally, the library compiles each pattern definition into a Non-deterministic Finite Automaton (NFA) that matches patterns efficiently against the incoming event stream. Each major concept is examined in turn below.

Pattern Basics

Every pattern starts with Pattern.begin() and chains additional states:

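// Imports used by the fragments in this section; Event is a placeholder event type
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;

// Strict contiguity: events must be directly adjacent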
Pattern<Event, ?> strict = Pattern.<Event>begin("start")
    .where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event event) {
            return event.getType().equals("login_failed");
        }
    })
    .next("second")  // MUST be the very next event
    .where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event event) {
            return event.getType().equals("login_failed");
        }
    })
    .next("third")
    .where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event event) {
            return event.getType().equals("login_failed");
        }
    });

// Relaxed contiguity: allows non-matching events in between
Pattern<Event, ?> relaxed = Pattern.<Event>begin("start")
    .where(/* ... */)
    .followedBy("end")  // matching events can have other events between them
    .where(/* ... */);

// Non-deterministic relaxed contiguity:
// matches all possible combinations
Pattern<Event, ?> nonDeterministic = Pattern.<Event>begin("start")
    .where(/* ... */)
    .followedByAny("end")  // considers ALL matching events, not just first
    .where(/* ... */);

Contiguity: Strict, Relaxed, Non-Deterministic

Contiguity is one of the most important concepts in Flink CEP. Consider a scenario in which the event stream contains A, C, B1, B2 and the pattern is “A followed by B”:

next()—Strict: No match. C appears between A and B1, which breaks strict contiguity.
followedBy()—Relaxed: Matches {A, B1}. C is skipped, and the first matching B is selected.
followedByAny()—Non-deterministic relaxed: Matches both {A, B1} and {A, B2}, since all possible matching events are considered.
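
The three outcomes can be reproduced with a toy model. The Python sketch below is not Flink code and does not implement Flink's NFA: it handles only a two-step pattern and ignores time windows and after-match skip strategies. It exists solely to make the difference between the three contiguity modes tangible on the A, C, B1, B2 stream.

```python
def match_pairs(events, is_first, is_second, mode):
    """Toy matcher for a two-step pattern under a given contiguity mode."""
    matches = []
    for index, event in enumerate(events):
        if not is_first(event):
            continue
        rest = events[index + 1:]
        if mode == "next":
            # Strict: only the immediately following event may match.
            if rest and is_second(rest[0]):
                matches.append((event, rest[0]))
        elif mode == "followedBy":
            # Relaxed: skip non-matching events, take the first match.
            for candidate in rest:
                if is_second(candidate):
                    matches.append((event, candidate))
                    break
        elif mode == "followedByAny":
            # Non-deterministic relaxed: every later match counts.
            matches.extend((event, c) for c in rest if is_second(c))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return matches


if __name__ == "__main__":
    stream = ["A", "C", "B1", "B2"]
    for mode in ("next", "followedBy", "followedByAny"):
        result = match_pairs(
            stream, lambda e: e == "A", lambda e: e.startswith("B"), mode
        )
        print(mode, result)
```

The growth in the followedByAny() result as more B events arrive is the same effect that drives state growth in a real CEP job.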

Quantifiers

// Exactly N times
Pattern<Event, ?> exactly3 = Pattern.<Event>begin("failures")
    .where(condition)
    .times(3);  // exactly 3 matching events

// N or more times
Pattern<Event, ?> atLeast3 = Pattern.<Event>begin("failures")
    .where(condition)
    .timesOrMore(3);  // 3 or more matching events

// Range
Pattern<Event, ?> range = Pattern.<Event>begin("failures")
    .where(condition)
    .times(2, 5);  // between 2 and 5 matching events

// One or more (greedy)
Pattern<Event, ?> oneOrMore = Pattern.<Event>begin("failures")
    .where(condition)
    .oneOrMore()
    .greedy();  // match as many as possible

// Optional
Pattern<Event, ?> withOptional = Pattern.<Event>begin("start")
    .where(startCondition)
    .next("middle")
    .where(middleCondition)
    .optional()  // this state may or may not match
    .next("end")
    .where(endCondition);

Conditions

// Simple condition — checks current event only
.where(new SimpleCondition<Event>() {
    @Override
    public boolean filter(Event event) {
        return event.getAmount() > 1000.0;
    }
})

// Iterative condition — can reference previously matched events
.where(new IterativeCondition<Event>() {
    @Override
    public boolean filter(Event event, Context<Event> ctx) throws Exception {
        // Compare with previously matched event
        for (Event prev : ctx.getEventsForPattern("start")) {
            if (!event.getLocation().equals(prev.getLocation())) {
                return true;  // different location than start event
            }
        }
        return false;
    }
})

// OR condition
.where(new SimpleCondition<Event>() {
    @Override
    public boolean filter(Event event) {
        return event.getType().equals("withdrawal");
    }
})
.or(new SimpleCondition<Event>() {
    @Override
    public boolean filter(Event event) {
        return event.getType().equals("transfer");
    }
})

// Until condition (stop condition for looping patterns)
.oneOrMore()
.until(new SimpleCondition<Event>() {
    @Override
    public boolean filter(Event event) {
        return event.getType().equals("logout");
    }
})

Time Constraints

// The entire pattern must complete within 5 minutes
Pattern<Event, ?> timedPattern = Pattern.<Event>begin("first")
    .where(/* ... */)
    .followedBy("second")
    .where(/* ... */)
    .followedBy("third")
    .where(/* ... */)
    .within(Time.minutes(5));

Caution: The within() constraint applies to the entire pattern and is measured from the first matching event. If the first event matches at T=0 and within(Time.minutes(5)) is configured, the entire pattern must complete before T=5min. Partially matched patterns that time out are discarded, although they may be captured via timeout handling, which is discussed later.
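
The rule can be stated as a one-line check. The Python sketch below is a simplified model rather than Flink code: it takes the event timestamps of a candidate match in milliseconds and tests whether the last one falls inside the window measured from the first. Treating the boundary as exclusive follows the "before T=5min" wording above and is an assumption of this sketch.

```python
FIVE_MINUTES_MS = 5 * 60 * 1000


def completes_within(event_times_ms, window_ms):
    """True if the last event occurs less than window_ms after the first."""
    if not event_times_ms:
        return False
    return event_times_ms[-1] - event_times_ms[0] < window_ms


if __name__ == "__main__":
    # Three events: the third arrives 4 min 59 s after the first.
    print(completes_within([0, 60_000, 299_000], FIVE_MINUTES_MS))
    # Same pattern, but the third event arrives at exactly 5 min.
    print(completes_within([0, 60_000, 300_000], FIVE_MINUTES_MS))
```

Note that the gap between consecutive events is irrelevant here; only the distance from the first matched event counts.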

Hands-On: Credit Card Fraud Detection Pipeline

The first complete CEP pipeline considered here is a credit card fraud detection system. The use case is canonical for CEP, and three distinct fraud patterns are implemented.

The Transaction Event Class

package com.example.cep.events;

public class Transaction implements java.io.Serializable {
    private String transactionId;
    private String userId;
    private double amount;
    private long timestamp;
    private String location;
    private String merchantCategory;
    private String cardNumber;

    // Default constructor for serialization
    public Transaction() {}

    public Transaction(String transactionId, String userId, double amount,
                       long timestamp, String location, String merchantCategory,
                       String cardNumber) {
        this.transactionId = transactionId;
        this.userId = userId;
        this.amount = amount;
        this.timestamp = timestamp;
        this.location = location;
        this.merchantCategory = merchantCategory;
        this.cardNumber = cardNumber;
    }

    // Getters and setters
    public String getTransactionId() { return transactionId; }
    public void setTransactionId(String transactionId) { this.transactionId = transactionId; }
    public String getUserId() { return userId; }
    public void setUserId(String userId) { this.userId = userId; }
    public double getAmount() { return amount; }
    public void setAmount(double amount) { this.amount = amount; }
    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
    public String getLocation() { return location; }
    public void setLocation(String location) { this.location = location; }
    public String getMerchantCategory() { return merchantCategory; }
    public void setMerchantCategory(String mc) { this.merchantCategory = mc; }
    public String getCardNumber() { return cardNumber; }
    public void setCardNumber(String cardNumber) { this.cardNumber = cardNumber; }

    @Override
    public String toString() {
        return String.format("Transaction{id=%s, user=%s, amount=%.2f, loc=%s, time=%d}",
            transactionId, userId, amount, location, timestamp);
    }
}

The Fraud Alert Class

package com.example.cep.events;

import java.util.List;

public class FraudAlert implements java.io.Serializable {
    private String alertId;
    private String userId;
    private String patternType;
    private String description;
    private List<Transaction> matchedTransactions;
    private long detectedAt;

    public FraudAlert(String alertId, String userId, String patternType,
                      String description, List<Transaction> matchedTransactions) {
        this.alertId = alertId;
        this.userId = userId;
        this.patternType = patternType;
        this.description = description;
        this.matchedTransactions = matchedTransactions;
        this.detectedAt = System.currentTimeMillis();
    }

    // Getters
    public String getAlertId() { return alertId; }
    public String getUserId() { return userId; }
    public String getPatternType() { return patternType; }
    public String getDescription() { return description; }
    public List<Transaction> getMatchedTransactions() { return matchedTransactions; }
    public long getDetectedAt() { return detectedAt; }

    @Override
    public String toString() {
        return String.format("FRAUD ALERT [%s] User: %s | Pattern: %s | %s | Transactions: %d",
            alertId, userId, patternType, description, matchedTransactions.size());
    }
}

Defining Fraud Patterns

The core logic of the system is captured by three fraud detection patterns, defined below:

package com.example.cep.patterns;

import com.example.cep.events.Transaction;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FraudPatterns {

    /**
     * Pattern 1: Geographic Impossibility
     * Three transactions over $500 within 5 minutes from different locations.
     * Spending observed in New York, then London, then Tokyo within 5 minutes
     * is highly indicative of fraudulent activity.
     */
    public static Pattern<Transaction, ?> geographicImpossibility() {
        return Pattern.<Transaction>begin("first")
            .where(new SimpleCondition<Transaction>() {
                @Override
                public boolean filter(Transaction tx) {
                    return tx.getAmount() > 500.0;
                }
            })
            .followedBy("second")
            .where(new IterativeCondition<Transaction>() {
                @Override
                public boolean filter(Transaction tx, Context<Transaction> ctx) throws Exception {
                    if (tx.getAmount() <= 500.0) return false;
                    for (Transaction first : ctx.getEventsForPattern("first")) {
                        if (!tx.getLocation().equals(first.getLocation())) {
                            return true;
                        }
                    }
                    return false;
                }
            })
            .followedBy("third")
            .where(new IterativeCondition<Transaction>() {
                @Override
                public boolean filter(Transaction tx, Context<Transaction> ctx) throws Exception {
                    if (tx.getAmount() <= 500.0) return false;
                    for (Transaction first : ctx.getEventsForPattern("first")) {
                        for (Transaction second : ctx.getEventsForPattern("second")) {
                            if (!tx.getLocation().equals(first.getLocation())
                                && !tx.getLocation().equals(second.getLocation())) {
                                return true;
                            }
                        }
                    }
                    return false;
                }
            })
            .within(Time.minutes(5));
    }

    /**
     * Pattern 2: Card Testing Attack
     * A small "test" transaction ($0.01–$5.00) followed by a large transaction
     * ($1000+) within 1 minute. Fraudsters frequently test stolen cards with
     * very small purchases before attempting larger ones.
     */
    public static Pattern<Transaction, ?> cardTestingAttack() {
        return Pattern.<Transaction>begin("test_charge")
            .where(new SimpleCondition<Transaction>() {
                @Override
                public boolean filter(Transaction tx) {
                    return tx.getAmount() >= 0.01 && tx.getAmount() <= 5.0;
                }
            })
            .followedBy("big_charge")
            .where(new SimpleCondition<Transaction>() {
                @Override
                public boolean filter(Transaction tx) {
                    return tx.getAmount() >= 1000.0;
                }
            })
            .within(Time.minutes(1));
    }

    /**
     * Pattern 3: Transaction Velocity
     * Five or more transactions within 2 minutes. Even legitimate users
     * rarely conduct this many purchases in such a short interval.
     */
    public static Pattern<Transaction, ?> highVelocity() {
        return Pattern.<Transaction>begin("transactions")
            .where(new SimpleCondition<Transaction>() {
                @Override
                public boolean filter(Transaction tx) {
                    return tx.getAmount() > 0;
                }
            })
            .timesOrMore(5)
            .within(Time.minutes(2));
    }
}
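The velocity rule in Pattern 3 can also be reasoned about outside Flink as a sliding window over sorted timestamps. The following is a minimal plain-Java sketch under that assumption. The class and method names are hypothetical, and it reports only whether a burst exists, whereas the CEP pattern emits the matched events.

```java
/** Plain-Java illustration of the "N or more events within a window" rule. */
class VelocitySketch {

    /**
     * Returns true when at least minEvents timestamps fall inside a span
     * shorter than windowMillis. The input must be sorted in ascending order.
     */
    static boolean hasBurst(long[] sortedMillis, int minEvents, long windowMillis) {
        for (int i = 0; i + minEvents - 1 < sortedMillis.length; i++) {
            // Span between the i-th event and the event minEvents - 1 positions later.
            long span = sortedMillis[i + minEvents - 1] - sortedMillis[i];
            if (span < windowMillis) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long twoMinutes = 120_000L;

        // Five purchases 20 seconds apart: the span is 80 seconds, so the rule fires.
        long[] rapid = {0L, 20_000L, 40_000L, 60_000L, 80_000L};
        System.out.println(hasBurst(rapid, 5, twoMinutes)); // true

        // Five purchases 40 seconds apart: the span is 160 seconds, so it does not.
        long[] spaced = {0L, 40_000L, 80_000L, 120_000L, 160_000L};
        System.out.println(hasBurst(spaced, 5, twoMinutes)); // false
    }
}
```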

Processing Matched Patterns

package com.example.cep.processors;

import com.example.cep.events.FraudAlert;
import com.example.cep.events.Transaction;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.util.Collector;

import java.util.*;

public class FraudAlertProcessor
        extends PatternProcessFunction<Transaction, FraudAlert> {

    private final String patternType;

    public FraudAlertProcessor(String patternType) {
        this.patternType = patternType;
    }

    @Override
    public void processMatch(Map<String, List<Transaction>> match,
                             Context ctx,
                             Collector<FraudAlert> out) {
        // Collect all matched transactions from all pattern states
        List<Transaction> allTransactions = new ArrayList<>();
        match.values().forEach(allTransactions::addAll);

        // Extract user ID from first transaction
        String userId = allTransactions.get(0).getUserId();

        // Build a description
        String description = buildDescription(match);

        // Generate alert
        String alertId = UUID.randomUUID().toString();
        FraudAlert alert = new FraudAlert(
            alertId, userId, patternType, description, allTransactions
        );

        out.collect(alert);
    }

    private String buildDescription(Map<String, List<Transaction>> match) {
        StringBuilder sb = new StringBuilder();
        sb.append("Matched pattern '").append(patternType).append("': ");

        double total = 0;
        Set<String> locations = new HashSet<>();
        int count = 0;

        for (List<Transaction> txList : match.values()) {
            for (Transaction tx : txList) {
                total += tx.getAmount();
                locations.add(tx.getLocation());
                count++;
            }
        }

        sb.append(count).append(" transactions, ");
        sb.append(String.format("total $%.2f, ", total));
        sb.append("locations: ").append(locations);

        return sb.toString();
    }
}

The Complete Fraud Detection Pipeline

The pipeline below is wired together end to end, from Kafka source to fraud alert output:

package com.example.cep;

import com.example.cep.events.FraudAlert;
import com.example.cep.events.Transaction;
import com.example.cep.patterns.FraudPatterns;
import com.example.cep.processors.FraudAlertProcessor;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import com.fasterxml.jackson.databind.ObjectMapper;

import java.time.Duration;

public class FraudDetectionPipeline {

    public static void main(String[] args) throws Exception {
        // 1. Set up the streaming execution environment
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        // Enable checkpointing so that operator state (including partial matches)
        // is restored consistently after a failure
        env.enableCheckpointing(60_000); // checkpoint every 60 seconds

        // 2. Create Kafka source for transactions
        KafkaSource<String> kafkaSource = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setTopics("transactions")
            .setGroupId("fraud-detection-group")
            .setStartingOffsets(OffsetsInitializer.latest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        // 3. Read from Kafka with event time watermarks
        ObjectMapper mapper = new ObjectMapper();

        DataStream<Transaction> transactions = env
            .fromSource(kafkaSource, WatermarkStrategy
                .<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((event, timestamp) -> {
                    try {
                        return mapper.readValue(event, Transaction.class)
                            .getTimestamp();
                    } catch (Exception e) {
                        return timestamp;
                    }
                }), "Kafka Transactions")
            .map(json -> mapper.readValue(json, Transaction.class))
            .keyBy(Transaction::getUserId);  // Key by user for per-user patterns

        // 4. Apply Pattern 1: Geographic Impossibility
        Pattern<Transaction, ?> geoPattern = FraudPatterns.geographicImpossibility();
        PatternStream<Transaction> geoPatternStream = CEP.pattern(
            transactions, geoPattern);

        DataStream<FraudAlert> geoAlerts = geoPatternStream.process(
            new FraudAlertProcessor("GEOGRAPHIC_IMPOSSIBILITY"));

        // 5. Apply Pattern 2: Card Testing Attack
        Pattern<Transaction, ?> testPattern = FraudPatterns.cardTestingAttack();
        PatternStream<Transaction> testPatternStream = CEP.pattern(
            transactions, testPattern);

        DataStream<FraudAlert> testAlerts = testPatternStream.process(
            new FraudAlertProcessor("CARD_TESTING_ATTACK"));

        // 6. Apply Pattern 3: High Velocity
        Pattern<Transaction, ?> velocityPattern = FraudPatterns.highVelocity();
        PatternStream<Transaction> velocityPatternStream = CEP.pattern(
            transactions, velocityPattern);

        DataStream<FraudAlert> velocityAlerts = velocityPatternStream.process(
            new FraudAlertProcessor("HIGH_VELOCITY"));

        // 7. Union all alerts and sink to Kafka
        DataStream<FraudAlert> allAlerts = geoAlerts
            .union(testAlerts)
            .union(velocityAlerts);

        // Print to console (for development)
        allAlerts.print("FRAUD ALERT");

        // Sink to Kafka alerts topic
        KafkaSink<String> alertSink = KafkaSink.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setRecordSerializer(
                KafkaRecordSerializationSchema.builder()
                    .setTopic("fraud-alerts")
                    .setValueSerializationSchema(new SimpleStringSchema())
                    .build()
            )
            .build();

        allAlerts
            .map(alert -> mapper.writeValueAsString(alert))
            .sinkTo(alertSink);

        // 8. Execute the pipeline
        env.execute("Credit Card Fraud Detection CEP Pipeline");
    }
}

Key Takeaway: The pipeline applies multiple independent patterns to the same keyed stream. Each CEP.pattern() call creates a separate NFA instance per key (per user), so patterns are evaluated independently and do not interfere with one another. The keyBy(Transaction::getUserId) call is essential because it ensures that patterns match only those events belonging to the same user.
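The effect of keying can be illustrated without Flink. The sketch below is plain Java with hypothetical class and method names. It applies a simplified card-testing rule (a small charge followed by a large one, with the time window omitted) first to an unkeyed list of events and then to the same events grouped by user.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of why patterns are evaluated per key. */
class PerKeyMatchingSketch {

    /** Minimal stand-in for the Transaction class: user and amount only. */
    static final class Tx {
        final String userId;
        final double amount;
        Tx(String userId, double amount) { this.userId = userId; this.amount = amount; }
    }

    /** Groups events by user while preserving arrival order within each user. */
    static Map<String, List<Tx>> groupByUser(List<Tx> stream) {
        Map<String, List<Tx>> byUser = new LinkedHashMap<>();
        for (Tx tx : stream) {
            byUser.computeIfAbsent(tx.userId, k -> new ArrayList<>()).add(tx);
        }
        return byUser;
    }

    /** Simplified card-testing rule: a small charge, later followed by a large one. */
    static boolean hasCardTestSequence(List<Tx> events) {
        boolean sawTestCharge = false;
        for (Tx tx : events) {
            if (sawTestCharge && tx.amount >= 1000.0) return true;
            if (tx.amount >= 0.01 && tx.amount <= 5.0) sawTestCharge = true;
        }
        return false;
    }

    /** Returns the users whose own events satisfy the rule. */
    static List<String> usersWithCardTest(List<Tx> stream) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, List<Tx>> e : groupByUser(stream).entrySet()) {
            if (hasCardTestSequence(e.getValue())) flagged.add(e.getKey());
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<Tx> stream = new ArrayList<>();
        stream.add(new Tx("alice", 1.00));    // small charge by alice
        stream.add(new Tx("bob", 2500.00));   // large charge by bob
        stream.add(new Tx("carol", 0.50));    // small charge by carol
        stream.add(new Tx("carol", 1200.00)); // large charge by carol

        // Unkeyed, alice's small charge followed by bob's large one looks like a match.
        System.out.println(hasCardTestSequence(stream)); // true (a false positive)
        // Keyed, only carol's own sequence matches.
        System.out.println(usersWithCardTest(stream));   // [carol]
    }
}
```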

Hands-On: IoT Sensor Anomaly Detection

The second pipeline detects anomalies in IoT sensor data. The target pattern is a sensor reporting three consecutive rising temperature readings above a threshold, followed by a pressure drop, with the whole sequence completing within one minute. This sequence frequently indicates an impending equipment failure. In a production setting, the detected anomalies would be persisted in a time-series database optimised for preprocessed data, and the underlying sensor readings could be supplied to forecasting models for predictive maintenance.

Sensor Event Class

package com.example.cep.events;

public class SensorReading implements java.io.Serializable {
    private String sensorId;
    private double temperature;
    private double pressure;
    private long timestamp;
    private String location;

    public SensorReading() {}

    public SensorReading(String sensorId, double temperature, double pressure,
                         long timestamp, String location) {
        this.sensorId = sensorId;
        this.temperature = temperature;
        this.pressure = pressure;
        this.timestamp = timestamp;
        this.location = location;
    }

    public String getSensorId() { return sensorId; }
    public void setSensorId(String sensorId) { this.sensorId = sensorId; }
    public double getTemperature() { return temperature; }
    public void setTemperature(double temperature) { this.temperature = temperature; }
    public double getPressure() { return pressure; }
    public void setPressure(double pressure) { this.pressure = pressure; }
    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
    public String getLocation() { return location; }
    public void setLocation(String location) { this.location = location; }

    @Override
    public String toString() {
        return String.format("Sensor{id=%s, temp=%.1f, pressure=%.1f, time=%d}",
            sensorId, temperature, pressure, timestamp);
    }
}

Complete IoT Anomaly Pipeline

package com.example.cep;

import com.example.cep.events.SensorReading;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

import java.time.Duration;
import java.util.*;

public class IoTAnomalyDetectionPipeline {

    private static final double TEMP_THRESHOLD = 85.0; // degrees Celsius
    private static final double PRESSURE_DROP_THRESHOLD = 10.0; // PSI

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);
        env.enableCheckpointing(30_000);

        // Simulated sensor data source (replace with Kafka in production)
        DataStream<SensorReading> sensorStream = env
            .addSource(new SimulatedSensorSource()) // your custom source
            .assignTimestampsAndWatermarks(
                WatermarkStrategy
                    .<SensorReading>forBoundedOutOfOrderness(Duration.ofSeconds(3))
                    .withTimestampAssigner((reading, ts) -> reading.getTimestamp())
            )
            .keyBy(SensorReading::getSensorId);

        // Pattern: 3 consecutive high-temp readings, then a pressure drop
        Pattern<SensorReading, ?> anomalyPattern = Pattern
            .<SensorReading>begin("rising_temp_1")
            .where(new SimpleCondition<SensorReading>() {
                @Override
                public boolean filter(SensorReading reading) {
                    return reading.getTemperature() > TEMP_THRESHOLD;
                }
            })
            .next("rising_temp_2")
            .where(new IterativeCondition<SensorReading>() {
                @Override
                public boolean filter(SensorReading reading,
                                      Context<SensorReading> ctx) throws Exception {
                    if (reading.getTemperature() <= TEMP_THRESHOLD) return false;
                    for (SensorReading prev : ctx.getEventsForPattern("rising_temp_1")) {
                        return reading.getTemperature() > prev.getTemperature();
                    }
                    return false;
                }
            })
            .next("rising_temp_3")
            .where(new IterativeCondition<SensorReading>() {
                @Override
                public boolean filter(SensorReading reading,
                                      Context<SensorReading> ctx) throws Exception {
                    if (reading.getTemperature() <= TEMP_THRESHOLD) return false;
                    for (SensorReading prev : ctx.getEventsForPattern("rising_temp_2")) {
                        return reading.getTemperature() > prev.getTemperature();
                    }
                    return false;
                }
            })
            .followedBy("pressure_drop")
            .where(new IterativeCondition<SensorReading>() {
                @Override
                public boolean filter(SensorReading reading,
                                      Context<SensorReading> ctx) throws Exception {
                    for (SensorReading prev : ctx.getEventsForPattern("rising_temp_1")) {
                        double pressureDiff = prev.getPressure() - reading.getPressure();
                        return pressureDiff > PRESSURE_DROP_THRESHOLD;
                    }
                    return false;
                }
            })
            .within(Time.minutes(1));

        // Apply pattern and process matches
        PatternStream<SensorReading> patternStream =
            CEP.pattern(sensorStream, anomalyPattern);

        DataStream<String> anomalyAlerts = patternStream.process(
            new PatternProcessFunction<SensorReading, String>() {
                @Override
                public void processMatch(Map<String, List<SensorReading>> match,
                                         Context ctx,
                                         Collector<String> out) {
                    SensorReading first = match.get("rising_temp_1").get(0);
                    SensorReading second = match.get("rising_temp_2").get(0);
                    SensorReading third = match.get("rising_temp_3").get(0);
                    SensorReading drop = match.get("pressure_drop").get(0);

                    String alert = String.format(
                        "ANOMALY DETECTED | Sensor: %s | Location: %s | " +
                        "Temps: %.1f -> %.1f -> %.1f (threshold: %.1f) | " +
                        "Pressure drop: %.1f -> %.1f (delta: %.1f)",
                        first.getSensorId(), first.getLocation(),
                        first.getTemperature(), second.getTemperature(),
                        third.getTemperature(), TEMP_THRESHOLD,
                        first.getPressure(), drop.getPressure(),
                        first.getPressure() - drop.getPressure()
                    );

                    out.collect(alert);
                }
            }
        );

        anomalyAlerts.print("IOT ALERT");
        env.execute("IoT Sensor Anomaly Detection Pipeline");
    }
}

Tip: The pipeline uses next() (strict contiguity) for the three rising temperature readings because they must be consecutive. By contrast, followedBy() (relaxed contiguity) is used for the pressure drop, since other normal readings may occur between the temperature spike and the pressure change.
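The difference between the two contiguity modes can be shown on a plain list of event labels. The sketch below is plain Java rather than Flink code, with hypothetical class, method, and label names. It checks only whether a two-step sequence exists and ignores time windows.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of strict versus relaxed contiguity. */
class ContiguitySketch {

    /** Strict contiguity, as with next(): b must be the event immediately after a. */
    static boolean matchesStrict(List<String> events, String a, String b) {
        for (int i = 0; i + 1 < events.size(); i++) {
            if (events.get(i).equals(a) && events.get(i + 1).equals(b)) return true;
        }
        return false;
    }

    /** Relaxed contiguity, as with followedBy(): b may appear anywhere after a. */
    static boolean matchesRelaxed(List<String> events, String a, String b) {
        boolean sawA = false;
        for (String e : events) {
            if (sawA && e.equals(b)) return true;
            if (e.equals(a)) sawA = true;
        }
        return false;
    }

    public static void main(String[] args) {
        // A normal reading sits between the temperature spike and the pressure drop.
        List<String> events = Arrays.asList("HIGH_TEMP", "NORMAL", "PRESSURE_DROP");

        System.out.println(matchesStrict(events, "HIGH_TEMP", "PRESSURE_DROP"));  // false
        System.out.println(matchesRelaxed(events, "HIGH_TEMP", "PRESSURE_DROP")); // true
    }
}
```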

Hands-On: Stock Market Pattern Detection

The third pipeline detects potential trading signals: a tick whose price is more than 5% below the previous close, followed within 10 seconds by a tick whose volume is more than three times the volume of the drop tick. The pattern can indicate panic selling followed by institutional buying, which may represent a buy signal.

StockTick Event Class

package com.example.cep.events;

public class StockTick implements java.io.Serializable {
    private String symbol;
    private double price;
    private long volume;
    private long timestamp;
    private double previousClose;

    public StockTick() {}

    public StockTick(String symbol, double price, long volume,
                     long timestamp, double previousClose) {
        this.symbol = symbol;
        this.price = price;
        this.volume = volume;
        this.timestamp = timestamp;
        this.previousClose = previousClose;
    }

    public String getSymbol() { return symbol; }
    public void setSymbol(String symbol) { this.symbol = symbol; }
    public double getPrice() { return price; }
    public void setPrice(double price) { this.price = price; }
    public long getVolume() { return volume; }
    public void setVolume(long volume) { this.volume = volume; }
    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
    public double getPreviousClose() { return previousClose; }
    public void setPreviousClose(double pc) { this.previousClose = pc; }

    public double getPriceChangePercent() {
        if (previousClose == 0) return 0;
        return ((price - previousClose) / previousClose) * 100.0;
    }

    @Override
    public String toString() {
        return String.format("StockTick{sym=%s, price=%.2f, vol=%d, change=%.2f%%}",
            symbol, price, volume, getPriceChangePercent());
    }
}
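As a quick arithmetic check of getPriceChangePercent(), the sketch below repeats the same formula as a standalone static method and compares the result with a 5% drop threshold. The class name, the helper isDropGreaterThan, and the sample prices are hypothetical.

```java
/** Plain-Java worked example of the percentage-change formula. */
class PriceChangeSketch {

    /** Same formula as StockTick.getPriceChangePercent(), taking plain arguments. */
    static double priceChangePercent(double price, double previousClose) {
        if (previousClose == 0) return 0;
        return ((price - previousClose) / previousClose) * 100.0;
    }

    /** True when the change is a drop of more than the given percentage. */
    static boolean isDropGreaterThan(double price, double previousClose, double percent) {
        return priceChangePercent(price, previousClose) < -percent;
    }

    public static void main(String[] args) {
        // Previous close 100.00, current price 94.00: about -6%, which exceeds a 5% drop.
        System.out.println(isDropGreaterThan(94.0, 100.0, 5.0)); // true
        // Current price 97.00: about -3%, which does not.
        System.out.println(isDropGreaterThan(97.0, 100.0, 5.0)); // false
        // A zero previous close is reported as no change instead of dividing by zero.
        System.out.println(priceChangePercent(50.0, 0.0)); // 0.0
    }
}
```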

Complete Stock Market Detection Pipeline

package com.example.cep;

import com.example.cep.events.StockTick;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

import java.time.Duration;
import java.util.*;

public class StockPatternDetectionPipeline {

    private static final double PRICE_DROP_THRESHOLD = -5.0; // percent
    private static final double VOLUME_SPIKE_MULTIPLIER = 3.0; // multiple of the volume at the price drop

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);
        env.enableCheckpointing(10_000);

        // Assume a Kafka source producing StockTick JSON
        // (using simulated source for this example)
        DataStream<StockTick> tickStream = env
            .addSource(new SimulatedStockSource())
            .assignTimestampsAndWatermarks(
                WatermarkStrategy
                    .<StockTick>forBoundedOutOfOrderness(Duration.ofSeconds(2))
                    .withTimestampAssigner((tick, ts) -> tick.getTimestamp())
            )
            .keyBy(StockTick::getSymbol);

        // Pattern: Price drop > 5% followed by volume spike within 10 seconds
        Pattern<StockTick, ?> buySignalPattern = Pattern
            .<StockTick>begin("price_drop")
            .where(new SimpleCondition<StockTick>() {
                @Override
                public boolean filter(StockTick tick) {
                    return tick.getPriceChangePercent() < PRICE_DROP_THRESHOLD;
                }
            })
            .followedBy("volume_spike")
            .where(new IterativeCondition<StockTick>() {
                @Override
                public boolean filter(StockTick tick, Context<StockTick> ctx) throws Exception {
                    for (StockTick drop : ctx.getEventsForPattern("price_drop")) {
                        // Volume must be more than 3x the volume during the drop
                        if (tick.getVolume() > drop.getVolume() * VOLUME_SPIKE_MULTIPLIER) {
                            return true;
                        }
                    }
                    return false;
                }
            })
            .within(Time.seconds(10));

        // Apply pattern
        PatternStream<StockTick> patternStream =
            CEP.pattern(tickStream, buySignalPattern);

        DataStream<String> signals = patternStream.process(
            new PatternProcessFunction<StockTick, String>() {
                @Override
                public void processMatch(Map<String, List<StockTick>> match,
                                         Context ctx,
                                         Collector<String> out) {
                    StockTick drop = match.get("price_drop").get(0);
                    StockTick spike = match.get("volume_spike").get(0);

                    String signal = String.format(
                        "BUY SIGNAL | %s | Drop: %.2f%% (price $%.2f) | " +
                        "Volume spike: %d -> %d (%.1fx) | " +
                        "Current price: $%.2f",
                        drop.getSymbol(),
                        drop.getPriceChangePercent(),
                        drop.getPrice(),
                        drop.getVolume(),
                        spike.getVolume(),
                        (double) spike.getVolume() / drop.getVolume(),
                        spike.getPrice()
                    );

                    out.collect(signal);
                }
            }
        );

        signals.print("TRADING SIGNAL");
        env.execute("Stock Market Pattern Detection Pipeline");
    }
}

Caution: The example above illustrates pattern detection for educational purposes and does not constitute investment advice. Production algorithmic trading systems incorporate substantially more signals, risk management, and regulatory safeguards. Trading decisions should not be made on the basis of a single CEP pattern.

Advanced CEP Techniques

Once the fundamentals are in place, the following advanced techniques bring CEP pipelines to production quality.

Dynamic Patterns from External Configuration

Hard-coded patterns are acceptable during initial development, but production systems need to change rule thresholds without a code change and redeployment. One approach is to load pattern parameters from an external source when the job is built:

// Load thresholds from a configuration source
public class DynamicFraudPatterns {

    public static Pattern<Transaction, ?> fromConfig(FraudRuleConfig config) {
        return Pattern.<Transaction>begin("test_charge")
            .where(new SimpleCondition<Transaction>() {
                @Override
                public boolean filter(Transaction tx) {
                    return tx.getAmount() >= config.getMinTestAmount()
                        && tx.getAmount() <= config.getMaxTestAmount();
                }
            })
            .followedBy("big_charge")
            .where(new SimpleCondition<Transaction>() {
                @Override
                public boolean filter(Transaction tx) {
                    return tx.getAmount() >= config.getLargeTransactionThreshold();
                }
            })
            .within(Time.minutes(config.getTimeWindowMinutes()));
    }
}

// Configuration POJO loaded from database, file, or broadcast stream
public class FraudRuleConfig implements java.io.Serializable {
    private double minTestAmount = 0.01;
    private double maxTestAmount = 5.0;
    private double largeTransactionThreshold = 1000.0;
    private int timeWindowMinutes = 1;

    // getters and setters...
}

Tip: For fully dynamic pattern updates without restarting the Flink job, Flink’s Broadcast State can be used to distribute new rule configurations to all parallel instances. The CEP library itself does not support changing patterns at runtime, but a custom operator can re-create patterns when new configurations arrive via a broadcast stream.

Side Outputs for Timeout Handling

When a partial pattern match times out, that is, when the within() window expires before the pattern completes, the timed-out partial matches can be captured using TimedOutPartialMatchHandler:

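import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.cep.functions.TimedOutPartialMatchHandler;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

import java.util.List;
import java.util.Map;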

public class FraudAlertWithTimeout
        extends PatternProcessFunction<Transaction, FraudAlert>
        implements TimedOutPartialMatchHandler<Transaction> {

    // Side output for timed-out partial matches
    public static final OutputTag<String> TIMEOUT_TAG =
        new OutputTag<String>("timed-out-patterns") {};

    @Override
    public void processMatch(Map<String, List<Transaction>> match,
                             Context ctx,
                             Collector<FraudAlert> out) {
        // Process fully matched pattern (same as before)
        // ...
    }

    @Override
    public void processTimedOutMatch(Map<String, List<Transaction>> match,
                                     Context ctx) {
        // A partial match timed out — log it for analysis
        StringBuilder sb = new StringBuilder("PARTIAL MATCH TIMEOUT: ");
        for (Map.Entry<String, List<Transaction>> entry : match.entrySet()) {
            sb.append(entry.getKey()).append("=")
              .append(entry.getValue().size()).append(" events; ");
        }

        // Output to side output
        ctx.output(TIMEOUT_TAG, sb.toString());
    }
}

// In your pipeline, capture the side output:
SingleOutputStreamOperator<FraudAlert> alerts = patternStream
    .process(new FraudAlertWithTimeout());

DataStream<String> timedOutPatterns = alerts
    .getSideOutput(FraudAlertWithTimeout.TIMEOUT_TAG);

timedOutPatterns.print("TIMEOUT");

Scaling CEP Jobs

CEP pattern matching is stateful because the NFA maintains partial match buffers per key. The principal scaling considerations are summarised below:

Key Partitioning: The stream should be passed through keyBy() before CEP patterns are applied. This ensures that events for the same entity (user, sensor, stock symbol) are routed to the same parallel instance.
Parallelism: Parallelism should be selected on the basis of key cardinality. For 10,000 users, a parallelism of 8–16 is generally sufficient. Flink distributes keys across parallel instances using hash partitioning.
State Size: Each active partial match consumes memory. With long time windows or high-cardinality patterns, state size should be monitored carefully.

// Set different parallelism for different pipeline stages
DataStream<Transaction> transactions = env
    .fromSource(kafkaSource, watermarkStrategy, "source")
    .setParallelism(8)  // match Kafka partitions
    .map(json -> mapper.readValue(json, Transaction.class))
    .setParallelism(8)
    .keyBy(Transaction::getUserId);

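// CEP pattern matching: parallelism is set on the operator returned by
// process(), because a keyed stream itself has no setParallelism()
PatternStream<Transaction> patternStream = CEP.pattern(transactions, fraudPattern);

SingleOutputStreamOperator<FraudAlert> alerts = patternStream
    .process(new FraudAlertWithTimeout())
    .setParallelism(16);  // more parallelism for CPU-heavy matching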
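
The key-routing behaviour described under Key Partitioning can be sketched without any Flink dependency. The following is a simplified illustration, not Flink's actual code: the class and method names are hypothetical, a plain floorMod of hashCode() stands in for the murmur hash that Flink applies internally, and the maximum parallelism of 128 is an assumed value.

```java
/** Simplified sketch of how a keyed stream spreads keys over parallel subtasks. */
final class KeyDistributionSketch {

    // Hypothetical stand-in for Flink's internal hashing: floorMod keeps the
    // sketch dependency-free and always yields a value in [0, maxParallelism).
    static int keyGroupFor(String key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Maps a key group to the index of the parallel subtask that owns it.
    static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Counts how many of the synthetic keys "user-0" .. "user-(n-1)" land on each subtask.
    static int[] distribute(int keyCount, int maxParallelism, int parallelism) {
        int[] perSubtask = new int[parallelism];
        for (int i = 0; i < keyCount; i++) {
            int group = keyGroupFor("user-" + i, maxParallelism);
            perSubtask[subtaskFor(group, maxParallelism, parallelism)]++;
        }
        return perSubtask;
    }

    public static void main(String[] args) {
        // 10,000 users spread over 8 subtasks, assuming a max parallelism of 128
        int[] counts = distribute(10_000, 128, 8);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("subtask " + i + " -> " + counts[i] + " keys");
        }
    }
}
```

Because the mapping depends only on the key, every event for a given user reaches the same subtask, which is what allows the NFA to keep that user's partial matches in local state. The printed counts also give a rough sense of how evenly a given key population would be spread at a chosen parallelism.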

State Management and Checkpointing

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;

// Configure robust checkpointing
env.setStateBackend(new EmbeddedRocksDBStateBackend());
env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

CheckpointConfig checkpointConfig = env.getCheckpointConfig();
checkpointConfig.setMinPauseBetweenCheckpoints(30_000);
checkpointConfig.setCheckpointTimeout(120_000);
checkpointConfig.setMaxConcurrentCheckpoints(1);
checkpointConfig.setTolerableCheckpointFailureNumber(3);

// Retain checkpoints on cancellation (for savepoint-like recovery)
checkpointConfig.setExternalizedCheckpointCleanup(
    CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION
);

Event Time and Processing Time

The distinction between event time and processing time is of central importance for CEP. Event time is the moment at which the event actually occurred, as embedded in the event data. Processing time is the moment at which the Flink operator processes the event. Under ideal conditions, the two values would coincide. In practice, events arrive late, out of order, and at variable rates.

Why Event Time Matters for CEP

Consider a fraud detection pattern defined as “three transactions within 5 minutes.” If transaction #2 arrives at the system 10 seconds late owing to network congestion, processing time would register a gap that does not actually exist. Event time correctly identifies that the three transactions occurred within the 5-minute window, irrespective of when they arrived.
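
The difference can be made concrete with a small dependency-free sketch. The timestamps below are hypothetical, and the class and method names are illustrative only. In this variation the delayed event is the last of the three, which is the case where the measured span itself changes.

```java
/** Compares a 5-minute window check on event timestamps and on arrival timestamps. */
final class EventTimeVsProcessingTime {

    // True when the span between the earliest and the latest timestamp
    // is shorter than the window. Expects a non-empty array.
    static boolean fitsWindow(long[] timestampsMs, long windowMs) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long t : timestampsMs) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return max - min < windowMs;
    }

    public static void main(String[] args) {
        long window = 5 * 60 * 1000L;                         // 5 minutes
        long[] eventTimes = {0L, 120_000L, 295_000L};          // when the transactions happened
        long[] arrivalTimes = {1_000L, 121_000L, 306_000L};    // last one delayed by 11 s

        System.out.println("event time match:      " + fitsWindow(eventTimes, window));
        System.out.println("processing time match: " + fitsWindow(arrivalTimes, window));
    }
}
```

With these assumed values the event-time check succeeds (a span of 295 seconds) while the arrival-time check fails (a span of 305 seconds), even though both arrays describe the same three transactions.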

Watermark Strategies

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.eventtime.WatermarkGenerator;
import org.apache.flink.api.common.eventtime.WatermarkOutput;
import org.apache.flink.api.common.eventtime.WatermarkGeneratorSupplier;

// Strategy 1: Bounded out-of-orderness (most common)
// Assumes events can arrive up to 5 seconds late
WatermarkStrategy<Transaction> strategy1 = WatermarkStrategy
    .<Transaction>forBoundedOutOfOrderness(Duration.ofSeconds(5))
    .withTimestampAssigner((tx, recordTimestamp) -> tx.getTimestamp());

// Strategy 2: Monotonous timestamps (events always in order)
// Only use if you can guarantee ordering
WatermarkStrategy<Transaction> strategy2 = WatermarkStrategy
    .<Transaction>forMonotonousTimestamps()
    .withTimestampAssigner((tx, recordTimestamp) -> tx.getTimestamp());

// Strategy 3: Custom watermark generator for complex scenarios
WatermarkStrategy<Transaction> strategy3 = WatermarkStrategy
    .<Transaction>forGenerator(context -> new WatermarkGenerator<Transaction>() {
        private static final long MAX_DELAY = 10_000L; // 10 seconds
        // Start high enough that (maxTimestamp - MAX_DELAY - 1) cannot underflow
        // if onPeriodicEmit() fires before the first event arrives
        private long maxTimestamp = Long.MIN_VALUE + MAX_DELAY + 1;

        @Override
        public void onEvent(Transaction tx, long eventTimestamp,
                            WatermarkOutput output) {
            maxTimestamp = Math.max(maxTimestamp, tx.getTimestamp());
        }

        @Override
        public void onPeriodicEmit(WatermarkOutput output) {
            output.emitWatermark(
                new org.apache.flink.api.common.eventtime.Watermark(
                    maxTimestamp - MAX_DELAY - 1
                )
            );
        }
    })
    .withTimestampAssigner((tx, recordTimestamp) -> tx.getTimestamp());

Key Takeaway: For most CEP applications, forBoundedOutOfOrderness() with a bound of 5–10 seconds is the appropriate choice. A bound that is too low causes late events to be missed, while a bound that is too high delays pattern matching by the same amount, since Flink cannot process an event-time window until the watermark passes it.
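
The trade-off in the bound can be explored with a minimal simulation. This is a sketch under simplifying assumptions: the watermark is recomputed after every event rather than on Flink's periodic schedule, the event timestamps are invented, and the class name is hypothetical. The watermark is taken as the highest timestamp seen so far minus the bound minus one millisecond.

```java
/** Counts events that arrive behind a bounded-out-of-orderness watermark. */
final class BoundedOutOfOrdernessSketch {

    // An event is counted as late when its timestamp is at or below the
    // watermark that was in effect before the event was observed.
    static int countLateEvents(long[] timestampsMs, long boundMs) {
        long maxTimestamp = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;
        int late = 0;
        for (long ts : timestampsMs) {
            if (ts <= watermark) {
                late++;
            }
            maxTimestamp = Math.max(maxTimestamp, ts);
            watermark = maxTimestamp - boundMs - 1;
        }
        return late;
    }

    public static void main(String[] args) {
        // Arrival order of six events, identified by their event timestamps in ms
        long[] arrivals = {1_000L, 2_000L, 9_000L, 4_000L, 10_000L, 3_000L};
        for (long bound : new long[] {2_000L, 5_000L, 10_000L}) {
            System.out.println("bound " + bound + " ms -> "
                + countLateEvents(arrivals, bound) + " late events");
        }
    }
}
```

For this invented sequence a 2-second bound treats two events as late, a 5-second bound one, and a 10-second bound none. The cost of the larger bound is that the watermark, and therefore any result that depends on it, trails the newest event by that much longer.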

Connecting to Real Data Sources

Kafka Source Connector

Most production CEP pipelines read from Apache Kafka. For a Python-focused treatment of Kafka consumer implementation, see the Apache Kafka consumer implementation guide in Python. A complete, production-ready Kafka source setup in Java is shown below:

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import com.fasterxml.jackson.databind.ObjectMapper;

// Custom deserializer for Transaction events
public class TransactionDeserializer
        implements DeserializationSchema<Transaction> {

    private transient ObjectMapper mapper;

    @Override
    public Transaction deserialize(byte[] message) {
        if (mapper == null) mapper = new ObjectMapper();
        try {
            return mapper.readValue(message, Transaction.class);
        } catch (Exception e) {
            // Log and skip malformed events
            System.err.println("Failed to deserialize: " + new String(message));
            return null;
        }
    }

    @Override
    public boolean isEndOfStream(Transaction nextElement) {
        return false;
    }

    @Override
    public TypeInformation<Transaction> getProducedType() {
        return TypeInformation.of(Transaction.class);
    }
}

// Build the Kafka source
KafkaSource<Transaction> source = KafkaSource.<Transaction>builder()
    .setBootstrapServers("kafka-broker-1:9092,kafka-broker-2:9092")
    .setTopics("transactions")
    .setGroupId("fraud-detection-v2")
    .setStartingOffsets(OffsetsInitializer.latest())
    .setValueOnlyDeserializer(new TransactionDeserializer())
    .setProperty("security.protocol", "SASL_SSL")
    .setProperty("sasl.mechanism", "PLAIN")
    .setProperty("sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required " +
        "username=\"api-key\" password=\"api-secret\";")
    .build();

Kafka Sink for Alerts

import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;

KafkaSink<String> alertSink = KafkaSink.<String>builder()
    .setBootstrapServers("kafka-broker-1:9092")
    .setRecordSerializer(
        KafkaRecordSerializationSchema.builder()
            .setTopic("fraud-alerts")
            .setValueSerializationSchema(new SimpleStringSchema())
            .build()
    )
    .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
    .setTransactionalIdPrefix("fraud-alert-sink")
    .build();

// Wire it up
allAlerts
    .map(alert -> mapper.writeValueAsString(alert))
    .sinkTo(alertSink);

JDBC Connector for Enrichment

It is often necessary to enrich events with data from a database, for example by looking up a customer’s risk score before CEP patterns are applied. Flink’s asynchronous I/O is well suited to this purpose:

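import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;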

// Async enrichment function
public class CustomerEnrichment
        extends RichAsyncFunction<Transaction, EnrichedTransaction> {

    private transient DataSource dataSource;

    @Override
    public void open(Configuration parameters) {
        // Initialize connection pool
        dataSource = createConnectionPool();
    }

    @Override
    public void asyncInvoke(Transaction tx,
                            ResultFuture<EnrichedTransaction> resultFuture) {
        CompletableFuture.supplyAsync(() -> {
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement stmt = conn.prepareStatement(
                     "SELECT risk_score, account_age FROM customers WHERE id = ?")) {
                stmt.setString(1, tx.getUserId());
                ResultSet rs = stmt.executeQuery();
                if (rs.next()) {
                    return new EnrichedTransaction(tx,
                        rs.getDouble("risk_score"),
                        rs.getInt("account_age"));
                }
                return new EnrichedTransaction(tx, 0.5, 0);
            } catch (Exception e) {
                return new EnrichedTransaction(tx, 0.5, 0);
            }
        }).thenAccept(result -> resultFuture.complete(
            Collections.singleton(result)));
    }
}

// Apply async enrichment before CEP
DataStream<EnrichedTransaction> enriched = AsyncDataStream
    .unorderedWait(
        transactionStream,
        new CustomerEnrichment(),
        30, TimeUnit.SECONDS, // timeout
        100 // max concurrent requests
    );

Flink also supports connectors for Apache Pulsar, Amazon Kinesis, and many other systems through its connector ecosystem. The setup is broadly similar: define a source, assign watermarks, and feed the stream into the CEP patterns.

Deploying and Monitoring

Running Locally for Development

The simplest development workflow is to run the job directly within an IDE. Flink will then create a local mini-cluster automatically:

// This works out of the box in your IDE
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
// Flink automatically creates a local mini-cluster

Docker Compose for Local Flink and Kafka

For integration testing, the following Docker Compose configuration provides a local Flink and Kafka environment:

# docker-compose.yml
version: '3.8'

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.3
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
    ports:
      - "2181:2181"

  kafka:
    image: confluentinc/cp-kafka:7.5.3
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
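      # Note: localhost:9092 is only reachable from the host; containers on the
      # Compose network (such as the Flink services) need a listener advertised as kafka:<port>
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092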
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"

  flink-jobmanager:
    image: flink:1.18.1-java17
    ports:
      - "8081:8081"  # Flink Web UI
    command: jobmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        state.backend: rocksdb
        state.checkpoints.dir: file:///tmp/flink-checkpoints
        state.savepoints.dir: file:///tmp/flink-savepoints

  flink-taskmanager:
    image: flink:1.18.1-java17
    depends_on:
      - flink-jobmanager
    command: taskmanager
    scale: 2  # Run 2 task managers
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        taskmanager.numberOfTaskSlots: 4
        taskmanager.memory.process.size: 2048m

Deploying to a Flink Cluster

The fat JAR should be built and submitted to the cluster:

# Build the fat JAR
mvn clean package -DskipTests

# Submit to standalone cluster
./bin/flink run \
  -c com.example.cep.FraudDetectionPipeline \
  target/flink-cep-pipeline-1.0.0.jar

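# Submit to YARN cluster (legacy -m yarn-cluster syntax)
# -ys: slots per TaskManager, -yjm: JobManager memory, -ytm: TaskManager memory
# TaskManagers are allocated on demand, so the old -yn flag is omitted.
# Comments are kept off the continued lines: text after a trailing backslash breaks the command.
./bin/flink run -m yarn-cluster \
  -ys 8 \
  -yjm 2048m \
  -ytm 4096m \
  -c com.example.cep.FraudDetectionPipeline \
  target/flink-cep-pipeline-1.0.0.jar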

# Submit to Kubernetes (using Flink Kubernetes Operator)
kubectl apply -f flink-cep-deployment.yaml

Monitoring the Pipeline

The Flink Web UI (default port 8081) is the primary monitoring interface. The most important metrics are summarised below:

Checkpoint Duration: If checkpoints take longer than the configured interval, cascading delays appear. Checkpoint duration should be kept below 50% of the checkpoint interval.
Backpressure: When a downstream operator cannot keep pace, backpressure propagates upstream. The Web UI indicates this with colour-coded task states, where red signals a problem.
Throughput (records/second): Input and output rates for each operator should be monitored. A sudden drop in output rate with constant input suggests a processing bottleneck.
State Size: CEP patterns maintain partial match buffers. State size should be observed over time, since unbounded growth indicates a pattern or key-space problem.

Performance Optimisation

Making a CEP pipeline functional is one matter; making it handle production volumes efficiently is another. The principal tuning levers are described below.

Choosing the Right Parallelism

Parallelism controls the number of parallel instances of each operator that Flink runs. For CEP pipelines, the following guidelines apply:

Source parallelism: Should match the number of Kafka partitions. If the topic has 16 partitions, source parallelism should be set to 16.
CEP operator parallelism: Depends on key cardinality and pattern complexity. A reasonable starting point is the same parallelism as the source, with subsequent increases if backpressure appears on the CEP operator.
Sink parallelism: Typically lower than CEP parallelism because alert volume is substantially lower than input volume.

State Backend Selection

State Backend	State Size	Speed	Best For
HashMapStateBackend (Heap)	Limited by JVM heap	Fastest	Small state, low latency requirements
EmbeddedRocksDBStateBackend	Limited by disk	Slower (disk I/O)	Large state, long time windows

For CEP workloads specifically, the heap state backend is adequate when patterns have short time windows (seconds to minutes) and moderate key cardinality. For long time windows on the order of hours, or millions of keys with active partial matches, RocksDB is the safer option.

Recommended Settings by Use Case

Setting	Fraud Detection	IoT Monitoring	Market Data
Parallelism	8–32	4–16	16–64
Checkpoint Interval	60s	30s	10s
State Backend	RocksDB	Heap or RocksDB	Heap
Watermark Bound	5s	3s	1s
TaskManager Memory	4–8 GB	2–4 GB	8–16 GB
Serialization	Avro or Protobuf	Avro	Protobuf (smallest size)

Serialisation Considerations

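Flink falls back to Kryo for any type it cannot analyse as a POJO or another built-in type, and Kryo is slower and produces larger state snapshots than Flink’s own serialisers. For production CEP pipelines, event types should therefore be well-formed POJOs, be registered with a dedicated serialiser, or use a schema-based format such as Avro: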

// Register types for efficient serialization
env.getConfig().registerTypeWithKryoSerializer(
    Transaction.class, ProtobufSerializer.class);

// Or use Flink's POJO serialization (automatic for well-formed POJOs)
// Ensure your classes:
// 1. Are public and standalone (not non-static inner classes)
// 2. Have a public no-arg constructor
// 3. Expose every non-static, non-transient field as public or via getters/setters

// For Avro serialization, use Flink's Avro format
// Add dependency: flink-avro
// Then use AvroDeserializationSchema:
import org.apache.flink.formats.avro.AvroDeserializationSchema;

KafkaSource<Transaction> avroSource = KafkaSource.<Transaction>builder()
    .setBootstrapServers("localhost:9092")
    .setTopics("transactions-avro")
    .setGroupId("fraud-detection")
    .setValueOnlyDeserializer(
        AvroDeserializationSchema.forSpecific(Transaction.class))
    .build();

Common Pitfalls and Troubleshooting

The most frequently encountered issues are summarised below:

Problem	Cause	Solution
Pattern never matches	Events arrive out of order; `within()` window too tight; using `next()` when `followedBy()` is needed	Check event ordering, increase time window, switch contiguity mode
Too many matches (false positives)	Pattern conditions too loose; using `followedByAny()` generating combinatorial explosion	Add tighter conditions, switch to `followedBy()`, shorten time window
OutOfMemoryError	Large NFA state from long time windows, high key cardinality, or `followedByAny()` with `oneOrMore()`	Switch to RocksDB state backend, shorten time windows, add `until()` conditions
Checkpoint failures	State too large to snapshot within timeout; backpressure causing delays	Increase checkpoint timeout, enable incremental checkpointing with RocksDB, reduce state size
Watermark stalling (no progress)	One Kafka partition has no data—its watermark stays at `Long.MIN_VALUE`, blocking global watermark	Use `withIdleness(Duration.ofMinutes(1))` on watermark strategy
Duplicate alerts after restart	Reprocessing events without checkpointed state	Always restart from savepoint/checkpoint, enable exactly-once on sinks
ClassNotFoundException at runtime	`flink-cep` not in the fat JAR; marked as `provided` by mistake	Ensure `flink-cep` is not marked as `provided`—only `flink-streaming-java` and `flink-clients` should be

Fixing Watermark Stalling

Watermark stalling is among the most difficult issues to diagnose. If a single Kafka partition ceases to produce events, its watermark remains at negative infinity, which blocks the global watermark for the entire job. The remedy is straightforward:

WatermarkStrategy<Transaction> strategy = WatermarkStrategy
    .<Transaction>forBoundedOutOfOrderness(Duration.ofSeconds(5))
    .withTimestampAssigner((tx, ts) -> tx.getTimestamp())
    .withIdleness(Duration.ofMinutes(1));  // Mark source as idle after 1 min
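
Why a single silent partition blocks progress, and why marking it idle helps, can be illustrated with a small sketch. It is a simplified model rather than Flink's implementation: the combined watermark is taken as the minimum over all partitions that are not marked idle, the class name is hypothetical, and the watermark values are invented.

```java
import java.util.OptionalLong;

/** Combines per-partition watermarks, skipping partitions that are marked idle. */
final class IdleAwareWatermark {

    // Returns the minimum watermark across active partitions, or empty when
    // every partition is idle and no watermark can be derived.
    static OptionalLong combine(long[] partitionWatermarks, boolean[] idle) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < partitionWatermarks.length; i++) {
            if (idle[i]) {
                continue;
            }
            anyActive = true;
            min = Math.min(min, partitionWatermarks[i]);
        }
        return anyActive ? OptionalLong.of(min) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Partition 1 has never produced an event, so its watermark is still Long.MIN_VALUE
        long[] watermarks = {50_000L, Long.MIN_VALUE, 48_000L};

        System.out.println("without idleness: "
            + combine(watermarks, new boolean[] {false, false, false}));
        System.out.println("with idleness:    "
            + combine(watermarks, new boolean[] {false, true, false}));
    }
}
```

Without idleness the combined value stays at Long.MIN_VALUE, so no event-time timer or within() window can ever fire. Once the silent partition is excluded, the combined watermark advances to 48,000, the slowest of the active partitions.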

Debugging Pattern Matches

When patterns do not match as expected, a pass-through map can be inserted before the CEP operator in order to verify that events are flowing and correctly keyed:

// Debug: print events as they enter the CEP operator
transactions
    .map(tx -> {
        System.out.println("CEP INPUT: " + tx);
        return tx;
    })
    .keyBy(Transaction::getUserId);

// Also: check that your conditions actually match
// by testing them in a unit test
@Test
public void testFraudCondition() {
    Transaction tx = new Transaction("1", "user1", 600.0,
        System.currentTimeMillis(), "NYC", "electronics", "1234");
    assertTrue(tx.getAmount() > 500.0);  // Verify condition logic
}

Final Thoughts

Complex Event Processing with Apache Flink supports the detection of sophisticated patterns across millions of events per second with millisecond latency and exactly-once guarantees. The present guide has covered considerable ground, from the fundamentals of CEP and the Flink pattern API to three complete, production-style pipelines for fraud detection, IoT monitoring, and financial market analysis.

The principal lessons may be summarised as follows:

Select the appropriate contiguity: next() for strict sequences, followedBy() for relaxed matching, and followedByAny() sparingly, given its computational cost.
Always use event time with appropriate watermark strategies. Processing time produces incorrect pattern matches in any real-world system where events arrive out of order.
Key the streams: CEP patterns should almost always be applied to keyed streams so that matches remain scoped to a logical entity such as a user, sensor, or stock symbol.
Handle timeouts: Implementing TimedOutPartialMatchHandler allows partial matches that do not complete within the time window to be captured and analysed.
Monitor state size: CEP is inherently stateful. RocksDB is recommended for large state, time windows should remain as short as possible, and combinatorial explosion in non-deterministic patterns should be monitored.
Start simple and iterate: An initial implementation should begin with a single pattern on a small data sample, verified for correctness before complexity or scale are increased.

Flink’s CEP library is among the most capable pattern-matching engines in the open-source ecosystem. The patterns and techniques presented here provide the foundation required to build a first production CEP pipeline. For reproducible deployment of Flink applications, containerisation with Docker simplifies both local development and production rollout. The fraud detection example offers a suitable starting point that can be adapted to the target domain and scaled accordingly.

References

Apache Flink CEP Documentation (v1.18)
Flink Event Time and Watermarks Guide
Flink Kafka Connector Documentation
Flink State Backends Reference
Flink Docker Deployment Guide
Stream Processing with Apache Flink (O’Reilly) by Fabian Hueske and Vasiliki Kalavri
Apache Flink: Stream and Batch Processing in a Single Engine—original Flink research paper

April 5, 2026

The U.S. Interest Rate Cut Outlook in 2026: What It Means for the Stock Market

Disclaimer: This article is for informational purposes only and does not constitute investment advice. Always consult a qualified financial advisor before making investment decisions. Past performance is not indicative of future results.

Summary

What this post covers: A 2026 outlook for U.S. interest rates and equity markets, covering where the Fed stands after recent cuts, the case for more cuts versus a pause, scenario probabilities, historical patterns from past cycles, sector implications, and concrete portfolio strategies.

Key insights:

The base case is 2-3 additional cuts in 2026 taking the federal funds rate from 4.00-4.25% to roughly 3.25-3.75%, broadly bullish for rate-sensitive equities, but the path will not be smooth and consensus positioning itself is now a risk factor.
Historical analysis distinguishes “insurance cuts” (gentle easing into a soft economy, bullish for stocks) from “emergency cuts” (aggressive easing during recession, bearish until the bottom); current conditions resemble the former, which is why equities have rallied.
Small caps, REITs, and long-duration bonds are the most leveraged plays on falling rates because they were the most punished during the 2022-2024 hiking cycle and have the cheapest relative valuations.
Markets price rate cuts in advance: by the time the Fed actually moves, much of the equity response is already done, so positioning ahead of consensus matters more than reacting to FOMC statements.
Sticky services inflation, tariff-driven price shocks, large deficits, and geopolitical risks could all force the Fed to hold or even reverse, so diversification across rate-cut and rate-hold scenarios is essential rather than concentrating on the consensus path.

Main topics: Introduction, The Federal Reserve’s Current Position, The Case for Further Easing, The Case for a Pause or Hold, Rate Cut Scenarios and Timeline for 2026, Historical Patterns of Rate Cuts and Equity Returns, Sector-by-Sector Analysis, Investment Strategies for a Rate-Cutting Environment, Risks and What Could Disrupt the Thesis, Conclusion, References.

Introduction

In March 2020, the Federal Reserve reduced interest rates to near zero within a matter of weeks. Two years later, it reversed course and initiated the most aggressive tightening cycle in four decades. By late 2024, the policy stance shifted once again, as the Fed began cutting rates for the first time since the pandemic emergency. In early 2026, investors confront a question that arguably dominates global markets: how far, and at what pace, will the Federal Reserve continue to ease policy?

The question carries material consequences. The answer will influence whether a diversified portfolio appreciates by 20 percent or declines by 15 percent over the year. It will shape whether technology equities advance to new highs or correct under the weight of elevated valuations. It will determine whether the housing market gradually thaws or remains frozen. The answer will also influence whether the United States achieves the rare soft landing that many market participants anticipate, or instead enters a recession that the consensus has failed to forecast.

The federal funds rate, presently within the 4.00 to 4.25 percent range after a sequence of cuts during late 2024 and 2025, remains well above the levels that investors became accustomed to during the 2010s. The era of near-zero rates that fueled the post-2008 bull market now appears distant. Nevertheless, the direction of policy is more consequential than the destination. Markets do not wait for the Fed to complete its easing cycle; they move in anticipation. Investors who position their portfolios in advance of expected policy shifts are typically rewarded for the foresight.

The following analysis examines the Fed’s current stance, evaluates the arguments for and against further cuts, outlines the most plausible scenarios for 2026, reviews how past rate-cutting cycles have unfolded in the equity market, identifies the sectors most likely to benefit and the sectors most likely to face headwinds, and proposes specific portfolio strategies. The objective is to provide both experienced investors and newer participants with a framework for navigating the coming twelve months.

The Federal Reserve’s Current Position

To understand where interest rates are heading, it is first necessary to understand the trajectory by which they arrived at present levels. The Federal Reserve’s experience over the past four years has been notably volatile, moving from emergency stimulus to aggressive tightening and now toward gradual easing.

The Rate Cycle: From Zero to 5.50 Percent and Back

The current cycle began in March 2022, when the Fed raised rates from the zero lower bound for the first time since the COVID-19 crisis. What followed was the fastest tightening cycle since the early 1980s under Paul Volcker. Over a span of 16 months, the federal funds rate rose from 0.00 to 0.25 percent to 5.25 to 5.50 percent, a cumulative increase of more than 500 basis points that produced substantial repricing across virtually every asset class.

Date	Action	Federal Funds Rate	Change (bps)
Mar 2022	First hike	0.25–0.50%	+25
Jun 2022	Jumbo hike	1.50–1.75%	+75
Nov 2022	Fourth 75bp hike	3.75–4.00%	+75
Feb 2023	Pace slows	4.50–4.75%	+25
Jul 2023	Final hike	5.25–5.50%	+25
Sep 2024	First cut	4.75–5.00%	-50
Nov 2024	Second cut	4.50–4.75%	-25
Dec 2024	Third cut	4.25–4.50%	-25
Q1 2025	Pause / gradual cuts	4.00–4.25%	-25 to -50
Early 2026	Current level	~4.00–4.25%	n/a

The September 2024 cut was notable, as the Fed opened with a 50-basis-point reduction, signalling confidence that inflation was approaching the target. Subsequent cuts have been more measured at 25 basis points each, reflecting a central bank that prefers gradual easing rather than a rapid return to an accommodative stance.

The Dual Mandate: Inflation and Employment

Every Federal Reserve decision is interpreted through its dual mandate of maximum employment and price stability, the latter of which is defined as 2 percent annual inflation. For most of the tightening cycle, inflation was the dominant concern. The Consumer Price Index peaked at 9.1 percent in June 2022, the highest reading in more than 40 years, leaving the Fed with little alternative to aggressive action.

The inflation picture in early 2026 is considerably different. Headline CPI has fallen to the 2.5 to 3.0 percent range. The Fed’s preferred measure, the Personal Consumption Expenditures (PCE) price index, is near 2.4 to 2.7 percent. Core PCE, which excludes volatile food and energy prices, remains somewhat persistent in the 2.6 to 2.8 percent range. The progress is substantial, but the convergence to the 2 percent target is not yet complete.

On the employment side, the labour market has shown notable resilience. The unemployment rate is near 4.1 to 4.2 percent, elevated from the 3.4 percent lows of early 2023 but still healthy by historical standards. Nonfarm payrolls continue to expand, though the pace has slowed from the monthly gains in excess of 300,000 observed during 2022 and 2023 to a more sustainable range of 150,000 to 200,000. Wage growth has moderated to approximately 3.5 to 4.0 percent year-over-year, down from readings above 5 percent that previously concerned the Fed.

Key Takeaway: The Fed has made significant progress on inflation, but the final stage of bringing the rate from approximately 2.5 percent down to the 2.0 percent target is proving the most difficult. The labour market is cooling gradually rather than contracting sharply. This combination provides the Fed with the latitude to proceed patiently.

The Case for Further Easing

Despite the cautious tone of FOMC communications, there are substantive reasons to expect the Fed to continue cutting rates throughout 2026. The economic data, while mixed, increasingly supports the case for additional easing.

Inflation Continues to Decelerate

The disinflationary trend that began in mid-2023 has persisted, although at a slower pace. The principal components of inflation provide an encouraging picture. Goods prices have been outright deflationary for several months, depressed by normalising supply chains, declining used car prices, and weak global demand. Food inflation has receded significantly from the peaks observed in 2022. Energy prices remain volatile but are not contributing to sustained upward pressure.

The shelter component, which accounts for approximately one-third of CPI, is the most important variable. Shelter inflation, which lags the actual housing market by 12 to 18 months, has been declining gradually as the surge in rents observed during 2021 and 2022 works its way through the data. Most economists expect this deceleration to continue through 2026, which could move headline inflation meaningfully closer to the 2 percent target.

Gradual Cooling of the Labour Market

Although the unemployment rate has not increased sharply, the labour market is clearly softer than it was a year ago. Job openings, as measured by the JOLTS survey, have declined from a peak above 12 million to roughly 7.5 to 8.0 million. The quits rate, a measure of worker confidence, has normalised. Temporary staffing, often a leading indicator of broader labour trends, has been declining for more than a year.

These are the kinds of signals that increase the Fed’s comfort with rate cuts. The labour market is rebalancing without rupture. Employers are slowing hiring rather than conducting widespread layoffs. This is the soft-landing scenario in practice, and it supports the case for continuing to reduce the restrictiveness of monetary policy.

Manufacturing Weakness and Global Headwinds

The ISM Manufacturing PMI has spent more months below 50 (the contraction threshold) than above it over the past two years. While the services sector has shown greater resilience, even services PMI readings have decelerated. New orders, a forward-looking component, have been particularly weak.

Globally, the picture is more concerning. China’s economy continues to contend with a property-sector downturn, weak consumer confidence, and deflationary pressures. Europe remains close to stagnation, with Germany, the continent’s industrial engine, in or near recession. Japan, despite its own monetary policy normalisation, faces structural headwinds. These global cross-currents argue for lower U.S. rates to prevent the dollar from strengthening excessively and to support an economy that cannot fully decouple from the rest of the world.

Real Interest Rates Remain Restrictive

Arguably the most compelling argument for further cuts rests on the concept of real interest rates, defined as the nominal rate minus inflation. With the federal funds rate at 4.00 to 4.25 percent and inflation near 2.5 to 2.7 percent, the real rate is approximately 1.5 percent. The Fed estimates the neutral real rate, the rate at which monetary policy neither stimulates nor restricts the economy, at roughly 0.5 to 1.0 percent. Monetary policy therefore remains meaningfully restrictive and continues to dampen economic activity even at current levels.
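The real-rate arithmetic above can be checked in a few lines of Python. This is a minimal sketch that uses only the ranges quoted in the paragraph; the function name is hypothetical, and treating the quoted ranges as simple lower and upper bounds is a simplifying assumption rather than how the Fed measures the real rate.

```python
def real_rate_range(nominal, inflation):
    """Return (low, high) for nominal minus inflation, given (low, high) ranges."""
    # The lowest real rate pairs the lowest nominal rate with the highest inflation,
    # and the highest real rate pairs the highest nominal rate with the lowest inflation.
    return (nominal[0] - inflation[1], nominal[1] - inflation[0])


# Figures quoted in the text, in percent.
fed_funds = (4.00, 4.25)
inflation = (2.5, 2.7)
neutral_real = (0.5, 1.0)

real = real_rate_range(fed_funds, inflation)
real_midpoint = (real[0] + real[1]) / 2

# Policy is restrictive on these inputs if even the low end of the
# real-rate range sits above the high end of the neutral range.
is_restrictive = real[0] > neutral_real[1]

print(f"Real rate range: {real[0]:.2f}% to {real[1]:.2f}%")
print(f"Real rate midpoint: {real_midpoint:.2f}%")
print(f"Above the neutral range: {is_restrictive}")
```

On these inputs the midpoint lands close to the 1.5 percent figure cited above, and the whole range sits above the neutral estimate.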

Tip: When Fed officials refer to “moving toward neutral,” they are acknowledging that rates may need to fall by another 100 to 150 basis points to reach a level that is neither restrictive nor accommodative. This is the fundamental reason that the cutting cycle is likely to continue.

Yield Curve Normalisation

The Treasury yield curve was inverted for the longest period on record, with the 2-year yield exceeding the 10-year yield for more than two years. The curve has begun to normalise as the Fed cuts short-term rates, but the process is incomplete. Further cuts would help to fully normalise the curve, improving credit conditions for banks and reducing the recessionary signal that has concerned economists.

The Case for a Pause or Hold

For each argument in favour of further cuts, a credible counterargument exists. The Fed confronts genuine risks from moving too quickly, and several factors could prompt it to pause or even halt the cutting cycle.

Persistent Services Inflation

While goods prices have cooperated, services inflation has proven persistent. Shelter costs are declining, but only slowly. Healthcare costs have reaccelerated, driven by rising insurance premiums, hospital costs, and pharmaceutical prices. Auto insurance remains elevated, reflecting the higher replacement costs of modern vehicles. Financial services inflation has also increased.

The “supercore” measure, defined as core services excluding housing, which Fed Chair Powell has highlighted as a key indicator, remains persistently above 3 percent. Until this measure shows convincing progress toward 2 percent, the Fed has legitimate grounds for proceeding cautiously. Cutting too aggressively while services inflation remains elevated risks unanchoring inflation expectations, which would be substantially more damaging over the long run than keeping rates higher for additional months.

Tariff-Driven Inflation Pressures

The ongoing U.S.-China trade dispute and the broader tariff regime add a distinctive consideration to the Fed’s calculus. Tariffs imposed in 2025 on Chinese goods, along with reciprocal tariffs from other trading partners, function as a tax on imported goods. The first-round effects of tariffs are technically a one-time adjustment to the price level rather than ongoing inflation, but they can feed into inflation expectations and produce second-round effects if businesses pass costs to consumers and workers demand higher wages in response.

Fed officials have repeatedly stated that they will “look through” one-time tariff effects, but the practical reality is more nuanced. If tariffs broaden and intensify, which remains a plausible outcome given the current geopolitical environment, they could add 0.3 to 0.5 percentage points to core inflation, meaningfully complicating the Fed’s path toward the 2 percent target.

Caution: Tariffs represent a genuine source of uncertainty for 2026 monetary policy. An escalation in trade tensions could simultaneously slow economic growth (arguing for cuts) and boost inflation (arguing against cuts). This stagflationary configuration is particularly difficult for the Fed, and there is no straightforward policy response.

Continued Labour Market Resilience

Despite the cooling trend, the labour market has consistently surprised to the upside throughout this cycle. Each time economists predicted a sharp deterioration, the jobs data exceeded expectations. If this pattern persists, with unemployment remaining below 4.5 percent and payroll growth holding steady, the Fed will face less urgency to cut. A strong labour market is itself evidence that current rates are not excessively restrictive.

Asset Price Inflation and Financial Conditions

The S&P 500 is near all-time highs. Bitcoin has appreciated significantly. Home prices, despite elevated mortgage rates, have held firm in most markets. Corporate credit spreads are tight. Financial conditions are therefore loose by historical standards, even before additional rate cuts. The Fed risks fuelling a more substantial asset bubble if it cuts too aggressively while markets are already exuberant.

The concern is not abstract. The wealth effect from rising stock and home prices supports consumer spending, which in turn supports services inflation. The Fed must weigh the stimulus provided by rate cuts against the stimulus that already exists from buoyant asset markets.

Lessons from the 1970s

Federal Reserve officials are students of history, and the 1970s feature prominently in their collective memory. During that decade, the Fed cut rates prematurely on multiple occasions, believing that inflation was under control. Each time, inflation returned more strongly, ultimately requiring the severe Volcker rate hikes of 1979 to 1982 that drove unemployment above 10 percent and produced two recessions.

The lesson is clear: it is preferable to err on the side of keeping rates higher for longer than to cut too early and allow inflation to re-entrench. Fed Chair Powell has explicitly referenced this history, and it appears to influence the FOMC’s bias toward patience.

The Fed Dot Plot and FOMC Signals

The most recent Summary of Economic Projections (the “dot plot”) indicates that FOMC members anticipate a median federal funds rate of 3.50 to 3.75 percent by the end of 2026, implying approximately two to three additional cuts from current levels. However, the dots are widely dispersed; some members project rates as low as 3.00 percent, while others project rates above 4.00 percent. This disagreement reflects genuine uncertainty about the economic outlook and should caution investors against assuming any specific outcome.

Rate Cut Scenarios and Timeline for 2026

Given the cross-currents described above, the following analysis outlines three plausible scenarios for how the Fed’s rate-cutting cycle may unfold in 2026. Each scenario carries distinct implications for portfolio allocation.

Scenario 1: Aggressive Cuts (4 to 6 Cuts in 2026)

Probability: 15 to 20 percent.

In this scenario, the economy weakens more than expected. A recession, perhaps triggered by a consumer spending pullback, a credit event, or an escalation of trade tensions, forces the Fed’s hand. The unemployment rate rises above 5 percent, corporate earnings decline, and the Fed responds with cuts of 25 basis points at nearly every meeting, potentially including one or more 50-basis-point reductions.

The federal funds rate would end 2026 within the 2.50 to 3.00 percent range. This scenario would initially be painful for equities, as recession fears would drive a significant correction. The aggressive monetary response would, however, set the stage for a recovery, particularly in rate-sensitive sectors.

Triggers to monitor: Unemployment rising above 4.5 percent, negative GDP prints, widening credit spreads, and initial jobless claims rising above 300,000.

Scenario 2: Gradual Cuts (2 to 3 Cuts in 2026)

Probability: 55 to 60 percent.

This is the base case and the scenario most consistent with current Fed guidance and incoming economic data. Inflation continues its slow descent toward 2 percent, the labour market cools gradually, and GDP growth remains positive but below trend at 1.5 to 2.0 percent. The Fed cuts once or twice in the first half of the year, pauses to assess, and potentially delivers one additional cut in the autumn.

The federal funds rate would end 2026 within the 3.25 to 3.75 percent range. This is the soft-landing scenario that markets have been pricing, and it is broadly supportive of equities, particularly growth and quality names. It represents the continuation of the current benign environment.

Triggers to monitor: Core PCE declining below 2.5 percent, stable unemployment within the 4.0 to 4.3 percent range, and GDP growth between 1.5 and 2.5 percent.

Scenario 3: Extended Pause or Reversal

Probability: 20 to 25 percent.

In this scenario, inflation proves more persistent than expected, perhaps due to tariff escalation, a commodity price shock, or a reacceleration in wage growth. The Fed pauses its cutting cycle and holds rates at 4.00 to 4.25 percent for most or all of 2026. In an extreme case, a resurgence of inflation could compel the Fed to consider additional hikes, although this outcome remains a tail risk.

This scenario would be negative for rate-sensitive sectors such as REITs, utilities, and small caps, as well as for long-duration bonds. Growth stocks could also struggle if higher-for-longer rates produce valuation compression. Value and quality stocks would likely outperform in this environment.

Triggers to monitor: Core PCE reaccelerating above 3 percent, wage growth above 4.5 percent, a significant escalation in tariffs, or oil prices above $100 per barrel.

Scenario	Probability	Total Cuts	Year-End Rate	Stock Market Impact
Aggressive	15-20%	4-6 cuts (100-150 bps)	2.50–3.00%	Short-term bearish, then rally
Gradual (Base Case)	55-60%	2-3 cuts (50-75 bps)	3.25–3.75%	Moderately bullish
Pause / Reversal	20-25%	0-1 cuts (0-25 bps)	4.00–4.25%	Bearish for growth/rate-sensitive
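One way to read the scenario table is as a probability-weighted estimate of the year-end rate. The sketch below is illustrative only: it collapses each range in the table to its midpoint, which is an assumption, and it normalises the probabilities because the midpoints of the three ranges do not sum to exactly 100.

```python
# Scenario inputs copied from the table; every range is reduced to its
# midpoint, which is an assumption made only for this illustration.
scenarios = {
    "Aggressive": {"prob": (15, 20), "year_end": (2.50, 3.00)},
    "Gradual": {"prob": (55, 60), "year_end": (3.25, 3.75)},
    "Pause / Reversal": {"prob": (20, 25), "year_end": (4.00, 4.25)},
}


def mid(bounds):
    """Midpoint of a (low, high) pair."""
    return (bounds[0] + bounds[1]) / 2


# The probability midpoints sum to 97.5 rather than 100, so normalise them.
total_prob = sum(mid(s["prob"]) for s in scenarios.values())
weights = {name: mid(s["prob"]) / total_prob for name, s in scenarios.items()}

# Weighted average of the year-end rate midpoints.
expected_rate = sum(
    weights[name] * mid(s["year_end"]) for name, s in scenarios.items()
)

print(f"Probability-weighted year-end rate: {expected_rate:.2f}%")
```

The weighted figure comes out near 3.5 percent, inside the base-case range, which is what one would expect when the base case carries more than half of the probability mass.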

CME FedWatch and Market Pricing

The CME FedWatch tool, which derives rate expectations from federal funds futures contracts, currently prices in approximately two to three cuts for 2026, closely aligned with the base case described above. It is important to recognise, however, that market pricing can shift considerably on the basis of a single data release. A higher-than-expected CPI print can remove a previously expected cut within hours, while a weak employment report can add two cuts to the implied path overnight. The FedWatch tool reflects a snapshot of market expectations rather than a forecast.

Investors should not follow market pricing uncritically. It is more useful as a measure of consensus expectations and as a reference point for identifying opportunities where an independent assessment diverges from the prevailing view.

Historical Patterns of Rate Cuts and Equity Returns

History does not repeat itself, but it tends to rhyme. Examining past rate-cutting cycles provides valuable context for the likely path in 2026 and highlights an important distinction that many investors overlook.

S&P 500 Performance During Past Rate-Cutting Cycles

Cutting Cycle	First Cut Date	Context	S&P 500, 6 Months	S&P 500, 12 Months	S&P 500, 24 Months
1995 “Insurance”	Jul 1995	Soft landing	+12.3%	+22.4%	+46.0%
2001 Recession	Jan 2001	Dot-com bust	-7.2%	-15.6%	-22.1%
2007 Recession	Sep 2007	Financial crisis	-12.8%	-20.7%	-30.5%
2019 “Insurance”	Jul 2019	Mid-cycle adjustment	+8.5%	+16.3%*	N/A (COVID)
2024 Current	Sep 2024	Soft landing?	+7-10%	In progress	TBD

*2019 12-month return excludes COVID crash. Returns are approximate and measured from the date of the first cut.

The Important Distinction Between Insurance Cuts and Emergency Cuts

The most important lesson from the historical record is that not all rate cuts are equivalent. The context in which the cuts occur largely determines how equities respond.

Insurance cuts, sometimes referred to as mid-cycle adjustments, occur when the economy is still growing but the Fed wishes to provide a cushion against a potential slowdown. The 1995 and 2019 cycles are textbook examples. In both cases, the economy avoided recession, and equities rallied strongly in the 12 to 24 months following the first cut.

Emergency cuts occur when the economy is already in or entering a recession. The 2001 and 2007 cycles serve as cautionary illustrations. In both cases, rate cuts could not prevent a significant decline in equity markets because the underlying economic damage was too severe. The Fed was cutting rates into a worsening crisis, and equities fell despite the monetary stimulus.

Key Takeaway: The question is not simply whether the Fed will cut rates, but why it is cutting them. If the cuts are insurance in a growing economy, equities tend to rally. If the cuts represent an emergency response to recession, further downside is likely before any recovery emerges. The current cycle most closely resembles the 1995 and 2019 insurance scenarios, which is bullish, but continued vigilance is warranted.

Average Returns Following Rate Cuts

Averaging across all rate-cutting cycles since 1980, including both insurance and recession cuts, the S&P 500 has delivered the following returns:

6 months after the first cut: +2.5 percent, with wide dispersion.
12 months after the first cut: +7.8 percent, with wide dispersion.
24 months after the first cut: +14.2 percent, skewed by strong insurance-cut cycles.

When the sample is restricted to soft-landing or insurance-cut cycles, the returns increase substantially: approximately +11 percent at 6 months, +20 percent at 12 months, and more than +35 percent at 24 months. If the economy avoids recession, the historical precedent argues strongly for equity outperformance in 2026.
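The effect of splitting the sample can be illustrated with the four completed cycles from the table earlier in this section. This is a sketch under stated assumptions: it uses only those four rows (not the full post-1980 sample behind the averages quoted above, so the results will not match them exactly), the "insurance" and "recession" labels follow the classification in the text, and the field names are hypothetical.

```python
from statistics import mean

# Approximate returns (percent) copied from the table of past cycles.
# The 2024 cycle is omitted because it is still in progress, and None
# marks the 24-month value that the table lists as N/A for 2019.
cycles = [
    {"name": "1995", "kind": "insurance", "r6": 12.3, "r12": 22.4, "r24": 46.0},
    {"name": "2001", "kind": "recession", "r6": -7.2, "r12": -15.6, "r24": -22.1},
    {"name": "2007", "kind": "recession", "r6": -12.8, "r12": -20.7, "r24": -30.5},
    {"name": "2019", "kind": "insurance", "r6": 8.5, "r12": 16.3, "r24": None},
]


def average_return(kind, horizon):
    """Mean return for one cycle type at one horizon, skipping missing values."""
    values = [
        c[horizon] for c in cycles
        if c["kind"] == kind and c[horizon] is not None
    ]
    return mean(values)


for kind in ("insurance", "recession"):
    summary = {h: round(average_return(kind, h), 1) for h in ("r6", "r12", "r24")}
    print(kind, summary)
```

Even on this small subset the two groups point in opposite directions at every horizon, which is the pattern the insurance-versus-emergency distinction describes.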

Sector-by-Sector Analysis

Rate cuts do not affect all sectors uniformly. Some sectors benefit substantially, while others face genuine headwinds. Understanding these dynamics is essential for portfolio positioning.

Technology and Growth Stocks

Growth stocks are the clearest beneficiaries of lower interest rates. The reason is mathematical: the value of a growth stock depends heavily on its future cash flows, which are discounted to the present using prevailing interest rates. Lower rates imply a lower discount rate, which increases the present value of those future cash flows. This is why technology stocks were under pressure during the 2022 tightening cycle and rebounded during the 2024 cuts.
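The discounting argument can be made concrete with a small example. The sketch below is hypothetical throughout: the two cash-flow streams, the 10 percent and 8 percent discount rates, and the function names are assumptions chosen only to show the mechanism, not estimates for any real company.

```python
def present_value(cash_flows, discount_rate):
    """Discount a list of annual cash flows (year 1, 2, ...) back to today."""
    return sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )


def uplift_pct(cash_flows, old_rate, new_rate):
    """Percentage change in present value when the discount rate moves."""
    old_pv = present_value(cash_flows, old_rate)
    new_pv = present_value(cash_flows, new_rate)
    return (new_pv / old_pv - 1) * 100


# Two hypothetical companies that each pay out 550 in total over ten years.
growth_cash_flows = [10 * year for year in range(1, 11)]  # back-loaded: 10, 20, ..., 100
steady_cash_flows = [55] * 10                             # flat: 55 every year

for label, flows in (("growth", growth_cash_flows), ("steady", steady_cash_flows)):
    change = uplift_pct(flows, old_rate=0.10, new_rate=0.08)
    print(f"{label}: present value changes by {change:+.1f}% when the rate falls")
```

Both streams gain when the discount rate falls, but the back-loaded stream gains more because a larger share of its value sits in distant years, where discounting bites hardest.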

Companies such as NVIDIA (NVDA), Apple (AAPL), Microsoft (MSFT), Alphabet (GOOGL), and Amazon (AMZN) are positioned to benefit. The AI infrastructure buildout, still in its early stages, provides a substantial secular growth tailwind that rate cuts would amplify. A lower cost of capital also makes it easier for technology companies to fund research and development, acquisitions, and share buybacks.

Risk: Technology valuations are already elevated. The Nasdaq trades at high forward price-to-earnings multiples, and a portion of the expected rate-cut benefit may already be priced. Any disappointment on the rate front could trigger a sharp correction.

The Financial Sector

Banks and financial companies have a complicated relationship with interest rates. On one hand, falling rates compress net interest margins (NIMs), defined as the spread between what banks earn on loans and what they pay on deposits. This is a direct headwind for the most important revenue line at traditional banks such as JPMorgan Chase (JPM), Bank of America (BAC), and Wells Fargo (WFC).

On the other hand, lower rates stimulate loan demand, drive mortgage refinancing activity, and improve credit quality by reducing the burden on borrowers. Investment banking activity (mergers, acquisitions, and IPOs) also tends to recover in a lower-rate environment, benefiting firms such as Goldman Sachs (GS) and Morgan Stanley (MS).

On balance, financials tend to register a mixed initial reaction to rate cuts, followed by positive performance if the economy remains healthy. The key variable is credit losses; if rate cuts are accompanied by rising defaults, banks will suffer despite the lower rates.

Real Estate and REITs

Real Estate Investment Trusts (REITs) are among the most direct beneficiaries of rate cuts. REITs are capital-intensive businesses that rely substantially on debt financing. Lower rates reduce their borrowing costs, support property valuations, and make their dividend yields more attractive relative to bonds.

The Vanguard Real Estate ETF (VNQ), Realty Income (O), and American Tower (AMT) are all positioned to benefit. In addition, lower mortgage rates could thaw the frozen housing market, benefiting homebuilders such as D.R. Horton (DHI) and Lennar (LEN).

Utilities

Utilities are classic bond proxies, purchased primarily for their stable dividends. When interest rates fall, utility stocks become more attractive because their yields compare more favourably to declining Treasury yields. The Utilities Select Sector SPDR (XLU), NextEra Energy (NEE), and Southern Company (SO) typically outperform during rate-cutting cycles.

An additional consideration in 2026 is the AI data centre buildout, which is driving substantial electricity demand growth. Utilities serving data centre markets could benefit simultaneously from rate-cut tailwinds and secular demand growth.

Consumer Discretionary

Lower rates reduce the cost of auto loans, credit card debt, and home equity lines of credit. The effect is to free disposable income and to encourage spending on large-ticket items. Companies such as Amazon (AMZN), Home Depot (HD), and Tesla (TSLA) benefit from this dynamic. The housing-related consumer discretionary subsector, including appliances, furniture, and home improvement, is particularly rate-sensitive.

Small Caps and the Largest Opportunity

Small-cap stocks, represented by the Russell 2000 and tracked by the iShares Russell 2000 ETF (IWM), may offer the most compelling opportunity in a rate-cutting environment. Small caps have underperformed large caps significantly since 2022, in part because smaller companies rely more heavily on floating-rate debt, making them acutely sensitive to interest rate increases.

The Russell 2000’s valuation discount to the S&P 500 has widened to near-historic levels. If rates decline, small caps receive a double benefit: lower borrowing costs directly support profitability, and the valuation gap provides room for re-rating. Historically, small caps have outperformed large caps by 5 to 10 percentage points in the 12 months following the start of a rate-cutting cycle in non-recession scenarios.

Bonds and Fixed Income

Although this article focuses on equities, any discussion of rate cuts requires attention to bonds. Bond prices move inversely to yields, so when rates fall, bond prices rise. Long-duration Treasuries, held in instruments such as the iShares 20+ Year Treasury Bond ETF (TLT) and the PIMCO 25+ Year Zero Coupon US Treasury Index ETF (ZROZ), stand to gain the most. A 100-basis-point decline in long-term rates could generate capital gains of roughly 15 to 20 percent for TLT holders.
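The inverse relationship between yields and prices can be sketched numerically. The example below makes two assumptions that are not from the article: a 25-year zero-coupon bond repriced from a 4.5 percent to a 3.5 percent yield, and a fund with a modified duration of 17 years. Both are hypothetical illustrations, not the actual characteristics of TLT or ZROZ.

```python
def zero_coupon_price(yield_rate, years, face=100.0):
    """Price of a zero-coupon bond with annual compounding."""
    return face / (1 + yield_rate) ** years


def price_change_pct(old_yield, new_yield, years):
    """Exact percentage price change of a zero-coupon bond when its yield moves."""
    old_price = zero_coupon_price(old_yield, years)
    new_price = zero_coupon_price(new_yield, years)
    return (new_price / old_price - 1) * 100


def duration_estimate_pct(modified_duration, yield_change):
    """First-order approximation: price change = -duration x yield change."""
    return -modified_duration * yield_change * 100


# Hypothetical inputs: a 25-year zero and a fund with a modified duration of 17.
print(f"25-year zero, 4.5% -> 3.5%: {price_change_pct(0.045, 0.035, 25):+.1f}%")
print(f"Duration-17 fund, -100 bps: {duration_estimate_pct(17, -0.01):+.1f}%")
print(f"Duration-17 fund, +100 bps: {duration_estimate_pct(17, 0.01):+.1f}%")
```

The duration approximation is symmetric, so the same sensitivity that produces gains when yields fall produces comparable losses when they rise.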

Sector	Rate Cut Impact	Key Mechanism	Top Picks	Expected Benefit
Tech / Growth	Strongly Positive	Lower discount rate boosts valuations	NVDA, AAPL, MSFT, GOOGL	High
Financials	Mixed	Margin compression vs. loan demand	JPM, GS, MS	Moderate
REITs	Strongly Positive	Lower borrowing costs, yield appeal	VNQ, O, AMT, DHI	High
Utilities	Positive	Bond proxy, dividend yield appeal	XLU, NEE, SO	Moderate-High
Consumer Disc.	Positive	Lower borrowing costs, more spending	AMZN, HD, TSLA	Moderate
Small Caps	Strongly Positive	Floating-rate debt relief, valuation gap	IWM, Russell 2000	Very High
Long-Duration Bonds	Strongly Positive	Price appreciation as yields fall	TLT, ZROZ, IEF	High

Investment Strategies for a Rate-Cutting Environment

Understanding the macroeconomic backdrop matters, but the more important task is translating that understanding into actionable portfolio decisions. The following seven strategies are worth consideration for 2026, together with specific implementation ideas.

Strategy 1: A Tilt Toward Growth Over Value

In a falling-rate environment, growth stocks tend to outperform value stocks. This is not merely a theoretical proposition; the historical data are clear. Across the past five rate-cutting cycles, growth has outperformed value by an average of 8 percentage points in the 12 months following the first cut, excluding recession cycles.

The Vanguard Growth ETF (VUG) and the Invesco QQQ Trust (QQQ) provide broad growth exposure. For more concentrated exposure to the AI theme, the VanEck Semiconductor ETF (SMH) and individual names such as NVIDIA, AMD, and Broadcom are worth examining.

Strategy 2: Adding Small Cap Exposure

As discussed in the sector analysis, small caps are the most rate-sensitive segment of the equity market. The Russell 2000 has underperformed the S&P 500 by a historic margin over the past three years. Rate cuts could provide the catalyst that closes this gap.

The iShares Russell 2000 ETF (IWM) is the most liquid vehicle for this theme. For a value-tilted approach, the iShares Russell 2000 Value ETF (IWN) targets the value segment of the index, while the Avantis U.S. Small Cap Value ETF (AVUV) additionally screens for profitability, filtering for smaller companies with stronger fundamentals.

Strategy 3: Increasing REIT Allocation

REITs have been pressured by elevated rates. Many quality REITs trade at significant discounts to their net asset values (NAVs) and historical valuations. Rate cuts provide a clear catalyst for re-rating. An allocation of 5 to 10 percent of the portfolio to REITs through VNQ or specific names such as Realty Income (O), Prologis (PLD), or Digital Realty Trust (DLR) is worth considering. DLR may benefit from both rate cuts and AI-driven data centre demand.

Strategy 4: Extending Bond Duration

For investors who hold bonds (and most diversified portfolios should include them), now is a reasonable time to consider extending duration. Short-term bonds and money market funds have delivered attractive yields during the high-rate period, but their returns will decline as the Fed cuts. Shifting a portion of the fixed-income allocation into intermediate Treasuries (IEF, covering 7 to 10 years) or long-duration Treasuries (TLT, covering 20 years and beyond) positions a portfolio to capture capital gains as rates fall.

Caution: Long-duration bonds can deliver substantial returns if rates fall, but they carry meaningful downside as well. If inflation surprises to the upside and rate cuts are delayed, TLT could decline by 10 to 15 percent quickly. The position should be sized appropriately and treated as a tactical trade rather than a core holding.

Strategy 5: Dividend Growth Stocks

As rates fall, the relative attractiveness of dividend-paying stocks increases. Investors who were comfortable earning more than 5 percent in money market funds will begin rotating back into dividend stocks as money market yields decline. The focus should be on dividend growth rather than simple high yield, as companies that consistently raise their dividends tend to outperform over time.

The Vanguard Dividend Appreciation ETF (VIG), the Schwab U.S. Dividend Equity ETF (SCHD), and individual names such as Johnson & Johnson (JNJ), Procter & Gamble (PG), and Microsoft (MSFT) offer compelling dividend growth profiles.

Strategy 6: International Diversification

U.S. rate cuts tend to weaken the dollar, which benefits international equities when translated back into USD terms. In addition, many international markets trade at significant valuation discounts to the United States. The Vanguard FTSE Developed Markets ETF (VEA) and iShares MSCI EAFE ETF (EFA) provide broad developed-market exposure. For more targeted exposure, the iShares MSCI Emerging Markets ETF (EEM) provides access to emerging markets, although the asset class carries higher risk.

Strategy 7: Maintaining Hedges

No investment strategy is complete without risk management. Even in a favourable rate-cutting environment, unexpected shocks can produce significant drawdowns. Maintaining 5 to 10 percent of the portfolio in cash or short-term Treasuries as dry powder is prudent. For more active hedging, put options on the S&P 500 (SPY puts) or a small allocation to gold (GLD), which tends to perform well when real rates are declining, may be considered.

Model Portfolio Allocations

Asset Class	Scenario 1: Aggressive Cuts	Scenario 2: Gradual Cuts (Base)	Scenario 3: Pause / Hold
U.S. Large Cap Growth	25%	30%	20%
U.S. Large Cap Value	10%	15%	25%
U.S. Small Caps	15%	10%	5%
REITs	10%	8%	3%
International Developed	10%	10%	10%
Long-Duration Bonds (TLT)	15%	10%	5%
Intermediate Bonds	5%	7%	12%
Gold / Commodities	5%	5%	5%
Cash / Short-Term Treasuries	5%	5%	15%

Tip: These model portfolios are starting points rather than prescriptions. The ideal allocation for any investor depends on age, risk tolerance, investment horizon, and personal financial circumstances. The important insight is that the direction of allocation shifts (toward growth, small caps, REITs, and duration) is consistent across scenarios, even where the magnitude differs.
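A quick consistency check on the model portfolios is to confirm that each scenario column is fully invested and to see how far the allocations move between the two extreme scenarios. The sketch below copies the percentages from the table; the variable names are hypothetical.

```python
# Allocations (percent) copied from the table: (aggressive, gradual, pause).
allocations = {
    "U.S. Large Cap Growth": (25, 30, 20),
    "U.S. Large Cap Value": (10, 15, 25),
    "U.S. Small Caps": (15, 10, 5),
    "REITs": (10, 8, 3),
    "International Developed": (10, 10, 10),
    "Long-Duration Bonds (TLT)": (15, 10, 5),
    "Intermediate Bonds": (5, 7, 12),
    "Gold / Commodities": (5, 5, 5),
    "Cash / Short-Term Treasuries": (5, 5, 15),
}
scenario_names = ("Aggressive Cuts", "Gradual Cuts (Base)", "Pause / Hold")

# Each scenario column should be fully invested: weights must sum to 100.
column_totals = [
    sum(weights[i] for weights in allocations.values()) for i in range(3)
]
for name, total in zip(scenario_names, column_totals):
    print(f"{name}: {total}%")

# Difference between the two extreme scenarios, per asset class.
for asset, (aggressive, _, pause) in allocations.items():
    print(f"{asset}: {aggressive - pause:+d} points (aggressive minus pause)")
```

All three columns sum to 100 percent, and the per-asset differences show which positions depend most heavily on the rate path.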

Risks and What Could Disrupt the Thesis

No analysis is complete without an honest assessment of what could derail the constructive case. The following risks could materially alter the rate trajectory and equity market performance in 2026.

Inflation Reacceleration

The most direct threat to the rate-cutting thesis is a resurgence of inflation. If CPI or PCE begins trending back above 3.5 percent, the Fed would almost certainly pause all cuts and markets would reprice aggressively. The most likely catalysts for reacceleration include a commodity price spike (particularly in oil), an escalation in tariffs, or a reacceleration in wage growth driven by a tighter-than-expected labour market.

Geopolitical Shock

An oil price spike above $100 per barrel, triggered by an escalation in Middle East conflict, OPEC+ production cuts, or disruption to key shipping lanes, would be stagflationary. Oil at $120 or above would almost certainly push the economy toward recession while simultaneously boosting inflation, producing the most challenging environment for the Fed and for investors.

A Recession Deeper Than Expected

The soft-landing consensus may prove incorrect. If the lagged effects of more than 500 basis points of rate hikes prove more powerful than expected, the economy could enter recession. In that scenario, rate cuts would arrive faster (matching Scenario 1), but they would not prevent initial equity losses. Earnings would decline, defaults would rise, and the S&P 500 could fall 20 to 30 percent before monetary easing stabilises the situation.

Dollar Weakness and Capital Outflows

Aggressive rate cuts combined with large fiscal deficits could weaken the U.S. dollar significantly. While a weaker dollar supports U.S. exporters and international equities, an uncontrolled decline could trigger capital outflows, rising import prices, and a crisis of confidence. The dollar’s status as the global reserve currency provides a buffer, but the buffer is not unlimited.

A Disappointment in AI Monetisation

The AI investment cycle has driven a substantial portion of equity market gains since 2023. If AI monetisation disappoints, or if the substantial capital expenditures by major technology companies fail to generate proportional revenue, a correction in AI-adjacent stocks could pull the broader market lower. The risk is amplified because rate cuts tend to expand growth stock valuations further. An AI disappointment coinciding with the late stage of the rate-cutting cycle could produce a “buy the rumour, sell the news” dynamic.

Fiscal Policy Uncertainty

With the United States running historically large deficits during a period of full employment, fiscal policy represents an additional source of uncertainty. Potential policy changes, whether tax reform, spending cuts, or new fiscal stimulus, could alter the economic trajectory in ways that complicate the Fed’s task. Bond markets in particular may demand higher yields to absorb increasing Treasury issuance, potentially offsetting the effects of Fed rate cuts on long-term rates.

Caution: The most significant risk for many investors is not any single scenario but overconfidence in the consensus view. The consensus position (soft landing, gradual cuts, equities higher) is well-known and broadly held. When agreement is widespread, the probability of a consensus-breaking surprise increases. Maintaining appropriate diversification and avoiding excessive concentration in any single outcome is essential.

Conclusion

The U.S. interest rate outlook for 2026 presents a complex but ultimately navigable environment for investors. The base case of two to three additional cuts bringing the federal funds rate to the 3.25 to 3.75 percent range by year-end is supported by moderating inflation, a cooling but resilient labour market, and a Fed that has clearly signalled its intention to move toward neutral. This scenario is broadly positive for equities, particularly rate-sensitive segments such as technology, small caps, and REITs, and for long-duration bonds.

The path, however, will not be smooth. Persistent services inflation, tariff uncertainties, geopolitical risks, and the continued possibility of a recession all introduce genuine volatility risks. The distinction between insurance cuts and emergency cuts, a framework drawn from five decades of historical data, should guide expectations. The current cycle has the characteristics of an insurance cut, which is constructive, but continuous monitoring of economic data is essential.

The following are actionable conclusions:

Tilt toward growth over value while retaining a meaningful value allocation. Balance is preferable to concentration.
Add small cap exposure. The valuation gap to large caps is near historic levels, and rate cuts are the likely catalyst.
Increase REIT allocation. The sector has been pressured by elevated rates and is positioned for recovery.
Extend bond duration tactically. Capture capital gains from declining rates while sizing the position to reflect the associated risk.
Focus on dividend growth. As money market yields decline, quality dividend growers will attract capital.
Diversify internationally. A weakening dollar tends to support international returns.
Maintain risk management. Hold cash reserves and consider hedges. Overconfidence is the principal enemy.

The Federal Reserve’s rate decisions will continue to dominate financial headlines throughout 2026. It is worth remembering, however, that markets are forward-looking. By the time the Fed cuts rates, much of the move may already be reflected in prices. The appropriate time to position a portfolio is not after the announcement of a cut but ahead of it. Investors who understand the interplay between monetary policy, economic data, and market dynamics tend to be better positioned over a complete cycle.

Staying informed, staying diversified, and remaining disciplined are the priorities. The rate-cutting cycle can be supportive, provided that the associated risks are respected.

Disclaimer: This article is for informational purposes only and does not constitute investment advice. All investments carry risk, including the potential loss of principal. Past performance is not indicative of future results. The specific securities, ETFs, and scenarios discussed are for illustrative purposes only and should not be construed as recommendations to buy or sell any security. Always consult a qualified financial advisor before making investment decisions.

April 4, 2026

The Best AI Agents and Tools for Office Workers in 2026: A Complete Productivity Guide

Summary

What this post covers: A curated 2026 buyer’s guide to the AI agents and tools that deliver meaningful productivity gains for office workers, organised by daily task category: chat assistants, email, writing, slides, spreadsheets, meetings, scheduling, project management, research, and code.

Key insights:

The average knowledge worker spends 58% of the workday on “work about work”. A late-2025 McKinsey study estimates that well-chosen AI stacks reclaim 8–14 hours per week, while poorly matched stacks actually destroy productivity through context-switching and unreliable outputs.
Among general-purpose assistants, Claude leads on long-document analysis and nuanced reasoning, ChatGPT wins on the custom-GPT ecosystem and multimodal breadth, and Gemini is the only credible choice if your team lives inside Google Workspace.
The biggest ROI categories are meeting transcription (Otter, Fireflies), calendar/task automation (Reclaim, Motion), and email triage (Superhuman, Spark)—they save the most minutes per dollar because the underlying tasks are repetitive and high-frequency.
Enterprise rollouts fail when IT skips the privacy/security review—data residency, retention policies, and SOC 2 status matter more than feature checkboxes, and tools that train on customer data should be banned for anything touching legal, HR, or financial workflows.
The right strategy in 2026 is a small stack (one general assistant + 2–3 specialized agents) deployed to a pilot team first, with measurable time-saved targets, before any company-wide license commitment.

Main topics: Introduction: The AI-Powered Office Is Already Here, AI Assistants and Chatbots: Your New Digital Coworkers, AI for Email and Communication, AI for Documents and Writing, AI for Presentations, AI for Spreadsheets and Data Analysis, AI for Meetings and Scheduling, AI for Project Management, AI for Research and Knowledge Management, AI Coding Assistants for Technical Office Workers, Master Comparison Table, Implementation Strategy: Rolling AI Out to Your Team, ROI Analysis: How Much Time Can You Actually Save, Privacy and Security Considerations for Enterprise, Future Outlook: Where AI Office Tools Are Heading.

Introduction: The Current State of AI in the Office

This post examines which AI tools deliver meaningful productivity gains for office workers in 2026, organised by the daily task categories that consume the most time. Recent research indicates that the average office worker now spends 58% of the workday on “work about work”: status updates, email triage, information search, document formatting, and meeting scheduling. That amounts to nearly five hours of an eight-hour day spent on activity that produces no original thinking. In 2026, this overhead is no longer unavoidable; how much of it remains is a matter of deliberate choice.
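The “nearly five hours” figure can be checked with a line of arithmetic. The sketch below assumes an eight-hour workday and a five-day week, neither of which is stated in the research cited above; only the 58% share comes from the text.

```python
# Assumptions (not from the cited research): 8-hour workday, 5-day week.
WORKDAY_HOURS = 8.0
WORKDAYS_PER_WEEK = 5
OVERHEAD_SHARE = 0.58  # share of the workday spent on "work about work"


def overhead_hours(workday_hours: float, share: float) -> float:
    """Return the hours per day consumed by coordination overhead."""
    return workday_hours * share


daily = overhead_hours(WORKDAY_HOURS, OVERHEAD_SHARE)
weekly = daily * WORKDAYS_PER_WEEK

print(f"{daily:.2f} hours/day, {weekly:.1f} hours/week")
# prints "4.64 hours/day, 23.2 hours/week"
```

With a longer or shorter assumed workday the daily figure scales proportionally.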

Over the past eighteen months, AI tools for office productivity have moved from novelty to necessity. What was once a single chatbot window opened to rephrase an awkward paragraph has matured into a full ecosystem of AI agents — autonomous systems that draft emails, summarise meetings, build slide decks, analyse spreadsheets, and manage project boards while the user concentrates on substantive work. The transition is not pending; it has already occurred, and the gap between teams that have adopted these tools and those that have not is widening each quarter.

An important caveat applies: there are now hundreds of AI productivity tools on the market, ranging from genuinely transformative to thinly disguised autocompletion wrapped in a subscription fee. Choosing the wrong stack wastes money and, more importantly, wastes the time the tools were meant to save. A McKinsey study published in late 2025 estimated that knowledge workers using well-chosen AI tools reclaim between 8 and 14 hours per week, while those who adopt poorly matched tools lose productivity through context-switching overhead and unreliable outputs.

This guide cuts through the noise. It tests, compares, and categorises the best AI agents and tools available to office workers in 2026, organised by the tasks performed every day. Whether the reader is an executive assistant managing a CEO’s calendar, a marketing manager writing campaign briefs, a financial analyst processing quarterly data, or a developer shipping code alongside non-technical teammates, the guide provides a clear, actionable toolkit and a strategy for deploying it without overburdening the IT department.

The discussion follows.

AI Assistants and Chatbots: The General-Purpose Layer

The general-purpose AI assistant is the foundation of any AI-powered office workflow. It functions as the multi-purpose tool that is reached for before any specialised one. In 2026, four major platforms dominate this space, each with distinct strengths.

Claude (Anthropic)

Anthropic’s Claude has rapidly become the preferred assistant for professionals who require nuance, long-form reasoning, and reliability rather than novelty. The Claude family now includes three distinct products that serve different office needs.

Claude.ai is the conversational interface most users encounter first. It excels at long-document analysis (it can process entire books or contract sets in a single conversation), nuanced writing, and careful reasoning through complex problems. Claude consistently outperforms competitors in its ability to follow detailed instructions without drifting, which makes it particularly valuable for legal review, policy analysis, and technical writing.

Claude Cowork represents Anthropic’s move into agentic office work. Rather than waiting for prompts, Cowork operates as a persistent collaborator that can browse the web, create and edit documents, build presentations, and work through multi-step tasks autonomously. For office workers, this constitutes a significant shift; an entire research brief or competitive analysis can be delegated, with the polished deliverable returned upon completion.

Claude Code is the developer-focused CLI tool, but it warrants mention here because technical office workers (data analysts, DevOps engineers, product managers who code) increasingly rely on it for scripting, automation, and building internal tools. It is covered in greater detail in the coding section below.

Pricing: Free tier available. Pro plan at $20/month. Team plan at $30/user/month with admin controls and higher usage limits.

Best for: Long-document analysis, careful reasoning, writing that requires nuance, agentic workflows via Cowork.

ChatGPT (OpenAI)

ChatGPT remains the most widely recognised AI assistant and holds the largest user base globally. The GPT-4o model delivers fast, capable responses across text, image, and audio inputs, and OpenAI has invested heavily in producing a seamless conversational experience.

The principal office-productivity advantage of ChatGPT is custom GPTs — specialised versions of the model that teams can build for specific workflows. A sales team might create a GPT trained on its product catalogue and objection-handling playbook. A finance team might build one that knows its reporting templates and can generate formatted quarterly summaries on demand. The GPT Store provides thousands of pre-built options, though quality varies significantly.

ChatGPT’s integration with DALL-E for image generation and its browsing capabilities make it particularly useful for marketing teams that need to ideate, write, and create visual assets in a single workflow.

Pricing: Free tier available. Plus at $20/month. Team at $30/user/month. Enterprise with custom pricing.

Best for: Broad versatility, custom GPTs for team workflows, multimodal tasks (text + image + audio), users who want the largest ecosystem of plugins and integrations.

Google Gemini

Google Gemini has one distinctive advantage: native integration with Google Workspace. If an organisation operates in Gmail, Google Docs, Sheets, Slides, and Meet, Gemini is not merely an AI assistant; it is an AI assistant that already has access to the organisation’s data, calendar, inbox, and files.

Gemini can summarise email threads in Gmail, draft responses in the user’s writing style, generate formulas in Sheets, create presentation outlines in Slides, and take notes during Google Meet calls. The “Help me write” and “Help me organize” features are integrated directly into the applications the team already uses, which dramatically reduces the adoption friction that undermines most AI rollouts.

Pricing: Included with Google Workspace Business plans (starting at $14/user/month). Gemini Advanced standalone at $20/month.

Best for: Teams already embedded in Google Workspace. Lowest friction to adoption. Strong at cross-app workflows within the Google ecosystem.

Microsoft Copilot

Microsoft Copilot is the AI layer across the entire Microsoft 365 suite — Word, Excel, PowerPoint, Outlook, Teams, and others. For enterprises running on Microsoft, Copilot is the most deeply integrated AI assistant available. It can draft documents in Word, build presentations in PowerPoint, analyse data in Excel, summarise Teams meetings, and triage the Outlook inbox — all without leaving the applications already in use.

Copilot’s enterprise data integration through Microsoft Graph permits the assistant to draw context from across the organisation’s files, emails, chats, and meetings to generate more relevant outputs. This capability is powerful but raises the security considerations discussed later in this guide.

Pricing: Copilot Pro at $20/user/month (requires Microsoft 365 subscription). Copilot for Microsoft 365 at $30/user/month for enterprise features.

Best for: Enterprises running Microsoft 365. Deep integration across Office apps. Organisations that need enterprise-grade security and compliance.

Key Takeaway: A team running Google Workspace should begin with Gemini. A team running Microsoft 365 should begin with Copilot. For the strongest standalone reasoning and writing, Claude is the appropriate choice. For the broadest ecosystem and custom GPTs, ChatGPT is the appropriate choice. Many power users maintain subscriptions to two of these tools.

AI for Email and Communication

Email remains the single largest time sink for most office workers, consuming an average of 2.5 hours per day. AI email tools do more than help users write faster; the best of them fundamentally change how an inbox is processed, prioritised, and answered.

Superhuman AI

Superhuman was already the fastest email client on the market before AI, and the addition of AI features has widened its lead for high-volume email users. Superhuman AI can draft complete replies that match the user’s writing tone (it learns from sent mail), summarise long threads instantly, and auto-triage the inbox by importance. The “Instant Reply” feature generates one-tap response options that become remarkably accurate after a few weeks of pattern learning.

Pricing: $30/month. Best for: Executives, salespeople, and anyone processing 100+ emails per day.

Spark Mail AI

Spark Mail offers a more affordable alternative with surprisingly capable AI features. Its “+AI” assistant can compose emails, adjust tone, fix grammar, and summarise threads. Spark’s team features — shared inboxes, email delegation, and collaborative drafting — combined with AI make it a strong choice for teams rather than individuals.

Pricing: Free for individuals. Premium at $8/user/month. Best for: Teams on a budget who want AI email features without paying Superhuman prices.

Gmail AI Features and Outlook Copilot

Both Gmail’s Gemini integration and Outlook’s Copilot now offer inline AI drafting, thread summarisation, and smart replies. The advantage is zero additional cost when Google Workspace or Microsoft 365 is already in use. The disadvantage is that these built-in features are generally less sophisticated than dedicated AI email tools; summarisation is solid, but drafting can feel generic compared with Superhuman’s learned tone matching.

Grammarly

Grammarly has evolved far beyond spell-checking. Its AI writing assistant now operates across email clients, offering tone detection, full message rewriting, and context-aware suggestions. The enterprise version learns the company’s style guide and brand voice, ensuring that every email leaving the organisation sounds consistent and professional.

Pricing: Free basic tier. Premium at $12/month. Business at $15/user/month. Best for: Teams where writing quality and brand consistency across all communications is critical.

Tip: The highest-ROI email AI configuration for most professionals is to use the platform’s built-in AI (Gmail or Outlook) for basic drafting and summarisation, then layer Grammarly on top for quality assurance. An upgrade to Superhuman is appropriate only for very high email volumes.

AI for Documents and Writing

Document creation is where AI delivers perhaps its most visible productivity gains. Activities that previously required hours — first drafts, formatting, research synthesis — can now be completed in minutes. The quality gap between tools is, however, significant.

Notion AI

Notion AI is tightly integrated into one of the most widely used workspace tools for modern teams. It can generate drafts, summarise pages, extract action items from meeting notes, translate content, and answer questions about the entire Notion workspace. Its principal advantage is contextual awareness: Notion AI can reference the team’s existing documentation, project notes, and knowledge base when generating new content, producing dramatically more relevant outputs than a standalone AI tool.

Pricing: Included in Notion plans starting at $10/user/month (AI add-on at $8/user/month for legacy plans). Best for: Teams already using Notion who want AI that understands their existing knowledge base.

Google Docs with Gemini

Google Docs’ “Help me write” feature, powered by Gemini, permits content to be generated, rewritten, and refined directly within the document. It can change tone, expand or shorten text, and generate content based on prompts. The integration is smooth and feels native, although it currently lacks the workspace-wide context awareness that Notion AI offers.

Pricing: Included with Google Workspace plans. Best for: Google Workspace teams who want AI writing without switching apps.

Microsoft Word Copilot

Word Copilot can draft documents from prompts, rewrite sections, summarise long documents, and — importantly for enterprise users — generate content that references information from across the Microsoft 365 environment. It can pull data from Excel files, reference email threads, and cite Teams conversations. For organisations with deep Microsoft integration, this cross-application awareness is particularly powerful.

Pricing: Requires Copilot for Microsoft 365 ($30/user/month). Best for: Enterprise teams in the Microsoft ecosystem who need cross-app document generation.

Jasper, Copy.ai, and Writesonic

These three platforms occupy the marketing-focused AI writing niche. Jasper ($49/month) leads for brand-aware content; it learns the brand voice, maintains style guides, and generates marketing copy that sounds consistent with the company rather than generic. Copy.ai ($49/month) has pivoted toward workflow automation, connecting AI writing to CRM and marketing tools. Writesonic ($16/month) offers the best value for teams that need high-volume content generation without heavy customisation.

Best for: Marketing teams that generate high volumes of blog posts, ad copy, social media content, and email campaigns.

Caution: AI-generated documents should always be reviewed by a human before distribution. Even the best tools occasionally produce subtle factual errors, awkward phrasing, or content that does not align with the organisation’s position. AI is appropriate for first drafts, not final drafts.

AI for Presentations

Among office tasks, building slide decks is one of the most uniformly disliked. AI presentation tools have made notable progress, although none have fully resolved the challenge of generating presentations that are both informative and well designed.

Gamma.app

Gamma has emerged as the leader in AI-native presentations. The user describes the desired output — a pitch deck, a project update, a training module — and Gamma generates a complete, visually polished presentation in seconds. The designs are modern and professional without the cookie-cutter feel of basic templates. Gamma also supports interactive elements such as embedded videos, live data, and clickable prototypes, making it more versatile than traditional slide tools.

Pricing: Free tier with watermark. Plus at $10/month. Business at $20/user/month. Best for: Quick, visually appealing presentations. Startups, consultants, and anyone who values design quality.

Beautiful.ai

Beautiful.ai takes a different approach: rather than generating content from scratch, it applies intelligent design rules to the user’s content as it is added. Each time text or data is added, the layout adjusts automatically to maintain visual balance and a professional appearance. The AI does not write the presentation; it ensures that the presentation looks coherent regardless of the input.

Pricing: Pro at $12/month. Team at $40/user/month. Best for: Teams that already have content but struggle with design consistency.

Microsoft PowerPoint Copilot

PowerPoint Copilot can generate entire presentations from a prompt or a Word document, apply an organisation’s branded templates, add speaker notes, and restructure existing decks. Its primary advantage is integration with the Microsoft ecosystem: it can pull charts from Excel, reference data from other documents, and adhere to the company’s slide master templates.

Pricing: Requires Copilot for Microsoft 365 ($30/user/month). Best for: Enterprise users who need presentations that match corporate branding and pull data from Microsoft 365 sources.

Claude Cowork for Presentations

Claude Cowork can build presentations through its agentic workspace, creating slide content with structured layouts, speaker notes, and supporting research. Although it does not match dedicated presentation tools for visual polish, its strength lies in the quality of the content — the strategic thinking, argument structure, and narrative flow that make presentations persuasive rather than merely attractive.

Pricing: Included with Claude Pro/Team subscriptions. Best for: Content-heavy presentations where the quality of the argument matters more than visual flair.

Tome

Tome pioneered AI-generated presentations and continues to offer a fast, AI-first experience. Its strength is speed; an idea can become a finished deck in under a minute. However, Tome’s designs can feel repetitive across presentations, and the customisation options are more limited than those of Gamma or Beautiful.ai.

Pricing: Free tier available. Professional at $16/month. Best for: Quick internal presentations where speed matters more than design uniqueness.

AI for Spreadsheets and Data Analysis

Data analysis is one of the domains where AI tools deliver the most dramatic time savings. Tasks that previously required advanced Excel skills or Python scripting are now accessible to anyone who can describe the desired result in plain English.

Microsoft Excel Copilot

Excel Copilot transforms the way users interact with spreadsheets. Requests such as “create a pivot table showing sales by region and quarter,” “highlight all rows where revenue declined more than 10%,” or “write a formula that calculates the rolling 30-day average” can be issued directly. The system generates formulas, creates charts, builds pivot tables, and applies conditional formatting — all from natural-language requests. For the many office workers who know what they want from a spreadsheet but cannot recall the VLOOKUP syntax, Copilot represents a genuine improvement in accessibility.

Pricing: Requires Copilot for Microsoft 365 ($30/user/month). Best for: Business users who work in Excel daily but are not spreadsheet power users.

Google Sheets AI

Google Sheets’ Gemini integration offers similar natural-language formula generation and data-organisation features. The “Help me organize” feature can structure messy data, create charts, and generate templates. Although slightly less feature-rich than Excel Copilot for complex data analysis, it is more than sufficient for most office data tasks and is included with Google Workspace.

Pricing: Included with Google Workspace. Best for: Google Workspace users who need quick data organisation and formula help.

Julius AI

Julius AI is a standalone data-analysis platform that accepts spreadsheets, CSVs, databases, and even PDFs, then lets the user analyse the data through natural-language conversation. It can generate visualisations, run statistical analyses, clean messy data, and export results. Julius is particularly strong for ad-hoc analysis: the situations, constant in office work, in which a user needs to understand a dataset within ten minutes.

Pricing: Free tier. Pro at $20/month. Teams at $35/user/month. Best for: Non-technical users who need to analyse data without learning Python or SQL.

Obviously AI

Obviously AI brings predictive analytics to non-data-scientists. A dataset is uploaded, the target variable is specified, and the platform builds and evaluates machine-learning models automatically. Sales teams use it to predict deal outcomes, marketing teams to forecast campaign performance, and operations teams to anticipate demand. Results are presented in plain English with confidence intervals.

Pricing: Starts at $75/month. Best for: Business teams that need predictive analytics without hiring data scientists.

Rows.com

Rows reimagines the spreadsheet as an AI-native tool. It combines traditional spreadsheet functionality with built-in AI analysis, data enrichment from external sources, and the ability to build interactive dashboards. The AI can be asked to analyse trends, summarise data, and generate insights — all within the spreadsheet interface.

Pricing: Free tier. Pro at $9/user/month. Best for: Teams that want a modern, AI-first spreadsheet alternative.

AI for Meetings and Scheduling

The average office worker attends 15.5 meetings per week. AI meeting tools address this load from two angles: making the meetings that are attended more efficient, and eliminating those that are unnecessary.

Otter.ai

Otter.ai is the most established AI meeting assistant. It joins Zoom, Google Meet, or Teams calls automatically, transcribes everything in real time, identifies speakers, and generates summaries with action items. The AI can answer questions about what was discussed (“What did Sarah say about the Q3 budget?”), and the new OtterPilot agent can participate in meetings on the user’s behalf, providing updates and answering questions based on briefing notes.

Pricing: Free tier (limited). Pro at $17/month. Business at $30/user/month. Best for: Teams that need comprehensive meeting records and actionable summaries.

Fireflies.ai

Fireflies offers similar transcription and summarisation capabilities with a focus on CRM integration. It automatically logs meeting notes and action items to Salesforce, HubSpot, and other CRMs, making it particularly valuable for sales and customer-success teams. Its AskFred AI chatbot allows querying across the user’s entire meeting history.

Pricing: Free tier. Pro at $18/month. Business at $29/user/month. Best for: Sales teams that need automated CRM updates from meetings.

Grain

Grain focuses on shareable meeting highlights rather than full transcriptions. It automatically identifies key moments — decisions, action items, questions, objections — and creates short, shareable video clips. This is particularly useful for product teams that need to share customer feedback and for managers who wish to review meeting outcomes without watching full recordings.

Pricing: Free tier. Business at $19/user/month. Best for: Product and UX teams that need to capture and share specific meeting moments.

Reclaim.ai, Clockwise, and Motion

AI scheduling tools represent a different approach. Rather than making meetings more efficient, they optimise the user’s entire calendar to protect productive time.

Reclaim.ai ($10/user/month) automatically defends focus time, schedules habits (such as lunch breaks and exercise), and intelligently reschedules meetings when conflicts arise. Clockwise ($7/user/month) optimises team calendars collectively, creating aligned focus blocks and minimising meeting fragmentation. Motion ($19/month) goes further by combining calendar management with task management; it automatically schedules the to-do list based on priority, deadlines, and available time.

Tip: Combining a meeting-transcription tool (Otter or Fireflies) with an AI scheduling tool (Reclaim or Clockwise) can recover five to eight hours per week. The transcription tool makes it possible to skip meetings that need not be attended live, and the scheduling tool protects the reclaimed time.
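As a rough illustration of the return on such a pairing, the sketch below divides the monthly cost of Otter Pro and Reclaim (prices as quoted in this section) by the hours recovered. The conversion of 52/12 weeks per month is an assumption, and the five-to-eight-hour range is the estimate given above, not a measured result.

```python
# Prices are the monthly figures quoted in this section.
OTTER_PRO = 17.0  # $/month, Otter.ai Pro
RECLAIM = 10.0    # $/user/month, Reclaim.ai

# Assumption: average number of weeks in a month.
WEEKS_PER_MONTH = 52 / 12


def cost_per_hour_recovered(monthly_cost: float, hours_per_week: float) -> float:
    """Monthly subscription cost divided by hours recovered per month."""
    return monthly_cost / (hours_per_week * WEEKS_PER_MONTH)


stack_cost = OTTER_PRO + RECLAIM
at_5_hours = cost_per_hour_recovered(stack_cost, 5)  # low end of the estimate
at_8_hours = cost_per_hour_recovered(stack_cost, 8)  # high end of the estimate

print(f"Stack cost: ${stack_cost:.0f}/month")
print(f"Cost per hour recovered: ${at_8_hours:.2f} to ${at_5_hours:.2f}")
```

Under these assumptions the pairing costs roughly $0.78 to $1.25 per hour recovered; substituting Fireflies or Clockwise prices changes the result only slightly.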

AI for Project Management

Project management tools were already moving toward automation before the recent AI wave. AI features are now transforming these platforms from passive tracking systems into active project collaborators.

Asana AI

Asana’s AI features include smart status updates (project status reports generated from task progress), goal tracking, workflow recommendations, and natural-language task creation. The AI can identify at-risk projects before they go off track and suggest task assignments based on team workload and expertise. Asana’s structured approach to AI — focusing on project intelligence rather than attempting to do everything — makes it one of the more mature implementations.

Pricing: Premium at $11/user/month. Business at $26/user/month (AI features in Business and above). Best for: Cross-functional teams that need AI-powered project insights and automated status reporting.

Monday.com AI

Monday.com’s AI assistant can generate tasks from project descriptions, compose project updates, build formulas, summarise boards, and create automations through natural language. Its visual, highly customisable interface combined with AI makes it approachable for non-technical teams while remaining powerful enough for complex project-management needs.

Pricing: Standard at $12/seat/month. Pro at $20/seat/month (AI features in Pro and above). Best for: Teams that value visual project management and customisation.

ClickUp AI

ClickUp AI is integrated across the entire ClickUp platform — docs, tasks, whiteboards, chat. It can generate task descriptions, write documents, summarise threads, create subtasks, and build project timelines. ClickUp’s advantage is breadth: it aspires to be the all-in-one workspace, and its AI features span every surface of the product. The disadvantage is that this breadth can render the platform overwhelming for simple project-tracking needs.

Pricing: AI available as an add-on at $7/user/month on top of standard ClickUp plans. Best for: Teams that want a single platform for project management, docs, and communication with AI across all of them.

Linear AI

Linear has become a favoured tool among engineering and product teams, and its AI features reflect that focus. Linear AI can auto-triage bugs, suggest issue priorities, generate issue descriptions from brief inputs, and provide project-cycle insights. It is leaner and faster than the alternatives, deliberately trading feature breadth for speed and developer experience.

Pricing: Free for small teams. Standard at $8/user/month. Best for: Engineering and product teams that want a fast, focused project management tool with intelligent automation.

AI for Research and Knowledge Management

Locating information — whether from the internet, academic papers, or an organisation’s internal knowledge base — consumes an enormous amount of office time. A new category of AI tools is dramatically accelerating this process.

Perplexity AI

Perplexity AI has redefined the way professionals search for information. Unlike traditional search engines that return links, Perplexity provides synthesised, cited answers. Every claim includes a source reference, making findings easy to verify and share. The Pro tier permits documents to be uploaded, data to be analysed, and deep research to be conducted across multiple threads of inquiry. For competitive research, market analysis, and due diligence, Perplexity has become indispensable.

Pricing: Free tier. Pro at $20/month. Enterprise at $40/user/month. Best for: Professionals who need fast, cited research across any topic.

Elicit and Consensus

Elicit and Consensus are specialised for academic and scientific research. Elicit uses AI to search, summarise, and extract data from academic papers, rendering literature reviews that previously took weeks achievable in hours. Consensus searches more than 200 million scientific papers and indicates whether the research agrees or disagrees with a given claim. Both are invaluable for teams that require evidence-based decision-making.

Pricing: Elicit: Free tier, Plus at $12/month. Consensus: Free tier, Premium at $9/month. Best for: Research teams, healthcare, pharma, policy—anyone who needs scientific evidence synthesis.

NotebookLM (Google)

NotebookLM is Google’s underappreciated tool for knowledge work. The user uploads sources — documents, websites, YouTube videos, audio files — and NotebookLM creates an interactive AI that answers questions only on the basis of the provided sources. This source-grounded approach dramatically reduces hallucination, rendering it trustworthy for professional use. The Audio Overview feature can even generate a podcast-style discussion of the materials, which is surprisingly useful for absorbing complex information during commutes.

Pricing: Free (with Google account). NotebookLM Plus at $15/month. Best for: Anyone who needs to deeply understand a specific set of documents—legal review, board prep, competitive intelligence, training material creation.

Key Takeaway: Perplexity should be paired with NotebookLM — the former for broad internet research, the latter for deep analysis of specific sources. This combination covers 90% of office research needs and produces more reliable results than using a general chatbot for research.

AI Coding Assistants for Technical Office Workers

Full-time developers are not the only beneficiaries of AI coding tools. Data analysts writing SQL, product managers prototyping, marketers building automation scripts, and operations teams managing internal tools all write code, and AI coding assistants help them write it dramatically faster and to a higher standard.

Claude Code

Claude Code is Anthropic’s command-line coding agent that operates directly in the terminal. Its distinguishing feature is agentic capability. Rather than merely suggesting code completions, Claude Code can understand the entire codebase, plan multi-file changes, execute commands, run tests, and iterate on solutions autonomously. It excels at complex refactoring, debugging difficult issues, and building new features that span multiple files and systems. For technical office workers, Claude Code is particularly valuable for building internal tools, automating workflows, and writing data-processing scripts.

Pricing: Included with Claude Pro ($20/month) and Max subscriptions. Best for: Complex coding tasks, multi-file changes, automation scripts, and developers who prefer terminal-based workflows.

GitHub Copilot

GitHub Copilot is the most widely adopted AI coding assistant, with deep integration into VS Code, JetBrains IDEs, and other editors. Copilot provides inline code suggestions during typing, can generate entire functions from comments, and the Copilot Chat feature answers coding questions within the IDE. The new Copilot Workspace feature extends this capability further by permitting changes to be described in natural language while the AI plans and implements them across the repository.

Pricing: Individual at $10/month. Business at $19/user/month. Enterprise at $39/user/month. Best for: Day-to-day coding assistance, inline completions, teams standardized on GitHub.

Cursor

Cursor is an AI-first code editor built from the ground up around AI assistance. Rather than adding AI to an existing editor, Cursor designed every interaction — file navigation, search, editing, debugging — to operate with AI. Its “Composer” feature can make coordinated changes across multiple files, and “Cmd+K” inline editing permits changes to be described in natural language within the code. Many developers report that Cursor has fundamentally changed how they write code.

Pricing: Free tier (limited). Pro at $20/month. Business at $40/user/month. Best for: Developers who want the most AI-native editing experience and are willing to switch editors.

Windsurf

Windsurf (formerly Codeium) has positioned itself as the agentic IDE — a code editor in which AI does not merely suggest code but actively participates in development. Its Cascade feature combines multi-step reasoning with tool use, permitting the system to search the codebase, read documentation, run terminal commands, and make changes across files. Windsurf is particularly strong for developers working on large, complex codebases where understanding context is as important as writing code.

Pricing: Free tier. Pro at $15/month. Teams at $35/user/month. Best for: Developers working on large codebases who want an agentic coding experience at a competitive price point.

Master Comparison Table

The following table provides a comprehensive comparison of every tool covered in this guide.

Tool	Category	Pricing (from)	Best For	Platform
Claude	AI Assistant	Free / $20/mo	Long-form reasoning, writing, agentic work	Web, API, CLI
ChatGPT	AI Assistant	Free / $20/mo	Versatility, custom GPTs, multimodal	Web, Mobile, API
Google Gemini	AI Assistant	$14/user/mo	Google Workspace integration	Web, Workspace
Microsoft Copilot	AI Assistant	$20/user/mo	Microsoft 365 integration	Microsoft 365
Superhuman	Email	$30/mo	High-volume email users	Web, Mac, Mobile
Spark Mail	Email	Free / $8/user/mo	Team email on a budget	Web, Mac, Mobile
Grammarly	Email / Writing	Free / $12/mo	Writing quality and consistency	Cross-platform
Notion AI	Documents	$10/user/mo	Knowledge-base-aware writing	Web, Desktop, Mobile
Jasper	Marketing Writing	$49/mo	Brand-consistent marketing content	Web
Gamma.app	Presentations	Free / $10/mo	Quick, polished presentations	Web
Beautiful.ai	Presentations	$12/mo	Design-consistent slides	Web
Excel Copilot	Spreadsheets	$30/user/mo	Natural-language data analysis	Microsoft 365
Julius AI	Data Analysis	Free / $20/mo	Ad-hoc data analysis for non-coders	Web
Otter.ai	Meetings	Free / $17/mo	Meeting transcription and summaries	Web, Mobile
Fireflies.ai	Meetings	Free / $18/mo	Meeting notes + CRM integration	Web
Reclaim.ai	Scheduling	Free / $10/user/mo	Calendar optimization and focus time	Web, Calendar
Motion	Scheduling	$19/mo	Task + calendar AI scheduling	Web, Mobile
Asana AI	Project Mgmt	$26/user/mo	Cross-functional project intelligence	Web, Mobile
Linear AI	Project Mgmt	Free / $8/user/mo	Engineering and product teams	Web, Desktop
Perplexity AI	Research	Free / $20/mo	Fast, cited internet research	Web, Mobile
NotebookLM	Knowledge Mgmt	Free / $15/mo	Source-grounded document analysis	Web
Claude Code	Coding	$20/mo	Complex, multi-file coding tasks	Terminal / CLI
GitHub Copilot	Coding	$10/mo	Inline code completions	VS Code, JetBrains
Cursor	Coding	Free / $20/mo	AI-native code editing	Desktop (Editor)
Windsurf	Coding	Free / $15/mo	Agentic IDE for large codebases	Desktop (Editor)

Implementation Strategy: Rolling AI Out to Your Team

Having the right tools is of no use if the team does not actually use them. AI tool adoption fails more often due to poor rollout strategy than to poor tool selection. The following is a production-proven framework for introducing AI tools to an organisation without triggering resistance or chaos.

Phase One: Begin with Champions (Weeks 1–2)

A company-wide AI initiative should not be announced on day one. Instead, three to five AI champions should be identified across different departments — individuals who are naturally curious about technology and influential among their peers. They should be given access to the tools, a brief training session, and a clear goal: identify three tasks in their daily workflow where AI saves at least 15 minutes. These champions become internal case studies and advocates.

Phase Two: Departmental Pilots (Weeks 3–6)

Based on champion feedback, one or two departments should be selected for a structured pilot. Specific use cases should be defined (e.g., “marketing will use Claude for first-draft blog posts and Gamma for presentation creation”), measurable success metrics should be set (time saved, output quality ratings), and dedicated support should be provided. This phase is where real-world friction points emerge — integrations that do not work, workflows that require redesign, and training gaps that must be addressed.

Phase Three: Broad Rollout with Guardrails (Weeks 7–12)

With pilot learnings incorporated, the rollout can be extended to the broader organisation with clear guidelines: which tools are approved, what data may and may not be shared with AI tools, quality-review requirements for AI-generated content, and how to obtain support. A shared channel (Slack, Teams) where employees share AI tips and successes should be created. Social proof from colleagues is far more effective than any top-down mandate.

Tip: The single most important factor in AI-adoption success is not the tool selected; it is whether managers themselves model AI usage. When a VP openly states “I used Claude to draft this strategy memo and then refined it,” the entire team receives implicit permission to do the same.

ROI Analysis: Realistic Time Savings

The return on investment merits specific examination. Based on aggregated data from productivity studies and enterprise deployments reported through early 2026, the following table presents realistic time savings by category.

Task Category	Hours/Week (Before AI)	Hours/Week (With AI)	Time Saved	Key Tool
Email Processing	12.5	7.0	-5.5 hrs (44%)	Superhuman / Gmail AI
Document Creation	8.0	3.5	-4.5 hrs (56%)	Claude / Notion AI
Meeting Overhead	6.0	3.0	-3.0 hrs (50%)	Otter.ai / Reclaim
Data Analysis	5.0	2.0	-3.0 hrs (60%)	Excel Copilot / Julius AI
Presentations	3.0	1.0	-2.0 hrs (67%)	Gamma / PowerPoint Copilot
Research	4.0	1.5	-2.5 hrs (63%)	Perplexity / NotebookLM
Project Updates	3.0	1.0	-2.0 hrs (67%)	Asana AI / ClickUp AI
Total	41.5	19.0	-22.5 hrs (54%)	—

The figure of 22.5 hours per week appears almost too high, and for most workers it is — at least initially. A more realistic expectation for the first three months is 8–12 hours per week of reclaimed time, increasing to 15–20 hours as proficiency develops. The remaining gap reflects the learning curve, the time spent reviewing AI outputs, and tasks that still resist automation.

In monetary terms, if the average knowledge worker’s fully loaded cost is $75 per hour, saving ten hours per week represents $750 per week or $39,000 per year per employee. Against a typical AI tool cost of $50–100 per month per user ($600–1,200 per year), that is a return of roughly 30x to 65x within the first year.
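The arithmetic above can be checked with a short calculation. This is only an illustrative sketch: the function name `annual_roi` and the 52-week year are assumptions, and the inputs are the figures quoted in the paragraph.

```python
def annual_roi(hourly_cost, hours_saved_per_week, tool_cost_per_month, weeks_per_year=52):
    """Return (annual_savings, roi_multiple) for one employee."""
    annual_savings = hourly_cost * hours_saved_per_week * weeks_per_year
    annual_tool_cost = tool_cost_per_month * 12
    return annual_savings, annual_savings / annual_tool_cost


if __name__ == "__main__":
    # $75/hour, 10 hours saved per week, tool cost at both ends of the quoted range
    for monthly_cost in (50, 100):
        savings, multiple = annual_roi(75, 10, monthly_cost)
        print(f"${monthly_cost}/month tool: ${savings:,.0f} saved per year, about {multiple:.1f}x")
```

With these inputs the multiple lands between roughly 32x and 65x, the same order of magnitude as the range quoted above. Substituting a team's own hourly cost and measured time savings gives a more defensible figure than any industry average.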

Key Takeaway: The ROI on AI productivity tools is not hypothetical; it is measurable and substantial. The gains compound over time as users develop better prompting habits and discover new applications. Monthly tracking of time savings supports the business case for broader adoption.

Privacy and Security Considerations for Enterprise

Adopting AI tools at scale introduces real privacy and security concerns that IT and legal teams must address proactively. Ignoring these issues does not eliminate them; it simply ensures that they surface as incidents rather than planned decisions.

Data Handling and Training

The most important question for any AI tool is whether the provider uses customer data to train its models. Most enterprise tiers of major AI tools (Claude Team/Enterprise, ChatGPT Enterprise, Copilot for Microsoft 365, Gemini for Workspace) explicitly do not train on customer data. Free and individual tiers, however, often do, or at least reserve the right to. A clear policy should be established: enterprise tools for work data, personal tiers reserved for non-sensitive experimentation.

Compliance and Regulatory Frameworks

AI tools should comply with relevant regulations — GDPR for European data, HIPAA for healthcare, SOC 2 for SaaS companies handling customer data, and industry-specific requirements. Most major AI providers now offer SOC 2 Type II compliance, data processing agreements (DPAs), and data-residency options. Claude, ChatGPT, and Microsoft Copilot all offer enterprise agreements with contractual data-protection guarantees.

Access Controls and Data-Loss Prevention

AI tools that have access to an organisation’s data (such as Microsoft Copilot through Microsoft Graph) can surface information that employees might not otherwise find. This is powerful but can also expose sensitive documents to people who should not see them. Before enabling these features, an audit of the organisation’s file permissions and access controls is required. AI does not create new security holes; it reveals existing ones that were hidden by obscurity.

Caution: Sensitive data — customer PII, financial records, proprietary source code, legal documents — should never be pasted into free-tier AI tools. Data-handling policies should be verified before any confidential information is shared. When in doubt, data should be anonymised first.

Enterprise AI Security Checklist

Before deploying any AI tool at scale, the following items should be addressed:

Data processing agreement signed with the AI provider
Training opt-out confirmed (your data is not used to train models)
SSO integration enabled for centralized access control
Audit logging available for compliance and monitoring
Data residency confirmed to meet regional requirements
Usage policies documented and communicated to all employees
Incident response plan updated to include AI-related data exposure scenarios
Regular access reviews scheduled for AI tool permissions

Future Outlook: Where AI Office Tools Are Heading

The AI tools covered in this guide represent the state of play in early 2026. The pace of development is rapid, and several trends will reshape the landscape over the next 12 to 18 months.

Agentic AI as the Default

The most significant shift under way is the move from AI as a tool the user operates to AI as an agent that works alongside the user. Claude Cowork, ChatGPT’s operator mode, and Microsoft Copilot’s agent features all point toward a future in which AI does not merely answer questions but executes multi-step workflows, coordinates across applications, and proactively identifies tasks requiring attention. By mid-2027, the chatbot model may well appear as dated as a DOS command prompt.

Platform Consolidation

The current proliferation of specialised tools is not sustainable. Teams cannot maintain subscriptions to fifteen different AI products. Aggressive consolidation is to be expected: the major platforms (Microsoft, Google, Anthropic, OpenAI) will absorb or replicate the features of standalone tools. Specialised tools will survive only if they offer dramatically better performance in their niche or integrate seamlessly into the major ecosystems.

Personal AI Aware of the User’s Work

The next frontier is AI that builds a persistent, private model of the user’s work patterns, preferences, writing style, domain expertise, and organisational context. An AI assistant that has read every document the user has written, attended every meeting, and understands the user’s role and goals — not as a generic chatbot, but as a genuine cognitive extension — is now within reach. Early versions are appearing in Claude’s memory features, Copilot’s Graph integration, and Notion AI’s workspace awareness.

Voice-First AI Interfaces

As voice AI improves — and it is improving rapidly — a shift toward voice-first interactions with AI tools is to be expected. Dictating an email while driving, asking the AI to reschedule a meeting during a walk, or verbally briefing the AI on a project while making coffee — these scenarios are already technically possible and will become mainstream as latency and accuracy continue to improve.

Concluding Observations

The AI productivity toolkit for office workers in 2026 is remarkably capable, surprisingly affordable, and — perhaps most importantly — genuinely ready for mainstream adoption. The tools covered in this guide are not research prototypes or bleeding-edge experiments. They are production-ready products used by millions of professionals every day.

What separates the teams that thrive with AI from those that simply add another software subscription is intentionality. The winning strategy is not to adopt every tool that catches the eye. It is to identify the two or three highest-impact areas in which the team loses the most time, select the best tools for those specific pain points, and invest in proper onboarding and habit formation. Email and document creation are almost always the right starting points — they are high-frequency, high-time-cost tasks in which AI delivers immediate, visible results.

If one action is to follow from this guide, it should be this: select one tool from this list, sign up for a free trial or starter plan, and commit to using it for every relevant task for two full weeks. Not occasionally, not when remembered, but every single time. That is how the initial friction is overcome and how the muscle memory forms that turns AI from a novelty into a genuine multiplier of professional capability.

The office workers who will thrive in the next decade are not those who work the longest hours. They are those who work with the most capable tools. The gap is opening now, and every week of delay is a week in which competitors gain ground.

The appropriate time to begin is now.

April 4, 2026

Mastering Custom Commands in Claude Code: The Definitive Guide to Automating Your Development Workflow

Summary

What this post covers: A definitive guide to Claude Code custom commands, the Markdown files in .claude/commands/ that convert multi-step workflows into one-line slash commands, including anatomy, best practices, ten ready-to-use commands, advanced techniques, and the organization of a team library.

Key insights:

Custom commands require zero configuration: any .md file placed in .claude/commands/ or ~/.claude/commands/ becomes a slash command immediately, with no registration step or build process.
The project-versus-user distinction is the most important design decision: project commands are committed to git and standardize team workflows (deploy, review, scaffold), while user commands remain personal and codify individual preferences.
The most substantial productivity gains derive from the $ARGUMENTS placeholder combined with explicit constraints sections. Vague commands produce vague behaviour, so commands should read as detailed briefings containing checklists and failure-handling rules.
Custom commands are most valuable as encoded tribal knowledge: the deployment runbook held in one engineer’s mind becomes an executable file that the entire team uses, ensuring that deployments and reviews follow the same process each time.
Begin with three commands: the most frequent task, the most disliked task, and the team’s most significant pain point. Any instruction repeated three times should subsequently be converted into a new command.

Main topics: What Are Custom Commands?, Anatomy of a Command File, Best Practices for Writing Effective Commands, Practical Command Examples (10 Ready-to-Use Commands), Advanced Techniques, Project Commands and User Commands, Integration with CLAUDE.md, Organizing Commands for Large Projects, Common Mistakes and How to Avoid Them, Real-World Command Libraries by Technology Stack, Conclusion, References.

A developer at a mid-sized startup recently described an instructive change in routine: a workflow that previously required 45 minutes each morning (setting up the development environment, running tests, reviewing PRs, and scaffolding new features) now requires under 5 minutes. The mechanism was not a new DevOps pipeline or a new CI/CD tool, but seven carefully constructed custom commands in Claude Code, Anthropic’s AI-powered CLI for software development.

Most users of Claude Code are familiar with its ability to write code, debug issues, and answer questions about a codebase. A less prominent feature, however, transforms Claude Code from a useful assistant into an automated development partner: custom commands. These are Markdown files that convert complex, multi-step workflows into one-line slash commands available at any time.

Custom commands can be understood as macros at a higher level of abstraction. Rather than recording keystrokes, the developer writes natural-language instructions that Claude Code follows with full access to the codebase, the terminal, and project tools. A single command may review code for security vulnerabilities, check for style violations, and generate a summary. A separate command may scaffold an entire API endpoint with route, handler, validation, and tests within minutes.

Despite this capability, most developers exploit only the basic functionality. They may create one or two simple commands but fail to take advantage of the advanced patterns that make custom commands genuinely transformative: argument handling, conditional logic, multi-step workflows with checkpoints, and integration with project-level configuration. This guide addresses that gap. By its end, readers will have the material required to build a comprehensive command library that automates the most repetitive parts of a development workflow, together with ten complete, ready-to-use command files as a starting point.

What Are Custom Commands?

At their core, custom commands in Claude Code are Markdown files that reside in a specific directory structure. When a user types / in Claude Code, the tool scans these directories and presents every available command as a selectable option. When a command is invoked, Claude Code reads the Markdown content and treats it as its instruction set; effectively, the developer is providing Claude with a detailed prompt for a specific task, and Claude executes it with full project context.

Two Types of Commands

Claude Code recognizes commands in two locations, and understanding the distinction is important for team workflows:

Project commands reside in the project’s .claude/commands/ directory. Because they live inside the repository, they are committed to version control and shared with every team member. When a colleague clones the repository and opens Claude Code, they automatically see and can use every project command. This makes such commands appropriate for team-wide workflows such as deployment, code review, and feature scaffolding.

User commands reside in ~/.claude/commands/ within the user’s home directory. These are personal to the individual and are not shared via git. They are appropriate for productivity shortcuts, personal preferences, and workflows that are specific to a developer’s setup. Examples include a command that formats output in a preferred manner or one that interacts with internal tools used only by that individual.

Key Takeaway: Project commands (.claude/commands/) are shared with the team via git. User commands (~/.claude/commands/) are personal and remain on the individual machine. Project commands are appropriate for team workflows; user commands are appropriate for personal productivity.

How Claude Code Discovers Commands

When Claude Code is launched in a project directory, it performs a straightforward discovery procedure. It first checks .claude/commands/ relative to the project root, then checks ~/.claude/commands/ in the user’s home directory. Every .md file found in these directories becomes an available command, with the filename (minus the extension) becoming the command name. Thus .claude/commands/deploy.md becomes /deploy, and .claude/commands/write-post.md becomes /write-post.

This discovery occurs automatically; there is no registration step, no configuration file to update, and no CLI flag to set. A Markdown file placed in the correct directory becomes instantly available as a command, and removal causes the command to disappear. The simplicity of this mechanism is the source of its power: the barrier to creating a new command is effectively zero.
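The discovery procedure described above can be sketched in a few lines of Python. This is an illustrative emulation, not Claude Code's actual implementation: the function name `discover_commands` and the shape of the returned tuples are assumptions, only top-level .md files are scanned, and nothing is assumed about how a name that exists in both locations is resolved.

```python
from pathlib import Path


def discover_commands(project_root, home_dir=None):
    """Return (slash_name, scope, path) tuples for every top-level .md command file."""
    home = Path(home_dir) if home_dir is not None else Path.home()
    search_dirs = [
        ("project", Path(project_root) / ".claude" / "commands"),
        ("user", home / ".claude" / "commands"),
    ]
    found = []
    for scope, directory in search_dirs:
        if not directory.is_dir():
            continue  # a missing directory simply contributes no commands
        for md_file in sorted(directory.glob("*.md")):
            # deploy.md -> /deploy: the filename minus its extension is the command name
            found.append(("/" + md_file.stem, scope, md_file))
    return found


if __name__ == "__main__":
    for name, scope, path in discover_commands("."):
        print(f"{name:<20} {scope:<8} {path}")
```

Run from a repository root, a script like this gives a quick inventory of which commands a teammate would see after cloning, and which exist only on the local machine.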

Anatomy of a Command File

A command file is just a Markdown document, but its structure matters. The following sections examine each element, beginning with the basics and progressing to more complex patterns.

File Naming Conventions

Command files follow a simple naming scheme:

Use kebab-case for filenames: write-post.md, review-code.md, create-component.md
Always use the .md extension
The filename becomes the command name: deploy.md → /deploy
Names should be short and descriptive, since they will be typed frequently
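A team that wants to keep its command directory consistent could check filenames against these conventions automatically. The sketch below is a hypothetical helper: `command_name` and the regular expression encode the conventions listed above, not a rule that Claude Code is known to enforce.

```python
import re

# Lowercase words or digits separated by single hyphens, ending in .md
KEBAB_MD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*\.md")


def command_name(filename):
    """Return the slash command for a conforming filename, or None if it breaks the conventions."""
    if KEBAB_MD.fullmatch(filename) is None:
        return None
    return "/" + filename[: -len(".md")]


if __name__ == "__main__":
    for candidate in ("write-post.md", "deploy.md", "ReviewCode.md", "create_component.md", "deploy.txt"):
        print(candidate, "->", command_name(candidate))
```

A check of this kind fits naturally into a pre-commit hook or CI step, so that naming drift is caught before it reaches the shared repository.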

The Markdown Structure

The content of a command file is the prompt that Claude Code receives when the command is invoked. Everything written in the file becomes Claude’s instructions. The file should therefore be written as a detailed briefing to a capable developer who has not previously seen the project.

The simplest possible command file illustrates the concept:

# File: .claude/commands/greet.md

Say hello to the user and tell them the current date and time.
List the top 3 most recently modified files in the project.

When /greet is typed in Claude Code, the tool reads this file and follows the instructions. Real-world commands, however, require considerably more structure. The sections that follow build up to a properly organized command.

The $ARGUMENTS Placeholder

One of the most useful features of custom commands is the $ARGUMENTS placeholder. When a command is invoked with additional text (for example, /deploy staging or /write-tests src/utils/parser.py), everything after the command name is substituted into the $ARGUMENTS placeholder in the Markdown file.

# File: .claude/commands/explain.md

Read the file or function specified by the user: $ARGUMENTS

Provide a detailed explanation that includes:
1. What the code does at a high level
2. Key algorithms or patterns used
3. Any potential issues or improvements
4. How it fits into the broader codebase

When /explain src/auth/middleware.py is typed, Claude Code receives the full instructions with $ARGUMENTS replaced by src/auth/middleware.py. This single mechanism enables flexible commands that adapt to whatever input is provided.
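The substitution can be pictured as a plain text replacement. The following is a minimal sketch of that idea, assuming the simplest possible behaviour; `render_command` is a hypothetical name, and Claude Code's real handling may differ in details such as whitespace.

```python
def render_command(template, invocation):
    """Substitute everything after the command name into $ARGUMENTS."""
    # "/explain src/auth/middleware.py" -> command "/explain", arguments "src/auth/middleware.py"
    _command, _, arguments = invocation.strip().partition(" ")
    return template.replace("$ARGUMENTS", arguments.strip())


EXPLAIN_TEMPLATE = "Read the file or function specified by the user: $ARGUMENTS"

if __name__ == "__main__":
    print(render_command(EXPLAIN_TEMPLATE, "/explain src/auth/middleware.py"))
    # With no arguments the placeholder becomes an empty string
    print(repr(render_command(EXPLAIN_TEMPLATE, "/explain")))
```

The second call shows why a command should state what to do when no arguments are supplied: the placeholder collapses to nothing, and the prompt ends with a dangling colon.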

A Full Command File Example

The following well-structured command demonstrates all the key elements working together:

# File: .claude/commands/add-feature.md

You are a senior developer working on this project. Add a new feature
based on the following description: $ARGUMENTS

## Step 1: Understand the Request
- Parse the feature description from $ARGUMENTS
- Identify which parts of the codebase will be affected
- List the files you plan to modify or create

## Step 2: Plan the Implementation
- Outline the changes needed
- Identify any dependencies or prerequisites
- Check for existing patterns in the codebase to follow

## Step 3: Implement the Feature
- Write clean, well-documented code
- Follow existing code style and conventions in the project
- Add appropriate error handling

## Step 4: Write Tests
- Create unit tests for the new feature
- Ensure existing tests still pass by running: `npm test`

## Step 5: Summary
- List all files created or modified
- Describe the changes made
- Note any follow-up tasks or considerations

## Constraints
- Do NOT modify any configuration files without asking first
- Do NOT install new dependencies without listing them and explaining why
- Follow the project's existing code style exactly
- If $ARGUMENTS is empty, ask the user what feature they want to add

Several important patterns are present in this example: numbered steps provide Claude with a clear execution order, constraints establish boundaries on permissible actions, and the command handles the edge case in which no arguments are provided. This level of detail distinguishes an excellent command from a merely good one.

Tip: A command file should be treated as a detailed brief for a new team member. The more specific the description of what to do, what not to do, and what patterns to follow, the better the resulting behaviour.

Best Practices for Writing Effective Commands

After examination of numerous custom commands and observation of teams adopting them across different technology stacks, clear patterns have emerged for what separates reliable commands from unreliable ones. The distinction almost invariably reduces to the precision with which intent is communicated.

Be Specific and Explicit

Claude Code acts on what the instructions actually say and fills any gaps with its own judgment. The instruction “clean up the code” leaves Claude to decide what cleaning up means. The instruction “remove unused imports, add type hints to all function signatures, and ensure all functions have docstrings following the Google style guide” produces precisely that. Specificity is not pedantry but precision.

Structure with Clear Steps

Numbered lists are particularly valuable in command files. They establish a natural execution order and make it straightforward for Claude to report progress. Each step should be a discrete, verifiable action. Rather than “set up the project,” the instruction should be decomposed into: (1) create the directory structure, (2) initialize the package manager, (3) install dependencies, (4) create the configuration file.

Include Constraints and Guardrails

This may be the single most important practice. Claude should always be informed of what not to do. Without constraints, Claude will make reasonable but potentially unwanted decisions. Explicit guardrails should be added, such as “do NOT modify the database schema,” “always create a backup before overwriting,” or “never commit directly to main.”

Specify Output Format

If the result is required in a specific format (a JSON file, a Markdown report, a formatted table in the terminal), this should be stated explicitly. Commands that end with “report what you did” tend to produce inconsistent output. Commands that end with “create a summary in the following format: [template]” produce consistent, useful results.

Include Error Handling Instructions

What should Claude do if a test fails, a file does not exist, or a build breaks? Without error-handling instructions, Claude will either stop and ask (slowing the workflow) or guess (potentially incorrectly). Explicit error handling should be included: “If the tests fail, analyse the failure, fix the issue, and re-run the tests. If they fail a second time, stop and report the errors.”
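The quoted instruction describes a bounded retry policy, which can be made explicit as control flow. The sketch below is purely illustrative: `run_with_one_fix_attempt`, `run_tests`, and `attempt_fix` are hypothetical names, and in practice Claude carries out these steps from the natural-language instruction rather than from code.

```python
def run_with_one_fix_attempt(run_tests, attempt_fix):
    """Run tests; on failure try one fix and re-run; stop and report after a second failure."""
    errors = run_tests()  # run_tests returns a list of error messages, empty on success
    if not errors:
        return {"status": "passed", "runs": 1, "errors": []}
    attempt_fix(errors)  # analyse the failure and apply a fix
    errors = run_tests()
    if not errors:
        return {"status": "passed", "runs": 2, "errors": []}
    # Second failure: do not loop again, hand the errors back to the user
    return {"status": "stopped", "runs": 2, "errors": errors}
```

The point of writing it out is the explicit stop condition. An instruction that says only "fix failing tests" has no upper bound on attempts, whereas this policy guarantees at most two test runs before control returns to the user.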

Reference Specific Files and Paths

When a command must operate on specific parts of the codebase, the targets should be referenced explicitly. Rather than “check the config file,” the instruction should be “read config/settings.py and extract the database URL.” This eliminates ambiguity and ensures that the command operates reliably as the project evolves.

Use Conditional Logic

Real workflows branch on conditions. Commands should do likewise: “If $ARGUMENTS contains ‘staging’, deploy to the staging server. If it contains ‘production’, deploy to production with additional safety checks. If no argument is provided, default to staging.”
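Spelling the quoted branches out as code is a useful way to check that they cover every case. The function below is a hypothetical illustration of the decision table, not something Claude Code executes; `resolve_deploy_target` and the returned dictionary keys are assumptions.

```python
def resolve_deploy_target(arguments):
    """Mirror the branching in the quoted instruction: staging, production, or the default."""
    words = arguments.lower().split()
    if "staging" in words:
        return {"target": "staging", "extra_safety_checks": False}
    if "production" in words:
        return {"target": "production", "extra_safety_checks": True}
    if not words:
        # No argument provided: default to staging
        return {"target": "staging", "extra_safety_checks": False}
    # The quoted instruction does not cover this case; a real command should say what happens here
    return {"target": None, "extra_safety_checks": False}
```

Written this way, two gaps in the natural-language version become visible: an unrecognised argument such as `qa` has no defined behaviour, and an input containing both words is decided only by the order of the checks. A robust command file states what to do in both situations.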

Keep Commands Focused

A command that attempts to do everything performs no individual task well. The single-responsibility principle should be observed: one command, one job. A complex workflow should be decomposed into multiple commands that can be run in sequence. Separate /build, /test, and /deploy commands are preferable to a single monolithic /do-everything command.

Good and Bad Command Patterns

Pattern	Bad Example	Good Example
Instructions	“Fix the bugs”	“Run the test suite, identify failing tests, analyze each failure, and apply minimal fixes”
File references	“Update the config”	“Update `config/database.yml` and `.env.example`”
Error handling	(none)	“If tests fail, fix and re-run. After 2 failures, stop and report.”
Output format	“Tell me what changed”	“List changed files as a Markdown checklist with one-line descriptions”
Constraints	(none)	“Do NOT modify files outside `src/`. Do NOT add dependencies.”
Scope	One giant command for build + test + deploy + notify	Separate `/build`, `/test`, `/deploy`, and `/notify` commands
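Several of the practices in this section can be checked mechanically before a command file is committed. The following is a rough heuristic sketch, not an official tool: `lint_command` is a hypothetical helper, and the patterns it looks for (numbered steps, a Constraints heading, a stated behaviour for empty arguments) are assumptions drawn from the conventions used in this guide.

```python
import re


def lint_command(text):
    """Return a list of warnings for a command file's Markdown text (heuristic checks only)."""
    warnings = []
    # Numbered list items ("1. ...") or headings such as "## Step 1"
    if not re.search(r"^\s*(?:\d+\.|#+\s*Step\s+\d+)", text, flags=re.MULTILINE):
        warnings.append("no numbered steps found")
    # A heading named Constraints at any level
    if not re.search(r"^#+\s*Constraints\b", text, flags=re.MULTILINE | re.IGNORECASE):
        warnings.append("no Constraints section found")
    # If the placeholder is used, the file should say what happens when it is empty
    if "$ARGUMENTS" in text and not re.search(
        r"\$ARGUMENTS is empty|no arguments?", text, flags=re.IGNORECASE
    ):
        warnings.append("uses $ARGUMENTS but does not say what to do when it is empty")
    return warnings


if __name__ == "__main__":
    print(lint_command("Fix the bugs"))
```

A passing result does not mean a command is well written, since the checks only look for structural markers. It is best treated as a reminder list during review.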

Practical Command Examples (10 Ready-to-Use Commands)

Theory is useful, but readers may benefit from commands that can be used directly. The following ten complete, production-proven command files cover the most common development workflows. Each is ready to copy into a .claude/commands/ directory for immediate use.

The /write-post Command: Blog Publishing Workflow

This command powers the blog on which this guide is published. It orchestrates the entire workflow of selecting a topic, writing a full blog post, and publishing it to WordPress, all from a single slash command.

# File: .claude/commands/write-post.md

You are a professional tech and investment blog writer.
Write and publish a blog post using the following workflow:

## Step 1: Topic Selection
- If the user provides a topic in $ARGUMENTS, use that topic.
- Otherwise, run `uv run python -m src.main select-topic` to pick
  a random topic from the configured pool.
- Show the selected topic and its category to the user.

## Step 2: Write the Blog Post
Write a high-quality, engaging blog post as clean WordPress-ready HTML5.

**Writing Style:**
- Open with a powerful hook: a surprising fact, bold question, or
  real incident
- Conversational yet professional tone
- Target: 4,000-6,000 words
- Structure: Table of Contents → Introduction → 3-5 body sections
  → Conclusion → References
- No <h1> tags, no <html>/<head>/<body> wrappers

## Step 3: Save and Publish
1. Save the HTML content to `posts/{slug}.html`
2. Run the publish command:
   ```
   uv run python -m src.main publish \
     --title "<title>" --slug "<slug>" \
     --category "<category>" \
     --content-file posts/{slug}.html \
     --status publish
   ```
3. Run `uv run python -m src.main record-usage "<topic>"`
4. Report the published post URL to the user.

## Constraints
- Do NOT use external LLM APIs — you are the writer
- For investment posts, include a disclaimer
- No numbered section headings

The /review-code Command: Comprehensive Code Review

# File: .claude/commands/review-code.md

Perform a thorough code review on the following: $ARGUMENTS

If $ARGUMENTS is a file path, review that specific file.
If $ARGUMENTS is a directory, review all source files in it.
If $ARGUMENTS is empty, review all staged changes (git diff --cached).

## Review Checklist

### Security
- [ ] No hardcoded secrets, API keys, or passwords
- [ ] Input validation on all user-facing inputs
- [ ] No SQL injection / XSS vulnerabilities
- [ ] Proper authentication and authorization checks

### Code Quality
- [ ] Functions are under 50 lines (flag any that exceed this)
- [ ] No code duplication (DRY principle)
- [ ] Clear variable and function names
- [ ] Proper error handling (no bare except/catch blocks)

### Performance
- [ ] No N+1 query patterns
- [ ] Efficient data structures used
- [ ] No unnecessary loops or redundant computations
- [ ] Large datasets handled with pagination or streaming

### Testing
- [ ] New code has corresponding tests
- [ ] Edge cases are covered
- [ ] Test names clearly describe what they test

## Output Format
For each issue found, report:
1. **File and line number**
2. **Severity**: Critical / Warning / Suggestion
3. **Category**: Security / Quality / Performance / Testing
4. **Description**: What the issue is
5. **Fix**: Suggested code change

End with a summary table:
| Severity | Count |
|----------|-------|
| Critical | X     |
| Warning  | X     |
| Suggestion | X   |

## Constraints
- Do NOT modify any files — this is a review only
- If no issues are found, say so explicitly
- Be constructive, not just critical
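
Claude produces the summary table itself, so no script is required. For readers who want to see the expected shape of that output, the following is a minimal Python sketch of the tally. It assumes the findings are already available as dictionaries with a `severity` key; that structure and the sample file names are hypothetical and are not part of the command.

```python
from collections import Counter

# Severity levels, in the order the command's summary table lists them.
SEVERITIES = ["Critical", "Warning", "Suggestion"]


def summary_table(findings):
    """Render a Markdown table counting findings per severity."""
    counts = Counter(finding["severity"] for finding in findings)
    lines = ["| Severity | Count |", "|----------|-------|"]
    for severity in SEVERITIES:
        lines.append(f"| {severity} | {counts.get(severity, 0)} |")
    return "\n".join(lines)


# Hypothetical findings, for illustration only.
findings = [
    {"file": "app.py", "line": 12, "severity": "Critical"},
    {"file": "app.py", "line": 40, "severity": "Warning"},
    {"file": "db.py", "line": 7, "severity": "Warning"},
]
print(summary_table(findings))
```

Severities with no findings still appear with a count of zero, which matches the instruction to state explicitly when nothing was found.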

The /create-component Command: Frontend Component Scaffolding

# File: .claude/commands/create-component.md

Create a new React component based on: $ARGUMENTS

## Step 1: Parse the Request
- Component name from $ARGUMENTS (e.g., "UserProfile" or "DataTable")
- If $ARGUMENTS includes additional description, use it for the
  component's functionality

## Step 2: Check Project Conventions
- Read the project's existing components to match the style
- Detect whether the project uses TypeScript or JavaScript
- Detect the CSS approach (CSS modules, Tailwind, styled-components)
- Check if the project uses a testing library (Jest, Vitest, etc.)

## Step 3: Create the Component
Create the following files:

1. **Component file**: `src/components/{ComponentName}/{ComponentName}.tsx`
   - Use functional component with hooks
   - Include proper TypeScript interfaces for props
   - Add JSDoc comments

2. **Test file**: `src/components/{ComponentName}/{ComponentName}.test.tsx`
   - Test rendering without errors
   - Test prop variations
   - Test user interactions if applicable

3. **Styles file**: `src/components/{ComponentName}/{ComponentName}.module.css`
   (or appropriate format for the project)

4. **Index file**: `src/components/{ComponentName}/index.ts`
   - Re-export the component as default and named export

## Step 4: Integration
- Add the component to any barrel export files if they exist
- Show a usage example in the terminal

## Constraints
- Match the EXACT coding style of existing components
- Do NOT install new packages
- If the component directory pattern differs in the project, follow
  the existing pattern instead
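
The file layout in Step 3 can be illustrated with a short Python sketch that derives the four paths from a component name. The function name `component_files` and the default base directory are assumptions made for the example; a real project may use a different layout, in which case the command's final constraint applies.

```python
from pathlib import PurePosixPath


def component_files(name, base="src/components"):
    """Return the four file paths the command is asked to create."""
    root = PurePosixPath(base) / name
    return [
        str(root / f"{name}.tsx"),         # component
        str(root / f"{name}.test.tsx"),    # tests
        str(root / f"{name}.module.css"),  # styles (CSS modules assumed)
        str(root / "index.ts"),            # re-exports
    ]


for path in component_files("UserProfile"):
    print(path)
```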

The /deploy Command: Deployment Workflow

# File: .claude/commands/deploy.md

Deploy the application to the specified environment: $ARGUMENTS

## Environment Detection
- If $ARGUMENTS is "staging" or "stage": deploy to staging
- If $ARGUMENTS is "production" or "prod": deploy to production
- If $ARGUMENTS is empty: default to staging

## Pre-Deployment Checks (ALL must pass)
1. Run `git status` — working directory must be clean
2. Run the full test suite — all tests must pass
3. Run the linter — no errors allowed (warnings are OK)
4. Verify the current branch:
   - Staging: any branch is fine
   - Production: must be on `main` or `master`

If ANY check fails, stop immediately and report the failure.
Do NOT proceed to deployment.

## Deployment Steps

### For Staging
1. Build the project: `npm run build` (or project equivalent)
2. Deploy: `npm run deploy:staging`
3. Run smoke tests: `npm run test:smoke -- --env=staging`
4. Report the staging URL

### For Production
1. Confirm with the user: "You are about to deploy to PRODUCTION.
   Continue? (y/n)"
2. Build: `npm run build`
3. Create a git tag: `git tag -a v{date} -m "Production deploy"`
4. Deploy: `npm run deploy:production`
5. Run smoke tests: `npm run test:smoke -- --env=production`
6. Report the production URL

## Post-Deployment
- Show the deployment summary (environment, commit SHA, timestamp)
- If smoke tests fail, immediately report and suggest rollback steps

## Constraints
- NEVER deploy to production without user confirmation
- NEVER skip the pre-deployment checks
- If this is a production deploy, ensure all staging tests passed first
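
The environment detection and branch rule above amount to a small decision table. The Python sketch below expresses the same logic so the intended behavior is unambiguous. The function names are hypothetical. The command does not say what to do with an unrecognized argument, so raising an error here is an assumption.

```python
def resolve_environment(arguments):
    """Map the raw $ARGUMENTS string to a target environment."""
    value = arguments.strip().lower()
    if value in ("", "staging", "stage"):
        return "staging"  # empty input defaults to staging
    if value in ("production", "prod"):
        return "production"
    # Assumption: unknown values are rejected rather than guessed.
    raise ValueError(f"Unknown environment: {arguments!r}")


def branch_allowed(environment, branch):
    """Production deploys must come from main or master; staging accepts any branch."""
    if environment == "production":
        return branch in ("main", "master")
    return True


print(resolve_environment("prod"), branch_allowed("production", "feature/login"))
```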

The /fix-bug Command: Bug Investigation and Fix

# File: .claude/commands/fix-bug.md

Investigate and fix the following bug: $ARGUMENTS

## Step 1: Understand the Bug
- Parse the bug description from $ARGUMENTS
- If a file or line number is referenced, start there
- If an error message is provided, search the codebase for it

## Step 2: Reproduce
- Identify the conditions that trigger the bug
- Check if there is an existing test that should catch this
- If possible, write a failing test that demonstrates the bug

## Step 3: Root Cause Analysis
- Trace the code path that leads to the bug
- Identify the root cause (not just the symptom)
- Check if the same pattern exists elsewhere (similar bugs waiting
  to happen)

## Step 4: Fix
- Apply the minimal change that fixes the root cause
- Do NOT refactor unrelated code — stay focused on the bug
- Ensure the fix handles edge cases

## Step 5: Verify
- Run the failing test — it should now pass
- Run the full test suite — no regressions allowed
- If the fix touches an API, verify the API contract is maintained

## Step 6: Report
Provide a structured report:
- **Bug**: One-line description
- **Root Cause**: What was actually wrong
- **Fix**: What was changed and why
- **Files Modified**: List with brief descriptions
- **Test Coverage**: What tests were added or modified
- **Risk Assessment**: Low/Medium/High — could this fix break
  anything else?

## Constraints
- Do NOT make changes unrelated to the bug
- If the fix requires a database migration, flag it but do NOT run it
- If the bug cannot be fixed without breaking changes, stop and
  report your findings

The /refactor Command: Guided Refactoring

# File: .claude/commands/refactor.md

Refactor the specified code: $ARGUMENTS

If $ARGUMENTS is a file path, refactor that file.
If $ARGUMENTS is a description (e.g., "extract auth logic into
a service"), follow those instructions.

## Step 1: Analyze Current State
- Read the target code thoroughly
- Identify code smells: duplication, long functions, deep nesting,
  unclear naming, tight coupling
- List all functions and classes that will be affected
- Check test coverage for the target code

## Step 2: Plan the Refactoring
Present a plan BEFORE making any changes:
- What patterns will you apply (Extract Method, Move to Module, etc.)
- Which files will be created, modified, or deleted
- What is the expected impact on the public API
- Wait for user approval before proceeding

## Step 3: Execute (only after approval)
- Apply changes incrementally — one refactoring pattern at a time
- After each change, run tests to catch regressions early
- Preserve all existing behavior — this is a refactor, not a rewrite

## Step 4: Update Tests
- Adjust test imports and references as needed
- Add tests for any newly extracted functions or modules
- Run the full test suite and confirm everything passes

## Step 5: Summary
- List the refactoring patterns applied
- Show before/after metrics (function count, average length, etc.)
- Note any follow-up refactoring opportunities

## Constraints
- Do NOT change external behavior or public API
- Do NOT combine refactoring with feature changes
- Run tests after EVERY significant change
- If tests fail at any point, revert the last change and report

The /write-tests Command: Test Generation

# File: .claude/commands/write-tests.md

Write comprehensive tests for: $ARGUMENTS

$ARGUMENTS can be a file path, a function name, or a module name.

## Step 1: Analyze the Target
- Read the source code for $ARGUMENTS
- Identify all public functions, methods, and classes
- Map out the logic branches (if/else, try/catch, loops)
- Identify external dependencies that need mocking

## Step 2: Determine Testing Approach
- Detect the project's testing framework (pytest, jest, vitest, etc.)
- Match the existing test file naming convention
- Match the existing test style (describe/it, test(), class-based)

## Step 3: Write Tests
For each public function or method, write tests covering:

1. **Happy path**: Normal inputs producing expected outputs
2. **Edge cases**: Empty inputs, None/null, boundary values
3. **Error cases**: Invalid inputs, exceptions, error states
4. **Integration points**: Interactions with dependencies (mocked)

Test naming convention: `test_{function_name}_{scenario}_{expected_result}`
(or the project's existing convention if different)

## Step 4: Verify
- Run the new tests: they should all pass
- Run the full test suite: no regressions
- Check coverage if a coverage tool is configured

## Output
- Created test file path
- Number of test cases written
- Coverage summary (if available)

## Constraints
- Do NOT modify the source code being tested
- Mock external dependencies (database, APIs, file system)
- Each test must be independent — no shared mutable state
- Do NOT test private/internal functions unless critical
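
To make the naming convention and test categories concrete, the following Python sketch tests a small hypothetical function, `parse_port`, that is not part of any real project. It uses only plain `assert` statements so that it runs without a test framework; in a pytest project the error case would more commonly be written with `pytest.raises`.

```python
def parse_port(text):
    """Hypothetical function under test: parse a TCP port number."""
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


# Happy path: test_{function_name}_{scenario}_{expected_result}
def test_parse_port_valid_string_returns_int():
    assert parse_port("8080") == 8080


# Edge case: boundary value
def test_parse_port_upper_boundary_returns_65535():
    assert parse_port("65535") == 65535


# Error case: invalid input must raise
def test_parse_port_zero_raises_value_error():
    try:
        parse_port("0")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Each test is independent and shares no mutable state, which is the property the command's constraints ask for.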

The /db-migration Command: Database Migration Workflow

# File: .claude/commands/db-migration.md

Create a database migration for: $ARGUMENTS

## Step 1: Understand the Change
- Parse the migration description from $ARGUMENTS
- Examples: "add email_verified column to users table",
  "create orders table with foreign key to users"

## Step 2: Detect the ORM and Migration Tool
- Check for: Alembic (Python), Prisma (Node), TypeORM, Knex,
  Django migrations, Rails ActiveRecord, or raw SQL
- Read existing migrations to understand the naming convention
  and style

## Step 3: Generate the Migration
Using the detected tool:

**For Alembic (Python/SQLAlchemy):**
```
alembic revision --autogenerate -m "$ARGUMENTS"
```
Then review and adjust the generated migration.

**For Prisma:**
Update `prisma/schema.prisma`, then run:
```
npx prisma migrate dev --name {migration_name}
```

**For Django:**
Update the model, then run:
```
python manage.py makemigrations --name {migration_name}
```

**For raw SQL:**
Create up and down migration files in the migrations directory.

## Step 4: Review the Migration
- Verify the UP migration does what was requested
- Verify the DOWN migration correctly reverses the change
- Check for:
  - Missing indexes on foreign keys
  - Missing NOT NULL constraints where appropriate
  - Missing default values
  - Data loss risks in column type changes

## Step 5: Test
- Run the migration UP
- Verify the schema change
- Run the migration DOWN
- Verify the schema is restored

## Constraints
- NEVER run migrations against production — local/dev only
- Always create both UP and DOWN migrations
- Flag any migration that could cause data loss
- If adding a NOT NULL column to an existing table, include a
  default value or a backfill step

The /api-endpoint Command: API Endpoint Scaffolding

# File: .claude/commands/api-endpoint.md

Create a new API endpoint: $ARGUMENTS

$ARGUMENTS format: "METHOD /path - description"
Examples:
- "POST /api/users - create a new user"
- "GET /api/orders/:id - get order details"
- "PUT /api/settings - update user settings"

## Step 1: Parse the Request
- Extract HTTP method, path, and description from $ARGUMENTS
- Identify path parameters (e.g., :id)
- Determine the resource name (e.g., users, orders, settings)

## Step 2: Detect the Framework
Check for: Express, FastAPI, Django REST, Flask, Gin, Fiber, etc.
Read existing routes to match the project's patterns.

## Step 3: Create the Endpoint

### Route/Handler file
- Add the route to the appropriate router file
- Create the handler function with:
  - Request validation (parse and validate input)
  - Business logic (or call to service layer)
  - Response formatting
  - Error handling with appropriate HTTP status codes

### Validation/Schema
- Create request body schema (for POST/PUT)
- Create response schema
- Add validation rules (required fields, types, formats)

### Service Layer (if the project uses one)
- Create or update the service with the business logic
- Keep the handler thin — it should only handle HTTP concerns

### Tests
Create tests for:
- Successful request (200/201)
- Validation error (400)
- Not found (404) — for endpoints with path params
- Unauthorized (401) — if auth is required
- Server error handling (500)

## Step 4: Update Documentation
- If the project has an OpenAPI/Swagger spec, update it
- If the project has API docs, add the new endpoint

## Step 5: Verify
- Start the dev server (if not running)
- Run the new tests
- Show a curl example for testing the endpoint manually

## Constraints
- Follow existing patterns EXACTLY — consistency is critical
- Include proper authentication middleware if other endpoints use it
- Use the project's error handling patterns
- Do NOT add new dependencies
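
Step 1 asks Claude to split the argument string into its parts. The Python sketch below shows one way that parsing could work for the "METHOD /path - description" format. The function name, the regular expression, and the rule that the resource is the last non-parameter path segment are all assumptions made for the example.

```python
import re

# METHOD, whitespace, a path starting with "/", " - ", then free-text description.
ENDPOINT_PATTERN = re.compile(
    r"^(?P<method>[A-Za-z]+)\s+(?P<path>/\S*)\s+-\s+(?P<description>.+)$"
)


def parse_endpoint(arguments):
    """Split 'METHOD /path - description' into its components."""
    match = ENDPOINT_PATTERN.match(arguments.strip())
    if match is None:
        raise ValueError("expected 'METHOD /path - description'")
    segments = [s for s in match["path"].split("/") if s]
    params = [s[1:] for s in segments if s.startswith(":")]
    static = [s for s in segments if not s.startswith(":")]
    return {
        "method": match["method"].upper(),
        "path": match["path"],
        "description": match["description"],
        "params": params,
        # Assumption: the resource is the last non-parameter segment.
        "resource": static[-1] if static else None,
    }


print(parse_endpoint("GET /api/orders/:id - get order details"))
```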

The /changelog Command: Changelog Generation

# File: .claude/commands/changelog.md

Generate a changelog based on recent git history.

## Parameters
- If $ARGUMENTS contains a version tag (e.g., "v1.2.0"), generate
  the changelog since that tag
- If $ARGUMENTS contains "last-release", find the most recent tag
  and generate since then
- If $ARGUMENTS is empty, generate for the last 50 commits

## Step 1: Gather Commits
Run: `git log --oneline --no-merges {range}`
Read all commit messages in the specified range.

## Step 2: Categorize Changes
Group commits into these categories:
- **New Features**: commits mentioning "add", "feat", "new",
  "implement", "introduce"
- **Bug Fixes**: commits mentioning "fix", "bug", "resolve",
  "patch", "correct"
- **Performance**: commits mentioning "perf", "optimize", "speed",
  "cache"
- **Breaking Changes**: commits mentioning "breaking", "remove",
  "deprecate", "migrate"
- **Documentation**: commits mentioning "doc", "readme", "guide"
- **Other**: everything else

## Step 3: Generate the Changelog
Format as Markdown:

```
## [Version] - YYYY-MM-DD

### New Features
- Description of feature (commit hash)

### Bug Fixes
- Description of fix (commit hash)

### Performance
- Description of improvement (commit hash)

### Breaking Changes
- Description of breaking change (commit hash)

### Documentation
- Description of documentation change (commit hash)

### Other
- Description (commit hash)
```

## Step 4: Save
- Save to `CHANGELOG.md` (insert the new entry at the top, keep existing content)
- Show the generated changelog in the terminal

## Constraints
- Do NOT modify commit history
- If a commit message is unclear, include it under "Other" with
  the full message
- Skip merge commits
- Include commit short hashes for reference
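
The keyword grouping in Step 2 leaves open what happens when a commit matches more than one category. The Python sketch below makes one possible rule explicit: categories are tried in the order listed, and the first match wins. That precedence, the word-start matching, and the function name are assumptions; word-start matching is also imperfect (for example, "address" matches the "add" keyword).

```python
import re

# Categories and keywords as listed in the command, in order of precedence.
CATEGORIES = [
    ("New Features", ["add", "feat", "new", "implement", "introduce"]),
    ("Bug Fixes", ["fix", "bug", "resolve", "patch", "correct"]),
    ("Performance", ["perf", "optimize", "speed", "cache"]),
    ("Breaking Changes", ["breaking", "remove", "deprecate", "migrate"]),
    ("Documentation", ["doc", "readme", "guide"]),
]


def categorize(message):
    """Return the first category whose keyword starts a word in the message."""
    lowered = message.lower()
    for category, keywords in CATEGORIES:
        if any(re.search(rf"\b{re.escape(k)}", lowered) for k in keywords):
            return category
    return "Other"


for msg in ["feat: add login page", "Fix crash on empty input", "Bump version"]:
    print(f"{msg!r} -> {categorize(msg)}")
```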

Tip: All ten commands above are ready to use. Copy any of them into a .claude/commands/ directory, adjust the project-specific details (test commands, directory paths, framework references), and use them immediately.

Advanced Techniques

Once the basics of writing custom commands are understood, several advanced patterns enable more capable workflows. These techniques distinguish simple automation from sophisticated development orchestration.

Chaining Commands

Although Claude Code does not provide a built-in command-chaining mechanism, the same effect can be achieved by writing a command that instructs Claude to execute the same steps as other commands. The pattern can be viewed as inlining multiple commands into a single master workflow.

# File: .claude/commands/ship-it.md

Execute the full ship-it workflow for: $ARGUMENTS

## Step 1: Code Review
Perform a thorough code review on all staged changes.
Check for security issues, code quality, and performance.
If any CRITICAL issues are found, stop and report them.

## Step 2: Write Tests
For any new or modified functions that lack test coverage,
write comprehensive tests following the project's conventions.
Run all tests and ensure they pass.

## Step 3: Generate Changelog
Categorize the changes being shipped and prepare a changelog entry.

## Step 4: Deploy
If all checks pass, deploy to staging.
Run smoke tests against staging.
Report the final status.

## Failure Handling
If any step fails, stop immediately and report what went wrong.

Using Environment Context

Commands can instruct Claude to read environment files, configuration, and project metadata in order to make dynamic decisions. The result is that a single command can behave differently across different projects or environments.

# File: .claude/commands/setup-env.md

Set up the development environment for this project.

## Step 1: Detect the Project Type
- Check for `package.json` → Node.js project
- Check for `pyproject.toml` or `requirements.txt` → Python project
- Check for `go.mod` → Go project
- Check for `Cargo.toml` → Rust project

## Step 2: Install Dependencies
Based on the detected project type:
- **Node.js**: Run `npm install` or `yarn install` or `pnpm install`
  (check for lock files to determine which)
- **Python**: Run `uv sync` or `pip install -r requirements.txt`
- **Go**: Run `go mod download`
- **Rust**: Run `cargo build`

## Step 3: Configure Environment
- Check if `.env.example` exists but `.env` does not
- If so, copy `.env.example` to `.env` and tell the user to fill
  in the values
- Check for any other setup scripts in `scripts/` or `Makefile`

## Step 4: Verify
- Run a basic health check (test command, build, or lint)
- Report success or any issues found
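
The detection logic in Steps 1 and 2 is simple enough to express directly. The following Python sketch checks for the same marker files and picks a Node.js install command from the lock file present. The function names are hypothetical, and the lock-file precedence (pnpm, then yarn, then npm) is an assumption, since the command only says to check which lock file exists.

```python
import tempfile
from pathlib import Path

# Marker file -> project type, as listed in Step 1.
MARKERS = [
    ("package.json", "Node.js"),
    ("pyproject.toml", "Python"),
    ("requirements.txt", "Python"),
    ("go.mod", "Go"),
    ("Cargo.toml", "Rust"),
]


def detect_project_types(root):
    """Return the project types whose marker files exist in root."""
    found = []
    for filename, project_type in MARKERS:
        if (Path(root) / filename).is_file() and project_type not in found:
            found.append(project_type)
    return found


def node_install_command(root):
    """Choose the package manager from the lock file (assumed precedence)."""
    if (Path(root) / "pnpm-lock.yaml").is_file():
        return "pnpm install"
    if (Path(root) / "yarn.lock").is_file():
        return "yarn install"
    return "npm install"


# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "package.json").write_text("{}")
    (Path(tmp) / "yarn.lock").write_text("")
    detected = detect_project_types(tmp)
    install = node_install_command(tmp)
print(detected, install)
```

A repository can match more than one type (for example, a Python backend with a Node.js frontend), which is why the sketch returns a list rather than a single value.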

Advanced Use of $ARGUMENTS

The $ARGUMENTS placeholder can convey considerably more than simple strings. Commands can be designed to parse complex argument patterns:

# File: .claude/commands/generate.md

Generate code based on the specification: $ARGUMENTS

## Argument Parsing
Parse $ARGUMENTS as: "{type} {name} [options]"

Examples:
- `/generate model User name:string email:string admin:boolean`
- `/generate controller OrdersController --crud`
- `/generate service PaymentService --with-tests --with-docs`
- `/generate middleware AuthMiddleware`

## Type handlers:

### model
- Create a database model with the specified fields
- Field format: `fieldname:type` (string, number, boolean, date)
- Generate a migration for the new model

### controller
- Create a controller/handler file
- If `--crud` is specified, include all CRUD operations
- Generate route registrations

### service
- Create a service class with dependency injection
- If `--with-tests` is specified, also generate test file
- If `--with-docs` is specified, add JSDoc/docstring comments

### middleware
- Create a middleware function
- Include next() call and error handling

## Constraints
- Match existing code style exactly
- Use the project's established patterns for each type
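
Claude interprets the argument string from the prose description alone, but it can help to see the grammar spelled out. The Python sketch below parses the "{type} {name} [options]" pattern into fields and flags. The function name and the returned dictionary shape are assumptions made for the example.

```python
import shlex


def parse_generate(arguments):
    """Parse '{type} {name} [options]' into its parts."""
    tokens = shlex.split(arguments)
    if len(tokens) < 2:
        raise ValueError("expected '{type} {name} [options]'")
    kind, name, *rest = tokens
    # Options starting with "--" are flags; "field:type" pairs are model fields.
    flags = [t[2:] for t in rest if t.startswith("--")]
    fields = dict(
        t.split(":", 1) for t in rest if not t.startswith("--") and ":" in t
    )
    return {"type": kind, "name": name, "fields": fields, "flags": flags}


print(parse_generate("model User name:string email:string admin:boolean"))
print(parse_generate("service PaymentService --with-tests --with-docs"))
```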

Multi-Step Workflows with Checkpoints

For complex workflows in which Claude should pause for confirmation at critical points, checkpoint patterns can be built into commands:

# File: .claude/commands/major-refactor.md

Perform a major refactoring: $ARGUMENTS

## CHECKPOINT 1: Analysis
- Analyze the current state of $ARGUMENTS
- Present findings: what needs to change and why
- List every file that will be affected
- Estimate the scope: Small (1-3 files) / Medium (4-10) / Large (11+)
**STOP and wait for user approval before proceeding.**

## CHECKPOINT 2: Plan
- Present a detailed, step-by-step refactoring plan
- Include rollback strategy for each step
- Highlight any risky operations
**STOP and wait for user approval before proceeding.**

## CHECKPOINT 3: Execute
- Execute the plan one step at a time
- Run tests after each step
- If tests fail, roll back the last step and report
- After all steps complete, present the final summary
**STOP and wait for user approval to finalize.**

## If the user says "abort" at any checkpoint:
- Roll back all changes made so far
- Report what was reverted

Commands That Read CLAUDE.md

One of the most useful advanced patterns is to write commands that explicitly reference a project’s CLAUDE.md file. Because CLAUDE.md is automatically loaded by Claude Code as project context, commands can rely on the conventions defined there without repeating them:

# File: .claude/commands/new-feature.md

Implement a new feature following all project conventions
defined in CLAUDE.md: $ARGUMENTS

## Instructions
- Read CLAUDE.md to understand the project's coding standards,
  directory structure, and conventions
- Follow every guideline specified there — CLAUDE.md is the
  source of truth for how code should be written in this project
- If CLAUDE.md specifies a testing approach, follow it exactly
- If CLAUDE.md specifies commit message formats, use them
- If any instruction here conflicts with CLAUDE.md, CLAUDE.md wins

## Implementation
1. Plan the feature based on $ARGUMENTS
2. Implement following CLAUDE.md conventions
3. Write tests following CLAUDE.md testing guidelines
4. Format code according to CLAUDE.md style rules
5. Summarize what was done

Key Takeaway: Advanced commands combine multiple techniques: argument parsing, environment detection, checkpoints for human approval, and integration with CLAUDE.md. The objective is to design workflows that are capable while retaining human control at critical decision points.

Project Commands and User Commands

The choice between project and user commands is a design decision that affects team workflow. The following detailed comparison clarifies where each type of command should reside.

Aspect	Project Commands	User Commands
Location	`.claude/commands/`	`~/.claude/commands/`
Version controlled	Yes—committed to git	No—local to your machine
Shared with team	Automatically via git	Never (unless manually shared)
Available across projects	Only in that project	In ALL projects
Best for	Team workflows, project-specific tasks	Personal productivity, cross-project utilities
Examples	/deploy, /create-component, /write-post	/explain, /summarize, /standup-notes

When to Use Project Commands

Project commands are appropriate when the command is specific to the project and useful to every team member. Deployment workflows, code scaffolding that follows project conventions, and review checklists that enforce team standards all belong as project commands. The principal advantage is consistency: a new developer joining the team obtains the same set of automated workflows as everyone else, configured for the specific project.

When to Use User Commands

User commands are appropriate for personal productivity and cross-project utilities. Examples include /explain (explain any code in detail), /summarize (summarize the day’s work), or /standup-notes (generate stand-up notes from recent git history). These commands are useful in every project but reflect personal workflow rather than a team standard.

A useful heuristic: if the command references specific files, directories, or tools within the project, it is a project command. If it works generically on any codebase, it is a user command.

Integration with CLAUDE.md

The relationship between CLAUDE.md and custom commands is one of the most important architectural decisions in a Claude Code project. CLAUDE.md functions as a constitution and custom commands as laws: commands should implement and extend the principles defined in CLAUDE.md and never contradict them.

CLAUDE.md as the Source of Truth

CLAUDE.md is loaded automatically by Claude Code at the start of every session. It defines project-wide conventions: coding style, directory structure, testing approach, deployment targets, and constraints. Custom commands inherit this context automatically; when a command directs Claude to “follow the project’s conventions,” Claude has already obtained those conventions from CLAUDE.md.

The result is that commands can be shorter and more focused. Rather than repeating the coding-style guide in every command, the guide is defined once in CLAUDE.md and referenced from commands:

# In CLAUDE.md:
## Coding Standards
- Use TypeScript strict mode
- All functions must have return types
- Use Prettier with the project's .prettierrc
- Tests use Vitest with describe/it blocks
- Components use the Composition API (no Options API)

# Then in .claude/commands/create-feature.md:
Create a new feature: $ARGUMENTS

Follow all coding standards from CLAUDE.md exactly.
...

Example: CLAUDE.md and a Command Working Together

A concrete example illustrates how the two components complement one another. Suppose CLAUDE.md contains the following:

# CLAUDE.md
## Project Structure
- API routes go in `src/routes/`
- Business logic goes in `src/services/`
- Database queries go in `src/repositories/`
- Tests mirror the source structure in `tests/`

## API Conventions
- All endpoints return JSON with `{ data, error, meta }` structure
- Use Zod for request validation
- Authentication via Bearer token in Authorization header
- Rate limiting on all public endpoints

The corresponding /api-endpoint command can then be considerably simpler because it relies on these conventions:

# .claude/commands/api-endpoint.md

Create a new API endpoint: $ARGUMENTS

Follow the project structure and API conventions defined in CLAUDE.md.

1. Create the route handler in the appropriate file under src/routes/
2. Create or update the service in src/services/
3. Create or update the repository in src/repositories/ if DB access
   is needed
4. Add Zod validation schemas for request/response
5. Create tests mirroring the source structure in tests/
6. Ensure the endpoint returns the standard { data, error, meta }
   response format

All conventions from CLAUDE.md apply — do not deviate.

The command is concise because CLAUDE.md provides the detailed context. This is a powerful pattern: conventions are defined once and referenced throughout.

Organizing Commands for Large Projects

As a command library grows, organization becomes important. A project containing twenty commands in a flat directory becomes difficult to navigate. The following strategies have proven effective in keeping the structure manageable.

Naming Conventions

A consistent naming-prefix system groups related commands:

.claude/commands/
├── deploy.md               # /deploy
├── deploy-staging.md       # /deploy-staging
├── deploy-production.md    # /deploy-production
├── create-component.md     # /create-component
├── create-service.md       # /create-service
├── create-migration.md     # /create-migration
├── review-code.md          # /review-code
├── review-security.md      # /review-security
├── test-unit.md            # /test-unit
├── test-integration.md     # /test-integration
├── test-e2e.md             # /test-e2e
└── fix-bug.md              # /fix-bug

Prefix-based naming (deploy-*, create-*, review-*, test-*) causes related commands to sort together alphabetically, simplifying discovery in the / menu.
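
The grouping effect of prefix-based naming can be checked with a short Python sketch that buckets command files by the text before the first hyphen. The function name is hypothetical, and treating the first hyphen-separated token as the prefix is an assumption about the convention.

```python
from collections import defaultdict
from pathlib import Path


def group_by_prefix(filenames):
    """Group slash commands by the part of the name before the first hyphen."""
    groups = defaultdict(list)
    for command in sorted(Path(f).stem for f in filenames):
        prefix = command.split("-", 1)[0]
        groups[prefix].append("/" + command)
    return dict(groups)


files = ["deploy.md", "deploy-staging.md", "test-unit.md", "test-e2e.md", "fix-bug.md"]
for prefix, commands in group_by_prefix(files).items():
    print(prefix, commands)
```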

Command Discovery

Claude Code provides a built-in discovery mechanism: typing / displays all available commands. Every command created is therefore instantly discoverable by the developer and the team. For larger command libraries, a /help command that lists all available commands with brief descriptions can be useful:

# File: .claude/commands/help.md

List all available custom commands in this project.

Read all .md files in .claude/commands/ and for each one:
1. Show the command name (filename without .md)
2. Read the first line or paragraph to get a brief description
3. Note if it accepts $ARGUMENTS

Format as a clean table:
| Command | Description | Arguments |
|---------|-------------|-----------|

Sort alphabetically by command name.
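
The same listing can be produced outside Claude Code, for example in a README generator. The Python sketch below is one possible implementation of the three steps above. It assumes that the description is the first non-blank line that does not start with `#`; the function names and sample files are hypothetical.

```python
import tempfile
from pathlib import Path


def describe_commands(commands_dir):
    """Return (command, description, accepts_arguments) rows, sorted by name."""
    rows = []
    for path in sorted(Path(commands_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        stripped = (line.strip() for line in text.splitlines())
        # Assumption: skip blank lines and "#" lines to find the description.
        description = next((s for s in stripped if s and not s.startswith("#")), "")
        accepts = "yes" if "$ARGUMENTS" in text else "no"
        rows.append((f"/{path.stem}", description, accepts))
    return rows


def render_table(rows):
    """Format the rows as a Markdown table."""
    lines = ["| Command | Description | Arguments |",
             "|---------|-------------|-----------|"]
    lines += [f"| {cmd} | {desc} | {args} |" for cmd, desc, args in rows]
    return "\n".join(lines)


# Demonstration against a throwaway directory with two sample command files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "deploy.md").write_text(
        "Deploy the application: $ARGUMENTS\n", encoding="utf-8")
    Path(tmp, "help.md").write_text(
        "# File: help.md\n\nList all available custom commands.\n", encoding="utf-8")
    rows = describe_commands(tmp)
print(render_table(rows))
```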

Documentation Within Commands

Every command file should begin with a clear, one-line description of its purpose. This serves two functions: it informs Claude what the command is for, and it renders the command self-documenting for team members who read the file:

# File: .claude/commands/deploy.md

Deploy the application to staging or production environments.
Usage: /deploy [staging|production]

## Steps:
...

Caution: Deeply nested subdirectory structures within .claude/commands/ should be avoided. Although organizing commands into deploy/, create/, and test/ subdirectories may appear logical, the current behavior of Claude Code with subdirectories should be verified before adopting that structure. A flat directory with prefix-based naming is the most reliable approach.

Common Mistakes and How to Avoid Them

Examination of numerous custom commands across teams and projects reveals certain mistakes that occur repeatedly. The following sections describe the most common pitfalls and their remedies.

Overly Vague Instructions

This is the most common mistake. The instruction “clean up the code” may mean anything from renaming variables to rewriting an entire module. Claude will make reasonable choices, but they may not be the choices the developer intends. Specify exactly what “clean up” means in the relevant context: remove unused imports, add type annotations, extract long functions, fix linter warnings, or whatever is intended.

Failure to Specify File Paths

Commands that direct Claude to “update the configuration” force the tool to guess which configuration file is meant. In a typical project, files such as config.json, .env, tsconfig.json, package.json, .eslintrc, and a dozen others may be present. Explicit instructions are preferable: “update the database configuration in config/database.yml.”

Missing Error Handling

Commands without error-handling instructions produce unpredictable results when failures occur. What should Claude do if the build fails, a file does not exist, or a test times out? Explicit error handling should be added for every step that could fail: “If the build fails, read the error output, fix the issue, and retry. If it fails a second time, stop and report the errors.”

Overly Complex Single Commands

A 200-line command file that handles deployment, testing, monitoring, rollback, notification, and documentation is fragile and difficult to maintain. If one component fails, the entire command becomes unreliable. Such files should be decomposed into focused commands: /deploy, /test, /monitor, /rollback. Each is easier to write, test, debug, and maintain.

Insufficient Testing Before Sharing

Before a project command is committed for team-wide use, it should be tested thoroughly with different arguments, including edge cases such as empty arguments, incorrect file paths, and unexpected input. A command that fails on first use erodes team confidence in the entire system. Where the tools a command invokes support a --dry-run flag, it is advisable to use that flag and to verify that the output matches expectations before sharing.

Omission of Constraints

Without explicit constraints, Claude may modify files that were not intended to be changed, install unwanted packages, or push to unintended branches. Every command should include a constraints section that defines the boundaries: which files are off-limits, what operations are forbidden, and what requires explicit user confirmation.

Mistake	Symptom	Fix
Vague instructions	Inconsistent results across runs	List specific actions and expectations
No file paths	Claude edits the wrong file	Reference every file by its exact path
No error handling	Command hangs or produces garbage on failure	Add “if X fails, then do Y” for each step
Monolithic commands	Hard to debug, one failure breaks everything	Split into focused single-purpose commands
No testing	Team loses confidence in commands	Test with edge cases before committing
Missing constraints	Unintended file modifications or operations	Add explicit “do NOT” rules for every command

Real-World Command Libraries by Technology Stack

The following curated command sets for popular technology stacks provide a starting point. Each set represents the kind of command library that a mature team would maintain.

Python Stack (FastAPI / Django / Flask)

.claude/commands/
├── create-endpoint.md      # Scaffold a new API endpoint
├── create-model.md         # Create a new SQLAlchemy/Django model
├── create-migration.md     # Generate an Alembic/Django migration
├── write-tests.md          # Generate pytest tests for a module
├── review-code.md          # Code review with Python-specific checks
├── lint-fix.md             # Run ruff/flake8 and auto-fix issues
├── type-check.md           # Run mypy and fix type errors
├── deploy.md               # Deploy via Docker/Kubernetes/Lightsail
├── create-service.md       # Scaffold a new service layer class
└── create-cli.md           # Scaffold a new Click/Typer CLI command

A Python-specific /create-endpoint command would include patterns for Pydantic request/response models, dependency injection, and async handlers—conventions that differ substantially from those of JavaScript frameworks.

Node.js Stack (Express / Next.js / NestJS)

.claude/commands/
├── create-component.md     # React/Vue component with tests
├── create-page.md          # Next.js page with SSR/SSG
├── create-api-route.md     # API route handler
├── create-hook.md          # Custom React hook with tests
├── write-tests.md          # Jest/Vitest test generation
├── review-code.md          # Code review with TS/JS checks
├── lint-fix.md             # Run ESLint and Prettier fixes
├── deploy.md               # Deploy to Vercel/AWS/Netlify
├── create-middleware.md    # Express/NestJS middleware
└── storybook.md            # Generate Storybook stories

Go Stack

.claude/commands/
├── create-handler.md       # HTTP handler with middleware
├── create-service.md       # Service with interface and impl
├── create-repository.md    # Database repository pattern
├── create-migration.md     # SQL migration files
├── write-tests.md          # Table-driven test generation
├── review-code.md          # Code review with Go idiom checks
├── lint-fix.md             # Run golangci-lint and fix issues
├── create-proto.md         # Protobuf definition + generated code
├── benchmark.md            # Write and run benchmarks
└── deploy.md               # Build and deploy Go binary

DevOps Commands (Cross-Stack)

.claude/commands/
├── docker-build.md         # Build and tag Docker images
├── docker-compose-up.md    # Start all services with health checks
├── k8s-deploy.md           # Kubernetes deployment workflow
├── create-pipeline.md      # Scaffold CI/CD pipeline config
├── create-dockerfile.md    # Generate optimized Dockerfile
├── ssl-check.md            # Check SSL certificate expiry
├── log-analyze.md          # Analyze recent error logs
├── scale.md                # Scale services up or down
├── rollback.md             # Rollback to previous deployment
└── infra-audit.md          # Audit infrastructure configuration

Documentation Commands

.claude/commands/
├── document-api.md         # Generate API documentation
├── document-function.md    # Add JSDoc/docstrings to functions
├── update-readme.md        # Update README based on current state
├── changelog.md            # Generate changelog from git history
├── adr.md                  # Create Architecture Decision Record
├── runbook.md              # Generate operations runbook
└── diagram.md              # Generate Mermaid architecture diagrams

Documentation commands are particularly valuable because documentation is the task most developers avoid. Automating it with a slash command removes the friction entirely. A simple /document-api can analyse route handlers and generate comprehensive API documentation within seconds.

Tip: Begin with three to five commands that address the most frequent tasks. Additional commands can be added as repetitive workflows are identified. A well-curated library of ten to fifteen commands covers most development needs without becoming unwieldy.

Conclusion

Custom commands in Claude Code are not merely a convenience; they constitute a fundamentally different mode of working with AI in a development workflow. Rather than typing the same detailed instructions whenever a deploy, scaffold, review, or test is needed, the developer encodes that knowledge once in a Markdown file and invokes it with a single slash command for the remainder of the project’s lifetime.

The effect is immediate and measurable. Teams that adopt custom commands report substantially reduced time on repetitive workflows. The deeper benefit, however, is consistency. When every team member uses the same /deploy command, deployments follow the same process each time. When everyone uses the same /review-code command, code reviews examine the same items. Tribal knowledge that previously resided in one senior developer’s mind becomes encoded in files that the entire team can use, improve, and version-control.

A practical path forward is the following. Begin with three commands: one for the most frequent task (typically code scaffolding or deployment), one for the most disliked task (typically writing tests or documentation), and one for the team’s most significant pain point (typically code review or environment setup). These should be written following the patterns described in this guide: specific instructions, clear steps, explicit constraints, and error handling. They should be tested, refined, and committed to the repository.

Iteration follows. Whenever the developer notices that the same detailed instructions have been provided to Claude Code for a third time, those instructions should be converted into a command. When a colleague asks how to deploy or what the testing convention is, the relevant command can serve as the reference. Over time, the .claude/commands/ directory becomes a living, executable operations manual for the project, one that does not merely describe workflows but runs them.

The developers who derive the greatest benefit from AI coding tools are not those who type the fastest prompts. They are those who build systems that make every subsequent interaction faster, more consistent, and more reliable. Custom commands are the mechanism by which such a system is constructed in Claude Code. The ten commands in this guide provide a starting point; adapting them to a particular project and building from there yields substantial returns over time.

April 4, 2026

Building REST APIs with FastAPI: A Modern Python Web Framework Guide

This post examines FastAPI in 2026 and demonstrates how to construct a production-ready REST API from scratch. In December 2018, a Colombian developer named Sebastián Ramírez pushed the first commit of a Python web framework to GitHub. In the years since, that project, FastAPI, has surpassed 80,000 stars, overtaken Flask in monthly downloads, and become the framework of choice at Netflix, Uber, Microsoft, and hundreds of startups building production APIs. Two questions follow: what makes FastAPI so compelling that companies are rewriting entire API layers around it, and how can its capabilities be applied to build robust, production-ready REST APIs?

For anyone familiar with the Python web ecosystem, the landscape has been dominated by two heavyweights for more than a decade: Flask, the minimalist micro-framework valued for its simplicity, and Django with its REST Framework, the batteries-included monolith favoured by enterprises. Both are excellent tools. They were designed, however, in a world before type hints became standard, before async was a first-class citizen in Python, and before API-first architectures became the default approach to building software.

FastAPI was created in a different environment. It builds on modern Python features and libraries (type annotations, async/await, and Pydantic data validation) to change the developer experience: ordinary annotated Python functions are written, and the framework automatically generates interactive API documentation, validates every request and response, and runs with performance that rivals Node.js and Go. This is not marketing rhetoric. Independent benchmarks consistently show FastAPI handling 2–5x more requests per second than Flask.

This guide builds a complete REST API from zero to deployment. By the end, the reader will possess a fully functional task-management API with CRUD operations, database persistence, authentication, tests, and a production deployment strategy. Every code example is complete and runnable, permitting the reader to follow along step by step and conclude with a working API.

The discussion follows.

Summary

What this post covers: A zero-to-deployment FastAPI tutorial that builds a complete task-manager REST API with CRUD endpoints, Pydantic validation, SQLAlchemy persistence, JWT authentication, tests, and a production deployment strategy.

Key insights:

FastAPI’s appeal is structural, not cosmetic—type hints + Pydantic + ASGI/Starlette give you automatic OpenAPI docs, request/response validation, and async I/O from the same function signature you would have written anyway.
Independent benchmarks show FastAPI handling 2–5x more requests per second than Flask, putting it in the same performance class as Node.js and Go for typical I/O-bound workloads.
Use Pydantic models as the single source of truth for request bodies, response shapes, and OpenAPI schema—if you find yourself duplicating field definitions between models and SQLAlchemy tables, you are doing it wrong.
Authentication is best implemented with FastAPI’s Depends() system: a single get_current_user dependency injected into protected routes keeps JWT decoding, expiry checks, and DB lookups out of your endpoint code.
For production, the right stack is Uvicorn (or Gunicorn with Uvicorn workers) behind Nginx, with structured logging, CORS configured explicitly per origin, and tests written against TestClient so they exercise the real ASGI app, not a mock.

Main topics: Why FastAPI, Setting Up Your Environment, Your First API—Hello World, Building a Complete CRUD API—Task Manager, Request Validation and Pydantic Models, Path Parameters Query Parameters and Request Body, Adding a Database with SQLAlchemy, Authentication and Security, Middleware CORS and Error Handling, Testing Your API, Deployment, Best Practices.

Why FastAPI?

Before any code is written, the characteristics that distinguish FastAPI and explain its rapid adoption in the Python community warrant examination.

Automatic OpenAPI and Swagger Documentation

Every FastAPI application automatically generates an OpenAPI schema and serves an interactive Swagger UI at /docs and a ReDoc interface at /redoc. No plugins must be installed, no YAML files written, and no separate documentation maintained. The code is the documentation, and the two are always in sync.

Type Hints and Pydantic Validation

FastAPI is built on top of Pydantic, the most widely used data-validation library in Python. Request and response models are defined as simple Python classes with type annotations, and FastAPI automatically validates incoming data, serialises outgoing data, and generates accurate schema documentation — all from the same model definition.

Native Async Support

FastAPI natively supports Python’s async/await syntax. This permits the API to handle thousands of concurrent connections efficiently without blocking, which is critical for I/O-bound workloads such as database queries, external API calls, and file operations. Regular synchronous functions are also supported; FastAPI handles both seamlessly.
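
The benefit for I/O-bound work can be observed without FastAPI at all. The following is a minimal sketch using only the standard library; fake_io_call is a hypothetical stand-in for a database query or external API call, simulated here with asyncio.sleep. Three waits of 0.1 seconds overlap instead of running back to back.

```python
import asyncio
import time


async def fake_io_call(name: str, delay: float) -> str:
    # Hypothetical stand-in for a database query or external API call.
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return f"{name} done"


async def run_concurrently() -> tuple[list[str], float]:
    start = time.perf_counter()
    # gather() schedules all three coroutines at once, so their waits overlap.
    results = await asyncio.gather(
        fake_io_call("db", 0.1),
        fake_io_call("api", 0.1),
        fake_io_call("file", 0.1),
    )
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(run_concurrently())
print(results, f"{elapsed:.2f}s")
```

Run sequentially, the same three calls would take at least 0.3 seconds; run concurrently, the total stays close to the longest single wait. An async endpoint benefits in the same way whenever its handler awaits I/O.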

Performance Comparable to Node.js and Go

Owing to its ASGI foundation (powered by Starlette) and the Uvicorn server, FastAPI delivers exceptional performance. In the TechEmpower Web Framework Benchmarks, Python ASGI frameworks consistently outperform traditional WSGI frameworks by significant margins.

Framework Comparison

Feature	FastAPI	Flask	Django REST	Express.js
Auto Documentation	Built-in	Plugin required	Plugin required	Plugin required
Data Validation	Built-in (Pydantic)	Manual / Marshmallow	Built-in (Serializers)	Manual / Joi
Async Support	Native	Limited	Django 4.1+	Native
Performance (req/s)	~15,000+	~3,000	~2,500	~18,000+
Learning Curve	Easy	Very Easy	Moderate	Easy
Type Safety	Full (type hints)	None	Partial	TypeScript optional
Dependency Injection	Built-in	No	No	No

Key Takeaway: FastAPI provides the simplicity of Flask, the features of Django REST Framework, and performance that approaches Node.js — all in one package. For any new Python API project in 2026, FastAPI is the appropriate default choice.

Setting Up Your Environment

A clean development environment is the appropriate starting point. The discussion uses Python 3.11+ (though 3.9+ is also acceptable) and creates an isolated virtual environment for the project.

Verify the Python Installation

python3 --version
# Python 3.11.x or higher recommended

Create the Project Directory

mkdir fastapi-task-manager
cd fastapi-task-manager

Set Up a Virtual Environment

Two options are available: the classic venv module, or the faster uv tool.

# Option 1: Classic venv
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Option 2: Using uv (much faster)
pip install uv
uv venv
source .venv/bin/activate

Tip: Anyone unfamiliar with uv should consider trying it. It is a Rust-based Python package manager that installs dependencies 10–100x faster than pip and is rapidly becoming the standard tool for Python project management.

Install FastAPI and Uvicorn

# Install FastAPI with its standard optional dependencies
pip install "fastapi[standard]"

# This installs, among others:
# - fastapi (the framework)
# - uvicorn (the ASGI server)
# - pydantic (data validation)
# - starlette (the underlying ASGI toolkit)
# - httpx (for testing)
# - python-multipart (for form data)
# - jinja2 (for templates, if needed)

Project Structure

A clean project structure that will scale as the API grows is appropriate from the outset:

fastapi-task-manager/
├── app/
│   ├── __init__.py
│   ├── main.py            # FastAPI app entry point
│   ├── models.py           # Pydantic models (schemas)
│   ├── database.py         # Database configuration
│   ├── db_models.py        # SQLAlchemy ORM models
│   ├── crud.py             # Database operations
│   ├── auth.py             # Authentication logic
│   └── routers/
│       ├── __init__.py
│       └── tasks.py        # Task endpoints
├── tests/
│   ├── __init__.py
│   └── test_tasks.py       # API tests
├── requirements.txt
├── Dockerfile
└── .env

Create the initial directory structure:

mkdir -p app/routers tests
touch app/__init__.py app/routers/__init__.py tests/__init__.py

A First API: Hello World

The most direct illustration begins with the simplest possible FastAPI application. The framework’s behaviour can then be observed.

Create app/main.py:

from fastapi import FastAPI

app = FastAPI(
    title="Task Manager API",
    description="A complete REST API for managing tasks",
    version="1.0.0",
)


@app.get("/")
def read_root():
    return {"message": "Welcome to the Task Manager API"}


@app.get("/health")
def health_check():
    return {"status": "healthy"}

That is the entire requirement. A dozen lines of code produce a working API with two endpoints. The application is run as follows:

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The --reload flag enables hot reloading, so the server restarts automatically when code is changed. Output of the following form should appear:

INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [12345]
INFO:     Started server process [12346]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

Exploring the Swagger UI

Opening a browser at http://localhost:8000/docs reveals an attractive interactive API documentation page, generated entirely from the code. Any endpoint may be clicked, “Try it out” selected, and requests executed directly from the browser.

The alternative documentation layout is available at http://localhost:8000/redoc, and the raw OpenAPI schema — importable into Postman, Insomnia, or any API client — is available at http://localhost:8000/openapi.json.

Key Takeaway: No documentation code has been written, yet a fully interactive API explorer is available. This is one of FastAPI’s distinguishing features: code and documentation are always in sync because they are the same artefact.

Building a Complete CRUD API: Task Manager

The following section constructs a substantive example: a full task-management API with all CRUD operations, proper validation, error handling, and correct HTTP status codes. The discussion begins with in-memory storage to focus on API design, and a database is added later.

Define Pydantic Models

The first step is to define the data models. Create app/models.py:

from pydantic import BaseModel, Field
from typing import Optional
from datetime import datetime
from enum import Enum


class TaskStatus(str, Enum):
    pending = "pending"
    in_progress = "in_progress"
    completed = "completed"
    cancelled = "cancelled"


class TaskCreate(BaseModel):
    title: str = Field(
        ...,
        min_length=1,
        max_length=200,
        description="The title of the task",
        examples=["Buy groceries"],
    )
    description: Optional[str] = Field(
        None,
        max_length=2000,
        description="Detailed description of the task",
    )
    status: TaskStatus = Field(
        default=TaskStatus.pending,
        description="Current status of the task",
    )
    priority: int = Field(
        default=1,
        ge=1,
        le=5,
        description="Priority level from 1 (lowest) to 5 (highest)",
    )


class TaskUpdate(BaseModel):
    title: Optional[str] = Field(
        None,
        min_length=1,
        max_length=200,
    )
    description: Optional[str] = Field(None, max_length=2000)
    status: Optional[TaskStatus] = None
    priority: Optional[int] = Field(None, ge=1, le=5)


class TaskResponse(BaseModel):
    id: int
    title: str
    description: Optional[str] = None
    status: TaskStatus
    priority: int
    created_at: datetime
    updated_at: datetime

The separation of concerns is important: TaskCreate represents what clients send when creating a task, TaskUpdate allows partial updates (all fields optional), and TaskResponse represents what the API returns. This is a critical design pattern; the internal data model should never be exposed directly.
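
The value of a dedicated response model can be illustrated with Pydantic alone. The following is a minimal sketch using a trimmed-down TaskResponse; the internal_notes field is a hypothetical example of data that should stay server-side. Pydantic ignores undeclared fields by default, so only the declared fields reach the client.

```python
from pydantic import BaseModel


class TaskResponse(BaseModel):
    # Trimmed-down version of the response model above.
    id: int
    title: str
    priority: int


# Hypothetical internal record carrying a field clients should never see.
internal_record = {
    "id": 1,
    "title": "Buy groceries",
    "priority": 2,
    "internal_notes": "flagged by admin",
}

# Unknown fields are ignored by default, so only declared fields survive.
public_view = TaskResponse.model_validate(internal_record).model_dump()
print(public_view)
```

Declaring response_model=TaskResponse on an endpoint applies the same kind of filtering to whatever the handler returns.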

Build the CRUD Endpoints

The actual API can now be built. Update app/main.py:

from fastapi import FastAPI, HTTPException, Query
from typing import Optional
from datetime import datetime

from app.models import TaskCreate, TaskUpdate, TaskResponse, TaskStatus

app = FastAPI(
    title="Task Manager API",
    description="A complete REST API for managing tasks",
    version="1.0.0",
)

# In-memory storage
tasks_db: dict[int, dict] = {}
task_id_counter = 0


def get_next_id() -> int:
    global task_id_counter
    task_id_counter += 1
    return task_id_counter


@app.get("/")
def read_root():
    return {"message": "Welcome to the Task Manager API"}


@app.get("/tasks", response_model=list[TaskResponse])
def list_tasks(
    status: Optional[TaskStatus] = Query(
        None, description="Filter tasks by status"
    ),
    priority: Optional[int] = Query(
        None, ge=1, le=5, description="Filter tasks by priority"
    ),
    skip: int = Query(0, ge=0, description="Number of tasks to skip"),
    limit: int = Query(
        20, ge=1, le=100, description="Maximum number of tasks to return"
    ),
):
    """Retrieve all tasks with optional filtering and pagination."""
    results = list(tasks_db.values())

    # Apply filters
    if status is not None:
        results = [t for t in results if t["status"] == status]
    if priority is not None:
        results = [t for t in results if t["priority"] == priority]

    # Apply pagination
    return results[skip : skip + limit]


@app.get("/tasks/{task_id}", response_model=TaskResponse)
def get_task(task_id: int):
    """Retrieve a single task by its ID."""
    if task_id not in tasks_db:
        raise HTTPException(
            status_code=404,
            detail=f"Task with ID {task_id} not found",
        )
    return tasks_db[task_id]


@app.post("/tasks", response_model=TaskResponse, status_code=201)
def create_task(task: TaskCreate):
    """Create a new task."""
    now = datetime.utcnow()  # naive UTC timestamp; utcnow() is deprecated as of Python 3.12
    task_id = get_next_id()

    task_data = {
        "id": task_id,
        "title": task.title,
        "description": task.description,
        "status": task.status,
        "priority": task.priority,
        "created_at": now,
        "updated_at": now,
    }
    tasks_db[task_id] = task_data
    return task_data


@app.put("/tasks/{task_id}", response_model=TaskResponse)
def update_task(task_id: int, task_update: TaskUpdate):
    """Update an existing task. Only provided fields will be updated."""
    if task_id not in tasks_db:
        raise HTTPException(
            status_code=404,
            detail=f"Task with ID {task_id} not found",
        )

    existing_task = tasks_db[task_id]
    update_data = task_update.model_dump(exclude_unset=True)

    for field, value in update_data.items():
        existing_task[field] = value

    existing_task["updated_at"] = datetime.utcnow()
    return existing_task


@app.delete("/tasks/{task_id}", status_code=204)
def delete_task(task_id: int):
    """Delete a task by its ID."""
    if task_id not in tasks_db:
        raise HTTPException(
            status_code=404,
            detail=f"Task with ID {task_id} not found",
        )
    del tasks_db[task_id]

The key design decisions in this code merit explanation:

Status code 201 for creation: The POST /tasks endpoint returns 201 (Created) instead of the default 200, which is the correct HTTP semantic for resource creation.

Status code 204 for deletion: The DELETE endpoint returns 204 (No Content) with no response body, which is the standard for successful deletions.

HTTPException for errors: When a task is not found, an HTTPException is raised with a 404 status code and a human-readable detail message. FastAPI converts this into a proper JSON error response automatically.

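Partial updates with exclude_unset: The model_dump(exclude_unset=True) call on the update model ensures that only fields explicitly sent by the client are updated. Strictly speaking, this partial-update behaviour matches PATCH semantics; PUT conventionally replaces the whole resource, so a stricter API would expose this handler under PATCH.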
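
The effect of exclude_unset is easiest to see side by side. The following is a minimal sketch with a trimmed-down TaskUpdate model, assuming Pydantic v2; the request body is an illustrative example in which the client sends one value and explicitly clears another.

```python
from typing import Optional

from pydantic import BaseModel


class TaskUpdate(BaseModel):
    # Trimmed-down version of the update model above.
    title: Optional[str] = None
    description: Optional[str] = None
    priority: Optional[int] = None


# The client sets priority and explicitly nulls description; title is absent.
update = TaskUpdate.model_validate({"priority": 3, "description": None})

full = update.model_dump()
sent_only = update.model_dump(exclude_unset=True)

print(full)       # every field, including defaults the client never sent
print(sent_only)  # only the fields present in the request body
```

Without exclude_unset, the absent title would be dumped as None and would overwrite the stored title. With it, an explicit null is still preserved, so a client can clear description without touching title.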

Testing the CRUD API

The server is started with uvicorn app.main:app --reload, and the following requests may then be issued using curl:

# Create a task
curl -X POST http://localhost:8000/tasks \
  -H "Content-Type: application/json" \
  -d '{"title": "Learn FastAPI", "description": "Complete the tutorial", "priority": 5}'

# List all tasks
curl http://localhost:8000/tasks

# Get a specific task
curl http://localhost:8000/tasks/1

# Update a task
curl -X PUT http://localhost:8000/tasks/1 \
  -H "Content-Type: application/json" \
  -d '{"status": "in_progress"}'

# Filter tasks by status
curl "http://localhost:8000/tasks?status=in_progress"

# Delete a task
curl -X DELETE http://localhost:8000/tasks/1

Tip: All of these endpoints can also be tested interactively through the Swagger UI at http://localhost:8000/docs. It is much faster for exploration than writing curl commands.

Request Validation and Pydantic Models

One of FastAPI’s most powerful features is its deep integration with Pydantic for data validation. The capabilities of Pydantic beyond the basics already discussed are examined below.

Field Validation

Pydantic’s Field function provides fine-grained control over validation:

from pydantic import BaseModel, Field, field_validator


class UserCreate(BaseModel):
    username: str = Field(
        ...,
        min_length=3,
        max_length=50,
        pattern=r"^[a-zA-Z0-9_]+$",
        description="Username (letters, numbers, underscores only)",
    )
    email: str = Field(
        ...,
        min_length=5,
        max_length=255,
        description="Valid email address",
    )
    age: int = Field(
        ...,
        gt=0,
        lt=150,
        description="Age in years",
    )
    score: float = Field(
        default=0.0,
        ge=0.0,
        le=100.0,
        description="Score between 0 and 100",
    )

    @field_validator("email")
    @classmethod
    def validate_email(cls, v: str) -> str:
        if "@" not in v or "." not in v.split("@")[-1]:
            raise ValueError("Invalid email address")
        return v.lower()

The validation constraints available include:

min_length / max_length — for strings
pattern — regex validation for strings
gt / ge / lt / le — greater than, greater or equal, less than, less or equal, for numbers
multiple_of — ensures a number is a multiple of a given value
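
Of these constraints, multiple_of is the only one not exercised in the UserCreate model. The following is a minimal sketch, assuming Pydantic v2; the Discount model is hypothetical and exists only to show the constraint combined with ge and le.

```python
from pydantic import BaseModel, Field, ValidationError


class Discount(BaseModel):
    # Hypothetical model: a percentage from 0 to 100 in steps of 5.
    percent: int = Field(..., ge=0, le=100, multiple_of=5)


ok = Discount(percent=15)

try:
    Discount(percent=12)  # within range, but not a multiple of 5
    rejected = False
except ValidationError:
    rejected = True

print(ok.percent, rejected)
```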

Nested Models

Pydantic models can be nested to represent complex data structures:

from pydantic import BaseModel
from typing import Optional


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str
    country: str = "US"


class ContactInfo(BaseModel):
    email: str
    phone: Optional[str] = None
    address: Optional[Address] = None


class Employee(BaseModel):
    name: str
    department: str
    contact: ContactInfo
    tags: list[str] = []


# This would be valid JSON input:
# {
#     "name": "Alice",
#     "department": "Engineering",
#     "contact": {
#         "email": "alice@example.com",
#         "address": {
#             "street": "123 Main St",
#             "city": "San Francisco",
#             "state": "CA",
#             "zip_code": "94102"
#         }
#     },
#     "tags": ["python", "fastapi"]
# }

Custom Validators

For complex validation logic that goes beyond simple field constraints, Pydantic offers model validators that can validate relationships between fields:

from pydantic import BaseModel, Field, model_validator
from datetime import date


class DateRange(BaseModel):
    start_date: date
    end_date: date

    @model_validator(mode="after")
    def validate_date_range(self):
        if self.end_date < self.start_date:
            raise ValueError("end_date must not be before start_date")
        return self


class PasswordChange(BaseModel):
    current_password: str
    new_password: str = Field(min_length=8)
    confirm_password: str

    @model_validator(mode="after")
    def passwords_match(self):
        if self.new_password != self.confirm_password:
            raise ValueError("new_password and confirm_password must match")
        if self.new_password == self.current_password:
            raise ValueError("New password must differ from current password")
        return self

When validation fails, FastAPI automatically returns a 422 (Unprocessable Entity) response with detailed error messages explaining exactly what went wrong and where. Clients receive clear, actionable error messages without any error-handling code having to be written.
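
The structure of those error messages comes from Pydantic. The following is a minimal sketch, assuming Pydantic v2 and a trimmed-down TaskCreate model; it triggers two failures and prints the fields of each error entry. In a FastAPI 422 response, the detail list carries entries of this shape, with the location prefixed by the part of the request that failed (for example, body).

```python
from pydantic import BaseModel, Field, ValidationError


class TaskCreate(BaseModel):
    # Trimmed-down version of the creation model above.
    title: str = Field(..., min_length=1, max_length=200)
    priority: int = Field(default=1, ge=1, le=5)


errors = []
try:
    TaskCreate(title="", priority=9)  # empty title and out-of-range priority
except ValidationError as exc:
    errors = exc.errors()

# Each entry names the failing field (loc), an error type, and a message.
for err in errors:
    print(err["loc"], err["type"], err["msg"])
```

All failures are reported together rather than one at a time, so a client can correct every invalid field in a single round trip.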

Path Parameters, Query Parameters, and Request Body

FastAPI provides elegant means of extracting data from every part of an HTTP request. Each mechanism is examined below.

Path Parameters

Path parameters are extracted directly from the URL path and are always required:

from fastapi import Path

@app.get("/tasks/{task_id}/comments/{comment_id}")
def get_comment(
    task_id: int = Path(..., gt=0, description="The task ID"),
    comment_id: int = Path(..., gt=0, description="The comment ID"),
):
    return {"task_id": task_id, "comment_id": comment_id}

Query Parameters with Pagination

Query parameters are well suited to filtering, sorting, and pagination:

from fastapi import Query
from typing import Optional
from enum import Enum


class SortField(str, Enum):
    created_at = "created_at"
    priority = "priority"
    title = "title"


class SortOrder(str, Enum):
    asc = "asc"
    desc = "desc"


@app.get("/tasks")
def list_tasks(
    # Filtering
    status: Optional[TaskStatus] = Query(None),
    priority: Optional[int] = Query(None, ge=1, le=5),
    search: Optional[str] = Query(
        None, min_length=1, max_length=100,
        description="Search in title and description",
    ),
    # Sorting
    sort_by: SortField = Query(
        SortField.created_at, description="Field to sort by"
    ),
    order: SortOrder = Query(
        SortOrder.desc, description="Sort order"
    ),
    # Pagination
    skip: int = Query(0, ge=0, description="Records to skip"),
    limit: int = Query(20, ge=1, le=100, description="Max records"),
):
    """List tasks with filtering, sorting, and pagination."""
    results = list(tasks_db.values())

    if status:
        results = [t for t in results if t["status"] == status]
    if priority:
        results = [t for t in results if t["priority"] == priority]
    if search:
        results = [
            t for t in results
            if search.lower() in t["title"].lower()
            or (t["description"] and search.lower() in t["description"].lower())
        ]

    reverse = order == SortOrder.desc
    results.sort(key=lambda t: t[sort_by.value], reverse=reverse)

    return {
        "total": len(results),
        "skip": skip,
        "limit": limit,
        "tasks": results[skip : skip + limit],
    }
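
The skip/limit arithmetic used in the endpoint above can be checked in isolation. The following is a minimal standard-library sketch; the paginate helper and its has_more flag are illustrative additions and not part of the endpoint above.

```python
def paginate(items: list, skip: int = 0, limit: int = 20) -> dict:
    """Return one page plus the metadata a client needs to request the next."""
    page = items[skip : skip + limit]
    return {
        "total": len(items),  # size of the full (already filtered) result set
        "skip": skip,
        "limit": limit,
        # Illustrative convenience flag: are there items beyond this page?
        "has_more": skip + len(page) < len(items),
        "tasks": page,
    }


tasks = [{"id": i, "title": f"Task {i}"} for i in range(1, 46)]  # 45 sample tasks

first = paginate(tasks, skip=0, limit=20)
last = paginate(tasks, skip=40, limit=20)
print(len(first["tasks"]), first["has_more"])
print(len(last["tasks"]), last["has_more"])
```

Returning total alongside the page lets a client compute the number of pages without issuing a separate count request.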

Combining Path, Query, and Body in One Endpoint

from fastapi import Path, Query, Body

@app.put("/projects/{project_id}/tasks/{task_id}")
def update_project_task(
    project_id: int = Path(..., gt=0),       # From URL path
    task_id: int = Path(..., gt=0),          # From URL path
    notify: bool = Query(False),              # From query string
    task_update: TaskUpdate = Body(...),      # From request body
):
    """
    URL: PUT /projects/5/tasks/42?notify=true
    Body: {"title": "Updated title", "priority": 3}
    """
    # project_id = 5 (from path)
    # task_id = 42 (from path)
    # notify = True (from query)
    # task_update = TaskUpdate(title="Updated title", priority=3) (from body)
    return {
        "project_id": project_id,
        "task_id": task_id,
        "notify": notify,
        "updates": task_update.model_dump(exclude_unset=True),
    }

FastAPI automatically determines where each parameter originates: a parameter whose name appears in the route path is a path parameter, any other simple type (int, str, bool, and so on) is a query parameter, and a Pydantic model is read from the request body. The Path, Query, and Body functions make the source explicit and allow validation and documentation to be attached to each.

Adding a Database with SQLAlchemy

In-memory storage is acceptable for prototyping, but any real application requires persistent data storage. The following section integrates SQLite with SQLAlchemy; the same pattern works with PostgreSQL, MySQL, or any other database.

Install Database Dependencies

pip install "sqlalchemy>=2.0"  # DeclarativeBase, used below, requires SQLAlchemy 2.0+

Database Configuration

Create app/database.py:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, DeclarativeBase

SQLALCHEMY_DATABASE_URL = "sqlite:///./tasks.db"
# For PostgreSQL:
# SQLALCHEMY_DATABASE_URL = "postgresql://user:password@localhost/dbname"

engine = create_engine(
    SQLALCHEMY_DATABASE_URL,
    connect_args={"check_same_thread": False},  # SQLite only
)

SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)


class Base(DeclarativeBase):
    pass


def get_db():
    """Dependency that provides a database session per request."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

Define Database Models

Create app/db_models.py:

from sqlalchemy import Column, Integer, String, DateTime, Enum as SQLEnum
from sqlalchemy.sql import func

from app.database import Base
from app.models import TaskStatus


class TaskDB(Base):
    __tablename__ = "tasks"

    id = Column(Integer, primary_key=True, index=True, autoincrement=True)
    title = Column(String(200), nullable=False)
    description = Column(String(2000), nullable=True)
    status = Column(
        SQLEnum(TaskStatus), default=TaskStatus.pending, nullable=False
    )
    priority = Column(Integer, default=1, nullable=False)
    created_at = Column(
        DateTime(timezone=True), server_default=func.now()
    )
    updated_at = Column(
        DateTime(timezone=True),
        server_default=func.now(),
        onupdate=func.now(),
    )

CRUD Operations Module

Create app/crud.py to separate database logic from endpoint logic:

from sqlalchemy.orm import Session
from typing import Optional

from app.db_models import TaskDB
from app.models import TaskCreate, TaskUpdate, TaskStatus


def get_tasks(
    db: Session,
    status: Optional[TaskStatus] = None,
    priority: Optional[int] = None,
    skip: int = 0,
    limit: int = 20,
) -> list[TaskDB]:
    query = db.query(TaskDB)

    if status is not None:
        query = query.filter(TaskDB.status == status)
    if priority is not None:
        query = query.filter(TaskDB.priority == priority)

    return query.offset(skip).limit(limit).all()


def get_task(db: Session, task_id: int) -> Optional[TaskDB]:
    return db.query(TaskDB).filter(TaskDB.id == task_id).first()


def create_task(db: Session, task: TaskCreate) -> TaskDB:
    db_task = TaskDB(**task.model_dump())
    db.add(db_task)
    db.commit()
    db.refresh(db_task)
    return db_task


def update_task(
    db: Session, task_id: int, task_update: TaskUpdate
) -> Optional[TaskDB]:
    db_task = db.query(TaskDB).filter(TaskDB.id == task_id).first()
    if db_task is None:
        return None

    update_data = task_update.model_dump(exclude_unset=True)
    for field, value in update_data.items():
        setattr(db_task, field, value)

    db.commit()
    db.refresh(db_task)
    return db_task


def delete_task(db: Session, task_id: int) -> bool:
    db_task = db.query(TaskDB).filter(TaskDB.id == task_id).first()
    if db_task is None:
        return False
    db.delete(db_task)
    db.commit()
    return True

Refactored Endpoints with Database

The endpoints in app/main.py are now updated to use the database:

from fastapi import FastAPI, HTTPException, Query, Depends
from sqlalchemy.orm import Session
from typing import Optional

from app.models import (
    TaskCreate, TaskUpdate, TaskResponse, TaskStatus,
)
from app.database import engine, get_db
from app.db_models import Base
from app import crud

# Create database tables on startup
Base.metadata.create_all(bind=engine)

app = FastAPI(
    title="Task Manager API",
    description="A complete REST API for managing tasks",
    version="1.0.0",
)


@app.get("/")
def read_root():
    return {"message": "Welcome to the Task Manager API"}


@app.get("/tasks", response_model=list[TaskResponse])
def list_tasks(
    status: Optional[TaskStatus] = Query(None),
    priority: Optional[int] = Query(None, ge=1, le=5),
    skip: int = Query(0, ge=0),
    limit: int = Query(20, ge=1, le=100),
    db: Session = Depends(get_db),
):
    """Retrieve all tasks with optional filtering and pagination."""
    return crud.get_tasks(db, status=status, priority=priority,
                          skip=skip, limit=limit)


@app.get("/tasks/{task_id}", response_model=TaskResponse)
def get_task(task_id: int, db: Session = Depends(get_db)):
    """Retrieve a single task by its ID."""
    task = crud.get_task(db, task_id)
    if task is None:
        raise HTTPException(status_code=404,
                            detail=f"Task {task_id} not found")
    return task


@app.post("/tasks", response_model=TaskResponse, status_code=201)
def create_task(task: TaskCreate, db: Session = Depends(get_db)):
    """Create a new task."""
    return crud.create_task(db, task)


@app.put("/tasks/{task_id}", response_model=TaskResponse)
def update_task(
    task_id: int,
    task_update: TaskUpdate,
    db: Session = Depends(get_db),
):
    """Update an existing task."""
    task = crud.update_task(db, task_id, task_update)
    if task is None:
        raise HTTPException(status_code=404,
                            detail=f"Task {task_id} not found")
    return task


@app.delete("/tasks/{task_id}", status_code=204)
def delete_task(task_id: int, db: Session = Depends(get_db)):
    """Delete a task by its ID."""
    if not crud.delete_task(db, task_id):
        raise HTTPException(status_code=404,
                            detail=f"Task {task_id} not found")

The key change is the Depends(get_db) pattern. This is FastAPI’s dependency injection system: it automatically creates a database session for each request and closes it when the request is complete, even if an error occurs. The pattern is clean, testable, and avoids global state.
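
The cleanup guarantee comes from the try/finally around the yield in get_db. The sketch below imitates that lifecycle using only the standard library; it is a simplified illustration, not FastAPI's actual implementation, and FakeSession, get_fake_db, and run_request are hypothetical names introduced for the example.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy session."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


sessions = []  # every session created, kept so it can be inspected afterwards


def get_fake_db():
    """Same shape as get_db: yield a session, always close it afterwards."""
    db = FakeSession()
    sessions.append(db)
    try:
        yield db
    finally:
        db.close()


def run_request(handler):
    """Roughly what a framework does with a yield dependency (simplified)."""
    with contextmanager(get_fake_db)() as db:
        return handler(db)


def ok_handler(db):
    return "ok"


def failing_handler(db):
    raise RuntimeError("boom")


result = run_request(ok_handler)
try:
    run_request(failing_handler)
except RuntimeError:
    pass  # the error still propagates, but only after the session is closed

all_closed = all(session.closed for session in sessions)
print(result, all_closed)
```

Both sessions end up closed, including the one whose handler raised, which is the behaviour the Depends(get_db) pattern relies on.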

Tip: For new projects, SQLModel may be preferable to separate SQLAlchemy and Pydantic models. Created by the same author as FastAPI, SQLModel permits a single class to serve as both Pydantic model and SQLAlchemy model, significantly reducing duplication.

Authentication and Security

No production API is complete without authentication. Two approaches are implemented below: a simple API key for server-to-server communication, and JWT tokens for user-facing authentication.

Simple API Key Authentication

Create app/auth.py:

from fastapi import Depends, HTTPException, Security, status
from fastapi.security import APIKeyHeader, OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import JWTError, jwt
from passlib.context import CryptContext
from datetime import datetime, timedelta
from typing import Optional
from pydantic import BaseModel

# ── API Key Authentication ──────────────────────────

API_KEY = "your-secret-api-key-here"  # In production, load from env
api_key_header = APIKeyHeader(name="X-API-Key")


def verify_api_key(api_key: str = Security(api_key_header)):
    if api_key != API_KEY:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Invalid API key",
        )
    return api_key


# ── JWT Authentication ──────────────────────────────

SECRET_KEY = "your-jwt-secret-key"  # In production, load from env
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")


class Token(BaseModel):
    access_token: str
    token_type: str


class TokenData(BaseModel):
    username: Optional[str] = None


class User(BaseModel):
    username: str
    email: str
    disabled: bool = False


class UserInDB(User):
    hashed_password: str


# Simulated user database
fake_users_db = {
    "admin": {
        "username": "admin",
        "email": "admin@example.com",
        "hashed_password": pwd_context.hash("secretpassword"),
        "disabled": False,
    }
}


def verify_password(plain_password: str, hashed_password: str) -> bool:
    return pwd_context.verify(plain_password, hashed_password)


def create_access_token(
    data: dict, expires_delta: Optional[timedelta] = None
) -> str:
    to_encode = data.copy()
    expire = datetime.utcnow() + (
        expires_delta or timedelta(minutes=15)
    )
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)


def get_current_user(token: str = Depends(oauth2_scheme)) -> User:
    credentials_exception = HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Could not validate credentials",
        headers={"WWW-Authenticate": "Bearer"},
    )
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        username: str = payload.get("sub")
        if username is None:
            raise credentials_exception
    except JWTError:
        raise credentials_exception

    user_data = fake_users_db.get(username)
    if user_data is None:
        raise credentials_exception

    return User(**user_data)
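
To make the token format less opaque, the following standard-library sketch shows what an HS256-signed JWT consists of: a base64url-encoded header and payload, joined by dots, followed by an HMAC-SHA256 signature. It is an illustration only and assumes nothing about python-jose internals; sign_hs256 and verify_hs256 are hypothetical helper names, and the sketch deliberately skips expiry ("exp") checks, which jwt.decode handles in the real code above.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _signature(signing_input: str, secret: str) -> str:
    """HMAC-SHA256 over 'header.payload', base64url-encoded."""
    digest = hmac.new(
        secret.encode("utf-8"), signing_input.encode("utf-8"), hashlib.sha256
    ).digest()
    return b64url(digest)


def sign_hs256(payload: dict, secret: str) -> str:
    """Build header.payload.signature for an HS256 token."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_signature(signing_input, secret)}"


def verify_hs256(token: str, secret: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    try:
        signing_input, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # no dot at all, so not a token
    expected = _signature(signing_input, secret)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


token = sign_hs256({"sub": "admin"}, "demo-secret")
print(token.count("."), verify_hs256(token, "demo-secret"))
```

Because the signature depends on the secret, a token signed with one key fails verification under any other, which is why SECRET_KEY must stay private.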

Protecting Endpoints

Any endpoint can now be protected by adding the dependency:

from datetime import timedelta

from fastapi.security import OAuth2PasswordRequestForm

from app.auth import (
    verify_api_key, get_current_user, User, Token,
    create_access_token, verify_password, fake_users_db,
    ACCESS_TOKEN_EXPIRE_MINUTES,
)
from app.db_models import TaskDB  # used by the admin endpoint below


# Token endpoint for JWT login
@app.post("/token", response_model=Token)
def login(form_data: OAuth2PasswordRequestForm = Depends()):
    user_data = fake_users_db.get(form_data.username)
    if not user_data or not verify_password(
        form_data.password, user_data["hashed_password"]
    ):
        raise HTTPException(
            status_code=401,
            detail="Incorrect username or password",
            headers={"WWW-Authenticate": "Bearer"},
        )

    access_token = create_access_token(
        data={"sub": form_data.username},
        expires_delta=timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES),
    )
    return {"access_token": access_token, "token_type": "bearer"}


# Protected endpoint — requires JWT token
@app.get("/users/me", response_model=User)
def read_users_me(current_user: User = Depends(get_current_user)):
    return current_user


# Protected endpoint — requires API key
@app.delete("/admin/clear-tasks", dependencies=[Depends(verify_api_key)])
def clear_all_tasks(db: Session = Depends(get_db)):
    db.query(TaskDB).delete()
    db.commit()
    return {"message": "All tasks deleted"}

Install the required packages for JWT authentication:

pip install "python-jose[cryptography]" "passlib[bcrypt]"

Caution: Secret keys and passwords must never be hard-coded in source code. In a production application, SECRET_KEY, API_KEY, and database credentials should always be loaded from environment variables using python-dotenv or pydantic-settings. The hard-coded values here are for tutorial purposes only. For a broader treatment of containerising the API securely, see the related Docker containers explained guide.
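
As a minimal sketch of that advice using only the standard library, the helper below reads a required secret from the environment and fails fast when it is missing, and the comparison uses hmac.compare_digest so that the check does not leak timing information. The names load_secret and api_key_matches are hypothetical, and the demo value is set in-process purely so the example runs on its own.

```python
import hmac
import os


def load_secret(name: str) -> str:
    """Read a required secret from the environment; fail fast if missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def api_key_matches(candidate: str, expected: str) -> bool:
    """Compare keys in constant time rather than with != or ==."""
    return hmac.compare_digest(
        candidate.encode("utf-8"), expected.encode("utf-8")
    )


# Demo only: a real deployment sets this outside the program
# (shell, container environment, or a .env file that is not committed).
os.environ.setdefault("API_KEY", "demo-key-for-illustration")

API_KEY = load_secret("API_KEY")
print(api_key_matches(API_KEY, API_KEY))
```

In verify_api_key, the `api_key != API_KEY` test could be swapped for a call like api_key_matches(api_key, API_KEY) without changing the endpoint's behaviour.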

Middleware, CORS, and Error Handling

As the API grows, cross-cutting concerns such as CORS support (so that frontends can call the API), request logging, and global error handling become necessary.

Adding CORS for Frontend Access

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",      # React dev server
        "https://yourdomain.com",      # Production frontend
    ],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Custom Middleware for Logging and Timing

import time
import logging
from fastapi import Request

logger = logging.getLogger("api")


@app.middleware("http")
async def log_requests(request: Request, call_next):
    start_time = time.perf_counter()  # monotonic clock, suited to measuring durations

    # Process the request
    response = await call_next(request)

    # Calculate duration
    duration = time.perf_counter() - start_time

    logger.info(
        f"{request.method} {request.url.path} "
        f"- Status: {response.status_code} "
        f"- Duration: {duration:.3f}s"
    )

    # Add timing header to response
    response.headers["X-Process-Time"] = f"{duration:.3f}"
    return response

Global Exception Handlers

from fastapi import Request
from fastapi.responses import JSONResponse


@app.exception_handler(ValueError)
async def value_error_handler(request: Request, exc: ValueError):
    return JSONResponse(
        status_code=400,
        content={
            "error": "Bad Request",
            "detail": str(exc),
        },
    )


@app.exception_handler(Exception)
async def general_exception_handler(request: Request, exc: Exception):
    logger.error(f"Unhandled exception: {exc}", exc_info=True)
    return JSONResponse(
        status_code=500,
        content={
            "error": "Internal Server Error",
            "detail": "An unexpected error occurred",
        },
    )

The general exception handler is particularly important for production: it prevents stack traces from leaking to clients while still logging the full error for debugging.

Testing the API

FastAPI makes testing exceptionally straightforward with its built-in TestClient, which is a wrapper around httpx. The entire API can be tested without starting a server.

Setting Up Tests

Install pytest and httpx (which TestClient depends on) if they are not already present:

pip install pytest httpx

Create tests/test_tasks.py:

import pytest
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from app.main import app
from app.database import Base, get_db

# Use a separate SQLite database file for tests (not the application's tasks.db)
TEST_DATABASE_URL = "sqlite:///./test.db"
engine = create_engine(
    TEST_DATABASE_URL,
    connect_args={"check_same_thread": False},
)
TestingSessionLocal = sessionmaker(
    autocommit=False, autoflush=False, bind=engine
)


def override_get_db():
    db = TestingSessionLocal()
    try:
        yield db
    finally:
        db.close()


# Override the database dependency
app.dependency_overrides[get_db] = override_get_db
client = TestClient(app)


@pytest.fixture(autouse=True)
def setup_database():
    """Create tables before each test, drop after."""
    Base.metadata.create_all(bind=engine)
    yield
    Base.metadata.drop_all(bind=engine)


def test_read_root():
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Welcome to the Task Manager API"}


def test_create_task():
    response = client.post(
        "/tasks",
        json={
            "title": "Test Task",
            "description": "A test task",
            "priority": 3,
        },
    )
    assert response.status_code == 201
    data = response.json()
    assert data["title"] == "Test Task"
    assert data["description"] == "A test task"
    assert data["priority"] == 3
    assert data["status"] == "pending"
    assert "id" in data
    assert "created_at" in data


def test_create_task_validation_error():
    response = client.post(
        "/tasks",
        json={"title": "", "priority": 10},  # Empty title, priority too high
    )
    assert response.status_code == 422


def test_get_task():
    # Create a task first
    create_response = client.post(
        "/tasks", json={"title": "Find me"}
    )
    task_id = create_response.json()["id"]

    # Retrieve it
    response = client.get(f"/tasks/{task_id}")
    assert response.status_code == 200
    assert response.json()["title"] == "Find me"


def test_get_task_not_found():
    response = client.get("/tasks/99999")
    assert response.status_code == 404


def test_update_task():
    # Create a task
    create_response = client.post(
        "/tasks", json={"title": "Original Title"}
    )
    task_id = create_response.json()["id"]

    # Update it
    response = client.put(
        f"/tasks/{task_id}",
        json={"title": "Updated Title", "status": "in_progress"},
    )
    assert response.status_code == 200
    assert response.json()["title"] == "Updated Title"
    assert response.json()["status"] == "in_progress"


def test_delete_task():
    # Create a task
    create_response = client.post(
        "/tasks", json={"title": "Delete me"}
    )
    task_id = create_response.json()["id"]

    # Delete it
    response = client.delete(f"/tasks/{task_id}")
    assert response.status_code == 204

    # Verify it is gone
    response = client.get(f"/tasks/{task_id}")
    assert response.status_code == 404


def test_list_tasks_with_filter():
    # Create tasks with different statuses
    client.post(
        "/tasks", json={"title": "Task 1", "status": "pending"}
    )
    client.post(
        "/tasks", json={"title": "Task 2", "status": "completed"}
    )
    client.post(
        "/tasks", json={"title": "Task 3", "status": "pending"}
    )

    # Filter by status
    response = client.get("/tasks?status=pending")
    assert response.status_code == 200
    tasks = response.json()
    assert len(tasks) == 2
    assert all(t["status"] == "pending" for t in tasks)


def test_list_tasks_pagination():
    # Create 5 tasks
    for i in range(5):
        client.post("/tasks", json={"title": f"Task {i}"})

    # Get first page
    response = client.get("/tasks?skip=0&limit=2")
    assert response.status_code == 200
    assert len(response.json()) == 2

    # Get second page
    response = client.get("/tasks?skip=2&limit=2")
    assert response.status_code == 200
    assert len(response.json()) == 2

Run the tests:

pytest tests/ -v

Key Takeaway: The dependency-injection system renders testing clean: the real database is replaced by a test database with a single line (app.dependency_overrides[get_db] = override_get_db). No mocking, no patching, no test doubles. This is one of FastAPI’s most underappreciated features.

Deployment

The following section describes taking the API from development to production.

Running in Production with Gunicorn

In production, Uvicorn workers should be run under Gunicorn, which provides process management and multi-worker support:

pip install gunicorn

# Run with 4 worker processes
gunicorn app.main:app \
    --workers 4 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000 \
    --access-logfile - \
    --error-logfile -

A useful rule of thumb for the number of workers is (2 x CPU cores) + 1. For a 2-core server, five workers are appropriate.
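
The rule of thumb can be written as a small helper. This is a sketch under the assumption that os.cpu_count() reflects the cores actually available to the process, which may not hold inside containers with CPU limits; recommended_workers is a hypothetical name.

```python
import os
from typing import Optional


def recommended_workers(cpu_cores: Optional[int] = None) -> int:
    """Return (2 x cores) + 1, the rule of thumb quoted above."""
    if cpu_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        cpu_cores = os.cpu_count() or 1
    return 2 * cpu_cores + 1


print(recommended_workers(2))  # the 2-core example from the text
print(recommended_workers())   # value for the current machine
```

The result is a starting point for the --workers flag; memory use per worker and the nature of the workload may justify a lower number.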

Docker Containerisation

For a thorough treatment of Docker from development to production, including multi-stage builds and Docker Compose, see the related Docker containers guide for development and production. The following Dockerfile containerises the FastAPI application:

# Use the official Python slim image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install dependencies first (leverages Docker caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app/ ./app/

# Create non-root user for security
RUN adduser --disabled-password --gecos "" appuser
USER appuser

# Expose port
EXPOSE 8000

# Run with Gunicorn in production
CMD ["gunicorn", "app.main:app", \
     "--workers", "4", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--bind", "0.0.0.0:8000"]

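A docker-compose.yml file simplifies local testing. Note that the DATABASE_URL and SECRET_KEY variables take effect only if the application reads them from the environment (for example through the Settings class shown under Best Practices), and that connecting to PostgreSQL requires a driver such as psycopg2 in requirements.txt: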

version: "3.8"
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/taskmanager
      - SECRET_KEY=your-production-secret-key
    depends_on:
      - db

  db:
    image: postgres:16
    environment:
      - POSTGRES_DB=taskmanager
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

volumes:
  postgres_data:

Build and run:

docker-compose up --build

Cloud Deployment Options

Several cloud-deployment options are available, depending on scale and budget:

AWS Lightsail or EC2 — full control, appropriate for small to medium deployments
Google Cloud Run — serverless containers, scaling to zero, pay-per-request pricing
Railway or Render — simple PaaS options with generous free tiers
AWS Lambda with Mangum — serverless deployment using the Mangum ASGI adapter

Best Practices

As an API grows beyond a simple tutorial, the following practices keep the codebase maintainable and the API reliable.

Project Structure for Larger Applications

For larger applications, the code should be organised using FastAPI’s router system:

app/
├── __init__.py
├── main.py                 # App factory, middleware, startup events
├── config.py               # Settings via pydantic-settings
├── database.py             # DB engine, session, base
├── dependencies.py         # Shared dependencies (auth, db session)
├── models/                 # SQLAlchemy models
│   ├── __init__.py
│   ├── task.py
│   └── user.py
├── schemas/                # Pydantic schemas
│   ├── __init__.py
│   ├── task.py
│   └── user.py
├── routers/                # API route handlers
│   ├── __init__.py
│   ├── tasks.py
│   └── users.py
├── services/               # Business logic
│   ├── __init__.py
│   ├── task_service.py
│   └── user_service.py
└── middleware/              # Custom middleware
    ├── __init__.py
    └── logging.py

Each router file has the following structure:

# app/routers/tasks.py
from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session

from app.dependencies import get_db, get_current_user
from app.schemas.task import TaskCreate, TaskResponse
from app.services import task_service

router = APIRouter(
    prefix="/tasks",
    tags=["tasks"],
    dependencies=[Depends(get_current_user)],
)


@router.get("/", response_model=list[TaskResponse])
def list_tasks(db: Session = Depends(get_db)):
    return task_service.get_all_tasks(db)

The main file then includes the routers:

# app/main.py
from fastapi import FastAPI
from app.routers import tasks, users

app = FastAPI(title="Task Manager API")
app.include_router(tasks.router)
app.include_router(users.router)

Environment Variables with Pydantic Settings

# app/config.py
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Pydantic v2 style configuration: load values from a local .env file
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str = "sqlite:///./tasks.db"
    secret_key: str = "change-me-in-production"
    api_key: str = "change-me-in-production"
    debug: bool = False
    allowed_origins: list[str] = ["http://localhost:3000"]


@lru_cache
def get_settings() -> Settings:
    return Settings()


# Usage in an endpoint signature:
# settings: Settings = Depends(get_settings)

API Versioning

from fastapi import APIRouter

# Version via URL prefix
v1_router = APIRouter(prefix="/api/v1")
v2_router = APIRouter(prefix="/api/v2")

# Register routes on each router before including it in the app
app.include_router(v1_router)
app.include_router(v2_router)

Rate Limiting

For rate limiting, the slowapi library integrates cleanly with FastAPI:

pip install slowapi

from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)


@app.get("/tasks")
@limiter.limit("60/minute")
def list_tasks(request: Request):
    ...
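
To illustrate what a limit such as "60/minute" means per client, here is a standard-library sketch of a sliding-window limiter. It does not reflect how slowapi is implemented, and SlidingWindowLimiter is a hypothetical class written for this example; the demo uses a limit of 3 so the cut-off is easy to see.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of recent calls

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window=60.0)
client = "203.0.113.7"  # example address standing in for get_remote_address
results = [limiter.allow(client, now=t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)
```

The fourth call inside the window is rejected, while a different client key, or the same client after the oldest hit has expired, is allowed again.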

Key Takeaway: FastAPI’s modular architecture — routers, dependency injection, Pydantic settings — makes it straightforward to scale from a single-file prototype to a well-structured production application. The appropriate approach is to begin simply and refactor as the project grows.

Concluding Observations

This guide has covered substantial ground. Beginning from a simple “Hello World” endpoint, a complete task-management API has been constructed with CRUD operations, database persistence using SQLAlchemy, authentication with both API keys and JWT tokens, CORS support, custom middleware, comprehensive tests, and a production deployment configured with Docker.

What distinguishes FastAPI is not any single feature; it is how all of its features work together seamlessly. Type hints drive validation, documentation, and editor support simultaneously. Dependency injection keeps code testable and modular. Pydantic models serve as the single source of truth for data contracts. The async foundation permits the API to handle serious traffic without complex optimisation.

The components constructed in this guide are summarised below:

Component	Technology	Purpose
Framework	FastAPI	API routing, validation, docs
Server	Uvicorn / Gunicorn	ASGI server for production
Validation	Pydantic	Request/response data models
Database	SQLAlchemy + SQLite	Persistent data storage
Authentication	JWT + API Keys	Secure endpoint access
Testing	pytest + TestClient	Automated API testing
Deployment	Docker + Gunicorn	Containerized production setup

For teams seeking still more performance from the API layer, writing performance-critical endpoints as native extensions is becoming practical owing to Python and Rust interoperability via PyO3. For developers migrating from Flask, the transition to FastAPI is remarkably smooth: most concepts map directly, and type safety, auto-generated documentation, and improved performance are gained without additional effort. For developers migrating from Django REST Framework, the lighter weight and more explicit architecture, with comparable functionality, are likely to be appreciated.

The Python web ecosystem has evolved significantly, and FastAPI represents the present state of the art. Whether the project is a simple microservice, a complex multi-tenant SaaS, or a high-performance data API, FastAPI provides the tools to build it cleanly and efficiently.

As the codebase grows, following clean-code principles and using Git best practices for professional developers will keep the API maintainable. Building something real is the appropriate next step. The task manager constructed here can be extended with additional features — tags, due dates, user assignments, notifications — and deployed. The most effective way to learn a framework is to ship something with it.

April 4, 2026

How to Install and Use OpenClaw on Windows 11: A Complete Setup Guide

Summary

What this post covers: Three end-to-end installation paths — WSL2, native Windows + Conda, and Docker — for running the OpenClaw robotic-manipulation framework on Windows 11, including GPU acceleration, your first training run, and Windows-specific troubleshooting.

Key insights:

WSL2 with Ubuntu 22.04 is the recommended approach for most Windows 11 users — it delivers near-native Linux performance, supports the full CUDA toolkit, and avoids the dependency rot that plagues native Conda installs of MuJoCo on Windows.
Native Windows + Conda works but requires specific pinned versions of MuJoCo bindings and Visual C++ build tools; expect to spend extra time on environment debugging compared to WSL2.
Docker offers the most reproducible setup but adds GPU passthrough complexity (NVIDIA Container Toolkit on WSL2 backend) and slower disk I/O for large training checkpoints.
GPU acceleration through CUDA delivers roughly 10–50x training-throughput speedups over CPU-only runs; verifying nvidia-smi visibility inside WSL2 before installing PyTorch saves hours of confused debugging.
The most common Windows-specific failures are X11/display issues for the MuJoCo viewer (fixable via WSLg or VcXsrv), path conflicts between Windows and WSL2 home directories, and DLL load errors from mismatched CUDA versions.

Main topics: Introduction, System Requirements, Method 1: WSL2 (Recommended Approach), Method 2: Native Windows with Conda, Method 3: Docker on Windows, Running Your First Experiments, Training Your First Policy, GPU Acceleration and Performance Tips, Troubleshooting Common Windows Issues, Integration with VS Code, Next Steps and Resources, Final Thoughts, References.

Introduction

An often-overlooked fact: more than 70 percent of AI researchers and robotics students operate Windows as their primary operating system, yet almost every serious robotics simulation framework ships with Linux-first documentation and Linux-only installation scripts. Anyone who has examined a GitHub README full of apt-get commands and wondered whether a Windows 11 machine could participate is familiar with the difficulty.

OpenClaw is an open-source robotic manipulation framework designed for AI research. It provides a rich set of simulated environments for dexterous manipulation tasks, including robotic hands grasping objects, assembling parts and performing precise movements that test the limits of reinforcement learning. Built on top of MuJoCo, which is now free and open source, and compatible with widely used RL libraries such as Stable Baselines3, OpenClaw has rapidly become a preferred toolkit for researchers working on manipulation policies.

The complication is that, like most robotics frameworks, OpenClaw was developed with Linux in mind. The official documentation assumes Ubuntu, the CI pipelines test on Linux, and many convenience scripts are written in bash. For the Windows 11 user, getting OpenClaw running can feel like assembling a puzzle with several missing pieces.

This guide addresses that gap. The following sections present three complete installation methods (WSL2, native Windows with Conda, and Docker), each with full command-by-command instructions. By the end, the reader will have OpenClaw running on a Windows 11 machine, will have trained an initial manipulation policy, and will be able to visualise robotic simulations with full GPU acceleration. A Linux dual-boot is not required.

Key Takeaway: Working with current robotics AI frameworks does not require abandoning Windows 11. With WSL2, Conda or Docker, OpenClaw can be run with full GPU acceleration directly from a Windows desktop.

System Requirements

Before proceeding to installation, the machine should be verified as adequate to the task. OpenClaw runs physics simulations and neural network training simultaneously, which requires substantial computational capacity. The required specifications are summarised below.

Hardware Requirements

Component	Minimum	Recommended
OS	Windows 11 21H2	Windows 11 22H2 or later
GPU	NVIDIA GTX 1070 (8GB VRAM)	NVIDIA RTX 3060 12GB or better
RAM	16 GB	32 GB or more
Storage	50 GB free (SSD)	100 GB+ free (NVMe SSD)
CPU	Intel i5 / AMD Ryzen 5	Intel i7/i9 or AMD Ryzen 7/9
Python	3.9	3.10 or 3.11

Software Prerequisites

Regardless of which installation method is selected, the following items should be prepared in advance.

NVIDIA GPU drivers: Version 525.0 or later (download from nvidia.com/drivers)
Windows Terminal: Pre-installed on Windows 11; install it from the Microsoft Store if missing
Git for Windows: Download from git-scm.com
A text editor or IDE: VS Code is strongly recommended

To check the current NVIDIA driver version, open PowerShell and run the following.

nvidia-smi

The output should display the driver version and CUDA version. If the command fails, NVIDIA drivers should be installed or updated before proceeding.
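
The driver check can also be scripted. The sketch below compares a driver version string against the 525.0 minimum listed above and, where nvidia-smi is available, queries the installed version with its --query-gpu=driver_version option. The helper names are hypothetical, and the version-parsing logic assumes purely numeric, dot-separated version strings.

```python
import shutil
import subprocess
from typing import Optional

MIN_DRIVER = "525.0"  # minimum version listed under Software Prerequisites


def parse_version(text: str, width: int = 3) -> tuple:
    """Turn '535.104.05' into (535, 104, 5), padding short versions with zeros."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def driver_is_new_enough(version_text: str, minimum: str = MIN_DRIVER) -> bool:
    return parse_version(version_text) >= parse_version(minimum)


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; return None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # first GPU is enough for a driver check


version = installed_driver_version()
if version is None:
    print("nvidia-smi not available; install or update the NVIDIA driver")
else:
    print(version, "OK" if driver_is_new_enough(version) else "too old")
```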

Caution: AMD GPUs are not supported for CUDA-accelerated training. Users with AMD GPUs may follow this guide for CPU-only training, but performance will be substantially slower. ROCm support on Windows remains limited for most ML frameworks.

Method 1: WSL2 (Recommended Approach)

WSL2 (Windows Subsystem for Linux 2) is the preferred mechanism for running Linux-native tools on Windows. It provides a real Linux kernel, full system call compatibility and, critically for this purpose, native GPU passthrough. NVIDIA GPUs therefore operate inside WSL2 at near-native performance. For OpenClaw, this is the recommended path because it offers complete Linux compatibility without the operational difficulties of dual-booting.

Step 1: Enable and Install WSL2

Open PowerShell as Administrator and run:

# Install WSL2 with Ubuntu 22.04 (default)
wsl --install -d Ubuntu-22.04

# If WSL is already installed, make sure it's version 2
wsl --set-default-version 2

# Verify installation
wsl --list --verbose

After installation completes, restart the computer. When Ubuntu is opened from the Start menu for the first time, the user is prompted to create a username and password. Choose a password that is easy to remember, since it will be entered frequently for sudo commands.

# Verify WSL2 is running correctly
wsl --list --verbose

# Expected output:
#   NAME            STATE           VERSION
# * Ubuntu-22.04    Running         2

Step 2: Update the System and Install Base Dependencies

Open the Ubuntu terminal (either from the Start menu or by typing wsl in PowerShell) and run the following commands.

# Update package lists and upgrade existing packages
sudo apt update && sudo apt upgrade -y

# Install essential build tools and libraries
sudo apt install -y \
    build-essential \
    cmake \
    git \
    wget \
    curl \
    unzip \
    pkg-config \
    libgl1-mesa-dev \
    libglu1-mesa-dev \
    libglew-dev \
    libosmesa6-dev \
    libglfw3-dev \
    libxrandr-dev \
    libxinerama-dev \
    libxcursor-dev \
    libxi-dev \
    patchelf \
    python3-dev \
    python3-pip \
    python3-venv \
    software-properties-common

Step 3: Install NVIDIA CUDA Toolkit in WSL2

This is the step that most often causes difficulty. The key point is that NVIDIA drivers must not be installed inside WSL2. The Windows host drivers handle GPU communication. Only the CUDA toolkit is required inside WSL2.

Caution: The nvidia-driver package should NOT be installed inside WSL2. The Windows host driver is shared with WSL2 automatically. Installing a Linux driver inside WSL2 will disable GPU support.

# Add the CUDA repository key and repo
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda-repo-wsl-ubuntu-12-4-local_12.4.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-4-local_12.4.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install -y cuda-toolkit-12-4

# Add CUDA to your PATH
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify CUDA installation
nvcc --version
nvidia-smi

Both commands should succeed. nvidia-smi displays GPU information drawn from the Windows host driver, and nvcc --version confirms that the CUDA compiler is installed.
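
If nvcc is not found after these steps, the usual cause is a missing PATH or LD_LIBRARY_PATH entry. The sketch below checks both variables against the cuda-12.4 install location used above; the helper name cuda_env_problems is hypothetical, and in practice os.environ would be passed in place of the sample dictionaries.

```python
import shutil
from typing import List, Mapping


def cuda_env_problems(env: Mapping[str, str],
                      cuda_home: str = "/usr/local/cuda-12.4") -> List[str]:
    """List what is missing from PATH and LD_LIBRARY_PATH for the CUDA toolkit."""
    def entries(name: str) -> List[str]:
        # Linux search paths are colon-separated; ignore trailing slashes.
        return [p.rstrip("/") for p in env.get(name, "").split(":") if p]

    problems = []
    if f"{cuda_home}/bin" not in entries("PATH"):
        problems.append(f"PATH lacks {cuda_home}/bin")
    if f"{cuda_home}/lib64" not in entries("LD_LIBRARY_PATH"):
        problems.append(f"LD_LIBRARY_PATH lacks {cuda_home}/lib64")
    return problems


good = {
    "PATH": "/usr/local/cuda-12.4/bin:/usr/bin",
    "LD_LIBRARY_PATH": "/usr/local/cuda-12.4/lib64:",
}
print(cuda_env_problems(good))   # no problems for a correctly configured shell
print(cuda_env_problems({}))     # both entries reported as missing
print(shutil.which("nvcc"))      # path to nvcc, or None if it is not on PATH
```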

Step 4: Install MuJoCo

OpenClaw uses MuJoCo as its physics simulation backend. Since DeepMind released MuJoCo as free and open-source software, installation has become substantially simpler.

# Download and extract MuJoCo
mkdir -p ~/.mujoco
wget https://github.com/google-deepmind/mujoco/releases/download/3.1.3/mujoco-3.1.3-linux-x86_64.tar.gz
tar -xzf mujoco-3.1.3-linux-x86_64.tar.gz -C ~/.mujoco/
mv ~/.mujoco/mujoco-3.1.3 ~/.mujoco/mujoco313

# Set environment variables
echo 'export MUJOCO_PATH=$HOME/.mujoco/mujoco313' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$MUJOCO_PATH/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Test MuJoCo binary
$MUJOCO_PATH/bin/simulate $MUJOCO_PATH/model/humanoid/humanoid.xml &

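Tip: If the MuJoCo viewer opens and displays an animated humanoid, graphics rendering is functioning inside WSL2. The viewer can also run on software rendering, so GPU acceleration is best confirmed separately with glxinfo -B (from the mesa-utils package), which reports the active OpenGL renderer.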

Step 5: Clone and Install OpenClaw

The next step is to create a dedicated Python virtual environment and install OpenClaw from source.

# Create a workspace directory
mkdir -p ~/robotics && cd ~/robotics

# Clone the OpenClaw repository
git clone https://github.com/openclaw-project/openclaw.git
cd openclaw

# Create and activate a Python virtual environment
python3 -m venv venv
source venv/bin/activate

# Upgrade pip and install build tools
pip install --upgrade pip setuptools wheel

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# Install MuJoCo Python bindings
pip install mujoco==3.1.3

# Install OpenClaw and all dependencies
pip install -e ".[all]"

# Alternatively, install from requirements if available
# pip install -r requirements.txt
# pip install -e .

Verify that the installation completed successfully.

# Verify PyTorch CUDA support
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"N/A\"}')"

# Verify MuJoCo
python -c "import mujoco; print(f'MuJoCo version: {mujoco.__version__}')"

# Verify OpenClaw
python -c "import openclaw; print(f'OpenClaw loaded successfully')"
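
The three checks above can be collapsed into one script that reports every missing package at once. This is a minimal sketch using only the standard library; missing_modules is a hypothetical helper name, and the package list simply mirrors the modules imported in the commands above.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be found in the active environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["torch", "mujoco", "openclaw"]
missing = missing_modules(required)
if missing:
    print("Missing from this environment:", ", ".join(missing))
else:
    print("All required packages are importable")
```

An empty result only shows that the packages are installed in the active environment; the CUDA check with torch.cuda.is_available() is still needed to confirm GPU support.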

Step 6: Set Up GUI Forwarding for Visualization

Windows 11 ships with WSLg (Windows Subsystem for Linux GUI), which lets Linux graphical applications open as ordinary windows on the Windows desktop. On Windows 11 22H2 or later, GUI forwarding should be automatic. The setup can be verified as follows.

# Test GUI display — this should open a small window
sudo apt install -y x11-apps
xclock &

# If xclock shows a clock window, WSLg is working.
# If not, make sure WSL is up to date:
# (Run this in PowerShell, not WSL)
# wsl --update

If WSLg is not functioning, an X server can be used as a fallback.

# Fallback: Set DISPLAY for manual X server (VcXsrv or X410)
# Only needed if WSLg is not working
echo 'export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk "{print \$2}"):0' >> ~/.bashrc
echo 'export LIBGL_ALWAYS_INDIRECT=0' >> ~/.bashrc
source ~/.bashrc
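
The DISPLAY line above is dense, so the same logic is spelled out here as a sketch. It assumes, as the shell command does, that the nameserver entry in /etc/resolv.conf is the address of the Windows host; the function name and the sample address are illustrative.

```python
from typing import Optional


def display_from_resolv_conf(text: str, screen: int = 0) -> Optional[str]:
    """Build a DISPLAY value from the first nameserver entry in resolv.conf text."""
    for line in text.splitlines():
        fields = line.split()
        # A nameserver line looks like: "nameserver 172.24.64.1"
        if len(fields) >= 2 and fields[0] == "nameserver":
            return f"{fields[1]}:{screen}"
    return None


# Illustrative file contents; on a real system read /etc/resolv.conf instead.
sample = "# generated by WSL\nnameserver 172.24.64.1\n"
print(display_from_resolv_conf(sample))
```

Unlike the shell pipeline, this version stops at the first nameserver, which avoids a malformed DISPLAY value when the file lists several.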

Step 7: Run Your First OpenClaw Environment

# Make sure you're in the OpenClaw directory with venv activated
cd ~/robotics/openclaw
source venv/bin/activate

# Run the demo script to verify everything works
python -m openclaw.demo --env GraspCube-v1 --render

# Or run a minimal test script
python -c "
import openclaw
import numpy as np

env = openclaw.make('GraspCube-v1', render_mode='human')
obs, info = env.reset()
print(f'Observation space: {env.observation_space.shape}')
print(f'Action space: {env.action_space.shape}')

for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
print('Environment test completed successfully!')
"

If a simulation window appears in which a robotic hand attempts to grasp a cube, even clumsily, the installation is functioning correctly. OpenClaw is now installed on Windows 11 via WSL2.

Method 2: Native Windows with Conda

Users who prefer to remain entirely within the Windows ecosystem without WSL2 may install OpenClaw natively using Conda. The approach functions but carries certain caveats: some features may require additional configuration, and Windows-specific path issues may arise. For many use cases, however, it works reliably.

Step 1: Install Miniconda

Download and install Miniconda from docs.conda.io. Select the Windows 64-bit installer. During installation:

install for “Just Me” (recommended);
check “Add Miniconda to my PATH” (despite the warning, this simplifies subsequent steps);
check “Register Miniconda as the default Python”.

Open a new Anaconda Prompt or PowerShell session and verify the installation.

conda --version
# Should output: conda 24.x.x or later

Step 2: Create the Conda Environment

# Create a new environment with Python 3.10
conda create -n openclaw python=3.10 -y
conda activate openclaw

# Install PyTorch with CUDA support via conda
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y

# Verify CUDA is available
python -c "import torch; print(torch.cuda.is_available())"

Step 3: Install MuJoCo for Windows

# Install MuJoCo Python package
pip install mujoco==3.1.3

# Download the MuJoCo binary release for Windows
# Create directory: C:\Users\YourName\.mujoco\
# Download from: https://github.com/google-deepmind/mujoco/releases
# Extract mujoco-3.1.3-windows-x86_64.zip to C:\Users\YourName\.mujoco\mujoco313

# Set environment variables (PowerShell)
[Environment]::SetEnvironmentVariable("MUJOCO_PATH", "$env:USERPROFILE\.mujoco\mujoco313", "User")
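# Append to the user-level PATH only; $env:PATH also contains the machine-level entries
$userPath = [Environment]::GetEnvironmentVariable("PATH", "User")
[Environment]::SetEnvironmentVariable("PATH", "$userPath;$env:USERPROFILE\.mujoco\mujoco313\bin", "User")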

# Verify
python -c "import mujoco; print(mujoco.__version__)"

Step 4: Install OpenClaw

# Clone the repository
# (Command Prompt syntax; in PowerShell use: cd $env:USERPROFILE\Documents)
cd %USERPROFILE%\Documents
git clone https://github.com/openclaw-project/openclaw.git
cd openclaw

# Install OpenClaw
pip install -e ".[all]"

# If you encounter build errors, try installing dependencies separately:
pip install numpy scipy gymnasium stable-baselines3 tensorboard
pip install -e .

Step 5: Handle Windows-Specific Issues

Windows paths use backslashes, which can create problems with Linux-oriented Python packages. The common fixes are as follows.

# Fix 1: If OpenClaw has hardcoded Linux paths, set this environment variable
# (Command Prompt syntax; in PowerShell use: $env:OPENCLAW_ASSET_DIR = "$PWD\assets")
set OPENCLAW_ASSET_DIR=%cd%\assets

# Fix 2: For path separator issues in config files, use raw strings in Python
# Instead of: path = "C:\Users\name\data"
# Use:        path = r"C:\Users\name\data"
# Or:         path = "C:/Users/name/data"  (forward slashes work in Python)

# Fix 3: Long path support (PowerShell as Admin)
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" `
    -Name "LongPathsEnabled" -Value 1 -PropertyType DWORD -Force

# Fix 4: If you get DLL errors, install Visual C++ Redistributable
# Download from: https://aka.ms/vs/17/release/vc_redist.x64.exe

Tip: If a FileNotFoundError related to asset files arises, check whether the framework uses os.path.join() correctly. Some robotics frameworks assume a forward-slash path separator. Setting the OPENCLAW_ASSET_DIR environment variable with forward slashes often resolves these issues.
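
The separator handling described in this tip can be sketched with the standard pathlib module. The directory and file names below are placeholders, and the two helpers are hypothetical, not OpenClaw functions.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path: str) -> str:
    """Rewrite a Windows path with forward slashes, which Python on Windows also accepts."""
    return PureWindowsPath(windows_path).as_posix()


def join_asset(asset_dir: str, *parts: str) -> str:
    """Join path components without hardcoding a separator."""
    return PureWindowsPath(asset_dir, *parts).as_posix()


# Placeholder locations for illustration only.
asset_dir = to_forward_slashes(r"C:\Users\name\openclaw\assets")
print(asset_dir)
print(join_asset(asset_dir, "meshes", "cube.stl"))
```

A value produced this way is suitable for OPENCLAW_ASSET_DIR, because it contains no backslashes for a Linux-oriented package to misread.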

Step 6: Test the Installation

conda activate openclaw

python -c "
import openclaw
import torch

print(f'OpenClaw loaded')
print(f'PyTorch: {torch.__version__}')
print(f'CUDA: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')

env = openclaw.make('GraspCube-v1', render_mode='human')
obs, info = env.reset()
print(f'Environment created: obs shape = {obs.shape}')
env.close()
print('All good!')
"

Method 3: Docker on Windows

Docker provides the cleanest and most reproducible installation. All components run in an isolated container, which prevents accidental pollution of the system Python environment or CUDA versions. The trade-off is somewhat more involved setup for GPU passthrough and GUI forwarding.

Step 1: Install Docker Desktop

Download Docker Desktop from docker.com. During installation, ensure that “Use WSL 2 instead of Hyper-V” is selected as the backend. After installation:

# Verify Docker is working (PowerShell)
docker --version
docker run hello-world

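# Enable GPU support: install NVIDIA Container Toolkit
# (Needed when Docker Engine runs inside the WSL2 distro. Docker Desktop's WSL2 backend
#  exposes NVIDIA GPUs on its own, so try the verification command below first.)
# In your WSL2 Ubuntu terminal: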
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify GPU access from Docker.

docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

If the GPU is listed in the output, Docker GPU passthrough is functioning.

Step 2: Create the OpenClaw Dockerfile

Create a file named Dockerfile.openclaw in the working directory.

# Dockerfile.openclaw
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential cmake git wget curl unzip \
    python3.10 python3.10-venv python3.10-dev python3-pip \
    libgl1-mesa-dev libglu1-mesa-dev libglew-dev \
    libosmesa6-dev libglfw3-dev patchelf \
    xvfb x11-utils \
    && rm -rf /var/lib/apt/lists/*

# Set Python 3.10 as default
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1

# Install MuJoCo
RUN mkdir -p /root/.mujoco && \
    wget -q https://github.com/google-deepmind/mujoco/releases/download/3.1.3/mujoco-3.1.3-linux-x86_64.tar.gz && \
    tar -xzf mujoco-3.1.3-linux-x86_64.tar.gz -C /root/.mujoco/ && \
    mv /root/.mujoco/mujoco-3.1.3 /root/.mujoco/mujoco313 && \
    rm mujoco-3.1.3-linux-x86_64.tar.gz

ENV MUJOCO_PATH=/root/.mujoco/mujoco313
ENV LD_LIBRARY_PATH=$MUJOCO_PATH/lib:$LD_LIBRARY_PATH

# Create workspace
WORKDIR /workspace

# Install Python packages
RUN pip install --upgrade pip setuptools wheel && \
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 && \
    pip install mujoco==3.1.3

# Clone and install OpenClaw
RUN git clone https://github.com/openclaw-project/openclaw.git && \
    cd openclaw && \
    pip install -e ".[all]"

# Default command
CMD ["/bin/bash"]

Step 3: Build and Run the Container

# Build the Docker image (this takes 10-20 minutes)
docker build -f Dockerfile.openclaw -t openclaw:latest .

# Run with GPU support and volume mount for saving experiments
docker run -it --gpus all \
    -v ${PWD}/experiments:/workspace/experiments \
    -v ${PWD}/configs:/workspace/configs \
    --name openclaw-dev \
    openclaw:latest

# For GUI support (shares the host X11 socket so windows open on the host display)
docker run -it --gpus all \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    -v ${PWD}/experiments:/workspace/experiments \
    --name openclaw-gui \
    openclaw:latest

For headless rendering (no display), Xvfb may be used.

# Inside the container
Xvfb :1 -screen 0 1024x768x24 &
export DISPLAY=:1

# Now rendering commands will work headlessly
python -m openclaw.demo --env GraspCube-v1 --record-video output.mp4

Step 4: Daily Workflow with Docker

# Start an existing stopped container
docker start -ai openclaw-dev

# Run a training job in the background
docker exec -d openclaw-dev python -m openclaw.train \
    --config configs/grasp_cube.yaml \
    --output experiments/run_001

# Check training logs
docker exec openclaw-dev tail -f experiments/run_001/train.log

# Copy results out of the container
docker cp openclaw-dev:/workspace/experiments/run_001 ./local_results/

Key Takeaway: Docker is well suited to reproducibility. Once the image builds successfully, it can be shared with collaborators so that everyone runs the same environment. The overhead is small: GPU performance in Docker is typically within 1 to 2 percent of native performance.

Running Your First Experiments

With OpenClaw installed via any of the methods above, the framework’s capabilities can now be explored. OpenClaw ships with several pre-built environments covering a range of manipulation tasks.

Exploring Available Environments

import openclaw

# List all registered environments
envs = openclaw.list_environments()
for env_name in envs:
    print(env_name)

Typical environments include the following tasks.

Environment	Task Description	Difficulty
`GraspCube-v1`	Pick up a cube with a dexterous hand	Beginner
`RotateBlock-v1`	In-hand rotation of a block to target orientation	Intermediate
`StackBlocks-v1`	Stack two blocks on top of each other	Advanced
`InsertPeg-v1`	Insert a peg into a hole with tight tolerance	Advanced
`OpenDrawer-v1`	Pull open a drawer using the handle	Intermediate

Loading and Interacting with an Environment

import openclaw
import numpy as np

# Create the environment with visual rendering
env = openclaw.make('GraspCube-v1', render_mode='human')

# Reset and inspect the observation
obs, info = env.reset(seed=42)
print(f"Observation shape: {obs.shape}")
print(f"Observation range: [{obs.min():.3f}, {obs.max():.3f}]")
print(f"Action space: {env.action_space}")
print(f"Action range: [{env.action_space.low.min():.1f}, {env.action_space.high.max():.1f}]")

# Run random actions for 500 steps
total_reward = 0
for step in range(500):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

    if terminated or truncated:
        print(f"Episode ended at step {step}, total reward: {total_reward:.2f}")
        obs, info = env.reset()
        total_reward = 0

env.close()
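
The loop above relies on the Gymnasium convention that step() returns five values, with terminated and truncated reported separately. The following sketch illustrates that distinction with a hypothetical one-dimensional toy environment; ToyReachEnv is not part of OpenClaw and only mimics the reset/step interface.

```python
import random
from typing import Callable, List


class ToyReachEnv:
    """Hypothetical stand-in: move a point on a line toward the origin."""

    def __init__(self, max_steps: int = 50):
        self.max_steps = max_steps
        self._rng = random.Random()
        self._pos = 0.0
        self._steps = 0

    def reset(self, seed=None):
        if seed is not None:
            self._rng.seed(seed)
        self._pos = self._rng.uniform(-1.0, 1.0)
        self._steps = 0
        return [self._pos], {}

    def step(self, action: float):
        self._pos += max(-0.1, min(0.1, action))   # clip the action to [-0.1, 0.1]
        self._steps += 1
        reward = -abs(self._pos)                    # dense reward: closer is better
        terminated = abs(self._pos) < 0.05          # the task itself ended (goal reached)
        truncated = self._steps >= self.max_steps   # the time limit cut the episode short
        return [self._pos], reward, terminated, truncated, {}


def run_episode(env, policy: Callable[[List[float]], float], seed: int):
    """Run one episode and report how it ended."""
    obs, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        steps += 1
    return total_reward, steps, terminated, truncated


print(run_episode(ToyReachEnv(), lambda obs: -obs[0], seed=42))  # ends by termination
print(run_episode(ToyReachEnv(), lambda obs: 0.0, seed=42))      # ends by truncation
```

The distinction matters for training: a truncated episode did not fail or succeed, it simply ran out of time, and learning algorithms treat the two cases differently when estimating returns.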

Recording Simulation Videos

For sharing results or debugging policies, recording videos is essential.

import openclaw
from gymnasium.wrappers import RecordVideo

# Wrap the environment with video recording
env = openclaw.make('GraspCube-v1', render_mode='rgb_array')
env = RecordVideo(env, video_folder='./videos', episode_trigger=lambda e: True)

obs, info = env.reset()
for step in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
print("Video saved to ./videos/")

Evaluating a Pre-trained Model

OpenClaw typically includes pre-trained checkpoints for benchmarking.

from stable_baselines3 import PPO
import openclaw

# Load a pre-trained model (if available in the repo)
model = PPO.load("pretrained/grasp_cube_ppo.zip")

env = openclaw.make('GraspCube-v1', render_mode='human')
obs, info = env.reset()

total_reward = 0
episode_count = 0

for step in range(5000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

    if terminated or truncated:
        episode_count += 1
        print(f"Episode {episode_count}: reward = {total_reward:.2f}")
        total_reward = 0
        obs, info = env.reset()

env.close()
print(f"Evaluated {episode_count} episodes")

Understanding the Config System

OpenClaw uses YAML configuration files to define environments, training hyperparameters and experiment settings. This simplifies reproduction of results and adjustment of parameters without modifying code.

# Example: configs/grasp_cube.yaml
environment:
  name: GraspCube-v1
  max_episode_steps: 200
  reward_type: dense  # 'dense' or 'sparse'
  obs_type: state     # 'state', 'pixels', or 'state+pixels'

robot:
  hand_type: shadow_hand
  control_mode: position  # 'position', 'velocity', or 'torque'
  action_scale: 0.05

object:
  type: cube
  size: [0.04, 0.04, 0.04]
  mass: 0.1
  friction: [1.0, 0.005, 0.0001]

simulation:
  physics_timestep: 0.002
  control_timestep: 0.02  # 50 Hz control
  num_substeps: 10
  gravity: [0, 0, -9.81]
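
The timing values in this example are related by simple arithmetic, and a quick check catches inconsistent edits. The sketch below assumes that num_substeps is meant to equal control_timestep divided by physics_timestep and that max_episode_steps counts control steps; both are readings of the example, not documented OpenClaw behaviour.

```python
import math

# Values copied from the simulation and environment sections of the YAML example.
simulation = {"physics_timestep": 0.002, "control_timestep": 0.02, "num_substeps": 10}
max_episode_steps = 200


def derived_timing(sim: dict, episode_steps: int) -> dict:
    """Check the timestep ratio and derive control rate and episode duration."""
    ratio = sim["control_timestep"] / sim["physics_timestep"]
    substeps = round(ratio)
    if not math.isclose(ratio, substeps, rel_tol=1e-9):
        raise ValueError("control_timestep must be a whole multiple of physics_timestep")
    if substeps != sim["num_substeps"]:
        raise ValueError(f"num_substeps is {sim['num_substeps']}, expected {substeps}")
    return {
        "substeps_per_action": substeps,
        "control_hz": 1.0 / sim["control_timestep"],
        "episode_seconds": episode_steps * sim["control_timestep"],
    }


print(derived_timing(simulation, max_episode_steps))
```

With the values shown, each action is held for 10 physics steps, the controller runs at 50 Hz, and a full episode covers 4 seconds of simulated time.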

Training Your First Policy

The next step is training a neural network to control a robotic hand. The following example uses Stable Baselines3’s PPO (Proximal Policy Optimisation) algorithm, which is widely used in robotic manipulation research.

Setting Up the Training Script

Create a file called train_grasp.py.

"""
Train a PPO agent to grasp a cube using OpenClaw.
"""
import os
import argparse
from datetime import datetime

import openclaw
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv, VecMonitor
from stable_baselines3.common.callbacks import (
    EvalCallback,
    CheckpointCallback,
    CallbackList,
)

def make_env(env_id, rank, seed=0):
    """Create a wrapped environment for vectorized training."""
    def _init():
        env = openclaw.make(env_id)
        env.reset(seed=seed + rank)
        return env
    return _init

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--env', default='GraspCube-v1', help='Environment ID')
    parser.add_argument('--num-envs', type=int, default=8, help='Parallel envs')
    parser.add_argument('--total-timesteps', type=int, default=2_000_000)
    parser.add_argument('--output-dir', default='./experiments')
    parser.add_argument('--seed', type=int, default=42)
    args = parser.parse_args()

    # Create experiment directory
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    exp_dir = os.path.join(args.output_dir, f'{args.env}_{timestamp}')
    os.makedirs(exp_dir, exist_ok=True)

    # Create vectorized training environments
    train_envs = SubprocVecEnv([
        make_env(args.env, i, args.seed) for i in range(args.num_envs)
    ])
    train_envs = VecMonitor(train_envs, os.path.join(exp_dir, 'monitor'))

    # Create evaluation environment
    eval_env = SubprocVecEnv([make_env(args.env, 0, args.seed + 1000)])
    eval_env = VecMonitor(eval_env)

    # Configure PPO
    model = PPO(
        policy='MlpPolicy',
        env=train_envs,
        learning_rate=3e-4,
        n_steps=2048,
        batch_size=256,
        n_epochs=10,
        gamma=0.99,
        gae_lambda=0.95,
        clip_range=0.2,
        ent_coef=0.01,
        vf_coef=0.5,
        max_grad_norm=0.5,
        verbose=1,
        seed=args.seed,
        tensorboard_log=os.path.join(exp_dir, 'tensorboard'),
        device='auto',  # uses CUDA when available, otherwise falls back to CPU
    )

    # Set up callbacks
    # Callback frequencies count vectorized steps, so divide by num_envs
    # to express them in environment timesteps.
    eval_callback = EvalCallback(
        eval_env,
        best_model_save_path=os.path.join(exp_dir, 'best_model'),
        log_path=os.path.join(exp_dir, 'eval_logs'),
        eval_freq=max(10_000 // args.num_envs, 1),
        n_eval_episodes=10,
        deterministic=True,
    )

    checkpoint_callback = CheckpointCallback(
        save_freq=max(50_000 // args.num_envs, 1),
        save_path=os.path.join(exp_dir, 'checkpoints'),
        name_prefix='ppo_grasp',
    )

    callbacks = CallbackList([eval_callback, checkpoint_callback])

    # Train!
    print(f"Starting training: {args.total_timesteps} timesteps")
    print(f"Experiment directory: {exp_dir}")
    model.learn(
        total_timesteps=args.total_timesteps,
        callback=callbacks,
        progress_bar=True,
    )

    # Save final model
    model.save(os.path.join(exp_dir, 'final_model'))
    print(f"Training complete! Model saved to {exp_dir}")

    # Cleanup
    train_envs.close()
    eval_env.close()

if __name__ == '__main__':
    main()

Launch Training

# Basic training run
python train_grasp.py --env GraspCube-v1 --total-timesteps 2000000

# With more parallel environments (faster on multi-core CPUs)
python train_grasp.py --env GraspCube-v1 --num-envs 16 --total-timesteps 5000000

# For a quick test run
python train_grasp.py --env GraspCube-v1 --num-envs 4 --total-timesteps 50000
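
The flags above interact with the PPO hyperparameters in the training script, and the arithmetic is worth doing before a long run. The sketch below uses the script's defaults (8 environments, n_steps=2048, batch_size=256, n_epochs=10); the helper names are hypothetical, and the rounding-up of updates reflects that Stable Baselines3 always collects whole rollouts.

```python
import math


def ppo_schedule(total_timesteps: int, n_envs: int, n_steps: int,
                 batch_size: int, n_epochs: int) -> dict:
    """Derive rollout size and update counts from the PPO settings."""
    rollout = n_envs * n_steps                      # samples collected per policy update
    updates = math.ceil(total_timesteps / rollout)  # training stops after a full rollout
    minibatches = rollout // batch_size
    return {
        "rollout_size": rollout,
        "policy_updates": updates,
        "actual_timesteps": updates * rollout,
        "gradient_steps_per_update": minibatches * n_epochs,
    }


def callback_freq(desired_timesteps: int, n_envs: int) -> int:
    """Convert a frequency in timesteps to the per-call count that SB3 callbacks use."""
    return max(desired_timesteps // n_envs, 1)


print(ppo_schedule(2_000_000, n_envs=8, n_steps=2048, batch_size=256, n_epochs=10))
print(callback_freq(10_000, n_envs=8))
```

Doubling --num-envs doubles the rollout size, so the same number of timesteps yields half as many policy updates. That trade-off is one reason more parallel environments do not always translate into faster learning.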

Monitor Training with TensorBoard

Open a separate terminal while training is running.

# Install TensorBoard if not already installed
pip install tensorboard

# Launch TensorBoard
tensorboard --logdir ./experiments --port 6006

# Open in your browser: http://localhost:6006

Key metrics to monitor during training are as follows.

rollout/ep_rew_mean: Average episode reward; this should generally trend upward
rollout/ep_len_mean: Average episode length; shorter episodes can mean the agent reaches the goal faster
train/policy_gradient_loss: Should stay small and stable rather than grow
train/value_loss: Should decrease over time
train/explained_variance: Should approach 1.0 as training progresses
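
Explained variance is the least intuitive of these metrics. It measures how much of the variation in observed returns the value function predicts, computed as 1 - Var(returns - predictions) / Var(returns). The sketch below is a standard-library illustration of that formula with made-up numbers, not the Stable Baselines3 implementation.

```python
from statistics import pvariance
from typing import Sequence


def explained_variance(predicted: Sequence[float], returns: Sequence[float]) -> float:
    """1.0 is a perfect value function, 0.0 is no better than predicting the mean."""
    var_returns = pvariance(returns)
    if var_returns == 0:
        return float("nan")  # undefined when every return is identical
    residuals = [r - p for r, p in zip(returns, predicted)]
    return 1.0 - pvariance(residuals) / var_returns


returns = [1.0, 2.0, 3.0, 4.0]                              # made-up episode returns
print(explained_variance([1.0, 2.0, 3.0, 4.0], returns))    # perfect predictions
print(explained_variance([2.5, 2.5, 2.5, 2.5], returns))    # always predicts the mean
print(explained_variance([4.0, 3.0, 2.0, 1.0], returns))    # worse than the mean
```

Values near or below zero early in training are normal; a value that stays there suggests the value network is not learning from the reward signal.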

Tip: For the GraspCube-v1 task, meaningful improvement should appear within 500,000 to 1 million timesteps. If the reward curve remains completely flat after one million steps, the environment configuration and reward function should be checked. Dense rewards typically converge substantially faster than sparse rewards, which makes them the better starting point for beginners.

Evaluate Your Trained Agent

from stable_baselines3 import PPO
import openclaw
import numpy as np

# Load the best model from training
model = PPO.load("experiments/GraspCube-v1_YYYYMMDD_HHMMSS/best_model/best_model")

env = openclaw.make('GraspCube-v1', render_mode='human')

rewards = []
for episode in range(20):
    obs, info = env.reset()
    episode_reward = 0
    done = False

    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
        done = terminated or truncated

    rewards.append(episode_reward)
    print(f"Episode {episode + 1}: reward = {episode_reward:.2f}")

env.close()
print(f"\nMean reward: {np.mean(rewards):.2f} +/- {np.std(rewards):.2f}")

GPU Acceleration and Performance Tips

Maximising GPU utilisation can substantially accelerate training. The following sections describe verification, optimisation and benchmarking procedures.

CUDA Setup Verification

# Comprehensive CUDA check script
python -c "
import torch
import subprocess

print('=== CUDA Diagnostics ===')
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version (PyTorch): {torch.version.cuda}')
print(f'cuDNN version: {torch.backends.cudnn.version()}')
print(f'cuDNN enabled: {torch.backends.cudnn.enabled}')

if torch.cuda.is_available():
    print(f'GPU count: {torch.cuda.device_count()}')
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f'  GPU {i}: {props.name}')
        print(f'    Memory: {props.total_memory / 1024**3:.1f} GB')
        print(f'    Compute capability: {props.major}.{props.minor}')
        print(f'    Multi-processors: {props.multi_processor_count}')

    # Quick benchmark
    print('\n=== Quick Benchmark ===')
    import time
    x = torch.randn(10000, 10000, device='cuda')
    torch.cuda.synchronize()  # finish the allocation before starting the timer
    start = time.time()
    for _ in range(100):
        y = torch.mm(x, x)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    print(f'100x matrix multiply (10000x10000): {elapsed:.2f}s')
    print(f'TFLOPS estimate: {100 * 2 * 10000**3 / elapsed / 1e12:.1f}')
"

Optimizing Batch Sizes

The appropriate batch size depends on the available GPU VRAM. The following table provides a general guideline.

GPU VRAM	Recommended Batch Size	Parallel Envs	Expected Throughput
6 GB (RTX 3060)	128	4-8	~2,000 steps/sec
8 GB (RTX 3070/4060)	256	8-12	~3,500 steps/sec
12 GB (RTX 3060 12GB/4070)	512	12-16	~5,000 steps/sec
16 GB+ (RTX 4080/4090)	1024	16-32	~10,000+ steps/sec

WSL2 vs Native Performance Comparison

Based on typical benchmarks, the three installation methods compare as follows.

Metric	WSL2	Native Windows	Docker (WSL2 backend)
GPU compute	98-100% of native Linux	95-100%	97-100%
Disk I/O	60-70% (cross-filesystem)	100% (native NTFS)	50-65% (overlay)
Linux compatibility	Excellent	Partial	Full
Setup complexity	Medium	Low	Medium-High
GUI rendering	WSLg (built-in)	Native	Requires forwarding
Reproducibility	Good	Fair	Excellent

Key Takeaway: For most users, WSL2 offers the best balance of performance, compatibility and ease of use. Project files should be kept on the Linux filesystem (inside ~/) rather than on /mnt/c/ in order to avoid the disk I/O penalty.

Memory Management Tips

# Monitor GPU memory during training (shell)
watch -n 1 nvidia-smi

# In Python, check memory usage:
import torch
print(f"Allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
print(f"Reserved: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")

# Release unused cached GPU memory if needed
torch.cuda.empty_cache()

Create or edit C:\Users\YourName\.wslconfig to control WSL2’s resource usage.

[wsl2]
# Limit WSL2 RAM (default: 50% of system RAM)
memory=16GB
# Limit CPU cores
processors=8
# Swap file size
swap=8GB
localhostForwarding=true

Multi-GPU Training Setup

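Stable Baselines3 trains each model on a single device. On systems with multiple GPUs, the practical approach is to pin each training run to one GPU and run several experiments side by side, as follows.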

# Check available GPUs
python -c "
import torch
for i in range(torch.cuda.device_count()):
    print(f'GPU {i}: {torch.cuda.get_device_name(i)}')
"

# To use a specific GPU
CUDA_VISIBLE_DEVICES=1 python train_grasp.py

# Stable Baselines3 does not split a single PPO run across GPUs.
# To pin a run from inside the script instead, pass the device explicitly:
# model = PPO(..., device='cuda:1')
# torch.nn.DataParallel applies only to custom training loops and architectures
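
One common way to keep several GPUs busy is to launch independent runs, each restricted to one device through CUDA_VISIBLE_DEVICES. The sketch below only builds the commands and environments; per_gpu_jobs is a hypothetical helper, and it assumes the train_grasp.py script and its --seed flag shown earlier.

```python
import os
from typing import Dict, List, Sequence


def per_gpu_jobs(seeds: Sequence[int], gpu_count: int,
                 script: str = "train_grasp.py") -> List[Dict]:
    """Assign one training run per seed to the GPUs in round-robin order."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    jobs = []
    for index, seed in enumerate(seeds):
        # Each process sees only its assigned GPU, which it addresses as cuda:0.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(index % gpu_count))
        jobs.append({"cmd": ["python", script, "--seed", str(seed)], "env": env})
    return jobs


for job in per_gpu_jobs(seeds=[0, 1, 2], gpu_count=2):
    print(job["env"]["CUDA_VISIBLE_DEVICES"], " ".join(job["cmd"]))

# To actually start the runs:
#   import subprocess
#   procs = [subprocess.Popen(j["cmd"], env=j["env"]) for j in per_gpu_jobs([0, 1], 2)]
```

Running several seeds in parallel also gives a variance estimate for the results, which a single run cannot provide.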

Troubleshooting Common Windows Issues

If the preceding steps have been completed, OpenClaw is most likely running. Robotics simulation frameworks are complex systems, however, and failures do occur. The most common issues and their solutions are summarised below.

Error	Cause	Solution
`CUDA not found` in WSL2	Windows NVIDIA driver too old or CUDA toolkit not installed in WSL2	Update Windows NVIDIA driver to 525+, install `cuda-toolkit-12-4` in WSL2 (not the full driver)
`GLFWError: API unavailable`	MuJoCo cannot create an OpenGL context	Install `libosmesa6-dev`, set `MUJOCO_GL=osmesa` for headless, or fix WSLg
`EGL error` / rendering fails	Missing EGL/Mesa libraries	Run: `sudo apt install -y libegl1-mesa-dev libgles2-mesa-dev`
`Permission denied` errors	File permissions mismatch between Windows and WSL2	Work in `~/` not `/mnt/c/`; run `chmod +x` on scripts
`DLL load failed` (native Windows)	Missing Visual C++ Redistributable or wrong CUDA DLLs	Install VC++ Redist; verify CUDA PATH order
WSLg display not working	WSL not updated or Wayland issue	Run `wsl --update` in PowerShell; try `export DISPLAY=:0`
`CUDA out of memory`	Batch size too large or memory leak	Reduce batch size, reduce `num_envs`, call `torch.cuda.empty_cache()`
Python version conflicts	System Python interfering with venv/conda	Always activate your venv/conda env; use `which python` to verify
`ModuleNotFoundError: mujoco`	MuJoCo not installed in the active environment	Activate your venv/conda, then `pip install mujoco==3.1.3`
`subprocess-exited-with-error` during pip install	Missing build dependencies	Install `build-essential cmake` (WSL2) or Visual Studio Build Tools (Windows)

Detailed Fix: MuJoCo Rendering in WSL2

Rendering is the most frequent source of difficulty. A systematic approach to resolving it is presented below.

# Step 1: Check if WSLg is running
ls /tmp/.X11-unix/
# Should list at least X0 or X1

# Step 2: Check DISPLAY variable
echo $DISPLAY
# Should be something like :0 or :1

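# Step 3: Test with a simple OpenGL app
sudo apt install -y mesa-utils
glxinfo -B
# "direct rendering: Yes" should appear. The "OpenGL renderer string" line names the GPU
# (a D3D12 renderer) when acceleration is active, or llvmpipe for software rendering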

# Step 4: If rendering still fails, try a different backend (set exactly one)
export MUJOCO_GL=egl       # Hardware EGL (preferred)
# export MUJOCO_GL=osmesa  # Software rendering (slower, but needs no GPU or display)
# export MUJOCO_GL=glfw    # GLFW (requires display)

# Step 5: Test MuJoCo rendering
python -c "
import mujoco
import numpy as np

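# Minimal one-sphere scene; any valid MJCF string works here
model = mujoco.MjModel.from_xml_string('<mujoco><worldbody><light pos=\"0 0 3\"/><geom type=\"sphere\" size=\"0.1\"/></worldbody></mujoco>')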
data = mujoco.MjData(model)

renderer = mujoco.Renderer(model, height=480, width=640)
mujoco.mj_step(model, data)
renderer.update_scene(data)
pixels = renderer.render()
print(f'Rendered frame: {pixels.shape}')  # Should be (480, 640, 3)
print('Rendering works!')
"

Caution: When switching between MUJOCO_GL backends, the Python session should be restarted completely. MuJoCo initialises the rendering backend on first import and caches it.
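
Because the backend is fixed at import time, a script that chooses it must set MUJOCO_GL before importing mujoco. The sketch below shows that ordering; the selection heuristic (use GLFW when a display is present, otherwise EGL or OSMesa) is an illustrative assumption, and choose_mujoco_gl is a hypothetical helper.

```python
import os
from typing import Mapping


def choose_mujoco_gl(env: Mapping[str, str], prefer_gpu: bool = True) -> str:
    """Pick a rendering backend name without overriding an explicit choice."""
    if env.get("MUJOCO_GL"):
        return env["MUJOCO_GL"]      # respect a value the user already set
    if env.get("DISPLAY"):
        return "glfw"                # a display is available for on-screen windows
    return "egl" if prefer_gpu else "osmesa"   # headless: GPU or software rendering


# Set the variable first ...
os.environ["MUJOCO_GL"] = choose_mujoco_gl(os.environ)
print("MUJOCO_GL =", os.environ["MUJOCO_GL"])

# ... and import mujoco only afterwards, so the choice takes effect:
# import mujoco
```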

Integration with VS Code

VS Code is well suited to OpenClaw development, particularly when using WSL2. Microsoft’s WSL extension provides a native-Linux working experience while the editor itself runs on Windows.

Setting Up VS Code with WSL2

# Install the WSL extension in VS Code (from Windows)
# 1. Open VS Code
# 2. Go to Extensions (Ctrl+Shift+X)
# 3. Search for "WSL" by Microsoft
# 4. Click Install

# Open your OpenClaw project from WSL2
cd ~/robotics/openclaw
code .

This command opens VS Code on Windows but connects it to the WSL2 filesystem. The terminal inside VS Code uses the WSL2 bash shell, and all file operations occur on the Linux filesystem, combining the advantages of both environments.

Setting Up Debugging

Create a launch configuration at .vscode/launch.json in the project.

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Train GraspCube",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/train_grasp.py",
            "args": ["--env", "GraspCube-v1", "--total-timesteps", "10000"],
            "console": "integratedTerminal",
            "env": {
                "CUDA_VISIBLE_DEVICES": "0",
                "MUJOCO_GL": "egl"
            },
            "python": "${workspaceFolder}/venv/bin/python"
        },
        {
            "name": "Debug Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "python": "${workspaceFolder}/venv/bin/python"
        },
        {
            "name": "Evaluate Model",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/evaluate.py",
            "args": ["--model", "experiments/best_model/best_model.zip"],
            "console": "integratedTerminal",
            "python": "${workspaceFolder}/venv/bin/python"
        }
    ]
}

Recommended Extensions for Robotics Development

Python (Microsoft): Core Python support with IntelliSense, linting, and debugging
Pylance: Fast, feature-rich Python language server
WSL (Microsoft): Seamless WSL2 integration
Jupyter: For interactive experimentation and visualization
GitLens: Enhanced Git integration for tracking changes
YAML: Syntax highlighting for OpenClaw config files
Docker (Microsoft): If using the Docker installation method
Remote – SSH: For connecting to remote training servers
Error Lens: Inline error display—catches issues before running

Workspace Settings

Create .vscode/settings.json for project-specific configuration.

{
    "python.defaultInterpreterPath": "${workspaceFolder}/venv/bin/python",
    "python.linting.enabled": true,
    "python.linting.flake8Enabled": true,
    "python.formatting.provider": "black",
    "python.formatting.blackArgs": ["--line-length", "100"],
    "editor.formatOnSave": true,
    "editor.rulers": [100],
    "files.exclude": {
        "**/__pycache__": true,
        "**/*.pyc": true,
        "**/experiments/*/checkpoints": true
    },
    "terminal.integrated.env.linux": {
        "MUJOCO_GL": "egl",
        "CUDA_VISIBLE_DEVICES": "0"
    }
}

Next Steps and Resources

A fully functional OpenClaw installation on Windows 11 is now in place. The following directions may be explored next.

Building Custom Environments

OpenClaw’s environment API follows the Gymnasium standard, which makes the creation of custom tasks straightforward.

import numpy as np

import openclaw
from openclaw.envs import BaseManipulationEnv

class MyCustomTask(BaseManipulationEnv):
    """Custom manipulation task with your own reward function."""

    def __init__(self, **kwargs):
        super().__init__(
            model_path="path/to/your/model.xml",
            **kwargs
        )

    def _get_obs(self):
        # Define your observation space
        return {
            'robot_state': self._get_robot_state(),
            'object_state': self._get_object_state(),
            'goal': self._get_goal(),
        }

    def _compute_reward(self, achieved_goal, desired_goal, info):
        # Define your reward function
        distance = np.linalg.norm(achieved_goal - desired_goal)
        return -distance  # Dense reward: minimize distance

    def _check_success(self, achieved_goal, desired_goal):
        distance = np.linalg.norm(achieved_goal - desired_goal)
        return distance < 0.05  # 5cm threshold

# Register the environment
openclaw.register(
    id='MyCustomTask-v1',
    entry_point='my_envs:MyCustomTask',
    max_episode_steps=200,
)

Sim-to-Real Transfer Basics

The ultimate goal of simulation training is the deployment of policies on real robots. Key techniques include the following.

Domain randomisation: vary physics parameters (friction, mass, damping) during training so that the policy generalises.
System identification: measure the real robot's parameters and match them in simulation.
Asymmetric actor-critic: grant the critic access to privileged simulation information while the actor uses only observations available in the real world.
Progressive transfer: begin with simple tasks and increase complexity incrementally.
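Of the techniques above, domain randomisation is the simplest to illustrate. The sketch below samples friction, mass, and damping around nominal values at the start of each episode. The nominal values, the 20 percent range, and the PhysicsParams container are all assumptions made for illustration; writing the sampled values into a simulator is left out because it depends on the environment in use.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    friction: float
    mass: float
    damping: float


# Hypothetical nominal values and range, chosen for illustration only.
NOMINAL = PhysicsParams(friction=1.0, mass=0.1, damping=0.05)
RELATIVE_RANGE = 0.2  # sample each parameter within +/-20% of nominal


def sample_physics(rng: random.Random) -> PhysicsParams:
    """Draw one randomised parameter set around the nominal values."""

    def jitter(value: float) -> float:
        return value * rng.uniform(1.0 - RELATIVE_RANGE, 1.0 + RELATIVE_RANGE)

    return PhysicsParams(
        friction=jitter(NOMINAL.friction),
        mass=jitter(NOMINAL.mass),
        damping=jitter(NOMINAL.damping),
    )


rng = random.Random(0)  # fixed seed so runs are repeatable
for episode in range(3):
    params = sample_physics(rng)
    # In a real training loop these values would be applied to the
    # simulator before the episode is reset.
    print(episode, params)
```

Sampling from a seeded random.Random instance rather than the module-level functions keeps experiments reproducible.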

Contributing to OpenClaw

Open-source robotics depends on community contributions. The following avenues for involvement are particularly useful.

Report bugs through GitHub Issues with detailed reproduction steps.
Contribute new environments for additional manipulation tasks.
Improve Windows compatibility: first-hand experience from completing this setup is valuable to the project.
Write documentation and tutorials.
Share trained models and benchmark results.

Community and Learning Resources

OpenClaw GitHub: Source code, issues, and discussions
MuJoCo Documentation: mujoco.readthedocs.io, essential for understanding the physics engine
Stable Baselines3 Docs: stable-baselines3.readthedocs.io, RL algorithm reference
Gymnasium API: gymnasium.farama.org, environment interface standard
Robotic Manipulation Course (MIT 6.881): Excellent free lectures on manipulation theory
DeepMind Control Suite: Related environment suite for continuous control
Papers: Search for "dexterous manipulation reinforcement learning" on arXiv for the latest research

Final Thoughts

Setting up a robotics AI framework on Windows 11 once required either a dual-boot Linux partition or hours of work resolving incompatible dependencies. That period has ended. With WSL2 providing near-native Linux performance, Conda offering cross-platform package management, and Docker delivering reproducible containers, Windows 11 is now a first-class platform for robotics simulation research.

This guide has covered three complete installation paths for OpenClaw. The WSL2 method offers the best balance of compatibility and performance and is recommended for most users. The native Conda approach is appropriate for simpler use cases in which WSL2 should be avoided entirely. Docker is the appropriate choice when reproducibility is paramount, particularly in team environments.

The discussion has extended beyond basic installation to cover the complete workflow: running environments, training reinforcement learning policies with PPO, monitoring with TensorBoard, optimising GPU performance, and resolving the most common Windows-specific issues. VS Code has also been configured for a professional development experience.

The field of robotic manipulation is advancing rapidly. Frameworks such as OpenClaw permit experimentation with recent algorithms without access to physical robots. A Windows 11 machine equipped with a reasonable NVIDIA GPU is sufficient to begin training policies that may eventually run on real robotic hands.

The gap between simulation and reality continues to narrow each year. The path forward involves experimenting, accepting initial failures, and training agents that progress from clumsy attempts to reliable performance. The Windows 11 setup is now prepared, and only the work itself remains.

Key Takeaway: Windows 11 with WSL2 provides a near-seamless experience for running Linux-native robotics frameworks. With the installation steps in this guide, the path from a fresh Windows machine to training robotic manipulation policies can be completed in under an hour.

References

MuJoCo Documentation: mujoco.readthedocs.io
Stable Baselines3 Documentation: stable-baselines3.readthedocs.io
Microsoft WSL2 Documentation: learn.microsoft.com/en-us/windows/wsl/
NVIDIA CUDA on WSL: docs.nvidia.com/cuda/wsl-user-guide/
NVIDIA Container Toolkit: docs.nvidia.com/datacenter/cloud-native/container-toolkit/
Docker Desktop for Windows: docs.docker.com/desktop/install/windows-install/
Gymnasium API Reference: gymnasium.farama.org
Schulman, J., et al. "Proximal Policy Optimization Algorithms." arXiv:1707.06347 (2017)
OpenAI. "Learning Dexterous In-Hand Manipulation." arXiv:1808.00177 (2018)
Todorov, E., Erez, T., Tassa, Y. "MuJoCo: A physics engine for model-based control." IROS 2012

April 4, 2026

How to Create Professional PowerPoint Presentations Using Claude Cowork: A Step-by-Step Guide

Summary

What this post covers: A hands-on guide to building professional PowerPoint decks with Claude Cowork using three distinct workflows: direct computer use, programmatic generation with python-pptx, and AI-assisted outlining with manual polish.

Key insights:

Knowledge workers spend roughly eight hours per week on slides, and Claude Cowork can cut that effort by about 90 percent by combining agentic computer control with code generation.
Direct computer use is fastest for one-off internal decks, python-pptx is the right choice for recurring or data-driven reports, and the outline-and-edit method preserves the most creative control for high-stakes presentations.
Among AI presentation tools (Copilot, Gamma, Beautiful.ai, SlidesGPT), Cowork stands out because it is a general-purpose agent that can also research, analyze data, and automate work end-to-end, not just generate slides.
Better prompts (audience, structure, constraints, examples) consistently produce better decks; an iterative four-pass workflow (skeleton, narrative, design, speaker notes) beats one-shot generation.
Cowork has real limitations around fine pixel-level design, large images, and complex animations, so a human review pass before presenting is still required.

Main topics: Introduction, Prerequisites and Setup, Method 1: Direct Computer Use with Cowork, Method 2: Python-pptx Script Generation, Method 3: Outline and Manual Creation, Practical Examples, Advanced Techniques, Prompt Engineering for Better Presentations, Comparison: Claude Cowork vs Other AI Presentation Tools, Limitations and Workarounds, Best Practices for AI-Generated Presentations, Final Thoughts, References.

Introduction: The Presentation Problem

A statistic worth noting: the average professional spends eight hours per week creating presentations. An entire workday each week is consumed by adjusting text boxes, selecting chart styles, aligning bullet points, and reconsidering whether a title slide looks sufficiently formal. Over the course of a year, the total exceeds 400 hours, equivalent to roughly ten work weeks.

That time can be reduced by approximately 90 percent. The mechanism is neither a template gallery nor an outsourced designer. It is an AI agent capable of observing the screen, opening PowerPoint, building slides in real time, and generating entire presentation files programmatically through Python code, all from a single natural-language prompt.

Claude Cowork provides precisely this capability. Released by Anthropic as part of its Claude desktop application, Cowork is an agentic computer-use feature that converts Claude from a chatbot into a fully featured desktop assistant. It can control the mouse and keyboard, execute scripts, browse the web for research, and operate autonomously on multi-step tasks.

This guide examines three distinct methods for creating professional PowerPoint presentations using Claude Cowork: fully hands-off computer use, programmatic generation with the python-pptx library, and structured outlines refined manually. Four real-world presentation decks are built step by step, advanced techniques such as data-driven automation are explored, and Cowork is compared with every major AI presentation tool currently available.

Whether the reader is a startup founder rehearsing a pitch, a consultant assembling a quarterly business review, or an engineer explaining system architecture to stakeholders, this guide should substantially change how presentations are created.


Prerequisites and Setup

Before the methods are examined, a few prerequisites must be in place. Setup typically takes approximately five minutes.

What Is Required

Requirement	Details
Claude Subscription	Claude Pro ($20/mo), Max ($100/mo or $200/mo), or Team plan. Cowork is not available on the free tier.
Claude Desktop App	Download from claude.ai/download—available for macOS and Windows.
Cowork Enabled	Go to Claude Desktop → Settings → Feature Previews → Enable “Computer Use” / Cowork.
Presentation Software	Microsoft PowerPoint (desktop), Google Slides (browser), or LibreOffice Impress.
Python (for Method 2)	Python 3.9+ with `pip install python-pptx`. Optional but powerful.

Enabling Cowork in Claude Desktop

If Cowork has not yet been enabled, the configuration proceeds as follows:

Open the Claude desktop app (not the browser version; Cowork requires the native application).
Click the profile icon in the bottom-left corner.
Navigate to Settings → Feature Previews.
Toggle on “Computer Use” (also labelled “Cowork” in newer versions).
Grant the required permissions: Claude requires screen access and input control.
Restart the application if prompted.

Once enabled, a new option to start a “Cowork” session appears in the Claude chat interface. Selecting it tells Claude that it may observe the screen and interact with desktop applications.

Caution: Cowork’s computer use is currently in research preview. Claude requests confirmation before taking actions, and continued supervision is recommended, particularly during clicking, typing, or file-saving operations. The system should be regarded as a capable assistant whose actions still warrant oversight.

Method 1: Direct Computer Use with Cowork

This is the most striking method and the one that most closely resembles autonomous operation. The user specifies the desired presentation, and Claude opens PowerPoint, creates slides, enters content, applies formatting, and saves the file while the user observes.

How Computer Use Works

When a Cowork session begins, Claude obtains the following capabilities:

Screen observation. Periodic screenshots allow Claude to interpret what is displayed.
Mouse control. Claude can click buttons, menus, and interface elements.
Keyboard input. Claude can enter text, use keyboard shortcuts, and navigate applications.
Terminal command execution. Claude can launch applications, run scripts, and manage files.

The result is that Claude can interact with PowerPoint (or Google Slides, or any other presentation tool) in much the same way as a human user, although more rapidly and without creative blocks.

Step-by-Step Walkthrough

Step 1: Start a Cowork session. In the Claude desktop app, open a new conversation and select the Cowork mode. A banner confirms that Claude may now interact with the computer.

Step 2: Provide a presentation brief. An example prompt follows:

I need you to create a 10-slide PowerPoint presentation for a quarterly business review.

Company: Acme Corp
Quarter: Q1 2026
Key metrics:
- Revenue: $4.2M (up 18% YoY)
- New customers: 340
- Churn rate: 2.1% (down from 3.4%)
- NPS score: 72

Sections needed:
- Title slide with company logo placeholder
- Executive summary
- Revenue breakdown by product line
- Customer acquisition funnel
- Churn analysis
- NPS trends
- Key wins this quarter
- Challenges and risks
- Q2 priorities
- Thank you / Q&A slide

Style: Professional, dark blue theme, clean and minimal.
Please open PowerPoint and create this deck for me.

Step 3: Observe Claude’s work. After the action is confirmed, Claude will:

Open PowerPoint from the taskbar or applications folder.
Select a blank presentation (or apply a built-in theme if one was specified).
Create the title slide and enter the title, subtitle, and date.
Add new slides one by one, selecting appropriate layouts (title with content, two-column, or blank for charts).
Enter all text content, including headings, bullet points, and data figures.
Apply formatting such as font sizes, colours, and alignment.
Apply a cohesive theme, adjusting the slide master where necessary.
Save the file to the preferred location.

Step 4: Review and refine. Once Claude completes the task, the user is notified that the deck is ready. The file should be opened, each slide reviewed, and adjustments requested as required:

The revenue slide looks great, but can you:
1. Make the revenue number larger and bold
2. Add a simple bar chart placeholder showing Q1 vs Q4 comparison
3. Change the background of the title slide to a gradient from dark blue to navy

Tip: Formatting requests should be precise. Instead of “make it look better,” specify “increase the heading font to 28pt, use Calibri Bold, and left-align all bullet points with 1.5 line spacing.” The more precise the instruction, the better the output produced by Claude.

Effective Prompts for Computer Use

Presentation quality depends substantially on prompt quality. The following prompt patterns work well with Cowork’s computer use:

For a pitch deck:

Open PowerPoint and create a 12-slide startup pitch deck for a B2B SaaS company
called "DataFlow" that provides real-time analytics for e-commerce.

Funding stage: Series A, seeking $5M
Traction: $1.2M ARR, 85 customers, 140% net revenue retention

Use a modern, clean design with a primary color of #1a73e8 (Google blue).
Include placeholder boxes where charts and screenshots should go.
Add speaker notes to every slide with talking points.

For a training presentation:

Create a 15-slide onboarding training deck for new software engineers.

Topics to cover:
- Company tech stack overview
- Development workflow (Git, CI/CD, code review)
- Architecture overview (microservices, AWS infrastructure)
- Security best practices
- First-week checklist

Style: Light theme, friendly and approachable. Use icons or emoji where appropriate.
Include a quiz slide at the end with 5 multiple-choice questions.

Method 2: Python-pptx Script Generation

When pixel-perfect control, repeatable automation, or presentations driven by live data are required, the python-pptx method is the most appropriate option. Instead of manipulating PowerPoint visually, Claude is asked to generate a Python script that creates the .pptx file programmatically.

This approach is particularly powerful because:

Presentation scripts can be version-controlled in Git.
Data can be ingested from CSV, Excel, databases, or APIs.
Updated presentations can be regenerated with a single command.
Absolute precision over positioning, sizing, and styling is preserved.

Getting Started with python-pptx

The library is installed as follows:

pip install python-pptx

Claude can then be requested—either in a regular chat or in a Cowork session—to generate complete scripts. The principal building blocks are described below.

Creating a Title Slide

from pptx import Presentation
from pptx.util import Inches, Pt
from pptx.dml.color import RGBColor
from pptx.enum.text import PP_ALIGN

prs = Presentation()
prs.slide_width = Inches(13.333)  # Widescreen 16:9
prs.slide_height = Inches(7.5)

# Title slide
slide_layout = prs.slide_layouts[6]  # Blank layout for full control
slide = prs.slides.add_slide(slide_layout)

# Background color
background = slide.background
fill = background.fill
fill.solid()
fill.fore_color.rgb = RGBColor(0x1a, 0x1a, 0x2e)  # Dark navy

# Title text
txBox = slide.shapes.add_textbox(Inches(1), Inches(2), Inches(11), Inches(2))
tf = txBox.text_frame
tf.word_wrap = True
p = tf.paragraphs[0]
p.text = "Q1 2026 Business Review"
p.font.size = Pt(44)
p.font.bold = True
p.font.color.rgb = RGBColor(0xFF, 0xFF, 0xFF)
p.alignment = PP_ALIGN.LEFT

# Subtitle
p2 = tf.add_paragraph()
p2.text = "Acme Corp — Confidential"
p2.font.size = Pt(20)
p2.font.color.rgb = RGBColor(0xBB, 0xBB, 0xBB)
p2.alignment = PP_ALIGN.LEFT

prs.save("q1_review.pptx")
print("Presentation saved!")
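The slide dimensions used above are easier to reason about with the underlying units in view. PowerPoint files store lengths in English Metric Units (EMU), with 914,400 EMU per inch and 12,700 EMU per point. The sketch below reproduces that arithmetic in plain Python to check that a 13.333 by 7.5 inch canvas is effectively 16:9. The helper names are illustrative and are not part of python-pptx.

```python
EMU_PER_INCH = 914_400
EMU_PER_POINT = 12_700


def inches_to_emu(inches: float) -> int:
    """Convert inches to whole EMU, truncating any fraction."""
    return int(inches * EMU_PER_INCH)


def points_to_emu(points: float) -> int:
    """Convert typographic points to whole EMU."""
    return int(points * EMU_PER_POINT)


width = inches_to_emu(13.333)
height = inches_to_emu(7.5)
aspect = width / height

print("width EMU:", width)
print("height EMU:", height)
print("aspect ratio:", round(aspect, 3), "vs 16/9 =", round(16 / 9, 3))
```

The same conversion explains why positions passed as Inches(...) and font sizes passed as Pt(...) can be mixed freely: both end up as integer EMU values.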

Building Bullet Point Slides

def add_content_slide(prs, title, bullets, bg_color=RGBColor(0xFF, 0xFF, 0xFF)):
    slide = prs.slides.add_slide(prs.slide_layouts[6])

    # Background
    background = slide.background
    fill = background.fill
    fill.solid()
    fill.fore_color.rgb = bg_color

    # Slide title
    title_box = slide.shapes.add_textbox(Inches(0.8), Inches(0.5), Inches(11), Inches(1))
    tf = title_box.text_frame
    p = tf.paragraphs[0]
    p.text = title
    p.font.size = Pt(32)
    p.font.bold = True
    p.font.color.rgb = RGBColor(0x1a, 0x1a, 0x2e)

    # Accent line under title
    from pptx.enum.shapes import MSO_SHAPE
    line = slide.shapes.add_shape(
        MSO_SHAPE.RECTANGLE,
        Inches(0.8), Inches(1.45), Inches(2), Inches(0.05)
    )
    line.fill.solid()
    line.fill.fore_color.rgb = RGBColor(0x1a, 0x73, 0xe8)
    line.line.fill.background()

    # Bullet points
    content_box = slide.shapes.add_textbox(Inches(0.8), Inches(1.8), Inches(11), Inches(5))
    tf = content_box.text_frame
    tf.word_wrap = True

    for i, bullet in enumerate(bullets):
        if i == 0:
            p = tf.paragraphs[0]
        else:
            p = tf.add_paragraph()
        p.text = f"  {bullet}"
        p.font.size = Pt(20)
        p.font.color.rgb = RGBColor(0x33, 0x33, 0x33)
        p.space_after = Pt(12)

    return slide

# Usage
add_content_slide(prs, "Key Wins This Quarter", [
    "Landed 3 enterprise accounts worth $1.2M combined ARR",
    "Reduced customer onboarding time from 14 days to 3 days",
    "Launched self-serve analytics dashboard — 89% adoption in week one",
    "Engineering velocity up 34% after platform migration",
    "NPS improved from 64 to 72 — highest score in company history"
])

Adding Charts

from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE

def add_chart_slide(prs, title, categories, series_data):
    slide = prs.slides.add_slide(prs.slide_layouts[6])

    # Title
    title_box = slide.shapes.add_textbox(Inches(0.8), Inches(0.5), Inches(11), Inches(1))
    tf = title_box.text_frame
    p = tf.paragraphs[0]
    p.text = title
    p.font.size = Pt(32)
    p.font.bold = True

    # Chart data
    chart_data = CategoryChartData()
    chart_data.categories = categories

    for series_name, values in series_data.items():
        chart_data.add_series(series_name, values)

    # Add chart to slide
    chart = slide.shapes.add_chart(
        XL_CHART_TYPE.COLUMN_CLUSTERED,
        Inches(1), Inches(1.8), Inches(11), Inches(5),
        chart_data
    ).chart

    # Style the chart
    chart.has_legend = True
    chart.legend.include_in_layout = False
    chart.chart_style = 2  # built-in chart style number

    return slide

# Usage — Revenue by quarter
add_chart_slide(prs, "Revenue Trend",
    ["Q2 2025", "Q3 2025", "Q4 2025", "Q1 2026"],
    {
        "Revenue ($M)": [2.8, 3.1, 3.6, 4.2],
        "Target ($M)": [3.0, 3.2, 3.5, 4.0]
    }
)

Adding Tables

def add_table_slide(prs, title, headers, rows):
    slide = prs.slides.add_slide(prs.slide_layouts[6])

    # Title
    title_box = slide.shapes.add_textbox(Inches(0.8), Inches(0.5), Inches(11), Inches(1))
    tf = title_box.text_frame
    p = tf.paragraphs[0]
    p.text = title
    p.font.size = Pt(32)
    p.font.bold = True

    # Create table
    num_rows = len(rows) + 1  # +1 for header
    num_cols = len(headers)
    table_shape = slide.shapes.add_table(
        num_rows, num_cols,
        Inches(0.8), Inches(1.8), Inches(11.5), Inches(4.5)
    )
    table = table_shape.table

    # Header row
    for i, header in enumerate(headers):
        cell = table.cell(0, i)
        cell.text = header
        for paragraph in cell.text_frame.paragraphs:
            paragraph.font.bold = True
            paragraph.font.size = Pt(14)
            paragraph.font.color.rgb = RGBColor(0xFF, 0xFF, 0xFF)
        cell.fill.solid()
        cell.fill.fore_color.rgb = RGBColor(0x1a, 0x1a, 0x2e)

    # Data rows
    for row_idx, row_data in enumerate(rows):
        for col_idx, value in enumerate(row_data):
            cell = table.cell(row_idx + 1, col_idx)
            cell.text = str(value)
            for paragraph in cell.text_frame.paragraphs:
                paragraph.font.size = Pt(12)
            if row_idx % 2 == 0:
                cell.fill.solid()
                cell.fill.fore_color.rgb = RGBColor(0xF0, 0xF0, 0xF0)

    return slide

# Usage
add_table_slide(prs, "Product Line Performance",
    ["Product", "Revenue", "Growth", "Margin"],
    [
        ["Analytics Pro", "$1.8M", "+24%", "78%"],
        ["DataSync", "$1.4M", "+15%", "72%"],
        ["API Gateway", "$0.7M", "+31%", "85%"],
        ["Consulting", "$0.3M", "-5%", "45%"],
    ]
)
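Because add_table_slide sizes the table from headers and then fills cells by position, a row with more values than there are headers would be expected to fail partway through, and a short row would leave cells empty. A small check before building the slide can catch both cases. The following is a sketch only; validate_table is a hypothetical helper, not part of python-pptx.

```python
def validate_table(headers, rows):
    """Return a list of problems; an empty list means the data is consistent."""
    problems = []
    if not headers:
        problems.append("no headers given")
    for index, row in enumerate(rows, start=1):
        if len(row) != len(headers):
            problems.append(
                f"row {index} has {len(row)} values, expected {len(headers)}"
            )
    return problems


headers = ["Product", "Revenue", "Growth", "Margin"]
rows = [
    ["Analytics Pro", "$1.8M", "+24%", "78%"],
    ["DataSync", "$1.4M", "+15%"],  # deliberately one value short
]

for problem in validate_table(headers, rows):
    print("Table problem:", problem)
```

Running the check first means a malformed data file produces a readable message rather than a partially built slide.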

Running the Generated Script

Once Claude has produced the complete script, two execution options are available:

Option A: Cowork executes the script.

Please run the Python script you just created and open the resulting
PowerPoint file so I can review it.

Cowork opens a terminal, executes the script, and then opens the generated .pptx file in PowerPoint.

Option B: The user executes the script directly.

python create_presentation.py

Key Takeaway: The python-pptx method provides a reusable, version-controlled, and data-driven approach to presentation generation. Scripts can be saved, parameterised, and rerun to regenerate updated decks whenever new data arrives. The approach is particularly valuable for recurring presentations such as weekly reports or monthly board updates.
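Parameterising a script usually means moving the data path, output path, and title out of the code and onto the command line. The sketch below shows one way to do this with the standard library's argparse. The option names --data, --output, and --title are assumptions for illustration, and the deck-building step is left as a comment.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options for a deck-generation script."""
    parser = argparse.ArgumentParser(
        description="Regenerate a deck from a data file."
    )
    parser.add_argument("--data", type=Path, required=True, help="input CSV file")
    parser.add_argument(
        "--output",
        type=Path,
        default=Path("deck.pptx"),
        help="where to write the .pptx file",
    )
    parser.add_argument("--title", default="Business Review", help="title slide text")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real script would build the Presentation here, for example with the
    # add_*_slide helpers shown earlier, and then call prs.save(args.output).
    print(f"Would build '{args.title}' from {args.data} into {args.output}")
    return args


# Example invocation with an explicit argument list instead of sys.argv.
main(["--data", "weekly_metrics.csv"])
```

With this structure, regenerating a deck for new data is a matter of changing the arguments rather than editing the script.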

Method 3: Outline and Manual Creation

Full automation is not always desirable. In some cases, Claude’s strategic contribution—structure, narrative arc, and content—is valuable, but the user prefers to design the slides personally. Method 3 is intended for those who value creative control while wishing to avoid the blank-page problem.

How It Works

Claude is asked to produce a detailed slide-by-slide outline that includes:

Slide title and layout recommendation
Exact content (bullet points, key figures, quotes)
Speaker notes with talking points and timing
Design suggestions (colors, imagery, chart types)
Transition recommendations between slides

Example Prompt

I need to create a presentation about our company's cloud migration strategy.

Audience: C-suite executives (non-technical)
Duration: 20 minutes
Slides: 12-15

Please create a detailed slide-by-slide outline with:
1. Slide title
2. Layout type (title slide, content, two-column, full-image, chart, etc.)
3. Exact text content for each element
4. Speaker notes (what I should say, not what's on screen)
5. Design notes (suggested imagery, colors, chart types)
6. Estimated time per slide

Focus on business impact, cost savings, and risk mitigation.
Avoid technical jargon — this is for executives, not engineers.

What Claude Produces

Claude generates output of the following form for each slide:

SLIDE 4: The Cost of Staying Put
Layout: Two-column with key metric callout

LEFT COLUMN:
- Current infrastructure costs: $2.4M/year
- Annual growth in server costs: 23%
- Unplanned downtime last year: 47 hours
- Revenue impact of downtime: $890K

RIGHT COLUMN:
[Suggested chart: Line graph showing infrastructure cost trajectory
over 5 years if no action is taken — hockey stick curve]

KEY METRIC (large, centered below columns):
"By 2028, maintaining current infrastructure will cost $6.1M/year"

SPEAKER NOTES:
"This slide is your wake-up call moment. Pause after revealing the
$6.1M figure. Let it sink in. Then say: 'And that's just the
direct cost — it doesn't include the opportunity cost of our
engineering team spending 30% of their time on maintenance instead
of building new features.' Estimated time: 2 minutes."

DESIGN NOTES:
Use red/warning colors for the cost figures. The chart should show
a clear upward trend that looks unsustainable. Consider a subtle
red gradient background to reinforce urgency.

The level of detail allows each slide to be built quickly because the strategic work has already been completed. Only the design execution remains.
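Since the outline includes an estimated time per slide, it can be checked against the talk's duration before any design work begins. The sketch below is one way to do that; the SlideOutline structure and the one-minute tolerance are assumptions, and apart from slide 4 the entries are placeholders rather than real outline content.

```python
from dataclasses import dataclass


@dataclass
class SlideOutline:
    number: int
    title: str
    layout: str
    minutes: float  # estimated speaking time


def check_timing(slides, target_minutes, tolerance=1.0):
    """Return (total_minutes, within_tolerance) for an outline."""
    total = sum(slide.minutes for slide in slides)
    return total, abs(total - target_minutes) <= tolerance


# Slide 4 is taken from the example above; the other entries are placeholders.
outline = [
    SlideOutline(1, "Title", "title slide", 0.5),
    SlideOutline(4, "The Cost of Staying Put", "two-column", 2.0),
    SlideOutline(5, "Placeholder slide", "content", 1.5),
]

total, on_target = check_timing(outline, target_minutes=20)
print(f"{total} minutes planned, within target: {on_target}")
```

A mismatch at this stage is cheap to fix: slides can be merged, cut, or given more time before anything has been designed.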

Tip: Claude should also be requested to generate a “presentation narrative arc,” a one-paragraph summary of the emotional progression intended for the audience. For example: “Begin with urgency around the cost problem, move to hope through the cloud opportunity, build confidence with the migration plan, and close with optimism about the future state.” Such an arc keeps the deck cohesive.

Practical Examples: Four Real-World Decks

Concrete examples are more useful than abstract discussion. The four presentations below illustrate common scenarios, with the exact prompts to provide to Cowork and the expected outputs.

Quarterly Business Review (10 Slides)

The prompt:

Create a 10-slide quarterly business review deck in PowerPoint.

Company: TechFlow Inc.
Period: Q1 2026

Data:
- Revenue: $8.7M (plan was $8.2M) — 106% attainment
- Gross margin: 74% (up from 71%)
- Headcount: 142 (added 18 in Q1)
- Customer count: 520 (net new: 47)
- Logo churn: 3 customers (0.6%)
- NRR: 118%
- Top deal: Megacorp ($420K ACV)
- Pipeline for Q2: $12.4M weighted

Slides needed:
1. Title slide
2. Executive summary — 4 key metrics in large numbers
3. Revenue vs plan (bar chart)
4. Revenue by segment (pie chart: Enterprise 55%, Mid-market 30%, SMB 15%)
5. Customer metrics (new logos, churn, NRR)
6. Top wins — 3 biggest deals with logos
7. Product updates — 3 major releases
8. Team growth — hiring progress
9. Q2 outlook and priorities
10. Appendix — detailed financial table

Use a clean, modern theme with navy (#1a1a2e) and electric blue (#1a73e8).
Save as "TechFlow_Q1_2026_QBR.pptx"

What Cowork produces: A complete 10-slide deck with formatted charts, styled tables, consistent branding, and speaker notes. With computer use, the process takes approximately three to five minutes; as a python-pptx script, generation is almost instant.

Startup Pitch Deck (12 Slides)

The prompt:

Create a 12-slide Series A pitch deck for an AI-powered legal tech startup.

Company: LegalMind AI
Mission: Making legal research 10x faster with AI
Stage: Series A — raising $8M
Key metrics: $2.1M ARR, 200+ law firms, 95% retention, 3x YoY growth

Follow the classic pitch deck structure:
1. Title / hook
2. Problem — legal research takes 10+ hours per case
3. Solution — AI-powered case law analysis
4. Product demo screenshots (use placeholder images)
5. Market size — $28B legal tech market, $4B serviceable
6. Business model — SaaS, $500-$5,000/month per firm
7. Traction — growth chart, key logos, metrics
8. Competition — 2x2 quadrant (speed vs accuracy)
9. Team — 3 founders with relevant backgrounds
10. Go-to-market strategy
11. Financial projections — 3-year revenue forecast
12. The ask — $8M for engineering, sales, expansion

Design: Minimalist, white background, accent color #6C5CE7 (purple).
Make it investor-ready — clean, no clutter, big numbers.

Technical Architecture Presentation

The prompt:

Create a technical architecture presentation for our platform migration.

Audience: Engineering team (technical)
Length: 15 slides

Cover:
- Current architecture (monolith on EC2)
- Target architecture (microservices on EKS)
- Migration phases (4 phases over 6 months)
- Service decomposition plan
- Data migration strategy
- CI/CD pipeline changes
- Monitoring and observability stack
- Risk mitigation
- Timeline and milestones

Include architecture diagram descriptions (text-based, I'll replace
with actual diagrams) and code snippets showing key config changes.

Style: Dark theme suitable for screen sharing. Use monospace fonts
for technical content.

Sales Proposal Deck

The prompt:

Create a sales proposal deck for a prospective enterprise customer.

Our company: CloudSync (data integration platform)
Prospect: Global Retail Corp (Fortune 500 retailer)
Deal size: $350K/year
Competition: They're also evaluating Informatica and Fivetran

Create 10 slides:
1. Title with both company logos (placeholders)
2. Understanding their challenges (data silos, slow reporting)
3. Our solution overview
4. Technical fit — integration with their stack (Snowflake, SAP, Shopify)
5. Implementation timeline (8 weeks)
6. Case study — similar retailer, 60% faster reporting
7. ROI analysis — $1.2M annual savings
8. Pricing — 3 tiers with recommended option highlighted
9. Why us vs competition (comparison table)
10. Next steps and timeline

Design: Professional, trustworthy. Use their brand colors (green #2E7D32)
alongside ours (blue #1565C0).

Key Takeaway: Each prompt above includes specific data, a clear structure, design preferences, and context about the audience. The more detail provided at the outset, the less iteration is required. A well-crafted prompt saves more time than any tool feature.
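When similar briefs are written repeatedly, the same four ingredients can be assembled from structured inputs so that none is forgotten. The sketch below builds a plain-text brief in roughly the shape of the prompts above. The function name build_brief and its parameters are hypothetical, and the layout is only one reasonable choice.

```python
def build_brief(goal, audience, data, slides, style):
    """Assemble a presentation brief from its main ingredients."""
    lines = [goal, "", f"Audience: {audience}", "", "Data:"]
    lines += [f"- {key}: {value}" for key, value in data.items()]
    lines += ["", "Slides needed:"]
    lines += [f"{i}. {title}" for i, title in enumerate(slides, start=1)]
    lines += ["", f"Style: {style}"]
    return "\n".join(lines)


brief = build_brief(
    goal="Create a 3-slide quarterly business review deck in PowerPoint.",
    audience="Executive team",
    data={"Revenue": "$8.7M", "Gross margin": "74%"},
    slides=["Title slide", "Executive summary", "Q2 outlook and priorities"],
    style="Clean, modern theme with navy (#1a1a2e).",
)
print(brief)
```

The resulting text can be pasted into a Cowork session as is, or adjusted by hand before sending.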

Advanced Techniques

Once the basics are familiar, the following advanced approaches can extend the presentation workflow further.

Automated Report Decks with Scheduled Tasks

Cowork supports scheduled tasks, sometimes called “recurring tasks.” Claude can therefore be configured to generate presentations on a schedule. For example, every Monday morning a fresh weekly metrics deck can be deposited in the Downloads folder, populated with the latest data.

Configuration proceeds as follows:

Set up a recurring task: Every Monday at 8 AM, generate a weekly
metrics presentation.

Steps:
1. Read the latest data from our metrics spreadsheet at
   ~/Documents/weekly_metrics.csv
2. Run the Python script at ~/scripts/generate_weekly_deck.py
   with the CSV as input
3. Save the output as ~/Presentations/Weekly_Report_[DATE].pptx
4. Notify me when complete

Cowork retains the task and executes it on schedule: the latest data is read, the generation script is run, and an updated deck is produced each week without manual intervention.
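The task above refers to ~/scripts/generate_weekly_deck.py without showing it. A minimal sketch of what such a script might contain follows. The CSV column names (metric, value), the function names, the slide titles, and the ISO date used for [DATE] are all assumptions made for illustration, not something Cowork prescribes.

```python
"""Hypothetical generate_weekly_deck.py: metrics CSV in, dated .pptx out."""
import csv
from datetime import date
from pathlib import Path


def load_metrics(csv_path):
    """Read (metric, value) pairs; the two column names are assumptions."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [(row["metric"], row["value"]) for row in csv.DictReader(handle)]


def output_path(out_dir, run_date):
    """Fill the [DATE] slot with an ISO date, e.g. Weekly_Report_2026-04-06.pptx."""
    return Path(out_dir) / f"Weekly_Report_{run_date.isoformat()}.pptx"


def build_deck(metrics, target):
    """Write a title slide and one bullet slide listing every metric."""
    # Imported lazily so the helpers above work without python-pptx installed.
    from pptx import Presentation

    prs = Presentation()  # default template: layout 0 = title, 1 = title + content
    prs.slides.add_slide(prs.slide_layouts[0]).shapes.title.text = "Weekly Metrics"

    slide = prs.slides.add_slide(prs.slide_layouts[1])
    slide.shapes.title.text = "Key Numbers"
    frame = slide.placeholders[1].text_frame
    lines = [f"{name}: {value}" for name, value in metrics] or ["No data this week"]
    frame.text = lines[0]
    for line in lines[1:]:
        frame.add_paragraph().text = line
    prs.save(str(target))


def main(argv):
    """argv[0] is the CSV path; the deck lands in ~/Presentations."""
    target = output_path(Path.home() / "Presentations", date.today())
    target.parent.mkdir(parents=True, exist_ok=True)
    build_deck(load_metrics(argv[0]), target)
    return target
```

Adding the usual `if __name__ == "__main__": main(sys.argv[1:])` guard (with `import sys`) would let the scheduled task run the file directly with the CSV path as its only argument.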

Data-Driven Presentations from CSV and Excel

One of the most powerful patterns is to provide Cowork with a data file and allow it to build a presentation around the data:

I've attached our Q1 sales data in sales_q1_2026.csv. Please:

1. Analyze the data and identify key trends
2. Create a 10-slide presentation that tells the story of our Q1 sales
3. Include charts generated from the actual data
4. Highlight the top 5 performing products and bottom 3
5. Add a forecast slide projecting Q2 based on current trends
6. Use the python-pptx approach to ensure charts are data-accurate

The audience is our VP of Sales — focus on actionable insights,
not just data display.

Cowork reads the CSV, performs the analysis, generates appropriate visualisations, and builds a presentation that tells a coherent story from the data.
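The analysis steps in the prompt (ranking products and projecting the next quarter) can be sketched in plain Python. The column names (product, month, revenue) are assumptions about the CSV, and the straight-line trend is only one simple forecasting choice, not necessarily the method Cowork would select.

```python
import csv
import io
from collections import defaultdict


def load_sales(csv_text):
    """Parse CSV text with assumed columns: product, month, revenue."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(r["product"], int(r["month"]), float(r["revenue"])) for r in reader]


def rank_products(rows, top=5, bottom=3):
    """Return (best, worst) lists of (product, total revenue) pairs."""
    totals = defaultdict(float)
    for product, _month, revenue in rows:
        totals[product] += revenue
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top], ranked[-bottom:]


def linear_forecast(monthly_totals, periods=3):
    """Extend a least-squares trend line; needs at least two data points."""
    n = len(monthly_totals)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(monthly_totals) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, monthly_totals))
    slope /= sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + step) for step in range(periods)]
```

Three months of totals such as 100, 110, and 120 would project to 130, 140, and 150 for the following quarter, which is the kind of figure a forecast slide could then chart.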

Using Projects for Brand Consistency

Claude’s Projects feature allows context to be saved across conversations. The feature can be used to maintain brand guidelines:

Add this to our project context:

BRAND GUIDELINES FOR ALL PRESENTATIONS:
- Primary color: #1a1a2e (Dark Navy)
- Secondary color: #1a73e8 (Electric Blue)
- Accent color: #e8f4fd (Light Blue)
- Font: Calibri for body, Calibri Light for headings
- Logo: Always place in top-right corner of title slide
- Footer: "Confidential — [Company Name] — [Date]" on every slide
- Slide numbers: Bottom-right, starting from slide 2
- Chart style: Minimal grid lines, data labels on bars
- Maximum 6 bullet points per slide, maximum 8 words per bullet

Every presentation that Claude is asked to create within that Project then follows these guidelines automatically.

From Research to Deck: Web Search Integration

Cowork can browse the web. It can therefore research a topic and build a presentation from the resulting findings:

I need a presentation on "The State of AI in Healthcare — 2026" for
a healthcare conference.

Please:
1. Research the latest trends, statistics, and key players in AI healthcare
2. Find 3-4 compelling case studies of AI improving patient outcomes
3. Get market size data and growth projections
4. Compile everything into a 15-slide presentation
5. Include source citations on each slide
6. Add a references slide at the end

Target audience: Hospital administrators (non-technical).
Focus on ROI and patient outcomes, not technical architecture.

Cowork opens a browser, searches for relevant information, compiles findings, and builds a fully sourced presentation in a single workflow.

Prompt Engineering for Better Presentations

The quality of an AI-generated presentation depends heavily on the quality of the prompt. The templates below consistently produce strong results.

Effective Prompt Templates

Presentation Type	Key Prompt Elements	Example Snippet
Pitch Deck	Problem, solution, market size, traction, team, ask	“Create a 12-slide Series A pitch… $2M ARR, raising $8M…”
Business Review	KPIs, period comparison, wins, challenges, outlook	“10-slide QBR… revenue $4.2M (+18% YoY)… Q2 priorities…”
Technical Architecture	Current state, target state, migration plan, risks	“Architecture deck for engineering… monolith to microservices…”
Sales Proposal	Customer pain, solution fit, ROI, pricing, vs. competition	“Proposal for Fortune 500 retailer… competing against Informatica…”
Training / Onboarding	Learning objectives, step-by-step content, quizzes	“15-slide onboarding deck for new engineers… include quiz…”
Conference Talk	Narrative arc, audience level, demo placeholders, Q&A	“30-minute keynote on AI trends… for non-technical CxOs…”
Board Update	Financial summary, strategic progress, risks, asks	“Board deck… focus on runway, burn rate, strategic milestones…”

Tips for Writing Effective Prompts

Always specify the audience. A presentation for engineers differs substantially from one for investors. Telling Claude who will be in the room shapes vocabulary, level of detail, and persuasion strategy.

State the number of slides. Without an explicit target, Claude may produce eight slides or thirty. Specify clearly, for example “Create exactly 12 slides.”

Define the tone. “Professional but approachable” yields different results from “formal and data-heavy” or “energetic and startup-oriented.” A few adjectives provide useful direction.

Include real data. The principal difference between a generic AI deck and a useful one is the presence of real numbers. Supplying actual metrics renders the resulting presentation immediately actionable.

Request speaker notes. Even when the material is familiar, talking points reduce preparation time. A useful request is “detailed speaker notes with timing estimates for each slide.”

Specify design constraints. Brand colours, preferred fonts, layout preferences (minimal versus data-dense), and a light or dark theme should be stated.

Indicate what to exclude. Constraints such as “No clip art. No stock photo clichés. No slides with more than 20 words.” often improve output quality more effectively than additive instructions.
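For teams that write many such prompts, the tips above can be captured in a small reusable structure. The sketch below is a hypothetical helper, not part of any Claude API: the class name, field names, and output wording are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class DeckBrief:
    """One field per tip above; the class itself is a hypothetical helper."""
    topic: str
    audience: str
    slide_count: int
    tone: str
    data_points: list = field(default_factory=list)
    design: str = ""
    exclusions: list = field(default_factory=list)
    speaker_notes: bool = True

    def to_prompt(self):
        """Assemble the fields into a prompt, skipping any that are empty."""
        lines = [
            f"Create exactly {self.slide_count} slides on: {self.topic}",
            f"Audience: {self.audience}",
            f"Tone: {self.tone}",
        ]
        if self.data_points:
            lines.append("Use this data:")
            lines.extend(f"- {point}" for point in self.data_points)
        if self.design:
            lines.append(f"Design: {self.design}")
        if self.speaker_notes:
            lines.append(
                "Include detailed speaker notes with timing estimates for each slide."
            )
        if self.exclusions:
            lines.append("Do not include: " + "; ".join(self.exclusions))
        return "\n".join(lines)
```

Calling `DeckBrief(...).to_prompt()` yields text that can be pasted into Cowork, and a missing audience or slide count surfaces as an error at construction time rather than as a vague deck.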

Comparison: Claude Cowork and Other AI Presentation Tools

Claude Cowork is not the only AI tool that supports presentation creation. Its position relative to alternative tools is summarised below.

Feature	Claude Cowork	Microsoft Copilot	Gamma.app	Beautiful.ai	SlidesGPT
Creates .pptx files	Yes (both methods)	Yes (native)	Export only	Export only	Yes
Works with existing PPT	Yes (computer use)	Yes (native)	No	No	No
Data-driven charts	Yes (python-pptx)	Yes (Excel integration)	Limited	Limited	Basic
Programmatic/scriptable	Yes (Python scripts)	No	API only	No	API only
Web research built in	Yes	Yes (Bing)	Yes	No	No
Scheduled automation	Yes (Cowork tasks)	No	No	No	No
Design quality (out of box)	Good (needs guidance)	Good (uses PPT themes)	Excellent	Excellent	Average
General AI assistant	Yes (full Claude)	Limited to Office	Presentations only	Presentations only	Presentations only
Price	$20/mo (Pro)	$30/mo (M365 Copilot)	$10/mo (Plus)	$12/mo (Pro)	$4.17/deck

When to choose Claude Cowork: Cowork is appropriate when maximum flexibility is required, that is, when a single tool must create presentations and also write code, analyse data, conduct research, and automate recurring workflows. It is the strongest option when presentation needs extend beyond well-designed slides into data analysis, scripting, and multi-step automation.

When to choose Copilot: Copilot is appropriate for users already embedded in the Microsoft ecosystem who want seamless integration with Excel, Word, and Teams. It operates natively inside PowerPoint, which provides better theme support and fewer formatting irregularities.

When to choose Gamma or Beautiful.ai: These tools are appropriate when design quality is the principal concern and PowerPoint compatibility is not required. They produce visually striking decks with minimal effort, although the user is bound to their respective ecosystems.

Limitations and Workarounds

No tool is without weaknesses. A candid assessment of where Cowork’s presentation capabilities encounter limits, together with corresponding workarounds, is provided below.

Computer Use Precision

The limitation: Cowork’s computer use is in research preview. It interprets the screen via screenshots and therefore occasionally misclicks, selects the wrong menu item, or places text in the wrong text box. Complex PowerPoint interfaces with many nested menus can lead to confusion.

The workaround: Use the python-pptx method for presentations that require pixel-perfect precision, and reserve computer use for simpler decks or for editing existing presentations where Claude can be guided step by step. Zooming in on a specific slide also helps Claude focus on one element at a time.

Complex Animations and Transitions

The limitation: Although Cowork can apply basic transitions such as fade and slide, complex animation sequences—such as bullet points appearing one by one with specific timing or morphing between slides—are difficult to achieve through computer use and are not fully supported in python-pptx.

The workaround: Claude should build the content and static design, with animations added manually afterwards. Animating a finished deck requires substantially less time than building one from scratch. Alternatively, Claude can be asked to document the animation plan, for example: “Slide 5: bullets should appear on click, one at a time, with a 0.3s fade-in.”

Image-Heavy Presentations

The limitation: Claude cannot generate images, since it is a language model rather than an image generator. Cowork can search the web for images and insert them, but the results may not match the user’s brand aesthetic, and copyright considerations apply.

The workaround: Claude should be asked to create placeholder boxes with descriptive labels such as “[Photo: Team celebrating product launch]” or “[Chart: Market size growth 2020–2026].” The user or a designer can then replace these with actual assets. For icons, Claude can suggest free icon libraries such as Google Material Icons or Feather Icons.

Custom Template Compliance

The limitation: If the user’s organisation requires a strict PowerPoint template with custom slide masters, layouts, and placeholders, Cowork may not navigate the template perfectly through computer use.

The workaround: python-pptx should be used with the organisation’s template file as the base:

from pptx import Presentation

# Load the company template; its slide masters and layouts come with it
prs = Presentation('company_template.pptx')

# Pick a layout by position. Index 1 is assumed to be the content layout,
# so confirm the index against the actual template
slide_layout = prs.slide_layouts[1]
slide = prs.slides.add_slide(slide_layout)

# Placeholders are looked up by their idx value, not by list position.
# idx 0 is normally the title; the body idx depends on the layout
title = slide.placeholders[0]
title.text = "Q1 Revenue Analysis"

body = slide.placeholders[1]
body.text = "Revenue grew 18% year-over-year..."

prs.save('branded_presentation.pptx')

This approach ensures that every slide uses approved layouts, fonts, and branding elements.
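Layout positions and placeholder idx values differ from one template to another, so it can help to list them before writing any slide code. The sketch below is one possible way to do that; the function name is hypothetical, and it falls back to python-pptx's default template when no path is given.

```python
def describe_layouts(template_path=None):
    """Return [(layout position, layout name, [(placeholder idx, name), ...])]."""
    # Imported inside the function; requires `pip install python-pptx`.
    from pptx import Presentation

    prs = Presentation(template_path)  # None loads the library's default template
    report = []
    for position, layout in enumerate(prs.slide_layouts):
        placeholders = [
            (ph.placeholder_format.idx, ph.name) for ph in layout.placeholders
        ]
        report.append((position, layout.name, placeholders))
    return report
```

Printing the result for a company template shows which layout position and which idx values to use when adding slides and filling placeholders.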

Very Large Presentations

The limitation: For decks exceeding 30–40 slides, computer use can become slow and may lose context regarding earlier slides. python-pptx scripts can also become unwieldy at scale.

The workaround: Large presentations should be broken into sections. Claude can be asked to create slides 1–15, the result reviewed, and slides 16–30 added subsequently. For python-pptx, modular functions (one function per section) keep the code maintainable.

Caution: AI-generated presentations should always be reviewed before they are shared externally. Data accuracy, spelling of names and company-specific terms, and the fidelity of charts to the underlying data must be verified. AI systems can fabricate numbers or subtly misrepresent trends when the source data is ambiguous.

Best Practices for AI-Generated Presentations

The following practices have consistently produced the strongest results across extensive use of Claude Cowork.

Always Review and Refine

AI-generated slides should be treated as a first draft, not a final product. Claude advances the user 80–90% of the way to completion in a fraction of the usual time. The final 10–20%—personal touches, precise data verification, and nuances known only to the author—is what makes a presentation truly excellent.

A review checklist should be built:

Are all numbers accurate and up to date?
Do charts correctly represent the data?
Are company names, product names, and people’s names spelled correctly?
Does the narrative flow logically from slide to slide?
Is the tone appropriate for the audience?
Are there any claims that need citations?

Maintain Brand Consistency

Claude’s Projects feature should be used to store brand guidelines including colours, fonts, logo placement, and slide layouts. This eliminates the need to repeat brand instructions in every prompt and ensures consistency across all presentations.

A more robust approach is to create a python-pptx base module containing the brand settings:

# brand.py — import this in all presentation scripts
from pptx.dml.color import RGBColor
from pptx.util import Pt

# Company colors
PRIMARY = RGBColor(0x1a, 0x1a, 0x2e)
SECONDARY = RGBColor(0x1a, 0x73, 0xe8)
ACCENT = RGBColor(0xe8, 0xf4, 0xfd)
TEXT_DARK = RGBColor(0x33, 0x33, 0x33)
TEXT_LIGHT = RGBColor(0xFF, 0xFF, 0xFF)
SUCCESS = RGBColor(0x27, 0xAE, 0x60)
WARNING = RGBColor(0xE7, 0x4C, 0x3C)

# Typography
HEADING_SIZE = Pt(32)
SUBHEADING_SIZE = Pt(24)
BODY_SIZE = Pt(18)
CAPTION_SIZE = Pt(12)

# Standard settings
FONT_FAMILY = "Calibri"
MAX_BULLETS_PER_SLIDE = 6
MAX_WORDS_PER_BULLET = 8

Keep Slides Minimal

The most common error in presentations, whether AI-generated or not, is excess text on each slide. The following guidelines should be followed:

6×6 rule: A maximum of six bullet points per slide and six words per bullet.
One idea per slide. A slide that covers two topics should be split into two slides.
Allow visuals to breathe. White space is not wasted space; it is design.
Use the speaker notes for detail. A slide is a visual aid, not a document. Details should be placed in the notes and spoken aloud.

These principles should be stated to Claude at the outset, for example: “Follow the 6×6 rule. Keep slides minimal. Place detailed information in the speaker notes rather than on the slides.”
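The 6×6 rule is also easy to check mechanically during review. The function below is a minimal sketch with a hypothetical name; it takes one slide's bullets as plain strings and counts words by splitting on whitespace, which is a simplification.

```python
def check_slide_text(bullets, max_bullets=6, max_words=6):
    """Return a list of human-readable rule violations for one slide."""
    problems = []
    if len(bullets) > max_bullets:
        problems.append(f"{len(bullets)} bullets (limit {max_bullets})")
    for index, bullet in enumerate(bullets, start=1):
        words = len(bullet.split())
        if words > max_words:
            problems.append(f"bullet {index} has {words} words (limit {max_words})")
    return problems
```

An empty list means the slide passes. With python-pptx, the bullets for a slide could be gathered from the `text` of each paragraph in a placeholder's text frame, and the limits can be changed for house styles that allow longer bullets.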

Add Custom Data Visualisations

Although python-pptx can produce basic charts and Cowork can use PowerPoint’s built-in chart tools, the most important visualisations deserve dedicated attention. Options include:

Creating charts in Excel or Google Sheets first and then pasting them into the deck.
Using Python libraries such as matplotlib or plotly to generate chart images, which are then inserted into slides.
Using dedicated data visualisation tools such as Tableau or Power BI for complex dashboards, with the relevant views captured as screenshots.

Claude can be asked to generate the chart code separately:

Generate a matplotlib chart showing our revenue trend:
Q1 2025: $2.1M, Q2: $2.8M, Q3: $3.1M, Q4: $3.6M, Q1 2026: $4.2M

Style it with our brand colors. Save as revenue_chart.png at 300 DPI.
Then insert it into slide 3 of the presentation.
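The chart code that such a prompt produces might resemble the sketch below. The revenue figures come from the prompt above; reading Q2 to Q4 as 2025 quarters is an assumption, the two hex colours are borrowed from the example brand palette earlier in this article, and the function names are hypothetical.

```python
QUARTERS = ["Q1 2025", "Q2 2025", "Q3 2025", "Q4 2025", "Q1 2026"]
REVENUE_M = [2.1, 2.8, 3.1, 3.6, 4.2]  # USD millions, taken from the prompt


def quarter_over_quarter_growth(values):
    """Percentage change between consecutive quarters, to one decimal place."""
    return [round((b - a) / a * 100, 1) for a, b in zip(values, values[1:])]


def render_revenue_chart(path="revenue_chart.png"):
    """Draw the revenue trend and save it as a 300 DPI PNG."""
    # Imported inside the function; requires `pip install matplotlib`.
    import matplotlib
    matplotlib.use("Agg")  # headless backend: render to a file, no display needed
    import matplotlib.pyplot as plt

    positions = list(range(len(QUARTERS)))
    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.plot(positions, REVENUE_M, marker="o", color="#1a73e8", linewidth=2)
    for x, value in zip(positions, REVENUE_M):
        # Label each point just above its marker
        ax.annotate(f"${value}M", (x, value), textcoords="offset points",
                    xytext=(0, 8), ha="center", color="#1a1a2e")
    ax.set_xticks(positions)
    ax.set_xticklabels(QUARTERS)
    ax.set_ylabel("Revenue (USD millions)")
    ax.set_title("Quarterly Revenue", color="#1a1a2e")
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)  # minimal frame
    fig.savefig(path, dpi=300, bbox_inches="tight")
    plt.close(fig)
    return path
```

The saved PNG can then be placed on a slide with python-pptx's `slide.shapes.add_picture(path, left, top, width=...)`, and the growth helper provides figures for the speaker notes.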

Version-Control Presentation Code

For users of the python-pptx method, presentation scripts should be treated as any other code:

Scripts should be kept in a Git repository.
Meaningful file names should be used, for example q1_2026_qbr.py rather than presentation.py.
Data inputs should be parameterised so that the same script can generate decks for different quarters.
A short README explaining how to run each script should accompany the scripts.

This practice is particularly valuable for recurring presentations: a Q2 deck is only a data update away from the Q1 script.
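Parameterising the inputs can be as simple as a command-line interface built with argparse. The sketch below assumes hypothetical flag names (--quarter, --year, --data, --out-dir) and an output name that follows the q1_2026_qbr pattern mentioned above.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the inputs that change from one quarter to the next."""
    parser = argparse.ArgumentParser(
        description="Generate a quarterly business review deck."
    )
    parser.add_argument("--quarter", required=True, choices=["Q1", "Q2", "Q3", "Q4"])
    parser.add_argument("--year", required=True, type=int)
    parser.add_argument("--data", required=True, type=Path,
                        help="CSV file holding the quarter's figures")
    parser.add_argument("--out-dir", type=Path, default=Path("."))
    return parser.parse_args(argv)


def deck_path(args):
    """Build an output name such as q2_2026_qbr.pptx inside the output folder."""
    return args.out_dir / f"{args.quarter.lower()}_{args.year}_qbr.pptx"
```

The Q2 deck then becomes a matter of rerunning the same script with `--quarter Q2 --year 2026 --data` pointing at the new CSV, with no code changes.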

Use an Iterative Approach

It is not advisable to attempt a perfect presentation in a single prompt. Instead, the following passes are recommended:

First pass: Generate the structure and core content.
Second pass: Refine the narrative. Claude should be asked to improve flow, strengthen the opening, and sharpen the conclusion.
Third pass: Polish the design, adjust colours, fix alignment, and ensure consistency.
Final pass: Add speaker notes, verify data, and conduct a full review.

Each pass takes a fraction of the time required to produce everything from scratch, and the iterative approach yields substantially better results than attempting to achieve everything in a single attempt.

Final Thoughts

Creating presentations has long been a task that many professionals dread: time-consuming, creatively demanding, and often producing underwhelming results. Claude Cowork substantially changes this calculus.

With three distinct methods available—direct computer use for hands-off creation, python-pptx for programmatic precision, and structured outlines for creative control—the appropriate approach can be matched to each situation. A quick internal update may warrant the speed of computer use. A recurring board deck calls for a parameterised Python script. A high-stakes keynote benefits from Claude’s strategic outline combined with a personal design touch.

The key insight is that Claude Cowork is not merely a presentation tool but a general-purpose AI agent that happens to be effective at presentations. It can research a topic, analyse data, write content, build slides, and automate the entire process on a schedule. No other single tool offers that breadth.

The recommended starting point is a simple deck. The computer use method should be tried first to observe Claude opening PowerPoint and building slides in real time. Python-pptx can then be explored for a data-driven report. The eight hours per week spent on manual creation will soon appear unnecessary.

The next strong presentation is one prompt away.

April 4, 2026

Claude Cowork: Anthropic’s Desktop AI Agent That Works While You Sleep

Summary

What this post covers: A detailed examination of Claude Cowork, Anthropic’s desktop-first autonomous agent launched on 16 January 2026, including its capabilities, the January-March 2026 release timeline, the manner in which it differs from Claude Code, pricing, real-world use cases, and the competitive landscape.

Key insights:

Cowork is positioned for non-technical knowledge workers, while Claude Code targets developers. Both run on the same Claude models, but Cowork emphasizes desktop control, Google Drive and Gmail integration, and phone dispatch rather than a CLI or IDE workflow.
The March 2026 computer-use update is the inflection point: Cowork can now click through GUIs, fill forms, and use applications that have no API, substantially expanding what can be automated beyond integration-supported tools.
Persistent Projects and scheduled tasks are the features that cause Cowork to function as a colleague rather than a chatbot. It retains context across sessions, dispatches work from a phone, and runs jobs overnight on a schedule.
At $20 per month for the Pro tier, the return-on-investment calculation is favourable for any user whose recurring research, reporting, or email-triage work consumes several hours per week. Those hours, rather than the subscription cost, represent the real expense being reduced.
Cowork remains a research preview: computer use can be unreliable on complex interfaces, the integration list is incomplete, and human oversight remains essential for any high-stakes deliverable.

Main topics: What Is Claude Cowork?, Key Features That Define Cowork, Claude Cowork and Claude Code, Real-World Use Cases Across Industries, Pricing and Plans, How Cowork Compares with the Competition, Getting Started with Claude Cowork, Limitations and Considerations, Likely Future Directions for Cowork, Conclusion, References.

Consider the experience of waking to find a weekly competitive analysis already compiled, the inbox triaged and summarized, and a polished research brief on the desktop, all completed overnight by an AI agent dispatched from a mobile device the prior evening. This scenario is not science fiction. As of early 2026, it describes an available product. The product is called Claude Cowork, and it represents one of the most significant shifts in how non-technical professionals interact with artificial intelligence.

Anthropic, the AI safety company behind the Claude family of models, launched Cowork as a research preview on 16 January 2026. Subsequent substantial updates in February and March 2026 have transformed it from a promising experiment into a tool that materially changes daily workflow for knowledge workers. Unlike traditional AI chatbots that require user attention at every step of a complex task, Cowork operates autonomously, executing multi-step workflows on the desktop computer while the user attends to higher-value work, or while the user sleeps.

This article examines in detail what Claude Cowork is, how it operates, the audience for which it is designed, the manner in which it differs from both Claude Code and competing products, and the procedure for beginning to use it. Readers who are researchers, analysts, operations managers, or any other professionals who spend substantial time on repetitive knowledge work will find a complete description here.

What Is Claude Cowork?

Claude Cowork is a desktop-first AI agent that brings agentic capabilities to non-technical users through the Claude desktop application. It functions as a capable virtual assistant that resides on the user’s computer and can perform actions rather than only suggesting what should be done.

The traditional AI assistant model proceeds as follows: the user asks a question, receives an answer, acts on it, returns with a follow-up, and so on. Each step requires active user involvement. Cowork breaks this pattern entirely. The user describes a task, such as “Research the top five competitors in the European EV charging market, compile their latest quarterly results, and create a comparison table in a Google Doc,” and Cowork executes the entire workflow from start to finish.

Key Takeaway: Claude Cowork is not a chatbot. It is an autonomous agent that executes multi-step tasks on the user’s desktop, accessing files, browsers, and tools without requiring intervention at each step.

The term “Cowork” is deliberate. Anthropic designed this product to function as a skilled colleague seated at a virtual desk beside the user. Tasks are delegated to it as they would be to a team member, with context, instructions, and the expectation that the work will be completed. The distinction is that this colleague operates at machine speed, retains instructions perfectly, and is available continuously.

The Research Preview Timeline

Cowork’s development has progressed rapidly since its initial launch:

Date	Milestone	Key Additions
January 16, 2026	Research Preview Launch	Core agentic workflows, local file access, Projects
February 2026	Integration Expansion	Google Drive, Gmail, scheduled tasks, phone dispatch
March 2026	Computer Use Update	Full desktop control, browser automation, expanded tool integrations

Each update has meaningfully expanded Cowork’s capabilities. The March 2026 computer use update was particularly significant, as it gave Cowork the ability to interact directly with the computer’s graphical interface, opening applications, clicking buttons, filling forms, and navigating websites in the manner of a human user.

Key Features That Define Cowork

The following sections examine the features that define Claude Cowork and make it genuinely useful in day-to-day work.

Multi-Step Task Execution

This is the foundational capability that distinguishes Cowork from a standard chatbot. Given a complex task, Cowork decomposes it into steps, executes each one, handles errors and edge cases, and delivers a completed result.

Consider the task of preparing a board-meeting brief. With a traditional AI assistant, the following sequence is required:

Ask for a summary of recent financial performance
Copy that output somewhere
Ask for a competitive landscape overview
Copy that too
Ask for key risk factors
Manually compile everything into a document
Format it properly

With Cowork, the user issues a single instruction: “Prepare my Q1 board meeting brief using the financial data in my Google Drive, our competitor tracker spreadsheet, and the risk register document. Format it as a polished PDF with our standard template.” Cowork then autonomously accesses each source, synthesizes the information, formats the document, and saves the finished product to the specified location.

Computer Use (March 2026)

The March 2026 update introduced full computer use capabilities, a transformative addition. Cowork can now perform the following actions:

Open and interact with desktop applications: word processors, spreadsheets, presentation software, email clients
Navigate web browsers: search the web, log into services, fill out forms, download files
Manipulate files: create, move, rename, and organize files and folders on the user’s system
Use specialized tools: interact with industry-specific software that does not provide an API integration

This functionality is what makes Cowork resemble a colleague rather than software. It can use the computer as a person would, clicking through interfaces, reading the screen, and taking appropriate actions. The implications for automation are considerable, because Cowork is not limited to applications with built-in API integrations. If a human can use an application through a graphical interface, Cowork can typically do so as well.

Caution: Computer use remains in its early stages. While capable, it occasionally misclicks or misreads screen elements. The output of computer-use tasks should always be reviewed, particularly for high-stakes work such as financial transactions or legal documents.

Local File Access

Among Cowork’s most practical features is its ability to read and write local files without the friction of manual uploads and downloads. Previous AI workflows required users to copy-paste text, upload documents to a web interface, wait for processing, and download the results. Cowork accesses the local file system directly.

The user can therefore direct Cowork at a folder of PDFs with an instruction such as “Summarize each document and create a master index,” and Cowork will process them in sequence without any manual file handling. For professionals who handle large volumes of documents (legal teams reviewing contracts, analysts processing earnings reports, researchers compiling literature reviews) this provides a substantial time saving.

Task Dispatch from a Phone

This is where the “works while the user sleeps” claim becomes literal. The user can message Claude from a phone, describe a task, and Cowork will execute it on the desktop computer. The desktop does not need to be actively in use; provided it is powered on and connected, Cowork can operate.

Consider the following scenario: while commuting home on the train, the user recalls the need for a summary of all customer-feedback emails from the past week for the following morning’s meeting. The user opens the phone and messages Claude: “Go through my Gmail, find all customer feedback emails from the past seven days, categorize the feedback by theme, and create a summary document on my desktop.” By the time the user arrives home, the work is complete.

Tip: For phone-dispatched tasks to operate reliably, the desktop Claude application should be running and the computer should not be in sleep mode. System power settings can be configured to prevent sleep during working hours.

Scheduled Tasks

Cowork supports scheduled tasks: recurring automated workflows that run on a defined cadence. Some useful examples include:

Daily morning briefing: Every day at 7 AM, Cowork compiles overnight news relevant to the user’s industry, checks the day’s calendar, and generates a one-page briefing document
Weekly report generation: Every Friday at 4 PM, Cowork pulls data from the user’s tracking spreadsheets and generates a formatted weekly status report
Automated file processing: Whenever new files appear in a designated folder, Cowork processes them according to the user’s instructions, whether that means extracting data, reformatting, or routing to the appropriate location
Email digests: Twice daily, Cowork scans the inbox, identifies high-priority items, and sends the user a categorized summary

This scheduled-task functionality moves Cowork from a reactive tool (the user asks, the tool acts) to a proactive one (the tool acts automatically according to user-defined rules). For teams with repetitive operational workflows, this capability alone can justify the subscription cost.

Projects: Persistent Workspaces

Projects are persistent workspaces within Cowork in which files, links, instructions, and context can be stored. The agent retains this material across sessions. A Project may be understood as a briefing folder for a specific area of work.

For example, a user might create a Project titled “Competitive Intelligence” containing the following:

Links to competitor websites and press pages
The company’s competitive positioning document
Instructions on how competitive updates should be formatted
Previous reports for style reference
A list of key metrics to track

When the user requests any task within that Project, this context is immediately available. There is no need to re-explain preferences or re-upload reference documents on each occasion. The agent accumulates institutional knowledge over time and becomes more useful with continued use within a given Project.

Tool Integrations

Cowork connects with a growing list of third-party services through direct integrations:

Category	Integrations	Key Capabilities
Productivity	Google Drive, Google Docs, Google Sheets	Read, create, and edit documents and spreadsheets
Communication	Gmail	Read, search, and draft emails
Legal / Contracts	DocuSign	Prepare and route documents for signature
Finance / Data	FactSet	Pull financial data, market metrics, and analytics
Web Research	Built-in web search	Search the web and internal document repositories

These integrations enable Cowork to execute end-to-end workflows that span multiple tools. A single task might involve retrieving data from FactSet, researching context on the web, creating a formatted report in Google Docs, and emailing the finished product via Gmail, all without the user touching any of these applications.

Web Research

Cowork can search both the open web and internal document repositories. This dual capability is particularly valuable for research tasks that require the combination of public information (market data, news, academic papers) with proprietary internal knowledge (company reports, internal wikis, prior analyses).

The web-research capability extends beyond simple search. Cowork can visit multiple pages, extract relevant information, cross-reference sources, and synthesize findings into coherent analysis. For research-intensive roles, this can compress hours of manual research into minutes.

Claude Cowork and Claude Code: Understanding the Difference

Readers already familiar with Claude Code may wonder how Cowork relates to it. The answer is straightforward: they are designed for fundamentally different users and use cases.

Dimension	Claude Code	Claude Cowork
Interface	Command-line terminal (CLI)	Desktop application (GUI)
Primary users	Software developers, DevOps engineers	Knowledge workers, analysts, researchers, operations teams
Core capability	Write, debug, and deploy code	Execute knowledge work tasks across desktop tools
Technical requirement	Terminal proficiency required	No terminal or coding skills needed
Execution environment	Shell, filesystem, git, package managers	Desktop apps, browsers, cloud services
Typical task	“Refactor this module and write tests”	“Compile a competitive analysis from these sources”
Computer use	No (operates via CLI)	Yes (can control desktop GUI)
Phone dispatch	No	Yes
Scheduled tasks	Via cron/CI (manual setup)	Built-in scheduling feature

The distinction may be summarized as follows: Claude Code is for users who work primarily in the terminal; Claude Cowork is for users who work primarily in documents, spreadsheets, and email.

There is some overlap. Both products can access local files, both can perform research, and both can execute multi-step tasks autonomously. The execution environment and target user profile, however, differ entirely. A software engineer building a web application requires Claude Code. A financial analyst constructing an investment thesis requires Claude Cowork.

Many advanced users will require both. A startup CTO might use Claude Code for development work during the day and Claude Cowork for business planning, investor communications, and market research. The two products complement rather than compete with one another.

Key Takeaway: Claude Code and Claude Cowork are companion products rather than competitors. Code targets developers through the CLI; Cowork targets knowledge workers through a desktop GUI. The choice should be guided by workflow, and both can be used together.

Real-World Use Cases Across Industries

The most effective method of understanding Cowork’s value is through concrete examples. The following detailed use cases span several professional domains.

Research and Analysis

A market-research analyst must compile a report on the state of autonomous-vehicle regulation across ten countries. Traditionally, this task requires two to three days of manual research, reading regulatory documents, cross-referencing sources, and constructing comparison tables.

With Cowork, the analyst creates a Project titled “AV Regulation Research” and provides instructions: which countries to cover, which regulatory dimensions to compare, the desired output format, and links to key regulatory-body websites. Cowork then performs the following steps:

Searches the web for the latest regulatory developments in each country
Accesses government regulatory databases where available
Reads through the analyst’s existing internal research documents in Google Drive
Cross-references all sources to build a comprehensive comparison
Creates a formatted report with comparison tables, source citations, and an executive summary
Saves the finished document to Google Drive and emails the analyst a notification

A task that previously required days is completed in hours, and the analyst’s expertise is applied to reviewing and refining the output rather than to manual data collection.

Financial Analysis

An investment analyst must prepare earnings-season coverage for a portfolio of twenty technology stocks. For each company, the analyst requires a summary of the earnings call, key financial metrics versus consensus, changes in management guidance, and a brief assessment of the quarter.

Cowork can retrieve data from FactSet, search the web for earnings-call transcripts and analyst commentary, compile metrics into standardized comparison tables, and generate individual company summaries together with a portfolio-level overview. The analyst can schedule this work to run automatically as each company reports, so that summaries are available the following morning.

Legal and Compliance

A legal team must review a set of vendor contracts for compliance with new data-privacy regulations. Each contract must be checked against a specific checklist of required clauses, and any gaps must be flagged.

Cowork can read each contract PDF, compare the terms against the compliance checklist stored in the Project, generate a gap analysis for each contract, and compile a summary report identifying compliant vendors and those that require contract amendments. For the non-compliant contracts, Cowork can also draft amendment language based on the team’s standard templates.

Operations and Administration

An operations manager runs a weekly process that requires downloading sales data from a CRM, combining it with inventory data from a separate system, generating a forecast update, and distributing it to regional managers. This process consumes three to four hours each week and involves multiple tools.

With Cowork’s scheduled-task feature, the entire workflow runs automatically every Friday. Cowork accesses the necessary systems (using computer use for applications without API integrations), processes the data, generates the forecast in the standard template, and emails the results to the distribution list. The operations manager reviews the output and approves the dispatch, a ten-minute task in place of a four-hour one.

Email Management

A senior executive receives two hundred or more emails per day. Most are informational, some require responses, and a few are genuinely urgent. Sorting through them constitutes a daily time sink.

Cowork can be configured to perform a twice-daily email triage: read all incoming emails, categorize them by priority and topic, draft responses for routine items (which the executive reviews before sending), flag truly urgent items for immediate attention, and generate a summary document indicating what has arrived and what requires action. This converts email management from an hour-long chore into a focused fifteen-minute review.

Quick Reference: Task Examples

Task	Traditional Approach	With Cowork	Time Saved
Weekly competitive report	4–6 hours manual research	Automated, 20 min review	~80%
Earnings call summaries (20 stocks)	2–3 days of reading/writing	Overnight batch processing	~85%
Contract compliance review (10 docs)	1–2 days legal review	2–3 hours + review	~70%
Daily email triage (200+ emails)	60–90 minutes per day	15-minute review	~75%
Market research report	2–3 days research and writing	4–6 hours + review	~65%
Weekly operations forecast	3–4 hours manual processing	Automated, 10 min review	~90%

Pricing and Plans

Anthropic offers Claude Cowork as part of its broader Claude subscription tiers. The current pricing structure is as follows:

Plan	Price	Cowork Access	Best For
Pro	$20/month	Basic Cowork features, limited task runs	Individual professionals testing agentic workflows
Max	$100–$200/month	Full Cowork with higher limits, priority execution	Power users running frequent or complex workflows
Team	$30/user/month	Cowork with team sharing, shared Projects	Small to mid-size teams collaborating on workflows
Enterprise	Custom pricing	Full Cowork, SSO, audit logs, admin controls, custom integrations	Large organizations with compliance and security requirements

For most individuals, the Pro plan at twenty dollars per month is a reasonable starting point for exploring Cowork’s capabilities. Users who routinely encounter usage limits or operate complex multi-tool workflows will find that the Max tier removes those constraints. Teams that require shared Projects and collaborative workflows should consider the Team plan, while enterprises with specific compliance requirements will require the custom Enterprise tier.

Tip: Starting on the Pro plan is sufficient to evaluate Cowork against specific use cases. Users can upgrade to Max or Team once they understand how Cowork fits into their workflow and how much capacity they require; there is no need to commit to a higher tier in the first month.

The value proposition becomes clear when the subscription cost is compared to the time savings. If Cowork saves an analyst even five hours per week, a conservative estimate based on the use cases described above, that amounts to approximately twenty hours per month. At a fully loaded cost of fifty to one hundred dollars per hour for a knowledge worker, that is roughly $1,000 to $2,000 in recovered time each month, several times the Max plan’s $100–$200 subscription fee. The economics are compelling even at modest adoption levels.
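The arithmetic behind this estimate can be checked with a short sketch. The hours saved, the hourly cost range, and the Max plan fee are taken from the text above; the four-weeks-per-month simplification and the function name are assumptions made for illustration.

```python
def monthly_savings(hours_saved_per_week: float,
                    hourly_cost: float,
                    weeks_per_month: float = 4.0) -> float:
    """Dollar value of the time recovered in one month (assumes 4 weeks per month)."""
    return hours_saved_per_week * weeks_per_month * hourly_cost


# Figures from the article: 5 hours/week saved, $50 to $100 per hour fully loaded.
low_estimate = monthly_savings(5, 50)
high_estimate = monthly_savings(5, 100)

# Upper end of the Max plan price listed in the pricing table.
max_plan_fee = 200

print(f"Recovered time: ${low_estimate:,.0f} to ${high_estimate:,.0f} per month")
print(f"Low estimate vs. Max plan fee: {low_estimate / max_plan_fee:.1f}x")
```

Even the low end of the range covers the most expensive individual plan several times over, which is the point the paragraph makes. The real figure depends on how many of the saved hours are put to productive use.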

How Cowork Compares with the Competition

Claude Cowork does not exist in isolation. Microsoft, Google, and OpenAI each have competing visions for AI-assisted work. The following table compares the principal offerings.

Feature	Claude Cowork	Microsoft Copilot	Google Gemini Workspace	OpenAI Desktop App
Autonomous multi-step tasks	Strong	Moderate	Moderate	Basic
Computer use (GUI control)	Yes	No	No	Limited
Local file access	Yes	Via OneDrive/SharePoint	Via Google Drive	Limited
Phone dispatch	Yes	No	No	No
Scheduled tasks	Built-in	Via Power Automate	Limited	No
Persistent workspaces	Projects	Notebooks	Gems	Custom GPTs
Ecosystem lock-in	Low (cross-platform)	High (Microsoft 365)	High (Google Workspace)	Low
Third-party integrations	Growing (FactSet, DocuSign, etc.)	Deep Microsoft ecosystem	Deep Google ecosystem	Limited
Underlying model quality	Claude (top-tier reasoning)	GPT-4 variants	Gemini models	GPT-4 variants

Areas in Which Cowork Excels

Cowork’s principal advantages are its computer-use capability, phone dispatch, and low ecosystem lock-in. Microsoft Copilot performs well for organizations entirely within the Microsoft 365 ecosystem, but it struggles with tools outside that environment. Google Gemini exhibits the same limitation: capable within Google Workspace but constrained outside it. Cowork’s computer-use feature enables operation with virtually any application, regardless of whether a formal integration exists.

The phone-dispatch feature is also unique among current competitors and represents a genuine workflow innovation. The ability to conceive a task away from one’s desk and immediately dispatch it for execution is not currently available from the major competitors.

Areas in Which Competitors Excel

Microsoft Copilot benefits from deep, native integration with the most widely used office suite. For organizations operating on Microsoft 365, Copilot’s integration with Word, Excel, PowerPoint, Teams, and Outlook is seamless in a way that Cowork cannot fully replicate through external integrations alone.

Similarly, for organizations fully committed to Google Workspace, Gemini’s native integration provides a smoother experience for tasks that remain within the Google ecosystem. The experience of using Gemini inside a Google Doc or Sheet is more refined than having an external agent interact with those same tools.

OpenAI’s desktop app, while currently the least capable of the four in terms of agentic features, benefits from GPT-4’s strong general capabilities together with OpenAI’s substantial user base and brand recognition.

The Principal Differentiator: Agent-First Design

The aspect that most distinguishes Cowork is its agent-first design philosophy. Microsoft and Google added AI capabilities on top of existing productivity suites. Copilot is essentially an intelligent overlay on Office, and Gemini is an intelligent overlay on Workspace. Cowork was built from the outset as an autonomous agent. The difference is evident in how it handles complex, multi-step workflows that span multiple tools and data sources.

When a task requires retrieving data from three sources, combining it, applying analysis, and distributing results across two platforms, Cowork’s agent architecture handles this naturally. Copilot and Gemini, designed primarily for in-app assistance, can struggle with workflows that cross application boundaries.

Getting Started with Claude Cowork

The following step-by-step procedure describes how to begin using Cowork.

Enable Cowork in the Claude Desktop App

Download Claude Desktop. If it is not already installed, the Claude desktop application should be downloaded from claude.ai. It is available for macOS and Windows.
Subscribe to a paid plan. Cowork requires at least a Pro subscription ($20 per month). Log in to the Claude account and upgrade if necessary.
Enable Cowork. Open the Claude desktop application, navigate to Settings, and locate the Cowork section. Toggle it on. Additional permissions for local file access and computer use may be required.
Grant permissions. Cowork will request permissions to access the filesystem, the screen, and any integrations to be used. These should be reviewed carefully, and only the relevant ones should be enabled.

Caution: Granting computer-use permissions allows Cowork to control the mouse and keyboard. This capability should be enabled only for tasks in which automated desktop control is acceptable, and the agent’s actions should always be reviewed for sensitive operations.

Set Up the First Task

A simple task is appropriate as the first exercise. The following is a suitable example:

Task: "Read the PDF files in my Documents/Reports folder,
create a one-paragraph summary of each, and compile them
into a single document called 'Report Summaries' on my Desktop."

This task exercises several Cowork capabilities, namely local file access, document reading, text generation, and file creation, while remaining low-stakes enough that the user can readily verify the output.

As familiarity grows, more complex tasks can be attempted:

Week 1: Simple file processing and summarization tasks
Week 2: Multi-source research tasks (combine web research with local documents)
Week 3: Set up your first Project with persistent context
Week 4: Configure scheduled tasks and try phone dispatch

Configure Integrations

To obtain maximum value from Cowork, the services used daily should be connected:

Google Drive: Settings > Integrations > Google Drive > Authorize. This grants Cowork read/write access to Drive files.
Gmail: Settings > Integrations > Gmail > Authorize. This enables email reading, searching, and drafting.
Additional services: The Integrations panel should be reviewed for newly added services. Anthropic is adding integrations regularly during the research preview.

Create the First Project

Projects are the mechanism through which Cowork’s value compounds over time. The procedure for creating one is as follows:

Open the Claude desktop application and navigate to the Projects section.
Click “New Project” and provide a descriptive name.
Add relevant files, links, and reference documents.
Write a set of instructions describing preferences, standards, and common tasks for the domain.
Begin assigning tasks within the Project context.

A well-configured Project substantially improves Cowork’s output quality because the agent has all the context required to produce work that matches the user’s standards and preferences.

Tip: Examples of past work should be included in Projects. If Cowork is to produce weekly reports, two or three examples of well-prepared past reports should be uploaded. Cowork learns style and formatting preferences from these examples.

Set Up Scheduled Tasks

Once a task is ready to run regularly, the following procedure applies:

Run the task manually first to confirm that it produces the desired output.
Open the task and click “Schedule” (or create a new scheduled task).
Set the frequency (daily, weekly, or a custom cron expression).
Set the time of day for execution.
Choose whether to receive a notification on task completion.
Optionally set conditions, for example, run only if new files are present in a specific folder.
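For the custom cron option in the frequency step, the following is a minimal sketch of how a weekly schedule maps onto a standard five-field cron expression (minute, hour, day of month, month, day of week). Whether Cowork accepts exactly this syntax is an assumption and should be confirmed in the scheduling dialog; the helper name `weekly_cron` is hypothetical.

```python
# Day-of-week numbers in standard cron: 0 = Sunday ... 6 = Saturday.
DAYS = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def weekly_cron(day: str, hour: int, minute: int = 0) -> str:
    """Build a five-field cron expression for a once-a-week run."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    key = day.lower()[:3]
    if key not in DAYS:
        raise ValueError(f"unknown day: {day!r}")
    # Fields: minute hour day-of-month month day-of-week
    return f"{minute} {hour} * * {DAYS[key]}"


# The weekly operations forecast from the use cases, run every Friday at 09:00.
friday_forecast = weekly_cron("Friday", 9)
print(friday_forecast)  # 0 9 * * 5
```

The asterisks in the third and fourth fields mean "every day of the month" and "every month", so only the day-of-week field restricts the run to Fridays.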

One or two scheduled tasks form a reasonable starting point, with expansion from there. A few reliable automated workflows are preferable to a dozen unreliable ones.

Limitations and Considerations

No product review is complete without an honest assessment of limitations. Cowork, still in research preview, has several important limitations.

Research Preview Status

As of April 2026, Cowork remains labelled as a research preview. The implications are as follows:

Features may change, be removed, or be restructured
Reliability, while generally good, is not at production-grade levels for all features
Rate limits and usage caps may shift as Anthropic refines pricing
Some integrations are early-stage and may have rough edges

For critical business processes, human oversight should be retained, and exclusive reliance on Cowork for time-sensitive deliverables should be avoided until the product exits research preview.

Privacy and Data Considerations

Granting Cowork access to local files, email, and cloud storage entails providing an AI system with access to potentially sensitive information. Key considerations include the following:

Data handling: Anthropic’s data-retention policies should be understood. The privacy documentation indicates what data is stored, for how long, and how it is used.
Sensitive documents: Care should be exercised in selecting files and folders to which access is granted. Specific folder permissions can be configured rather than blanket filesystem access.
Email access: Gmail integration permits Cowork to read emails. Whether the inbox contains information that should not be processed by an AI system should be considered.
Computer-use recording: When computer use is active, Cowork captures screenshots to understand the screen contents. This should be borne in mind when sensitive information is displayed.

Caution: Enterprise users should coordinate with their IT and security teams before deploying Cowork. The Enterprise plan includes SSO, audit logs, and administrative controls designed for organizations with strict data-governance requirements.

What Cowork Cannot Do at Present

Real-time collaboration: Cowork operates asynchronously. It cannot join a live meeting and take notes in real time, although it can process meeting recordings after the fact.
Physical actions: It can control the computer but cannot perform any action in the physical world; it cannot print, sign physical documents, or manage physical inventory.
Perfect accuracy on all tasks: As with all AI systems, Cowork can make mistakes. It may misinterpret instructions, miss nuances in documents, or produce inaccurate summaries. Human review remains essential.
Highly specialized domain work: Although Cowork performs well on general knowledge work, tasks that require deep domain expertise (advanced scientific analysis, complex legal strategy, nuanced medical interpretation) continue to require expert human oversight.
Cross-organization workflows: Cowork operates within the user’s own systems and accounts. It cannot directly interact with a colleague’s computer or access systems for which the user lacks credentials.

Setting Reliability Expectations

In practice, Cowork handles straightforward multi-step tasks with high reliability. File processing, research compilation, report generation, and similar workflows succeed consistently. More complex tasks involving computer use, particularly those that navigate unfamiliar or complex user interfaces, exhibit higher failure rates. The recommendation is to begin with simpler tasks and gradually increase complexity as the system’s capabilities and boundaries are understood.

Likely Future Directions for Cowork

Although Anthropic has not published a detailed public roadmap for Cowork, several directions appear likely based on the trajectory of updates and broader industry trends.

Expanded Integrations

The current integration list (Google Drive, Gmail, DocuSign, FactSet) is solid but narrow relative to the universe of business tools. Integrations with CRM platforms such as Salesforce and HubSpot, project-management tools such as Jira and Asana, communication platforms such as Slack and Microsoft Teams, and data-visualization tools such as Tableau and Power BI can be anticipated. Each new integration expands the range of end-to-end workflows that Cowork can automate.

Improved Computer Use

Computer use is Cowork’s most ambitious feature and the one with the most room for improvement. Future updates are likely to bring faster execution, more reliable interaction with complex UIs, improved error recovery, and support for additional applications and web interfaces. As this capability matures, it effectively removes the need for formal integrations for many applications: if Cowork can use the application through its GUI, a dedicated integration becomes optional rather than required.

Enterprise Features

Enterprise adoption requires features that individual users do not need: role-based access controls, detailed audit trails, data-loss-prevention policies, custom model fine-tuning, on-premises deployment options, and integration with enterprise identity-management systems. Substantial investment in this area is expected, since enterprise contracts represent the most significant revenue opportunity for AI platform companies.

Multi-Agent Collaboration

A particularly notable possibility is multi-agent workflows in which several Cowork agents collaborate on a single task. A complex project such as preparing a company’s annual report might be assigned to multiple agents: one handling financial-data analysis, another market research, a third competitor analysis, and a coordinating agent assembling the final document. This divide-and-conquer approach to knowledge work could substantially expand the scope and complexity of tasks Cowork can handle.

Learning and Adaptation

Over time, Cowork should improve at understanding individual users’ preferences, work styles, and quality standards. The Projects feature already enables some of this through explicit instructions and examples. Future versions may learn more implicitly, recognizing, for example, that the user consistently prefers tables to bullet points, prefers executive summaries to be a single paragraph, or prefers financial figures rounded to one decimal place. Such passive learning could substantially reduce the amount of upfront configuration required.

Conclusion

Claude Cowork represents a genuine advance in how non-technical professionals can use AI. It is not merely another chatbot with a new interface. It is a fundamentally different approach to AI-assisted work: an autonomous agent that resides on the desktop, understands the user’s context through persistent Projects, connects to tools through integrations and computer use, and operates even when the user is not actively directing it.

The principal innovations (multi-step task execution, computer use, phone dispatch, scheduled tasks, and persistent Projects) combine to create something that resembles a digital colleague more than a tool. The practical impact is real: tasks that traditionally consumed hours or days of manual work can be completed in a fraction of the time, with the user’s expertise focused on review, refinement, and decision-making rather than on data gathering and formatting.

Is Cowork without limitations? No. It remains in research preview; computer use can be unreliable on complex interfaces; the integration list is still expanding; and human oversight remains essential for high-stakes work. The trajectory, however, is clear. Each monthly update has brought meaningful improvements, and the foundation (an agent-first architecture combined with one of the most capable language models available) is strong.

For knowledge workers who spend substantial time on research, report generation, data compilation, email management, or document processing, Cowork is worth evaluating now. A practical approach is to take a Pro subscription, build a Project around the most time-consuming recurring task, and measure the resulting time savings. The twenty-dollar monthly investment can readily return hundreds of dollars in reclaimed productive hours.

The era of AI that waits for the next prompt is yielding to an era of AI that works alongside the user, and at times in advance of the user. Claude Cowork is one of the most compelling products driving that transition.

Disclaimer: This article is for informational purposes only and does not constitute investment advice. Product features, pricing, and availability may change. Always verify current details directly with Anthropic before making purchasing decisions.

April 4, 2026

Claude in 2026: Everything New in Anthropic’s Most Powerful AI Model Family

Summary

What this post covers: A comprehensive 2026 examination of the Claude ecosystem: the Opus/Sonnet/Haiku model family, Claude Code, extended thinking, MCP, the API/SDK, safety practices, and Claude’s position relative to GPT-4o, Gemini 2.5, Llama 4, and DeepSeek.

Key insights:

Claude Opus 4.6 currently leads composite benchmarks on coding (SWE-bench Verified), scientific reasoning (GPQA Diamond), and mathematics (MATH-500), placing Anthropic, rather than OpenAI or Google, at the frontier of reasoning quality in early 2026.
The three-tier structure is a cost-and-quality routing mechanism rather than a hierarchy: Sonnet 4.6 ($3/$15 per M tokens) is the appropriate default for most production workloads, with Opus reserved for difficult reasoning and Haiku 4.5 for high-volume routing or classification.
Claude Code is the most concrete differentiator: an agentic CLI/IDE tool that autonomously navigates codebases, edits multiple files, runs tests, and commits, rather than offering Copilot-style inline suggestions.
The Model Context Protocol (MCP) is becoming a de facto industry standard for connecting LLMs to tools and data sources, and is the integration layer on which most enterprise Claude deployments are now built.
No single “best” model exists: Claude leads on coding and reasoning, Gemini on context length and Google integration, Llama and DeepSeek on cost and openness, and GPT-4o on multimodal breadth. Selection should be governed by workload rather than by brand.

Main topics: Introduction, The Claude Model Family in 2026, Claude Code, Extended Thinking, Tool Use and Function Calling, Model Context Protocol, API and SDK, Safety and Alignment, Real-World Applications, Comparison with Competitors, Conclusion, References.

Introduction: Why Claude Matters More Than Ever

In January 2026, a research organization with fewer than 1,500 employees outperformed both a major search-engine company and a firm once valued at over a trillion dollars on what may be the most closely watched set of AI benchmarks in recent memory. Anthropic’s Claude Opus 4.6 achieved the highest composite result yet recorded on SWE-bench Verified, GPQA Diamond, and MATH-500, and did so by a substantial margin. For the first time, a single model family delivered the best performance across coding, scientific reasoning, and mathematical problem-solving simultaneously.

This result is not merely a benchmark curiosity. It reflects a fundamental shift in how AI is built, deployed, and used by millions of developers, researchers, analysts, and businesses worldwide. Claude is no longer simply the “safety-focused alternative” to ChatGPT. By a range of measures it is currently the most capable large language model available, and Anthropic has constructed an ecosystem around it that extends well beyond a chatbot interface.

Developers who have not used the Claude API since 2024 are working with outdated assumptions. Investors tracking the AI landscape will benefit from understanding what Anthropic has built and where it is heading. Those who simply use AI tools daily will find that the Claude of early 2026 is a substantially different product from what existed even twelve months earlier.

This article provides a comprehensive guide to recent developments in the Claude ecosystem. It examines the full model family (Opus, Sonnet, and Haiku) and the appropriate context for each. It examines in detail Claude Code, Anthropic’s agentic coding tool that is reshaping how software is built. It explores extended thinking, tool use, the Model Context Protocol, the API and SDK, safety practices, real-world applications, and the position of Claude relative to GPT-4o, Gemini 2.5, Llama 4, and DeepSeek.

The following sections address both technical detail and the broader context.

Key Takeaway: Claude in 2026 is more than a chatbot. It is a model family (Opus, Sonnet, Haiku) supported by an integrated ecosystem that comprises a coding agent, an open integration protocol, extended reasoning capabilities, and enterprise-grade APIs. This guide covers each of these elements.

The Claude Model Family in 2026: Opus, Sonnet, and Haiku

Anthropic organizes its Claude models into three tiers, each designed for different use cases, budgets, and latency requirements. The choice resembles that among a high-performance vehicle, a balanced sedan, and an efficient commuter car: each reaches the destination, but the trade-offs among power, speed, and cost differ.

As of early 2026, the current generation is the 4.5/4.6 family, which represents Anthropic’s most advanced models to date. The following sections describe what each tier offers and the contexts in which it is appropriate.

Claude Opus 4.6: Anthropic’s Most Capable Model

Claude Opus 4.6 (model ID: claude-opus-4-6) is Anthropic’s flagship. It is the appropriate choice when a task demands the highest possible reasoning quality and when additional cost and latency are acceptable.

Opus 4.6 performs well on tasks that require multi-step reasoning: complex code architecture decisions, nuanced legal or financial document analysis, advanced mathematics, scientific research synthesis, and long-form writing that must maintain coherence across thousands of words. It is also the model powering the most advanced tier of Claude Code, where it autonomously navigates large codebases, writes tests, refactors modules, and commits changes.

What distinguishes Opus from its predecessors is not only raw capability but reliability. Earlier generations of large language models, including previous Claude versions, occasionally produced confidently incorrect answers on complex tasks. Opus 4.6 demonstrates a marked improvement in recognizing the limits of its knowledge, qualifying uncertain statements, and requesting clarification rather than guessing. This matters considerably in production environments where an AI hallucination can be costly.

The context window is 200,000 tokens, which corresponds to approximately 500 pages of text or an entire mid-sized codebase. With extended context options, certain configurations support up to 1 million tokens, allowing Opus to ingest and reason over substantial documents or repositories in a single conversation.
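A rough pre-flight check can indicate whether a document is likely to fit in the standard window. The sketch below assumes about four characters per token for English text, which is a common rule of thumb and not a property of Claude’s tokenizer; the reserved output budget and function names are likewise illustrative. An exact count requires the provider’s own token-counting facilities.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # standard window quoted in the article
CHARS_PER_TOKEN = 4              # rough heuristic for English text (assumption)


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_output: int = 8_000) -> bool:
    """True if the text plus a reserved response budget fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


# The article's "200,000 tokens is about 500 pages" implies this density:
tokens_per_page = CONTEXT_WINDOW_TOKENS / 500
print(f"Implied tokens per page: {tokens_per_page:.0f}")

sample = "x" * 400_000  # about 100,000 tokens under the heuristic
print(fits_in_context(sample))
```

Reserving part of the window for the response matters because input and output share the same budget; a document that fills the window leaves no room for an answer.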

Tip: For applications in which accuracy on complex reasoning is mission-critical (for example, code review for a financial trading system or summarization of a 200-page legal contract), Opus 4.6 justifies its premium. For most other use cases, Sonnet is the more appropriate default.

Claude Sonnet 4.6: A Balanced Default

Claude Sonnet 4.6 (model ID: claude-sonnet-4-6) is the appropriate default model for most developers and businesses. It offers a balanced combination of capability and speed, performing within a few percentage points of Opus on most benchmarks while being substantially faster and less expensive.

Sonnet handles the majority of real-world tasks effectively: writing and debugging code, answering complex questions, generating content, analyzing data, and powering chatbots. It is the model Anthropic recommends for most API integrations, and it is the default in the Claude.ai web interface and mobile applications.

The principal advantage of Sonnet is its response latency. In interactive applications such as chat interfaces, coding assistants, and real-time analysis tools, the speed difference between Opus and Sonnet is readily noticeable. Sonnet typically responds two to four times more quickly, which substantially improves the user experience in tools where the user waits on each response before taking the next action.

Sonnet 4.6 also shares the 200,000-token context window of its larger counterpart, so selecting the faster model does not sacrifice the ability to work with large documents or codebases.

Claude Haiku 4.5: Speed and Efficiency at Scale

Claude Haiku 4.5 (model ID: claude-haiku-4-5-20251001) is Anthropic’s fastest and most cost-effective model. It is designed for high-volume, latency-sensitive applications that require rapid, competent responses at minimal cost.

Haiku is well-suited to classification tasks, brief summarization, lightweight code generation, customer service chatbots, data extraction, and any scenario involving thousands or millions of API calls where cost control is important. Although it is the smallest model in the family, Haiku 4.5 is markedly capable and outperforms many competitors’ flagship models from the previous year.

One pattern that has become increasingly common is the use of Haiku as a routing layer: a fast, inexpensive model that classifies incoming requests and decides whether to handle them directly or escalate to Sonnet or Opus. This arrangement delivers Opus-level quality on difficult problems and Haiku-level costs on routine ones.
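
The routing pattern can be sketched as follows. This is a minimal illustration under stated assumptions: the classify_difficulty heuristic, the tier labels, and the route_request helper are hypothetical names introduced here, and in a real deployment the classification step would itself be a call to Haiku rather than a keyword check. The model IDs are the ones listed in this article.

```python
# Hypothetical routing sketch: choose a model tier per request.
# The keyword heuristic stands in for a real Haiku classification call.

MODEL_BY_TIER = {
    "routine": "claude-haiku-4-5-20251001",
    "standard": "claude-sonnet-4-6",
    "hard": "claude-opus-4-6",
}

# Assumed signal words; a production router would ask Haiku to label the request.
HARD_HINTS = ("prove", "race condition", "architecture", "legal contract")
ROUTINE_HINTS = ("classify", "extract", "summarize briefly", "translate")


def classify_difficulty(prompt: str) -> str:
    """Return 'routine', 'standard', or 'hard' for a request (placeholder logic)."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return "hard"
    if any(hint in text for hint in ROUTINE_HINTS):
        return "routine"
    return "standard"


def route_request(prompt: str) -> str:
    """Map a request to the model ID that should handle it."""
    return MODEL_BY_TIER[classify_difficulty(prompt)]


if __name__ == "__main__":
    for prompt in (
        "Classify this support ticket as billing or technical.",
        "Debug a race condition in our job scheduler.",
        "Write a function that parses ISO 8601 dates.",
    ):
        print(route_request(prompt), "<-", prompt)
```

Keeping the tier-to-model mapping in one dictionary means the escalation policy can be tuned without touching the code that sends requests.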

Key Takeaway: The three-tier model structure is not a “good, better, best” hierarchy. It is a mechanism for matching the appropriate model to the task at hand. Most teams use Sonnet as the default, escalate to Opus for difficult problems, and deploy Haiku for high-volume workloads.

Model Comparison Table

Feature	Opus 4.6	Sonnet 4.6	Haiku 4.5
Model ID	claude-opus-4-6	claude-sonnet-4-6	claude-haiku-4-5-20251001
Context Window	200K tokens (up to 1M)	200K tokens	200K tokens
Best For	Complex reasoning, research, advanced coding	General-purpose, most API integrations	High-volume, low-latency tasks
Input Price	$15 / M tokens	$3 / M tokens	$0.80 / M tokens
Output Price	$75 / M tokens	$15 / M tokens	$4 / M tokens
Speed	Moderate	Fast	Very Fast
Extended Thinking	Yes	Yes	Limited
Tool Use	Yes	Yes	Yes

Claude Code: An Agentic Tool for Writing, Testing, and Shipping

If the model family is the engine, Claude Code is the vehicle that places that capability directly in developers’ hands. Initially launched as a command-line research preview in early 2025 and substantially expanded through 2025 and into 2026, Claude Code represents Anthropic’s vision of AI-assisted software development. It is not simply an autocomplete tool but a coding agent that can autonomously navigate a codebase, write code, run tests, fix bugs, and commit changes.

Claude Code is fundamentally different from tools such as GitHub Copilot, which primarily offer inline suggestions as a developer types. Claude Code operates at a higher level of abstraction. A user describes the desired outcome in natural language (“add pagination to the user list API endpoint,” “refactor this module to use dependency injection,” “find and fix the bug causing the login timeout”), and Claude Code determines which files to read, what changes to make, how to test them, and how to commit the result.

Available Platforms

As of early 2026, Claude Code is available across a wide set of platforms:

CLI (Command Line Interface): The original and most capable form. It is installed with npm install -g @anthropic-ai/claude-code and invoked by running claude in any project directory. The CLI provides full access to all features, including custom slash commands, hooks, and MCP server connections.
Desktop App (Mac and Windows): A standalone application that wraps the CLI experience in a native desktop interface. It is appropriate for developers who prefer a graphical environment while retaining the agentic workflow.
Web App (claude.ai/code): A browser-based version that connects to repositories via GitHub. It is suitable for short tasks or for use away from the primary development machine.
VS Code Extension: Deep integration with the most widely used code editor. Claude Code appears as a sidebar panel and can access the workspace, terminal, and source control.
JetBrains Extension: Similar integration for IntelliJ IDEA, PyCharm, WebStorm, and other JetBrains IDEs. It supports the same agentic workflows as the CLI.

Key Features

Agentic Code Editing. Claude Code does not merely suggest changes; it implements them. When given a task, it reads the relevant files, plans an approach, writes or modifies code across multiple files, and can run the test suite to verify that the changes are correct. It operates in a loop: make changes, run tests, address any failures, and repeat until the task is complete.

Custom Slash Commands. Teams can define reusable commands in .claude/commands/ directories. For example, a team might create a /deploy command that runs the deployment pipeline, a /review command that performs code review against the team’s style guide, or a /write-post command that orchestrates blog-post creation and publishing. These commands are version-controlled alongside the code, ensuring that the entire team shares the same workflows.

Hooks System. Claude Code supports pre- and post-execution hooks that run before or after specific actions. Hooks can enforce coding standards, run linters, execute security checks, or trigger notifications. This integrates Claude Code into the CI/CD pipeline rather than leaving it as a standalone tool.

MCP Server Integration. Through the Model Context Protocol (discussed in detail below), Claude Code can connect to external tools and data sources, including databases, APIs, documentation servers, and issue trackers. Claude Code can therefore look up a Jira ticket, inspect a database schema, read API documentation, and then write code that integrates the resulting context.

Git Integration. Claude Code supports Git natively. It can create branches, stage changes, write commit messages, and create pull requests. Many developers now use Claude Code as their primary interface for Git operations, describing the intended commit in natural language and allowing Claude to handle the details.

# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Start a session in your project directory
cd my-project
claude

# Example interactions inside Claude Code
> Add comprehensive unit tests for the authentication module
> Refactor the database layer to use connection pooling
> Find the bug causing the 500 error on /api/users and fix it
> Create a new REST endpoint for product search with pagination

Claude Code Compared to Copilot, Cursor, and Windsurf

The AI coding-tool market is crowded, and each product adopts a distinct approach. The following table compares Claude Code to the principal alternatives.

Feature	Claude Code	GitHub Copilot	Cursor	Windsurf
Primary Mode	Agentic (autonomous)	Inline suggestions + chat	AI-native editor	Flow-state IDE
Underlying Models	Claude (Opus, Sonnet)	GPT-4o, Claude, Gemini	Multi-model (user choice)	Proprietary + GPT-4o
Multi-File Editing	Excellent	Good (Workspace mode)	Excellent (Composer)	Good
Terminal Integration	Native (CLI-first)	Limited	Yes	Yes
Custom Commands	Yes (slash commands)	Limited	Yes (rules)	Limited
MCP Support	Full native support	Partial	Yes	Limited
Autonomous Testing	Yes (runs tests, fixes)	No	Partial	Partial
Price (Pro Tier)	$20/month (Claude Pro)	$19/month (Pro)	$20/month (Pro)	$15/month (Pro)

The fundamental difference is philosophical. GitHub Copilot is designed to assist a developer who remains at the controls; it is a co-pilot in the strict sense. Cursor is an AI-native editor that blurs the line between writing code manually and having AI write it. Claude Code is an autonomous agent to which tasks are delegated. The developer specifies what to build, and Claude Code builds it.

In practice, many developers use multiple tools. A common pattern uses Claude Code for large-scale tasks (new features, refactoring, complex bug fixes) and Copilot or Cursor for the moment-to-moment inline coding experience. The tools are not mutually exclusive.

Tip: Users new to AI coding tools can begin with Claude Code’s web version at claude.ai/code. It requires no installation and provides familiarity with the agentic workflow. The CLI can be installed later, once the full feature set is needed.

Extended Thinking: How Claude Reasons Through Difficult Problems

One of Claude’s most capable and underappreciated features is extended thinking, which allows the model to devote additional time to reasoning through a problem before generating a response. This is not merely a matter of taking longer to answer. It is a fundamentally different mode of operation that produces substantially improved results on complex tasks.

When extended thinking is enabled, Claude generates an internal chain of thought before producing its visible response. This chain of thought can extend to thousands of tokens of internal reasoning. It permits Claude to decompose complex problems into steps, consider multiple approaches, verify its own work, and identify errors before presenting a final answer.

The impact on quality is considerable. On mathematical reasoning benchmarks, extended thinking improves Claude’s accuracy by 15-30 percentage points on the most difficult problems. On coding tasks, it reduces bugs in first-attempt solutions by roughly 40%. On analytical tasks that require multi-step logic, such as financial modelling or legal analysis, the improvements are even more pronounced.

Extended thinking operates as follows through the API:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Allow up to 10K tokens of thinking
    },
    messages=[
        {
            "role": "user",
            "content": "Analyze the time complexity of this algorithm and suggest optimizations..."
        }
    ]
)

# The response includes both thinking and text blocks
for block in response.content:
    if block.type == "thinking":
        print(f"Internal reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")

The budget_tokens parameter controls the volume of “thinking” Claude is permitted. A higher budget yields more thorough reasoning but slower responses and higher costs. Simple questions do not require extended thinking. For complex multi-step problems (debugging a race condition, optimizing a database query, analyzing a complex contract), a generous thinking budget can be the difference between a mediocre answer and an excellent one.

Caution: Extended thinking tokens are billed at the same rate as output tokens. At the Opus 4.6 output price of $75 per million tokens, a fully used 10,000-token thinking budget adds up to $0.75 per request (10,000 × $75 / 1,000,000). The feature should be applied strategically rather than on every API call.
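
The arithmetic behind that figure is easy to reproduce for other models and budgets. The sketch below assumes the output prices from the comparison table earlier in this article ($75, $15, and $4 per million tokens) and treats the thinking budget as fully consumed, which is the worst case; thinking_cost_usd is a hypothetical helper, not part of the SDK.

```python
# Worst-case cost of a thinking budget, using this article's output prices.
# Prices are assumptions copied from the comparison table and may be outdated.

OUTPUT_PRICE_PER_MTOK = {
    "claude-opus-4-6": 75.00,
    "claude-sonnet-4-6": 15.00,
    "claude-haiku-4-5-20251001": 4.00,
}


def thinking_cost_usd(model: str, budget_tokens: int) -> float:
    """Upper bound on thinking cost if the whole budget is used."""
    return budget_tokens * OUTPUT_PRICE_PER_MTOK[model] / 1_000_000


if __name__ == "__main__":
    for model in OUTPUT_PRICE_PER_MTOK:
        print(f"{model}: ${thinking_cost_usd(model, 10_000):.2f} per request")
```

At 1,000 such requests per day, the worst case under these assumed prices is $750 per day on Opus against $150 per day on Sonnet, which is why the budget is worth setting per task rather than globally.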

In Claude Code, extended thinking is invoked automatically when the model encounters complex tasks. No manual configuration is required; the system allocates a thinking budget based on the complexity of the request. This is one reason that Claude Code can autonomously resolve multi-file bugs that simpler tools cannot address.

Tool Use and Function Calling

Large language models are powerful, but they have fundamental limitations. They cannot check current weather, look up a stock price, query a database, or send an email on their own. Tool use (also called function calling) bridges this gap by allowing Claude to invoke external functions defined by the developer.

When tool definitions are provided, Claude can decide when to call each tool, what arguments to pass, and how to incorporate the results into its response. This transforms Claude from a text generator into an agent capable of taking actions in external systems.

A practical example is a stock-price lookup tool:

import anthropic
import json

client = anthropic.Anthropic()

# Define the tools Claude can use
tools = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a given ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "The stock ticker symbol (e.g., AAPL, GOOGL)"
                }
            },
            "required": ["ticker"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the current price of NVIDIA stock?"}
    ]
)

# Claude will respond with a tool_use block
for block in response.content:
    if block.type == "tool_use":
        print(f"Claude wants to call: {block.name}")
        print(f"With arguments: {json.dumps(block.input)}")
        # You would execute the function and send the result back

Tool use is not restricted to simple lookups. Advanced patterns provide Claude with access to a full suite of tools, including database query tools, file-system tools, API-calling tools, and web-search tools, and permit Claude to orchestrate complex multi-step workflows. For example, a developer might ask Claude to “find all customers who signed up last month, check which ones have not made a purchase, and draft a personalized re-engagement email for each.” Claude would use multiple tools in sequence, making decisions at each step based on the data retrieved.
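
The step left as a comment in the example above, executing the function and returning its result, can be sketched as follows. The tool_result content block with a tool_use_id follows the Messages API convention for returning tool output; the get_stock_price implementation, its fixed prices, and the build_tool_results helper are hypothetical and exist only for illustration. Tool calls are represented as plain dictionaries so that the sketch runs without an API key.

```python
import json
from typing import Any, Callable, Dict, List


def get_stock_price(ticker: str) -> Dict[str, Any]:
    """Hypothetical tool implementation; a real one would call a market-data API."""
    fake_prices = {"NVDA": 123.45, "AAPL": 210.00}  # made-up illustrative values
    if ticker not in fake_prices:
        raise KeyError(f"unknown ticker: {ticker}")
    return {"ticker": ticker, "price": fake_prices[ticker]}


# Registry mapping tool names (as declared to Claude) to local functions.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_stock_price": get_stock_price}


def build_tool_results(tool_calls: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run each requested tool and build the follow-up user message.

    Each tool call is a dict with 'id', 'name', and 'input', mirroring the
    fields of a tool_use block.
    """
    results = []
    for call in tool_calls:
        block = {"type": "tool_result", "tool_use_id": call["id"]}
        try:
            output = TOOL_REGISTRY[call["name"]](**call["input"])
            block["content"] = json.dumps(output)
        except Exception as exc:  # report failures to the model instead of crashing
            block["content"] = f"Tool error: {exc}"
            block["is_error"] = True
        results.append(block)
    return {"role": "user", "content": results}


if __name__ == "__main__":
    calls = [{"id": "toolu_demo_1", "name": "get_stock_price", "input": {"ticker": "NVDA"}}]
    print(json.dumps(build_tool_results(calls), indent=2))
```

In a full round trip, the returned message is appended to the conversation after the assistant turn that contained the tool_use blocks, and the conversation is sent again with the same tools list. Repeating this until the model stops requesting tools gives the multi-step orchestration described above.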

This is how Claude Code operates internally. When Claude Code is asked to “fix the failing tests,” it uses tools to read files, run shell commands, edit code, and execute tests, with all of these actions orchestrated by the model’s reasoning capabilities.

Model Context Protocol: An Open Standard for AI Integration

If tool use is the mechanism by which Claude interacts with external systems, the Model Context Protocol (MCP) is the standard that makes those interactions universal and interoperable. Developed by Anthropic and released as an open standard, MCP is among the most important and most underappreciated developments in the AI ecosystem.

The problem that MCP addresses is straightforward but consequential. Every AI application today must connect to external data sources and tools: databases, file systems, APIs, SaaS applications, development tools, and others. Without a standard protocol, every integration must be custom-built. Integrating Claude with a PostgreSQL database requires a custom tool. Reading from Google Drive requires another. Accessing Jira tickets requires a third. This approach does not scale.

MCP provides a standardized protocol for AI-to-tool communication. It plays the role for AI integrations that USB plays for hardware. Just as USB allowed any peripheral to be connected to any computer without custom drivers, MCP allows any data source or tool to be connected to any AI model without custom integration code.

The protocol defines three types of capabilities that an MCP server can offer:

Tools: Functions the AI can call (query a database, create a file, send a message)
Resources: Data sources the AI can read (documents, database records, API responses)
Prompts: Predefined templates for common interactions
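
On the wire, MCP messages are JSON-RPC 2.0 requests, and each capability type has its own methods. The sketch below builds one example request per capability type using the tools/call, resources/read, and prompts/get method names from the MCP specification; the tool name, resource URI, prompt name, and the make_request helper are hypothetical.

```python
import itertools
import json
from typing import Any, Dict

_ids = itertools.count(1)  # request ids must not be reused within a session


def make_request(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope as used by MCP."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


# Tools: invoke a function exposed by the server (hypothetical tool name).
call_tool = make_request(
    "tools/call",
    {"name": "query_database", "arguments": {"sql": "SELECT count(*) FROM users"}},
)

# Resources: read a data source identified by URI (hypothetical URI).
read_resource = make_request("resources/read", {"uri": "file:///docs/schema.md"})

# Prompts: fetch a predefined template, filling in its arguments.
get_prompt = make_request(
    "prompts/get",
    {"name": "code_review", "arguments": {"language": "python"}},
)

if __name__ == "__main__":
    for request in (call_tool, read_resource, get_prompt):
        print(json.dumps(request))
```

Client libraries normally construct these envelopes automatically, so the sketch is mainly useful for reading protocol logs or debugging a custom server.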

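An MCP configuration in Claude Code has the following form. A project-level configuration is stored in a .mcp.json file in the project root; because JSON does not permit comments, the file name is given here rather than inside the file:

{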
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://user:pass@localhost/mydb"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/docs"]
    }
  }
}

With this configuration, Claude Code can query a PostgreSQL database directly to understand the schema before writing code, examine GitHub issues and pull requests for context, and read documentation files, without requiring any of this information to be copied into the conversation manually.

The MCP ecosystem has expanded rapidly. As of early 2026, official and community MCP servers are available for PostgreSQL, MySQL, MongoDB, Redis, GitHub, GitLab, Jira, Confluence, Slack, Google Drive, AWS services, Kubernetes, Docker, and dozens of additional systems. Many organizations are building custom MCP servers for their internal tools and APIs.

Key Takeaway: MCP is to AI integrations what REST APIs were to web services: a standardized mechanism that allows different systems to communicate. For organizations building AI-powered applications, investing time in understanding and adopting MCP is likely to yield returns as the ecosystem matures.

API and SDK: Building with Claude

Whether the project is a simple chatbot or a complex multi-agent system, the Anthropic API and its official SDKs serve as the entry point. The API has matured substantially since its early releases, and the developer experience in 2026 is refined and well-documented.

Python SDK Examples

The Anthropic Python SDK is the most widely used means of integrating Claude into applications. The following complete example demonstrates the principal features:

# Install: pip install anthropic
import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from environment

# Basic message
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)
print(response.content[0].text)

# System prompt + conversation history
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system="You are a senior Python developer. Be concise and include code examples.",
    messages=[
        {"role": "user", "content": "How do I implement a binary search tree?"},
        {"role": "assistant", "content": "Here's a clean BST implementation..."},
        {"role": "user", "content": "Now add a method to find the k-th smallest element."}
    ]
)

# Streaming for real-time responses
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Write a comprehensive guide to Python decorators."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

The TypeScript/JavaScript SDK follows a near-identical structure:

// Install: npm install @anthropic-ai/sdk
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Explain the JavaScript event loop." }
  ]
});

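// content is a list of typed blocks; narrow to a text block before reading it
const first = response.content[0];
if (first.type === "text") {
  console.log(first.text);
}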

Both SDKs support all Claude features: tool use, extended thinking, streaming, image and PDF input, system prompts, and batch processing.

Pricing Comparison

Understanding pricing is important for organizations building production applications. The following table compares Claude pricing with that of the principal competitors:

Model	Provider	Input (per M tokens)	Output (per M tokens)	Context Window
Claude Opus 4.6	Anthropic	$15.00	$75.00	200K (up to 1M)
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	200K
Claude Haiku 4.5	Anthropic	$0.80	$4.00	200K
GPT-4o	OpenAI	$2.50	$10.00	128K
GPT-4.5	OpenAI	$75.00	$150.00	128K
Gemini 2.5 Pro	Google	$1.25	$10.00	1M
Gemini 2.5 Flash	Google	$0.15	$0.60	1M
Llama 4 Maverick	Meta (open source)	Free (self-host) / varies	Free (self-host) / varies	1M
DeepSeek V3	DeepSeek	$0.27	$1.10	128K

Key Takeaway: Claude Sonnet 4.6 offers the most favourable quality-to-price ratio for most use cases. GPT-4o is somewhat less expensive for both input and output tokens but has a smaller context window. Gemini 2.5 Flash and DeepSeek V3 are the budget options, although they trail substantially in reasoning quality. For maximum capability, Opus 4.6 and GPT-4.5 are the premium choices, with Opus generally offering stronger coding and reasoning performance at one-fifth of GPT-4.5’s input price and half of its output price.
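
To make these trade-offs concrete, per-token prices can be converted into a monthly figure for a given workload. The sketch below uses the prices from the table above as assumptions (they may be outdated) and a hypothetical workload of 50 million input tokens and 10 million output tokens per month; monthly_cost_usd is an illustrative helper.

```python
# Monthly cost for a fixed workload, using this article's price table.
# All prices are assumptions copied from the table (USD per million tokens).

PRICES = {
    # model: (input_price, output_price)
    "Claude Opus 4.6": (15.00, 75.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (0.80, 4.00),
    "GPT-4o": (2.50, 10.00),
    "GPT-4.5": (75.00, 150.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "DeepSeek V3": (0.27, 1.10),
}


def monthly_cost_usd(model: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    input_price, output_price = PRICES[model]
    return input_mtok * input_price + output_mtok * output_price


if __name__ == "__main__":
    # Hypothetical workload: 50M input tokens and 10M output tokens per month.
    for model in sorted(PRICES, key=lambda m: monthly_cost_usd(m, 50, 10)):
        print(f"{model:<18} ${monthly_cost_usd(model, 50, 10):>9,.2f}")
```

For this assumed workload, the table’s prices work out to $300 per month on Sonnet 4.6, $1,500 on Opus 4.6, and $5,250 on GPT-4.5. The ratio between input and output volume shifts the ranking among the mid-priced models, so the calculation is worth repeating with real traffic figures.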

Safety and Alignment: Anthropic’s Approach

Anthropic was founded specifically to build safe AI. This statement is not a marketing tagline but the organization’s core mission, and it shapes every aspect of how Claude is developed and deployed. Understanding Anthropic’s safety approach is important because it directly affects how Claude behaves, what it will and will not do, and why it sometimes differs in character from competing models.

Constitutional AI (CAI) is Anthropic’s foundational alignment technique. Rather than relying solely on human feedback to train the model (the RLHF approach used by OpenAI and others), Constitutional AI uses a set of principles, termed a “constitution,” to guide the model’s behaviour. During training, Claude evaluates its own responses against these principles and revises them accordingly. This produces a model that is helpful, harmless, and honest without requiring human labellers to review every training example.

The practical effect is that Claude is more careful and nuanced than some competitors in sensitive areas. It declines clearly harmful requests, but it also engages thoughtfully with complex ethical questions rather than refusing them outright. Anthropic has worked specifically to avoid the “alignment tax” (the perception that safer models are less useful). Claude is designed to be both safer and more capable.

Responsible Scaling Policy (RSP) is Anthropic’s framework for deciding when and how to deploy more powerful models. The RSP defines “AI Safety Levels” (ASL), analogous to biosafety levels, that specify the safety evaluations and security measures required before a model of a given capability level can be deployed. As models become more capable, they must pass increasingly rigorous safety evaluations.

This matters for users and developers because Claude’s capabilities are not only technically constrained but also institutionally constrained. Anthropic will not release a model that passes dangerous capability thresholds without corresponding safety measures, even if competitors release less rigorously tested models first.

What this means in practice:

Claude will not help create malware, generate CSAM, or assist with weapons development
Claude will engage with nuanced topics (politics, ethics, sensitive history) thoughtfully rather than refusing outright
Claude will acknowledge uncertainty rather than fabricating information
Claude will follow system prompts from developers while maintaining core safety boundaries
Enterprise customers get additional controls for content filtering and usage policies

Tip: Developers building customer-facing applications with Claude should review Anthropic’s system prompt documentation carefully. A well-constructed system prompt provides substantial control over Claude’s tone, behaviour, and boundaries within the safety constraints.

Real-World Applications: How Teams Are Using Claude

Benchmarks and feature lists indicate what a model can do in theory. Real-world deployments show what it does in practice. The following sections describe how companies and developers are using Claude across domains in 2026.

Software Development. This is Claude’s strongest domain. Companies ranging from startups to Fortune 500 enterprises use Claude Code as part of their development workflow. GitLab has reported that teams using Claude Code experienced a 40% reduction in time-to-merge for pull requests. Replit integrated Claude as its primary AI backend, supporting code generation for millions of users. Individual developers report that Claude Code handles approximately 60-80% of routine coding tasks (writing boilerplate, implementing standard patterns, writing tests, fixing bugs), allowing them to focus on architecture and design decisions.

Research and Analysis. Academic researchers use Claude to synthesize literature, analyze datasets, and draft papers. Investment analysts use it to process earnings calls, SEC filings, and market data. Legal professionals use it to review contracts and identify relevant precedents. The principal advantage Claude offers in these settings is its large context window, which allows the ingestion of hundreds of pages of source material within a single conversation.

Content Creation. Marketing teams use Claude to draft blog posts, social-media content, email campaigns, and product documentation. Unlike earlier AI writing tools that produced generic, stilted prose, Claude’s output is conversational, well-structured, and adaptable to different tones and audiences. Many content teams use Claude as a first-draft generator and then edit and refine the output rather than writing from scratch.

Customer Service. Companies deploy Claude-powered chatbots that handle customer inquiries with substantially more nuance than traditional rule-based systems. Claude understands context, handles follow-up questions, escalates appropriately, and maintains a consistent brand voice. Anthropic offers enterprise features specifically for this use case, including content filtering, usage analytics, and integration with existing customer-service platforms.

Data Engineering and Analytics. Claude performs well at writing SQL queries, building data pipelines, creating visualizations, and explaining complex datasets. Data analysts who find Python or SQL challenging can describe their requirements in natural language and obtain working code. When combined with MCP servers that connect directly to databases, Claude can query, analyze, and summarize data end-to-end.

Education. Teachers use Claude to create lesson plans, generate practice problems, and develop assessment rubrics. Students use it as a tutor that can explain concepts, work through problems step by step, and adapt to their level of understanding. Anthropic has partnered with several educational institutions to develop AI literacy programmes that teach students how to use AI tools effectively and critically.

Claude Compared with GPT-4o, Gemini 2.5, and Other Models

The AI landscape in early 2026 is the most competitive it has been. Four major participants (Anthropic, OpenAI, Google, and Meta), together with strong challengers such as DeepSeek, are advancing the frontier. The following section provides a measured assessment of Claude’s position relative to the competition.

Capability	Claude (Opus 4.6)	GPT-4o	Gemini 2.5 Pro	Llama 4 Maverick	DeepSeek V3
Coding	Excellent	Very Good	Very Good	Good	Very Good
Reasoning	Excellent	Very Good	Excellent	Good	Good
Long Context	Very Good (200K-1M)	Good (128K)	Excellent (1M)	Excellent (1M)	Good (128K)
Multimodal	Good (images, PDFs)	Excellent (images, audio, video)	Excellent (images, audio, video)	Good (images)	Good (images)
Instruction Following	Excellent	Very Good	Good	Fair	Good
Safety	Industry Leading	Very Good	Good	Variable	Fair
Price/Performance	Very Good (Sonnet tier)	Very Good	Excellent (Flash tier)	Excellent (open source)	Excellent
Open Source	No	No	No	Yes	Yes

Claude and GPT-4o (OpenAI). This is the comparison most readers consider central. GPT-4o remains a strong all-around model with substantial multimodal capabilities; it can process images, audio, and video natively, whereas Claude is currently limited to images and PDFs. GPT-4o also benefits from the substantial ChatGPT user base and ecosystem. However, Claude consistently outperforms GPT-4o on coding benchmarks (SWE-bench, HumanEval+), complex reasoning tasks (GPQA), and instruction following. Claude’s larger context window (200K versus 128K) is a meaningful advantage in document-heavy workflows. OpenAI’s GPT-4.5 narrows the reasoning gap but at substantially higher cost.

Claude and Gemini 2.5 Pro (Google). Gemini’s principal advantage is its native 1-million-token context window and its deep integration with Google’s ecosystem (Search, Workspace, Cloud). For tasks that require processing very large volumes of data in a single pass, Gemini is difficult to surpass. Google also offers Gemini 2.5 Flash at aggressive pricing, making it attractive for cost-sensitive applications. On pure reasoning and coding quality, however, Claude Opus and Sonnet retain an advantage. Gemini also tends to be less reliable at following complex multi-step instructions.

Claude and Llama 4 (Meta). Llama 4 represents a substantial advance for open-source AI. The Maverick variant, a mixture-of-experts model, offers strong performance at a fraction of the cost when self-hosted. For organizations with capable ML infrastructure teams and strict data-residency requirements, Llama is compelling. However, Llama models generally trail the closed-source leaders on the most demanding reasoning and coding tasks, and operating them requires considerable infrastructure investment.

Claude and DeepSeek V3. DeepSeek has been the surprise development of 2025-2026. The V3 model offers performance close to GPT-4o at a fraction of the cost, and it has been released as open source. DeepSeek is particularly popular in price-sensitive markets and for developers who wish to self-host. The trade-offs are weaker instruction following, less reliable safety guardrails, and substantially less capability on the most difficult reasoning tasks compared to Claude or GPT-4o.

Caution: AI benchmarks change rapidly. The specific figures cited here may have shifted by the time of reading. The structural differences (Anthropic’s safety focus, Google’s ecosystem integration, Meta’s open-source approach, DeepSeek’s cost efficiency) are more durable than any particular benchmark score.

Conclusion

The Claude ecosystem in 2026 represents not merely incremental improvement but the maturation of AI from a novelty into genuine infrastructure. The three-tier model family provides developers with precise control over the capability-cost-speed trade-off. Claude Code transforms how software is built by offering genuine agentic coding rather than enhanced autocomplete. Extended thinking delivers measurably improved results on difficult problems. The Model Context Protocol is creating a standardized integration layer that the broader industry is adopting. Anthropic’s sustained focus on safety means that as these models become more capable, they also become more trustworthy.

For developers, the most consequential action available is to apply Claude Code to a real project rather than a toy example. The experience of providing a natural-language description of a complex task and observing Claude navigate a codebase, write code across multiple files, run tests, and resolve issues autonomously is qualitatively different from previous AI tooling. It does not replace developer skill; it amplifies it.

For organizations building applications, the Anthropic API with Claude Sonnet 4.6 as the default model offers the most favourable balance of quality, speed, and cost currently available. Extended thinking can be added for difficult problems, tool use for interaction with external systems, and MCP for seamless integration with data sources.

For those evaluating the competitive landscape, no single “best” AI model exists; there are only trade-offs. Claude leads on coding and reasoning. Gemini leads on context length and ecosystem integration. Llama and DeepSeek lead on cost and openness. GPT-4o leads on multimodal breadth. The appropriate choice depends on the specific use case, budget, and priorities.

What is clear is that the era of AI as a curiosity has passed. These are substantive tools used by capable teams to build substantive products. Claude, with its considered balance of capability and safety, sits at the centre of that transformation.

The question is no longer whether to use AI in a workflow but how to use it most effectively. In 2026, Claude provides more avenues for that answer than at any previous point.

References and Further Reading

Anthropic Documentation: docs.anthropic.com (official API reference, guides, and tutorials)
Claude Code: Claude Code documentation (installation, features, and usage guides)
Model Context Protocol: modelcontextprotocol.io (MCP specification and server directory)
Anthropic Research: anthropic.com/research (published papers on Constitutional AI, safety, and alignment)
Claude Model Card: Model overview and specifications
Anthropic Python SDK: github.com/anthropics/anthropic-sdk-python
Anthropic TypeScript SDK: github.com/anthropics/anthropic-sdk-typescript
SWE-bench Verified Leaderboard: swebench.com (coding benchmark results)
Chatbot Arena: lmarena.ai (community-driven model comparison)
Anthropic’s Responsible Scaling Policy: anthropic.com RSP overview

This article is for informational purposes only and does not constitute investment, financial, or professional advice. AI capabilities, pricing, and benchmarks change frequently; verify current details at the official documentation links above.

April 4, 2026