Week 27 · Space GIS Architect

Production pipelines: S3 → Lambda → EventBridge → DDB

Production geospatial isn't a notebook — it's a pipeline. This week is the real AWS architecture LaunchDetect runs in production: S3 ingest, Lambda compute, EventBridge schedule, DynamoDB state.

Learning objectives

  1. Wire an S3 → Lambda trigger and route events with EventBridge.
  2. Design DynamoDB partition keys that avoid hot partitions.
  3. Explain when Lambda cold starts matter and how to mitigate them.
  4. Define and deploy the stack with AWS CDK.

Primer

Production geospatial isn't a notebook. It's a pipeline: data lands somewhere, code runs, results land somewhere else, the pipeline is monitored, alerts fire when something breaks. This week is the AWS architecture that LaunchDetect actually runs in production — minus a few enterprise-specific layers.

The core stack: S3 + Lambda + EventBridge + DynamoDB

Four services, each doing one thing well:

  1. S3 — object storage: raw satellite files in, published detection JSON out.
  2. Lambda — stateless compute that runs per event.
  3. EventBridge — event routing and scheduled rules between services.
  4. DynamoDB — low-latency key-value state for detection records.

The detection pipeline

LaunchDetect's flow:

  1. NOAA writes a new GOES Band 7 mesoscale NetCDF to s3://noaa-goes18/....
  2. NOAA's bucket emits an S3 event; an EventBridge rule fans it out to LaunchDetect's scorer Lambda.
  3. The scorer Lambda fetches the NetCDF (range-requested for just the geographic window of interest), converts radiance to brightness temperature, threshold-detects hotspots, applies parallax correction, runs the Layer 3 ML classifier, and writes detection candidates to DynamoDB.
  4. A DynamoDB stream triggers a publisher Lambda that decides whether the candidate is a real launch (vs fire / glint / industrial source), writes the public detection JSON to S3, and emits a "launch detected" event to EventBridge.
  5. Subscribers (web dashboard, push-notification service, blog generator) receive the event and update their own state.

Total latency from NOAA file landing to a push notification on a user's phone: typically 30–90 seconds.
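The core math in step 3 is small enough to sketch. A minimal Python version of the radiance-to-brightness-temperature conversion (the standard GOES-R inverse-Planck form) plus thresholding — the coefficient names fk1/fk2/bc1/bc2 mirror the NetCDF's Planck attributes, and the 330 K threshold is illustrative, not LaunchDetect's actual value:

```python
import math

def radiance_to_bt(rad, fk1, fk2, bc1, bc2):
    # GOES-R inverse Planck: radiance (mW/(m2 sr cm-1)) -> brightness temp (K)
    return (fk2 / math.log(fk1 / rad + 1.0) - bc1) / bc2

def detect_hotspots(bt_grid, threshold_k=330.0):
    # Return (row, col, bt) for every pixel hotter than the threshold
    return [(r, c, bt)
            for r, row in enumerate(bt_grid)
            for c, bt in enumerate(row)
            if bt > threshold_k]
```

The real scorer reads the coefficients from the NetCDF's attributes rather than hard-coding them, and follows thresholding with parallax correction and ML classification.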

DynamoDB partition key design

DynamoDB's #1 footgun is hot partitions. Every item has a partition key (PK) and optionally a sort key (SK). DynamoDB hashes PK and routes the item to a physical partition. If 90% of your writes go to a single PK, you bottleneck on that one partition's WCU/RCU limit (3,000 reads / 1,000 writes per second).

Good PK choices spread writes evenly across partitions. For launch detections, a natural PK is DETECTION#{ulid} — ULIDs are time-ordered but have enough entropy that they distribute evenly. Bad PK: DATE#{yyyy-mm-dd} — all today's writes go to one partition.
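To make the contrast concrete, here is a toy ULID generator and the resulting item shape — a sketch only; real code would use a library such as python-ulid, and the item attributes are illustrative:

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid():
    # 48-bit millisecond timestamp + 80 random bits, Crockford base32 (26 chars).
    # Time-ordered (lexicographic order matches creation order) but high-entropy,
    # so PKs built from it spread evenly across DynamoDB partitions.
    value = (int(time.time() * 1000) << 80) | int.from_bytes(os.urandom(10), "big")
    return "".join(CROCKFORD[(value >> (5 * i)) & 31] for i in range(25, -1, -1))

def detection_item(lat, lon, score):
    return {
        "pk": f"DETECTION#{ulid()}",  # high entropy: writes fan out across partitions
        "sk": "META",
        "lat": lat, "lon": lon, "score": score,
    }
```

Compare with `DATE#{yyyy-mm-dd}`: every write in a 24-hour window hashes to the same partition and queues behind that partition's 1,000 WCU/s ceiling.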

Lambda cold starts

When Lambda receives a request and has no warm container available, it cold-starts: provision a sandbox, download the function code, initialize the runtime, run the handler. Cold start can range from 200 ms (a lightweight Python 3.13 function) to 3+ seconds (a heavy Java runtime or a large dependency tree).

For latency-sensitive request paths (API endpoints), cold start matters and you mitigate with: provisioned concurrency, smaller deployment packages, lighter runtimes, lazy imports. For event-driven batch (which is most space-GIS pipelines), cold start is fine — a launch detection that takes 90 seconds doesn't care about 500 ms cold start.

AWS CDK

AWS CDK (Cloud Development Kit) is infrastructure-as-code in real programming languages — TypeScript, Python, Java, Go. You write classes that instantiate AWS resources; CDK synthesizes them to CloudFormation templates; CloudFormation deploys them.

import * as cdk from 'aws-cdk-lib';
import { Bucket, EventType } from 'aws-cdk-lib/aws-s3';
import { Function, Runtime, Code } from 'aws-cdk-lib/aws-lambda';
import { S3EventSource } from 'aws-cdk-lib/aws-lambda-event-sources';
import { Table, AttributeType } from 'aws-cdk-lib/aws-dynamodb';

export class DetectionStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);
    // Ingest bucket: NetCDF files land here
    const bucket = new Bucket(this, 'IngestBucket');
    // Detection state, keyed per the partition-key design above
    const table = new Table(this, 'Detections', {
      partitionKey: { name: 'pk', type: AttributeType.STRING },
      sortKey: { name: 'sk', type: AttributeType.STRING }
    });
    const scorer = new Function(this, 'Scorer', {
      runtime: Runtime.PYTHON_3_13,
      handler: 'handler.handler',
      code: Code.fromAsset('lambda/scorer')
    });
    // Invoke the scorer on every object written to the bucket
    scorer.addEventSource(new S3EventSource(bucket, {
      events: [EventType.OBJECT_CREATED]
    }));
    // Least-privilege grant: the scorer can write detections, nothing more
    table.grantWriteData(scorer);
  }
}

Hands-on lab: Mini detection pipeline

You'll build a mini detection pipeline: a Lambda triggered by S3 PutObject that reads a small GOES NetCDF, threshold-detects hotspots, and writes detection records to a DynamoDB table. Deploy with AWS CDK. This is the architecture of LaunchDetect's artgis-cluster-scorer Lambda in production, minus the ML scoring layer and parallax correction.
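A starting point for the lab handler: pulling bucket and key out of the S3 notification event. The record shape below follows AWS's S3 event notification format; the function name is ours:

```python
def s3_records(event):
    # Extract (bucket, key) pairs from an S3 PutObject notification event.
    # The handler would then range-request the NetCDF, threshold-detect
    # hotspots, and put detection items into the DynamoDB table.
    return [(r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
            for r in event.get("Records", [])]
```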

Quiz

Test yourself. Answer key on the certificate-track page (Gold-tier feature: progress tracking and auto-grading).

Q1. S3 → Lambda trigger is configured via:
  1. S3 event notification to Lambda function ARN
  2. Polling
  3. SNS only
  4. EventBridge only
Q2. EventBridge is best for:
  1. Decoupled event routing, scheduled rules, cross-service orchestration
  2. Database
  3. Just cron
  4. File storage
Q3. DynamoDB partition key choice impacts:
  1. Distribution and hot-partition behavior
  2. Cost only
  3. Nothing
  4. Display order
Q4. Lambda cold start matters for:
  1. Latency-sensitive endpoints; less for event-driven batch
  2. Always
  3. Never
  4. Only TypeScript
Q5. AWS CDK is:
  1. Infrastructure-as-code in TypeScript / Python / Java / Go
  2. Just a CLI
  3. A database
  4. A managed service