Week 27 · Space GIS Architect

Production pipelines: S3 → Lambda → EventBridge → DDB

Production geospatial isn't a notebook — it's a pipeline. This week is the real AWS architecture LaunchDetect runs in production: S3 ingest, Lambda compute, EventBridge schedule, DynamoDB state.

Learning objectives

  1. Wire an S3 → Lambda trigger and route events with EventBridge.
  2. Design DynamoDB partition keys that avoid hot partitions.
  3. Explain when Lambda cold starts matter and how to mitigate them.
  4. Define and deploy the stack with AWS CDK.

Primer

Production geospatial isn't a notebook. It's a pipeline: data lands somewhere, code runs, results land somewhere else, the pipeline is monitored, alerts fire when something breaks. This week is the AWS architecture that LaunchDetect actually runs in production — minus a few enterprise-specific layers.

The core stack: S3 + Lambda + EventBridge + DynamoDB

Four services, each doing one thing well:

  1. S3 — object storage: raw satellite files in, published detection JSON out.
  2. Lambda — stateless compute that runs per event.
  3. EventBridge — event routing and scheduled rules between services.
  4. DynamoDB — low-latency key-value state for detection records.

The detection pipeline

LaunchDetect's flow:

  1. NOAA writes a new GOES Band 7 mesoscale NetCDF to s3://noaa-goes18/....
  2. NOAA's bucket emits an S3 event; an EventBridge rule fans it out to LaunchDetect's scorer Lambda.
  3. The scorer Lambda fetches the NetCDF (range-requested for just the geographic window of interest), converts radiance to brightness temperature, threshold-detects hotspots, applies parallax correction, runs the Layer 3 ML classifier, and writes detection candidates to DynamoDB.
  4. A DynamoDB stream triggers a publisher Lambda that decides whether the candidate is a real launch (vs fire / glint / industrial source), writes the public detection JSON to S3, and emits a "launch detected" event to EventBridge.
  5. Subscribers (web dashboard, push-notification service, blog generator) receive the event and update their own state.

Total latency from NOAA file landing to a push notification on a user's phone: typically 30–90 seconds.
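The core math in step 3 is small enough to sketch. A minimal Python version of the radiance-to-brightness-temperature conversion (the standard GOES-R inverse-Planck form) plus thresholding — the coefficient names fk1/fk2/bc1/bc2 mirror the NetCDF's Planck attributes, and the 330 K threshold is illustrative, not LaunchDetect's actual value:

```python
import math

def radiance_to_bt(rad, fk1, fk2, bc1, bc2):
    # GOES-R inverse Planck: radiance (mW/(m2 sr cm-1)) -> brightness temp (K)
    return (fk2 / math.log(fk1 / rad + 1.0) - bc1) / bc2

def detect_hotspots(bt_grid, threshold_k=330.0):
    # Return (row, col, bt) for every pixel hotter than the threshold
    return [(r, c, bt)
            for r, row in enumerate(bt_grid)
            for c, bt in enumerate(row)
            if bt > threshold_k]
```

The real scorer reads the coefficients from the NetCDF's attributes rather than hard-coding them, and follows thresholding with parallax correction and ML classification.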

DynamoDB partition key design

DynamoDB's #1 footgun is hot partitions. Every item has a partition key (PK) and optionally a sort key (SK). DynamoDB hashes PK and routes the item to a physical partition. If 90% of your writes go to a single PK, you bottleneck on that one partition's WCU/RCU limit (3,000 reads / 1,000 writes per second).

Good PK choices spread writes evenly across partitions. For launch detections, a natural PK is DETECTION#{ulid} — ULIDs are time-ordered but have enough entropy that they distribute evenly. Bad PK: DATE#{yyyy-mm-dd} — all today's writes go to one partition.
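To make the contrast concrete, here is a toy ULID generator and the resulting item shape — a sketch only; real code would use a library such as python-ulid, and the item attributes are illustrative:

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid():
    # 48-bit millisecond timestamp + 80 random bits, Crockford base32 (26 chars).
    # Time-ordered (lexicographic order matches creation order) but high-entropy,
    # so PKs built from it spread evenly across DynamoDB partitions.
    value = (int(time.time() * 1000) << 80) | int.from_bytes(os.urandom(10), "big")
    return "".join(CROCKFORD[(value >> (5 * i)) & 31] for i in range(25, -1, -1))

def detection_item(lat, lon, score):
    return {
        "pk": f"DETECTION#{ulid()}",  # high entropy: writes fan out across partitions
        "sk": "META",
        "lat": lat, "lon": lon, "score": score,
    }
```

Compare with `DATE#{yyyy-mm-dd}`: every write in a 24-hour window hashes to the same partition and queues behind that partition's 1,000 WCU/s ceiling.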

Lambda cold starts

When Lambda receives a request and has no warm container available, it cold-starts: provision a sandbox, download the function code, initialize the runtime, run the handler. Cold start can range from 200 ms (a lightweight Python 3.13 function) to 3+ seconds (a heavy Java runtime or a large dependency tree).

For latency-sensitive request paths (API endpoints), cold start matters and you mitigate with: provisioned concurrency, smaller deployment packages, lighter runtimes, lazy imports. For event-driven batch (which is most space-GIS pipelines), cold start is fine — a launch detection that takes 90 seconds doesn't care about 500 ms cold start.

AWS CDK

AWS CDK (Cloud Development Kit) is infrastructure-as-code in real programming languages — TypeScript, Python, Java, Go. You write classes that instantiate AWS resources; CDK synthesizes them to CloudFormation templates; CloudFormation deploys them.

import * as cdk from 'aws-cdk-lib';
import { Bucket, EventType } from 'aws-cdk-lib/aws-s3';
import { Function, Runtime, Code } from 'aws-cdk-lib/aws-lambda';
import { S3EventSource } from 'aws-cdk-lib/aws-lambda-event-sources';
import { Table, AttributeType } from 'aws-cdk-lib/aws-dynamodb';

export class DetectionStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);
    // Ingest bucket: NetCDF files land here
    const bucket = new Bucket(this, 'IngestBucket');
    // Detection state, keyed per the partition-key design above
    const table = new Table(this, 'Detections', {
      partitionKey: { name: 'pk', type: AttributeType.STRING },
      sortKey: { name: 'sk', type: AttributeType.STRING }
    });
    const scorer = new Function(this, 'Scorer', {
      runtime: Runtime.PYTHON_3_13,
      handler: 'handler.handler',
      code: Code.fromAsset('lambda/scorer')
    });
    // Invoke the scorer on every object written to the bucket
    scorer.addEventSource(new S3EventSource(bucket, {
      events: [EventType.OBJECT_CREATED]
    }));
    // Least-privilege grant: the scorer can write detections, nothing more
    table.grantWriteData(scorer);
  }
}

Hands-on lab: Mini detection pipeline

You'll build a mini detection pipeline: a Lambda triggered by S3 PutObject that reads a small GOES NetCDF, threshold-detects hotspots, and writes detection records to a DynamoDB table. Deploy with AWS CDK. This is the architecture of LaunchDetect's artgis-cluster-scorer Lambda in production, minus the ML scoring layer and parallax correction.
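A starting point for the lab handler: pulling bucket and key out of the S3 notification event. The record shape below follows AWS's S3 event notification format; the function name is ours:

```python
def s3_records(event):
    # Extract (bucket, key) pairs from an S3 PutObject notification event.
    # The handler would then range-request the NetCDF, threshold-detect
    # hotspots, and put detection items into the DynamoDB table.
    return [(r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
            for r in event.get("Records", [])]
```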

Quiz

Test yourself. Answer key on the certificate-track page (Gold-tier feature: progress tracking and auto-grading).

Q1. S3 → Lambda trigger is configured via:
  1. S3 event notification to Lambda function ARN
  2. Polling
  3. SNS only
  4. EventBridge only
Q2. EventBridge is best for:
  1. Decoupled event routing, scheduled rules, cross-service orchestration
  2. Database
  3. Just cron
  4. File storage
Q3. DynamoDB partition key choice impacts:
  1. Distribution and hot-partition behavior
  2. Cost only
  3. Nothing
  4. Display order
Q4. Lambda cold start matters for:
  1. Latency-sensitive endpoints; less for event-driven batch
  2. Always
  3. Never
  4. Only TypeScript
Q5. AWS CDK is:
  1. Infrastructure-as-code in TypeScript / Python / Java / Go
  2. Just a CLI
  3. A database
  4. A managed service