Week 22 · Space GIS Architect

ML for satellite imagery: CNNs and U-Net segmentation

Deep learning has rewritten remote sensing. CNNs (object detection) and U-Nets (semantic segmentation) are now standard. This week you train one on real GOES data.

Learning objectives

- Decide when deep learning beats threshold rules for satellite imagery analysis
- Explain the U-Net encoder-decoder architecture and the role of skip connections
- Generate segmentation labels programmatically with weak supervision
- Evaluate segmentation models with IoU and per-pixel confusion matrices

Primer

Deep learning has rewritten the playbook for satellite imagery analysis over the past decade. Convolutional neural networks now do object detection, semantic segmentation, super-resolution, and change detection at production scale across every major Earth-observation platform. This week is the practical primer: when to use deep learning vs threshold rules, the U-Net architecture, and how to train one on real GOES data.

When deep learning beats thresholding

Threshold rules (Week 14's Band 7 > 320 K) work when the discriminator is a single scalar feature. They break down when:

- The signal depends on spatial context (shape, texture, growth over frames) rather than one pixel value
- Multiple confusers share the scalar signature — wildfires, industrial hotspots, and plumes can all clear 320 K
- Conditions vary (season, background temperature, viewing geometry), so no single cutoff works everywhere

Threshold rules are great for fast, explainable, debuggable baseline detection. Deep learning shines for the next layer: scoring, classification, and segmentation refinement.
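That baseline layer is cheap to state in code. A minimal sketch of a Week 14-style rule (the function name and toy data here are illustrative, not the course's actual pipeline code):

```python
import numpy as np

def threshold_detect(band7_k, threshold=320.0):
    """Baseline hotspot mask: Band 7 brightness temperature above a cutoff.

    band7_k: 2-D array of brightness temperatures in kelvin.
    Returns a boolean mask the same shape as the input.
    """
    return band7_k > threshold

# Toy 4x4 scene: cool background with one hot pixel
scene = np.full((4, 4), 290.0)
scene[2, 1] = 335.0
mask = threshold_detect(scene)
```

One comparison, fully explainable — and exactly the kind of rule that a wildfire or hot rooftop will also trip, which is where the learned layer comes in.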

The U-Net architecture

U-Net (Ronneberger et al. 2015) is the workhorse for image segmentation in remote sensing. It's an encoder-decoder with skip connections:

- Encoder: repeated conv blocks with 2×2 max-pooling halve the spatial resolution while deepening the features
- Bottleneck: the coarsest, most abstract representation of the scene
- Decoder: transposed convolutions upsample back toward full resolution
- Skip connections: each decoder stage concatenates the matching encoder output, restoring the fine spatial detail that pooling threw away

The output is a same-size map of per-pixel class probabilities. For plume segmentation, the classes are {background, plume}; for multi-class fire/plume/cloud, expand accordingly.

import torch.nn as nn

class UNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, n_features=32):
        super().__init__()
        # ... 4 down blocks + bottleneck + 4 up blocks ...
        # Each block: Conv3x3 → BatchNorm → ReLU → Conv3x3 → BatchNorm → ReLU
        # Down: Conv block + MaxPool2x2
        # Up: ConvTranspose2x2 + concat with skip + Conv block

Weak supervision

The training-data problem: who hand-labels rocket plumes in tens of thousands of GOES frames? Nobody. The trick is weak supervision — generate the training labels programmatically.

For plumes: run Week 14's threshold detector + Week 20's morphology cleanup over a year of GOES frames around known launches. Cross-check against the published launch schedule. Use those pixel masks as training labels. The labels are noisy (some false positives, some false negatives), but with enough volume the U-Net learns to denoise — it picks up on spatial context the threshold rule can't see.
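That label-generation pipeline can be sketched as follows — assuming NumPy/SciPy; `weak_labels`, the `launch_active` schedule gate, and the size cutoff are illustrative stand-ins for the course's Week 14 + Week 20 pipeline:

```python
import numpy as np
from scipy import ndimage

def weak_labels(band7_k, launch_active, threshold=320.0, min_pixels=3):
    """Programmatic plume labels: threshold + morphology, gated by the launch schedule.

    band7_k: 2-D brightness-temperature frame (kelvin).
    launch_active: was a launch under way for this frame (from the published schedule)?
    """
    if not launch_active:
        # No launch -> every pixel labeled background, whatever the detector says
        return np.zeros_like(band7_k, dtype=bool)
    mask = band7_k > threshold
    mask = ndimage.binary_opening(mask)        # Week 20-style cleanup: drop speckle
    # Keep only connected components large enough to plausibly be a plume
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    keep_ids = [i + 1 for i, s in enumerate(sizes) if s >= min_pixels]
    return np.isin(labels, keep_ids)
```

The schedule gate is what makes the labels usable: false positives away from launch windows are labeled background for free, and the network inherits that correction.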

Evaluation: IoU and confusion matrices

For segmentation, accuracy is misleading: a network that predicts "no plume anywhere" scores 99.99% accuracy, because almost every pixel really is background. Use instead:

- IoU (intersection over union): overlap between the predicted and ground-truth masks — the standard segmentation score
- Per-pixel confusion matrix: TP/FP/FN/TN counts, from which precision, recall, and the false-positive rate fall out
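Both metrics are a few lines over boolean masks. A minimal sketch (function names are illustrative; the empty-union convention of IoU = 1 when both masks are empty is a choice, not a standard):

```python
import numpy as np

def iou(pred, truth):
    """Intersection over union for boolean masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0  # both empty -> perfect by convention

def confusion(pred, truth):
    """Per-pixel (TP, FP, FN, TN) counts."""
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    return tp, fp, fn, tn
```

Note that the degenerate "no plume anywhere" predictor gets IoU = 0 on any frame that contains a plume, which is exactly the failure accuracy hides.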

Small models, not big

For thermal plume segmentation in 200×200 pixel windows, a 32-feature U-Net (~1M parameters) is more than enough. Don't reach for big pretrained models — they need huge training sets, they're slow to deploy, and the feature distribution of satellite imagery is far enough from ImageNet that pretrained weights help less than you'd expect.

The lab

You'll generate weakly-supervised training data from threshold detections + morphology over a year of GOES Band 7 frames, train a small U-Net in PyTorch, evaluate on held-out launches with IoU + confusion matrices, and confirm the per-pixel false-positive rate is below 5%. This is the architecture for LaunchDetect's "Layer 3" classifier — the production model that scores threshold-detected hotspots for plume-vs-fire-vs-noise.

Hands-on lab: U-Net for plume segmentation

Generate training data from threshold-detected plumes in GOES Band 7. Train a small U-Net to segment plume pixels. Evaluate on held-out launches with IoU and confusion matrices.
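The training step of the lab might look like the following sketch — a single conv layer stands in for the U-Net, random tensors stand in for real GOES batches, and the `pos_weight` value is an illustrative guess at the class-imbalance correction:

```python
import torch
import torch.nn as nn

# Stand-in for the U-Net: any module mapping (N, 1, H, W) -> per-pixel logits fits here
model = nn.Conv2d(1, 1, 3, padding=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# pos_weight up-weights the rare class: plume pixels are a tiny fraction of each frame
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(10.0))

# Hypothetical batch: frames and weak labels would come from the GOES pipeline
frames = torch.randn(8, 1, 64, 64)
labels = (torch.rand(8, 1, 64, 64) < 0.01).float()  # sparse plume mask

for epoch in range(3):
    opt.zero_grad()
    loss = loss_fn(model(frames), labels)  # logits in, mask out
    loss.backward()
    opt.step()
```

`BCEWithLogitsLoss` combines the sigmoid and the binary cross-entropy in one numerically stable op, which is why the model emits raw logits rather than probabilities.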

Quiz

Test yourself. Answer key on the certificate-track page (Gold-tier feature: progress tracking and auto-grading).

Q1. U-Net architecture is:
  1. Encoder-decoder with skip connections, ideal for segmentation
  2. Just a CNN
  3. Recurrent
  4. Transformer-only
Q2. IoU (intersection over union) measures:
  1. Overlap between predicted and ground-truth mask
  2. Loss only
  3. Reprojection error
  4. Compression ratio
Q3. Generating training data via thresholding is called:
  1. Weak supervision (programmatic labels)
  2. Manual labeling
  3. Synthetic data
  4. Augmentation
Q4. CNNs work well on images because:
  1. Translation invariance and locality
  2. They're newest
  3. Only choice
  4. Marketing
Q5. Why use a small U-Net (not a giant model)?
  1. Faster inference, less overfitting on small training sets, deployable to edge
  2. Always smaller is worse
  3. Required by law
  4. No reason