Monthly Temporal Aggregation of NDVI for Land Cover Change

In automated MRV (Measurement, Reporting, and Verification) architectures, raw daily reflectance observations are statistically unusable for regulatory carbon accounting. Atmospheric interference, sensor geometry drift, and acquisition gaps introduce stochastic noise that obscures structural vegetation transitions. Monthly temporal aggregation of NDVI for land cover change resolves this by transforming heterogeneous pixel-level observations into deterministic, compliance-ready composites. The engineering objective is not statistical smoothing; it is the construction of an auditable temporal baseline that survives ESG scrutiny, aligns with GHG Protocol activity data requirements, and scales across multi-sensor tile streams.

flowchart TB A["Daily NDVI stack<br/>Sentinel-2 · Landsat"] --> B["QA clear-sky filter<br/>SCL / QA_PIXEL"] B --> C["Monthly resample<br/>median / p75"] C --> D{"≥ 3 clear obs<br/>per month?"} D -->|yes| E["Monthly composite"] D -->|no| F["90-day rolling<br/>median fallback"] E --> G["Audit manifest"] F --> G

Ingestion & QA Preprocessing

Pipeline ingestion begins with STAC API queries targeting Sentinel-2 MSI and Landsat 8/9 OLI. Cloud and shadow artifacts must be resolved before aggregation, as even 15% residual cirrus contamination can depress NDVI by 0.08–0.12, triggering false deforestation alerts.

Sentinel-2 SCL clear-sky classes: The Scene Classification Layer (SCL) designates the following pixel classes as radiometrically valid for NDVI calculation: 4 (Vegetation), 5 (Not-Vegetated / Bare Soil), 6 (Water), and 7 (Unclassified). Classes 3 (Cloud Shadow), 8 (Cloud Medium Probability), 9 (Cloud High Probability), and 10 (Thin Cirrus) must be excluded.

Landsat Collection 2 QA_PIXEL clear-sky parsing: The QA_PIXEL band uses a bitmask where bit 6 (value 64) indicates cloud and bits 8–9 encode cloud confidence. A pixel is considered clear when bit 6 is unset and cloud confidence bits are zero. Do not use bit 0, which encodes fill data. The minimum clear-sky check is: (qa_pixel & 0x0040) == 0 (cloud bit clear) combined with (qa_pixel & 0x0300) == 0 (no cloud confidence set).

When acquisition density falls below the compliance threshold (typically <3 clear observations per month), fallback routing should trigger a rolling 90-day median composite rather than propagating null values. This preserves temporal continuity for downstream change detection while explicitly flagging interpolated cells in the provenance metadata. For foundational methodologies on baseline construction, reference Temporal Aggregation for Land-Use Change.

Aggregation Architecture & Statistical Selection

The aggregation engine operates on chunked Cloud-Optimized GeoTIFFs using xarray and dask.array, enabling out-of-core processing across continental-scale extents. Strict CRS alignment (e.g., EPSG:4326 or project-specific UTM zones) must be enforced during stack assembly to prevent sub-pixel misregistration during temporal resampling. While NDVI_max composites are standard for phenological tracking, land cover change detection for carbon accounting typically requires NDVI_median or the 75th percentile to suppress transient soil moisture spikes and BRDF-induced edge effects.

Multi-sensor harmonization requires radiometric cross-calibration before stacking. Sentinel-2 MSI and Landsat 8/9 OLI must be normalized to a common reflectance scale (0–10000 or 0–1) using sensor-specific gain/offset tables. This prevents artificial NDVI step-changes at sensor handoff boundaries. For broader architectural patterns in emissions tracking, see Satellite Imagery Processing for Emissions Tracking.

Production Implementation

The following routine implements QA-aware monthly aggregation, enforces minimum observation thresholds, and generates an audit-ready provenance manifest. It leverages lazy evaluation via dask to maintain memory bounds during continental tiling.

import xarray as xr
import numpy as np
import dask.array as da
import pandas as pd
import rioxarray  # registers the xarray ".rio" accessor
from datetime import datetime

# Landsat Collection 2 QA_PIXEL: clear requires bit 6 (cloud) unset
# and bits 8-9 (cloud confidence) both zero.
LANDSAT_CLEAR_MASK = 0x0040 | 0x0300  # bits to check for cloud contamination

# Sentinel-2 SCL valid (clear-sky) classes
S2_SCL_CLEAR_CLASSES = {4, 5, 6, 7}  # Vegetation, Bare Soil, Water, Unclassified

def aggregate_monthly_ndvi(
    ndvi_stack: xr.DataArray, 
    qa_mask: xr.DataArray, 
    sensor: str = "S2",
    method: str = "median",
    min_obs: int = 3,
    target_crs: str = "EPSG:4326"
) -> tuple[xr.DataArray, dict]:
    """
    Aggregates daily NDVI into monthly composites with QA-aware filtering,
    CRS validation, and compliance observation thresholds.

    qa_mask semantics:
      - Sentinel-2: integer SCL band (4,5,6,7 = clear)
      - Landsat C2:  integer QA_PIXEL band (clear when cloud bits are unset)

    Returns: (monthly_composite, audit_manifest)
    """
    # Enforce strict CRS alignment
    if ndvi_stack.rio.crs is not None and str(ndvi_stack.rio.crs) != target_crs:
        raise ValueError(f"CRS mismatch: {ndvi_stack.rio.crs} != {target_crs}")

    # Build boolean clear-sky mask from sensor-specific QA
    if sensor == "S2":
        # True where SCL class is in the valid set
        clear_mask = xr.zeros_like(qa_mask, dtype=bool)
        for cls in S2_SCL_CLEAR_CLASSES:
            clear_mask = clear_mask | (qa_mask == cls)
    elif sensor in ("L8", "L9"):
        # True where cloud bit (6) and cloud-confidence bits (8-9) are all zero
        clear_mask = (qa_mask & LANDSAT_CLEAR_MASK) == 0
    else:
        raise ValueError(f"Unsupported sensor: {sensor}. Use 'S2', 'L8', or 'L9'.")

    ndvi_clean = ndvi_stack.where(clear_mask)

    # Group by calendar month start and apply aggregation
    if method == "max":
        monthly_composite = ndvi_clean.resample(time="1MS").max(dim="time")
    elif method == "p75":
        monthly_composite = ndvi_clean.resample(time="1MS").quantile(0.75, dim="time")
    else:
        monthly_composite = ndvi_clean.resample(time="1MS").median(dim="time")

    # Audit gate: enforce minimum clear-sky observation count
    obs_count = clear_mask.resample(time="1MS").sum(dim="time")
    low_density_mask = obs_count < min_obs

    # Mask cells below threshold — downstream fallback applies rolling 90-day composite
    monthly_composite = monthly_composite.where(~low_density_mask)

    # Generate audit manifest for MRV verification
    audit_manifest = {
        "pipeline_version": "2.4.1-mrv",
        "sensor": sensor,
        "aggregation_method": method,
        "min_obs_threshold": min_obs,
        "target_crs": target_crs,
        "temporal_range": f"{ndvi_stack.time.values[0]} to {ndvi_stack.time.values[-1]}",
        "low_density_cells_pct": float((low_density_mask.sum() / low_density_mask.size) * 100),
        "qa_pass_rate_pct": float((clear_mask.sum() / clear_mask.size) * 100),
        "generated_utc": datetime.utcnow().isoformat(),
        "compliance_standard": "GHG Protocol Scope 3 LULUCF Activity Data"
    }

    # Stamp provenance attributes directly into xarray dataset
    monthly_composite.attrs.update({
        "audit_manifest": audit_manifest,
        "qa_source": "SCL clear-sky classes {4,5,6,7} or Landsat QA_PIXEL cloud-bit check",
        "interpolation_flag": "90-day rolling median fallback applied where obs < min_obs"
    })

    return monthly_composite, audit_manifest

Compliance Gating & Audit Trail Generation

Regulatory carbon accounting requires transparent activity data lineage. The pipeline above embeds a machine-readable audit_manifest directly into the output xarray attributes. This manifest survives downstream serialization to Parquet/GeoTIFF and satisfies third-party verifier requirements under ISO 14064-3 and the GHG Protocol Corporate Accounting and Reporting Standard.

Key compliance gates enforced:

  1. Observation Density Threshold: Cells with <3 clear-sky acquisitions are masked and logged. Verifiers can trace interpolation flags to specific temporal windows.
  2. QA Pass Rate Tracking: The qa_pass_rate_pct metric documents atmospheric clearance performance per tile. Drops below 60% trigger automatic pipeline alerts for manual review.
  3. CRS & Geolocation Integrity: Strict projection enforcement prevents sub-pixel drift during temporal stacking, ensuring spatial consistency across multi-year baselines.
  4. Fallback Transparency: When rolling 90-day composites are invoked, the interpolation_flag attribute explicitly marks affected pixels, preventing false attribution of carbon stock changes to unverified data.

For authoritative QA parsing specifications, consult the USGS Landsat Collection 2 Quality Assessment Band and the ESA Sentinel-2 Level-2A Scene Classification Map.

Integration & Scaling

Deploy this aggregation routine within an async tile-processing framework. Use dask.distributed to schedule chunked COG reads across a cluster, routing STAC item lists through a priority queue that balances sensor availability and cloud cover forecasts. Post-aggregation, feed monthly composites into a CUSUM or Bayesian change detection module to generate deforestation alerts. The deterministic baseline ensures alert generation pipelines operate on statistically stable inputs, reducing false positives compared to raw daily stacks.

By enforcing strict QA filtering, statistical robustness, and embedded audit trails, monthly temporal aggregation of NDVI transforms noisy optical observations into verifiable carbon accounting inputs. This architecture scales across multi-sensor streams, survives regulatory scrutiny, and provides the deterministic foundation required for automated MRV compliance.