Core Concepts

This guide provides a foundational understanding of marEx’s design philosophy and core concepts.

Why marEx? The Challenge of Tracking Ocean Extremes

Modern oceanography generates massive datasets from satellites, ocean models, and observational networks. Within these datasets, extreme events like marine heatwaves represent critical phenomena affecting marine ecosystems, fisheries, and climate systems. However, identifying and tracking these events presents significant challenges:

  • Scale: Datasets often exceed hundreds of gigabytes or terabytes

  • Complexity: Events move, grow, shrink, merge, and split over time

  • Variability: Different scientific or industrial questions require different detection methods

  • Diversity: Data comes in varied formats (regular grids, irregular meshes, different resolutions)

The Goal of marEx

marEx (Marine Extremes) provides a scalable, flexible, and scientifically rigorous toolkit to automate the detection and tracking of marine extreme events. It handles the computational complexity so researchers can focus on scientific questions rather than implementation details.

What is a Marine Extreme Event?

Understanding marine extremes requires four foundational concepts:

Climatology

The climatology represents the long-term “normal” state of the ocean for a given location and time of year. For example, the average sea surface temperature in the North Atlantic during July, based on 30 years of data.

Think of climatology as the baseline we use to define what “typical” conditions look like.

Anomaly

An anomaly is the deviation from the climatological baseline:

Anomaly = Observed Value - Climatology

For example, if the climatology for a location in July is 20°C, and the observed temperature is 23°C, the anomaly is +3°C.

marEx provides multiple methods for calculating anomalies, each with different assumptions about trends and climate change (see User Guide for details).

Extreme Event

An extreme event is an anomaly that exceeds a statistical threshold, typically defined as a percentile of the anomaly distribution. For marine heatwaves, the standard definition uses the 95th percentile (Hobday et al. 2016):

Extreme Event = Anomaly > 95th percentile threshold

This creates a binary classification: at each location and time, conditions are either “extreme” or “not extreme.”

Tracked Object

A tracked object is a coherent extreme event that has been identified as a spatially connected region and followed through time. Tracking assigns each object a unique ID and records its evolution: position, area, lifetime, and relationships with other events (merges/splits).

The marEx Workflow: A Three-Step Process

marEx follows a clear three-stage pipeline that maps directly to its code architecture:

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  1. Detect      │  →   │  2. Track       │  →   │  3. Visualise   │
│    Extremes     │      │    Events       │      │     & Analyse   │
└─────────────────┘      └─────────────────┘      └─────────────────┘
        ↓                        ↓                        ↓
 preprocess_data()           tracker()                  plotX()
        ↓                        ↓                        ↓
 Binary extreme map        Tracked objects          Maps, animations,
                           with unique IDs           & statistics

Step 1: Detect Extremes

Function: marEx.preprocess_data()

Purpose: Transform raw oceanographic data (e.g., sea surface temperature) into a binary map showing where and when extreme conditions exist.

Process:

  1. Calculate anomalies relative to a baseline climatology

  2. Apply percentile thresholds to identify extreme values

  3. Generate a binary field: True where extremes occur, False elsewhere

Output: An xarray Dataset containing anomalies, binary extreme events, thresholds, and a data mask.

See: Detection Module (marEx.detect) for detailed algorithm descriptions and User Guide for method selection guidance.

Step 2: Track Events

Function: marEx.tracker()

Purpose: Group spatially connected extreme points into objects and follow these objects through time, handling merges and splits.

Process:

  1. Identify spatially connected regions (objects) in each time step

  2. Track objects across time by computing overlap between consecutive frames

  3. Handle merging (two events become one) and splitting (one event becomes two)

  4. Assign unique IDs to each tracked event

  5. Record event characteristics (area, centroid, lifetime)

Output: An xarray Dataset containing tracked event IDs, object statistics, and merge history.

See: Tracking Module (marEx.track) for tracking algorithms and User Guide for parameter tuning.

Step 3: Analyse & Visualise

Function: .plotX accessor

Purpose: Create publication-quality visualisations and perform statistical analysis of tracked events.

Capabilities:

  • Single-panel maps with customisable projections

  • Multi-panel comparisons (seasonal, regional)

  • Animated time series showing event evolution

  • Automatic grid detection (works with both structured and unstructured grids)

  • Statistical summaries (event frequency, duration, intensity)

Output: Matplotlib figures, saved images, MP4 animations.

See: Plotting Module (marEx.plotX) for visualisation options and Examples Gallery for real-world applications.

Key Feature: Handling All Ocean Data

A major strength of marEx is its ability to work seamlessly with different types of ocean data grids.

Structured Grids

Description: Regular rectangular grids with dimensions (time, lat, lon)

Examples:

  • Satellite-derived sea surface temperature (e.g., NOAA OISST)

  • Climate model output (e.g., CMIP6 models)

  • Reanalysis products (e.g., ERA5 ocean component)

Characteristics: Familiar latitude/longitude coordinates on a regular grid. Data at each grid point represents a rectangular area.

# Structured grid example
sst.dims        # ('time', 'lat', 'lon')
sst.coords      # time, lat, lon as coordinate arrays

Unstructured Grids

Description: Irregular meshes with dimensions (time, ncells) and separate coordinate arrays for lat/lon

Examples:

  • FESOM (Finite Element Sea ice-Ocean Model)

  • ICON-O (Icosahedral Nonhydrostatic Ocean model)

  • MPAS-Ocean (Model for Prediction Across Scales)

Characteristics: Irregular polygonal cells that allow variable resolution (e.g., higher resolution near coastlines). Requires connectivity information for spatial operations.

# Unstructured grid example
sst.dims        # ('time', 'ncells')
sst.coords      # time, lat, lon (lat/lon are coordinate arrays, not dimensions)

Transparent Grid Handling

marEx automatically detects the grid type based on the coordinate structure and applies appropriate algorithms. You write the same code for both grid types:

# Works identically for structured and unstructured grids
extremes = marEx.preprocess_data(sst, threshold_percentile=95)
tracked = marEx.tracker(extremes.extreme_events, extremes.mask).run()
fig, ax, im = tracked.ID_field.isel(time=0).plotX.single_plot(config)

For unstructured grids, specify grid metadata using marEx.specify_grid() to enable advanced features like spatial windowing (see User Guide).

Key Feature: Built for Scale with Dask

The Challenge of Big Data

Modern ocean datasets routinely exceed available computer memory:

  • Global 0.25° daily SST for 30 years: ~100 GB

  • High-resolution regional models: 200+ GB

  • Coupled climate model ensembles: 10+ TB

Traditional analysis tools that load entire datasets into memory fail with these data sizes.

The Dask Solution

marEx uses Dask as its computational backend. Dask is a parallel computing library that:

  1. Breaks data into chunks: Divides large arrays into manageable pieces

  2. Processes chunks in parallel: Utilises multiple CPU cores simultaneously

  3. Manages memory automatically: Only loads necessary chunks, discarding when done

  4. Scales from laptops to supercomputers: Same code works on 4-core laptop or 1000-core HPC cluster

How Dask Integration Works

For users: marEx requires input data to be Dask-backed xarray objects (use chunks={} when loading):

import xarray as xr

# Load data with Dask chunking
sst = xr.open_dataset('sst_data.nc', chunks={'time': 365}).sst

# marEx handles all Dask operations internally
extremes = marEx.preprocess_data(sst, threshold_percentile=95)

# Computation happens when you request results
result = extremes.extreme_events.compute()  # Triggers parallel computation

Performance benefits:

  • Process datasets 100-1000× larger than available RAM

  • Utilise all CPU cores for faster computation

  • Seamless scaling to HPC clusters with SLURM integration (see marEx.helper)

See: User Guide for chunking strategies and performance optimisation, Helper Module (marEx.helper) for HPC cluster setup.

Where to Go Next

Now that you understand marEx’s foundational concepts, here’s how to proceed:

For Hands-On Learning

  • Quickstart - Get started with a working example

  • Examples Gallery - Explore Jupyter notebooks showing complete workflows for gridded, regional, and unstructured data

For Detailed Guidance

  • User Guide - Complete guide covering:

    • Method selection (which anomaly/extreme detection method to use)

    • Scientific trade-offs between methods

    • Parameter tuning for different research questions

    • Performance optimisation strategies

    • Best practices for marine heatwave detection

For Technical Reference

For Troubleshooting

  • Troubleshooting - Common issues and solutions for installation, performance, and data problems

Key References

The scientific methods in marEx are based on established literature:

  • Hobday et al. (2016): “A hierarchical approach to defining marine heatwaves” Progress in Oceanography 141, 227-238. doi:10.1016/j.pocean.2015.12.014

    • Defines the standard marine heatwave detection methodology using day-of-year specific percentile thresholds

  • Sun et al. (2023): “Marine heatwaves in the Arctic Region: Variation in Different Ice Covers” Progress in Oceanography 203, 102947. doi:10.1016/j.pocean.2022.102947

    • Provides tracking methodology that marEx extends with improved merge/split partitioning