Why marEx?
marEx provides unique capabilities not available in alternative tools (e.g., ocetrac), making it the choice for extreme event analysis. This page highlights the distinguishing features that set marEx apart.
Advanced Tracking & Identification
Independent Event Tracking with Merge/Split Genealogy
The Mega-Event Problem: Basic tracking methods (e.g., 3D connected component labeling with ocetrac) treat time as just another spatial dimension, permanently merging ANY objects that touch at any point in space-time. This creates a chain reaction where distinct events that briefly touch become irrevocably linked, producing unrealistic basin-spanning “mega-events” that combine dozens of independent phenomena. These algorithmic artefacts have no coherent physical origin and produce meaningless statistics, making them unsuitable for mechanistic studies of extreme event dynamics.
marEx Solution: marEx tracks events through advection, spatio-temporal morphing, splitting, and merging with complete parent/child relationships recorded in merge_ledger. This genealogical reconstruction prevents mega-events by requiring significant overlap (not just touching) and maintaining individual event identities. This approach is critical for mechanistic studies of ocean extreme event evolution and enables accurate, unbiased Lagrangian statistics on physically realistic events.
Tracking Algorithm Comparison:
The video below demonstrates why marEx’s selective merging with overlap thresholds and genealogy tracking are crucial for scientifically meaningful event analysis:
Left (Basic Method — Chain-Reaction Merging): 3D connected components permanently merge any touching objects. Event A touches B → merged as AB. AB touches C → all become one “mega-event”. Over time, independent physical phenomena are spuriously linked into single, basin-spanning artefacts with no mechanistic relevance. Statistics derived from these mega-events are scientifically meaningless.
Right (marEx Advanced Method — Genealogy Tracking): Prevents mega-events using overlap_threshold to distinguish transient contact from true merging. Maintains individual event identities and records complete parent/child genealogy in merge_ledger. Enables reconstruction of physically realistic event evolution where tracked objects correspond to actual coherent phenomena, providing mechanistically relevant statistics for scientific analysis.
Key benefits:
Full parent/child ID tracking in
merge_ledgerdatasetOverlap requirement for each merge/split event
Enables reconstruction of complex event dynamics (e.g., fusion of two smaller marine heatwaves into one larger event)
Code reference: marEx.track.tracker with allow_merging=True parameter
Nearest-Neighbor Partitioning (nn_partitioning)
marEx implements advanced partitioning for split events based on the nearest-neighbor parent cell, which is more robust than simpler centroid-based methods.
Key benefits:
Prevents artefacts where small fragments “inherit” disproportionately large portions of new objects
Provides physically realistic allocation of area and identity when events split
Most accurate tracking method available
Code reference: marEx.track.tracker with nn_partitioning=True parameter
Temporal Gap Filling (T_fill)
marEx uses morphological closing along the time axis to maintain event continuity across short interruptions. This prevents a single, persistent event from being incorrectly split into multiple shorter events if it temporarily weakens below the detection threshold for a few time steps.
Key benefits:
Maintains continuity of persistent events across brief interruptions
More robust than simple rule-based gap filling
Configurable gap length (e.g., T_fill=4 fills gaps up to 4 days)
Code reference: marEx.track.tracker with T_fill parameter
Morphological Preprocessing (R_fill)
marEx applies morphological closing and opening operations to spatially fill gaps and remove noise before tracking begins. For unstructured grids, this is implemented with a highly efficient sparse matrix approach. This creates more coherent and less noisy binary event fields, leading to more stable and meaningful object tracking.
Key benefits:
Fills small holes within events and smooths boundaries before identification
Prevents spurious small objects and artificially fragmented events
Dual implementation: Dask-powered for structured grids, scipy sparse matrices for unstructured data
Code reference: marEx.track.tracker with R_fill parameter
Dual Area Filtering Strategies
marEx allows both a percentile-based (area_filter_quartile, adaptive to dataset) OR absolute (area_filter_absolute, reproducible across datasets) thresholds for object filtering.
Key benefits:
Quartile: Remove smallest X% of events (adaptive, useful for exploratory analysis)
Absolute: Fixed minimum area threshold (reproducible, useful for cross-dataset comparison)
Code reference: marEx.track.tracker with area_filter_quartile or area_filter_absolute
Flexible & Rigorous Anomaly Detection
Four Distinct Anomaly Methods
marEx provides four scientifically rigorous anomaly calculation methods with documented trade-offs.
Available methods:
shifting_baseline: Rolling climatology that adapts to changing climate (most accurate, default in v3.0+)
detrend_fixed_baseline: Polynomial detrending followed by fixed daily climatology (preserves full time-series length, removes long-term trends)
fixed_baseline: Simple daily climatology (keeps trends in anomaly, straightforward interpretation)
detrend_harmonic: Fast harmonic + polynomial model (efficient but may bias certain statistics)
Key benefits:
Choose between accuracy, computational efficiency, time-series preservation, and trend handling
Accommodate analyses that need to include/exclude long-term trends
Account for shifting seasonal cycles or maintain stationary baselines
Code reference: marEx.detect.preprocess_data with method_anomaly parameter
Hobday Spatial Window Extension
marEx extends/generalises the standard Hobday et al. (2016) temporal window by adding a spatial dimension (window_spatial_hobday). This creates a spatio-temporal cube of data points for calculating percentile thresholds (e.g., 5×5 spatial × 11 days = 275 samples per year), resulting in more robust and spatially coherent statistics. This is a major methodological advancement over the original Hobday definition.
Key benefits:
Produces spatially coherent thresholds by pooling neighbouring grid cells (motivated by spatio-temporal correlation lengthscale)
Reduces noise and statistical uncertainty in anomaly threshold calculations
Especially valuable for short time series, high percentile thresholds, or noisy data (e.g., satellite SST with gaps)
Limitations:
Structured grids only (not supported for unstructured/irregular grids)
Requires
method_percentile='approximate'
Code reference: marEx.detect.preprocess_data with window_spatial_hobday parameter (default=5)
Histogram-Based Approximate Percentiles
marEx implements a clever 2D histogram approach for percentile calculation that is highly memory-efficient and parallelisable with Dask. This method is uses 100× less memory than exact computation while maintaining ~0.01°C precision—sufficient for most studies. This method makes long time-series terabyte-scale percentile calculations feasible that were previously unachievable with daily data.
Key benefits:
Enables global-in-time calculations on massive datasets
~0.01°C precision adequate for marine heatwave studies
Overcomes the memory bottleneck of exact percentiles (which require loading entire time series)
Code reference: marEx.detect.preprocess_data with method_percentile='approximate' (default)
Extreme Scale & Performance
Terabyte-Scale Processing
marEx features a “Dask-first” architecture with mandatory Dask validation that processes datasets 100-1000× larger than available RAM. The package is designed from the ground up for exascale data, enabling baseline computations on 100+ years of daily global data.
Key benefits:
Process massive climate datasets efficiently with intelligent chunking
Explicit chunking control via
dask_chunksparameter throughout pipeline
Code reference: All functions in marEx.detect and marEx.track require Dask-backed arrays
HPC/SLURM Integration
marEx provides wrappers for easy deployment on supercomputers via the marEx.helper module with automatic cluster configuration, memory optimisation (256GB/512GB/1024GB nodes), and dashboard tunneling. Designed specifically for DKRZ Levante and adaptable to other HPC systems, it simplifies the process of scaling an analysis from a laptop to a supercomputer.
Key benefits:
Abstracts away the complexity of configuring Dask for specific HPC environments
Pre-configured memory settings for common node types
Dashboard tunneling for remote monitoring
System resource checking for local clusters
Code reference: marEx.helper.start_distributed_cluster for SLURM systems
JAX Acceleration
marEx can leverage JAX for significant performance gains (10-50× speedup reported) on critical-path calculations. The integration includes graceful fallbacks to NumPy+Numba if JAX is not installed, so users get acceleration if available, but code still works without it.
Key benefits:
Dramatically reduces computation time for large datasets on GPU/TPU systems
Moving from hours to minutes for key preprocessing steps
Automatic backend selection
Code reference: Install with pip install marEx[full] for JAX support
Numba JIT Compilation
marEx uses Numba’s just-in-time (JIT) compilation as a core dependency for CPU-bound operations, providing performance acceleration without requiring any user configuration. Numba compiles Python functions to optimised machine code at runtime, delivering near-C performance for numerical computations.
Key benefits:
Provides baseline acceleration, even without JAX/GPU
Transparent performance gains on CPU-intensive tracking and grid operations
Code reference: Numba is a required dependency installed automatically with marEx
Grid-Agnostic & Universal
Grid-Agnostic Processing
marEx provides the same API for structured (lat/lon), unstructured (FESOM/ICON/MPAS), regridded, coarse resolution, and regional domains. Specialised algorithms (e.g., sparse-matrix morphological operations for unstructured grids) adapt automatically based on grid type detection. Users can apply the exact same analysis workflow to data from traditional climate models, satellite products, and modern variable-resolution ocean models.
Key benefits:
Transparent grid handling—write code once, use everywhere
Automatic algorithm selection based on grid structure
Supports regular rectangular grids and irregular meshes with connectivity
Supported grid types:
Structured: Standard climate models (CMIP6), reanalysis, satellite data
Unstructured: Ocean models (FESOM, ICON-O, MPAS-Ocean), finite element output
Code reference: marEx.track.tracker with unstructured_grid parameter (auto-detected)
Polymorphic Visualisation (plotX)
marEx provides a visualisation system via an xarray accessor (.plotX) that automatically detects the grid type (structured vs. unstructured) and uses the appropriate plotting backend (GriddedPlotter vs UnstructuredPlotter). Same code produces single-panel plots, multi-panel comparisons, and MP4 animations for all grid types with automatic projection handling.
Key benefits:
Simplifies creation of publication-quality maps and animations
No need to write custom plotting logic for each grid type
Global caches for triangulation and KDTree data (unstructured grid performance)
Code reference: marEx.plotX via .plotX accessor with marEx.PlotConfig
Regional Tracker
marEx provides a convenience function regional_tracker() for spatially bounded analysis with coordinate unit specification (degrees/radians) for non-global domains. This handles for example high-resolution regional studies (e.g., 0.05° European domain).
Key benefits:
Dedicated support for regional/nested domains
Manual override for coordinate system when auto-detection insufficient
Same robust tracking algorithms applied to bounded regions
Code reference: marEx.regional_tracker() convenience function
Automatic Grid Cell Area Calculation
marEx provides transparent conversion from cell counts to physical areas (km²) using spherical geometry for regular lat/lon grids. The grid_resolution parameter calculates Area = R² × |sin(lat + dlat/2) - sin(lat - dlat/2)| × dlon without requiring manual pre-computation of cell areas.
Key benefits:
No need to provide pre-computed cell areas for regular grids
Automatic spherical geometry calculations
Transparent physical area reporting in tracking outputs
Code reference: marEx.track.tracker with grid_resolution parameter
Production-Ready Infrastructure
Configurable Logging System
marEx provides three logging modes (verbose/normal/quiet) with performance monitoring, timing decorators, and memory usage tracking.
Code reference: marEx.logging_config
Coordinate Auto-Detection & Unification
marEx automatically detects degrees vs radians (checks if longitude range is ~360° or ~2π). This provides transparent handling of different coordinate conventions. Manual override is available via regional_mode=True and coordinate_units for regional domains where auto-detection may fail.
Key benefits:
No need to manually convert coordinate systems
Works with different dataset conventions out of the box
Validation with informative errors when detection is ambiguous
Code reference: marEx.track.tracker coordinate detection logic
Summary
These capabilities position marEx as a high-performance, scalable, and scientifically rigorous tool for extreme event analysis.
Next Steps:
Installation - Get marEx installed
Quickstart - Start analysing extremes in 5 minutes
User Guide - Usage guide with method selection
Examples Gallery - Complete workflow demonstrations