User Guide

This guide covers all aspects of using marEx for marine extreme detection and tracking, from basic concepts to advanced workflows.

Introduction

marEx is a Python package for efficient identification and tracking of marine extremes (e.g., Marine Heatwaves) in oceanographic data. It provides a complete pipeline from raw data preprocessing to tracked event visualisation.

Key Features

  • Flexible Data Support: Both structured (lat/lon grids) and unstructured (irregular meshes) data

  • Multiple Methods: Various anomaly computation and threshold definition methods

  • Parallel Processing: Built on Dask for efficient computation on large datasets

  • Event Tracking: Sophisticated algorithms for tracking coherent events through time

  • Rich Visualisation: Automatic plotting system with publication-ready figures

  • HPC Integration: Optimised for supercomputing environments

Installation and Setup

For complete installation instructions including system requirements, optional dependencies, and HPC environments, see Installation.

Data Preparation

Input Data Requirements

marEx works with xarray DataArrays containing oceanographic time series data:

Required Dimensions:

  • Time dimension: Regular or irregular time steps (daily to monthly)

  • Spatial dimensions: Either structured (lat, lon) or unstructured (cell/ncells)

Supported Formats:

  • NetCDF (.nc)

  • Zarr (.zarr)

  • Any xarray-compatible format

Data Structure Examples:

For structured/gridded data:

data.dims = ('time', 'lat', 'lon')
# Coordinates: time, lat, lon as dimensions

For unstructured data:

data.dims = ('time', 'ncells')
# Coordinates: time as dimension, lat/lon as coordinates

Data Quality Requirements

  • Temporal coverage: Minimum 10 years for robust climatology

  • Spatial coverage: Consistent grid throughout time series

Data Loading Examples

Loading NetCDF files:

import xarray as xr

# Single file
ds = xr.open_dataset('sst_data.nc', chunks={})
sst = ds.sst

# Multiple files
ds = xr.open_mfdataset('sst_*.nc', parallel=True, chunks={})
sst = ds.sst

Loading Zarr datasets:

# Local Zarr
ds = xr.open_zarr('data.zarr', chunks={})

# Cloud-optimised Zarr
ds = xr.open_zarr('gs://bucket/data.zarr', chunks={})

Optimising data loading:

# Apply chunking for efficient processing (chunk sizes vary by operation)
sst = sst.chunk({'time': 365, 'lat': 50, 'lon': 100})

# Ensure Dask backing
if not marEx.is_dask_collection(sst.data):
    sst = sst.chunk()

Note: Optimal chunk sizes depend on your dataset size and the operation (preprocessing vs tracking). See the Performance Optimisation section below for detailed chunking strategies.

Core Workflow

The marEx workflow consists of three main steps. For a conceptual overview and foundations, see Core Concepts.

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  1. Detect      │  →   │  2. Track       │  →   │  3. Visualise   │
│    Extremes     │      │    Events       │      │     & Analyse   │
└─────────────────┘      └─────────────────┘      └─────────────────┘
        ↓                        ↓                        ↓
 preprocess_data()           tracker()                  plotX()
        ↓                        ↓                        ↓
 Binary extreme map        Tracked objects          Maps, animations,
                           with unique IDs           & statistics

Summary:

  1. Preprocessing: Convert raw data to anomalies and identify extremes

  2. Tracking: Track extreme events through time

  3. Visualisation: Analyse and visualise results

Step 1: Preprocessing

The preprocessing step transforms raw oceanographic data into anomalies and detects extreme event locations.

Basic preprocessing:

import marEx

# Basic preprocessing with default settings
extremes = marEx.preprocess_data(
    sst,
    threshold_percentile=95,
    method_anomaly='shifting_baseline',
    method_extreme='hobday_extreme'
)

Advanced configuration:

# Advanced preprocessing configuration
extremes = marEx.preprocess_data(
    sst,
    # Anomaly computation method
    method_anomaly='detrend_fixed_baseline',  # or 'detrend_harmonic', 'fixed_baseline', 'shifting_baseline'
    detrend_order=[1, 2],
    smooth_days_baseline=21,             # Smoothing for climatology

    # Extreme identification method
    method_extreme='hobday_extreme',     # or 'global_extreme'
    threshold_percentile=95,             # 95th percentile
    window_days_hobday=11,               # Days around each day of year
    window_spatial_hobday=5,             # Spatial window for percentile calculation

    # Output options
    dask_chunks={'time': 25}
)

The resulting xarray dataset extremes will have the following structure & entries:

xarray.Dataset
Dimensions:     (lat, lon, time)
Coordinates:
    lat         (lat)
    lon         (lon)
    time        (time)
Data variables:
    dat_anomaly     (time, lat, lon)        float64     dask.array
    mask            (lat, lon)              bool        dask.array
    extreme_events  (time, lat, lon)        bool        dask.array
    thresholds      (dayofyear, lat, lon)   float64     dask.array

where:

  • dat_anomaly (time, lat, lon): Anomaly data

  • extreme_events (time, lat, lon): Binary field locating extreme events (1=event, 0=background)

  • thresholds (dayofyear, lat, lon): Extreme event thresholds used to determine extreme events

  • mask (lat, lon): Valid data mask

See ./examples/unstructured data/01_preprocess_extremes.ipynb in Examples Gallery for a detailed example of pre-processing on an unstructured grid.

Step 2: Event Tracking

The tracking step identifies coherent extreme events and follows them through time.

Basic tracking:

tracked_events = marEx.tracker(
    extremes.extreme_events,
    extremes.mask,
    area_filter_absolute=100   # Remove objects smaller than 100 grid cells
    R_fill=8,                  # Radius for filling gaps (in grid cells)
).run()

Advanced tracking options:

# Initialise advanced tracker
tracker = marEx.tracker(
    extremes.extreme_events,
    extremes.mask,

    # Temporal criteria
    T_fill=4,                    # Fill gaps up to 4 days (to maintain continuous events)

    # Spatial criteria
    R_fill=8,                    # Fill small holes with radius up to 8 grid cells
    area_filter_quartile=0.5,    # Remove smallest 50% of events (alternative to area_filter_absolute)
    cell_areas=grid_areas,       # Optional: physical cell areas for structured grids (m²)

    # Merging criteria
    allow_merging=True,          # Allow merging of events (and keep track of merged IDs & events)
    overlap_threshold=0.5,       # 50% overlap for merging (otherwise events keep independent IDs)
    nn_partitioning=True,        # Use nearest-neighbour partitioning when splitting events
)

# Run tracking and return merging data
tracked_events, merge_events = tracker.run(return_merges=True)

The resulting xarray dataset tracked_events will have the following structure & entries:

xarray.Dataset
Dimensions: (lat, lon, time, ID, component, sibling_ID)
Coordinates:
    lat         (lat)
    lon         (lon)
    time        (time)
    ID          (ID)
Data variables:
    ID_field              (time, lat, lon)        int32       dask.array
    global_ID             (time, ID)              int32       ndarray
    area                  (time, ID)              float32     ndarray
    centroid              (component, time, ID)   float32     ndarray
    presence              (time, ID)              bool        ndarray
    time_start            (ID)                    datetime64  ndarray
    time_end              (ID)                    datetime64  ndarray
    merge_ledger          (time, ID, sibling_ID)  int32       ndarray

where:

  • ID_field: Field containing the IDs of tracked events (0=background)

  • global_ID: Unique global ID of each object; global_ID.sel(ID=10) maps event ID 10 to its original ID at each time

  • area: Area of each event as a function of time (in units of cell counts, or physical units if cell_areas/grid_resolution provided)

  • centroid: (x, y) centroid coordinates of each event as a function of time

  • presence: Presence (boolean) of each event at each time (anywhere in space)

  • time_start: Start time of each event

  • time_end: End time of each event

  • merge_ledger: Sibling IDs for merging events (matching ID_field); -1 indicates no merging event occurred

When running with return_merges=True, the resulting xarray dataset merge_events will have the following structure & entries:

xarray.Dataset
Dimensions: (merge_ID, parent_idx, child_idx)
Data variables:
    parent_IDs      (merge_ID, parent_idx)  int32       ndarray
    child_IDs       (merge_ID, child_idx)   int32       ndarray
    overlap_areas   (merge_ID, parent_idx)  int32       ndarray
    merge_time      (merge_ID)              datetime64  ndarray
    n_parents       (merge_ID)              int8        ndarray
    n_children      (merge_ID)              int8        ndarray

where:

  • parent_IDs: Original parent IDs of each merging event

  • child_IDs: Original child IDs of each merging event

  • overlap_areas: Area of overlap between parent and child objects in each merging event

  • merge_time: Time of each merging event

  • n_parents: Number of parent objects in each merging event

  • n_children: Number of child objects in each merging event

See ./examples/unstructured data/02_id_track_events.ipynb in Examples Gallery for a detailed example of identification, tracking, & merging on an unstructured grid.

Step 3: Visualisation

marEx includes a powerful visualisation system called plotX that automatically detects data types and creates appropriate plots.

Basic visualisation:

# Plot global extreme event frequency
event_frequency = (tracked_events.ID_field > 0).mean("time")

# Configure plot appearance
config = marEx.PlotConfig(
    var_units="MHW Frequency",
    cmap="hot_r",
    cperc=[0, 96],
    grid_labels=True
)

# Create single plot
fig, ax, im = event_frequency.plotX.single_plot(config)

Advanced visualisation:

# Multi-panel visualisation: seasonal extreme event frequency
seasonal_frequency = (tracked_events.ID_field > 0).groupby("time.season").mean(dim="time")

# Configure plot appearance
config = marEx.PlotConfig(
    var_units="MHW Frequency",
    cmap="hot_r",
    cperc=[0, 96],
    grid_labels=True
)

# Create multi-panel plot
fig, ax = seasonal_frequency.plotX.multi_plot(config, col="season", col_wrap=2)

# Create animation of tracked events
ID_field_subset = tracked_events.ID_field.sel(time=slice("2020-01-01", "2022-05-31"))
config = marEx.PlotConfig(plot_IDs=True)
ID_field_subset.plotX.animate(config, plot_dir=plot_dir, file_name="marine_heatwaves")

# Plot consecutive time periods
ID_field_subset = tracked_events.ID_field.sel(time=slice("2021-01-01", "2021-01-06"))
config = marEx.PlotConfig(plot_IDs=True)
fig, ax = ID_field_subset.plotX.multi_plot(config, col="time", col_wrap=3)

Method Selection Guide

This section provides practical guidance for selecting appropriate methods. For foundations and definitions, see Core Concepts.

Use this decision tree to select the most appropriate anomaly calculation method for your research:

┌─────────────────────────────────────────────────────────────────────┐
│ Anomaly Method Selection Decision Tree                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│ Need full time series? ──No──> SHIFTING BASELINE                    │
│         │                       (most accurate, shortens data by    │
│        Yes                       window_year_baseline years)        │
│         │                                                           │
│         ├─> Remove trends? ──No──> FIXED BASELINE                   │
│         │         │                 (keeps trends in anomaly)       │
│         │        Yes                                                │
│         │         │                                                 │
│         │         └──> Need efficiency? ──Yes──> DETREND HARMONIC   │
│         │                      │                  (fast, biased)    │
│         │                     No                                    │
│         │                      │                                    │
│         │                      └──> DETREND FIXED BASELINE          │
│         │                           (accurate detrending)           │
└─────────────────────────────────────────────────────────────────────┘

Anomaly Methods

Harmonic Detrending (detrend_harmonic):

  • Detrends with an OLS 6+ coefficient model (mean, annual & semi-annual harmonics, and arbitrary polynomial trends)

  • Best for: Datasets with linear trends, operational monitoring

  • Pros: Fast & memory efficient

  • Cons: Does not capture phenological shifts and non-harmonic seasonal variability. Strongly biases certain statistics.

  • Use when: Real-time processing

Fixed Baseline (fixed_baseline):

  • Removes the daily climatology (calculated in a reference_period) – does not remove climate trends

  • Best for: Simple anomaly calculation without detrending

  • Pros: Straightforward interpretation, preserves long-term trends

  • Cons: Does not account for climate change trends, seasonal timing shifts

  • Use when: Baseline comparison studies, trend-inclusive analysis, public outreach

Detrend Fixed Baseline (detrend_fixed_baseline):

  • Polynomial detrending followed by fixed daily climatology – keeps full time-series of data, but does not account for trends in the timing of seasonal transitions

  • Also supports calculating the climatology only within a reference_period (detrending still uses all data)

  • Best for: Studies requiring detrending but maintaining full temporal data coverage

  • Pros: Removes long-term trends while preserving seasonal cycles, maintains full time series

  • Cons: Does not account for changes in seasonal timing or seasonality

  • Use when: Climate variability studies with trend removal

Shifting Baseline (shifting_baseline):

  • Removes seasonal & long-term trends using a smoothed rolling climatology

  • Best for: Climate change studies, non-stationary data

  • Pros: Captures non-linear trends, adapts to changing climate, and seasonal timing variability

  • Cons: Computationally expensive, shortens effective time series

  • Use when: Long-term climate studies, accurate & robust analysis

Extreme Identification Methods

Global Extreme (global_extreme):

  • Applies a global-in-time (i.e. constant in time) threshold

    N.B.: This method is a hack designed for ocetrac, which when paired with std_normalise=True can approximate hobday_extreme in very specific cases and with a number of caveats. (However, normalising by local STD is again memory-intensive, at which point there is little gained by this approximate method.)

  • Best for: Simple threshold, constant across seasons

  • Pros: Simple interpretation, fast computation (without std_normalise)

  • Cons: Seasonal bias, may miss winter extremes, variability distribution is skewed

  • Use when: Initial analysis, 0th order comparisons

Hobday Method (hobday_extreme):

  • Defines a local day-of-year specific threshold within a rolling window (equivalent to the Hobday et al. (2016) definition for simple time-series)

  • Best for: Long-term scientific climate studies, seasonal studies, biological impacts

  • Pros: Accounts for seasonal variability, literature standard

  • Cons: Complex threshold interpretation, computationally intensive

  • Use when: Ecological studies, seasonal analysis, literature comparison

Spatial Window for Hobday Extreme (window_spatial_hobday)

New in v3.0+: The Hobday extreme method now supports optional spatial windowing to create more statistically robust thresholds.

# Traditional Hobday (temporal window only)
extremes_traditional = marEx.preprocess_data(
    sst,
    method_extreme='hobday_extreme',
    window_days_hobday=11,          # 11-day temporal window
    window_spatial_hobday=None      # No spatial window
)

# With spatial windowing (recommended for most applications)
extremes_window = marEx.preprocess_data(
    sst,
    method_extreme='hobday_extreme',
    window_days_hobday=11,          # 11-day temporal window
    window_spatial_hobday=5         # 5×5 spatial window (marEx default; use 1 for strict Hobday et al. 2016 definition for time-series)
)

How Spatial Windowing Works:

Traditional Hobday:                  With Spatial Window (5×5):
Single cell samples                  25 cells × 11 days = 275 samples
(lat, lon) ──> 11 days               ┌───┬───┬───┬───┬───┐
                                     │   │   │   │   │   │
Only temporal pooling                ├───┼───┼───┼───┼───┤
                                     │   │   │   │   │   │
                                     ├───┼───┼─●─┼───┼───┤ Central cell
                                     │   │   │   │   │   │
                                     ├───┼───┼───┼───┼───┤
                                     │   │   │   │   │   │
                                     └───┴───┴───┴───┴───┘
                                     Spatial + temporal pooling

Benefits:

  • Spatially coherent thresholds: Reduces noise from individual grid cells

  • More robust statistics: Larger sample size for robust percentile calculation

Limitations:

  • Structured grids only: Not supported for unstructured (irregular) grids

  • Requires approximate method: Only works with method_percentile='approximate'

When to use:

  • Short time-series of data

  • High percentile thresholds defining the extremes

  • Noisy SST fields (e.g., satellite with gaps)

  • Default (window_spatial_hobday=5) works well for most cases

Example: Comparing threshold noise:

# Compare threshold variability
threshold_std_traditional = extremes_traditional.thresholds.std(dim='dayofyear').mean()
threshold_std_smooth = extremes_smooth.thresholds.std(dim='dayofyear').mean()

print(f"Traditional threshold noise: {threshold_std_traditional.compute():.4f}")
print(f"Windowed threshold noise: {threshold_std_window.compute():.4f}")

Threshold Percentiles

  • 90th percentile: More events, captures moderate extremes

  • 95th percentile: Standard for marine heatwaves, balanced approach

  • 99th percentile: Only most extreme events, rare events focus

Performance Optimisation

This section provides guidance for optimising marEx performance across different computing environments and dataset sizes.

Memory Management and Chunking

Chunking Strategy

Optimal chunking strategies vary by dataset size and operation type:

# Small datasets (< 10GB): Chunk time dimension
sst = sst.chunk({'time': 365, 'lat': -1, 'lon': -1})

# Large datasets (> 100GB): Chunk all dimensions
sst = sst.chunk({'time': 365, 'lat': 180, 'lon': 360})

# For tracking: Use smaller time chunks
sst = sst.chunk({'time': 25, 'lat': -1, 'lon': -1})  # Recommended for tracker

Memory Requirements Estimation

Operation

Memory (per core)

Optimal Time Chunk

Notes

preprocess_data

2-4 GB

365+ days

Depends on method_anomaly

tracker

1-2 GB

25-50 days

Keep spatial dims unbounded (-1)

Exact percentiles

8-16 GB

Full time series

Requires careful chunking

Approximate percentiles

1-2 GB

Any

Memory efficient

Exact vs Approximate Percentile Methods

When to Use Each Method

APPROXIMATE (default): For large datasets

extremes = marEx.preprocess_data(
    sst,
    method_percentile='approximate',
    precision=0.01,        # ~0.01°C bins
    max_anomaly=5.0,       # Histogram range ±5°C
    dask_chunks={'time': 365}  # Any chunking works
)
# ✓ Memory efficient (1-2 GB/core)
# ✓ Fast parallel computation
# ✓ ~0.01°C precision (sufficient for most studies)

EXACT: For small datasets or high precision needs

extremes_exact = marEx.preprocess_data(
    sst,
    method_percentile='exact',
    dask_chunks={'time': -1, 'lat': 100, 'lon': 100}  # Careful chunking required!
)
# ⚠ High memory (8-16 GB/core), depending on time-series length
# ⚠ Slower computation
# ✓ Mathematically exact percentiles

JAX Acceleration

When JAX Helps Most

import marEx

# Check JAX availability
if marEx.has_dependency('jax'):
    print("JAX acceleration active")
    # Best for:
    # - Harmonic detrending (10-50× speedup)
    # - Large spatial operations
    # - GPU/TPU available systems
else:
    print("Install JAX: pip install marEx[full]")
    # Falls back to NumPy (still fast with Numba JIT)

Performance Gains

Operation

NumPy+Numba

JAX (CPU)

JAX (GPU)

Harmonic detrend

1× (baseline)

15× faster

50× faster

Percentile calc

1× (similar)

2-3× faster

Tracking

1× (similar)

N/A

HPC Cluster Setup

SLURM Distributed Cluster

For supercomputers:

import marEx

# Start distributed Dask cluster on SLURM
client = marEx.helper.start_distributed_cluster(
    n_workers=64,           # Number of workers
    cores_per_worker=4,     # Cores per worker
    memory_per_worker='8GB' # Memory per worker
)

# Get dashboard URL for monitoring
marEx.helper.get_cluster_info(client)

# Run analysis
extremes = marEx.preprocess_data(sst, ...)
events = marEx.tracker(extremes.extreme_events, ...).run()

# Clean up
client.close()

Local Cluster

For workstations:

# Auto-detect system resources
client = marEx.helper.start_local_cluster(
    n_workers=8,            # Leave some cores for system
    threads_per_worker=2,
    memory_limit='16GB'     # Per worker
)

Example Batch Scripts

The examples/batch jobs/ directory provides ready-to-use scripts designed to be run from a login node on HPC systems with SLURM:

Workflow:

  1. run_detect.py: Launches a distributed Dask job to execute the preprocessing and extreme detection pipeline

  2. run_track.py: Launches a distributed Dask job for event identification and tracking

Both scripts utilise marEx.helper.start_distributed_cluster() for cluster management and are configured via environment variables.

Configuration Environment Variables:

  • DASK_N_WORKERS: Number of Dask workers to launch (default varies by script: 128 for detect, 32 for track)

  • DASK_WORKERS_PER_NODE: Workers per SLURM node (default varies by script: 64 for detect, 32 for track)

  • DASK_RUNTIME: Maximum runtime in minutes (default: 39 for detect, 89 for track)

  • SLURM_ACCOUNT: SLURM account for billing (default: ‘bk1377’)

  • Additional for run_track.py: RUN_BASIC_TRACKER, GRID_RESOLUTION, AREA_FILTER, R_FILL, T_FILL, OVERLAP_THRESHOLD

Example Usage:

# Run preprocessing on SLURM cluster
export DASK_N_WORKERS=256
export DASK_RUNTIME=120
export SLURM_ACCOUNT=my_project
python examples/batch\ jobs/run_detect.py

# Run tracking after preprocessing completes
export DASK_N_WORKERS=64
export DASK_RUNTIME=180
python examples/batch\ jobs/run_track.py

Customisation:

These scripts are designed to be copied and modified for your specific:

  • Data file paths and chunk sizes

  • Preprocessing parameters (anomaly method, thresholds)

  • Tracking parameters (spatial filters, merge/split settings)

  • Cluster resources (workers, memory, runtime)

The scripts handle cluster setup, data processing, and saving results to zarr/netCDF files on the scratch filesystem.

Checkpointing for Large Datasets

When processing very large datasets on HPC systems, Dask may recompute intermediate results multiple times, especially for complex operations like 2D histogram-based percentile calculations. This can significantly increase computation time and memory usage.

The Problem:

Dask builds a computation graph that tracks all operations. For large preprocessing pipelines, this graph can become deeply nested, causing Dask to recompute expensive intermediate steps (climatologies, anomalies, thresholds) whenever downstream operations need them.

The Solution:

The use_temp_checkpoints=True parameter breaks the Dask computation graph by saving intermediate results to temporary zarr stores and immediately reloading them. This prevents expensive recomputations at the cost of some disk I/O.

How It Works:

  1. Intermediate arrays (anomalies, climatologies, thresholds, extremes) are saved to temporary zarr files

  2. The saved data is immediately reloaded as a fresh Dask array

  3. This breaks the dependency chain in the computation graph

  4. Temporary files are automatically cleaned up after reloading

When to Use:

  • Large datasets (>100 GB) where preprocessing takes hours

  • HPC environments with fast scratch storage

  • When 2D histogram percentile calculations are a bottleneck

  • When you notice Dask recomputing the same operations multiple times

Example Usage:

import xarray as xr
import marEx

# Load large dataset
sst = xr.open_zarr('large_dataset.zarr', chunks={'time': 30}).sst

# Enable checkpointing to prevent expensive recomputations
extremes = marEx.preprocess_data(
    sst,
    method_anomaly='shifting_baseline',
    method_extreme='hobday_extreme',
    threshold_percentile=95,
    use_temp_checkpoints=True,  # Enable checkpointing
    dask_chunks={'time': 25}
)

Performance Trade-offs:

  • Benefit: Prevents expensive recomputations, reduces memory pressure

  • Cost: Requires fast disk I/O for temporary zarr stores

  • Recommendation: Enable on HPC systems with high-speed scratch storage; test on your specific dataset to measure performance impact

Note: Temporary files are automatically removed after data is reloaded, so no manual cleanup is required.

Grid Types and Coordinate Systems

Structured/Gridded Data

Regular latitude-longitude grids are the most common data type:

# Typical structured data
print(sst.dims)         # e.g. ('time', 'lat', 'lon')
print(sst.coords)       # e.g. ('time', 'lat', 'lon')

# marEx automatically detects structured grids
processed = marEx.preprocess_data(sst)

Unstructured Data

Irregular meshes common in coastal and global ocean models:

# Typical unstructured data (e.g., ICON model)
print(sst.dims)         # e.g. ('time', 'ncells')
print(sst.coords)       # e.g. ('time', 'lat', 'lon') as coordinates, not dims

# Specify grid configuration
marEx.specify_grid(
    grid_type='unstructured',
    fpath_tgrid='grid_info.nc',      # Grid topology file
    fpath_ckdtree='kdtree.pkl'       # Spatial index (optional)
)

Requirements:

  • Spatial dimension name (ncells, cell, etc.)

  • Latitude/longitude as coordinate arrays

  • Optional: Grid topology information for advanced/unstructured features

Advanced Features

Compound Events

Analyse events that exceed multiple thresholds or variables:

# Multiple variable analysis
sst_extremes = marEx.preprocess_data(sst, threshold_percentile=95)
salinity_extremes = marEx.preprocess_data(salinity, threshold_percentile=5)  # Low salinity

# Compound events
compound_events = sst_extremes.extreme_events & salinity_extremes.extreme_events

Best Practices and Guidelines

Research Workflow

  1. Exploratory Analysis: Start with basic preprocessing to understand data

  2. Method Comparison: Test different methods on subset of data

  3. Quality Control: Validate results thoroughly

  4. Full Processing: Apply chosen method to complete dataset

  5. Validation: Compare with known events and literature

Literature Compliance

For marine heatwave studies following Hobday et al. (2016):

# Standard MHW definition
mhw_config = {
    'method_anomaly': 'shifting_baseline',
    'method_extreme': 'hobday_extreme',
    'threshold_percentile': 90,
    'window_days_hobday': 11,
    'window_year_baseline': 30,
    'smooth_days_baseline': 11,
    'window_spatial_hobday': 1,  # Hobday et al. (2016) considers only single points
}

Getting Help and Support

For troubleshooting common issues, diagnostic checklists, and support resources, see Troubleshooting.

Citations and References

When using marEx in publications, please cite:

  • marEx package

  • Marine heatwave methods: Hobday et al. (2016) for standard MHW definition

For more detailed examples and advanced usage, see the Examples Gallery.