========== User Guide ========== This guide covers all aspects of using marEx for marine extreme detection and tracking, from basic concepts to advanced workflows. .. contents:: Table of Contents :local: :depth: 3 Introduction ============ marEx is a Python package for efficient identification and tracking of marine extremes (e.g., Marine Heatwaves) in oceanographic data. It provides a complete pipeline from raw data preprocessing to tracked event visualisation. Key Features ------------ * **Flexible Data Support**: Both structured (lat/lon grids) and unstructured (irregular meshes) data * **Multiple Methods**: Various anomaly computation and threshold definition methods * **Parallel Processing**: Built on Dask for efficient computation on large datasets * **Event Tracking**: Sophisticated algorithms for tracking coherent events through time * **Rich Visualisation**: Automatic plotting system with publication-ready figures * **HPC Integration**: Optimised for supercomputing environments Installation and Setup ====================== For complete installation instructions including system requirements, optional dependencies, and HPC environments, see :doc:`installation`. Data Preparation ================ Input Data Requirements ----------------------- marEx works with xarray DataArrays containing oceanographic time series data: **Required Dimensions:** * **Time dimension**: Regular or irregular time steps (daily to monthly) * **Spatial dimensions**: Either structured (lat, lon) or unstructured (cell/ncells) **Supported Formats:** * NetCDF (.nc) * Zarr (.zarr) * Any xarray-compatible format **Data Structure Examples:** For structured/gridded data:: data.dims = ('time', 'lat', 'lon') # Coordinates: time, lat, lon as dimensions For unstructured data:: data.dims = ('time', 'ncells') # Coordinates: time as dimension, lat/lon as coordinates Data Quality Requirements ------------------------- * **Temporal coverage**: Minimum 10 years for robust climatology * **Spatial coverage**: Consistent grid throughout time series Data Loading Examples --------------------- **Loading NetCDF files:** .. code-block:: python import xarray as xr # Single file ds = xr.open_dataset('sst_data.nc', chunks={}) sst = ds.sst # Multiple files ds = xr.open_mfdataset('sst_*.nc', parallel=True, chunks={}) sst = ds.sst **Loading Zarr datasets:** .. code-block:: python # Local Zarr ds = xr.open_zarr('data.zarr', chunks={}) # Cloud-optimised Zarr ds = xr.open_zarr('gs://bucket/data.zarr', chunks={}) **Optimising data loading:** .. code-block:: python # Apply chunking for efficient processing (chunk sizes vary by operation) sst = sst.chunk({'time': 365, 'lat': 50, 'lon': 100}) # Ensure Dask backing if not marEx.is_dask_collection(sst.data): sst = sst.chunk() **Note**: Optimal chunk sizes depend on your dataset size and the operation (preprocessing vs tracking). See the Performance Optimisation section below for detailed chunking strategies. Core Workflow ============= The marEx workflow consists of three main steps. For a conceptual overview and foundations, see :doc:`concepts`. .. code-block:: text ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ 1. Detect │ → │ 2. Track │ → │ 3. Visualise │ │ Extremes │ │ Events │ │ & Analyse │ └─────────────────┘ └─────────────────┘ └─────────────────┘ ↓ ↓ ↓ preprocess_data() tracker() plotX() ↓ ↓ ↓ Binary extreme map Tracked objects Maps, animations, with unique IDs & statistics **Summary:** 1. **Preprocessing**: Convert raw data to anomalies and identify extremes 2. **Tracking**: Track extreme events through time 3. **Visualisation**: Analyse and visualise results Step 1: Preprocessing --------------------- The preprocessing step transforms raw oceanographic data into anomalies and detects extreme event locations. **Basic preprocessing:** .. code-block:: python import marEx # Basic preprocessing with default settings extremes = marEx.preprocess_data( sst, threshold_percentile=95, method_anomaly='shifting_baseline', method_extreme='hobday_extreme' ) **Advanced configuration:** .. code-block:: python # Advanced preprocessing configuration extremes = marEx.preprocess_data( sst, # Anomaly computation method method_anomaly='detrend_fixed_baseline', # or 'detrend_harmonic', 'fixed_baseline', 'shifting_baseline' detrend_order=[1, 2], smooth_days_baseline=21, # Smoothing for climatology # Extreme identification method method_extreme='hobday_extreme', # or 'global_extreme' threshold_percentile=95, # 95th percentile window_days_hobday=11, # Days around each day of year window_spatial_hobday=5, # Spatial window for percentile calculation # Output options dask_chunks={'time': 25} ) The resulting xarray dataset ``extremes`` will have the following structure & entries:: xarray.Dataset Dimensions: (lat, lon, time) Coordinates: lat (lat) lon (lon) time (time) Data variables: dat_anomaly (time, lat, lon) float64 dask.array mask (lat, lon) bool dask.array extreme_events (time, lat, lon) bool dask.array thresholds (dayofyear, lat, lon) float64 dask.array where: * ``dat_anomaly`` (time, lat, lon): Anomaly data * ``extreme_events`` (time, lat, lon): Binary field locating extreme events (1=event, 0=background) * ``thresholds`` (dayofyear, lat, lon): Extreme event thresholds used to determine extreme events * ``mask`` (lat, lon): Valid data mask See ``./examples/unstructured data/01_preprocess_extremes.ipynb`` in :doc:`examples` for a detailed example of pre-processing on an *unstructured* grid. Step 2: Event Tracking ---------------------- The tracking step identifies coherent extreme events and follows them through time. **Basic tracking:** .. code-block:: python tracked_events = marEx.tracker( extremes.extreme_events, extremes.mask, area_filter_absolute=100 # Remove objects smaller than 100 grid cells R_fill=8, # Radius for filling gaps (in grid cells) ).run() **Advanced tracking options:** .. code-block:: python # Initialise advanced tracker tracker = marEx.tracker( extremes.extreme_events, extremes.mask, # Temporal criteria T_fill=4, # Fill gaps up to 4 days (to maintain continuous events) # Spatial criteria R_fill=8, # Fill small holes with radius up to 8 grid cells area_filter_quartile=0.5, # Remove smallest 50% of events (alternative to area_filter_absolute) cell_areas=grid_areas, # Optional: physical cell areas for structured grids (m²) # Merging criteria allow_merging=True, # Allow merging of events (and keep track of merged IDs & events) overlap_threshold=0.5, # 50% overlap for merging (otherwise events keep independent IDs) nn_partitioning=True, # Use nearest-neighbour partitioning when splitting events ) # Run tracking and return merging data tracked_events, merge_events = tracker.run(return_merges=True) The resulting xarray dataset ``tracked_events`` will have the following structure & entries:: xarray.Dataset Dimensions: (lat, lon, time, ID, component, sibling_ID) Coordinates: lat (lat) lon (lon) time (time) ID (ID) Data variables: ID_field (time, lat, lon) int32 dask.array global_ID (time, ID) int32 ndarray area (time, ID) float32 ndarray centroid (component, time, ID) float32 ndarray presence (time, ID) bool ndarray time_start (ID) datetime64 ndarray time_end (ID) datetime64 ndarray merge_ledger (time, ID, sibling_ID) int32 ndarray where: * ``ID_field``: Field containing the IDs of tracked events (0=background) * ``global_ID``: Unique global ID of each object; ``global_ID.sel(ID=10)`` maps event ID 10 to its original ID at each time * ``area``: Area of each event as a function of time (in units of cell counts, or physical units if cell_areas/grid_resolution provided) * ``centroid``: (x, y) centroid coordinates of each event as a function of time * ``presence``: Presence (boolean) of each event at each time (anywhere in space) * ``time_start``: Start time of each event * ``time_end``: End time of each event * ``merge_ledger``: Sibling IDs for merging events (matching ``ID_field``); ``-1`` indicates no merging event occurred When running with ``return_merges=True``, the resulting xarray dataset ``merge_events`` will have the following structure & entries:: xarray.Dataset Dimensions: (merge_ID, parent_idx, child_idx) Data variables: parent_IDs (merge_ID, parent_idx) int32 ndarray child_IDs (merge_ID, child_idx) int32 ndarray overlap_areas (merge_ID, parent_idx) int32 ndarray merge_time (merge_ID) datetime64 ndarray n_parents (merge_ID) int8 ndarray n_children (merge_ID) int8 ndarray where: * ``parent_IDs``: Original parent IDs of each merging event * ``child_IDs``: Original child IDs of each merging event * ``overlap_areas``: Area of overlap between parent and child objects in each merging event * ``merge_time``: Time of each merging event * ``n_parents``: Number of parent objects in each merging event * ``n_children``: Number of child objects in each merging event See ``./examples/unstructured data/02_id_track_events.ipynb`` in :doc:`examples` for a detailed example of identification, tracking, & merging on an *unstructured* grid. Step 3: Visualisation --------------------- marEx includes a powerful visualisation system called ``plotX`` that automatically detects data types and creates appropriate plots. **Basic visualisation:** .. code-block:: python # Plot global extreme event frequency event_frequency = (tracked_events.ID_field > 0).mean("time") # Configure plot appearance config = marEx.PlotConfig( var_units="MHW Frequency", cmap="hot_r", cperc=[0, 96], grid_labels=True ) # Create single plot fig, ax, im = event_frequency.plotX.single_plot(config) **Advanced visualisation:** .. code-block:: python # Multi-panel visualisation: seasonal extreme event frequency seasonal_frequency = (tracked_events.ID_field > 0).groupby("time.season").mean(dim="time") # Configure plot appearance config = marEx.PlotConfig( var_units="MHW Frequency", cmap="hot_r", cperc=[0, 96], grid_labels=True ) # Create multi-panel plot fig, ax = seasonal_frequency.plotX.multi_plot(config, col="season", col_wrap=2) # Create animation of tracked events ID_field_subset = tracked_events.ID_field.sel(time=slice("2020-01-01", "2022-05-31")) config = marEx.PlotConfig(plot_IDs=True) ID_field_subset.plotX.animate(config, plot_dir=plot_dir, file_name="marine_heatwaves") # Plot consecutive time periods ID_field_subset = tracked_events.ID_field.sel(time=slice("2021-01-01", "2021-01-06")) config = marEx.PlotConfig(plot_IDs=True) fig, ax = ID_field_subset.plotX.multi_plot(config, col="time", col_wrap=3) Method Selection Guide ====================== This section provides practical guidance for selecting appropriate methods. For foundations and definitions, see :doc:`concepts`. Choosing the Right Anomaly Method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use this decision tree to select the most appropriate anomaly calculation method for your research: .. code-block:: text ┌─────────────────────────────────────────────────────────────────────┐ │ Anomaly Method Selection Decision Tree │ ├─────────────────────────────────────────────────────────────────────┤ │ │ │ Need full time series? ──No──> SHIFTING BASELINE │ │ │ (most accurate, shortens data by │ │ Yes window_year_baseline years) │ │ │ │ │ ├─> Remove trends? ──No──> FIXED BASELINE │ │ │ │ (keeps trends in anomaly) │ │ │ Yes │ │ │ │ │ │ │ └──> Need efficiency? ──Yes──> DETREND HARMONIC │ │ │ │ (fast, biased) │ │ │ No │ │ │ │ │ │ │ └──> DETREND FIXED BASELINE │ │ │ (accurate detrending) │ └─────────────────────────────────────────────────────────────────────┘ Anomaly Methods --------------- **Harmonic Detrending** (``detrend_harmonic``): * Detrends with an OLS 6+ coefficient model (mean, annual & semi-annual harmonics, and arbitrary polynomial trends) * **Best for**: Datasets with linear trends, operational monitoring * **Pros**: Fast & memory efficient * **Cons**: Does not capture phenological shifts and non-harmonic seasonal variability. Strongly biases certain statistics. * **Use when**: Real-time processing **Fixed Baseline** (``fixed_baseline``): * Removes the daily climatology (calculated in a ``reference_period``) -- does not remove climate trends * **Best for**: Simple anomaly calculation without detrending * **Pros**: Straightforward interpretation, preserves long-term trends * **Cons**: Does not account for climate change trends, seasonal timing shifts * **Use when**: Baseline comparison studies, trend-inclusive analysis, public outreach **Detrend Fixed Baseline** (``detrend_fixed_baseline``): * Polynomial detrending followed by fixed daily climatology -- keeps full time-series of data, but does not account for trends in the timing of seasonal transitions * Also supports calculating the climatology only within a ``reference_period`` (detrending still uses all data) * **Best for**: Studies requiring detrending but maintaining full temporal data coverage * **Pros**: Removes long-term trends while preserving seasonal cycles, maintains full time series * **Cons**: Does not account for changes in seasonal timing or seasonality * **Use when**: Climate variability studies with trend removal **Shifting Baseline** (``shifting_baseline``): * Removes seasonal & long-term trends using a smoothed rolling climatology * **Best for**: Climate change studies, non-stationary data * **Pros**: Captures non-linear trends, adapts to changing climate, and seasonal timing variability * **Cons**: Computationally expensive, shortens effective time series * **Use when**: Long-term climate studies, accurate & robust analysis Extreme Identification Methods ------------------------------ **Global Extreme** (``global_extreme``): * Applies a global-in-time (i.e. constant in time) threshold N.B.: This method is a hack designed for ``ocetrac``, which when paired with ``std_normalise=True`` can approximate ``hobday_extreme`` in very specific cases and with a number of caveats. (However, normalising by local STD is again memory-intensive, at which point there is little gained by this approximate method.) * **Best for**: Simple threshold, constant across seasons * **Pros**: Simple interpretation, fast computation (without ``std_normalise``) * **Cons**: Seasonal bias, may miss winter extremes, variability distribution is skewed * **Use when**: Initial analysis, 0th order comparisons **Hobday Method** (``hobday_extreme``): * Defines a local day-of-year specific threshold within a rolling window (equivalent to the Hobday et al. (2016) definition for simple time-series) * **Best for**: Long-term scientific climate studies, seasonal studies, biological impacts * **Pros**: Accounts for seasonal variability, literature standard * **Cons**: Complex threshold interpretation, computationally intensive * **Use when**: Ecological studies, seasonal analysis, literature comparison Spatial Window for Hobday Extreme (``window_spatial_hobday``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **New in v3.0+**: The Hobday extreme method now supports optional spatial windowing to create more statistically robust thresholds. .. code-block:: python # Traditional Hobday (temporal window only) extremes_traditional = marEx.preprocess_data( sst, method_extreme='hobday_extreme', window_days_hobday=11, # 11-day temporal window window_spatial_hobday=None # No spatial window ) # With spatial windowing (recommended for most applications) extremes_window = marEx.preprocess_data( sst, method_extreme='hobday_extreme', window_days_hobday=11, # 11-day temporal window window_spatial_hobday=5 # 5×5 spatial window (marEx default; use 1 for strict Hobday et al. 2016 definition for time-series) ) **How Spatial Windowing Works**:: Traditional Hobday: With Spatial Window (5×5): Single cell samples 25 cells × 11 days = 275 samples (lat, lon) ──> 11 days ┌───┬───┬───┬───┬───┐ │ │ │ │ │ │ Only temporal pooling ├───┼───┼───┼───┼───┤ │ │ │ │ │ │ ├───┼───┼─●─┼───┼───┤ Central cell │ │ │ │ │ │ ├───┼───┼───┼───┼───┤ │ │ │ │ │ │ └───┴───┴───┴───┴───┘ Spatial + temporal pooling **Benefits**: * **Spatially coherent thresholds**: Reduces noise from individual grid cells * **More robust statistics**: Larger sample size for robust percentile calculation **Limitations**: * **Structured grids only**: Not supported for unstructured (irregular) grids * **Requires approximate method**: Only works with ``method_percentile='approximate'`` **When to use**: * Short time-series of data * High percentile thresholds defining the extremes * Noisy SST fields (e.g., satellite with gaps) * Default (``window_spatial_hobday=5``) works well for most cases **Example: Comparing threshold noise**: .. code-block:: python # Compare threshold variability threshold_std_traditional = extremes_traditional.thresholds.std(dim='dayofyear').mean() threshold_std_smooth = extremes_smooth.thresholds.std(dim='dayofyear').mean() print(f"Traditional threshold noise: {threshold_std_traditional.compute():.4f}") print(f"Windowed threshold noise: {threshold_std_window.compute():.4f}") Threshold Percentiles --------------------- * **90th percentile**: More events, captures moderate extremes * **95th percentile**: Standard for marine heatwaves, balanced approach * **99th percentile**: Only most extreme events, rare events focus Performance Optimisation ======================== This section provides guidance for optimising marEx performance across different computing environments and dataset sizes. Memory Management and Chunking ------------------------------- Chunking Strategy ~~~~~~~~~~~~~~~~~ Optimal chunking strategies vary by dataset size and operation type: .. code-block:: python # Small datasets (< 10GB): Chunk time dimension sst = sst.chunk({'time': 365, 'lat': -1, 'lon': -1}) # Large datasets (> 100GB): Chunk all dimensions sst = sst.chunk({'time': 365, 'lat': 180, 'lon': 360}) # For tracking: Use smaller time chunks sst = sst.chunk({'time': 25, 'lat': -1, 'lon': -1}) # Recommended for tracker Memory Requirements Estimation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 20 20 20 40 * - Operation - Memory (per core) - Optimal Time Chunk - Notes * - ``preprocess_data`` - 2-4 GB - 365+ days - Depends on method_anomaly * - ``tracker`` - 1-2 GB - 25-50 days - Keep spatial dims unbounded (-1) * - Exact percentiles - 8-16 GB - Full time series - Requires careful chunking * - Approximate percentiles - 1-2 GB - Any - Memory efficient Exact vs Approximate Percentile Methods ---------------------------------------- When to Use Each Method ~~~~~~~~~~~~~~~~~~~~~~~ **APPROXIMATE (default)**: For large datasets .. code-block:: python extremes = marEx.preprocess_data( sst, method_percentile='approximate', precision=0.01, # ~0.01°C bins max_anomaly=5.0, # Histogram range ±5°C dask_chunks={'time': 365} # Any chunking works ) # ✓ Memory efficient (1-2 GB/core) # ✓ Fast parallel computation # ✓ ~0.01°C precision (sufficient for most studies) **EXACT**: For small datasets or high precision needs .. code-block:: python extremes_exact = marEx.preprocess_data( sst, method_percentile='exact', dask_chunks={'time': -1, 'lat': 100, 'lon': 100} # Careful chunking required! ) # ⚠ High memory (8-16 GB/core), depending on time-series length # ⚠ Slower computation # ✓ Mathematically exact percentiles JAX Acceleration ---------------- When JAX Helps Most ~~~~~~~~~~~~~~~~~~~ .. code-block:: python import marEx # Check JAX availability if marEx.has_dependency('jax'): print("JAX acceleration active") # Best for: # - Harmonic detrending (10-50× speedup) # - Large spatial operations # - GPU/TPU available systems else: print("Install JAX: pip install marEx[full]") # Falls back to NumPy (still fast with Numba JIT) Performance Gains ~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 25 25 25 25 * - Operation - NumPy+Numba - JAX (CPU) - JAX (GPU) * - Harmonic detrend - 1× (baseline) - 15× faster - 50× faster * - Percentile calc - 1× - 1× (similar) - 2-3× faster * - Tracking - 1× - 1× (similar) - N/A HPC Cluster Setup ----------------- SLURM Distributed Cluster ~~~~~~~~~~~~~~~~~~~~~~~~~~ For supercomputers: .. code-block:: python import marEx # Start distributed Dask cluster on SLURM client = marEx.helper.start_distributed_cluster( n_workers=64, # Number of workers cores_per_worker=4, # Cores per worker memory_per_worker='8GB' # Memory per worker ) # Get dashboard URL for monitoring marEx.helper.get_cluster_info(client) # Run analysis extremes = marEx.preprocess_data(sst, ...) events = marEx.tracker(extremes.extreme_events, ...).run() # Clean up client.close() Local Cluster ~~~~~~~~~~~~~ For workstations: .. code-block:: python # Auto-detect system resources client = marEx.helper.start_local_cluster( n_workers=8, # Leave some cores for system threads_per_worker=2, memory_limit='16GB' # Per worker ) Example Batch Scripts ~~~~~~~~~~~~~~~~~~~~~ The ``examples/batch jobs/`` directory provides ready-to-use scripts designed to be run from a login node on HPC systems with SLURM: **Workflow:** 1. **run_detect.py**: Launches a distributed Dask job to execute the preprocessing and extreme detection pipeline 2. **run_track.py**: Launches a distributed Dask job for event identification and tracking Both scripts utilise ``marEx.helper.start_distributed_cluster()`` for cluster management and are configured via environment variables. **Configuration Environment Variables:** * ``DASK_N_WORKERS``: Number of Dask workers to launch (default varies by script: 128 for detect, 32 for track) * ``DASK_WORKERS_PER_NODE``: Workers per SLURM node (default varies by script: 64 for detect, 32 for track) * ``DASK_RUNTIME``: Maximum runtime in minutes (default: 39 for detect, 89 for track) * ``SLURM_ACCOUNT``: SLURM account for billing (default: 'bk1377') * **Additional for run_track.py**: ``RUN_BASIC_TRACKER``, ``GRID_RESOLUTION``, ``AREA_FILTER``, ``R_FILL``, ``T_FILL``, ``OVERLAP_THRESHOLD`` **Example Usage:** .. code-block:: bash # Run preprocessing on SLURM cluster export DASK_N_WORKERS=256 export DASK_RUNTIME=120 export SLURM_ACCOUNT=my_project python examples/batch\ jobs/run_detect.py # Run tracking after preprocessing completes export DASK_N_WORKERS=64 export DASK_RUNTIME=180 python examples/batch\ jobs/run_track.py **Customisation:** These scripts are designed to be copied and modified for your specific: * Data file paths and chunk sizes * Preprocessing parameters (anomaly method, thresholds) * Tracking parameters (spatial filters, merge/split settings) * Cluster resources (workers, memory, runtime) The scripts handle cluster setup, data processing, and saving results to zarr/netCDF files on the scratch filesystem. Checkpointing for Large Datasets ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When processing very large datasets on HPC systems, Dask may recompute intermediate results multiple times, especially for complex operations like 2D histogram-based percentile calculations. This can significantly increase computation time and memory usage. **The Problem:** Dask builds a computation graph that tracks all operations. For large preprocessing pipelines, this graph can become deeply nested, causing Dask to recompute expensive intermediate steps (climatologies, anomalies, thresholds) whenever downstream operations need them. **The Solution:** The ``use_temp_checkpoints=True`` parameter breaks the Dask computation graph by saving intermediate results to temporary zarr stores and immediately reloading them. This prevents expensive recomputations at the cost of some disk I/O. **How It Works:** 1. Intermediate arrays (anomalies, climatologies, thresholds, extremes) are saved to temporary zarr files 2. The saved data is immediately reloaded as a fresh Dask array 3. This breaks the dependency chain in the computation graph 4. Temporary files are automatically cleaned up after reloading **When to Use:** * Large datasets (>100 GB) where preprocessing takes hours * HPC environments with fast scratch storage * When 2D histogram percentile calculations are a bottleneck * When you notice Dask recomputing the same operations multiple times **Example Usage:** .. code-block:: python import xarray as xr import marEx # Load large dataset sst = xr.open_zarr('large_dataset.zarr', chunks={'time': 30}).sst # Enable checkpointing to prevent expensive recomputations extremes = marEx.preprocess_data( sst, method_anomaly='shifting_baseline', method_extreme='hobday_extreme', threshold_percentile=95, use_temp_checkpoints=True, # Enable checkpointing dask_chunks={'time': 25} ) **Performance Trade-offs:** * **Benefit**: Prevents expensive recomputations, reduces memory pressure * **Cost**: Requires fast disk I/O for temporary zarr stores * **Recommendation**: Enable on HPC systems with high-speed scratch storage; test on your specific dataset to measure performance impact **Note:** Temporary files are automatically removed after data is reloaded, so no manual cleanup is required. Grid Types and Coordinate Systems ================================= Structured/Gridded Data ----------------------- Regular latitude-longitude grids are the most common data type: .. code-block:: python # Typical structured data print(sst.dims) # e.g. ('time', 'lat', 'lon') print(sst.coords) # e.g. ('time', 'lat', 'lon') # marEx automatically detects structured grids processed = marEx.preprocess_data(sst) Unstructured Data ----------------- Irregular meshes common in coastal and global ocean models: .. code-block:: python # Typical unstructured data (e.g., ICON model) print(sst.dims) # e.g. ('time', 'ncells') print(sst.coords) # e.g. ('time', 'lat', 'lon') as coordinates, not dims # Specify grid configuration marEx.specify_grid( grid_type='unstructured', fpath_tgrid='grid_info.nc', # Grid topology file fpath_ckdtree='kdtree.pkl' # Spatial index (optional) ) **Requirements:** * Spatial dimension name (``ncells``, ``cell``, etc.) * Latitude/longitude as coordinate arrays * Optional: Grid topology information for advanced/unstructured features Advanced Features ================= Compound Events --------------- Analyse events that exceed multiple thresholds or variables: .. code-block:: python # Multiple variable analysis sst_extremes = marEx.preprocess_data(sst, threshold_percentile=95) salinity_extremes = marEx.preprocess_data(salinity, threshold_percentile=5) # Low salinity # Compound events compound_events = sst_extremes.extreme_events & salinity_extremes.extreme_events Best Practices and Guidelines ============================= Research Workflow ----------------- 1. **Exploratory Analysis**: Start with basic preprocessing to understand data 2. **Method Comparison**: Test different methods on subset of data 3. **Quality Control**: Validate results thoroughly 4. **Full Processing**: Apply chosen method to complete dataset 5. **Validation**: Compare with known events and literature Literature Compliance -------------------- For marine heatwave studies following Hobday et al. (2016): .. code-block:: python # Standard MHW definition mhw_config = { 'method_anomaly': 'shifting_baseline', 'method_extreme': 'hobday_extreme', 'threshold_percentile': 90, 'window_days_hobday': 11, 'window_year_baseline': 30, 'smooth_days_baseline': 11, 'window_spatial_hobday': 1, # Hobday et al. (2016) considers only single points } Getting Help and Support ======================== For troubleshooting common issues, diagnostic checklists, and support resources, see :doc:`troubleshooting`. Citations and References ======================== When using marEx in publications, please cite: * **marEx package** * **Marine heatwave methods**: Hobday et al. (2016) for standard MHW definition For more detailed examples and advanced usage, see the :doc:`examples`.