Tracking Module (marEx.track)
The marEx.track module provides sophisticated algorithms for identifying and tracking
extreme events through time. It handles complex scenarios including event merging, splitting,
and temporal evolution using advanced image processing and graph theory techniques.
Overview
The tracking module implements a comprehensive workflow for converting binary extreme event masks into tracked event objects with unique identifiers, statistical properties, and temporal evolution information. It includes advanced algorithms for handling event lifecycles, including merging and splitting events whilst maintaining unique identities.
Key Features:
Binary Object Tracking: Advanced connected component analysis with dask parallelisation
Merge/Split Handling: Algorithms for event lifecycle management
Morphological Operations: Image processing for event preprocessing and hole filling
Statistical Analysis: Comprehensive event property calculation
Memory Efficient: Optimised for large datasets with intelligent chunking
Grid Agnostic: Works with both structured and unstructured grids
Main Classes
|
Tracker identifies and tracks arbitrary binary objects in spatial data through time. |
Tracking Algorithm Comparison
The video below demonstrates the critical difference between basic connectivity-based tracking and marEx’s advanced merge/split algorithm:
Left (Basic Method — 3D Connected Components): Uses simple 3D connected component labeling where ANY objects that touch at any point in space-time are permanently merged into the same event. This creates a chain reaction: Event A touches Event B → merged. The merged AB touches Event C → all become one event. Over time, this produces unrealistic basin-spanning “mega-events” that spuriously link dozens of independent physical phenomena. These mega-events are algorithmically-induced artifacts with no coherent physical origin, making their statistics (size, duration, intensity) meaningless for mechanistic analysis.
Right (marEx Advanced Method — Genealogy Tracking): Prevents mega-events using overlap_threshold (requires significant overlap, not just touching) and maintains individual event identities through merge/split events using nearest-neighbor partitioning. Records complete parent/child genealogy in merge_ledger, enabling reconstruction of physically realistic event evolution. Tracked events correspond to actual coherent phenomena, providing mechanistically relevant statistics for scientific analysis.
Detailed Documentation
Tracker Class
- class marEx.track.tracker(data_bin, mask, R_fill, area_filter_quartile=None, area_filter_absolute=None, temp_dir=None, T_fill=2, allow_merging=True, nn_partitioning=False, overlap_threshold=0.5, unstructured_grid=False, dimensions=None, coordinates=None, neighbours=None, cell_areas=None, grid_resolution=None, max_iteration=40, checkpoint=None, debug=0, verbose=None, quiet=None, regional_mode=False, coordinate_units=None)[source]
Bases:
objectTracker identifies and tracks arbitrary binary objects in spatial data through time.
The tracker supports both structured (regular grid) and unstructured data, and seamlessly handles splitting & merging of objects. It identifies connected regions in binary data at each time step, and tracks these as evolving events through time.
Main workflow:
Preprocessing: Fill spatiotemporal holes, filter small objects
Object identification: Label connected components at each time
Tracking: Determine object correspondences across time
Optional splitting & merging: Handle complex event evolution
- Parameters:
data_bin (xarray.DataArray) – Binary field of extreme points to group, label, and track (True = object, False = background) Must represent and underlying dask array.
mask (xarray.DataArray) – Binary mask indicating valid regions (True = valid, False = invalid)
R_fill (int) – The radius of the kernel used in morphological opening & closing, relating to the largest hole/gap that can be filled. In units of grid cells.
area_filter_quartile (float, optional) – The fraction of the smallest objects to discard, i.e. the quantile defining the smallest area object retained. Quantile must be in (0-1) (e.g., 0.25 removes smallest 25%). Mutually exclusive with area_filter_absolute. Default is 0.5 if neither parameter is provided.
area_filter_absolute (int, optional) – The minimum area (in grid cells) for an object to be retained. Mutually exclusive with area_filter_quartile. Use this for fixed minimum area thresholds (e.g., 10 cells minimum).
temp_dir (str, optional) – Path to temporary directory for storing intermediate results
T_fill (int, default=2) – The permissible temporal gap (in days) between objects for tracking continuity to be maintained (must be even)
allow_merging (bool, default=True) – Allow objects to split and merge across time. Apply splitting & merging criteria, track merge events, and maintain original identities of merged objects across time. N.B.: False reverts to classical ndmeasure.label with simplar time connectivity, i.e. Scannell et al.
nn_partitioning (bool, default=False) – Implement a better partitioning of merged child objects based on closest parent cell. False reverts to using parent centroids to determine partitioning between new child objects, i.e. Di Sun & Bohai Zhang 2023. N.B.: Centroid-based partitioning has major problems with small merging objects suddenly obtaining unrealistically-large (and often disjoint) fractions of the larger object.
overlap_threshold (float, default=0.5) – The fraction of the smaller object’s area that must overlap with the larger object’s area to be considered the same event and continue tracking with the same ID.
unstructured_grid (bool, default=False) – Whether data is on an unstructured grid
dimensions (dict, default={"time": "time", "x": "lon", "y": "lat"}) – Mapping of dimensions to names in the data
coordinates (dict, optional) – Coordinate names for unstructured grids. Should contain ‘x’ and ‘y’ keys for x and y coordinates. May also contain ‘time’ if the time coordinate name is different from the dimension name.
neighbours (xarray.DataArray, optional) – For unstructured grid, indicates connectivity between cells
cell_areas (xarray.DataArray, optional) – For unstructured grid, area of each cell (required). For structured grid, area of each cell (optional). If not provided, defaults to 1.0 for each cell (resulting in cell counts as areas). Note: Overridden by grid_resolution if provided for structured grids.
grid_resolution (float, optional) – Grid resolution in degrees for structured grids only (ignored for unstructured grids). When provided, automatically calculates cell areas using spherical geometry. Overrides any provided cell_areas parameter.
max_iteration (int, default=40) – Maximum number of iterations for merging/splitting algorithm
checkpoint (str, default='None') – Checkpoint strategy (‘save’, ‘load’, or None)
debug (int, default=0) – Debug level (0-2)
verbose (bool, optional) – Enable verbose logging with detailed progress information. If None, uses current global logging configuration.
quiet (bool, optional) – Enable quiet logging with minimal output (warnings and errors only). If None, uses current global logging configuration. Note: quiet takes precedence over verbose if both are True.
regional_mode (bool, default=False) – Enable regional mode for non-global coordinate ranges. When True, coordinate_units must be specified.
coordinate_units (str, optional) – Coordinate units when regional_mode=True. Must be either ‘degrees’ or ‘radians’.
Examples
Basic tracking of marine heatwave events from preprocessed data:
>>> import xarray as xr >>> import marEx >>> >>> # Load preprocessed extreme events data >>> processed = xr.open_dataset('extreme_events.nc', chunks={}) >>> extreme_events = processed.extreme_events # Boolean array >>> mask = processed.mask # Ocean/land mask >>> >>> # Initialise tracker with basic parameters >>> tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=8, # Fill holes up to 8 grid cells ... area_filter_quartile=0.5 # Remove smallest 50% of objects ... allow_merging=False # Basic tracking without splitting/merging ... ) >>> >>> # Run tracking algorithm >>> events = tracker.run() >>> print(f"Identified {events.ID.max().compute()} distinct events") Identified 1247 distinct events
Using automatic grid area calculation from resolution:
>>> # For regular lat/lon grids, automatically calculate physical areas >>> grid_tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=8, ... area_filter_quartile=0.5, ... grid_resolution=0.25 # Grid resolution in degrees ... ) >>> # Cell areas are calculated automatically using spherical geometry >>> grid_events = grid_tracker.run()
Advanced tracking with merging and splitting enabled:
>>> # More sophisticated tracking with temporal gap filling >>> advanced_tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=12, # Larger spatial gap filling ... T_fill=4, # Fill up to 4-day temporal gaps ... area_filter_quartile=0.25, # More aggressive size filtering ... allow_merging=True, # Enable split/merge detection ... overlap_threshold=0.3 # Lower threshold for object linking ... ) >>> >>> events_advanced, merges_log = advanced_tracker.run(return_merges=True) >>> print(events_advanced.data_vars) Data variables: event (time, lat, lon) int32 dask.array<chunksize=(25, 180, 360)> event_centroid (time, lat, lon) int32 dask.array<chunksize=(25, 180, 360)> ID_field (time, lat, lon) int32 dask.array<chunksize=(25, 180, 360)> global_ID (time, ID) int32 dask.array<chunksize=(25, 1247)> area (time, ID) float32 dask.array<chunksize=(25, 1247)> centroid (component, time, ID) float64 dask.array<chunksize=(2, 25, 1247)> presence (time, ID) bool dask.array<chunksize=(25, 1247)> time_start (ID) datetime64[ns] dask.array<chunksize=(1247,)> time_end (ID) datetime64[ns] dask.array<chunksize=(1247,)> merge_ledger (time, ID, sibling_ID) int32 dask.array<chunksize=(25, 1247, 10)>
Processing unstructured ocean model data (ICON):
>>> # Load ICON ocean model data with connectivity >>> icon_data = xr.open_dataset('icon_extremes.nc', chunks={}) >>> icon_extremes = icon_data.extreme_events # (time, ncells) >>> icon_mask = icon_data.mask >>> neighbours = icon_data.neighbours # Cell connectivity >>> cell_areas = icon_data.cell_areas # Physical areas >>> >>> # Track events on unstructured grid >>> unstructured_tracker = marEx.tracker( ... icon_extremes, ... icon_mask, ... R_fill=5, # 5-neighbor radius for gap filling ... area_filter_quartile=0.6, # Remove 60% of smallest events ... unstructured_grid=True, # Enable unstructured mode ... dimensions={"x": "ncells"}, # Must specify the name of the spatial dimension ... coordinates={"x": "lon", "y": "lat"}, # Spatial coordinate names ... neighbours=neighbours, # Required for unstructured ... cell_areas=cell_areas # Required for area calculations ... ) >>> unstructured_events = unstructured_tracker.run()
Memory management and checkpointing for large datasets:
>>> # Use checkpointing for very large datasets >>> large_tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=8, ... area_filter_quartile=0.5, ... temp_dir='/scratch/user/tracking_temp', # Temporary storage ... checkpoint='save' # Save intermediate results ... ) >>> # Processing can be resumed if interrupted >>> large_events = large_tracker.run()
Comparing different filtering strategies:
>>> # Conservative filtering - keep more events >>> conservative = marEx.tracker( ... extreme_events, mask, R_fill=5, area_filter_quartile=0.1 ... ) >>> conservative_events = conservative.run() >>> >>> # Aggressive filtering - focus on largest events >>> aggressive = marEx.tracker( ... extreme_events, mask, R_fill=15, area_filter_quartile=0.8 ... ) >>> aggressive_events = aggressive.run() >>> >>> print(f"Conservative: {conservative_events.ID.max().compute()} events") >>> print(f"Aggressive: {aggressive_events.ID.max().compute()} events")
Using absolute area filtering instead of percentile-based:
>>> # Filter objects smaller than 25 grid cells >>> absolute_tracker = marEx.tracker( ... extreme_events, mask, R_fill=8, area_filter_absolute=25 ... ) >>> absolute_events = absolute_tracker.run() >>> >>> # Default behavior (area_filter_quartile=0.5) when no parameters provided >>> default_tracker = marEx.tracker(extreme_events, mask, R_fill=8) >>> default_events = default_tracker.run() # Uses quartile=0.5 filtering
Using physical cell areas for structured grids:
>>> # Load data with irregular grid cell areas >>> grid_areas = xr.open_dataset('grid_areas.nc').cell_area # (lat, lon) in m² >>> >>> # Track events using physical areas instead of cell counts >>> physical_tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=8, ... area_filter_quartile=0.5, ... cell_areas=grid_areas # Physical areas in m² ... ) >>> events = physical_tracker.run() >>> # Now events.area contains physical areas in m² instead of cell counts
Integration with full marEx workflow:
>>> # Complete workflow from raw data to tracked events >>> raw_sst = xr.open_dataset('sst_data.nc', chunks={}).sst.chunk({'time': 30}) >>> >>> # Step 1: Preprocess to identify extremes >>> processed = marEx.preprocess_data(raw_sst, threshold_percentile=95) >>> >>> # Step 2: Track extreme events >>> tracker = marEx.tracker( ... processed.extreme_events, ... processed.mask, ... R_fill=8, ... area_filter_quartile=0.5 ... ) >>> tracked_events = tracker.run()
Initialise the tracker with parameters and data.
- __init__(data_bin, mask, R_fill, area_filter_quartile=None, area_filter_absolute=None, temp_dir=None, T_fill=2, allow_merging=True, nn_partitioning=False, overlap_threshold=0.5, unstructured_grid=False, dimensions=None, coordinates=None, neighbours=None, cell_areas=None, grid_resolution=None, max_iteration=40, checkpoint=None, debug=0, verbose=None, quiet=None, regional_mode=False, coordinate_units=None)[source]
Initialise the tracker with parameters and data.
- Parameters:
data_bin (DataArray)
mask (DataArray)
area_filter_quartile (float | None)
area_filter_absolute (int | None)
temp_dir (str | None)
T_fill (int)
allow_merging (bool)
nn_partitioning (bool)
overlap_threshold (float)
unstructured_grid (bool)
neighbours (DataArray | None)
cell_areas (DataArray | None)
grid_resolution (float | None)
max_iteration (int)
checkpoint (Literal['save', 'load', 'None'] | None)
debug (int)
verbose (bool | None)
quiet (bool | None)
regional_mode (bool)
coordinate_units (Literal['degrees', 'radians'] | None)
- Return type:
None
- run(return_merges=False, checkpoint=None)[source]
Run the complete object identification and tracking pipeline.
This method executes the full workflow:
Preprocessing: morphological operations and size filtering
Identification and tracking of objects through time
Computing and attaching statistics to the results
- Parameters:
- Returns:
events_ds (xarray.Dataset) – Dataset containing tracked events and their properties
merges_ds (xarray.Dataset, optional) – Dataset with merge event information (only if return_merges=True)
- Return type:
- run_preprocess(checkpoint=None)[source]
Preprocess binary data to prepare for tracking.
This performs morphological operations to fill holes/gaps in both space and time, then filters small objects according to the area_filter_quartile or area_filter_absolute.
- run_tracking(data_bin_preprocessed)[source]
Track objects through time to identify events.
- Parameters:
data_bin_preprocessed (xarray.DataArray) – Preprocessed binary data
- Returns:
events_ds (xarray.Dataset) – Dataset containing tracked events
merges_ds (xarray.Dataset) – Dataset with merge information
N_events_final (int) – Final number of unique events
- Return type:
- run_stats_attributes(events_ds, merges_ds, object_stats, N_events_final)[source]
Add statistics and attributes to the events dataset.
- Parameters:
events_ds (xarray.Dataset) – Dataset containing tracked events
merges_ds (xarray.Dataset) – Dataset with merge information
object_stats (tuple) – Preprocessed object statistics
N_events_final (int) – Final number of events
- Returns:
events_ds – Dataset with added statistics and attributes
- Return type:
- compute_area(data_bin)[source]
Compute the total area of binary data at each time.
- Parameters:
data_bin (xarray.DataArray) – Binary data
- Returns:
area – Total area at each time (units: pixels for structured grid, matching cell_area for unstructured)
- Return type:
- fill_holes(data_bin, R_fill=None)[source]
Fill holes and gaps using morphological operations.
This performs closing (dilation followed by erosion) to fill small gaps, then opening (erosion followed by dilation) to remove small isolated objects.
- Parameters:
data_bin (xarray.DataArray) – Binary data to process
R_fill (int, optional) – Fill radius override
- Returns:
data_bin_filled – Binary data with holes/gaps filled
- Return type:
- fill_time_gaps(data_bin)[source]
Fill temporal gaps between objects.
Performs binary closing (dilation then erosion) along the time dimension to fill small time gaps between objects.
- Parameters:
data_bin (xarray.DataArray) – Binary data to process
- Returns:
data_bin_filled – Binary data with temporal gaps filled
- Return type:
- refresh_dask_graph(data_bin)[source]
Clear and reset the Dask graph via save/load cycle.
This is needed to work around a memory leak bug in Dask where “Unmanaged Memory” builds up within loops.
- Parameters:
data_bin (xarray.DataArray) – Data to refresh
- Returns:
data_new – Data with fresh Dask graph
- Return type:
- filter_small_objects(data_bin)[source]
Remove objects smaller than a threshold area.
- Parameters:
data_bin (xarray.DataArray) – Binary data to filter
- Returns:
data_bin_filtered (xarray.DataArray) – Binary data with small objects removed
area_threshold (float) – Area threshold used for filtering
object_areas (xarray.DataArray) – Areas of all objects pre-filtering
N_objects_prefiltered (int) – Number of objects before filtering
N_objects_filtered (int) – Number of objects after filtering
- Return type:
- identify_objects(data_bin, time_connectivity)[source]
Identify connected regions in binary data.
- Parameters:
data_bin (xarray.DataArray) – Binary data to identify objects in
time_connectivity (bool) – Whether to connect objects across time
- Returns:
object_id_field (xarray.DataArray) – Field of integer IDs for each object
None (NoneType) – Placeholder for compatibility with track_objects
N_objects (int) – Number of objects identified
- Return type:
- calculate_centroid(binary_mask, original_centroid=None)[source]
Calculate object centroid, handling edge cases for periodic boundaries.
- Parameters:
binary_mask (numpy.ndarray) – 2D binary array where True indicates the object (dimensions are (y,x))
original_centroid (tuple, optional) – (y_centroid, x_centroid) from regionprops_table
- Returns:
(y_centroid, x_centroid)
- Return type:
- calculate_object_properties(object_id_field, properties=None)[source]
Calculate properties of objects from ID field.
- Parameters:
object_id_field (xarray.DataArray) – Field containing object IDs
properties (list, optional) – List of properties to calculate (defaults to [‘label’, ‘area’])
- Returns:
object_props – Dataset containing calculated properties with ‘ID’ dimension
- Return type:
- check_overlap_slice(ids_t0, ids_next)[source]
Find overlapping objects between two consecutive time slices.
- Parameters:
ids_t0 (numpy.ndarray) – Object IDs at current time
ids_next (numpy.ndarray) – Object IDs at next time
- Returns:
Array of shape (n_overlaps, 3) with [id_t0, id_next, overlap_area]
- Return type:
- find_overlapping_objects(object_id_field)[source]
Find all overlapping objects across time.
- Parameters:
object_id_field (xarray.DataArray) – Field containing object IDs
- Returns:
overlap_objects_list_unique_filtered – Array of object ID pairs that overlap across time, with overlap area The object in the first column precedes the second column in time. The third column contains:
For structured grid: number of overlapping pixels (int32)
For unstructured grid: total overlapping area in m^2 (float32)
- Return type:
(N x 3) numpy.ndarray
- enforce_overlap_threshold(overlap_objects_list, object_props)[source]
Filter object pairs based on overlap threshold.
- Parameters:
overlap_objects_list ((N x 3) numpy.ndarray) – Array of object ID pairs with overlap area
object_props (xarray.Dataset) – Object properties including area
- Returns:
overlap_objects_list_filtered – Filtered array of object ID pairs that meet the overlap threshold
- Return type:
(M x 3) numpy.ndarray
- consolidate_object_ids(data_t_minus_2, data_t_minus_1, object_props, timestep)[source]
Consolidate object IDs between t-2 and t-1 to ensure consistent tracking.
This identifies objects at t-1 that are actually continuations of objects from t-2 (but got different IDs due to partitioning) and renames them to maintain consistent IDs across timesteps.
- Parameters:
data_t_minus_2 (xr.DataArray) – Object field at timestep t-2
data_t_minus_1 (xr.DataArray) – Object field at timestep t-1 (will be modified)
object_props (xr.Dataset) – Object properties dataset (will be modified)
timestep (int) – Current timestep number for logging purposes
- Returns:
data_t_minus_1_consolidated (xr.DataArray) – Updated t-1 field with consolidated IDs
object_props_updated (xr.Dataset) – Updated object properties with merged/deleted objects
- Return type:
Notes
Uses self.overlap_threshold for determining consolidation eligibility
Updates object properties by recalculating for consolidated objects
Removes redundant child objects from object_props
- compute_id_time_dict(da, child_objects, max_objects, all_objects=True)[source]
Generate lookup table mapping object IDs to their time index.
- Parameters:
da (xarray.DataArray) – Field of object IDs
child_objects (list or array) – Object IDs to include in the dictionary
max_objects (int) – Maximum number of objects
all_objects (bool, default=True) – Whether to process all objects or just child_objects
- Returns:
time_index_map – Dictionary mapping object IDs to time indices
- Return type:
- track_objects(data_bin)[source]
Track objects through time to form events.
This is the main tracking method that handles splitting and merging of objects.
- Parameters:
data_bin (xarray.DataArray) – Preprocessed binary data: Field of globally unique integer IDs of each element in connected regions. ID = 0 indicates no object.
- Returns:
split_merged_events_ds (xarray.Dataset) – Dataset containing tracked events
merge_events (xarray.Dataset) – Dataset with merge information
N_events (int) – Final number of events
- Return type:
- cluster_rename_objects_and_props(object_id_field_unique, object_props, overlap_objects_list, merge_events)[source]
Cluster the object pairs and relabel to determine final event IDs.
- Parameters:
object_id_field_unique (xarray.DataArray) – Field of unique object IDs. IDs must not be repeated across time.
object_props (xarray.Dataset) – Properties of each object that also need to be relabeled.
overlap_objects_list ((N x 2) numpy.ndarray) – Array of object ID pairs that indicate which objects are in the same event. The object in the first column precedes the second column in time.
merge_events (xarray.Dataset) – Information about merge events
- Returns:
split_merged_events_ds – Dataset with relabeled events and their properties. ID = 0 indicates no object.
- Return type:
- split_and_merge_objects(object_id_field_unique, object_props)[source]
Implement object splitting and merging logic.
This identifies and processes cases where objects split or merge over time, creating new object IDs as needed.
- Parameters:
object_id_field_unique (xarray.DataArray) – Field of unique object IDs. IDs are required to be monotonically increasing with time.
object_props (xarray.Dataset) – Properties of each object
- Returns:
(object_id_field, object_props, overlap_objects_list, merge_events)
- Return type:
- split_and_merge_objects_parallel(object_id_field_unique, object_props)[source]
Optimised parallel implementation of object splitting and merging.
This version is specifically designed for unstructured grids with more efficient memory handling and better parallelism than the standard split_and_merge_objects method. It processes data in chunks, handles merging events, and efficiently updates object IDs.
- Parameters:
object_id_field_unique (xarray.DataArray) – Field of unique object IDs
object_props (xarray.Dataset) – Properties of each object
- Returns:
(object_id_field, object_props, overlap_objects_list, merge_events)
- Return type:
Basic Usage Examples
Simple Event Tracking
import xarray as xr
import marEx
# Load preprocessed extreme events data
extremes_ds = xr.open_dataset('extreme_events.nc')
# Basic tracking with default parameters
event_tracker = marEx.tracker(
extremes_ds.extreme_events, # Binary extreme events field
extremes_ds.mask, # Land-sea mask
R_fill=8, # Fill holes with radius < 8 cells
area_filter_absolute=100 # Remove objects smaller than 100 grid cells
)
# Run tracking algorithm
tracked_events = event_tracker.run()
Advanced Tracking Configuration
# Advanced tracking with merge/split handling
event_tracker = marEx.tracker(
extremes_ds.extreme_events,
extremes_ds.mask,
R_fill=8, # Fill holes with radius < 8 cells
area_filter_quartile=0.5, # Remove smallest 50% of events (alternative to area_filter_absolute)
T_fill=2, # Allow 2-day gaps in tracking
allow_merging=True, # Enable merge/split tracking
overlap_threshold=0.5, # 50% overlap required for continuity
nn_partitioning=True, # Use nearest-neighbor partitioning
cell_areas=grid_areas # Optional: physical cell areas (m²)
)
# Run tracking and get merge information
tracked_events, merges_ds = event_tracker.run(return_merges=True)
Unstructured Grid Tracking
# For unstructured grids, specify dimensions
dimensions = {'time': 'time', 'x': 'ncells'}
coordinates = {'time': 'time', 'x': 'lon', 'y': 'lat'}
event_tracker = marEx.tracker(
extremes_ds.extreme_events,
extremes_ds.mask,
R_fill=5, # Adjusted for unstructured grid
area_filter_quartile=0.4,
dimensions=dimensions,
coordinates=coordinates
)
tracked_events = event_tracker.run()
Parameter Reference
Required Parameters
- data_binxarray.DataArray
Binary data to identify and track objects in (True = event, False = background). Must be Dask-backed and boolean type.
- maskxarray.DataArray
Binary mask indicating valid regions (True = valid, False = invalid). Must be boolean type.
- R_fillint
Radius for filling holes/gaps in spatial domain (in grid cells). Controls morphological operations to fill small gaps in binary objects.
How R_fill Works:
The R_fill parameter controls a two-step morphological cleaning operation: closing followed by opening. This process makes identified events spatially coherent and removes small, isolated noise.
Physical Rationale:
Closing (Fill Holes): Marine heatwaves are generally coherent spatial phenomena. Small “holes” within a larger event are often artefacts of data gridding or minor fluctuations below the threshold. Closing (dilation → erosion) fills these internal gaps, making the event representation physically realistic.
Opening (Remove Noise): Very small, isolated pixels flagged as events are often statistical noise or sensor errors rather than genuine marine heatwaves. Opening (erosion → dilation) removes these spurious objects, ensuring only spatially significant events are tracked.
Visual Process (using
R_fill=1as example):Initial State After Closing After Opening ┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐ │ █████ │ │ █████ │ │ █████ │ │ ███████ │ │ ███████ │ │ ███████ │ │ ███ █████ │ │ █████████ │ │ █████████ │ │ ███ ████ █ │ │ █████████ █ │ │ █████████ │ │ ████████ │ │ ████████ │ │ ████████ │ │ ███████ │ │ ███████ │ │ ███████ │ │ █████ │ │ █████ │ │ █████ │ │ █ │ │ █ │ │ │ └──────────────────────┘ └──────────────────────┘ └──────────────────────┘ │ │ │ └─ Main event with └─ Closing fills internal └─ Opening removes 2 small holes inside holes, making event isolated noise pixels, + 2 isolated noise pixels more coherent leaving clean event
Structuring Element (Kernel):
Operations use a disk-shaped kernel with diameter =
2 * R_fill + 1. ForR_fill=2, the kernel diameter is 5 pixels:┌──────────────┐ │ ██ │ Any hole or isolated object smaller │ ██ ██ ██ │ than this disk will be filled or removed │██ ██ ██ ██ ██│ │ ██ ██ ██ │ │ ██ │ └──────────────┘
Example:
R_fill=8uses a (circular) disk kernel with diameter 17 pixels, filling holes and removing objects up to ~8 grid cells in radius.When to Adjust:
Smaller R_fill (e.g., 3-5): Preserves small-scale features; use for coarse data or fine structures
Larger R_fill (e.g., 10-15): Creates more coherent objects; use for noisy data or large-scale events
- area_filter_quartilefloat
Quantile (0-1) for filtering smallest objects. For example, 0.25 removes smallest 25% of objects, 0.5 removes smallest 50%. Mutually exclusive with area_filter_absolute.
- area_filter_absoluteint
Minimum area (in grid cells) for an object to be retained. For example, 25 keeps only events with 25 or more grid cells. Mutually exclusive with area_filter_quartile.
Core Tracking Parameters
- T_fillint, default=2
Number of timesteps for filling temporal gaps (must be even). Allows events to be tracked across short temporal interruptions.
How T_fill Works:
T_fill = Maximum temporal gap (in days) for event continuity Timeline with T_fill=2: Day 0: ████████ ← Event detected Day 1: ← Gap (no detection) Day 2: ← Gap (no detection) } T_fill=2 days Day 3: ████████ ← Event detected again │ │ └────────┴─ SAME event ID (gap ≤ T_fill) Timeline with T_fill=0 (no gap filling): Day 0: ████████ ← Event ID: 1 Day 1: ← Gap Day 2: ████████ ← NEW event ID: 2 (no gap tolerance)Example:
T_fill=2allows 2-day gaps while maintaining the same event ID.When to Adjust:
T_fill=0: No gap filling, strict continuity requirement, or for weekly/monthly sampling
T_fill=2-4: Standard gap filling for daily data (recommended)
T_fill=6-10: More permissive for noisy data
- allow_mergingbool, default=True
Allow objects to split and merge across time.
True: Apply splitting & merging criteria, track merge events, and maintain original identitiesFalse: Classical connected component labeling with simple time connectivity
- nn_partitioningbool, default=True
Use nearest-neighbor cell-based partitioning for merging events (improved algorithm).
True: Partition merged child objects based on closest parent cell (recommended)False: Use parent centroids for partitioning (legacy method with known issues)
Why This Matters: When objects merge, their new area must be partitioned and assigned back to the original parent identities. The choice of algorithm for this partitioning is critical.
The
centroidmethod partitions based on proximity to parent centroids. This can be problematic for non-convex or L-shaped features, where the centroid may lie far from the feature’s cells, leading to physically unrealistic partitions.In contrast, the
nn(nearest-neighbor) method assigns each cell to the parent of the closest cell from the previous timestep. This results in a more intuitive and physically-based partition that respects the geometry of the parent features.Visual Comparison:
Scenario: A large C-shaped feature (A) and a small nearby feature (B) merge into a single large object. We compare how centroid-based vs nearest-neighbor methods partition the merged child back to parent lineages.
1. Initial State (Time t)
Two distinct parent features with centroids marked by ●.
┌───┬───┬───┬───┬───┐ 5 │ A │ A │ A │ A │ A │ ├───┼───┼───┼───┼───┤ 4 │ A │ A │ │ │ │ ├───┼───┼───┼───┼───┤ 3 │ A │ ● │ │ B ● B │ ← Centroid A at (3, 2), Centroid B at (3, 4.5) ├───┼───┼───┼───┼───┤ 2 │ A │ A │ │ │ │ ├───┼───┼───┼───┼───┤ 1 │ A │ A │ A │ A │ A │ └───┴───┴───┴───┴───┘ 1 2 3 4 52. Merged State (Time t+1)
The features grow and merge into a single large object marked with ‘#’. Question: how should these 20 cells be attributed to lineages A and B ?
┌───┬───┬───┬───┬───┐ 5 │ │ # │ # │ # │ # │ ├───┼───┼───┼───┼───┤ 4 │ # │ # │ # │ │ │ ├───┼───┼───┼───┼───┤ 3 │ # │ # │ # │ # │ # │ ├───┼───┼───┼───┼───┤ 2 │ # │ # │ # │ │ │ ├───┼───┼───┼───┼───┤ 1 │ │ # │ # │ # │ # │ └───┴───┴───┴───┴───┘ 1 2 3 4 53. Centroid Partition (nn_partitioning=False) - ❌ PROBLEMATIC
Each cell assigned to nearest parent centroid. The partition line ║ creates a geometric boundary that ignores actual cell topology. Result: B is an odd disjoint object now !
┌───┬───┬──╦┬───┬───┐ 5 │ │ A │ A║│ B │ B │ ├───┼───┼──╫┼───┼───┤ 4 │ A │ A │ A║│ │ │ ├───┼───┼──╫┼───┼───┤ 3 │ A │ A │ A║│ B │ B │ ├───┼───┼──╫┼───┼───┤ 2 │ A │ A │ A║│ │ │ ├───┼───┼──╫┼───┼───┤ 1 │ │ A │ A║│ B │ B │ └───┴───┴──╩┴───┴───┘ 1 2 3 4 5Problem: B is an odd disjoint object with portions unrelated to the original parent. The original C-shaped object means that the centroid misrepresents A’s main body, thus creating unrealistic/nonrepresentative partitions.
**4. Nearest-Neighbour Partition (nn_partitioning=True)
Each cell assigned to parent of nearest parent cell. Natural boundaries respect actual spatial connectivity. Result: B remains contiguous (realistic growth pattern).
┌───┬───┬───┬───┬───┐ 5 │ │ A │ A │ A │ A │ ├───┼───┼───┼───┼───┤ 4 │ A │ A │ A │ │ │ ├───┼───┼───┼───┼───┤ 3 │ A │ A │ B │ B │ B │ ├───┼───┼───┼───┼───┤ 2 │ A │ A │ A │ │ │ ├───┼───┼───┼───┼───┤ 1 │ │ A │ A │ A │ A │ └───┴───┴───┴───┴───┘ 1 2 3 4 5Solution: Growth of the resulting merged object is correctly attributed to A’s lineage based on cell-level proximity, not abstract geometric centroids.
Example: Small Object Problem:
# Centroid method (old, has issues) tracker_centroid = marEx.tracker( extremes, mask, R_fill=8, allow_merging=True, nn_partitioning=False # Small objects get unrealistic portions ) # Nearest-neighbor method (new, recommended) tracker_nn = marEx.tracker( extremes, mask, R_fill=8, allow_merging=True, nn_partitioning=True # Realistic partitioning )
When to Use:
Always use ``nn_partitioning=True`` (default) for accurate merging/splitting
Only use
nn_partitioning=Falsefor comparison with legacy results or Sun et al. (2023) methodology
- overlap_thresholdfloat, default=0.5
Minimum fraction of overlap between objects to consider them the same event. Fraction of the smaller object’s area that must overlap with the larger object’s area.
How overlap_threshold Works - Two-Stage Process:
═══════════════════════════════════════════════════════════════════════════════ STAGE 1: OVERLAP CHECKING (Determines IF objects are related) ═══════════════════════════════════════════════════════════════════════════════ Formula: overlap_fraction = overlap_area / min(area_parent, area_child) Decision: IF overlap_fraction >= overlap_threshold → Objects are LINKED This stage applies to ALL transitions (1→1, 1→many, many→1, many→many) This stage is INDEPENDENT of nn_partitioning choice ─────────────────────────────────────────────────────────────────────────────── Example 1: Threshold MET (overlap_threshold = 0.5) ─────────────────────────────────────────────────────────────────────────────── Time t=0: Time t=1: Overlap Check: ┌──────────┐ ┌─────────┐ overlap_area = 60 cells │ Object │ │ Object │ min(100, 80) = 80 │ A │ → │ C │ │ 100 │ │ 80 │ overlap_fraction = 60/80 = 0.75 │ cells │ │ cells │ └──────────┘ └─────────┘ 0.75 >= 0.5 ✓ → LINKED (same event) ─────────────────────────────────────────────────────────────────────────────── Example 2: Threshold NOT MET (overlap_threshold = 0.5) ─────────────────────────────────────────────────────────────────────────────── Time t=0: Time t=1: Overlap Check: ┌──────────┐ ┌─────────┐ overlap_area = 30 cells │ Object │ │ Object │ min(100, 80) = 80 │ A │ ╳ │ C │ │ 100 │ │ 80 │ overlap_fraction = 30/80 = 0.375 │ cells │ │ cells │ └──────────┘ └─────────┘ 0.375 < 0.5 ✗ → NOT LINKED Result: A terminates, C starts as NEW event ═══════════════════════════════════════════════════════════════════════════════ STAGE 2: PARTITIONING (Determines HOW to divide merged areas) ═══════════════════════════════════════════════════════════════════════════════ This stage ONLY applies when a MERGE is detected (many parents → one child), which also requires the overlap criterion (Stage 1) to be met for each parent. This stage does NOT apply to splits (one parent → many children) Choice of nn_partitioning (True/False) affects ONLY this stage ─────────────────────────────────────────────────────────────────────────────── Example 3: MERGE Scenario - Partitioning Applies ─────────────────────────────────────────────────────────────────────────────── Time t=0: Time t=1: ┌─────────┐ ┌──────────────┐ │ Object A│ │ Object C │ │ 100 │──┐ │ 120 │ │ cells │ │ │ cells │ └─────────┘ │ └──────────────┘ │ MERGE ┌─────────┐ │ │ Object B│──┘ STAGE 1 - Overlap Checking: │ 40 │ ──────────────────────────── │ cells │ A→C: 80/min(100,120) = 80/100 = 0.8 >= 0.5 ✓ LINKED └─────────┘ B→C: 35/min(40,120) = 35/40 = 0.875 >= 0.5 ✓ LINKED Both A and B linked to C → MERGE DETECTED! STAGE 2 - Partitioning (triggered by merge): ───────────────────────────────────────────── Question: How to divide C's 120 cells between A and B lineages? if nn_partitioning=True: Use nearest parent CELL if nn_partitioning=False: Use nearest parent CENTROID (See nn_partitioning documentation for partition methods) ─────────────────────────────────────────────────────────────────────────────── Example 4: SPLIT Scenario - No Partitioning Occurs ─────────────────────────────────────────────────────────────────────────────── Time t=0: Time t=1: ┌─────────┐ ┌─────────┐ │ Object A│───┬───→│ Object B│ │ 100 │ │ │ 60 │ │ cells │ │ │ cells │ └─────────┘ │ └─────────┘ │ │ ┌─────────┐ └───→│ Object C│ │ 50 │ │ cells │ └─────────┘ STAGE 1 - Overlap Checking: ──────────────────────────── A→B: 55/min(100,60) = 55/60 = 0.917 >= 0.5 ✓ LINKED A→C: 45/min(100,50) = 45/50 = 0.9 >= 0.5 ✓ LINKED One parent linked to two children → SPLIT DETECTED! STAGE 2 - Partitioning: ──────────────────────── N.B.: NO PARTITIONING occurs for splits! All objects (A, B, C) grouped into same event with same ID Object B & C continue to evolve disjoint, yet will ultimately be labelled with the same event IDKey Principles:
Overlap checking happens FIRST for all object pairs between consecutive timesteps
Partitioning happens SECOND and ONLY when merges are detected (many→one)
nn_partitioning choice does NOT affect which objects are linked in Stage 1
Splits (one→many) do NOT trigger partitioning - all objects share the same event ID
When to Adjust:
Lower threshold (e.g., 0.3): More continuous tracking, may link distant objects
Higher threshold (e.g., 0.7): More conservative, stricter continuity requirement
Default (0.5): Balanced approach for most applications
Grid Configuration Parameters
- dimensionsdict, default={‘time’: ‘time’, ‘x’: ‘lon’, ‘y’: ‘lat’}
Mapping of conceptual dimensions to actual dimension names. For unstructured grids, use:
{'time': 'time', 'x': 'ncells'}- coordinatesdict, default={‘time’: ‘time’, ‘x’: ‘lon’, ‘y’: ‘lat’}
Mapping of conceptual coordinates to actual coordinate names. For unstructured grids, use:
{'time': 'time', 'x': 'lon', 'y': 'lat'}- grid_resolutionfloat, optional
Grid resolution in degrees for structured grids (ignored for unstructured grids). When provided, automatically calculates physical cell areas using spherical geometry. Overrides any provided
cell_areasparameter for structured grids.# Automatic from grid resolution tracker = marEx.tracker( extremes, mask, R_fill=8, grid_resolution=0.25 # Automatically calculates spherical areas for 0.25° grid ) # This is equivalent to manually provide 2D data array of cell areas tracker = marEx.tracker( extremes, mask, R_fill=8, cell_areas=my_calculated_areas # Manual calculation required )
The calculation accounts for latitude-dependent cell areas using spherical geometry, providing accurate physical areas (in km²) for object/event size calculations.
- cell_areasxr.DataArray, optional
Physical cell areas for area calculations.
For structured grids: Optional. If not provided, defaults to 1.0 for each cell (i.e. cell counts). N.B.: If neither
cell_areasnorgrid_resolutionis provided, areas are in units of cells/pixels. Note: Overridden bygrid_resolutionif provided.For unstructured grids: Required for physical area calculations.
Output Data Structure
The tracking algorithm returns an xarray Dataset with the following structure:
Main Tracking Dataset
# tracked_events Dataset structure:
xarray.Dataset
Dimensions: (lat, lon, time, ID, component, sibling_ID)
Coordinates:
lat (lat)
lon (lon)
time (time)
ID (ID)
Data variables:
ID_field (time, lat, lon) int32 # Event ID field
global_ID (time, ID) int32 # Global ID mapping
area (time, ID) float32 # Event areas
centroid (component, time, ID) float32 # Event centroids
presence (time, ID) bool # Event presence
time_start (ID) datetime64 # Start times
time_end (ID) datetime64 # End times
merge_ledger (time, ID, sibling_ID) int32 # Merge information
Key Variables:
ID_field: Binary field with tracked event IDs
global_ID: Unique ID mapping for each event at each time
area: Spatial area of each event through time (in units of cell counts, or physical units if cell_areas/grid_resolution provided)
centroid: (x,y) centroid coordinates of each event
presence: Boolean indicating event presence at each time
time_start/time_end: Temporal bounds of each event
merge_ledger: Sibling IDs for merging events (-1 = no merge)
Merge Information Dataset
# merges_ds Dataset structure (when return_merges=True):
xarray.Dataset
Dimensions: (merge_ID, parent_idx, child_idx)
Data variables:
parent_IDs (merge_ID, parent_idx) int32 # Parent event IDs
child_IDs (merge_ID, child_idx) int32 # Child event IDs
overlap_areas (merge_ID, parent_idx) int32 # Overlap areas
merge_time (merge_ID) datetime64 # Merge timestamps
n_parents (merge_ID) int8 # Number of parents
n_children (merge_ID) int8 # Number of children
Key Variables:
parent_IDs/child_IDs: Original parent and child IDs in merge events
overlap_areas: Spatial overlap between parent and child objects
merge_time: Timestamp of each merge event
n_parents/n_children: Number of objects involved in each merge
Advanced Usage Examples
Analysing Event Properties
# Run tracking
tracked_events, merges_ds = event_tracker.run(return_merges=True)
# Analyse event durations
event_durations = (tracked_events.time_end - tracked_events.time_start).dt.days
long_events = tracked_events.ID.where(event_durations > 10, drop=True)
# Find events present at specific time
time_slice = tracked_events.sel(time='2020-07-15')
active_events = time_slice.ID.where(time_slice.presence, drop=True)
# Calculate maximum event areas
max_areas = tracked_events.area.max(dim='time')
large_events = tracked_events.ID.where(max_areas > threshold, drop=True)
Merge Event Analysis
# Analyse merge events
if merges_ds is not None:
# Find complex merge events (multiple parents/children)
complex_merges = merges_ds.where(
(merges_ds.n_parents > 1) | (merges_ds.n_children > 1),
drop=True
)
# Analyse merge timing
merge_times = merges_ds.merge_time.values
seasonal_merges = merges_ds.groupby(merges_ds.merge_time.dt.season).count()
# Find parent-child relationships
for merge_id in complex_merges.merge_ID:
parents = merges_ds.parent_IDs.sel(merge_ID=merge_id).values
children = merges_ds.child_IDs.sel(merge_ID=merge_id).values
print(f"Merge {merge_id}: {parents} → {children}")
Visualisation Integration
# Plot tracked events
config = marEx.PlotConfig(
title='Tracked Marine Heatwave Events',
plot_IDs=True, # Special handling for event IDs
cmap='tab20' # Discrete colormap
)
# Plot single timestep
fig, ax, im = tracked_events.ID_field.isel(time=0).plotX.single_plot(config)
# Create animation
movie_path = tracked_events.ID_field.plotX.animate(
config,
plot_dir='./animations',
file_name='tracked_events'
)
Performance Optimisation
Chunking Strategy
# Optimal chunking for tracking
chunk_size = {'time': 25, 'lat': -1, 'lon': -1} # Keep spatial dims together
extremes_ds = xr.open_dataset('extremes.nc', chunks=chunk_size)
# For very large datasets
event_tracker = marEx.tracker(
extremes_ds.extreme_events,
extremes_ds.mask,
R_fill=8,
area_filter_quartile=0.5,
# Performance optimizations
dask_chunks=chunk_size
)
Memory Management
# For memory-constrained environments
event_tracker = marEx.tracker(
extremes_ds.extreme_events,
extremes_ds.mask,
R_fill=6, # Smaller fill radius
area_filter_quartile=0.75, # Remove more small events
T_fill=1, # Reduce temporal gap filling
allow_merging=False # Disable merge tracking for speed
)
Algorithm Details
Tracking Workflow
The tracking algorithm follows these key steps:
Binary Object Identification * Connected component labeling on each time slice * Morphological operations (opening/closing) for noise reduction * Size filtering based on area quartiles
Temporal Matching * Overlap-based matching between consecutive time steps * Handles one-to-one, one-to-many, and many-to-one relationships * Gap filling for short-duration interruptions
Event Lifecycle Management * Track initialisation, continuation, merging, and splitting * Unique ID assignment and genealogy tracking * Statistical property calculation
Post-processing * Merge event documentation * Output dataset construction
Merge/Split Algorithm
The advanced merge/split tracking implements:
# Improved merge partitioning logic:
# - Partition child objects based on nearest parent cell
# - Maintain original event identities across merges
# - Track genealogy of splitting and merging events
# - Use overlap threshold to determine event continuity
Error Handling
Common Issues and Solutions
Memory Errors:
# Solution: Reduce chunk sizes and increase filtering
event_tracker = marEx.tracker(
data_bin, mask,
R_fill=6, # Smaller fill radius
area_filter_quartile=0.8, # Remove more small events
T_fill=1, # Reduce temporal gap filling
dask_chunks={'time': 15} # Smaller time chunks
)
Performance Issues:
# Solution: Optimise parameters for your data
event_tracker = marEx.tracker(
data_bin, mask,
R_fill=4, # Reduce morphological operations
area_filter_quartile=0.75, # Remove more small events
allow_merging=False # Disable merge tracking
)
Tracking Quality Issues:
# Solution: Tune overlap and temporal parameters
event_tracker = marEx.tracker(
data_bin, mask,
R_fill=8,
area_filter_quartile=0.5,
overlap_threshold=0.3, # Lower overlap requirement
T_fill=4, # Allow longer temporal gaps
allow_merging=True # Enable merge tracking
)
Integration with Preprocessing
Complete Workflow Example
import xarray as xr
import marEx
# Step 1: Preprocess data
sst = xr.open_dataset('sst_daily.nc', chunks={'time': 365}).sst
extremes_ds = marEx.preprocess_data(
sst,
method_anomaly='shifting_baseline',
method_extreme='hobday_extreme',
threshold_percentile=95,
window_year_baseline=15,
smooth_days_baseline=21,
window_days_hobday=11,
dask_chunks={'time': 25}
)
# Step 2: Track events
event_tracker = marEx.tracker(
extremes_ds.extreme_events,
extremes_ds.mask,
R_fill=8,
area_filter_quartile=0.5,
T_fill=2,
allow_merging=True,
overlap_threshold=0.5,
nn_partitioning=True
)
# Step 3: Run tracking
tracked_events, merges_ds = event_tracker.run(return_merges=True)
# Step 4: Save results
tracked_events.to_netcdf('tracked_events.nc')
if merges_ds is not None:
merges_ds.to_netcdf('merge_events.nc')
See Also
marEx.detect- Data preprocessing modulemarEx.plotX- Visualisation modulemarEx.helper- HPC utilities