marEx.track.tracker
- class marEx.track.tracker(data_bin, mask, R_fill, area_filter_quartile=None, area_filter_absolute=None, temp_dir=None, T_fill=2, allow_merging=True, nn_partitioning=False, overlap_threshold=0.5, unstructured_grid=False, dimensions=None, coordinates=None, neighbours=None, cell_areas=None, grid_resolution=None, max_iteration=40, checkpoint=None, debug=0, verbose=None, quiet=None, regional_mode=False, coordinate_units=None)[source]
Bases:
objectTracker identifies and tracks arbitrary binary objects in spatial data through time.
The tracker supports both structured (regular grid) and unstructured data, and seamlessly handles splitting & merging of objects. It identifies connected regions in binary data at each time step, and tracks these as evolving events through time.
Main workflow:
Preprocessing: Fill spatiotemporal holes, filter small objects
Object identification: Label connected components at each time
Tracking: Determine object correspondences across time
Optional splitting & merging: Handle complex event evolution
- Parameters:
data_bin (xarray.DataArray) – Binary field of extreme points to group, label, and track (True = object, False = background) Must represent and underlying dask array.
mask (xarray.DataArray) – Binary mask indicating valid regions (True = valid, False = invalid)
R_fill (int) – The radius of the kernel used in morphological opening & closing, relating to the largest hole/gap that can be filled. In units of grid cells.
area_filter_quartile (float, optional) – The fraction of the smallest objects to discard, i.e. the quantile defining the smallest area object retained. Quantile must be in (0-1) (e.g., 0.25 removes smallest 25%). Mutually exclusive with area_filter_absolute. Default is 0.5 if neither parameter is provided.
area_filter_absolute (int, optional) – The minimum area (in grid cells) for an object to be retained. Mutually exclusive with area_filter_quartile. Use this for fixed minimum area thresholds (e.g., 10 cells minimum).
temp_dir (str, optional) – Path to temporary directory for storing intermediate results
T_fill (int, default=2) – The permissible temporal gap (in days) between objects for tracking continuity to be maintained (must be even)
allow_merging (bool, default=True) – Allow objects to split and merge across time. Apply splitting & merging criteria, track merge events, and maintain original identities of merged objects across time. N.B.: False reverts to classical ndmeasure.label with simplar time connectivity, i.e. Scannell et al.
nn_partitioning (bool, default=False) – Implement a better partitioning of merged child objects based on closest parent cell. False reverts to using parent centroids to determine partitioning between new child objects, i.e. Di Sun & Bohai Zhang 2023. N.B.: Centroid-based partitioning has major problems with small merging objects suddenly obtaining unrealistically-large (and often disjoint) fractions of the larger object.
overlap_threshold (float, default=0.5) – The fraction of the smaller object’s area that must overlap with the larger object’s area to be considered the same event and continue tracking with the same ID.
unstructured_grid (bool, default=False) – Whether data is on an unstructured grid
dimensions (dict, default={"time": "time", "x": "lon", "y": "lat"}) – Mapping of dimensions to names in the data
coordinates (dict, optional) – Coordinate names for unstructured grids. Should contain ‘x’ and ‘y’ keys for x and y coordinates. May also contain ‘time’ if the time coordinate name is different from the dimension name.
neighbours (xarray.DataArray, optional) – For unstructured grid, indicates connectivity between cells
cell_areas (xarray.DataArray, optional) – For unstructured grid, area of each cell (required). For structured grid, area of each cell (optional). If not provided, defaults to 1.0 for each cell (resulting in cell counts as areas). Note: Overridden by grid_resolution if provided for structured grids.
grid_resolution (float, optional) – Grid resolution in degrees for structured grids only (ignored for unstructured grids). When provided, automatically calculates cell areas using spherical geometry. Overrides any provided cell_areas parameter.
max_iteration (int, default=40) – Maximum number of iterations for merging/splitting algorithm
checkpoint (str, default='None') – Checkpoint strategy (‘save’, ‘load’, or None)
debug (int, default=0) – Debug level (0-2)
verbose (bool, optional) – Enable verbose logging with detailed progress information. If None, uses current global logging configuration.
quiet (bool, optional) – Enable quiet logging with minimal output (warnings and errors only). If None, uses current global logging configuration. Note: quiet takes precedence over verbose if both are True.
regional_mode (bool, default=False) – Enable regional mode for non-global coordinate ranges. When True, coordinate_units must be specified.
coordinate_units (str, optional) – Coordinate units when regional_mode=True. Must be either ‘degrees’ or ‘radians’.
Examples
Basic tracking of marine heatwave events from preprocessed data:
>>> import xarray as xr >>> import marEx >>> >>> # Load preprocessed extreme events data >>> processed = xr.open_dataset('extreme_events.nc', chunks={}) >>> extreme_events = processed.extreme_events # Boolean array >>> mask = processed.mask # Ocean/land mask >>> >>> # Initialise tracker with basic parameters >>> tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=8, # Fill holes up to 8 grid cells ... area_filter_quartile=0.5 # Remove smallest 50% of objects ... allow_merging=False # Basic tracking without splitting/merging ... ) >>> >>> # Run tracking algorithm >>> events = tracker.run() >>> print(f"Identified {events.ID.max().compute()} distinct events") Identified 1247 distinct events
Using automatic grid area calculation from resolution:
>>> # For regular lat/lon grids, automatically calculate physical areas >>> grid_tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=8, ... area_filter_quartile=0.5, ... grid_resolution=0.25 # Grid resolution in degrees ... ) >>> # Cell areas are calculated automatically using spherical geometry >>> grid_events = grid_tracker.run()
Advanced tracking with merging and splitting enabled:
>>> # More sophisticated tracking with temporal gap filling >>> advanced_tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=12, # Larger spatial gap filling ... T_fill=4, # Fill up to 4-day temporal gaps ... area_filter_quartile=0.25, # More aggressive size filtering ... allow_merging=True, # Enable split/merge detection ... overlap_threshold=0.3 # Lower threshold for object linking ... ) >>> >>> events_advanced, merges_log = advanced_tracker.run(return_merges=True) >>> print(events_advanced.data_vars) Data variables: event (time, lat, lon) int32 dask.array<chunksize=(25, 180, 360)> event_centroid (time, lat, lon) int32 dask.array<chunksize=(25, 180, 360)> ID_field (time, lat, lon) int32 dask.array<chunksize=(25, 180, 360)> global_ID (time, ID) int32 dask.array<chunksize=(25, 1247)> area (time, ID) float32 dask.array<chunksize=(25, 1247)> centroid (component, time, ID) float64 dask.array<chunksize=(2, 25, 1247)> presence (time, ID) bool dask.array<chunksize=(25, 1247)> time_start (ID) datetime64[ns] dask.array<chunksize=(1247,)> time_end (ID) datetime64[ns] dask.array<chunksize=(1247,)> merge_ledger (time, ID, sibling_ID) int32 dask.array<chunksize=(25, 1247, 10)>
Processing unstructured ocean model data (ICON):
>>> # Load ICON ocean model data with connectivity >>> icon_data = xr.open_dataset('icon_extremes.nc', chunks={}) >>> icon_extremes = icon_data.extreme_events # (time, ncells) >>> icon_mask = icon_data.mask >>> neighbours = icon_data.neighbours # Cell connectivity >>> cell_areas = icon_data.cell_areas # Physical areas >>> >>> # Track events on unstructured grid >>> unstructured_tracker = marEx.tracker( ... icon_extremes, ... icon_mask, ... R_fill=5, # 5-neighbor radius for gap filling ... area_filter_quartile=0.6, # Remove 60% of smallest events ... unstructured_grid=True, # Enable unstructured mode ... dimensions={"x": "ncells"}, # Must specify the name of the spatial dimension ... coordinates={"x": "lon", "y": "lat"}, # Spatial coordinate names ... neighbours=neighbours, # Required for unstructured ... cell_areas=cell_areas # Required for area calculations ... ) >>> unstructured_events = unstructured_tracker.run()
Memory management and checkpointing for large datasets:
>>> # Use checkpointing for very large datasets >>> large_tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=8, ... area_filter_quartile=0.5, ... temp_dir='/scratch/user/tracking_temp', # Temporary storage ... checkpoint='save' # Save intermediate results ... ) >>> # Processing can be resumed if interrupted >>> large_events = large_tracker.run()
Comparing different filtering strategies:
>>> # Conservative filtering - keep more events >>> conservative = marEx.tracker( ... extreme_events, mask, R_fill=5, area_filter_quartile=0.1 ... ) >>> conservative_events = conservative.run() >>> >>> # Aggressive filtering - focus on largest events >>> aggressive = marEx.tracker( ... extreme_events, mask, R_fill=15, area_filter_quartile=0.8 ... ) >>> aggressive_events = aggressive.run() >>> >>> print(f"Conservative: {conservative_events.ID.max().compute()} events") >>> print(f"Aggressive: {aggressive_events.ID.max().compute()} events")
Using absolute area filtering instead of percentile-based:
>>> # Filter objects smaller than 25 grid cells >>> absolute_tracker = marEx.tracker( ... extreme_events, mask, R_fill=8, area_filter_absolute=25 ... ) >>> absolute_events = absolute_tracker.run() >>> >>> # Default behavior (area_filter_quartile=0.5) when no parameters provided >>> default_tracker = marEx.tracker(extreme_events, mask, R_fill=8) >>> default_events = default_tracker.run() # Uses quartile=0.5 filtering
Using physical cell areas for structured grids:
>>> # Load data with irregular grid cell areas >>> grid_areas = xr.open_dataset('grid_areas.nc').cell_area # (lat, lon) in m² >>> >>> # Track events using physical areas instead of cell counts >>> physical_tracker = marEx.tracker( ... extreme_events, ... mask, ... R_fill=8, ... area_filter_quartile=0.5, ... cell_areas=grid_areas # Physical areas in m² ... ) >>> events = physical_tracker.run() >>> # Now events.area contains physical areas in m² instead of cell counts
Integration with full marEx workflow:
>>> # Complete workflow from raw data to tracked events >>> raw_sst = xr.open_dataset('sst_data.nc', chunks={}).sst.chunk({'time': 30}) >>> >>> # Step 1: Preprocess to identify extremes >>> processed = marEx.preprocess_data(raw_sst, threshold_percentile=95) >>> >>> # Step 2: Track extreme events >>> tracker = marEx.tracker( ... processed.extreme_events, ... processed.mask, ... R_fill=8, ... area_filter_quartile=0.5 ... ) >>> tracked_events = tracker.run()
Initialise the tracker with parameters and data.
- __init__(data_bin, mask, R_fill, area_filter_quartile=None, area_filter_absolute=None, temp_dir=None, T_fill=2, allow_merging=True, nn_partitioning=False, overlap_threshold=0.5, unstructured_grid=False, dimensions=None, coordinates=None, neighbours=None, cell_areas=None, grid_resolution=None, max_iteration=40, checkpoint=None, debug=0, verbose=None, quiet=None, regional_mode=False, coordinate_units=None)[source]
Initialise the tracker with parameters and data.
- Parameters:
data_bin (DataArray)
mask (DataArray)
area_filter_quartile (float | None)
area_filter_absolute (int | None)
temp_dir (str | None)
T_fill (int)
allow_merging (bool)
nn_partitioning (bool)
overlap_threshold (float)
unstructured_grid (bool)
neighbours (DataArray | None)
cell_areas (DataArray | None)
grid_resolution (float | None)
max_iteration (int)
checkpoint (Literal['save', 'load', 'None'] | None)
debug (int)
verbose (bool | None)
quiet (bool | None)
regional_mode (bool)
coordinate_units (Literal['degrees', 'radians'] | None)
- Return type:
None
Methods
__init__(data_bin, mask, R_fill[, ...])Initialise the tracker with parameters and data.
calculate_centroid(binary_mask[, ...])Calculate object centroid, handling edge cases for periodic boundaries.
calculate_object_properties(object_id_field)Calculate properties of objects from ID field.
check_overlap_slice(ids_t0, ids_next)Find overlapping objects between two consecutive time slices.
Cluster the object pairs and relabel to determine final event IDs.
compute_area(data_bin)Compute the total area of binary data at each time.
compute_id_time_dict(da, child_objects, ...)Generate lookup table mapping object IDs to their time index.
Filter object pairs based on overlap threshold.
fill_holes(data_bin[, R_fill])Fill holes and gaps using morphological operations.
fill_time_gaps(data_bin)Fill temporal gaps between objects.
filter_small_objects(data_bin)Remove objects smaller than a threshold area.
find_overlapping_objects(object_id_field)Find all overlapping objects across time.
identify_objects(data_bin, time_connectivity)Identify connected regions in binary data.
refresh_dask_graph(data_bin)Clear and reset the Dask graph via save/load cycle.
run([return_merges, checkpoint])Run the complete object identification and tracking pipeline.
run_preprocess([checkpoint])Preprocess binary data to prepare for tracking.
run_stats_attributes(events_ds, merges_ds, ...)Add statistics and attributes to the events dataset.
run_tracking(data_bin_preprocessed)Track objects through time to identify events.
Implement object splitting and merging logic.
Optimised parallel implementation of object splitting and merging.
track_objects(data_bin)Track objects through time to form events.
- __init__(data_bin, mask, R_fill, area_filter_quartile=None, area_filter_absolute=None, temp_dir=None, T_fill=2, allow_merging=True, nn_partitioning=False, overlap_threshold=0.5, unstructured_grid=False, dimensions=None, coordinates=None, neighbours=None, cell_areas=None, grid_resolution=None, max_iteration=40, checkpoint=None, debug=0, verbose=None, quiet=None, regional_mode=False, coordinate_units=None)[source]
Initialise the tracker with parameters and data.
- Parameters:
data_bin (DataArray)
mask (DataArray)
area_filter_quartile (float | None)
area_filter_absolute (int | None)
temp_dir (str | None)
T_fill (int)
allow_merging (bool)
nn_partitioning (bool)
overlap_threshold (float)
unstructured_grid (bool)
neighbours (DataArray | None)
cell_areas (DataArray | None)
grid_resolution (float | None)
max_iteration (int)
checkpoint (Literal['save', 'load', 'None'] | None)
debug (int)
verbose (bool | None)
quiet (bool | None)
regional_mode (bool)
coordinate_units (Literal['degrees', 'radians'] | None)
- Return type:
None
- run(return_merges=False, checkpoint=None)[source]
Run the complete object identification and tracking pipeline.
This method executes the full workflow:
Preprocessing: morphological operations and size filtering
Identification and tracking of objects through time
Computing and attaching statistics to the results
- Parameters:
- Returns:
events_ds (xarray.Dataset) – Dataset containing tracked events and their properties
merges_ds (xarray.Dataset, optional) – Dataset with merge event information (only if return_merges=True)
- Return type:
- run_preprocess(checkpoint=None)[source]
Preprocess binary data to prepare for tracking.
This performs morphological operations to fill holes/gaps in both space and time, then filters small objects according to the area_filter_quartile or area_filter_absolute.
- run_tracking(data_bin_preprocessed)[source]
Track objects through time to identify events.
- Parameters:
data_bin_preprocessed (xarray.DataArray) – Preprocessed binary data
- Returns:
events_ds (xarray.Dataset) – Dataset containing tracked events
merges_ds (xarray.Dataset) – Dataset with merge information
N_events_final (int) – Final number of unique events
- Return type:
- run_stats_attributes(events_ds, merges_ds, object_stats, N_events_final)[source]
Add statistics and attributes to the events dataset.
- Parameters:
events_ds (xarray.Dataset) – Dataset containing tracked events
merges_ds (xarray.Dataset) – Dataset with merge information
object_stats (tuple) – Preprocessed object statistics
N_events_final (int) – Final number of events
- Returns:
events_ds – Dataset with added statistics and attributes
- Return type:
- compute_area(data_bin)[source]
Compute the total area of binary data at each time.
- Parameters:
data_bin (xarray.DataArray) – Binary data
- Returns:
area – Total area at each time (units: pixels for structured grid, matching cell_area for unstructured)
- Return type:
- fill_holes(data_bin, R_fill=None)[source]
Fill holes and gaps using morphological operations.
This performs closing (dilation followed by erosion) to fill small gaps, then opening (erosion followed by dilation) to remove small isolated objects.
- Parameters:
data_bin (xarray.DataArray) – Binary data to process
R_fill (int, optional) – Fill radius override
- Returns:
data_bin_filled – Binary data with holes/gaps filled
- Return type:
- fill_time_gaps(data_bin)[source]
Fill temporal gaps between objects.
Performs binary closing (dilation then erosion) along the time dimension to fill small time gaps between objects.
- Parameters:
data_bin (xarray.DataArray) – Binary data to process
- Returns:
data_bin_filled – Binary data with temporal gaps filled
- Return type:
- refresh_dask_graph(data_bin)[source]
Clear and reset the Dask graph via save/load cycle.
This is needed to work around a memory leak bug in Dask where “Unmanaged Memory” builds up within loops.
- Parameters:
data_bin (xarray.DataArray) – Data to refresh
- Returns:
data_new – Data with fresh Dask graph
- Return type:
- filter_small_objects(data_bin)[source]
Remove objects smaller than a threshold area.
- Parameters:
data_bin (xarray.DataArray) – Binary data to filter
- Returns:
data_bin_filtered (xarray.DataArray) – Binary data with small objects removed
area_threshold (float) – Area threshold used for filtering
object_areas (xarray.DataArray) – Areas of all objects pre-filtering
N_objects_prefiltered (int) – Number of objects before filtering
N_objects_filtered (int) – Number of objects after filtering
- Return type:
- identify_objects(data_bin, time_connectivity)[source]
Identify connected regions in binary data.
- Parameters:
data_bin (xarray.DataArray) – Binary data to identify objects in
time_connectivity (bool) – Whether to connect objects across time
- Returns:
object_id_field (xarray.DataArray) – Field of integer IDs for each object
None (NoneType) – Placeholder for compatibility with track_objects
N_objects (int) – Number of objects identified
- Return type:
- calculate_centroid(binary_mask, original_centroid=None)[source]
Calculate object centroid, handling edge cases for periodic boundaries.
- Parameters:
binary_mask (numpy.ndarray) – 2D binary array where True indicates the object (dimensions are (y,x))
original_centroid (tuple, optional) – (y_centroid, x_centroid) from regionprops_table
- Returns:
(y_centroid, x_centroid)
- Return type:
- calculate_object_properties(object_id_field, properties=None)[source]
Calculate properties of objects from ID field.
- Parameters:
object_id_field (xarray.DataArray) – Field containing object IDs
properties (list, optional) – List of properties to calculate (defaults to [‘label’, ‘area’])
- Returns:
object_props – Dataset containing calculated properties with ‘ID’ dimension
- Return type:
- check_overlap_slice(ids_t0, ids_next)[source]
Find overlapping objects between two consecutive time slices.
- Parameters:
ids_t0 (numpy.ndarray) – Object IDs at current time
ids_next (numpy.ndarray) – Object IDs at next time
- Returns:
Array of shape (n_overlaps, 3) with [id_t0, id_next, overlap_area]
- Return type:
- find_overlapping_objects(object_id_field)[source]
Find all overlapping objects across time.
- Parameters:
object_id_field (xarray.DataArray) – Field containing object IDs
- Returns:
overlap_objects_list_unique_filtered – Array of object ID pairs that overlap across time, with overlap area The object in the first column precedes the second column in time. The third column contains:
For structured grid: number of overlapping pixels (int32)
For unstructured grid: total overlapping area in m^2 (float32)
- Return type:
(N x 3) numpy.ndarray
- enforce_overlap_threshold(overlap_objects_list, object_props)[source]
Filter object pairs based on overlap threshold.
- Parameters:
overlap_objects_list ((N x 3) numpy.ndarray) – Array of object ID pairs with overlap area
object_props (xarray.Dataset) – Object properties including area
- Returns:
overlap_objects_list_filtered – Filtered array of object ID pairs that meet the overlap threshold
- Return type:
(M x 3) numpy.ndarray
- consolidate_object_ids(data_t_minus_2, data_t_minus_1, object_props, timestep)[source]
Consolidate object IDs between t-2 and t-1 to ensure consistent tracking.
This identifies objects at t-1 that are actually continuations of objects from t-2 (but got different IDs due to partitioning) and renames them to maintain consistent IDs across timesteps.
- Parameters:
data_t_minus_2 (xr.DataArray) – Object field at timestep t-2
data_t_minus_1 (xr.DataArray) – Object field at timestep t-1 (will be modified)
object_props (xr.Dataset) – Object properties dataset (will be modified)
timestep (int) – Current timestep number for logging purposes
- Returns:
data_t_minus_1_consolidated (xr.DataArray) – Updated t-1 field with consolidated IDs
object_props_updated (xr.Dataset) – Updated object properties with merged/deleted objects
- Return type:
Notes
Uses self.overlap_threshold for determining consolidation eligibility
Updates object properties by recalculating for consolidated objects
Removes redundant child objects from object_props
- compute_id_time_dict(da, child_objects, max_objects, all_objects=True)[source]
Generate lookup table mapping object IDs to their time index.
- Parameters:
da (xarray.DataArray) – Field of object IDs
child_objects (list or array) – Object IDs to include in the dictionary
max_objects (int) – Maximum number of objects
all_objects (bool, default=True) – Whether to process all objects or just child_objects
- Returns:
time_index_map – Dictionary mapping object IDs to time indices
- Return type:
- track_objects(data_bin)[source]
Track objects through time to form events.
This is the main tracking method that handles splitting and merging of objects.
- Parameters:
data_bin (xarray.DataArray) – Preprocessed binary data: Field of globally unique integer IDs of each element in connected regions. ID = 0 indicates no object.
- Returns:
split_merged_events_ds (xarray.Dataset) – Dataset containing tracked events
merge_events (xarray.Dataset) – Dataset with merge information
N_events (int) – Final number of events
- Return type:
- cluster_rename_objects_and_props(object_id_field_unique, object_props, overlap_objects_list, merge_events)[source]
Cluster the object pairs and relabel to determine final event IDs.
- Parameters:
object_id_field_unique (xarray.DataArray) – Field of unique object IDs. IDs must not be repeated across time.
object_props (xarray.Dataset) – Properties of each object that also need to be relabeled.
overlap_objects_list ((N x 2) numpy.ndarray) – Array of object ID pairs that indicate which objects are in the same event. The object in the first column precedes the second column in time.
merge_events (xarray.Dataset) – Information about merge events
- Returns:
split_merged_events_ds – Dataset with relabeled events and their properties. ID = 0 indicates no object.
- Return type:
- split_and_merge_objects(object_id_field_unique, object_props)[source]
Implement object splitting and merging logic.
This identifies and processes cases where objects split or merge over time, creating new object IDs as needed.
- Parameters:
object_id_field_unique (xarray.DataArray) – Field of unique object IDs. IDs are required to be monotonically increasing with time.
object_props (xarray.Dataset) – Properties of each object
- Returns:
(object_id_field, object_props, overlap_objects_list, merge_events)
- Return type:
- split_and_merge_objects_parallel(object_id_field_unique, object_props)[source]
Optimised parallel implementation of object splitting and merging.
This version is specifically designed for unstructured grids with more efficient memory handling and better parallelism than the standard split_and_merge_objects method. It processes data in chunks, handles merging events, and efficiently updates object IDs.
- Parameters:
object_id_field_unique (xarray.DataArray) – Field of unique object IDs
object_props (xarray.Dataset) – Properties of each object
- Returns:
(object_id_field, object_props, overlap_objects_list, merge_events)
- Return type: