Troubleshooting

This guide helps you diagnose and resolve common issues when using marEx for marine extreme event analysis.

Quick Diagnostic Checklist 

Before diving into specific issues, run through this quick checklist:

Environment Check:

import marEx

# Check version
print(f"marEx version: {getattr(marEx, '__version__', 'development')}")

# Check dependencies
marEx.print_dependency_status()

# Check Dask
import dask
print(f"Dask version: {dask.__version__}")

Data Validation:

# Check data structure
print(f"Data dimensions: {your_data.dims}")
print(f"Data shape: {your_data.shape}")
print(f"Data coordinates: {list(your_data.coords)}")
print(f"Is Dask array: {dask.is_dask_collection(your_data.data)}")  # dask is imported in the environment check

Installation Issues 

Package Not Found 

Problem: ModuleNotFoundError: No module named 'marEx'

Solutions:

Install marEx:
```
pip install marEx
```
Check Python environment:
```
which python
pip list | grep marEx
```

Virtual environment issues:

# Activate correct environment
conda activate your-env
# or
source your-venv/bin/activate

Development installation:

# If working with source code
pip install -e ./marEx
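
This error often means the interpreter running your script is not the one pip installed into. The sketch below uses only the standard library to report the active interpreter and whether a module can be found on its import path. The helper name `describe_import` is illustrative and not part of marEx.

```python
import importlib.util
import sys


def describe_import(module_name: str) -> dict:
    """Report where (or whether) a top-level module would be imported from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "found": spec is not None,
        # origin is the file the module would be loaded from, if any
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_import("marEx").items():
        print(f"{key}: {value}")
```

If `found` is False while `pip list` shows the package, pip and the interpreter probably belong to different environments.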

Dependency Conflicts 

Problem: Version conflicts with scientific packages

Solutions:

Create clean environment:

conda create -n marex-env python=3.10
conda activate marex-env
pip install "marEx[full]"

Update dependencies:

pip install --upgrade marEx
pip install --upgrade dask xarray

Pin problematic versions:
```
pip install "dask>=2024.7.0,<2026.0.0"
```
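
To see whether an installed package falls inside a pinned range such as the Dask pin above, a rough check can be done with the standard library alone. This is a simplified sketch: `version_tuple` only reads leading numeric components and is not a full PEP 440 parser, and all function names are illustrative.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Convert '2024.7.0' to (2024, 7, 0), ignoring non-numeric suffixes."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper."""
    return version_tuple(lower) <= version_tuple(version) < version_tuple(upper)


def check_installed(package: str, lower: str, upper: str) -> str:
    """Describe whether the installed version of a package is in range."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    status = "ok" if in_range(installed, lower, upper) else "outside pinned range"
    return f"{package} {installed}: {status}"


print(check_installed("dask", "2024.7.0", "2026.0.0"))
```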

Missing Optional Dependencies 

Problem: Features not working due to missing optional packages

Solutions:

Install full package:
```
pip install "marEx[full]"
```

Install specific features:

pip install "marEx[dev]"   # Development tools
pip install jax jaxlib     # GPU acceleration
pip install dask-jobqueue  # HPC support

Check what’s missing:

python -c "import marEx; marEx.print_dependency_status()"

Coordinate Issues 

Problem: KeyError: 'lat' or coordinate not found

Solutions:

Check coordinate names:
```
print(data.coords)
print(data.dims)
```

For unstructured data:

# Ensure lat/lon are coordinates, not dimensions
print(f"Spatial dimensions: {[d for d in data.dims if d not in ['time']]}")
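
When the names reported above differ from `lat`, `lon`, and `time`, one option is to rename them before processing. The sketch below builds a rename mapping from a list of names. The alias table is an assumption about common spellings, not something marEx defines, and `build_rename_map` is a hypothetical helper.

```python
# Common spellings of spatial and temporal names; extend for your dataset.
ALIASES = {
    "lat": ("lat", "latitude", "nav_lat", "y"),
    "lon": ("lon", "longitude", "nav_lon", "x"),
    "time": ("time", "t", "valid_time"),
}


def build_rename_map(names):
    """Map names found in the data to the canonical 'lat', 'lon', 'time'."""
    lowered = {str(n).lower(): n for n in names}
    rename = {}
    for canonical, candidates in ALIASES.items():
        for candidate in candidates:
            if candidate in lowered:
                original = lowered[candidate]
                # Only rename when the existing name differs from the target
                if original != canonical:
                    rename[original] = canonical
                break
    return rename


print(build_rename_map(["time", "latitude", "longitude"]))
```

With xarray, the resulting mapping could then be applied as `data = data.rename(build_rename_map(list(data.coords)))`.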

Chunking Problems 

Problem: ValueError: Can't rechunk from chunks, or memory issues caused by unsuitable chunk sizes

Solutions:

Check current chunks:

print(f"Current chunks: {data.chunks}")

Rechunk appropriately:

# For preprocessing
data = data.chunk({'time': 30, 'lat': -1, 'lon': -1})

Avoid very small chunks:

# Bad: too many small chunks
data = data.chunk({'time': 1, 'lat': 1, 'lon': 1})

# Good: balanced chunks
data = data.chunk({'time': 'auto', 'lat': -1, 'lon': -1})
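
Whether a chunking scheme is "balanced" can be checked with simple arithmetic before any data is loaded. The sketch below estimates the size of one chunk and suggests a time-chunk length for a given target size. The function names are illustrative, and the default `itemsize=4` assumes float32 values.

```python
from math import prod


def chunk_size_mb(chunk_shape, itemsize=4):
    """Size of one chunk in MB (itemsize=4 assumes float32 values)."""
    return prod(chunk_shape) * itemsize / 1e6


def time_chunk_for_target(target_mb, n_lat, n_lon, itemsize=4):
    """Time steps per chunk so that a full lat/lon slab stays near target_mb."""
    bytes_per_step = n_lat * n_lon * itemsize
    return max(1, int(target_mb * 1e6 // bytes_per_step))


# Example: a 0.25-degree global grid (720 x 1440) stored as float32
print(chunk_size_mb((30, 720, 1440)))         # MB for 30 time steps per chunk
print(time_chunk_for_target(200, 720, 1440))  # time steps for roughly 200 MB
```

For this grid, 30 time steps per chunk comes to about 124 MB, while one value per chunk is only 4 bytes, which is why the "bad" example above produces so many tiny tasks.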

Processing Issues 

Memory Errors 

Problem: MemoryError or KilledWorker during processing

Solutions:

Reduce chunk sizes:

# Smaller chunks use less memory
data = data.chunk({'time': 30, 'lat': 100, 'lon': 100})

Reduce worker memory:

client = marEx.helper.start_local_cluster(
    n_workers=2,
    memory_limit='4GB'  # Reduce from default
)

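Use spill-to-disk:

import dask
# Apply before starting the cluster so that new workers pick up the setting.
# Workers spill data to disk once memory use passes this fraction of their limit.
dask.config.set({'distributed.worker.memory.spill': 0.8})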

Slow Performance 

Problem: Processing takes much longer than expected

Solutions:

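Analyse the Dask Dashboard (its URL is available as client.dashboard_link) to see whether workers are idle, waiting on data transfers, or spilling to disk.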

Optimise chunks for operation:

# For preprocessing (time series operations)
data = data.chunk({'time': 1000, 'lat': 'auto', 'lon': 'auto'})

# For spatial operations
data = data.chunk({'time': 30, 'lat': -1, 'lon': -1})

Use more workers:

import os

client = marEx.helper.start_local_cluster(
    n_workers=min(32, os.cpu_count() or 1),  # cpu_count() can return None
    threads_per_worker=1
)

Profile performance:

from dask.distributed import performance_report

with performance_report(filename="marex-profile.html"):
    result = marEx.preprocess_data(data)

Wrong Results 

Problem: Unexpected values or patterns in results

Solutions:

Check input data quality:

print(f"Data range: {data.min().values} to {data.max().values}")
print(f"Missing values: {data.isnull().sum().values}")

Validate preprocessing parameters:

# Check baseline period (assumes daily data; flags fewer than about 10 years)
baseline_data = data.sel(time=slice('1990', '2020'))
if len(baseline_data.time) < 365 * 10:
    print("Warning: Baseline period too short")

Check anomaly mean:

# Anomalies should have near-zero mean
anomaly_mean = processed['dat_anomaly'].mean().values
if abs(anomaly_mean) > 0.1:
    print(f"Warning: Anomaly mean not zero: {anomaly_mean}")

Verify extreme frequency:

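# Frequency should be close to (100 - threshold percentile), e.g. about 5% for the 95th percentile.
# threshold_percentile is the value used during preprocessing.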
extreme_freq = processed['extreme_events'].mean().values * 100
expected_freq = 100 - threshold_percentile
if abs(extreme_freq - expected_freq) > 2:
    print(f"Warning: Extreme frequency {extreme_freq:.1f}% != expected {expected_freq}%")
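
The expected-frequency check rests on a simple property: values above the p-th percentile of a sample make up roughly (100 - p)% of it. The standard-library sketch below illustrates this on synthetic Gaussian "anomalies" with a nearest-rank percentile. It shows only the arithmetic and is not how marEx computes thresholds, which may vary by location or day of year.

```python
import random


def percentile(values, q):
    """Nearest-rank percentile (q in 0-100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * q // 100))  # ceil(n * q / 100)
    return ordered[int(rank) - 1]


def extreme_frequency(values, threshold_percentile):
    """Percentage of values strictly above their own percentile threshold."""
    threshold = percentile(values, threshold_percentile)
    exceed = sum(1 for v in values if v > threshold)
    return 100 * exceed / len(values)


random.seed(0)
anomalies = [random.gauss(0.0, 1.0) for _ in range(10_000)]
freq = extreme_frequency(anomalies, 95)
print(f"extreme frequency: {freq:.2f}% (expected about 5%)")
```

A large departure from the expected value in real output suggests a mismatch between the threshold used and the one assumed in the check.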

Worker Failures 

Problem: Workers dying or becoming unresponsive

Solutions:

Check system resources:

import psutil
print(f"CPU usage: {psutil.cpu_percent()}%")
print(f"Memory usage: {psutil.virtual_memory().percent}%")
print(f"Available memory: {psutil.virtual_memory().available / 1e9:.1f} GB")

Reduce worker load:

client = marEx.helper.start_local_cluster(
    n_workers=8,           # Fewer workers
    threads_per_worker=1,  # Fewer threads
    memory_limit='4GB'     # Less memory per worker
)

Configure worker limits:

import dask

dask.config.set({
    'distributed.worker.memory.target': 0.8,
    'distributed.worker.memory.spill': 0.9,
    'distributed.worker.memory.pause': 0.95,
    'distributed.worker.memory.terminate': 0.98
})
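
These thresholds are fractions of each worker's memory limit. In Dask's worker memory management, `target` and `spill` control when data is moved to disk, `pause` stops the worker from starting new tasks, and `terminate` causes the worker to be restarted. The sketch below converts the fractions into absolute sizes for a given limit. The helper name is illustrative.

```python
def memory_thresholds_gb(memory_limit_gb, fractions):
    """Convert fractional worker-memory thresholds to absolute GB."""
    return {name: round(memory_limit_gb * frac, 2) for name, frac in fractions.items()}


# Fractions from the configuration above, applied to a 4 GB worker
fractions = {"target": 0.8, "spill": 0.9, "pause": 0.95, "terminate": 0.98}
for name, gb in memory_thresholds_gb(4, fractions).items():
    print(f"{name}: {gb} GB")
```

With a 4 GB limit, only about 0.08 GB separates the terminate threshold from the limit itself, so a single large chunk can push a worker from "paused" to "restarted" very quickly.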

HPC-Specific Issues 

SLURM Job Failures 

Problem: Jobs killed or failing on HPC systems

Solutions:

Check resource limits:
```
scontrol show job $SLURM_JOB_ID
```
Increase walltime:
```
#SBATCH --time=12:00:00
```
Request more memory:
```
#SBATCH --mem=128G
```
Use exclusive nodes:
```
#SBATCH --exclusive
```
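
From inside a running job, the allocation that SLURM actually granted can be read from environment variables, which helps confirm that the `#SBATCH` directives took effect. The sketch below collects a few standard ones. `slurm_allocation` is a hypothetical helper. Some variables are only set when the corresponding option was requested, for example `SLURM_MEM_PER_NODE` (in MB) with `--mem`.

```python
import os


def slurm_allocation(env=None):
    """Collect a few standard SLURM variables; missing ones map to None."""
    env = os.environ if env is None else env
    keys = (
        "SLURM_JOB_ID",
        "SLURM_JOB_NUM_NODES",
        "SLURM_CPUS_ON_NODE",
        "SLURM_MEM_PER_NODE",
    )
    return {key: env.get(key) for key in keys}


if __name__ == "__main__":
    for key, value in slurm_allocation().items():
        print(f"{key}: {value if value is not None else 'not set'}")
```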

Performance Issues 

General Performance Tips 

Tune your cluster:

# Don't use too many small workers
# Better: fewer workers with more resources
client = marEx.helper.start_local_cluster(
    n_workers=16,
    threads_per_worker=1,
    memory_limit='16GB'
)

Optimise chunk sizes:

# Target 100-400 MB per chunk
# Total size in MB: data.nbytes / 1e6
# Average chunk size in MB (Dask-backed data): data.nbytes / data.data.npartitions / 1e6

Monitor progress:

from dask.distributed import progress
# result should hold futures, e.g. a collection returned by .persist()
progress(result)
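
The cluster-tuning advice above (fewer workers, each with more memory) can be turned into a quick calculation. The sketch below picks a worker count that fits in the available memory and is capped by the CPU count. `plan_workers` and the 10% memory reserve are assumptions for illustration, not marEx defaults.

```python
def plan_workers(total_memory_gb, cpu_count, memory_per_worker_gb, reserve_fraction=0.1):
    """Number of workers that fit in memory, capped by the CPU count.

    reserve_fraction keeps some memory free for the OS and the client process.
    """
    usable = total_memory_gb * (1 - reserve_fraction)
    by_memory = int(usable // memory_per_worker_gb)
    return max(1, min(cpu_count, by_memory))


# Example: a node with 256 GB RAM and 64 cores, 16 GB per worker
print(plan_workers(256, 64, 16))
```

In this example memory, not CPU count, is the limiting factor: only 14 workers of 16 GB fit, even though 64 cores are available.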

Getting Help 

Community Resources 

GitHub Issues: Report bugs and request features
Discussions: Ask questions and share experiences
Documentation: Check latest documentation online
Examples: Browse example notebooks and scripts

Reporting Issues 

When reporting issues, include:

marEx version: marEx.__version__
Python version: python --version
Operating system: OS and version
Data description: Size, format, structure
Full error message: Complete traceback
Minimal example: Simplified code that reproduces the issue
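
Most of the environment details listed above can be gathered in one step. The sketch below uses only the standard library. The function name and the default package list are illustrative, so adjust them to the packages relevant to your issue.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("marEx", "dask", "xarray", "numpy")):
    """Return lines describing the Python, OS, and package versions."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"OS: {platform.system()} {platform.release()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}: {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return lines


print("\n".join(environment_report()))
```

Paste the output into the issue alongside the full traceback and the minimal example.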