marEx.helper.start_local_cluster
- marEx.helper.start_local_cluster(n_workers=4, threads_per_worker=1, scratch_dir=None, verbose=None, quiet=None, **kwargs)
Start a local Dask cluster.
- Parameters:
n_workers (int, default=4) – Number of worker processes to start.
threads_per_worker (int, default=1) – Number of threads per worker.
scratch_dir (str or Path, optional) – Directory to use for temporary files.
verbose (bool, optional) – Enable verbose logging with detailed cluster startup information. If None, uses current global logging configuration.
quiet (bool, optional) – Enable quiet logging with minimal output (warnings and errors only). If None, uses current global logging configuration. Note: quiet takes precedence over verbose if both are True.
**kwargs – Additional keyword arguments to pass to LocalCluster.
- Returns:
Dask client connected to the local cluster.
- Return type:
Client
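The precedence rule between verbose and quiet described above can be sketched with the standard logging module. This is only an illustration, not marEx's actual implementation: the helper name resolve_log_level is hypothetical, mapping verbose to DEBUG is an assumption, and treating False the same as None is also an assumption.

```python
import logging
from typing import Optional


def resolve_log_level(verbose: Optional[bool], quiet: Optional[bool], current_level: int) -> int:
    """Hypothetical helper: pick a logging level from the verbose/quiet flags."""
    if quiet:
        # quiet takes precedence when both flags are True: warnings and errors only
        return logging.WARNING
    if verbose:
        # Assumption: verbose maps to DEBUG; the level marEx really uses may differ
        return logging.DEBUG
    # Neither flag is set (None, or False by assumption): keep the current configuration
    return current_level


# Both flags set, only verbose set, and neither set
for flags in [(True, True), (True, None), (None, None)]:
    level = resolve_log_level(*flags, current_level=logging.INFO)
    print(flags, logging.getLevelName(level))
```

Checking quiet first is what makes it win when both flags are True; the order of the two tests is the whole rule.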
Examples
Basic local cluster for development:
>>> import marEx
>>>
>>> # Start simple local cluster
>>> client = marEx.start_local_cluster(n_workers=2, threads_per_worker=1)
>>> print(client)
<Client: 'tcp://127.0.0.1:xxxxx' processes=2 threads=2, memory=15.7 GB>
>>>
>>> # Check cluster status
>>> print(f"Dashboard: {client.dashboard_link}")
Dashboard: http://127.0.0.1:8787/status
>>>
>>> client.close()
Optimised cluster for CPU-intensive work:
>>> # Use one worker per physical core
>>> client = marEx.start_local_cluster(
...     n_workers=8,  # Number of physical cores
...     threads_per_worker=1  # Avoid hyperthreading for compute
... )
>>>
>>> # Process data with the cluster
>>> import xarray as xr
>>> data = xr.open_dataset('large_data.nc').chunk({'time': 25})
>>> result = data.mean().compute()
>>> client.close()
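The example above hard-codes n_workers=8 as the number of physical cores. The standard library only reports logical CPUs, so a rough estimate might look like the sketch below. The helper estimate_physical_cores is hypothetical, and the default of two hardware threads per core is an assumption that does not hold on every machine.

```python
import os


def estimate_physical_cores(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Hypothetical helper: estimate physical cores from the logical CPU count."""
    if logical_cpus < 1 or threads_per_core < 1:
        raise ValueError("logical_cpus and threads_per_core must be positive")
    # Assumption: every core exposes the same number of hardware threads
    return max(1, logical_cpus // threads_per_core)


logical = os.cpu_count() or 1  # os.cpu_count() can return None
n_workers = estimate_physical_cores(logical)
print(f"logical CPUs: {logical}, suggested n_workers: {n_workers}")
```

The resulting value could then be passed as n_workers with threads_per_worker=1. On machines without simultaneous multithreading, pass threads_per_core=1.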
Memory-optimised cluster:
>>> # Configure for large datasets
>>> client = marEx.start_local_cluster(
...     n_workers=4,
...     threads_per_worker=2,
...     memory_limit='8GB',  # Limit memory per worker
...     scratch_dir='/tmp/dask'  # Fast scratch storage (e.g. local SSD or Lustre)
... )
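Because memory_limit applies per worker, the cluster as a whole can use up to n_workers times that amount (4 × 8 GB = 32 GB in the example above). A minimal sketch of deriving a per-worker limit from a total memory budget follows. The helper per_worker_memory_limit and its 20 % default headroom are assumptions for illustration and are not part of marEx or Dask.

```python
GIB = 2**30  # bytes in one gibibyte


def per_worker_memory_limit(total_bytes: int, n_workers: int, headroom_percent: int = 20) -> str:
    """Hypothetical helper: split a memory budget evenly across workers."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    # Reserve headroom for the OS and the client process (assumed 20 % by default)
    usable = total_bytes * (100 - headroom_percent) // 100
    # Round down to whole GiB, but never report less than 1 GiB
    gib_per_worker = max(1, usable // n_workers // GIB)
    return f"{gib_per_worker}GiB"


limit = per_worker_memory_limit(total_bytes=40 * GIB, n_workers=4)
print(limit)
```

The returned string could be passed as memory_limit, which the example above forwards to LocalCluster through **kwargs.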
Integration with marEx preprocessing:
>>> import xarray as xr
>>>
>>> # Start cluster then process data
>>> client = marEx.start_local_cluster(n_workers=16)
>>>
>>> # Load and preprocess SST data
>>> sst = xr.open_dataset('sst_data.nc').sst.chunk({'time': 30})
>>> processed = marEx.preprocess_data(sst, threshold_percentile=95)
>>>
>>> # Track events using the cluster
>>> tracker = marEx.tracker(
...     processed.extreme_events,
...     processed.mask,
...     R_fill=8,
...     area_filter_quartile=0.5
... )
>>> events = tracker.run()
>>>
>>> # client.nthreads() returns a dict of threads per worker, so sum the values
>>> n_threads = sum(client.nthreads().values())
>>> print(f"Processed {len(sst.time)} time steps with {n_threads} threads")
>>> client.close()
Custom worker configuration:
>>> # Advanced configuration for specific workloads
>>> client = marEx.start_local_cluster(
...     n_workers=4,
...     threads_per_worker=2,
...     processes=True,  # Use separate processes (default)
...     silence_logs=False,  # Keep logs for debugging
...     dashboard_address=':8787'  # Specific dashboard port
... )