marEx.helper.start_local_cluster

marEx.helper.start_local_cluster(n_workers=4, threads_per_worker=1, scratch_dir=None, verbose=None, quiet=None, **kwargs)

Start a local Dask cluster.

Parameters:
  • n_workers (int, default=4) – Number of worker processes to start.

  • threads_per_worker (int, default=1) – Number of threads per worker.

  • scratch_dir (str or Path, optional) – Directory to use for temporary files.

  • verbose (bool, optional) – Enable verbose logging with detailed cluster startup information. If None, uses current global logging configuration.

  • quiet (bool, optional) – Enable quiet logging with minimal output (warnings and errors only). If None, uses current global logging configuration. Note: quiet takes precedence over verbose if both are True.

  • **kwargs – Additional keyword arguments to pass to LocalCluster.

Returns:

Dask client connected to the local cluster.

Return type:

Client
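The precedence rule for verbose and quiet described under Parameters can be sketched in a few lines. This is only an illustration, not marEx's implementation: resolve_log_level is a hypothetical helper, the mapping of verbose to DEBUG is an assumption, and False is assumed to behave like None.

```python
import logging


def resolve_log_level(verbose=None, quiet=None, current_level=logging.INFO):
    """Hypothetical helper: map the verbose/quiet flags to a logging level."""
    if quiet:
        # quiet takes precedence when both flags are True
        return logging.WARNING
    if verbose:
        # Assumption: verbose output corresponds to DEBUG
        return logging.DEBUG
    # Neither flag set: keep whatever level is already configured
    return current_level


print(logging.getLevelName(resolve_log_level(verbose=True, quiet=True)))  # WARNING
```

Passing both flags as True yields the quiet behaviour (warnings and errors only), matching the note on the quiet parameter.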

Examples

Basic local cluster for development:

>>> import marEx
>>>
>>> # Start simple local cluster
>>> client = marEx.start_local_cluster(n_workers=2, threads_per_worker=1)
>>> print(client)
<Client: 'tcp://127.0.0.1:xxxxx' processes=2 threads=2, memory=15.7 GB>
>>>
>>> # Check cluster status
>>> print(f"Dashboard: {client.dashboard_link}")
Dashboard: http://127.0.0.1:8787/status
>>>
>>> client.close()

Optimised cluster for CPU-intensive work:

>>> # Use one worker per physical core
>>> client = marEx.start_local_cluster(
...     n_workers=8,           # Number of physical cores
...     threads_per_worker=1   # Avoid hyperthreading for compute
... )
>>>
>>> # Process data with the cluster
>>> import xarray as xr
>>> data = xr.open_dataset('large_data.nc').chunk({'time': 25})
>>> result = data.mean().compute()
>>> client.close()
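
If the number of physical cores is not known in advance, it can be estimated from the logical CPU count. The sketch below is not part of marEx: suggest_n_workers is a hypothetical helper, and it assumes each physical core exposes two hardware threads, which does not hold on every machine.

```python
import os


def suggest_n_workers(logical_cpus=None, threads_per_core=2):
    """Hypothetical helper: estimate physical cores from the logical CPU count."""
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs and may return None
        logical_cpus = os.cpu_count() or 1
    # Always return at least one worker
    return max(1, logical_cpus // threads_per_core)


print(suggest_n_workers(logical_cpus=16))  # 8
```

The result could then be passed as n_workers, with threads_per_worker=1 as in the example above.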

Memory-optimised cluster:

>>> # Configure for large datasets
>>> client = marEx.start_local_cluster(
...     n_workers=4,
...     threads_per_worker=2,
...     memory_limit='8GB',      # Limit memory per worker
...     scratch_dir='/tmp/dask'  # Fast scratch storage (e.g. Lustre)
... )
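
Because memory_limit applies per worker, one way to choose it is to divide a total memory budget across the workers. The sketch below is an illustration only: per_worker_memory_limit is a hypothetical helper, and the 80% headroom factor is an assumed safety margin rather than a marEx or Dask default.

```python
def per_worker_memory_limit(total_gb, n_workers, headroom=0.8):
    """Hypothetical helper: split a memory budget evenly across workers."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Leave some memory free for the OS and the client process
    usable_gb = total_gb * headroom
    return f"{usable_gb / n_workers:.1f}GB"


print(per_worker_memory_limit(total_gb=40, n_workers=4))  # 8.0GB
```

The returned string has the same form as the memory_limit value used in the example above.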

Integration with marEx preprocessing:

>>> # Start cluster then process data
>>> client = marEx.start_local_cluster(n_workers=16)
>>>
>>> # Load and preprocess SST data
>>> sst = xr.open_dataset('sst_data.nc').sst.chunk({'time': 30})
>>> processed = marEx.preprocess_data(sst, threshold_percentile=95)
>>>
>>> # Track events using the cluster
>>> tracker = marEx.tracker(
...     processed.extreme_events,
...     processed.mask,
...     R_fill=8,
...     area_filter_quartile=0.5
... )
>>> events = tracker.run()
>>>
>>> # client.nthreads() returns a per-worker dict, so sum its values
>>> print(f"Processed {len(sst.time)} time steps with {sum(client.nthreads().values())} threads")
>>> client.close()

Custom worker configuration:

>>> # Advanced configuration for specific workloads
>>> client = marEx.start_local_cluster(
...     n_workers=4,
...     threads_per_worker=2,
...     processes=True,         # Use separate processes (default)
...     silence_logs=False,     # Keep logs for debugging
...     dashboard_address=':8787'  # Specific dashboard port
... )