DigitalOcean

DropletCluster([region, size, image, debug])

Cluster running on Digital Ocean droplets.

Overview

Authentication

To authenticate with DigitalOcean you must first generate a personal access token.

Then you must put this in your Dask configuration at cloudprovider.digitalocean.token. This can be done by adding the token to your YAML configuration or exporting an environment variable.

# ~/.config/dask/cloudprovider.yaml

cloudprovider:
  digitalocean:
    token: "yourtoken"
$ export DASK_CLOUDPROVIDER__DIGITALOCEAN__TOKEN="yourtoken"

Droplet

class dask_cloudprovider.digitalocean.DropletCluster(region: Optional[str] = None, size: Optional[str] = None, image: Optional[str] = None, debug: bool = False, **kwargs)[source]

Cluster running on Digital Ocean droplets.

VMs in DigitalOcean (DO) are referred to as droplets. This cluster manager constructs a Dask cluster running on VMs.

When configuring your cluster you may find it useful to install the doctl tool for querying the DO API for available options.

https://www.digitalocean.com/docs/apis-clis/doctl/how-to/install/

Parameters
region: str

The DO region to launch you cluster in. A full list can be obtained with doctl compute region list.

size: str

The VM size slug. You can get a full list with doctl compute size list. The default is s-1vcpu-1gb which is 1GB RAM and 1 vCPU

image: str

The image ID to use for the host OS. This should be a Ubuntu variant. You can list available images with doctl compute image list --public | grep ubuntu.*x64.

worker_module: str

The Dask worker module to start on worker VMs.

n_workers: int

Number of workers to initialise the cluster with. Defaults to 0.

worker_module: str

The Python module to run for the worker. Defaults to distributed.cli.dask_worker

worker_options: dict

Params to be passed to the worker class. See distributed.worker.Worker for default worker class. If you set worker_module then refer to the docstring for the custom worker class.

scheduler_options: dict

Params to be passed to the scheduler class. See distributed.scheduler.Scheduler.

docker_image: string (optional)

The Docker image to run on all instances.

This image must have a valid Python environment and have dask installed in order for the dask-scheduler and dask-worker commands to be available. It is recommended the Python environment matches your local environment where EC2Cluster is being created from.

For GPU instance types the Docker image much have NVIDIA drivers and dask-cuda installed.

By default the daskdev/dask:latest image will be used.

docker_args: string (optional)

Extra command line arguments to pass to Docker.

extra_bootstrap: list[str] (optional)

Extra commands to be run during the bootstrap phase.

env_vars: dict (optional)

Environment variables to be passed to the worker.

silence_logs: bool

Whether or not we should silence logging when setting up the cluster.

asynchronous: bool

If this is intended to be used directly within an event loop with async/await

securitySecurity or bool, optional

Configures communication security in this cluster. Can be a security object, or True. If True, temporary self-signed credentials will be created automatically. Default is True.

debug: bool, optional

More information will be printed when constructing clusters to enable debugging.

Examples

Create the cluster.

>>> from dask_cloudprovider.digitalocean import DropletCluster
>>> cluster = DropletCluster(n_workers=1)
Creating scheduler instance
Created droplet dask-38b817c1-scheduler
Waiting for scheduler to run
Scheduler is running
Creating worker instance
Created droplet dask-38b817c1-worker-dc95260d

Connect a client.

>>> from dask.distributed import Client
>>> client = Client(cluster)

Do some work.

>>> import dask.array as da
>>> arr = da.random.random((1000, 1000), chunks=(100, 100))
>>> arr.mean().compute()
0.5001550986751964

Close the cluster

>>> client.close()
>>> cluster.close()
Terminated droplet dask-38b817c1-worker-dc95260d
Terminated droplet dask-38b817c1-scheduler

You can also do this all in one go with context managers to ensure the cluster is created and cleaned up.

>>> with DropletCluster(n_workers=1) as cluster:
...     with Client(cluster) as client:
...         print(da.random.random((1000, 1000), chunks=(100, 100)).mean().compute())
Creating scheduler instance
Created droplet dask-48efe585-scheduler
Waiting for scheduler to run
Scheduler is running
Creating worker instance
Created droplet dask-48efe585-worker-5181aaf1
0.5000558682356162
Terminated droplet dask-48efe585-worker-5181aaf1
Terminated droplet dask-48efe585-scheduler
Attributes
asynchronous

Are we running in the event loop?

auto_shutdown
bootstrap
command
dashboard_link
docker_image
gpu_instance
loop
name
observed
plan
requested
scheduler_address
scheduler_class
worker_class

Methods

adapt([Adaptive, minimum, maximum, ...])

Turn on adaptivity

call_async(f, *args, **kwargs)

Run a blocking function in a thread as a coroutine.

from_name(name)

Create an instance of this class to represent an existing cluster by name.

get_client()

Return client for the cluster

get_logs([cluster, scheduler, workers])

Return logs for the cluster, scheduler and workers

get_tags()

Generate tags to be applied to all resources.

new_worker_spec()

Return name and spec for the next worker

scale([n, memory, cores])

Scale cluster to n workers

scale_up([n, memory, cores])

Scale cluster to n workers

sync(func, *args[, asynchronous, ...])

Call func with args synchronously or asynchronously depending on the calling context

wait_for_workers([n_workers, timeout])

Blocking call to wait for n workers before continuing

close

get_cloud_init

logs

render_cloud_init

render_process_cloud_init

scale_down