To authenticate with DigitalOcean you must first generate a personal access token and place it in your Dask configuration at
cloudprovider.digitalocean.token. You can do this by
adding the token to your YAML configuration or by exporting an environment variable.
# ~/.config/dask/cloudprovider.yaml
cloudprovider:
  digitalocean:
    token: "yourtoken"
$ export DASK_CLOUDPROVIDER__DIGITALOCEAN__TOKEN="yourtoken"
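Alternatively, the token can be set for the current session from Python using dask.config (a minimal sketch; "yourtoken" is a placeholder for your real access token):
>>> import dask.config
>>> dask.config.set({"cloudprovider.digitalocean.token": "yourtoken"})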
DropletCluster(region: str = None, size: str = None, image: str = None, **kwargs)
Cluster running on DigitalOcean droplets.
VMs in DigitalOcean (DO) are referred to as droplets. This cluster manager constructs a Dask cluster running on VMs.
When configuring your cluster you may find it useful to install the
doctl tool for querying the DO API for available options.
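For example, once doctl is installed and authenticated, these commands list the values accepted by the parameters below (a sketch; assumes doctl is on your PATH):
$ doctl auth init                # authenticate doctl with your access token
$ doctl compute region list      # values for the region parameter
$ doctl compute size list        # values for the size parameter
$ doctl compute image list --public | grep "ubuntu.*x64"   # values for the image parameter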
- region: str
  The DO region to launch your cluster in. A full list can be obtained with doctl compute region list. (A combined example using several of these parameters follows this list.)
- size: str
  The VM size slug. You can get a full list with doctl compute size list. The default is s-1vcpu-1gb, which is 1GB RAM and 1 vCPU.
- image: str
  The image ID to use for the host OS. This should be an Ubuntu variant. You can list available images with doctl compute image list --public | grep "ubuntu.*x64".
- n_workers: int
  Number of workers to initialise the cluster with. Defaults to 0.
- worker_module: str
  The Dask worker module to start on worker VMs. Defaults to distributed.cli.dask_worker.
- worker_options: dict
  Params to be passed to the worker class. See distributed.worker.Worker for the default worker class. If you set worker_module then refer to the docstring for the custom worker class.
- scheduler_options: dict
  Params to be passed to the scheduler class. See distributed.scheduler.Scheduler.
- docker_image: string (optional)
  The Docker image to run on all instances. This image must have a valid Python environment and have dask installed in order for the dask-worker command to be available. It is recommended that the Python environment matches your local environment where DropletCluster is being created from. For GPU instance types the Docker image must have NVIDIA drivers and dask-cuda installed. By default the daskdev/dask:latest image will be used.
- env_vars: dict (optional)
  Environment variables to be passed to the worker.
- silence_logs: bool
  Whether or not we should silence logging when setting up the cluster.
- asynchronous: bool
  If this is intended to be used directly within an event loop with async/await.
- security: Security or bool, optional
  Configures communication security in this cluster. Can be a security object, or True. If True, temporary self-signed credentials will be created automatically.
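Putting several of these parameters together, a cluster could be configured like this (a sketch; the region, size, and environment-variable values are illustrative assumptions, not defaults):
>>> from dask_cloudprovider.digitalocean import DropletCluster
>>> cluster = DropletCluster(
...     region="nyc3",                      # assumed region; see doctl compute region list
...     size="s-2vcpu-4gb",                 # assumed size slug; see doctl compute size list
...     docker_image="daskdev/dask:latest",
...     n_workers=2,
...     env_vars={"EXTRA_PIP_PACKAGES": "s3fs"},  # assumed extra package, installed at startup by daskdev images
... )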
Create the cluster.
>>> from dask_cloudprovider.digitalocean import DropletCluster
>>> cluster = DropletCluster(n_workers=1)
Creating scheduler instance
Created droplet dask-38b817c1-scheduler
Waiting for scheduler to run
Scheduler is running
Creating worker instance
Created droplet dask-38b817c1-worker-dc95260d
Connect a client.
>>> from dask.distributed import Client
>>> client = Client(cluster)
Do some work.
>>> import dask.array as da
>>> arr = da.random.random((1000, 1000), chunks=(100, 100))
>>> arr.mean().compute()
0.5001550986751964
Close the cluster.
>>> client.close()
>>> cluster.close()
Terminated droplet dask-38b817c1-worker-dc95260d
Terminated droplet dask-38b817c1-scheduler
You can also do this all in one go with context managers to ensure the cluster is created and cleaned up.
>>> with DropletCluster(n_workers=1) as cluster:
...     with Client(cluster) as client:
...         print(da.random.random((1000, 1000), chunks=(100, 100)).mean().compute())
Creating scheduler instance
Created droplet dask-48efe585-scheduler
Waiting for scheduler to run
Scheduler is running
Creating worker instance
Created droplet dask-48efe585-worker-5181aaf1
0.5000558682356162
Terminated droplet dask-48efe585-worker-5181aaf1
Terminated droplet dask-48efe585-scheduler
adapt(*args[, minimum, maximum])
  Turn on adaptivity
get_logs([cluster, scheduler, workers])
  Return logs for the cluster, scheduler and workers
get_tags()
  Generate tags to be applied to all resources
new_worker_spec()
  Return name and spec for the next worker
scale([n, memory, cores])
  Scale cluster to n workers
scale_up([n, memory, cores])
  Scale cluster to n workers

Other methods: close, get_cloud_init, logs, render_cloud_init, scale_down, sync