Dask Cloud Provider
Native Cloud integration for Dask.
This library creates Dask clusters on a given cloud provider with no setup beyond having credentials. Currently, it only supports AWS.
$ pip install dask-cloudprovider
$ conda install -c conda-forge dask-cloudprovider
Below are the different modules for creating clusters on various cloud providers.
In order to create clusters on AWS you need to set your access key, secret key and region. The simplest way is to use the aws command line tool.
$ pip install awscli
$ aws configure
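The same three settings can also be supplied through the environment variables that the AWS CLI and SDKs read. The following is a minimal sketch of a pre-flight check, not part of this library: the helper name `missing_aws_settings` is hypothetical, and the check only looks at environment variables, so values stored by `aws configure` in its configuration files will not be detected.

```python
import os

# Environment variable names read by the AWS CLI and SDKs.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_settings(env=None):
    """Return the names of required AWS settings absent from `env`.

    `env` defaults to the current process environment. This helper is
    hypothetical and only inspects environment variables.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_settings()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
    else:
        print("Access key, secret key and region are set in the environment.")
```

An empty result means all three values are present in the environment. A non-empty result is not necessarily an error if you configured credentials with the command line tool instead.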
FargateCluster will create a new Fargate ECS cluster by default along
with all the IAM roles, security groups, and so on that it needs to function.
from dask_cloudprovider import FargateCluster
cluster = FargateCluster()
⚠ All AWS resources created by FargateCluster should be removed on garbage collection. If the process is killed harshly this will not happen.
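Rather than relying on garbage collection alone, you may prefer to close the cluster explicitly once your work is done. The sketch below assumes only that the cluster object has a `close()` method, as Dask cluster objects do; the helper name `run_with_cluster` is hypothetical. It cannot help when the process is killed outright, because no Python cleanup code runs in that case.

```python
from contextlib import closing


def run_with_cluster(make_cluster, work):
    """Create a cluster, run `work(cluster)`, and always close the cluster.

    `make_cluster` is any zero-argument callable returning an object with
    a `close()` method. This helper is a hypothetical illustration.
    """
    # closing() calls cluster.close() on exit, even if `work` raises.
    with closing(make_cluster()) as cluster:
        return work(cluster)


# Hypothetical usage (requires AWS credentials and dask-cloudprovider):
#
#     from dask_cloudprovider import FargateCluster
#     run_with_cluster(FargateCluster, lambda cluster: print(cluster))
```

Passing a factory rather than a ready-made cluster keeps creation and teardown in one place, so an exception raised by your workload still triggers `close()`.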
Note that in many cases you will want to specify a custom Docker image to
FargateCluster so that Dask has the packages it needs to execute your workflow.
from dask_cloudprovider import FargateCluster
cluster = FargateCluster(image="<hub-user>/<repo-name>[:<tag>]")
One strategy to ensure that package versions match between your custom environment and the Docker container is to create your environment from an
environment.yml file, export the exact package list for that environment using
conda list --export > package-list.txt, and then use the pinned package versions contained in
package-list.txt in your Dockerfile. You could use the default Dask Dockerfile as a template and simply add your pinned additional packages.
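The pinning step above can be sketched in a few lines. This example assumes the usual `conda list --export` layout of one `name=version=build` entry per line with `#` comment lines at the top. The helper names `pinned_specs` and `conda_install_line` are hypothetical, and the generated `RUN` instruction is only one possible way to install the pins in a Dockerfile.

```python
def pinned_specs(export_text):
    """Turn `conda list --export` output into `name=version` pins."""
    specs = []
    for line in export_text.splitlines():
        line = line.strip()
        # Skip blank lines and the comment header conda writes at the top.
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        # Ignore entries that carry no version to pin.
        if len(parts) < 2:
            continue
        name, version = parts[0], parts[1]
        specs.append(f"{name}={version}")
    return specs


def conda_install_line(specs):
    """Render a Dockerfile RUN instruction installing the pinned packages."""
    return "RUN conda install --yes " + " ".join(specs)


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = "# platform: linux-64\nnumpy=1.18.1=py37_0\npandas=1.0.1=py37_0\n"
    print(conda_install_line(pinned_specs(sample)))
```

The build string is dropped so that the pins stay portable across platforms. Keep it if you need byte-identical packages and your image uses the same platform as your local environment.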
You can also create Dask clusters on EC2-based ECS clusters using ECSCluster.
Creating the ECS cluster is out of scope for this library, but you can pass in the ARN of an existing one like this:
from dask_cloudprovider import ECSCluster
cluster = ECSCluster(cluster_arn="arn:aws:ecs:<region>:<acctid>:cluster/<clustername>")
All the other required resources such as roles, task definitions, tasks, etc.
will be created automatically, as in FargateCluster.
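If you assemble the cluster ARN from its parts, a small helper can catch malformed input before anything is sent to AWS. This is a minimal sketch that follows the `arn:aws:ecs:<region>:<acctid>:cluster/<clustername>` pattern shown above; the function name `ecs_cluster_arn` is hypothetical, and all three arguments are assumed to be strings.

```python
def ecs_cluster_arn(region, account_id, cluster_name):
    """Build an ECS cluster ARN from its parts (all given as strings)."""
    parts = (
        ("region", region),
        ("account_id", account_id),
        ("cluster_name", cluster_name),
    )
    for label, value in parts:
        # Reject empty parts and the separators used by the ARN itself.
        if not value or ":" in value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"arn:aws:ecs:{region}:{account_id}:cluster/{cluster_name}"


# Hypothetical usage with placeholder values:
#
#     arn = ecs_cluster_arn("us-east-1", "123456789012", "my-cluster")
#     cluster = ECSCluster(cluster_arn=arn)
```

The check is deliberately loose. It does not verify that the region, account or cluster exists.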
There is also support in ECSCluster for GPU-aware Dask clusters. To do
this you need to create an ECS cluster with GPU-capable instances (for example,
from the p3dn family) and specify the number of GPUs each worker task should have.
from dask_cloudprovider import ECSCluster
cluster = ECSCluster(
    cluster_arn="arn:aws:ecs:<region>:<acctid>:cluster/<gpuclustername>",
    worker_gpu=1)
Setting the worker_gpu option to something other than None will cause the cluster
to run dask-cuda-worker as the worker startup command. Setting this option will also change
the default Docker image to rapidsai/rapidsai:latest. If you are using a custom image,
you must ensure the NVIDIA CUDA toolkit is installed with a version that matches the host machine.
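One way to keep these options together is to collect them in a dictionary before creating the cluster. The sketch below is an illustration under stated assumptions, not part of the library: the helper name `gpu_cluster_kwargs` is hypothetical, and it assumes ECSCluster accepts the same `image` keyword shown earlier for FargateCluster.

```python
def gpu_cluster_kwargs(cluster_arn, worker_gpu=1, image=None):
    """Collect keyword arguments for a GPU-aware ECSCluster."""
    if not isinstance(worker_gpu, int) or worker_gpu < 1:
        raise ValueError("worker_gpu must be a positive integer")
    kwargs = {"cluster_arn": cluster_arn, "worker_gpu": worker_gpu}
    # Only pass `image` when overriding the default. A custom image must
    # ship a CUDA toolkit version that matches the host machine.
    if image is not None:
        kwargs["image"] = image
    return kwargs


# Hypothetical usage (requires AWS credentials and dask-cloudprovider):
#
#     from dask_cloudprovider import ECSCluster
#     cluster = ECSCluster(**gpu_cluster_kwargs("<cluster-arn>", worker_gpu=1))
```

Leaving `image` out of the dictionary lets the library choose its GPU default image, as described above.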