Microsoft Azure

AzureVMCluster(location, resource_group, …) Cluster running on Azure Virtual Machines.

Overview

Authentication

To create clusters on Azure you first need to set up your authentication credentials. You can do this via the az command-line tool.

$ az login

Note

Setting the default output to table with az configure will make the az tool much easier to use.
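
For example, recent versions of the Azure CLI let you set this non-interactively (older versions use the interactive az configure prompt instead):

$ az config set core.output=table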

Resource Groups

Resources on Azure must be placed in a resource group. Dask Cloudprovider needs a group in which to create the Dask components.

You can list existing groups via the CLI.

$ az group list

You can also create a new resource group if you do not have an existing one.

$ az group create --location <location> --name <resource group name> --subscription <subscription>

You can get a full list of locations with az account list-locations and subscriptions with az account list.
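
For example, a concrete invocation might look like this (the group name and location here are purely illustrative; --subscription can be omitted to use your default subscription):

$ az group create --location westus2 --name dask-cloudprovider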

Take note of your resource group name for later.

Virtual Networks

Compute resources on Azure must be placed in a virtual network (vnet). Dask Cloudprovider requires an existing vnet to connect compute resources to.

You can list existing vnets via the CLI.

$ az network vnet list

You can also create a new vnet via the CLI.

$ az network vnet create -g <resource group name> -n <vnet name> --address-prefix 10.0.0.0/16 \
      --subnet-name <subnet name> --subnet-prefix 10.0.0.0/24

This command creates a new vnet in your resource group containing one subnet with the 10.0.0.0/24 prefix. A /24 subnet provides at most 251 usable addresses (Azure reserves five addresses in each subnet), so for larger clusters you will need additional subnets.
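
If you later need another subnet, one can be added to the existing vnet; the subnet name and prefix below are illustrative:

$ az network vnet subnet create -g <resource group name> --vnet-name <vnet name> -n <second subnet name> \
      --address-prefixes 10.0.1.0/24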

Take note of your vnet name for later.

Security Groups

To allow network traffic to reach your Dask cluster you will need to create a security group which allows traffic on ports 8786-8787 from wherever you will be connecting.

You can list existing security groups via the CLI.

$ az network nsg list

Or you can create a new security group.

$ az network nsg create -g <resource group name> --name <security group name>
$ az network nsg rule create -g <resource group name> --nsg-name <security group name> -n MyNsgRuleWithAsg \
      --priority 500 --source-address-prefixes Internet --destination-port-ranges 8786 8787 \
      --destination-address-prefixes '*' --access Allow --protocol Tcp --description "Allow Internet to Dask on ports 8786,8787."

This example allows all traffic to ports 8786-8787 from the internet. It is recommended that you make your rules more restrictive than this by limiting the source to your corporate network or a specific IP, as shown below.
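
For example, here is the same rule with the source restricted to a single range (203.0.113.0/24 is a placeholder; substitute your own network or IP):

$ az network nsg rule create -g <resource group name> --nsg-name <security group name> -n MyNsgRuleWithAsg \
      --priority 500 --source-address-prefixes 203.0.113.0/24 --destination-port-ranges 8786 8787 \
      --destination-address-prefixes '*' --access Allow --protocol Tcp --description "Allow corporate network to Dask on ports 8786,8787."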

Again, take note of the security group name for later.

AzureVM

class dask_cloudprovider.azure.AzureVMCluster(location: str = None, resource_group: str = None, vnet: str = None, security_group: str = None, public_ingress: bool = None, vm_size: str = None, vm_image: dict = {}, bootstrap: bool = None, auto_shutdown: bool = None, docker_image=None, **kwargs)[source]

Cluster running on Azure Virtual Machines.

This cluster manager constructs a Dask cluster running on Azure Virtual Machines.

When configuring your cluster you may find it useful to install the az tool for querying the Azure API for available options.

https://docs.microsoft.com/en-us/cli/azure/install-azure-cli

Parameters:
location: str

The Azure location to launch your cluster in. List available locations with az account list-locations.

resource_group: str

The resource group to create components in. List your resource groups with az group list.

vnet: str

The vnet to attach VM network interfaces to. List your vnets with az network vnet list.

security_group: str

The security group to apply to your VMs. This must allow ports 8786-8787 from wherever you will be connecting. List your security groups with az network nsg list.

public_ingress: bool

Assign a public IP address to the scheduler. Default True.

vm_size: str

Azure VM size to use for scheduler and workers. Default Standard_DS1_v2. List available VM sizes with az vm list-sizes --location <location>.

vm_image: dict

By default all VMs will use the latest Ubuntu LTS release with the following configuration

{"publisher": "Canonical", "offer": "UbuntuServer","sku": "18.04-LTS", "version": "latest"}

You can override any of these options by passing a dict with matching keys here. For example, if you wish to try Ubuntu 19.04 you can pass {"sku": "19.04"} and the publisher, offer and version will be taken from the default. See the customisation example below.

bootstrap: bool (optional)

It is assumed that the VHD will not have Docker installed (or the NVIDIA drivers for GPU instances). If bootstrap is True these dependencies will be installed on instance start. If you are using a custom VHD which already has these dependencies, set this to False.

auto_shutdown: bool (optional)

Shut down the VM if the Dask process exits. Default True.

n_workers: int

Number of workers to initialise the cluster with. Defaults to 0.

worker_module: str

The Dask worker module to start on worker VMs, i.e. the Python module to run for the worker. Defaults to distributed.cli.dask_worker.

worker_options: dict

Params to be passed to the worker class. See distributed.worker.Worker for the default worker class. If you set worker_module then refer to the docstring of the custom worker class. See the customisation example below.

scheduler_options: dict

Params to be passed to the scheduler class. See distributed.scheduler.Scheduler.

docker_image: string (optional)

The Docker image to run on all instances.

This image must have a valid Python environment and have dask installed in order for the dask-scheduler and dask-worker commands to be available. It is recommended that the Python environment matches the local environment where AzureVMCluster is being created.

For GPU instance types the Docker image must have NVIDIA drivers and dask-cuda installed.

By default the daskdev/dask:latest image will be used.

silence_logs: bool

Whether or not we should silence logging when setting up the cluster.

asynchronous: bool

Set to True if this cluster is intended to be used directly within an event loop with async/await.

security : Security or bool, optional

Configures communication security in this cluster. Can be a security object, or True. If True, temporary self-signed credentials will be created automatically.

Examples

Minimal example

Create the cluster

>>> from dask_cloudprovider.azure import AzureVMCluster
>>> cluster = AzureVMCluster(resource_group="<resource group>",
...                          vnet="<vnet>",
...                          security_group="<security group>",
...                          n_workers=1)
Creating scheduler instance
Assigned public IP
Network interface ready
Creating VM
Created VM dask-5648cc8b-scheduler
Waiting for scheduler to run
Scheduler is running
Creating worker instance
Network interface ready
Creating VM
Created VM dask-5648cc8b-worker-e1ebfc0e

Connect a client.

>>> from dask.distributed import Client
>>> client = Client(cluster)

Do some work.

>>> import dask.array as da
>>> arr = da.random.random((1000, 1000), chunks=(100, 100))
>>> arr.mean().compute()
0.5004117488368686

Close the cluster.

>>> client.close()
>>> cluster.close()
Terminated VM dask-5648cc8b-worker-e1ebfc0e
Removed disks for VM dask-5648cc8b-worker-e1ebfc0e
Deleted network interface
Terminated VM dask-5648cc8b-scheduler
Removed disks for VM dask-5648cc8b-scheduler
Deleted network interface
Unassigned public IP

You can also do this all in one go with context managers to ensure the cluster is created and cleaned up.

>>> with AzureVMCluster(resource_group="<resource group>",
...                     vnet="<vnet>",
...                     security_group="<security group>",
...                     n_workers=1) as cluster:
...     with Client(cluster) as client:
...             print(da.random.random((1000, 1000), chunks=(100, 100)).mean().compute())
Creating scheduler instance
Assigned public IP
Network interface ready
Creating VM
Created VM dask-1e6dac4e-scheduler
Waiting for scheduler to run
Scheduler is running
Creating worker instance
Network interface ready
Creating VM
Created VM dask-1e6dac4e-worker-c7c4ca23
0.4996427609642539
Terminated VM dask-1e6dac4e-worker-c7c4ca23
Removed disks for VM dask-1e6dac4e-worker-c7c4ca23
Deleted network interface
Terminated VM dask-1e6dac4e-scheduler
Removed disks for VM dask-1e6dac4e-scheduler
Deleted network interface
Unassigned public IP
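
Customisation example

Constructor arguments can be combined to tailor the cluster to your environment. The sketch below is illustrative rather than a recommendation: it overrides the default image SKU via vm_image, passes options through to the worker and scheduler, and disables the public IP.

>>> cluster = AzureVMCluster(resource_group="<resource group>",
...                          vnet="<vnet>",
...                          security_group="<security group>",
...                          vm_size="Standard_DS2_v2",
...                          vm_image={"sku": "19.04"},
...                          worker_options={"nthreads": 2},
...                          scheduler_options={"idle_timeout": "1 hour"},
...                          public_ingress=False,
...                          n_workers=2)

With public_ingress=False the scheduler has no public IP, so the client must run from a machine inside the vnet (or one connected to it, for example via VPN or peering).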

RAPIDS example

You can also use AzureVMCluster to run a GPU-enabled cluster and leverage the RAPIDS accelerated libraries.

>>> cluster = AzureVMCluster(resource_group="<resource group>",
...                          vnet="<vnet>",
...                          security_group="<security group>",
...                          n_workers=1,
...                          vm_size="Standard_NC12s_v3",  # Or any NVIDIA GPU enabled size
...                          docker_image="rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04-py3.8",
...                          worker_class="dask_cuda.CUDAWorker")
>>> from dask.distributed import Client
>>> client = Client(cluster)

Run some GPU code.

>>> def get_gpu_model():
...     import pynvml
...     pynvml.nvmlInit()
...     return pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(0))
>>> client.submit(get_gpu_model).result()
b'Tesla V100-PCIE-16GB'

Close the cluster.

>>> client.close()
>>> cluster.close()

Attributes:
asynchronous
auto_shutdown
bootstrap
command
dashboard_link
docker_image
gpu_instance
observed
plan
requested
scheduler_address
scheduler_class
worker_class

Methods

adapt(*args[, minimum, maximum]) Turn on adaptivity
get_logs([cluster, scheduler, workers]) Return logs for the cluster, scheduler and workers
get_tags() Generate tags to be applied to all resources.
new_worker_spec() Return name and spec for the next worker
scale([n, memory, cores]) Scale cluster to n workers
scale_up([n, memory, cores]) Scale cluster to n workers
close  
get_cloud_init  
logs  
render_cloud_init  
scale_down  
sync
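
For example, instead of calling scale manually you can turn on adaptive scaling; the bounds below are illustrative:

>>> cluster.adapt(minimum=1, maximum=10)

Dask will then add workers when there is pending work and retire them when they are idle, staying within these bounds.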