Microsoft Azure Machine Learning


The Azure ML integration has been deprecated and will be removed in a future release. Please use the cluster manager instead.

AzureMLCluster(workspace[, compute_target, …]) Deploy a Dask cluster using Azure ML


To start using dask_cloudprovider.AzureMLCluster you need, at a minimum, an Azure subscription and an AzureML Workspace.


class dask_cloudprovider.azureml.AzureMLCluster(workspace, compute_target=None, environment_definition=None, experiment_name=None, initial_node_count=None, jupyter=None, jupyter_port=None, dashboard_port=None, scheduler_port=None, scheduler_idle_timeout=None, worker_death_timeout=None, additional_ports=None, admin_username=None, admin_ssh_key=None, datastores=None, code_store=None, vnet_resource_group=None, vnet=None, subnet=None, show_output=False, telemetry_opt_out=None, asynchronous=False, **kwargs)[source]

Deploy a Dask cluster using Azure ML

This creates a dask scheduler and workers on an Azure ML Compute Target.

workspace: azureml.core.Workspace (required)

The Azure ML Workspace to deploy the cluster into.

vm_size: str (optional)

Azure VM size to be used in the Compute Target.

datastores: List[Datastore] (optional)

List of Azure ML Datastores to be mounted on the headnode.

Defaults to []. To mount all datastores in the workspace, set to ws.datastores.values().

environment_definition: azureml.core.Environment (optional)

Azure ML Environment to run on the cluster.

Defaults to the “AzureML-Dask-CPU” or “AzureML-Dask-GPU” curated environment.

scheduler_idle_timeout: int (optional)

Number of idle seconds after which the scheduler shuts down.

Defaults to 1200 (20 minutes).

experiment_name: str (optional)

The name of the Azure ML Experiment used to control the cluster.

Defaults to dask-cloudprovider.

initial_node_count: int (optional)

The initial number of nodes for the Dask Cluster.

Defaults to 1.

jupyter: bool (optional)

Flag to start JupyterLab session on the headnode of the cluster.

Defaults to False.

jupyter_port: int (optional)

Port on headnode to use for hosting JupyterLab session.

Defaults to 9000.

dashboard_port: int (optional)

Port on headnode to use for hosting Dask dashboard.

Defaults to 9001.

scheduler_port: int (optional)

Local port to map the scheduler port to via SSH tunnel when the machine is not on the same VNET.

Defaults to 9002.

worker_death_timeout: int (optional)

Number of seconds to wait for a worker to respond before removing it.

Defaults to 30.

additional_ports: list[tuple[int, int]] (optional)

Additional ports to forward. This requires a list of tuples where the first element is the port to open on the headnode and the second element is the local port to map to, or forward, via the SSH tunnel.

Defaults to [].
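For example, to forward two services running on the headnode to local ports, a value could be built like this (the port numbers below are illustrative, not defaults):

```python
# Each tuple is (port_on_headnode, local_port_to_forward_to).
# The specific port numbers here are illustrative.
additional_ports = [
    (8787, 8000),  # forward headnode port 8787 to local port 8000
    (8888, 8001),  # forward headnode port 8888 to local port 8001
]

# Sanity-check the shape AzureMLCluster expects: a list of 2-tuples of valid ports.
for headnode_port, local_port in additional_ports:
    assert 0 < headnode_port < 65536 and 0 < local_port < 65536
```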

compute_target: azureml.core.ComputeTarget (optional)

Azure ML Compute Target.

admin_username: str (optional)

Username of the admin account for the AzureML Compute. Required for runs that are not on the same VNET; an Exception is thrown if the machine is not on the same VNET and no username is provided.

Defaults to "".

admin_ssh_key: str (optional)

Location of the SSH secret key used when creating the AzureML Compute. The key should be passwordless if run from a Jupyter notebook, and the id_rsa file needs to have 0700 permissions set. Required for runs that are not on the same VNET; an Exception is thrown if the machine is not on the same VNET and no key is provided.

Defaults to "".
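The 0700 permission requirement above can be enforced with the standard library before creating the cluster. A minimal sketch (the helper name and key path are illustrative, not part of the library's API):

```python
import os
import stat

def ensure_key_permissions(key_path: str) -> None:
    """Set the 0700 permissions that the id_rsa file needs to have.

    Illustrative helper, not part of dask_cloudprovider.
    """
    os.chmod(key_path, 0o700)
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    assert mode == 0o700, f"unexpected permissions: {oct(mode)}"

# Example usage with a hypothetical key location:
# ensure_key_permissions(os.path.expanduser("~/.ssh/id_rsa"))
```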

vnet: str (optional)

Name of the virtual network.

subnet: str (optional)

Name of the subnet inside the virtual network vnet.

vnet_resource_group: str (optional)

Name of the resource group where the virtual network vnet is located. If not passed, but names for vnet and subnet are passed, vnet_resource_group defaults to the name of the resource group associated with the workspace.
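The fallback described above can be sketched as follows (a simplified illustration of the documented default, not the library's actual code):

```python
def resolve_vnet_resource_group(vnet, subnet, vnet_resource_group, workspace_resource_group):
    """Return the resource group to use for the virtual network.

    If vnet and subnet are given but vnet_resource_group is not,
    fall back to the workspace's resource group.
    Simplified illustration, not dask_cloudprovider's implementation.
    """
    if vnet and subnet and vnet_resource_group is None:
        return workspace_resource_group
    return vnet_resource_group
```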

telemetry_opt_out: bool (optional)

A boolean parameter. By default, the version of AzureMLCluster in use is logged with Microsoft. Set this flag to False if you do not want to share this information with Microsoft. Microsoft does not track anything else you do in your Dask cluster, nor any other information related to your workload.

asynchronous: bool (optional)

Flag to run jobs asynchronously.

**kwargs: dict

Additional keyword arguments.


First, import all necessary modules.

>>> from azureml.core import Workspace
>>> from dask_cloudprovider import AzureMLCluster

Next, create the Workspace object given your AzureML Workspace parameters. See the AzureML documentation for Workspace for more details.

You can use ws = Workspace.from_config() after downloading the config file from the Azure Portal or ML Studio.

>>> subscription_id = "<your-subscription-id-here>"
>>> resource_group = "<your-resource-group>"
>>> workspace_name = "<your-workspace-name>"
>>> ws = Workspace(
...     workspace_name=workspace_name,
...     subscription_id=subscription_id,
...     resource_group=resource_group
... )

Then create the cluster.

>>> amlcluster = AzureMLCluster(
...     # required
...     ws,
...     # optional
...     vm_size="STANDARD_DS13_V2",                                 # Azure VM size for the Compute Target
...     datastores=ws.datastores.values(),                          # Azure ML Datastores to mount on the headnode
...     environment_definition=ws.environments['AzureML-Dask-CPU'], # Azure ML Environment to run on the cluster
...     jupyter=True,                                               # Start JupyterLab session on the headnode
...     initial_node_count=2,                                       # number of nodes to start
...     scheduler_idle_timeout=7200                                 # scheduler idle timeout in seconds
... )

Once the cluster has started, the Dask Cluster widget will print out two links:

  1. Jupyter link to a Jupyter Lab instance running on the headnode.
  2. Dask Dashboard link.

Note that AzureMLCluster uses IPython Widgets to present this information, so if you are working in Jupyter Lab and see text that starts with VBox(children=…, make sure you have enabled the IPython Widget extension.

To connect to the Jupyter Lab session running on the cluster from your own computer, click the link provided in the widget printed above, or if you need the link directly it is stored in amlcluster.jupyter_link.

Once connected, you'll be in an AzureML Run session. To connect Dask from within the session, run the following code:

from azureml.core import Run
from dask.distributed import Client

run = Run.get_context()
# The scheduler address is published in the run's metrics
c = Client(run.get_metrics()["scheduler"])

You can stop the cluster with amlcluster.close(). The cluster will automatically spin down if unused for 20 minutes by default. Alternatively, you can delete the Azure ML Compute Target or cancel the Run from the Python SDK or UI to stop the cluster.


dashboard_link Link to Dask dashboard.


jupyter_link Link to JupyterLab running on the headnode of the cluster.



adapt([Adaptive]) Turn on adaptivity
close() Close the cluster.
get_logs([cluster, scheduler, workers]) Return logs for the cluster, scheduler and workers
scale([workers]) Scale the cluster.
scale_down([workers]) Scale down the number of workers.
scale_up([workers]) Scale up the number of workers.

Close the cluster. All Azure ML Runs corresponding to the scheduler and worker processes will be completed. The Azure ML Compute Target will return to its minimum number of nodes after its configured idle time before scale down.

Link to Dask dashboard.

Link to JupyterLab running on the headnode of the cluster. Set jupyter=True when creating the AzureMLCluster.


Scale the cluster. Scales up to at most the number of workers available in the cluster.


Scale down the number of workers. Scales down to a minimum of 1 worker.


Scale up the number of workers.