Microsoft Azure Machine Learning


The Azure ML integration has been deprecated and will be removed in a future release. Please use the cluster manager instead.

AzureMLCluster(workspace[, compute_target, …]) Deploy a Dask cluster using Azure ML


To start using dask_cloudprovider.AzureMLCluster you need, at a minimum, an Azure subscription and an AzureML Workspace.


class dask_cloudprovider.azureml.AzureMLCluster(workspace, compute_target=None, environment_definition=None, experiment_name=None, initial_node_count=None, jupyter=None, jupyter_port=None, dashboard_port=None, scheduler_port=None, scheduler_idle_timeout=None, worker_death_timeout=None, additional_ports=None, admin_username=None, admin_ssh_key=None, datastores=None, code_store=None, vnet_resource_group=None, vnet=None, subnet=None, show_output=False, telemetry_opt_out=None, asynchronous=False, **kwargs)[source]

Deploy a Dask cluster using Azure ML

This creates a dask scheduler and workers on an Azure ML Compute Target.

workspace: azureml.core.Workspace (required)

The Azure ML Workspace to deploy the cluster into.

vm_size: str (optional)

Azure VM size to be used in the Compute Target.

datastores: List[Datastore] (optional)

List of Azure ML Datastores to be mounted on the headnode.

Defaults to []. To mount all datastores in the workspace, set to ws.datastores.values().

environment_definition: azureml.core.Environment (optional)

Azure ML Environment to run on the cluster.

Defaults to the “AzureML-Dask-CPU” or “AzureML-Dask-GPU” curated environment.

scheduler_idle_timeout: int (optional)

Number of idle seconds after which the scheduler shuts down.

Defaults to 1200 (20 minutes).

experiment_name: str (optional)

The name of the Azure ML Experiment used to control the cluster.

Defaults to dask-cloudprovider.

initial_node_count: int (optional)

The initial number of nodes for the Dask Cluster.

Defaults to 1.

jupyter: bool (optional)

Flag to start JupyterLab session on the headnode of the cluster.

Defaults to False.

jupyter_port: int (optional)

Port on headnode to use for hosting JupyterLab session.

Defaults to 9000.

dashboard_port: int (optional)

Port on headnode to use for hosting Dask dashboard.

Defaults to 9001.

scheduler_port: int (optional)

Local port to map the scheduler port to via SSH tunnel when the machine is not on the same VNET.

Defaults to 9002.

worker_death_timeout: int (optional)

Number of seconds to wait for a worker to respond before removing it.

Defaults to 30.

additional_ports: list[tuple[int, int]] (optional)

Additional ports to forward. This requires a list of tuples where the first element is the port to open on the headnode and the second element is the local port to map to, or forward, via the SSH tunnel.

Defaults to [].
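For example, to forward two services running on the headnode to local ports, a value could be built like this (the port numbers below are illustrative, not defaults):

```python
# Each tuple is (port_on_headnode, local_port_to_forward_to).
# The specific port numbers here are illustrative.
additional_ports = [
    (8787, 8000),  # forward headnode port 8787 to local port 8000
    (8888, 8001),  # forward headnode port 8888 to local port 8001
]

# Sanity-check the shape AzureMLCluster expects: a list of 2-tuples of valid ports.
for headnode_port, local_port in additional_ports:
    assert 0 < headnode_port < 65536 and 0 < local_port < 65536
```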

compute_target: azureml.core.ComputeTarget (optional)

Azure ML Compute Target.

admin_username: str (optional)

Username of the admin account for the AzureML Compute. Required for runs that are not on the same VNET; an Exception is thrown if the machine is not on the same VNET and no username is provided.

Defaults to "".

admin_ssh_key: str (optional)

Location of the SSH secret key used when creating the AzureML Compute. The key should be passwordless if run from a Jupyter notebook, and the id_rsa file needs to have 0700 permissions set. Required for runs that are not on the same VNET; an Exception is thrown if the machine is not on the same VNET and no key is provided.

Defaults to "".
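The 0700 permission requirement above can be enforced with the standard library before creating the cluster. A minimal sketch (the helper name and key path are illustrative, not part of the library's API):

```python
import os
import stat

def ensure_key_permissions(key_path: str) -> None:
    """Set the 0700 permissions that the id_rsa file needs to have.

    Illustrative helper, not part of dask_cloudprovider.
    """
    os.chmod(key_path, 0o700)
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    assert mode == 0o700, f"unexpected permissions: {oct(mode)}"

# Example usage with a hypothetical key location:
# ensure_key_permissions(os.path.expanduser("~/.ssh/id_rsa"))
```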

vnet: str (optional)

Name of the virtual network.

subnet: str (optional)

Name of the subnet inside the virtual network vnet.

vnet_resource_group: str (optional)

Name of the resource group where the virtual network vnet is located. If not passed, but names for vnet and subnet are passed, vnet_resource_group defaults to the name of the resource group associated with the workspace.
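The fallback described above can be sketched as follows (a simplified illustration of the documented default, not the library's actual code):

```python
def resolve_vnet_resource_group(vnet, subnet, vnet_resource_group, workspace_resource_group):
    """Return the resource group to use for the virtual network.

    If vnet and subnet are given but vnet_resource_group is not,
    fall back to the workspace's resource group.
    Simplified illustration, not dask_cloudprovider's implementation.
    """
    if vnet and subnet and vnet_resource_group is None:
        return workspace_resource_group
    return vnet_resource_group
```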

telemetry_opt_out: bool (optional)

A boolean parameter. By default, the version of AzureMLCluster in use is logged with Microsoft. Set this flag to False if you do not want to share this information with Microsoft. Microsoft does not track anything else you do in your Dask cluster, nor any other information related to your workload.

asynchronous: bool (optional)

Flag to run jobs asynchronously.

**kwargs: dict

Additional keyword arguments.


First, import all necessary modules.

>>> from azureml.core import Workspace
>>> from dask_cloudprovider import AzureMLCluster

Next, create the Workspace object given your AzureML Workspace parameters. See the AzureML documentation for Workspace for more details.

You can use ws = Workspace.from_config() after downloading the config file from the Azure Portal or ML Studio.

>>> subscription_id = "<your-subscription-id-here>"
>>> resource_group = "<your-resource-group>"
>>> workspace_name = "<your-workspace-name>"
>>> ws = Workspace(
...     workspace_name=workspace_name,
...     subscription_id=subscription_id,
...     resource_group=resource_group
... )

Then create the cluster.

>>> amlcluster = AzureMLCluster(
...     # required
...     ws,
...     # optional
...     vm_size="STANDARD_DS13_V2",                                 # Azure VM size for the Compute Target
...     datastores=ws.datastores.values(),                          # Azure ML Datastores to mount on the headnode
...     environment_definition=ws.environments['AzureML-Dask-CPU'], # Azure ML Environment to run on the cluster
...     jupyter=True,                                               # Start JupyterLab session on the headnode
...     initial_node_count=2,                                       # number of nodes to start
...     scheduler_idle_timeout=7200                                 # scheduler idle timeout in seconds
... )

Once the cluster has started, the Dask Cluster widget will print out two links:

  1. Jupyter link to a Jupyter Lab instance running on the headnode.
  2. Dask Dashboard link.

Note that AzureMLCluster uses IPython Widgets to present this information, so if you are working in Jupyter Lab and see text that starts with VBox(children=…, make sure you have enabled the IPython Widget extension.

To connect to the Jupyter Lab session running on the cluster from your own computer, click the link provided in the widget printed above, or if you need the link directly it is stored in amlcluster.jupyter_link.

Once connected, you'll be in an AzureML Run session. To connect Dask from within the session, run the following code:

from azureml.core import Run
from dask.distributed import Client

run = Run.get_context()
# The scheduler address is published in the run's metrics
c = Client(run.get_metrics()["scheduler"])

You can stop the cluster with amlcluster.close(). The cluster will automatically spin down if unused for 20 minutes by default. Alternatively, you can delete the Azure ML Compute Target or cancel the Run from the Python SDK or UI to stop the cluster.


dashboard_link Link to Dask dashboard.


jupyter_link Link to JupyterLab running on the headnode of the cluster.



adapt([Adaptive]) Turn on adaptivity
close() Close the cluster.
get_logs([cluster, scheduler, workers]) Return logs for the cluster, scheduler and workers
scale([workers]) Scale the cluster.
scale_down([workers]) Scale down the number of workers.
scale_up([workers]) Scale up the number of workers.

Close the cluster. All Azure ML Runs corresponding to the scheduler and worker processes will be completed. The Azure ML Compute Target will return to its minimum number of nodes after its configured idle time before scale down.

Link to Dask dashboard.

Link to JupyterLab running on the headnode of the cluster. Set jupyter=True when creating the AzureMLCluster.


Scale the cluster. Scales up to at most the number of workers available in the cluster.


Scale down the number of workers. Scales down to a minimum of 1 worker.


Scale up the number of workers.