API

ECSCluster([fargate_scheduler, …]) Deploy a Dask cluster using ECS
FargateCluster(**kwargs) Deploy a Dask cluster using Fargate on ECS
class dask_cloudprovider.ECSCluster(fargate_scheduler=False, fargate_workers=False, image=None, scheduler_cpu=None, scheduler_mem=None, scheduler_timeout=None, worker_cpu=None, worker_mem=None, worker_gpu=None, n_workers=None, cluster_arn=None, cluster_name_template=None, execution_role_arn=None, task_role_arn=None, task_role_policies=None, cloudwatch_logs_group=None, cloudwatch_logs_stream_prefix=None, cloudwatch_logs_default_retention=None, vpc=None, subnets=None, security_groups=None, environment=None, tags=None, skip_cleanup=None, **kwargs)

Deploy a Dask cluster using ECS

This creates a Dask scheduler and workers on an ECS cluster. If you do not configure a cluster, one will be created for you with sensible defaults.
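
For example, attaching to an existing ECS cluster (the ARN below is a placeholder; AWS credentials must already be configured in your environment):

>>> from dask.distributed import Client
>>> from dask_cloudprovider import ECSCluster
>>> cluster = ECSCluster(
...     cluster_arn="arn:aws:ecs:<region>:<account>:cluster/<name>",
...     n_workers=2,
... )
>>> client = Client(cluster)  # connect a Dask client to the new scheduler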

Parameters:
fargate_scheduler: bool (optional)

Select whether or not to use Fargate for the scheduler.

Defaults to False. If False, you must provide an existing cluster with EC2 container instances for the scheduler to run on.

fargate_workers: bool (optional)

Select whether or not to use Fargate for the workers.

Defaults to False. If False, you must provide an existing cluster with EC2 container instances for the workers to run on.

image: str (optional)

The docker image to use for the scheduler and worker tasks.

Defaults to daskdev/dask:latest, or rapidsai/rapidsai:latest if worker_gpu is set.

scheduler_cpu: int (optional)

The amount of CPU to request for the scheduler in milli-cpu (1/1024).

Defaults to 1024 (one vCPU).

scheduler_mem: int (optional)

The amount of memory to request for the scheduler in MB.

Defaults to 4096 (4GB).

scheduler_timeout: str (optional)

The scheduler task will exit after this amount of time if there are no clients connected.

Defaults to 5 minutes.

worker_cpu: int (optional)

The amount of CPU to request for worker tasks in milli-cpu (1/1024).

Defaults to 4096 (four vCPUs).

worker_mem: int (optional)

The amount of memory to request for worker tasks in MB.

Defaults to 16384 (16GB).

worker_gpu: int (optional)

The number of GPUs to expose to the worker.

To provide GPUs to workers you need a GPU-ready docker image that has dask-cuda installed and GPU nodes available in your ECS cluster; Fargate is not supported at this time. See the GPU example after this parameter list.

Defaults to None (no GPUs).

n_workers: int (optional)

Number of workers to start on cluster creation.

Defaults to None.

cluster_arn: str (optional if using Fargate)

The ARN of an existing ECS cluster to use for launching tasks.

Defaults to None which results in a new cluster being created for you.

cluster_name_template: str (optional)

A template to use for the cluster name if cluster_arn is set to None.

Defaults to 'dask-{uuid}'.

execution_role_arn: str (optional)

The ARN of an existing IAM role to use for ECS execution.

This role must allow sts:AssumeRole for ecs-tasks.amazonaws.com and grant the following permissions:

  • ecr:GetAuthorizationToken
  • ecr:BatchCheckLayerAvailability
  • ecr:GetDownloadUrlForLayer
  • ecr:GetRepositoryPolicy
  • ecr:DescribeRepositories
  • ecr:ListImages
  • ecr:DescribeImages
  • ecr:BatchGetImage
  • logs:*
  • ec2:AuthorizeSecurityGroupIngress
  • ec2:Describe*
  • elasticloadbalancing:DeregisterInstancesFromLoadBalancer
  • elasticloadbalancing:DeregisterTargets
  • elasticloadbalancing:Describe*
  • elasticloadbalancing:RegisterInstancesWithLoadBalancer
  • elasticloadbalancing:RegisterTargets

Defaults to None (one will be created for you).

task_role_arn: str (optional)

The ARN for an existing IAM role for tasks to assume. This defines which AWS resources the dask workers can access directly. Useful if you need to read from S3 or a database without passing credentials around.

Defaults to None (one will be created with S3 read permission only).

task_role_policies: List[str] (optional)

If you do not specify a task_role_arn you may want to list some IAM policy ARNs to be attached to the role that will be created for you.

E.g. if you need your workers to read from S3 you could add arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess, as shown in the example after this parameter list.

Defaults to None (no policies will be attached to the role).

cloudwatch_logs_group: str (optional)

The name of an existing CloudWatch log group to place logs into.

Defaults to None (one will be created, called dask-ecs).

cloudwatch_logs_stream_prefix: str (optional)

Prefix for log streams.

Defaults to the cluster name.

cloudwatch_logs_default_retention: int (optional)

Log retention in days, used when the log group is created automatically.

Defaults to 30.

vpc: str (optional)

The ID of the VPC you wish to launch your cluster in.

Defaults to None (your default VPC will be used).

subnets: List[str] (optional)

A list of subnets to use when running your task.

Defaults to None (all subnets available in your VPC will be used).

security_groups: List[str] (optional)

A list of security group IDs to use when launching tasks.

Defaults to None (one will be created which allows all traffic between tasks and access to ports 8786 and 8787 from anywhere).

environment: dict (optional)

Extra environment variables to pass to the scheduler and worker tasks.

Useful for setting EXTRA_APT_PACKAGES, EXTRA_CONDA_PACKAGES and EXTRA_PIP_PACKAGES if you're using the default image; see the example after this parameter list.

Defaults to None.

tags: dict (optional)

Tags to apply to all resources created automatically.

Defaults to None. Tags will always include {"createdBy": "dask-cloudprovider"}.

skip_cleanup: bool (optional)

Skip cleaning up of stale resources. Useful if you have lots of resources and this operation takes a while.

Defaults to False.

**kwargs: dict

Additional keyword arguments to pass to SpecCluster.
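
As referenced under worker_gpu, a sketch of launching GPU workers, assuming an existing ECS cluster with GPU container instances registered (the ARN is a placeholder):

>>> from dask_cloudprovider import ECSCluster
>>> cluster = ECSCluster(
...     cluster_arn="arn:aws:ecs:<region>:<account>:cluster/<gpu-cluster>",
...     worker_gpu=1,  # one GPU per worker; the image defaults to rapidsai/rapidsai:latest
...     n_workers=4,
... )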
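
And a sketch combining environment and task_role_policies on Fargate tasks, as referenced in those parameter descriptions (the extra package and policy ARN are illustrative choices):

>>> from dask_cloudprovider import ECSCluster
>>> cluster = ECSCluster(
...     fargate_scheduler=True,
...     fargate_workers=True,
...     environment={"EXTRA_PIP_PACKAGES": "s3fs"},  # installed at container startup (default image only)
...     task_role_policies=["arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"],  # S3 read access for workers
... )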

Attributes:
asynchronous
dashboard_link
observed
plan
requested
scheduler_address
tags

Methods

adapt(self, *args[, minimum, maximum]) Turn on adaptivity
logs(self) Return logs for the scheduler and workers
new_worker_spec(self) Return name and spec for the next worker
scale(self[, n, memory, cores]) Scale cluster to n workers
scale_up(self[, n, memory, cores]) Scale cluster to n workers
close  
scale_down  
sync  
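
For example, given a cluster object from either class, manual and adaptive scaling look like this (the bounds are arbitrary):

>>> cluster.scale(10)                     # request ten workers
>>> cluster.adapt(minimum=1, maximum=20)  # or autoscale between bounds based on load
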
logs(self, scheduler=True, workers=True)

Return logs for the scheduler and workers

Parameters:
scheduler: bool (optional)

Whether or not to collect logs for the scheduler.

workers: bool or Iterable[str] (optional)

A list of worker addresses to select. Defaults to all workers if True or no workers if False.

Returns:
logs: Dict[str, str]

A dictionary of logs, with one item for the scheduler and one for each worker
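
A sketch of inspecting the returned logs (the exact keys depend on the scheduler and worker names):

>>> logs = cluster.logs()  # mapping of component name -> log text
>>> for name, log in logs.items():
...     print(name)
...     print(log)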

class dask_cloudprovider.FargateCluster(**kwargs)

Deploy a Dask cluster using Fargate on ECS

This creates a Dask scheduler and workers on a Fargate-powered ECS cluster. If you do not configure a cluster, one will be created for you with sensible defaults.

Parameters:
kwargs: dict

Keyword arguments to be passed to ECSCluster.
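
A minimal sketch, assuming AWS credentials are already configured (any ECSCluster keyword argument may be passed through):

>>> from dask.distributed import Client
>>> from dask_cloudprovider import FargateCluster
>>> cluster = FargateCluster(n_workers=4)
>>> client = Client(cluster)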

Attributes:
asynchronous
dashboard_link
observed
plan
requested
scheduler_address
tags

Methods

adapt(self, *args[, minimum, maximum]) Turn on adaptivity
logs(self) Return logs for the scheduler and workers
new_worker_spec(self) Return name and spec for the next worker
scale(self[, n, memory, cores]) Scale cluster to n workers
scale_up(self[, n, memory, cores]) Scale cluster to n workers
close  
scale_down  
sync