IBM Cloud

IBMCodeEngineCluster([image, region, ...])

Cluster running on IBM Code Engine.

Overview

Authentication

To authenticate with IBM Cloud you must first generate an API key.

Then you must put this in your Dask configuration at cloudprovider.ibm.api_key. This can be done by adding the API key to your YAML configuration or exporting an environment variable.

# ~/.config/dask/cloudprovider.yaml
cloudprovider:
  ibm:
    api_key: "your_api_key"

$ export DASK_CLOUDPROVIDER__IBM__API_KEY="your_api_key"
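The environment variable name follows Dask's general naming convention for configuration: the DASK_ prefix is dropped, the remainder is lowercased, and each double underscore becomes a dot. The sketch below mimics that rule with the standard library only. The helper env_to_config_key is hypothetical and written for illustration; it is not part of Dask.

```python
def env_to_config_key(name: str) -> str:
    """Mimic how a DASK_* environment variable name maps to a config key.

    Hypothetical helper for illustration; Dask performs this mapping itself.
    """
    prefix = "DASK_"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} is not a Dask configuration variable")
    # Drop the prefix, lowercase, and treat "__" as a nesting separator.
    return name[len(prefix):].lower().replace("__", ".")


print(env_to_config_key("DASK_CLOUDPROVIDER__IBM__API_KEY"))
# prints: cloudprovider.ibm.api_key
```

The same rule explains the project ID variable used later on this page.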

Project ID

To use Dask Cloudprovider with IBM Cloud you must also configure your Project ID. This can be found at the top of the IBM Cloud dashboard.

Your Project ID must be added to your Dask config file.

# ~/.config/dask/cloudprovider.yaml
cloudprovider:
  ibm:
    project_id: "your_project_id"

Or via an environment variable.

$ export DASK_CLOUDPROVIDER__IBM__PROJECT_ID="your_project_id"
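If you rely on environment variables, a quick pre-flight check can catch a missing API key or project ID before a cluster launch is attempted. This is a minimal sketch: missing_ibm_settings is a hypothetical helper, and it inspects environment variables only, so values set in the YAML file are not seen by it.

```python
import os

# The two variables described in the Authentication and Project ID sections.
REQUIRED_VARS = (
    "DASK_CLOUDPROVIDER__IBM__API_KEY",
    "DASK_CLOUDPROVIDER__IBM__PROJECT_ID",
)


def missing_ibm_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_ibm_settings()
if missing:
    print("Not set in the environment:", ", ".join(missing))
```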

Code Engine

class dask_cloudprovider.ibm.IBMCodeEngineCluster(image: str = None, region: str = None, project_id: str = None, scheduler_cpu: str = None, scheduler_mem: str = None, scheduler_disk: str = None, scheduler_timeout: int = None, scheduler_command: str = None, worker_cpu: str = None, worker_mem: str = None, worker_disk: str = None, worker_threads: int = 1, worker_command: str = None, docker_server: str = None, docker_username: str = None, docker_password: str = None, debug: bool = False, **kwargs)

Cluster running on IBM Code Engine.

This cluster manager builds a Dask cluster running on IBM Code Engine.

When configuring your cluster, you may find it useful to refer to the IBM Cloud documentation for available options.

https://cloud.ibm.com/docs/codeengine

Parameters
image: str

The Docker image to run on all instances. This image must have a valid Python environment and have dask installed in order for the dask-scheduler and dask-worker commands to be available.

region: str

The IBM Cloud region to launch your cluster in.

See: https://cloud.ibm.com/docs/codeengine?topic=codeengine-regions

project_id: str

Your IBM Cloud project ID. This must be set either here or in your Dask config.

scheduler_cpu: str

The amount of CPU to allocate to the scheduler.

See: https://cloud.ibm.com/docs/codeengine?topic=codeengine-mem-cpu-combo

scheduler_mem: str

The amount of memory to allocate to the scheduler.

See: https://cloud.ibm.com/docs/codeengine?topic=codeengine-mem-cpu-combo

scheduler_disk: str

The amount of ephemeral storage to allocate to the scheduler. This value must be lower than scheduler_mem.

scheduler_timeout: int

The timeout for the scheduler in seconds.

scheduler_command: str

The command to run the scheduler. This should be a string that is passed to the dask-scheduler command. The default is dask-scheduler --protocol ws.

worker_cpu: str

The amount of CPU to allocate to each worker.

See: https://cloud.ibm.com/docs/codeengine?topic=codeengine-mem-cpu-combo

worker_mem: str

The amount of memory to allocate to each worker.

See: https://cloud.ibm.com/docs/codeengine?topic=codeengine-mem-cpu-combo

worker_disk: str

The amount of ephemeral storage to allocate to each worker. This value must be lower than worker_mem.

worker_threads: int

The number of threads to use on each worker.

worker_command: str

The command to run the worker. This should be a string that is passed to the dask-worker command. The default is python -m distributed.cli.dask_spec.

docker_server: str

The Docker registry server (e.g., “docker.io”, “gcr.io”). Required if using private Docker images.

docker_username: str

The username for authenticating with the Docker registry. Required if using private Docker images.

docker_password: str

The password or access token for authenticating with the Docker registry. Required if using private Docker images.

debug: bool, optional

More information will be printed when constructing clusters to enable debugging.
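The scheduler_disk and worker_disk parameters must each be lower than the corresponding memory setting. The sketch below checks that constraint locally before a cluster is requested. It assumes sizes are written as a number followed by a single decimal suffix, as in the example output on this page ("400M", "1G"); other notations accepted by Code Engine are not handled. parse_size and disk_fits are hypothetical helpers, not part of dask-cloudprovider.

```python
import re

# Assumption: decimal multipliers for single-letter suffixes such as "400M" or "1G".
_UNITS = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_size(text: str) -> int:
    """Convert a size string such as "400M" or "1G" to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def disk_fits(disk: str, mem: str) -> bool:
    """Return True when the ephemeral storage request is strictly below memory."""
    return parse_size(disk) < parse_size(mem)


print(disk_fits("400M", "1G"))  # True: 400M of disk against 1G of memory
print(disk_fits("4G", "4G"))    # False: equal sizes do not satisfy "lower than"
```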

Notes

Credentials

In order to use the IBM Cloud API, you will need to set up an API key. You can create an API key in the IBM Cloud console.

The recommended approach is to provide the API key, which the workers will also use, as an environment variable:

$ export DASK_CLOUDPROVIDER__IBM__API_KEY=xxxxx

Docker Registry Authentication

If you need to use private Docker images, you can configure Docker registry credentials using the docker_server, docker_username, and docker_password parameters. These credentials will be used to create a Kubernetes secret for image pulling in Code Engine.
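One way to keep registry credentials out of source code is to read them from the environment and pass them through as keyword arguments. The following is a sketch under stated assumptions: the REGISTRY_* variable names and both helper functions are hypothetical, while docker_server, docker_username, and docker_password are the constructor parameters documented above.

```python
import os


def registry_kwargs(env=None):
    """Collect private-registry credentials as IBMCodeEngineCluster keyword arguments.

    The REGISTRY_* variable names are hypothetical; any secret store would do.
    """
    env = os.environ if env is None else env
    values = {
        "docker_server": env.get("REGISTRY_SERVER"),
        "docker_username": env.get("REGISTRY_USERNAME"),
        "docker_password": env.get("REGISTRY_PASSWORD"),
    }
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise KeyError(f"missing registry settings: {', '.join(missing)}")
    return values


def launch_private_cluster(image, **kwargs):
    """Create a cluster from a private image using credentials from the environment."""
    # Imported lazily so registry_kwargs can be used without dask-cloudprovider installed.
    from dask_cloudprovider.ibm import IBMCodeEngineCluster

    return IBMCodeEngineCluster(image=image, **registry_kwargs(), **kwargs)
```

Failing early when a credential is missing avoids launching a cluster that cannot pull its image.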

Certificates

This backend needs a Let’s Encrypt certificate (ISRG Root X1) to connect the client to the scheduler over websockets. More information can be found here: https://letsencrypt.org/certificates/
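To check whether the ISRG Root X1 certificate is present in the local trust store, you can inspect the CA certificates that Python's ssl module has loaded. This is a rough sketch with hypothetical helper names. On platforms that load CA certificates lazily from a directory, the list can be empty even though the certificate is trusted, so a False result is not conclusive.

```python
import ssl


def has_isrg_root_x1(ca_certs) -> bool:
    """Return True if any certificate's subject commonName is "ISRG Root X1".

    ca_certs is a list of dicts in the format returned by
    ssl.SSLContext.get_ca_certs().
    """
    for cert in ca_certs:
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName" and value == "ISRG Root X1":
                    return True
    return False


def default_store_has_isrg_root_x1() -> bool:
    """Check the CA certificates loaded into Python's default SSL context."""
    context = ssl.create_default_context()
    return has_isrg_root_x1(context.get_ca_certs())
```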

Examples

Create the cluster.

>>> from dask_cloudprovider.ibm import IBMCodeEngineCluster
>>> cluster = IBMCodeEngineCluster(n_workers=1)
Launching cluster with the following configuration:
    Source Image: daskdev/dask:latest
    Region: eu-de
    Project id: f21626f6-54f7-4065-a038-75c8b9a0d2e0
    Scheduler CPU: 0.25
    Scheduler Memory: 1G
    Scheduler Disk: 400M
    Scheduler Timeout: 600
    Worker CPU: 2
    Worker Memory: 4G
    Worker Disk: 400M
Creating scheduler dask-xxxxxxxx-scheduler
Waiting for scheduler to run at dask-xxxxxxxx-scheduler.xxxxxxxxxxxx.xx-xx.codeengine.appdomain.cloud:443
Scheduler is running
Creating worker instance dask-xxxxxxxx-worker-xxxxxxxx
>>> from dask.distributed import Client
>>> client = Client(cluster)

Do some work.

>>> import dask.array as da
>>> arr = da.random.random((1000, 1000), chunks=(100, 100))
>>> arr.mean().compute()
0.5001550986751964

Close the cluster.

>>> cluster.close()
Deleting Instance: dask-xxxxxxxx-worker-xxxxxxxx
Deleting Instance: dask-xxxxxxxx-scheduler

You can also do this all in one go with context managers to ensure the cluster is created and cleaned up.

>>> with IBMCodeEngineCluster(n_workers=1) as cluster:
...     with Client(cluster) as client:
...         print(da.random.random((1000, 1000), chunks=(100, 100)).mean().compute())
Launching cluster with the following configuration:
    Source Image: daskdev/dask:latest
    Region: eu-de
    Project id: f21626f6-54f7-4065-a038-75c8b9a0d2e0
    Scheduler CPU: 0.25
    Scheduler Memory: 1G
    Scheduler Disk: 400M
    Scheduler Timeout: 600
    Worker CPU: 2
    Worker Memory: 4G
    Worker Disk: 400M
    Worker Threads: 1
Creating scheduler dask-xxxxxxxx-scheduler
Waiting for scheduler to run at dask-xxxxxxxx-scheduler.xxxxxxxxxxxx.xx-xx.codeengine.appdomain.cloud:443
Scheduler is running
Creating worker instance dask-xxxxxxxx-worker-xxxxxxxx
0.5000812282861661
Deleting Instance: dask-xxxxxxxx-worker-xxxxxxxx
Deleting Instance: dask-xxxxxxxx-scheduler

Attributes
asynchronous

Are we running in the event loop?

auto_shutdown
bootstrap
called_from_running_loop
command
dashboard_link
docker_image
gpu_instance
loop
name
observed
plan
requested
scheduler_address
scheduler_class
worker_class

Methods

adapt([Adaptive, minimum, maximum, ...])

Turn on adaptivity

call_async(f, *args, **kwargs)

Run a blocking function in a thread as a coroutine.

from_name(name)

Create an instance of this class to represent an existing cluster by name.

get_client()

Return client for the cluster

get_logs([cluster, scheduler, workers])

Return logs for the cluster, scheduler and workers

get_tags()

Generate tags to be applied to all resources.

new_worker_spec()

Return name and spec for the next worker

scale([n, memory, cores])

Scale cluster to n workers

scale_up([n, memory, cores])

Scale cluster to n workers

sync(func, *args[, asynchronous, ...])

Call func with args synchronously or asynchronously depending on the calling context

wait_for_workers(n_workers[, timeout])

Blocking call to wait for n workers before continuing

close

get_cloud_init

logs

render_cloud_init

render_process_cloud_init

scale_down
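The scale and wait_for_workers methods listed above are often used together: request a number of workers, then block until they have connected. The helper below is a small sketch of that pattern. scale_and_wait is a hypothetical name, and the 300 second default timeout is an arbitrary choice.

```python
def scale_and_wait(cluster, n_workers: int, timeout: float = 300) -> None:
    """Request n_workers and block until that many have connected.

    Works with any cluster object that provides scale() and
    wait_for_workers(), as listed in the Methods table.
    """
    cluster.scale(n_workers)
    cluster.wait_for_workers(n_workers, timeout=timeout)
```

With a running cluster, this could be called as scale_and_wait(cluster, 4) before submitting work.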