First, open the Cluster Definitions page from the dropdown menu at the top of the application.
The steps in this documentation require at least BI Developer application role permissions.
Each Cluster Definition has a type, which determines the tasks it can be used in and the options that can be configured for it.
Cluster Definition Type | Description | Used by |
---|---|---|
Databricks Cluster (Azure) | Spin up Apache Spark clusters on-the-fly using Azure based Virtual Machine configurations. | Databricks, Spark SQL Statement |
Azure Batch Pool | Creates an Azure Batch Pool. | Azure Batch Task |
Azure Batch Container Pool | Creates an Azure Batch Container Pool. | Azure Batch Task |
The next page in the form will prompt you to configure the specs and software that are used in the cluster. If you need a refresher on what makes up a Cluster Definition, read Cluster Definition Concepts.
There are no validation steps required for Cluster Definitions, because all information is populated based on the type and is verified against the cloud provider.
There are three different cluster types.
You can create an Azure Batch Pool or an Azure Batch Container Pool. For both cluster types, you will also need to choose the Region and the Connection.
Choose from the dropdown list of regions. (Your chosen region may affect the available OS Configurations you can choose on the next page.)
Your chosen region must be the same as your Azure Batch account region.
Choose a connection from the next dropdown, which lists the available connections to Azure Batch.
Select an OS Configuration. You can filter the OS Configuration dropdown to hide unverified OS Configurations and those that have expired or will soon expire.
The options available in the dropdown list depend on your previously selected region and on these filters. (The following image does not contain OS Configurations that are unverified or will soon expire.)
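The filtering behaviour described above can be pictured with a small sketch. This is not the application's implementation: the `OsConfiguration` class, its fields, the `filter_os_configurations` function, and the 30-day meaning of "will soon expire" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class OsConfiguration:
    # Hypothetical fields; the real product may model these differently.
    name: str
    verified: bool
    expires_on: Optional[date] = None  # None means no known expiry date


def filter_os_configurations(
    configs: List[OsConfiguration],
    today: date,
    hide_unverified: bool = True,
    hide_expiring: bool = True,
    soon_days: int = 30,  # assumed threshold for "will soon expire"
) -> List[OsConfiguration]:
    """Return the configurations that would remain in the dropdown."""
    cutoff = today + timedelta(days=soon_days)
    kept = []
    for config in configs:
        if hide_unverified and not config.verified:
            continue
        # An expiry date on or before the cutoff covers both already
        # expired and soon-to-expire configurations.
        if hide_expiring and config.expires_on is not None and config.expires_on <= cutoff:
            continue
        kept.append(config)
    return kept
```

With both filters switched off, the function returns every configuration unchanged, which mirrors leaving the dropdown unfiltered.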
Choose an Azure Virtual Machine Type. The VMs available will change depending on your chosen hosting Region and capabilities.
Then choose the number of Minimum Workers for this cluster definition. This is the minimum number of processes used to run tasks.
You can also choose the number of Maximum Workers, or leave this field blank to set no maximum. If specified, the cluster will automatically scale based on the workload.
Providing a number of Maximum Workers may result in higher running costs.
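As a rough illustration of how the two worker fields relate, the sketch below models them as a small data class. The `WorkerSettings` name, its fields, and the validation rules (a non-negative minimum, and a maximum that is not below the minimum) are assumptions for illustration, not the application's documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerSettings:
    # Hypothetical model of the Minimum Workers and Maximum Workers fields.
    min_workers: int
    max_workers: Optional[int] = None  # None represents a blank field

    def __post_init__(self) -> None:
        # Assumed rules; the form may enforce different limits.
        if self.min_workers < 0:
            raise ValueError("min_workers must not be negative")
        if self.max_workers is not None and self.max_workers < self.min_workers:
            raise ValueError("max_workers must not be lower than min_workers")

    @property
    def autoscales(self) -> bool:
        # Per the text above, scaling applies only when a maximum is given.
        return self.max_workers is not None
```

For example, `WorkerSettings(2)` describes a definition with no maximum, while `WorkerSettings(2, 8)` describes one that can scale with the workload, which is where the higher running costs may arise.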
You can then submit this cluster definition to save it, after which it will be ready to use.