Hyperparameter Optimization#

  • In this chapter, we are going to explore how to perform hyperparameter optimization on a CNN model using Ablator.

Why do HPO with Ablator?#

  • Ablator combines the Ray back-end with Optuna for hyperparameter optimization (HPO), eliminating the need for boilerplate code in fault-tolerant strategies, training, and result analysis.

Importing libraries#

  • Import the Configs, ModelWrapper, and ParallelTrainer from ablator.

  • Import SearchSpace from ablator.main.configs.

from ablator import ModelConfig, OptimizerConfig, TrainConfig, RunConfig, ParallelConfig
from ablator import ModelWrapper, ParallelTrainer, configclass
from ablator.main.configs import SearchSpace

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms

import os
import shutil
from sklearn.metrics import f1_score, accuracy_score

Configurations#

Defining Configs:

  • Optimizer Config: adam (lr = 0.001).

  • Train Config: batch_size = 32, epochs = 10, and random weight initialization enabled (rand_weights_init = True).

  • Model Config: CustomModelConfig defines two parameters for the numbers of filters (num_filter1 and num_filter2) and one for the activation function.

@configclass
class CustomModelConfig(ModelConfig):
  num_filter1: int
  num_filter2: int
  activation: str


model_config = CustomModelConfig(num_filter1=32, num_filter2=64, activation="relu")

optimizer_config = OptimizerConfig(
    name="adam",
    arguments={"lr": 0.001}
)

train_config = TrainConfig(
    dataset="Fashion-mnist",
    batch_size=32,
    epochs=10,
    optimizer_config=optimizer_config,
    scheduler_config=None,
    rand_weights_init = True
)

Defining a CNN Model#

This is a custom CNN model with the following architecture:

  • The first convolutional layer: It takes a single input channel and applies num_filter1 filters, followed by the activation function and a max-pooling layer.

  • The second convolutional layer: It takes num_filter1 channels and applies num_filter2 filters, again followed by the activation function and a max-pooling layer.

  • The third convolutional layer: This is an additional layer that applies num_filter2 filters, followed by the activation function.

  • A flattening layer and a fully connected layer: after two 2x2 max-pooling steps, the 28x28 input is reduced to 7x7, so the flattened num_filter2 * 7 * 7 features are mapped to a 10-dimensional output for the class labels.

Furthermore, the MyModel class wraps FashionCNN in a PyTorch module that also computes the CrossEntropyLoss and returns the predictions together with the loss.

# Define the model
class FashionCNN(nn.Module):
    def __init__(self, config: CustomModelConfig):
        super(FashionCNN, self).__init__()

        activation_list = {"relu": nn.ReLU(), "elu": nn.ELU(), "leakyRelu": nn.LeakyReLU()}

        num_filter1 = config.num_filter1
        num_filter2 = config.num_filter2
        activation = activation_list[config.activation]

        self.conv1 = nn.Conv2d(1, num_filter1, kernel_size=3, stride=1, padding=1)
        self.act1 = activation
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv2 = nn.Conv2d(num_filter1, num_filter2, kernel_size=3, stride=1, padding=1)
        self.act2 = activation
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv3 = nn.Conv2d(num_filter2, num_filter2, kernel_size=3, stride=1, padding=1)
        self.act3 = activation

        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(num_filter2 * 7 * 7, 10)


    def forward(self, x):
        x = self.conv1(x)
        x = self.act1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.act2(x)
        x = self.maxpool2(x)
        x = self.conv3(x)
        x = self.act3(x)
        x = self.flatten(x)
        x = self.fc1(x)

        return x

class MyModel(nn.Module):
    def __init__(self, config: CustomModelConfig) -> None:
        super().__init__()

        self.model = FashionCNN(config)
        self.loss = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        out = self.model(x)
        loss = None

        if labels is not None:
            loss = self.loss(out, labels)

        out = out.argmax(dim=-1)

        return {"y_pred": out, "y_true": labels}, loss

Search Space#

For this tutorial, we define a search_space dictionary covering four hyperparameters:

  • The number of filters in the first convolutional layer.

  • The number of filters in the second convolutional layer.

  • The learning rate.

  • The activation function.

search_space = {
    "model_config.num_filter1": SearchSpace(value_range = [32, 64], value_type = 'int'),
    "model_config.num_filter2": SearchSpace(value_range = [64, 128], value_type = 'int'),
    "train_config.optimizer_config.arguments.lr": SearchSpace(
        value_range = [0.001, 0.01],
        value_type = 'float'
        ),
    "model_config.activation": SearchSpace(categorical_values = ["relu", "elu", "leakyRelu"]),
}

Parallel Configuration#

We subclass ParallelConfig so that its model_config field is typed as our CustomModelConfig, and we pass the search_space for the hyperparameters we want to explore.

@configclass
class CustomParallelConfig(ParallelConfig):
  model_config: CustomModelConfig

parallel_config = CustomParallelConfig(
    train_config=train_config,
    model_config=model_config,
    metrics_n_batches=800,
    experiment_dir="/tmp/experiments/",
    device="cuda",
    amp=True,
    random_seed=42,
    total_trials=20,
    concurrent_trials=20,
    search_space=search_space,
    optim_metrics={"val_loss": "min"},
    gpu_mb_per_experiment=1024,
    cpus_per_experiment=1,
)

Importing the dataset#

Fashion MNIST

  • Image dimensions: 28 x 28 pixels (grayscale).

  • Shape of the training data tensor: [60000, 1, 28, 28].

transform = transforms.ToTensor()

train_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)
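
As a quick optional check of the shapes described above, we can inspect a single sample and the dataset sizes:

# Each sample is an (image, label) pair; ToTensor yields a [1, 28, 28] float tensor.
image, label = train_dataset[0]
print(image.shape)                             # torch.Size([1, 28, 28])
print(len(train_dataset), len(test_dataset))   # 60000 10000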

The ModelWrapper is the same as the one discussed in the Prototyping models tutorial.

class MyModelWrapper(ModelWrapper):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def make_dataloader_train(self, run_config: CustomParallelConfig):
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=32,
            shuffle=True
        )

    def make_dataloader_val(self, run_config: CustomParallelConfig):
        return torch.utils.data.DataLoader(
            test_dataset,
            batch_size=32,
            shuffle=False
        )

    def evaluation_functions(self):
        return {
            "accuracy": lambda y_true, y_pred: accuracy_score(y_true.flatten(), y_pred.flatten()),
        }

Creating Ray Cluster#

Ablator uses Ray to run different trials in parallel.

  • To initiate the Ray cluster, run the command ray start --head in a terminal. This will start the Ray head node on your local machine.

  • To utilize Ray for parallelization, it is necessary to connect to the Ray cluster. The Ray cluster comprises multiple Ray worker nodes capable of executing tasks in parallel.

  • To connect to an existing Ray cluster, use the command ray.init(address="auto").

import ray
ray.init(address="auto")

ParallelTrainer#

The ParallelTrainer extends the ProtoTrainer class and executes multiple trials in parallel. It initializes Optuna trials, which are responsible for tuning the hyperparameters, and runs each trial on a separate worker node within the Ray cluster.

This class manages the following tasks:

  • Preparing a Ray cluster for running Optuna trials to tune hyperparameters.

  • Initializing Optuna trials and adding them to the Optuna storage.

  • Syncing artifacts (experiment trials and database files) to remote sites, such as Google Cloud Storage.

# Remove any previous experiment directory so the run starts from a clean state.
if os.path.exists(parallel_config.experiment_dir):
    shutil.rmtree(parallel_config.experiment_dir)

wrapper = MyModelWrapper(
    model_class=MyModel,
)

ablator = ParallelTrainer(
    wrapper=wrapper,
    run_config=parallel_config,
)
ablator.launch(working_directory=os.getcwd(), ray_head_address="auto")

We can pass resume = True to the launch() method to resume training from existing checkpoints and the existing experiment state, as shown below.
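
For example, a relaunch with the resume flag enabled (a sketch that reuses the trainer and Ray cluster created above):

ablator.launch(working_directory=os.getcwd(), ray_head_address="auto", resume=True)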

Shut down the Ray cluster with ray.shutdown() after use.

ray.shutdown()

Visualizing experiment results#

Since the experiment stores TensorBoard event files for each trial, we can do a short visualization with TensorBoard. More detailed analysis will be explored in later tutorials.

  • Install TensorBoard and, if you are working in a notebook, load it with %load_ext tensorboard.

  • Run the command %tensorboard --logdir /tmp/experiments/[experiment_dir_name] --port [port].

%load_ext tensorboard
%tensorboard --logdir /tmp/experiments/experiment_5ade_3be2 --port 6008

(TensorBoard output)

Conclusion#

Finally, after all trials are complete, the metrics obtained from each trial are stored in the experiment_dir. This directory contains a subdirectory for each trial, as well as SQLite databases for Optuna and the experiment state.

Each trial will have the following components: best_checkpoints, checkpoints, results, training log, configurations, and metadata.
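
As an optional way to inspect this layout, plain Python can list the top level of the experiment directory; the exact trial and file names will vary from run to run:

# List the contents of the experiment directory defined in parallel_config.
for entry in sorted(os.listdir(parallel_config.experiment_dir)):
    print(entry)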

In a later tutorial, we will learn how to analyze the results from the trained trials.