Hyperparameter Optimization
In this chapter, we are going to explore how to perform hyperparameter optimization on a CNN model using Ablator.
Why do HPO with Ablator?
Ablator combines a Ray back-end with Optuna for hyperparameter optimization (HPO), so you do not have to write boilerplate code for fault tolerance, training, and result analysis.
Importing libraries
Import the config classes, ModelWrapper, ParallelTrainer, and the configclass decorator from ablator.
Import SearchSpace from ablator.main.configs.
from ablator import ModelConfig, OptimizerConfig, TrainConfig, RunConfig, ParallelConfig
from ablator import ModelWrapper, ParallelTrainer, configclass
from ablator.main.configs import SearchSpace
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
import os
import shutil
from sklearn.metrics import f1_score, accuracy_score
Configurations
Defining the configs:
Optimizer config: Adam with a learning rate of 0.001.
Train config: batch_size = 32, epochs = 10, and random weight initialization enabled (rand_weights_init = True).
Model config: CustomModelConfig defines three parameters: the number of filters in the first and second convolutional layers (num_filter1, num_filter2) and the activation function (activation).
@configclass
class CustomModelConfig(ModelConfig):
    num_filter1: int
    num_filter2: int
    activation: str

model_config = CustomModelConfig(num_filter1=32, num_filter2=64, activation="relu")
optimizer_config = OptimizerConfig(
    name="adam",
    arguments={"lr": 0.001},
)
train_config = TrainConfig(
    dataset="Fashion-mnist",
    batch_size=32,
    epochs=10,
    optimizer_config=optimizer_config,
    scheduler_config=None,
    rand_weights_init=True,
)
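The configs above form a nested tree, and the search space defined later refers to individual fields by dotted paths such as train_config.optimizer_config.arguments.lr. The sketch below mirrors that nesting with plain dataclasses and flattens it into dotted keys. ModelCfg, OptimizerCfg, TrainCfg, RunCfg, and flatten are hypothetical stand-ins for illustration, not Ablator classes; the real configs are built with @configclass and carry more fields.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional

# Hypothetical plain-dataclass stand-ins for the Ablator configs (illustration only)
@dataclass
class ModelCfg:
    num_filter1: int = 32
    num_filter2: int = 64
    activation: str = "relu"

@dataclass
class OptimizerCfg:
    name: str = "adam"
    arguments: Dict[str, Any] = field(default_factory=lambda: {"lr": 0.001})

@dataclass
class TrainCfg:
    dataset: str = "Fashion-mnist"
    batch_size: int = 32
    epochs: int = 10
    optimizer_config: OptimizerCfg = field(default_factory=OptimizerCfg)
    scheduler_config: Optional[Any] = None
    rand_weights_init: bool = True

@dataclass
class RunCfg:
    model_config: ModelCfg = field(default_factory=ModelCfg)
    train_config: TrainCfg = field(default_factory=TrainCfg)

def flatten(tree: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys such as 'model_config.num_filter1'."""
    flat: Dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))  # recurse into nested configs
        else:
            flat[path] = value
    return flat

# asdict converts the nested dataclasses (and the arguments dict) into plain dicts
flat = flatten(asdict(RunCfg()))
print(flat["train_config.optimizer_config.arguments.lr"])  # 0.001
```

Printing flat shows every tunable path in one place, which can help when writing search-space keys.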
Defining a CNN Model
This is a custom CNN model with the following architecture:
The first convolutional layer takes a single (grayscale) input channel and applies num_filter1 filters, followed by an activation function and a max-pooling layer.
The second convolutional layer takes num_filter1 channels and applies num_filter2 filters, again followed by an activation function and a max-pooling layer.
The third convolutional layer is an additional layer that applies num_filter2 filters, followed by an activation function (without pooling).
A flattening layer converts the feature maps into a vector, and a fully connected layer maps it to a 10-dimensional output, one logit per class.
The MyModel class wraps FashionCNN and adds a CrossEntropyLoss; its forward method returns the predicted and true labels together with the loss.
# Define the model
class FashionCNN(nn.Module):
    def __init__(self, config: CustomModelConfig):
        super().__init__()
        # Map the activation name from the config to a PyTorch module
        activation_list = {"relu": nn.ReLU(), "elu": nn.ELU(), "leakyRelu": nn.LeakyReLU()}
        num_filter1 = config.num_filter1
        num_filter2 = config.num_filter2
        activation = activation_list[config.activation]

        # Block 1: 1 x 28 x 28 -> num_filter1 x 14 x 14
        self.conv1 = nn.Conv2d(1, num_filter1, kernel_size=3, stride=1, padding=1)
        self.act1 = activation
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Block 2: num_filter1 x 14 x 14 -> num_filter2 x 7 x 7
        self.conv2 = nn.Conv2d(num_filter1, num_filter2, kernel_size=3, stride=1, padding=1)
        self.act2 = activation
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Block 3: num_filter2 x 7 x 7 -> num_filter2 x 7 x 7 (no pooling)
        self.conv3 = nn.Conv2d(num_filter2, num_filter2, kernel_size=3, stride=1, padding=1)
        self.act3 = activation

        # Classifier: flatten the feature maps and map them to 10 class logits
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(num_filter2 * 7 * 7, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.act1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.act2(x)
        x = self.maxpool2(x)
        x = self.conv3(x)
        x = self.act3(x)
        x = self.flatten(x)
        x = self.fc1(x)
        return x


class MyModel(nn.Module):
    def __init__(self, config: CustomModelConfig) -> None:
        super().__init__()
        self.model = FashionCNN(config)
        self.loss = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        out = self.model(x)
        loss = None
        if labels is not None:
            # The loss is computed on the raw logits
            loss = self.loss(out, labels)
        # Convert logits to predicted class indices for the evaluation functions
        out = out.argmax(dim=-1)
        return {"y_pred": out, "y_true": labels}, loss
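The input size of fc1, num_filter2 * 7 * 7, follows from the layer hyperparameters: each 3x3 convolution with padding 1 keeps the spatial size, and each 2x2 max pool with stride 2 halves it (28 → 14 → 7). The sketch below works this out with the standard output-size formula floor((n + 2p - k) / s) + 1 (dilation 1). conv2d_out and fc1_in_features are illustrative helpers, not part of PyTorch or Ablator.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a Conv2d or MaxPool2d layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

def fc1_in_features(num_filter2: int, size: int = 28) -> int:
    """Number of features entering fc1 for a square input of the given size."""
    size = conv2d_out(size, kernel=3, stride=1, padding=1)  # conv1:    28 -> 28
    size = conv2d_out(size, kernel=2, stride=2)             # maxpool1: 28 -> 14
    size = conv2d_out(size, kernel=3, stride=1, padding=1)  # conv2:    14 -> 14
    size = conv2d_out(size, kernel=2, stride=2)             # maxpool2: 14 -> 7
    size = conv2d_out(size, kernel=3, stride=1, padding=1)  # conv3:     7 -> 7
    return num_filter2 * size * size

print(fc1_in_features(64))   # 3136
print(fc1_in_features(128))  # 6272
```

Because the spatial size does not depend on the number of filters, fc1 stays consistent for every num_filter2 value the search space can sample.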
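Search Space
For this tutorial, we define a search_space object over four hyperparameters. Each key is a dotted path to the config field being tuned, for example model_config.num_filter1:
The number of filters in the first convolutional layer (32 to 64).
The number of filters in the second convolutional layer (64 to 128).
The learning rate (0.001 to 0.01).
The activation function (relu, elu, or leakyRelu).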
search_space = {
    "model_config.num_filter1": SearchSpace(value_range=[32, 64], value_type="int"),
    "model_config.num_filter2": SearchSpace(value_range=[64, 128], value_type="int"),
    "train_config.optimizer_config.arguments.lr": SearchSpace(
        value_range=[0.001, 0.01],
        value_type="float",
    ),
    "model_config.activation": SearchSpace(categorical_values=["relu", "elu", "leakyRelu"]),
}
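Ablator hands the search space to Optuna, which picks the values for each trial. As a rough illustration of what one trial receives, the sketch below draws a configuration uniformly at random and writes it into a nested config by following the dotted keys. SPACE, sample, apply, and base are hypothetical, and uniform random sampling is a simplification: Optuna's default sampler (TPE) uses the results of earlier trials to choose new values.

```python
import copy
import random
from typing import Any, Dict

# Hypothetical plain-dict mirror of the search space above: path -> (kind, spec)
SPACE = {
    "model_config.num_filter1": ("int", [32, 64]),
    "model_config.num_filter2": ("int", [64, 128]),
    "train_config.optimizer_config.arguments.lr": ("float", [0.001, 0.01]),
    "model_config.activation": ("categorical", ["relu", "elu", "leakyRelu"]),
}

def sample(space: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    """Draw one configuration uniformly at random (a stand-in for Optuna's sampler)."""
    values: Dict[str, Any] = {}
    for path, (kind, spec) in space.items():
        if kind == "int":
            values[path] = rng.randint(spec[0], spec[1])  # both bounds inclusive here
        elif kind == "float":
            values[path] = rng.uniform(spec[0], spec[1])
        else:
            values[path] = rng.choice(spec)
    return values

def apply(base: Dict[str, Any], values: Dict[str, Any]) -> Dict[str, Any]:
    """Write each dotted-path value into a deep copy of a nested config dict."""
    config = copy.deepcopy(base)  # leave the base config untouched
    for path, value in values.items():
        *parents, leaf = path.split(".")
        node = config
        for key in parents:
            node = node[key]
        node[leaf] = value
    return config

base = {
    "model_config": {"num_filter1": 32, "num_filter2": 64, "activation": "relu"},
    "train_config": {"optimizer_config": {"name": "adam", "arguments": {"lr": 0.001}}},
}
trial_config = apply(base, sample(SPACE, random.Random(42)))
print(trial_config)
```

Treating both int bounds as inclusive is an assumption of this sketch; for learning rates, sampling on a log scale is a common alternative to the uniform draw used here.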
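Parallel Configuration
We pass the search_space to the parallel configuration so that Ablator knows which hyperparameters to explore. Because the model config is custom, we subclass ParallelConfig and annotate model_config with CustomModelConfig.
The configuration below runs total_trials = 20 trials, up to concurrent_trials = 20 of them at the same time. gpu_mb_per_experiment and cpus_per_experiment set the GPU memory (in MB) and the number of CPUs for each trial, and optim_metrics tells the optimizer to minimize val_loss.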
@configclass
class CustomParallelConfig(ParallelConfig):
    model_config: CustomModelConfig

parallel_config = CustomParallelConfig(
    train_config=train_config,
    model_config=model_config,
    metrics_n_batches=800,
    experiment_dir="/tmp/experiments/",
    device="cuda",
    amp=True,
    random_seed=42,
    total_trials=20,
    concurrent_trials=20,
    search_space=search_space,
    optim_metrics={"val_loss": "min"},
    gpu_mb_per_experiment=1024,
    cpus_per_experiment=1,
)
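A back-of-the-envelope check of what the parallel settings imply, assuming every running trial needs gpu_mb_per_experiment MB of GPU memory and cpus_per_experiment CPUs, and that trials take roughly equal time. trial_budget is a hypothetical helper, not part of Ablator.

```python
import math

def trial_budget(total_trials: int, concurrent_trials: int,
                 gpu_mb_per_experiment: int, cpus_per_experiment: int) -> dict:
    """Estimate rounds of trials and peak resource use (equal-duration assumption)."""
    running = min(total_trials, concurrent_trials)  # trials running at the same time
    return {
        "rounds": math.ceil(total_trials / concurrent_trials),
        "peak_gpu_mb": running * gpu_mb_per_experiment,
        "peak_cpus": running * cpus_per_experiment,
    }

print(trial_budget(20, 20, 1024, 1))  # {'rounds': 1, 'peak_gpu_mb': 20480, 'peak_cpus': 20}
print(trial_budget(20, 4, 1024, 1))   # {'rounds': 5, 'peak_gpu_mb': 4096, 'peak_cpus': 4}
```

With the settings above, all 20 trials would run at once and need about 20,480 MB of GPU memory in total; lowering concurrent_trials trades longer wall-clock time for a smaller peak.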
Importing the dataset
We use the Fashion-MNIST dataset:
Image dimensions: 28 x 28 pixels (grayscale).
Shape of the training data tensor: [60000, 1, 28, 28]. The test split, which serves as the validation set below, contains 10,000 images.
# ToTensor converts each image to a float tensor of shape [1, 28, 28] with values in [0, 1]
transform = transforms.ToTensor()

train_dataset = torchvision.datasets.FashionMNIST(
    root="./data",
    train=True,
    download=True,
    transform=transform,
)
test_dataset = torchvision.datasets.FashionMNIST(
    root="./data",
    train=False,
    download=True,
    transform=transform,
)
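The ModelWrapper is the same as the one discussed in the Prototyping models tutorial: it builds the training and validation dataloaders and defines the evaluation functions.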
class MyModelWrapper(ModelWrapper):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def make_dataloader_train(self, run_config: CustomParallelConfig):
        # Read the batch size from the config instead of hard-coding it
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=run_config.train_config.batch_size,
            shuffle=True,
        )

    def make_dataloader_val(self, run_config: CustomParallelConfig):
        # The Fashion-MNIST test split is used as the validation set
        return torch.utils.data.DataLoader(
            test_dataset,
            batch_size=run_config.train_config.batch_size,
            shuffle=False,
        )

    def evaluation_functions(self):
        # y_true and y_pred correspond to the keys returned by MyModel.forward
        return {
            "accuracy": lambda y_true, y_pred: accuracy_score(y_true.flatten(), y_pred.flatten()),
        }
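With a batch size of 32, each trial's dataloaders yield the following number of batches per epoch (Fashion-MNIST has 60,000 training and 10,000 test images). PyTorch's DataLoader keeps the final partial batch unless drop_last=True is passed. loader_steps is an illustrative helper.

```python
import math

def loader_steps(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        return num_samples // batch_size       # the incomplete last batch is discarded
    return math.ceil(num_samples / batch_size)  # the incomplete last batch is kept

train_steps = loader_steps(60_000, 32)
val_steps = loader_steps(10_000, 32)
print(train_steps, val_steps, train_steps * 10)  # 1875 313 18750
```

Over 10 epochs this is 18,750 training steps per trial, so the 20 trials in the search together run 375,000 steps.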
Creating Ray Cluster
Ablator uses Ray to run different trials in parallel.
To start a Ray cluster, run ray start --head in a terminal. This starts the Ray head node on your local machine. A Ray cluster consists of a head node and, optionally, additional worker nodes, all of which can execute tasks in parallel. To use the cluster for parallelization, the Python process running the experiment must connect to it.
To connect to an existing Ray cluster, call ray.init(address="auto").
import ray
ray.init(address = "auto")
ParallelTrainer
ParallelTrainer extends the ProtoTrainer class and executes multiple trials in parallel. It creates Optuna trials, each of which samples a set of hyperparameters from the search space, and runs each trial as a separate job on the Ray cluster.
This class manages the following tasks:
Preparing a Ray cluster for running Optuna trials to tune hyperparameters.
Initializing Optuna trials and adding them to the Optuna storage.
Syncing artifacts (experiment trials and database files) to remote sites, such as Google Cloud Storage.
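A toy, single-machine analogue of the loop that ParallelTrainer orchestrates: a ThreadPoolExecutor stands in for Ray, independent random draws stand in for Optuna's sampler, and a made-up quadratic stands in for training. run_trial and run_study are hypothetical; only the overall shape (launch trials concurrently, collect their metrics, pick the best according to optim_metrics) is meant to carry over.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_trial(trial_id: int) -> dict:
    """Hypothetical stand-in for one training run: sample a learning rate, return a fake val_loss."""
    rng = random.Random(trial_id)        # per-trial seed keeps the toy reproducible
    lr = rng.uniform(0.001, 0.01)        # independent draw, unlike a model-based sampler such as TPE
    val_loss = 0.3 + (lr - 0.004) ** 2   # made-up objective with its minimum at lr = 0.004
    return {"trial_id": trial_id, "lr": lr, "val_loss": val_loss}

def run_study(total_trials: int, concurrent_trials: int,
              metric: str = "val_loss", direction: str = "min") -> tuple:
    # At most `concurrent_trials` trials run at the same time
    with ThreadPoolExecutor(max_workers=concurrent_trials) as pool:
        results = list(pool.map(run_trial, range(total_trials)))  # results keep trial order
    # Select the best trial in the requested direction, as optim_metrics does
    pick = min if direction == "min" else max
    best = pick(results, key=lambda r: r[metric])
    return best, results

best, results = run_study(total_trials=20, concurrent_trials=4)
print(f"best trial {best['trial_id']}: lr={best['lr']:.4f}, val_loss={best['val_loss']:.6f}")
```

In Ablator, Ray schedules the trials across the cluster's resources instead of a local thread pool, and Optuna records every trial in its storage.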
# Start from a clean experiment directory by removing results from earlier runs
# (skip this step if you intend to resume a previous experiment)
shutil.rmtree(parallel_config.experiment_dir, ignore_errors=True)
wrapper = MyModelWrapper(
model_class=MyModel,
)
ablator = ParallelTrainer(
wrapper=wrapper,
run_config=parallel_config,
)
ablator.launch(working_directory = os.getcwd(), ray_head_address="auto")
Passing resume=True to the launch() method resumes training from the existing checkpoints and experiment state; in that case, do not delete experiment_dir beforehand.
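Resuming relies on persisted experiment state. A toy illustration of the idea, assuming a hypothetical trials table with id and state columns (this is not Ablator's actual schema): trials already recorded as COMPLETE are skipped, and everything else is scheduled again. pending_trials is a hypothetical helper.

```python
import sqlite3
from typing import List

def pending_trials(conn: sqlite3.Connection, total_trials: int) -> List[int]:
    """Return the ids of trials not yet recorded as COMPLETE in a hypothetical `trials` table."""
    done = {row[0] for row in conn.execute("SELECT id FROM trials WHERE state = 'COMPLETE'")}
    return [i for i in range(total_trials) if i not in done]

# In-memory database for the demo; a file path would make the state survive a restart
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO trials (id, state) VALUES (?, ?)",
    [(0, "COMPLETE"), (1, "COMPLETE"), (2, "RUNNING")],  # trial 2 was interrupted mid-training
)
conn.commit()

# Only unfinished trials are scheduled again; a real resume could restart trial 2 from a checkpoint
print(pending_trials(conn, total_trials=5))  # [2, 3, 4]
```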
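After use, disconnect from the Ray cluster with ray.shutdown(). This only disconnects the current Python process; to stop the head node started with ray start --head, run ray stop in a terminal.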
ray.shutdown()
Visualizing the experiment results
Since the experiment stores TensorBoard event files for each trial, we can do a quick visualization with TensorBoard. More detailed analysis is covered in later tutorials.
Install tensorboard and, if you are using a notebook, load its extension with %load_ext tensorboard.
Then run
%tensorboard --logdir /tmp/experiments/[experiment_dir_name] --port [port]
where [experiment_dir_name] is the name of the experiment directory created inside experiment_dir and [port] is a free port, for example:
%load_ext tensorboard
%tensorboard --logdir /tmp/experiments/experiment_5ade_3be2 --port 6008
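If you do not know the generated experiment directory name, a small helper can find it. This sketch assumes the directories are named experiment_<id> under /tmp/experiments (as in the example experiment_5ade_3be2) and that the most recently modified one is the run you want; latest_experiment_dir is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

def latest_experiment_dir(root: str = "/tmp/experiments") -> Optional[Path]:
    """Return the most recently modified `experiment_*` sub-directory of root, or None."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    candidates = [p for p in root_path.glob("experiment_*") if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)

exp_dir = latest_experiment_dir()
if exp_dir is not None:
    # Paste the printed line into a notebook cell after %load_ext tensorboard
    print(f"%tensorboard --logdir {exp_dir} --port 6008")
```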
Conclusion
Once all trials have completed, the metrics from each trial are stored in experiment_dir. The experiment directory contains one subdirectory per trial, as well as SQLite databases for Optuna and for the experiment state.
Each trial directory contains the following components: best_checkpoints, checkpoints, results, training log, configurations, and metadata.
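To get a quick overview of an experiment directory before the more detailed analysis tools, the sketch below lists the trial subdirectories and the tables inside every SQLite file at the top level. It recognizes SQLite files by their 16-byte header rather than by file extension, because the exact database file names are not given here; summarize_experiment and is_sqlite are hypothetical helpers.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database file

def is_sqlite(path: Path) -> bool:
    """Check the file header instead of relying on the file extension."""
    with open(path, "rb") as fh:
        return fh.read(16) == SQLITE_MAGIC

def summarize_experiment(experiment_dir: str) -> dict:
    """List trial sub-directories and the table names of top-level SQLite files."""
    root = Path(experiment_dir)
    trials = sorted(p.name for p in root.iterdir() if p.is_dir())
    databases = {}
    for path in sorted(p for p in root.iterdir() if p.is_file() and is_sqlite(p)):
        with closing(sqlite3.connect(path)) as conn:  # closing() makes sure the connection is closed
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            ).fetchall()
        databases[path.name] = [name for (name,) in rows]
    return {"trials": trials, "databases": databases}

exp_dir = Path("/tmp/experiments/experiment_5ade_3be2")  # replace with your experiment directory
if exp_dir.is_dir():
    print(summarize_experiment(str(exp_dir)))
```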
In a later tutorial, we will learn how to analyze the results of the trained trials.