Configuration basics#
The Ablator framework uses a configuration system to define everything related to training machine learning models, from the model architecture to the environment in which the model is trained.
Ablator can dynamically create a hierarchical configuration by composition. You can override it through YAML config files and the command line, or work directly with Python objects and classes. Refer to these examples or the last two sections in this tutorial to see how to use both methods.
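To make "hierarchical configuration by composition" concrete, here is a minimal sketch that uses only Python's standard dataclasses. It does not use Ablator; the class names OptimizerSettings, TrainSettings, and RunSettings are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class OptimizerSettings:  # hypothetical stand-in, not an Ablator class
    name: str = "sgd"
    lr: float = 0.1


@dataclass(frozen=True)
class TrainSettings:  # hypothetical stand-in for a training config
    batch_size: int = 128
    epochs: int = 2
    optimizer: OptimizerSettings = field(default_factory=OptimizerSettings)


@dataclass(frozen=True)
class RunSettings:  # hypothetical top-level config composed of smaller ones
    device: str = "cpu"
    train: TrainSettings = field(default_factory=TrainSettings)


base = RunSettings()

# Override a single leaf value while keeping the rest of the hierarchy intact.
new_optimizer = replace(base.train.optimizer, lr=0.01)
tuned = replace(base, train=replace(base.train, optimizer=new_optimizer))

print(base.train.optimizer.lr)   # 0.1
print(tuned.train.optimizer.lr)  # 0.01
```

The point of the sketch is only the shape: small configs nest inside larger ones, and an override touches one leaf without restating the whole tree.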
Configuration categories#
In Ablator, configuration is organized into the following categories:
- Running configuration (either for training a single model or training multiple models in parallel)
- Model configuration
- Training configuration
- Optimizer configuration
- Scheduler configuration
Most of them are used together for ablator to work seamlessly.
RunConfig#
RunConfig is used to configure the experiment environment, e.g. where to store experiment artifacts (loss, accuracy, other evaluation metrics), which device to use (GPU, CPU), and when to run validation or log progress during the experiment.
The table below summarizes its parameters, both required and customizable. Note that RunConfig requires a TrainConfig and a ModelConfig at initialization; both are covered in the next sections of this tutorial.
Parameter | Usage |
---|---|
experiment_dir | location to store experiment artifacts. |
random_seed | random seed. |
train_config | training configuration. (check |
model_config | model configuration. (check |
keep_n_checkpoints | number of latest checkpoints to keep. |
tensorboard | whether to use tensorboardLogger. |
amp | whether to use automatic mixed precision when running on gpu. |
device | device to run on. |
verbose | verbosity level. |
eval_subsample | fraction of the dataset to use for evaluation. |
metrics_n_batches | max number of batches stored in every tag (train, eval, test) for evaluation. |
metrics_mb_limit | max number of megabytes stored in every tag (train, eval, test) for evaluation. |
early_stopping_iter | The maximum allowed difference between the current iteration and the last iteration with the best metric before applying early stopping. Early stopping will be triggered if the difference |
eval_epoch | The epoch interval between two evaluations. |
log_epoch | The epoch interval between two logging. |
init_chkpt | path to a checkpoint to initialize the model with. |
warm_up_epochs | number of epochs marked as warm up epochs. |
divergence_factor | if |
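The early_stopping_iter rule can be pictured with a small, framework-independent sketch. The function should_stop_early and the loss values are hypothetical, and the sketch assumes that stopping triggers once the gap strictly exceeds the limit; Ablator's exact comparison may differ.

```python
def should_stop_early(current_iter: int, best_iter: int, early_stopping_iter: int) -> bool:
    """Return True once the best metric is older than the allowed gap.

    Assumption: stopping triggers when the gap strictly exceeds the limit.
    """
    return (current_iter - best_iter) > early_stopping_iter


# Hypothetical validation losses per evaluation; the lowest is at index 2.
losses = [0.90, 0.70, 0.50, 0.55, 0.60, 0.58, 0.57]

best_iter, best_loss, stopped_at = 0, float("inf"), None
for i, loss in enumerate(losses):
    if loss < best_loss:
        best_iter, best_loss = i, loss
    if should_stop_early(i, best_iter, early_stopping_iter=3):
        stopped_at = i
        break

print(best_iter, stopped_at)  # 2 6
```

With a limit of 3, the run tolerates three evaluations without improvement and stops at the fourth.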
ParallelConfig#
ParallelConfig is a subclass of RunConfig. It introduces additional arguments that configure parallel training and enable horizontal scaling of a single experiment, such as the total number of trials, the maximum number of trials to run concurrently, the target metrics to optimize, and more.
Parameter | Usage |
---|---|
total_trials | total number of trials. |
concurrent_trials | number of trials to run concurrently. |
search_space | search space for hyperparameter search, eg. |
optim_metrics | metrics to optimize, eg. |
search_algo | type of search algorithm. |
ignore_invalid_params | whether to ignore invalid parameters when sampling. |
remote_config | remote storage configuration. |
gcp_config | gcp configuration. |
gpu_mb_per_experiment | gpu resource to assign to an experiment. |
cpus_per_experiment | cpu resource to assign to an experiment. |
One argument worth mentioning is search_space, which defines a set of continuous or categorical/discrete values for each hyperparameter that you want to ablate. Refer to Search Space basics to learn more about how to use it for ablation.
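As a rough illustration of what sampling from a mix of continuous and categorical values means, the sketch below draws a few trials with the standard random module. The dictionary layout and the sample_trial helper are hypothetical and are not Ablator's SearchSpace API; see Search Space basics for the real interface.

```python
import random

# Hypothetical layout: each entry is either a continuous range or a list of
# categorical choices. This is NOT Ablator's SearchSpace API.
search_space = {
    "train_config.optimizer_config.arguments.lr": {"value_range": (0.001, 0.1)},
    "model_config.hidden_dim": {"categorical_values": [50, 100, 200]},
    "model_config.activation": {"categorical_values": ["relu", "elu"]},
}


def sample_trial(space: dict, rng: random.Random) -> dict:
    """Draw one value per hyperparameter from the space."""
    trial = {}
    for name, spec in space.items():
        if "value_range" in spec:
            low, high = spec["value_range"]
            trial[name] = rng.uniform(low, high)  # continuous value
        else:
            trial[name] = rng.choice(spec["categorical_values"])  # discrete value
    return trial


rng = random.Random(0)  # fixed seed so the draws are repeatable
trials = [sample_trial(search_space, rng) for _ in range(5)]  # e.g. 5 trials
for trial in trials:
    print(trial)
```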
ModelConfig#
This configuration holds parameters specific to the model you are using. A sample use case is trying different model sizes, numbers of layers, activation functions, etc. You can do this by creating a custom ModelConfig class for the model that includes these parameters. One advantage is that ablator can then create a search space over the parameters and run hyperparameter optimization.
Two steps are required after defining a custom model config class for your model:
- Pass the custom config to the model's constructor so that the model is built from the parameters defined in the custom config.
- Create a custom running config class (decorated with the configclass decorator) that updates the model_config argument to the proper type, e.g. MyCustomModelConfig (since the model_config attribute of the running configuration, RunConfig or ParallelConfig, is originally of type ModelConfig).
Note that in the model config class, arguments can be annotated as Stateless or Derived. These are custom Python annotations for attributes to which the experiment state is agnostic.
Stateless is used when a variable can take different values between trials or experiments. For example, the learning rate should be stateless, since training can be resumed with a different learning rate. Note that a variable declared as Stateless must be assigned an initial value before the experiment is launched.
Derived attributes are Stateful and undecided at the start of the experiment. Their values are determined by internal experiment processes that can depend on other experimental attributes, e.g. a model input size that depends on the dataset.
Stateful is the opposite of Stateless: the value must be the same between different experiments. For example, when you continue training a paused model, the model architecture (number of layers, output size) should be the same. Stateful variables, defined as a primitive data type, are required at initialization.
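The practical difference between stateful and stateless attributes shows up when an experiment is resumed. The sketch below is a plain-Python illustration of that idea, not Ablator's implementation; the STATEFUL_FIELDS set and the conflicting_fields helper are hypothetical.

```python
# Hypothetical set of stateful fields; "lr" is left out because it is stateless.
STATEFUL_FIELDS = {"hidden_dim", "activation", "dropout"}


def conflicting_fields(saved: dict, resumed: dict) -> list:
    """List stateful fields whose values differ between two runs."""
    return sorted(k for k in STATEFUL_FIELDS if saved.get(k) != resumed.get(k))


saved = {"hidden_dim": 100, "activation": "relu", "dropout": 0.3, "lr": 0.01}

# Changing only a stateless field: the two runs remain compatible.
ok_resume = {**saved, "lr": 0.001}
# Changing a stateful field: the architecture no longer matches the checkpoint.
bad_resume = {**saved, "hidden_dim": 256}

print(conflicting_fields(saved, ok_resume))   # []
print(conflicting_fields(saved, bad_resume))  # ['hidden_dim']
```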
Below is an example of a simple 1-layer neural network model, with configuration for the input size (to be inferred); the hidden layer dimension, activation function, and dropout rate (all of which are stateful); and the learning rate (stateless).
from ablator import RunConfig, ModelConfig, Stateless, Derived, configclass
import torch.nn as nn
import torch


@configclass
class MyModelConfig(ModelConfig):
    inp_size: Derived[int]  # inferred later, e.g. from the dataset
    lr: Stateless[float]  # may differ between trials or when resuming
    hidden_dim: int  # stateful
    activation: str  # stateful
    dropout: float  # stateful


@configclass
class CustomRunConfig(RunConfig):
    # Narrow model_config from ModelConfig to the custom type.
    model_config: MyModelConfig


class MyCustomModel(nn.Module):
    def __init__(self, config: MyModelConfig) -> None:
        super().__init__()
        self.linear = nn.Linear(config.inp_size, config.hidden_dim)
        self.dropout = nn.Dropout(config.dropout)
        if config.activation == "relu":
            self.activate = nn.ReLU()
        elif config.activation == "elu":
            self.activate = nn.ELU()
        else:
            raise ValueError(f"Unsupported activation: {config.activation}")

    def forward(self, x: torch.Tensor):
        out = self.linear(x)
        out = self.dropout(out)
        out = self.activate(out)
        # Return a dict of outputs and a scalar tensor.
        return {"preds": out, "labels": out}, x.sum().abs()


model_config = MyModelConfig(lr=0.01, hidden_dim=100, activation="relu", dropout=0.3)
TrainConfig#
This configuration class defines everything related to the main training process of your model, including the dataset name, batch size, number of epochs, optimizer, and scheduler. Two important attributes to mention are optimizer_config and scheduler_config. As the names suggest, they configure the optimizer and scheduler used in the training process.
Parameter | Usage |
---|---|
dataset | dataset name. maybe used in custom dataset loader functions. |
batch_size | batch size. |
epochs | number of epochs to train. |
optimizer_config | optimizer configuration. (check |
scheduler_config | scheduler configuration. (check |
rand_weights_init | whether to initialize model weights randomly. |
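Together, batch_size and epochs determine how many optimizer updates a run performs, which is useful to know when a scheduler takes a step count as an argument. The short calculation below is a sketch under two assumptions: a hypothetical dataset of 10,000 samples, and a data loader that keeps the last, smaller batch of each epoch.

```python
import math

n_samples = 10_000  # hypothetical dataset size
batch_size = 128    # value used for batch_size in this tutorial
epochs = 2          # value used for epochs in this tutorial

# Assumption: the last, smaller batch of each epoch is kept (not dropped).
steps_per_epoch = math.ceil(n_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 79 158
```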
OptimizerConfig and SchedulerConfig#
OptimizerConfig is a config class that lets users choose the optimizer they want. Currently, the SGD, Adam, and AdamW optimizers are supported.
SchedulerConfig, on the other hand, is used to schedule learning rate updates during the training process.
Both of these config classes take similar arguments:
Parameter | Usage |
---|---|
name | The type of the scheduler or optimizer, this can be any in |
arguments | The arguments for the scheduler or optimizer, specific to a certain type of scheduler or optimizer. |
The table below shows how arguments can be defined for each type of optimizer:
Optimizer type | Arguments |
---|---|
sgd | |
adam | |
adamw | |
The table below shows how arguments can be defined for each type of scheduler:
Scheduler type | Arguments |
---|---|
cycle | |
plateau | |
step | |
The following code snippet shows how to initialize these objects:
from ablator import OptimizerConfig, SchedulerConfig
optimizer_config = OptimizerConfig(name="sgd", arguments={"lr": 0.1})
scheduler_config = SchedulerConfig(name="cycle", arguments={"max_lr": 0.5, "total_steps": 50})
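One way to picture how a name and an arguments dictionary pair up is a registry that maps each name to a schema for its arguments. The sketch below is a standalone illustration of that pattern, not Ablator's internals; the SGDArgs and AdamArgs schemas, their fields, and the resolve_arguments helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SGDArgs:  # hypothetical argument schema for "sgd"
    lr: float
    momentum: float = 0.0


@dataclass
class AdamArgs:  # hypothetical argument schema for "adam"
    lr: float
    weight_decay: float = 0.0


# Registry from the name field to the schema that validates its arguments.
ARGUMENT_SCHEMAS = {"sgd": SGDArgs, "adam": AdamArgs}


def resolve_arguments(name: str, arguments: dict):
    """Look up the schema for name and build it from the arguments dict."""
    if name not in ARGUMENT_SCHEMAS:
        raise ValueError(
            f"Unknown optimizer {name!r}; expected one of {sorted(ARGUMENT_SCHEMAS)}"
        )
    # Unknown or missing keys raise TypeError from the dataclass constructor.
    return ARGUMENT_SCHEMAS[name](**arguments)


print(resolve_arguments("sgd", {"lr": 0.1}))  # SGDArgs(lr=0.1, momentum=0.0)
```

Keeping the type-specific arguments in a separate dictionary means the outer config keeps the same two fields regardless of which optimizer or scheduler is selected.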
Now let’s combine everything. The Ablator trainer requires a model wrapper and a running config at initialization; after that, the experiment can be launched via trainer.launch(). Note that this tutorial focuses only on defining the running configuration run_config. For the Ablator trainer, refer to Prototyping models and HPO.
In the code snippet below, train_config sets up the dataset, batch size, and epochs, and references the optimizer and scheduler configurations. Next, the config object combines train_config and model_config with runtime settings such as verbosity and device.
from ablator import TrainConfig

train_config = TrainConfig(
    dataset="test",
    batch_size=128,
    epochs=2,
    optimizer_config=optimizer_config,
    scheduler_config=scheduler_config,
)

config = CustomRunConfig(
    train_config=train_config,
    # Reuse the model_config object created in the ModelConfig section.
    model_config=model_config,
    verbose="silent",
    device="cpu",
)
With the configuration created, we are halfway to running an ablation experiment with the ablator trainer.
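from ablator import ParallelTrainer

# model_wrapper is created in the next chapter; config is the object defined above.
trainer = ParallelTrainer(wrapper=model_wrapper, run_config=config)
trainer.launch()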
In the next chapter, you will learn how to create the model wrapper, the other half that’s left. We will start with training a single model.
Different methods to define running configurations#
There are three ways to provide values to the configurations: named arguments, file-based, or dictionary-based. All examples in the previous sections use the named-arguments method. Now let’s look at how the file-based and dictionary-based methods work.
File-based#
File-based configuration lets you put all configuration values in a single YAML file. Then, depending on the type of running configuration you want, use the RunConfigClass.load(path/to/yaml/file) method to create the configuration from the values in the file.
To write such a file, follow the key: value syntax (one pair per line). The following example shows what a config YAML file looks like. We will name it config.yaml:
experiment_dir: "/tmp/dir"
train_config:
  dataset: test
  batch_size: 128
  epochs: 2
  optimizer_config:
    name: sgd
    arguments:
      lr: 0.1
  scheduler_config:
    name: cycle
    arguments:
      max_lr: 0.5
      total_steps: 50
model_config:
  inp_size: 50
  hidden_dim: 100
  activation: "relu"
  dropout: 0.15
verbose: "silent"
device: "cpu"
The outermost arguments come from RunConfig. Also note how train_config, which corresponds to the TrainConfig object in the running config, has its arguments defined one level below (indented). The first rule, therefore, is that the top-level arguments must come from the running config class, either RunConfig or ParallelConfig, so make sure you use the right set of arguments. The second rule is that any argument that is itself a config class must have its own arguments indented one level below it.
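Both rules amount to checking that every key sits at the level where its config class defines it. The sketch below shows one way to spot misplaced or misspelled keys in an already-parsed file; it uses standard dataclasses as stand-ins, and RunSchema, TrainSchema, and unknown_keys are hypothetical names, not part of Ablator.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class TrainSchema:  # hypothetical, mirrors a few TrainConfig arguments
    dataset: str = ""
    batch_size: int = 1
    epochs: int = 1


@dataclass
class RunSchema:  # hypothetical, mirrors a few RunConfig arguments
    experiment_dir: str = ""
    device: str = "cpu"
    train_config: TrainSchema = field(default_factory=TrainSchema)


def unknown_keys(values: dict, schema: type, prefix: str = "") -> list:
    """Return dotted paths of keys that the schema does not define."""
    known = {f.name: f.type for f in fields(schema)}
    problems = []
    for key, value in values.items():
        path = f"{prefix}{key}"
        if key not in known:
            problems.append(path)
        elif is_dataclass(known[key]) and isinstance(value, dict):
            # Nested config: check its keys one level down.
            problems.extend(unknown_keys(value, known[key], prefix=f"{path}."))
    return problems


parsed = {
    "experiment_dir": "/tmp/dir",
    "train_config": {"dataset": "test", "batch_size": 128, "epoch": 2},  # typo
    "batch_size": 128,  # wrong level: belongs under train_config
}
print(unknown_keys(parsed, RunSchema))  # ['train_config.epoch', 'batch_size']
```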
Now, in your code, a single line is enough to load these values and create the config object:
config = CustomRunConfig.load("path/to/yaml/file")
Note that because we created a custom running configuration class, CustomRunConfig, tied to the custom model config from the previous sections, we used CustomRunConfig.load("path/to/yaml/file") to load the configuration from the file. If you have not created any subclasses, RunConfig.load("path") or ParallelConfig.load("path") also works.
Dictionary based#
The dictionary-based method is similar to the file-based one, except that the configuration is defined in a dictionary instead of a YAML file. The dictionary is then passed (as keyword arguments) to the running configuration at initialization:
configuration = {
    "experiment_dir": "/tmp/dir",
    "train_config": {
        "dataset": "test",
        "batch_size": 128,
        "epochs": 2,
        "optimizer_config": {
            "name": "sgd",
            "arguments": {
                "lr": 0.1
            }
        },
        "scheduler_config": {
            "name": "cycle",
            "arguments": {
                "max_lr": 0.5,
                "total_steps": 50
            }
        }
    },
    "model_config": {
        "inp_size": 50,
        "hidden_dim": 100,
        "activation": "relu",
        "dropout": 0.15
    },
    "verbose": "silent",
    "device": "cpu"
}
config = CustomRunConfig(
    **configuration
)
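For intuition about how nested dictionaries can become nested config objects, the sketch below builds dataclass instances recursively from a dictionary. It is an illustration with standard dataclasses only; OptimizerSketch, TrainSketch, and from_dict are hypothetical, and Ablator's actual conversion mechanism may differ.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class OptimizerSketch:  # hypothetical stand-in for an optimizer config
    name: str
    arguments: dict


@dataclass
class TrainSketch:  # hypothetical stand-in for a training config
    dataset: str
    batch_size: int
    optimizer_config: OptimizerSketch


def from_dict(schema: type, values: dict):
    """Build schema from values, converting nested dicts into nested schemas."""
    kwargs = {}
    for f in fields(schema):
        if f.name not in values:
            continue  # a missing required field fails in the constructor below
        value = values[f.name]
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return schema(**kwargs)


train_values = {
    "dataset": "test",
    "batch_size": 128,
    "optimizer_config": {"name": "sgd", "arguments": {"lr": 0.1}},
}
train_sketch = from_dict(TrainSketch, train_values)
print(train_sketch.optimizer_config.name)       # sgd
print(train_sketch.optimizer_config.arguments)  # {'lr': 0.1}
```

Note that the arguments field stays a plain dictionary because its declared type is dict, while optimizer_config is converted because its declared type is a dataclass.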
Conclusion#
Now that you know how to define running configurations, you can start creating your own prototype. In the next chapter, we will learn how to write a prototype for your model, combine it with the running configuration, and launch the experiment.