active_learning module¶
Module containing the active learning pipeline.
- class active_learning.ActiveLearningPipeline(data_module, model, strategy, epochs, gpus, checkpoint_dir=None, active_learning_mode=False, initial_epochs=None, items_to_label=1, iterations=None, reset_weights=False, epochs_increase_per_query=0, heatmaps_per_iteration=0, logger=True, early_stopping=False, lr_scheduler=None, model_selection_criterion='loss', deterministic_mode=True, save_model_every_epoch=False, clear_wandb_cache=False, **kwargs)[source]¶
Bases: object
The pipeline or simulation environment to run active learning experiments.
- Parameters
data_module (ActiveLearningDataModule) – A data module object providing data.
model (PytorchModel) – A model object whose architecture can be fitted to the data.
strategy (QueryStrategy) – An active learning strategy to query for new labels.
epochs (int) – The number of epochs the model should be trained.
gpus (int) – Number of GPUs to use for model training.
checkpoint_dir (str, optional) – Directory where the model checkpoints are to be saved.
early_stopping (bool, optional) – Enable/disable early stopping when the model stops improving. Defaults to False.
logger – A logger object as defined by PyTorch Lightning.
lr_scheduler (str, optional) – Algorithm used for dynamically updating the learning rate during training, e.g. ‘reduceLROnPlateau’ or ‘cosineAnnealingLR’.
active_learning_mode (bool, optional) – Enable/disable the active learning pipeline. Defaults to False.
initial_epochs (int, optional) – Number of epochs the initial model should be trained. Defaults to epochs.
items_to_label (int, optional) – Number of items that should be selected for labeling in the active learning run. Defaults to 1.
iterations (int, optional) – Number of iterations the active learning pipeline should run. If None, the pipeline runs until the whole dataset is labeled. Defaults to None.
reset_weights (bool, optional) – Enable/disable resetting of the model weights after every active learning run. Defaults to False.
epochs_increase_per_query (int, optional) – Increase number of epochs for every query to compensate for the increased training dataset size. Defaults to 0.
heatmaps_per_iteration (int, optional) – Number of heatmaps that should be generated per iteration. Defaults to 0.
deterministic_mode (bool, optional) – Whether only deterministic CUDA operations should be used. Defaults to True.
save_model_every_epoch (bool, optional) – Whether the model files of all epochs are to be saved or only the model file of the best epoch. Defaults to False.
clear_wandb_cache (bool, optional) – Whether the whole Weights and Biases cache should be deleted when the run is finished. Should only be used when no other runs are running in parallel. Defaults to False.
**kwargs – Additional, strategy-specific parameters.
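Taken together, the iteration-related parameters above imply a query schedule: the first iteration trains for initial_epochs (defaulting to epochs), each subsequent iteration queries up to items_to_label items and trains for epochs plus the accumulated epochs_increase_per_query, and with iterations=None the loop continues until the pool is exhausted. The following is a minimal sketch of that bookkeeping; query_schedule is a hypothetical helper, not part of the module's API, and the exact epoch accounting in the real pipeline may differ.

```python
from typing import Iterator, Optional, Tuple


def query_schedule(
    unlabeled_items: int,
    items_to_label: int = 1,
    iterations: Optional[int] = None,
    epochs: int = 10,
    initial_epochs: Optional[int] = None,
    epochs_increase_per_query: int = 0,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (iteration, epochs_this_iteration, items_queried) tuples.

    Hypothetical illustration of the documented parameter semantics:
    iteration 0 trains for ``initial_epochs`` (falling back to ``epochs``),
    later iterations train for ``epochs`` plus the accumulated
    ``epochs_increase_per_query``, and ``iterations=None`` means the loop
    runs until every item is labeled.
    """
    iteration = 0
    while unlabeled_items > 0 and (iterations is None or iteration < iterations):
        if iteration == 0 and initial_epochs is not None:
            epochs_now = initial_epochs
        else:
            epochs_now = epochs + iteration * epochs_increase_per_query
        # Never query more items than remain unlabeled.
        queried = min(items_to_label, unlabeled_items)
        yield iteration, epochs_now, queried
        unlabeled_items -= queried
        iteration += 1
```

For example, a pool of 5 items with items_to_label=2 yields three iterations, the last querying the single remaining item.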
- static remove_wandb_cache()[source]¶
Deletes the Weights and Biases cache directory. This is necessary because the Weights and Biases client does not currently implement proper cache cleanup itself; see the corresponding GitHub issue for more details.
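The cleanup amounts to removing the client's cache directory from disk. A minimal sketch is below; the default path ~/.cache/wandb is an assumption about where the client typically stores its cache, and the real method may resolve the location differently.

```python
import shutil
from pathlib import Path
from typing import Optional


def remove_wandb_cache(cache_dir: Optional[str] = None) -> bool:
    """Delete the Weights and Biases cache directory if it exists.

    Hypothetical sketch: ``cache_dir`` defaults to ``~/.cache/wandb``
    (an assumed location). Returns True if a directory was removed.
    """
    path = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "wandb"
    if path.is_dir():
        # Recursively remove the cache tree, including downloaded artifacts.
        shutil.rmtree(path)
        return True
    return False
```

As the docstring in the module notes, this should only be run when no other runs share the cache in parallel.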
- setup_trainer(epochs, iteration=None)[source]¶
Initializes a new PyTorch Lightning trainer object.
- Parameters
epochs (int) – Number of training epochs.
iteration (Optional[int], optional) – Current active learning iteration. Defaults to None.
- Returns
A trainer object.
- Return type
pytorch_lightning.Trainer