lisbet.training.utils

Utility functions for model training.

Functions

`estimate_num_workers`(n_tasks, batch_size[, ...])	Estimate the optimal number of DataLoader worker processes to use, based on the number of training tasks, the batch size, and the desired batch size per worker.
`generate_seeds`(seed, task_ids)	Internal helper that generates multiple seeds from the base one.
`worker_init_fn`(worker_id)	Worker initialization function for DataLoader.

lisbet.training.utils.generate_seeds(seed, task_ids)

Internal helper. Generates multiple seeds from the base one.
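
As an illustration only, deriving per-task seeds from a base seed could be sketched as follows. The helper name `derive_task_seeds`, its dictionary return type, the example task IDs, and the use of Python's `random.Random` are all assumptions made for this example; the source does not describe how `generate_seeds` is implemented or what it returns.

```python
import random


def derive_task_seeds(seed, task_ids):
    """Hypothetical stand-in, not the lisbet implementation."""
    # A private generator keeps the global `random` state untouched.
    rng = random.Random(seed)
    # Draw one 32-bit seed per task, in the order the task IDs are given.
    return {task_id: rng.getrandbits(32) for task_id in task_ids}


# Hypothetical task IDs, used only for this example.
seeds = derive_task_seeds(1234, ["task_a", "task_b"])
```

Because the seeds are drawn sequentially in this sketch, the same base seed and the same task order always reproduce the same per-task seeds.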

lisbet.training.utils.estimate_num_workers(n_tasks, batch_size, batch_size_per_worker=16)

Estimate the optimal number of DataLoader worker processes to use, based on the number of training tasks, the batch size, and the desired batch size per worker.

Parameters:

n_tasks (int) – The number of training tasks (e.g., datasets or splits) being processed.
batch_size (int) – The total batch size used for loading data.
batch_size_per_worker (int, optional) – The target batch size to be handled by each worker process (default: 16).

Returns:

num_workers – The estimated number of DataLoader worker processes to use.

Return type:

int
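
The source does not state the formula behind this estimate. One plausible heuristic that fits the documented parameters is sketched below, purely as an assumption: request enough workers for each to handle about `batch_size_per_worker` samples, then cap the result by an even share of the CPU cores per task. The name `sketch_num_workers` and the explicit `cpu_count` argument are hypothetical and exist only to make the example reproducible.

```python
import math
import os


def sketch_num_workers(n_tasks, batch_size, batch_size_per_worker=16, cpu_count=None):
    """Hypothetical heuristic, not the lisbet implementation."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Workers needed so that each handles roughly `batch_size_per_worker` samples.
    wanted = math.ceil(batch_size / batch_size_per_worker)
    # Assumed cap: split the available cores evenly across tasks, at least one each.
    cap = max(1, cpu_count // n_tasks)
    return max(1, min(wanted, cap))


print(sketch_num_workers(n_tasks=2, batch_size=64, cpu_count=16))  # 4
```

With two tasks on 16 cores, the cap is 8 workers per task, so the batch-size term (64 / 16 = 4) decides the result.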

lisbet.training.utils.worker_init_fn(worker_id)

Worker initialization function for DataLoader.

This function sets a unique random seed for each DataLoader worker based on the worker ID, global rank, and task seed. Because each worker then draws from a different random stream, workers do not repeat one another's shuffling or augmentation choices, which matters in distributed training.

Parameters:

worker_id (int) – The ID of the worker being initialized. This is typically an integer in the range [0, num_workers - 1].
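
The idea of combining worker ID, global rank, and task seed can be sketched in plain Python. The mixing rule below, the helper names `compose_worker_seed` and `sketch_worker_init`, and the default values are assumptions for illustration; the source does not say how lisbet combines the three values or where it reads the rank and task seed from. In PyTorch, a function of this kind is passed to `torch.utils.data.DataLoader` through its `worker_init_fn` argument.

```python
import random


def compose_worker_seed(task_seed, global_rank, worker_id, num_workers):
    """Hypothetical mixing rule, not the lisbet implementation."""
    # Offsetting by rank * num_workers gives each (rank, worker) pair its own
    # value as long as worker_id < num_workers.
    return (task_seed + global_rank * num_workers + worker_id) % 2**32


def sketch_worker_init(worker_id, task_seed=1000, global_rank=0, num_workers=4):
    # Seed Python's global RNG inside the worker process and return the seed.
    seed = compose_worker_seed(task_seed, global_rank, worker_id, num_workers)
    random.seed(seed)
    return seed


# Two ranks with four workers each.
worker_seeds = [sketch_worker_init(w, global_rank=r) for r in range(2) for w in range(4)]
print(worker_seeds)  # [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007]
```

A real initializer would usually also seed any other random number generators the dataset relies on, such as NumPy's.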