lisbet.training.utils
Utility functions for model training.
Functions
- estimate_num_workers(n_tasks, batch_size[, ...]) – Estimate the optimal number of DataLoader worker processes to use, based on the number of training tasks, the batch size, and the desired batch size per worker.
- generate_seeds(seed, task_ids) – Internal helper.
- worker_init_fn(worker_id) – Worker initialization function for DataLoader.
- lisbet.training.utils.generate_seeds(seed, task_ids)
Internal helper. Generates multiple seeds from a single base seed.
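The reference above does not say how the seeds are derived. As a minimal sketch, assuming one deterministic seed per task ID, a standard-library version could look like the following. `generate_seeds_sketch` is a hypothetical name, and returning a dict keyed by task ID is an assumption, not the documented return type of `generate_seeds`.

```python
import random


def generate_seeds_sketch(seed, task_ids):
    """Hypothetical stand-in for generate_seeds: derive one seed per task ID.

    Assumption: seeds are drawn from a PRNG initialized with the base seed,
    so the same (seed, task_ids) input always yields the same seeds.
    """
    rng = random.Random(seed)  # private generator; the global RNG is left untouched
    # Draw 32-bit values so the seeds are also valid for NumPy-style seeding.
    return {task_id: rng.randrange(2**32) for task_id in task_ids}


seeds = generate_seeds_sketch(42, ["task_a", "task_b"])
print(seeds)
```

In this sketch a task's seed depends on its position in `task_ids`; hashing each task ID together with the base seed would make the seeds independent of ordering.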
- lisbet.training.utils.estimate_num_workers(n_tasks, batch_size, batch_size_per_worker=8)
Estimate the optimal number of DataLoader worker processes to use, based on the number of training tasks, the batch size, and the desired batch size per worker.
- Parameters:
n_tasks (int) – The number of training tasks (e.g., datasets or splits) being processed.
batch_size (int) – The total batch size used for loading data.
batch_size_per_worker (int, optional) – The target batch size to be handled by each worker process (default: 8).
- Returns:
num_workers – The estimated number of DataLoader worker processes to use.
- Return type:
int
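The exact formula behind `estimate_num_workers` is not documented here. The sketch below shows one plausible heuristic under stated assumptions: each task is loaded with its own batch of `batch_size` samples, one worker is allocated per `batch_size_per_worker` samples, and the result is clamped to the number of CPUs. `estimate_num_workers_sketch` and its `max_workers` parameter are hypothetical and not part of the documented signature.

```python
import math
import os


def estimate_num_workers_sketch(n_tasks, batch_size, batch_size_per_worker=8, max_workers=None):
    """Hypothetical heuristic: one worker per `batch_size_per_worker` samples
    across all tasks, clamped to the range [1, max_workers].

    Assumption: the real lisbet formula may differ; `max_workers` is an
    illustrative extra parameter that defaults to the CPU count.
    """
    if max_workers is None:
        max_workers = os.cpu_count() or 1
    samples_per_step = n_tasks * batch_size  # assumes one batch per task per step
    wanted = math.ceil(samples_per_step / batch_size_per_worker)
    return max(1, min(wanted, max_workers))


print(estimate_num_workers_sketch(n_tasks=2, batch_size=32, max_workers=16))  # → 8
```

The estimate would then be passed as `num_workers` to `torch.utils.data.DataLoader`. Clamping to the CPU count avoids spawning more processes than the machine can run in parallel, and the lower bound of 1 keeps loading in a background worker even for tiny batches.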
- lisbet.training.utils.worker_init_fn(worker_id)
Worker initialization function for DataLoader.
This function sets a unique random seed for each DataLoader worker based on the worker ID, the global rank, and the task seed. Giving every worker its own seed prevents workers, including those on different ranks in distributed training, from producing identical random streams, which would otherwise duplicate data shuffling and augmentation across workers.
- Parameters:
worker_id (int) – The ID of the worker being initialized. This is typically an integer in the range [0, num_workers - 1].
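The reference does not specify how the worker ID, global rank, and task seed are combined. A minimal sketch, assuming a simple offset scheme that reserves `num_workers` consecutive seeds per rank, is shown below. `compute_worker_seed` and `worker_init_fn_sketch` are hypothetical names, and passing the task seed, rank, and worker count as keyword arguments is an assumption; the documented `worker_init_fn` receives only `worker_id`.

```python
import random


def compute_worker_seed(task_seed, global_rank, worker_id, num_workers):
    """Hypothetical seed scheme: a distinct offset for every (rank, worker) pair."""
    # Workers on rank r use offsets r*num_workers ... r*num_workers + num_workers - 1,
    # so no two workers across ranks share a seed for the same task_seed.
    return (task_seed + global_rank * num_workers + worker_id) % 2**32


def worker_init_fn_sketch(worker_id, task_seed=0, global_rank=0, num_workers=4):
    """Hypothetical worker initializer; only worker_id comes from DataLoader.

    Assumption: the real function obtains the task seed and global rank from
    its environment; here they are explicit keyword arguments instead.
    """
    seed = compute_worker_seed(task_seed, global_rank, worker_id, num_workers)
    random.seed(seed)  # a real worker would typically also seed NumPy and PyTorch
    return seed  # returned only for inspection; DataLoader ignores the return value


seeds = [worker_init_fn_sketch(w, task_seed=1234, global_rank=r) for r in range(2) for w in range(4)]
print(len(set(seeds)))  # → 8
```

Because DataLoader calls `worker_init_fn` with the worker ID only, the extra arguments in this sketch could be bound with `functools.partial` before passing the function as `DataLoader(..., worker_init_fn=...)`.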