lisbet.training.utils

Utility functions for model training.

Functions

estimate_num_workers(n_tasks, batch_size[, ...])

Estimate the optimal number of DataLoader worker processes to use, based on the number of training tasks, the batch size, and the desired batch size per worker.

generate_seeds(seed, task_ids)

Internal helper.

worker_init_fn(worker_id)

Worker initialization function for DataLoader.

lisbet.training.utils.generate_seeds(seed, task_ids)

Internal helper. Generates multiple seeds from the base seed.

lisbet.training.utils.estimate_num_workers(n_tasks, batch_size, batch_size_per_worker=8)

Estimate the optimal number of DataLoader worker processes to use, based on the number of training tasks, the batch size, and the desired batch size per worker.

Parameters:
  • n_tasks (int) – The number of training tasks (e.g., datasets or splits) being processed.

  • batch_size (int) – The total batch size used for loading data.

  • batch_size_per_worker (int, optional) – The target batch size to be handled by each worker process (default: 8).

Returns:

num_workers – The estimated number of DataLoader worker processes to use.

Return type:

int
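
The page does not state the formula behind the estimate, so the following is only a minimal sketch of one plausible heuristic, not the LISBET implementation. The function name `sketch_num_workers`, the `cpu_count` argument, and the assumption that CPU cores are shared evenly across tasks are all hypothetical.

```python
import math
import os


def sketch_num_workers(n_tasks, batch_size, batch_size_per_worker=8, cpu_count=None):
    """Hypothetical heuristic for illustration; NOT the LISBET implementation."""
    if n_tasks < 1 or batch_size < 1 or batch_size_per_worker < 1:
        raise ValueError("all arguments must be positive integers")
    if cpu_count is None:
        # os.cpu_count() may return None, so fall back to a single core.
        cpu_count = os.cpu_count() or 1
    # Workers needed so that each one handles about batch_size_per_worker samples.
    wanted = math.ceil(batch_size / batch_size_per_worker)
    # Assumption: each task runs its own DataLoader, so cores are split across tasks.
    budget = max(1, cpu_count // n_tasks)
    return min(wanted, budget)


print(sketch_num_workers(n_tasks=2, batch_size=32, cpu_count=16))   # 4
print(sketch_num_workers(n_tasks=4, batch_size=128, cpu_count=8))   # 2
```

In the first call the batch size is the limiting factor (32 / 8 = 4 workers); in the second the per-task core budget is (8 cores / 4 tasks = 2 workers).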

lisbet.training.utils.worker_init_fn(worker_id)

Worker initialization function for DataLoader.

This function sets a unique random seed for each DataLoader worker based on the worker ID, global rank, and task seed. This ensures that each worker operates with a different random seed, which is crucial for data shuffling and augmentation in distributed training scenarios.

Parameters:
  • worker_id (int) – The ID of the worker being initialized. This is typically an integer in the range [0, num_workers - 1].
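
How the worker ID, global rank, and task seed are combined is not specified here. The sketch below shows one possible way to derive a distinct seed per worker; it is an illustration under stated assumptions, not the library's code. The names `derive_worker_seed` and `sketch_worker_init_fn`, the explicit `task_seed` and `global_rank` arguments, and the hash-based mixing are hypothetical.

```python
import functools
import hashlib
import random


def derive_worker_seed(task_seed, global_rank, worker_id):
    """Hypothetical mixing scheme: hash the three values into a 32-bit seed."""
    key = f"{task_seed}:{global_rank}:{worker_id}".encode()
    digest = hashlib.sha256(key).digest()
    # Keep the first 4 bytes so the seed fits in an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")


def sketch_worker_init_fn(worker_id, task_seed=0, global_rank=0):
    """Seed Python's `random` module for one worker (illustration only)."""
    seed = derive_worker_seed(task_seed, global_rank, worker_id)
    random.seed(seed)
    return seed


# A DataLoader calls worker_init_fn with the worker ID as its only argument,
# so the remaining values are bound in advance with functools.partial.
init_fn = functools.partial(sketch_worker_init_fn, task_seed=42, global_rank=0)
seeds = [init_fn(worker_id) for worker_id in range(4)]
print(seeds)
```

A callable such as `init_fn` could then be passed to a PyTorch DataLoader through its `worker_init_fn` argument. A real initializer would usually also seed NumPy and PyTorch, which this sketch leaves out to stay dependency-free.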