ShmDataLoader#

class fkat.data.ShmDataLoader(seed: int, dataloader_factory: DataLoaderFactory[T_co], num_microbatch_prefetches: int = -1, dp_rank: int = 0, profiler: Optional[Profiler] = None, device: Optional[device] = None, multiprocessing: bool = True)[source]#

Bases: Iterable[list[T_co]]

A DataLoader that uses shared memory to efficiently manage and prefetch data batches.

Enables double-buffered micro-batch processing and fetching that overlaps with model forward/backward passes, minimizing dataloading overhead.
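The double-buffered overlap described above can be sketched in plain Python (this is an illustration of the general technique, not fkat's actual shared-memory implementation; `prefetching_loader` and its parameters are assumptions):

```python
import queue
import threading
import time


def prefetching_loader(batches, depth=2):
    """Yield batches while a background thread stays up to `depth` ahead."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def producer():
        for b in batches:
            # Simulated fetch cost; in practice this overlaps with the
            # consumer's forward/backward work instead of blocking it.
            time.sleep(0.01)
            q.put(b)
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not sentinel:
        yield item


if __name__ == "__main__":
    out = list(prefetching_loader([[i, i + 1] for i in range(4)]))
    print(out)  # [[0, 1], [1, 2], [2, 3], [3, 4]]
```

The bounded queue is what makes this "double-buffered": with `depth=2` the producer may run at most two batches ahead, capping memory while keeping the consumer fed.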

Parameters:
  • seed (int) – Random seed for reproducibility. Use ${seed} at top level in config.yaml.

  • dataloader_factory (DataLoaderFactory[T_co]) – Factory for creating DataLoaders.

  • num_microbatch_prefetches (int, optional) – Number of microbatches to prefetch. Defaults to -1.

  • dp_rank (int, optional) – Data-parallel rank of the current process. Defaults to 0.

  • profiler (Optional[Profiler], optional) – Profiler used to instrument dataloading. Defaults to None.

  • device (Optional[torch.device], optional) – Device to move prefetched microbatches to in the background. Defaults to None.

  • multiprocessing (bool, optional) – Whether to instantiate the DataLoader in a separate process. Defaults to True, which relieves pressure on the training process; set False for debugging and profiling.

load_batch() None[source]#
load_batch_sync() list[+T_co][source]#
on_exception(exception: BaseException) None[source]#
prefetch() None[source]#
set_device(device: torch.device | None) None[source]#
teardown(*args: Any) None[source]#
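The method names above suggest a prefetch-then-consume loop. The stand-in below is hypothetical (`ToyShmDataLoader` and its internals are assumptions, not fkat code); it only illustrates the call sequence implied by the signatures, where prefetch() kicks off the next fetch in the background and load_batch_sync() blocks until that microbatch is ready:

```python
import threading
from typing import Any, Optional


class ToyShmDataLoader:
    """Hypothetical stand-in mirroring the documented method names."""

    def __init__(self, batches):
        self._it = iter(batches)
        self._thread: Optional[threading.Thread] = None
        self._next: list[Any] = []

    def prefetch(self) -> None:
        # Start fetching the next microbatch in a background thread.
        def fetch():
            self._next = next(self._it, [])

        self._thread = threading.Thread(target=fetch)
        self._thread.start()

    def load_batch_sync(self) -> list[Any]:
        # Block until the prefetched microbatch is ready, then return it.
        if self._thread is not None:
            self._thread.join()
        return self._next


loader = ToyShmDataLoader([[1, 2], [3, 4]])
loader.prefetch()            # model forward/backward would overlap here
first = loader.load_batch_sync()
loader.prefetch()
second = loader.load_batch_sync()
print(first, second)  # [1, 2] [3, 4]
```

In the real class, the fetch happens through shared memory in a separate process (when multiprocessing=True) rather than a thread, but the prefetch/consume rhythm is the same.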