debugging#

class fkat.pytorch.callbacks.debugging.Introspection(checksums: Optional[set[str]] = None, tensor_stats: Optional[set[str]] = None, env_vars: bool = False, pip_freeze: bool = False, output_path_prefix: Optional[str] = None, schedule: Optional[Schedule] = None)[source]#

on_before_optimizer_step(trainer: Trainer, pl_module: LightningModule, optimizer: Optimizer) → None[source]#

Called before optimizer.step().

on_train_batch_end(trainer: L.Trainer, pl_module: L.LightningModule, outputs: STEP_OUTPUT, batch: Any, batch_idx: int) → None[source]#

Called when the train batch ends.

Note

The value outputs["loss"] here will be the normalized value w.r.t accumulate_grad_batches of the loss returned from training_step.

on_train_batch_start(trainer: Trainer, pl_module: LightningModule, batch: Any, batch_idx: int) → None[source]#

Called when the train batch begins.

on_train_end(trainer: Trainer, pl_module: LightningModule) → None[source]#

Removes the registered hooks at the end of training.

on_train_start(trainer: Trainer, pl_module: LightningModule) → None[source]#

Registers hooks at the start of training so that all gradients, including those from the first step, are captured.
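
As a rough sketch of the underlying PyTorch mechanism (not necessarily this callback's exact implementation), per-parameter gradient hooks can be registered at the start of training and removed again at the end:

    import lightning as L

    class GradientHookSketch(L.Callback):
        # Illustrative only; Introspection's internal hook logic may differ.

        def on_train_start(self, trainer, pl_module):
            self._handles = []
            for name, param in pl_module.named_parameters():
                if param.requires_grad:
                    # Fires every time a gradient is computed for this parameter.
                    self._handles.append(param.register_hook(
                        lambda grad, name=name: print(name, grad.norm().item())
                    ))

        def on_train_end(self, trainer, pl_module):
            # Remove the hooks so they do not outlive training.
            for handle in self._handles:
                handle.remove()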

setup(trainer: Trainer, pl_module: LightningModule, stage: str) → None[source]#

Called when fit, validate, test, predict, or tune begins.
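
A minimal usage sketch for Introspection, assuming an existing LightningModule; the stat names and output path below are placeholders, not values confirmed by this reference:

    import lightning as L

    from fkat.pytorch.callbacks.debugging import Introspection

    # Hypothetical configuration; the placeholder values are assumptions.
    introspection = Introspection(
        tensor_stats={"mean", "std"},             # assumed stat names
        env_vars=True,                            # record environment variables
        pip_freeze=True,                          # record installed packages
        output_path_prefix="/tmp/introspection",  # hypothetical output location
    )

    trainer = L.Trainer(callbacks=[introspection])
    # trainer.fit(model, datamodule=datamodule)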

class fkat.pytorch.callbacks.debugging.OptimizerSnapshot(output_path_prefix: str, schedule: Optional[Schedule] = None)[source]#

Callback that saves optimizer state at specified intervals during training.

This callback allows you to capture the state of optimizers at specific points during training, which can be useful for debugging, analysis, or resuming training from specific optimization states.

Parameters:
  • output_path_prefix (str) – Output path prefix for generated optimizer snapshots.

  • schedule (Optional[Schedule]) – Schedule at which to take a snapshot of the optimizers. Defaults to Never.
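
A minimal usage sketch; the output prefix is a placeholder, and constructing a concrete Schedule is omitted because its API is not documented here:

    import lightning as L

    from fkat.pytorch.callbacks.debugging import OptimizerSnapshot

    # Hypothetical output prefix. The default schedule is Never, so a real
    # run would pass an explicit Schedule to actually capture snapshots.
    snapshot = OptimizerSnapshot(output_path_prefix="/tmp/optimizer-snapshots")

    trainer = L.Trainer(callbacks=[snapshot])
    # trainer.fit(model, datamodule=datamodule)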

on_train_batch_start(trainer: Trainer, pl_module: LightningModule, batch: Any, batch_idx: int) → None[source]#

Called when the train batch begins.