Data augmentation#

Data augmentation can improve model robustness and generalization by introducing variations during training. LISBET supports several augmentation techniques that can be combined and applied with configurable probabilities.

Comparison of data augmentation effects

Visual comparison of data augmentation effects. From left to right: Original sequence, RandomPermutation (individuals) showing identity swapping with color changes, RandomPermutation (space) showing x/y coordinate swapping, and RandomBlockPermutation showing identity swapping within a temporal block.#

Available augmentation techniques#

  • all_perm_id: Randomly permute individual identities across all frames in a window
    • Use this to make the model invariant to individual labels (e.g., “mouse1” vs “mouse2”)

    • Particularly useful for self-supervised tasks where identity labels are arbitrary

    • See the visualization above (second panel) for an example of identity permutation

  • all_perm_ax: Randomly permute spatial axes (x, y, z) across all frames in a window
    • Use this to make the model invariant to coordinate system orientation

    • ⚠️ Important: Only suitable for top-down view datasets (typical laboratory setups)

    • Not recommended for front-view, side-view, or 3D datasets where axes have semantic meaning

    • See the visualization above (third panel) for an example of spatial axis permutation

  • blk_perm_id: Randomly permute individual identities within a contiguous block of frames
    • Creates temporal identity confusion within part of the window

    • More challenging augmentation than all_perm_id

    • Requires frac parameter to specify the nominal block size as a fraction of the window

    • Uses uniform frame probability: every frame in the window has equal chance of being affected, regardless of its position (no boundary bias)

    • Due to boundary handling, the actual number of affected frames may be smaller than frac × window_size when the block overlaps window edges. The expected probability per frame is approximately frac / (1 + frac). For example, with frac=0.3, each frame has ~23% chance of being affected.

    • Note: frac controls the nominal block size, not the expected affected fraction. Even frac=1.0 yields ~50% probability per frame (not 100%), because the block can “hang off” either edge of the window. This is the tradeoff for achieving uniform frame probability.

    • See the visualization above (fourth panel) for an example of block permutation

  • gauss_jitter: Inject sparse Gaussian coordinate noise
    • Per-element Bernoulli(p) over (frame, keypoint, individual) selects positions to jitter

    • Adds zero-mean Gaussian noise with standard deviation sigma (default 0.01)

    • Robustifies against sporadic tracking jitter / keypoint mislocalization

    • Coordinates are clamped to [0, 1] after noise

  • kp_ablation: Randomly set keypoint coordinates to NaN (missing data simulation) accros all frames in a window
    • Select the keypoints to ablate using a Bernoulli(pB) over (keypoint, individual)

    • Sets all spatial coordinates (x, y, z) to NaN for selected elements

    • Simulates missing or occluded keypoints commonly occurring in real tracking data

    • Helps models become robust to incomplete data / tracking failures

    • Recommended p values: 0.01–0.1 depending on desired robustness level

Usage examples#

Note

Use the --data_augmentation= format (with equals sign) to clearly separate the argument from its value. While this makes quoting optional in most shells, it’s still recommended for consistency and to prevent any potential shell interpretation of special characters (commas, colons).

Basic usage with default probability (1.0):

$ betman train_model \
    --data_augmentation="all_perm_id" \
    ... # other parameters

With custom probability:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5" \
    ... # other parameters

This applies identity permutation to 50% of training samples.

Multiple augmentations:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5,all_perm_ax:p=0.7" \
    ... # other parameters

Gaussian jitter (sparse) + permutation:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5,gauss_jitter:p=0.02:sigma=0.01" \
    ... # other parameters

Block jitter (bursts) example:

$ betman train_model \
    --data_augmentation="blk_gauss_jitter:p=0.05:sigma=0.02:frac=0.1" \
    ... # other parameters

Combined full pipeline:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5,blk_perm_id:p=0.3:frac=0.2,gauss_jitter:p=0.02:sigma=0.01,blk_gauss_jitter:p=0.05:sigma=0.02:frac=0.1" \
    ... # other parameters

Keypoint ablation (sparse missing data):

$ betman train_model \
    --data_augmentation="kp_ablation:p=0.05:pB=0.01" \
    ... # other parameters

This randomly sets 5% of keypoint coordinates to NaN (missing), simulating sporadic occlusions.

Combined augmentation pipeline with ablation:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5,kp_ablation:p=0.03:pB=0.01" \
    ... # other parameters

Block permutation with fraction:

$ betman train_model \
    --data_augmentation="blk_perm_id:p=0.3:frac=0.2" \
    ... # other parameters

This applies identity permutation to a random 20% block of frames, with 30% probability.

Combined augmentations for top-down view datasets:

$ betman train_model \
    -v \
    --task_ids=cons,order,shift,warp \
    --data_augmentation="all_perm_id:p=0.5,all_perm_ax:p=0.7,blk_perm_id:p=0.3:frac=0.2" \
    --data_format=movement \
    --run_id=lisbet64x8-calms21U-aug \
    --seed=1234 \
    --epochs=100 \
    --emb_dim=64 \
    --num_layers=8 \
    --num_heads=8 \
    --hidden_dim=256 \
    --train_sample=0.05 \
    --save_history \
    datasets/CalMS21/unlabeled_videos

Important considerations#

  • View-dependent augmentations: The all_perm_ax augmentation assumes symmetry across spatial axes and should only be used for top-down view datasets common in laboratory mouse experiments. For human datasets or non-overhead camera angles, this augmentation may hurt performance as axes have semantic meaning (e.g., up/down gravity, left/right lateral).

  • Task compatibility: Identity permutations (all_perm_id, blk_perm_id) are most beneficial for self-supervised tasks and datasets where individual identities are interchangeable.

  • Probability tuning: Start with moderate probabilities (0.3-0.7) and adjust based on validation performance. Higher probabilities increase variability but may make training less stable.
    • For jitter augmentations, recommended initial values: gauss_jitter p≈0.01–0.05, blk_gauss_jitter p≈0.02 with frac≈0.05–0.15.

    • Increase sigma gradually (e.g., 0.005 → 0.02) monitoring degradation in dev metrics.

    • For ablation augmentations, recommended initial values: kp_ablation p≈0.01–0.05 with pB≈0.01–0.02.

    • Higher ablation rates train models that are more robust to missing data but may reduce performance on clean data.

  • Computational cost: Augmentations are applied on-the-fly during training and add minimal overhead. Block permutations (blk_perm_id) are slightly more expensive than full permutations.
    • Jitter augmentations add negligible overhead (vectorized operations). Window jitter scales linearly with number of sampled starts; low p keeps cost minimal.

    • Ablation augmentations are extremely efficient (simple masking operations with negligible overhead).