Data augmentation

Data augmentation can improve model robustness and generalization by introducing variations during training. LISBET supports several augmentation techniques that can be combined and applied with configurable probabilities.

Python container support

The pose augmentations automatically accept either an xarray.Dataset or a NumPy pose array; there is no transform-specific engine option. NumPy input must use the canonical shape (time, individuals, keypoints, space), and each augmentation returns the same container type that it received. For xarray input, coordinate labels are preserved (or permuted together with their data for a full coordinate permutation). PoseToTensor accepts both representations and produces a float32 tensor with shape (time, features).

Raw NumPy arrays do not contain coordinate labels or metadata. Consequently, visualization with PoseToVideo remains xarray-only, and custom NumPy transforms must use the canonical axis order rather than looking up coordinate names.
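
As a minimal sketch of what working with the canonical axis order looks like, the snippet below addresses a keypoint by position and flattens a pose window to (time, features). The helper name `flatten_pose` and the feature ordering (individuals, then keypoints, then space) are assumptions made for illustration; PoseToTensor may order features differently.

```python
import numpy as np

# Canonical axis order for NumPy pose arrays, as stated above.
TIME, INDIVIDUALS, KEYPOINTS, SPACE = 0, 1, 2, 3


def flatten_pose(pose: np.ndarray) -> np.ndarray:
    """Flatten (time, individuals, keypoints, space) to (time, features).

    Hypothetical helper: the resulting feature order is an assumption,
    not necessarily the one used by PoseToTensor.
    """
    if pose.ndim != 4:
        raise ValueError("expected shape (time, individuals, keypoints, space)")
    return pose.reshape(pose.shape[TIME], -1).astype(np.float32)


# Toy window: 5 frames, 2 individuals, 3 keypoints, 2D coordinates.
pose = np.random.default_rng(0).random((5, 2, 3, 2))

# Without coordinate labels, a keypoint is addressed by index, not by name.
first_keypoint = pose[:, :, 0, :]  # shape (5, 2, 2)
features = flatten_pose(pose)      # shape (5, 12), dtype float32
```

With an xarray.Dataset the same selection could be made by coordinate name; with a raw array the caller has to know which index corresponds to which keypoint.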

Figure: Visual comparison of data augmentation effects. From left to right: original sequence, RandomPermutation (individuals) showing identity swapping with color changes, RandomPermutation (space) showing x/y coordinate swapping, and RandomBlockPermutation showing identity swapping within a temporal block.

Available augmentation techniques

For every technique, p is the probability of applying the entire augmentation to a training sample.

all_perm_id: Randomly permute individual identities across all frames in a window
- Use this to make the model invariant to individual labels (e.g., “mouse1” vs “mouse2”)
- Particularly useful for self-supervised tasks where identity labels are arbitrary
- See the visualization above (second panel) for an example of identity permutation
all_perm_ax: Randomly permute spatial axes (x, y, z) across all frames in a window
- Use this to make the model invariant to coordinate system orientation
- ⚠️ Important: Only suitable for top-down view datasets (typical laboratory setups)
- Not recommended for front-view, side-view, or 3D datasets where axes have semantic meaning
- See the visualization above (third panel) for an example of spatial axis permutation
blk_perm_id: Randomly permute individual identities within a contiguous block of frames
- Creates temporal identity confusion within part of the window
- More challenging augmentation than all_perm_id
- Requires frac parameter to specify the nominal block size as a fraction of the window
- Uses uniform frame probability: every frame in the window has an equal chance of being affected, regardless of its position (no boundary bias)
- Due to boundary handling, the actual number of affected frames may be smaller than frac × window_size when the block overlaps a window edge. The expected probability per frame is approximately frac / (1 + frac). For example, with frac=0.3, each frame has a ~23% chance of being affected (0.3 / 1.3 ≈ 0.23).
- Note: frac controls the nominal block size, not the expected affected fraction. Even frac=1.0 yields ~50% probability per frame (not 100%), because the block can “hang off” either edge of the window. This is the tradeoff for achieving uniform frame probability.
- See the visualization above (fourth panel) for an example of block permutation
gauss_jitter: Add Gaussian coordinate noise across the full window
- When applied, adds independent noise to every coordinate
- Adds zero-mean Gaussian noise with standard deviation sigma (default 0.01)
- Improves robustness to tracking jitter and keypoint mislocalization
- Coordinates are clamped to [0, 1] after noise
kp_ablation: Set selected keypoints to zero across the full window
- Independently selects each (keypoint, individual) pair using Bernoulli(pB)
- Sets every spatial coordinate (x, y, z) to zero for the selected pairs in every frame
- Simulates missing or occluded keypoints commonly occurring in real tracking data
- Helps models become robust to incomplete data / tracking failures
- Tune pB according to the desired ablation rate
rotation: Randomly rotate all keypoint coordinates consistently across a window
- Rotates around the center of the normalized [0, 1] coordinate space
- Supports 2D and 3D pose data
- max_angle controls the sampled range from -max_angle to +max_angle
- Post-rotation normalization can truncate, rescale, or retain out-of-range values
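
The frac / (1 + frac) figure quoted for blk_perm_id can be checked by exact enumeration. The sketch below assumes one particular sampling scheme that is consistent with the description above but is not confirmed by it: the block has nominal length round(frac × window_size), and its start index is drawn uniformly from every position at which the block overlaps the window by at least one frame. The function name `frame_hit_probabilities` is hypothetical.

```python
from fractions import Fraction


def frame_hit_probabilities(window_size: int, frac: float) -> list[Fraction]:
    """Exact per-frame probability of falling inside the permuted block.

    Assumed scheme (not taken from the LISBET source): the block start is
    uniform over all offsets that leave at least one frame inside the
    window, so the block may hang off either edge.
    """
    block = max(1, round(frac * window_size))
    starts = range(-(block - 1), window_size)  # window_size + block - 1 offsets
    hits = [0] * window_size
    for start in starts:
        # Only the part of the block that lies inside the window counts.
        for frame in range(max(start, 0), min(start + block, window_size)):
            hits[frame] += 1
    return [Fraction(h, len(starts)) for h in hits]


probs = frame_hit_probabilities(window_size=200, frac=0.3)
uniform = len(set(probs)) == 1           # same probability for every frame
print(uniform, float(probs[0]), 0.3 / 1.3)
```

Under this scheme each frame is covered by exactly `block` of the `window_size + block - 1` possible offsets, which gives block / (window_size + block - 1) and approaches frac / (1 + frac) for long windows.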

Usage examples

Note

Use the --data_augmentation= format (with an equals sign) to clearly separate the argument from its value. Although quoting the value is then optional in most shells, it is still recommended for consistency and to keep the shell from interpreting special characters (commas, colons).

Basic usage with default probability (1.0):

$ betman train_model \
    --data_augmentation="all_perm_id" \
    ... # other parameters

With custom probability:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5" \
    ... # other parameters

This applies identity permutation to 50% of training samples.
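
To make the role of p concrete, here is a rough sketch of a window-level identity permutation on a NumPy array in the canonical shape (time, individuals, keypoints, space). The function `maybe_permute_identities` is a hypothetical illustration, not LISBET's implementation.

```python
import numpy as np


def maybe_permute_identities(pose, p, rng):
    """With probability p, reorder the individuals axis for the whole window.

    Hypothetical helper for illustration; LISBET's implementation may differ.
    """
    if rng.random() >= p:
        return pose  # sample left untouched
    order = rng.permutation(pose.shape[1])  # one ordering shared by all frames
    return pose[:, order]


rng = np.random.default_rng(1234)
pose = rng.random((200, 2, 7, 2))  # toy window: 200 frames, 2 individuals
out = maybe_permute_identities(pose, p=0.5, rng=rng)
```

Note that a randomly drawn ordering can coincide with the original one, so in this sketch the fraction of samples that visibly change is lower than p, especially with only two individuals.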

Multiple augmentations:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5,all_perm_ax:p=0.7" \
    ... # other parameters

Gaussian jitter + permutation:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5,gauss_jitter:p=0.02:sigma=0.01" \
    ... # other parameters
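
For intuition about what sigma controls, the following sketch adds independent zero-mean Gaussian noise to every coordinate and clamps the result to [0, 1], as described for gauss_jitter. It covers only the noise step (not the window-level p), and `gaussian_jitter` is a hypothetical name rather than a LISBET function.

```python
import numpy as np


def gaussian_jitter(pose, sigma=0.01, rng=None):
    """Add independent zero-mean Gaussian noise, then clamp to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = pose + rng.normal(loc=0.0, scale=sigma, size=pose.shape)
    return np.clip(noisy, 0.0, 1.0)


rng = np.random.default_rng(0)
pose = rng.random((200, 2, 7, 2))  # normalized coordinates in [0, 1)
noisy = gaussian_jitter(pose, sigma=0.01, rng=rng)
```

Because coordinates are normalized, sigma=0.01 corresponds to a typical displacement of about 1% of the coordinate range per axis.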

Combined full pipeline:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5,blk_perm_id:p=0.3:frac=0.2,gauss_jitter:p=0.02:sigma=0.01" \
    ... # other parameters
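
The specification strings used in these examples follow a regular pattern: augmentations are separated by commas, and each one is a name followed by colon-separated key=value options. The parser below is a hypothetical sketch of that pattern for readers who want to build or validate such strings programmatically; it is not the parser used by betman, and only the default p=1.0 is taken from the text above.

```python
def parse_augmentation_spec(spec: str) -> list[dict]:
    """Split 'name:key=value:...,name:...' into one dict per augmentation.

    Hypothetical parser for illustration only; numeric options are
    assumed to be floats.
    """
    parsed = []
    for item in spec.split(","):
        name, *options = item.strip().split(":")
        params = {"p": 1.0}  # default probability stated in the docs
        for option in options:
            key, value = option.split("=", 1)
            params[key] = float(value)
        parsed.append({"name": name, **params})
    return parsed


spec = "all_perm_id:p=0.5,blk_perm_id:p=0.3:frac=0.2,gauss_jitter:p=0.02:sigma=0.01"
augmentations = parse_augmentation_spec(spec)
for aug in augmentations:
    print(aug)
```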

Keypoint ablation:

$ betman train_model \
    --data_augmentation="kp_ablation:p=0.05:pB=0.01" \
    ... # other parameters

This applies ablation to 5% of training windows. Within each selected window, every (keypoint, individual) pair has a 1% chance of being set to zero for all frames.
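
The pair-level selection can be sketched as follows, assuming a NumPy pose array in the canonical shape. `ablate_keypoints` is a hypothetical helper that covers only the step applied inside a selected window; the window-level p is not modeled here.

```python
import numpy as np


def ablate_keypoints(pose, pB, rng):
    """Zero out randomly selected (individual, keypoint) pairs in every frame."""
    _, n_individuals, n_keypoints, _ = pose.shape
    # One Bernoulli(pB) draw per (individual, keypoint) pair.
    mask = rng.random((n_individuals, n_keypoints)) < pB
    out = pose.copy()
    out[:, mask, :] = 0.0  # all frames, all spatial coordinates
    return out, mask


rng = np.random.default_rng(0)
pose = rng.random((200, 2, 7, 2))
ablated, mask = ablate_keypoints(pose, pB=0.01, rng=rng)
```

Since the two probabilities multiply, the expected fraction of ablated pairs over all training windows in the example above is p × pB = 0.05 × 0.01 = 0.0005.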

Combined augmentation pipeline with ablation:

$ betman train_model \
    --data_augmentation="all_perm_id:p=0.5,kp_ablation:p=0.03:pB=0.01" \
    ... # other parameters

Block permutation with fraction:

$ betman train_model \
    --data_augmentation="blk_perm_id:p=0.3:frac=0.2" \
    ... # other parameters

With 30% probability, this permutes identities within a block whose nominal size is 20% of the window. Boundary clipping can reduce the number of affected frames.

Random rotation:

$ betman train_model \
    --data_augmentation="rotation:p=0.5:max_angle=30" \
    ... # other parameters

This applies a rotation sampled between -30 and +30 degrees to 50% of training windows.
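
A rough 2D sketch of the rotation step is shown below: a single angle is drawn for the whole window and every keypoint is rotated about the center (0.5, 0.5) of the normalized coordinate space. The name `rotate_2d` is hypothetical, the 3D case is not covered, and post-rotation normalization is omitted, so rotated values may fall outside [0, 1].

```python
import numpy as np


def rotate_2d(pose, max_angle, rng):
    """Rotate 2D coordinates about (0.5, 0.5) by one angle for the whole window.

    max_angle is in degrees; the angle is sampled from [-max_angle, +max_angle].
    """
    if pose.shape[-1] != 2:
        raise ValueError("this sketch only handles 2D coordinates")
    angle = np.deg2rad(rng.uniform(-max_angle, max_angle))
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    center = 0.5
    # Coordinates are row vectors, so multiply by the transposed matrix.
    return (pose - center) @ rotation.T + center, angle


rng = np.random.default_rng(0)
pose = rng.random((200, 2, 7, 2))
rotated, angle = rotate_2d(pose, max_angle=30, rng=rng)
```

Using one angle per window keeps the relative geometry between individuals and keypoints intact across frames.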

Combined augmentations for top-down view datasets:

$ betman train_model \
    -v \
    --task_ids=cons,order,shift,warp \
    --data_augmentation="all_perm_id:p=0.5,all_perm_ax:p=0.7,blk_perm_id:p=0.3:frac=0.2" \
    --data_format=movement \
    --run_id=lisbet64x8-calms21U-aug \
    --seed=1234 \
    --epochs=100 \
    --emb_dim=64 \
    --num_layers=8 \
    --num_heads=8 \
    --hidden_dim=256 \
    --train_sample=0.05 \
    --save_history \
    datasets/CalMS21/unlabeled_videos

Important considerations

View-dependent augmentations: The all_perm_ax augmentation assumes symmetry across spatial axes and should only be used for top-down view datasets, which are common in laboratory mouse experiments. For human datasets or non-overhead camera angles, this augmentation may hurt performance because the axes carry semantic meaning (e.g., up/down relative to gravity, left/right lateral).
Task compatibility: Identity permutations (all_perm_id, blk_perm_id) are most beneficial for self-supervised tasks and datasets where individual identities are interchangeable.
Probability tuning: Start with moderate probabilities (0.3-0.7) and adjust based on validation performance. Higher probabilities increase variability but may make training less stable.
- For Gaussian jitter, tune the window-level p and noise scale sigma independently.
- Increase sigma gradually (e.g., 0.005 → 0.02) while monitoring validation metrics for degradation.
- For keypoint ablation, tune the window-level p separately from the pair-level pB.
- Higher ablation rates train models that are more robust to missing data but may reduce performance on clean data.
Computational cost: Augmentations are applied on-the-fly during training and add minimal overhead. Block permutations (blk_perm_id) are slightly more expensive than full permutations.
- Gaussian jitter adds negligible overhead through vectorized operations.
- Ablation augmentations are extremely efficient (simple masking operations with negligible overhead).