lisbet.modeling

PyTorch models and their extensions. The transformer model is based on ViT [1] and its reference implementation in JAX/Flax, available at google-research/vision_transformer.

Notes

[a] Early versions of LISBET used TensorFlow/Keras.

References

[1] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929 [cs]. http://arxiv.org/abs/2010.11929

class lisbet.modeling.FrameClassificationHead(output_token_idx, input_dim, num_classes, hidden_dim=None)

Frame-level classification head.

This head selects a specific token from the sequence (typically the last one) and applies a classification layer to predict frame-level labels.

Parameters:
  • output_token_idx (int) – Index of the token to use for classification (e.g., -1 for last token).

  • input_dim (int) – Dimension of the input embeddings (formerly emb_dim).

  • num_classes (int) – Number of output classes (formerly out_dim).

  • hidden_dim (int | None) – Dimension of the hidden layer. If None, uses a single linear layer. If provided, uses an MLP with the specified hidden dimension.
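The effect of hidden_dim can be sketched without PyTorch. The helper names below (make_linear, apply_linear, build_classifier, classify) are hypothetical, plain lists stand in for tensors, and the ReLU between the two layers is an assumption, since the activation used by the LISBET MLP is not stated here.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Create a dense layer as (weights, bias) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def apply_linear(layer, vec):
    """Compute weights @ vec + bias for one input vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def build_classifier(input_dim, num_classes, hidden_dim=None, seed=0):
    """Return one layer when hidden_dim is None, otherwise two (an MLP)."""
    rng = random.Random(seed)
    if hidden_dim is None:
        return [make_linear(input_dim, num_classes, rng)]
    return [
        make_linear(input_dim, hidden_dim, rng),
        make_linear(hidden_dim, num_classes, rng),
    ]


def classify(layers, vec):
    """Run a vector through the layers, with ReLU between them (assumed)."""
    for i, layer in enumerate(layers):
        vec = apply_linear(layer, vec)
        if i < len(layers) - 1:
            vec = [max(0.0, v) for v in vec]
    return vec


token = [0.5] * 8  # one already-selected token with input_dim = 8
linear_logits = classify(build_classifier(8, 3), token)
mlp_logits = classify(build_classifier(8, 3, hidden_dim=16), token)
```

Both variants map one input_dim-sized token to num_classes logits; only the number of layers in between differs.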

output_token_idx

Index of the token used for classification.

Type:

int

logits

Classification layer (either Linear or MLP).

Type:

nn.Module

__init__(output_token_idx, input_dim, num_classes, hidden_dim=None)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the frame classification head.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, sequence_length, input_dim).

Returns:

Classification logits of shape (batch_size, num_classes).

Return type:

Tensor

get_config()

Get the configuration dictionary for this head.

Returns:

Configuration dictionary containing all parameters needed to recreate this head instance.

Return type:

dict[str, Any]

classmethod from_config(config)

Create a FrameClassificationHead instance from a configuration dictionary.

Parameters:

config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the head instance.

Returns:

New FrameClassificationHead instance created from the configuration.

Return type:

FrameClassificationHead
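The get_config / from_config pair implies a round trip: a head can be rebuilt from the dictionary it reports. A minimal sketch of that pattern follows, using a hypothetical ToyFrameHead class rather than the LISBET implementation. The config key names are assumed to mirror the constructor arguments; the actual keys may differ.

```python
class ToyFrameHead:
    """Hypothetical stand-in for a head; not the LISBET class."""

    def __init__(self, output_token_idx, input_dim, num_classes, hidden_dim=None):
        self.output_token_idx = output_token_idx
        self.input_dim = input_dim
        self.num_classes = num_classes
        self.hidden_dim = hidden_dim

    def get_config(self):
        # Key names mirror the constructor arguments (an assumption).
        return {
            "output_token_idx": self.output_token_idx,
            "input_dim": self.input_dim,
            "num_classes": self.num_classes,
            "hidden_dim": self.hidden_dim,
        }

    @classmethod
    def from_config(cls, config):
        # Rebuild an equivalent instance from the stored parameters.
        return cls(**config)


head = ToyFrameHead(output_token_idx=-1, input_dim=32, num_classes=4)
clone = ToyFrameHead.from_config(head.get_config())
```

Note that a round trip of this kind restores hyperparameters only; learned weights would have to be saved and loaded separately.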

class lisbet.modeling.WindowClassificationHead(input_dim, num_classes, hidden_dim=None)

Window-level classification head.

This head performs global max pooling over the sequence dimension and applies a classification layer to predict window-level labels.

Parameters:
  • input_dim (int) – Dimension of the input embeddings (formerly emb_dim).

  • num_classes (int) – Number of output classes (formerly out_dim).

  • hidden_dim (int | None) – Dimension of the hidden layer. If None, uses a single linear layer. If provided, uses an MLP with the specified hidden dimension.

logits

Classification layer (either Linear or MLP).

Type:

nn.Module

__init__(input_dim, num_classes, hidden_dim=None)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the window classification head.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, sequence_length, input_dim).

Returns:

Classification logits of shape (batch_size, num_classes).

Return type:

Tensor
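Global max pooling over the sequence dimension keeps, for each feature, the largest value seen anywhere in the window. The sketch below shows that reduction on nested lists standing in for a (batch_size, sequence_length, input_dim) tensor; global_max_pool is a hypothetical helper, and the classification layer that follows the pooling is omitted.

```python
def global_max_pool(window):
    """Reduce (sequence_length, input_dim) to (input_dim,) by per-feature max."""
    return [max(feature_values) for feature_values in zip(*window)]


# batch_size = 2, sequence_length = 3, input_dim = 2
batch = [
    [[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]],
    [[-1.0, 0.0], [-0.5, -2.0], [-3.0, -0.1]],
]
pooled = [global_max_pool(window) for window in batch]
print(pooled)  # [[0.7, 0.9], [-0.5, 0.0]]
```

The pooled result has shape (batch_size, input_dim), so the sequence length no longer appears in the output.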

get_config()

Get the configuration dictionary for this head.

Returns:

Configuration dictionary containing all parameters needed to recreate this head instance.

Return type:

dict[str, Any]

classmethod from_config(config)

Create a WindowClassificationHead instance from a configuration dictionary.

Parameters:

config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the head instance.

Returns:

New WindowClassificationHead instance created from the configuration.

Return type:

WindowClassificationHead

class lisbet.modeling.EmbeddingHead(output_token_idx)

Embedding head for extracting behavior embeddings.

This head selects a specific token from the sequence (typically the last one) and returns it as the behavior embedding without any additional transformation.

Parameters:

output_token_idx (int) – Index of the token to use for embedding extraction (e.g., -1 for last token).

output_token_idx

Index of the token used for embedding extraction.

Type:

int

__init__(output_token_idx)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the embedding head.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, sequence_length, embedding_dim).

Returns:

Embedding tensor of shape (batch_size, embedding_dim).

Return type:

Tensor
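Token selection reduces (batch_size, sequence_length, embedding_dim) to (batch_size, embedding_dim) by keeping one position per sequence. The sketch below illustrates this with nested lists and a hypothetical select_token helper; with a PyTorch tensor x, the equivalent indexing would be x[:, output_token_idx].

```python
def select_token(batch, output_token_idx):
    """Pick one token per sequence: (batch, seq, dim) -> (batch, dim)."""
    return [sequence[output_token_idx] for sequence in batch]


# batch_size = 2, sequence_length = 3, embedding_dim = 2
batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
]
last = select_token(batch, -1)  # last token of each sequence
first = select_token(batch, 0)  # first token of each sequence
```

The selected vectors are returned unchanged, which matches the description of the embedding head applying no additional transformation.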

get_config()

Get the configuration dictionary for this head.

Returns:

Configuration dictionary containing all parameters needed to recreate this head instance.

Return type:

dict[str, Any]

classmethod from_config(config)

Create an EmbeddingHead instance from a configuration dictionary.

Parameters:

config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the head instance.

Returns:

New EmbeddingHead instance created from the configuration.

Return type:

EmbeddingHead

lisbet.modeling.model_info(model_path)

Print information about a LISBET model config file.

class lisbet.modeling.MultiTaskModel(backbone, task_heads, model_id='lisbet_model')

Multi-task model that combines a backbone with multiple task-specific heads.

This model enables training and inference across multiple tasks using a shared backbone representation. Each task has its own dedicated head that processes the backbone output.

Parameters:
  • backbone (BackboneInterface) – The backbone model that processes input sequences and produces shared representations.

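  • task_heads (dict[str, Module]) – Dictionary mapping task IDs to their corresponding task-specific heads.

  • model_id (str) – Identifier for the model instance, useful for logging or saving. Defaults to 'lisbet_model'.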

backbone

The shared backbone model.

Type:

BackboneInterface

task_heads

Dictionary of task-specific heads.

Type:

nn.ModuleDict

model_id

Identifier for the model instance, useful for logging or saving. Defaults to “lisbet_model”.

Type:

str

__init__(backbone, task_heads, model_id='lisbet_model')

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x, task_id)

Forward pass through the model for a specific task.

Parameters:
  • x (Tensor) – Input tensor of shape (batch_size, sequence_length, input_dim).

  • task_id (str) – Identifier for the task to use. Must be a key in task_heads.

Returns:

Task-specific output tensor. Shape depends on the specific task head.

Return type:

Tensor

Raises:

KeyError – If task_id is not found in the available task heads.
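The routing described for forward can be pictured as a dictionary lookup followed by two calls: the shared backbone, then the head registered under task_id. The sketch below uses a hypothetical ToyMultiTask class with plain callables in place of modules; the task names "embed" and "pool" are illustrative only.

```python
class ToyMultiTask:
    """Hypothetical stand-in showing per-task routing; not the LISBET class."""

    def __init__(self, backbone, task_heads):
        self.backbone = backbone
        self.task_heads = dict(task_heads)

    def get_task_ids(self):
        return list(self.task_heads)

    def forward(self, x, task_id):
        head = self.task_heads[task_id]  # raises KeyError for unknown tasks
        return head(self.backbone(x))


def double_backbone(sequence):
    """Toy shared representation: double every feature of every frame."""
    return [[2 * value for value in frame] for frame in sequence]


model = ToyMultiTask(
    backbone=double_backbone,
    task_heads={
        "embed": lambda h: h[-1],                           # last token
        "pool": lambda h: [max(col) for col in zip(*h)],    # per-feature max
    },
)

x = [[1, 5], [4, 2]]  # one sequence: 2 frames, 2 features
embedding = model.forward(x, "embed")
pooled = model.forward(x, "pool")
```

Because every head consumes the same backbone output, adding a task only requires registering another entry in task_heads.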

get_task_ids()

Get the list of available task IDs.

Returns:

List of task IDs that can be used with this model.

Return type:

list[str]

get_config()

Get the configuration dictionary for this model.

Returns:

Configuration dictionary containing backbone config and task head configs.

Return type:

dict[str, Any]

classmethod from_config(config, backbone_registry=None, head_registry=None)

Create a MultiTaskModel instance from a configuration dictionary.

Parameters:
  • config (dict[str, Any]) – Configuration dictionary containing backbone and task head configs.

  • backbone_registry (dict[str, type] | None) – Registry mapping backbone type names to their classes. If None, uses a default registry.

  • head_registry (dict[str, type] | None) – Registry mapping head type names to their classes. If None, uses a default registry.

Returns:

New MultiTaskModel instance created from the configuration.

Return type:

MultiTaskModel

Raises:

ValueError – If backbone or head types are not found in the registries.
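Registry-based reconstruction can be sketched as a lookup from type name to class, with a ValueError for names that are not registered. Everything below is hypothetical: the ToyBackbone and ToyHead classes, the build_from_config function, and in particular the config layout with "type" and "params" keys, which is an assumption and not the documented LISBET format.

```python
class ToyBackbone:
    """Hypothetical backbone stand-in that only stores its arguments."""

    def __init__(self, feature_dim, embedding_dim):
        self.feature_dim = feature_dim
        self.embedding_dim = embedding_dim


class ToyHead:
    """Hypothetical head stand-in that only stores its arguments."""

    def __init__(self, input_dim, num_classes):
        self.input_dim = input_dim
        self.num_classes = num_classes


DEFAULT_BACKBONES = {"toy_backbone": ToyBackbone}
DEFAULT_HEADS = {"toy_head": ToyHead}


def lookup(registry, type_name, kind):
    """Return the registered class or raise ValueError for unknown names."""
    if type_name not in registry:
        raise ValueError(f"Unknown {kind} type: {type_name!r}")
    return registry[type_name]


def build_from_config(config, backbone_registry=None, head_registry=None):
    """Rebuild a backbone and its task heads from a nested config dict."""
    if backbone_registry is None:
        backbone_registry = DEFAULT_BACKBONES
    if head_registry is None:
        head_registry = DEFAULT_HEADS

    spec = config["backbone"]
    backbone = lookup(backbone_registry, spec["type"], "backbone")(**spec["params"])
    task_heads = {
        task_id: lookup(head_registry, head_spec["type"], "head")(**head_spec["params"])
        for task_id, head_spec in config["task_heads"].items()
    }
    return backbone, task_heads


config = {
    "backbone": {"type": "toy_backbone", "params": {"feature_dim": 6, "embedding_dim": 32}},
    "task_heads": {
        "task_a": {"type": "toy_head", "params": {"input_dim": 32, "num_classes": 4}},
    },
}
backbone, task_heads = build_from_config(config)
```

Passing a custom registry is what would let user-defined backbone or head classes be restored from a saved configuration.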

class lisbet.modeling.LSTMBackbone(feature_dim, embedding_dim, hidden_dim, num_layers)

LSTM backbone for sequence modeling.

Parameters:
  • feature_dim (int) – Dimension of the input features.

  • embedding_dim (int) – Dimension of the output embeddings.

  • hidden_dim (int) – Dimension of the LSTM hidden state.

  • num_layers (int) – Number of LSTM layers.

__init__(feature_dim, embedding_dim, hidden_dim, num_layers)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the LSTM backbone.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, sequence_length, feature_dim).

Returns:

Output tensor of shape (batch_size, sequence_length, embedding_dim).

Return type:

Tensor

get_config()

Get the configuration dictionary for this backbone.

Returns:

Configuration dictionary containing all parameters needed to recreate this backbone instance.

Return type:

dict[str, Any]

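classmethod from_config(config)

Create an LSTMBackbone instance from a configuration dictionary.

Parameters:

config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the backbone instance.

Returns:

New LSTMBackbone instance created from the configuration.

Return type:

LSTMBackbone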

class lisbet.modeling.TransformerBackbone(feature_dim, embedding_dim, hidden_dim, num_heads, num_layers, max_length)

Transformer backbone for sequence modeling.

A transformer-based backbone that processes input sequences using self-attention mechanisms. The backbone includes frame embedding, positional embedding, transformer encoder layers, and layer normalization.

Parameters:
  • feature_dim (int) – Dimension of the input features.

  • embedding_dim (int) – Dimension of the output embeddings.

  • hidden_dim (int) – Dimension of the feedforward network inside transformer layers.

  • num_heads (int) – Number of attention heads in the multi-head attention mechanism.

  • num_layers (int) – Number of transformer encoder layers.

  • max_length (int) – Maximum sequence length for positional embeddings.
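One way to read max_length is as the number of rows in a positional table, of which only the first sequence_length rows are added to the frame embeddings. The sketch below shows that step alone, on nested lists; it omits the frame embedder, the encoder layers, and the final layer normalization. The add_positional helper is hypothetical, and both the additive scheme (as in ViT) and the ValueError for over-long sequences are assumptions about PosEmbedding, not documented behavior.

```python
def add_positional(frames, pos_table):
    """Add the first len(frames) rows of pos_table to the frame embeddings."""
    if len(frames) > len(pos_table):
        # Assumed behavior: reject sequences longer than max_length.
        raise ValueError("sequence is longer than max_length")
    return [
        [f + p for f, p in zip(frame, pos)]
        for frame, pos in zip(frames, pos_table)
    ]


max_length, embedding_dim = 4, 2
# Toy positional table: row i holds the value 10 * i in every dimension.
pos_table = [[10 * i] * embedding_dim for i in range(max_length)]

frames = [[1, 2], [3, 4], [5, 6]]  # sequence_length = 3 <= max_length
encoded = add_positional(frames, pos_table)
print(encoded)  # [[1, 2], [13, 14], [25, 26]]
```

Identical frames at different positions receive different offsets, which is what lets the attention layers distinguish frame order.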

frame_embedder

Linear layer for embedding input frames.

Type:

nn.Linear

pos_embedder

Positional embedding module.

Type:

PosEmbedding

transformer_encoder

Stack of transformer encoder layers.

Type:

nn.TransformerEncoder

layer_norm

Layer normalization applied to the output.

Type:

nn.LayerNorm

__init__(feature_dim, embedding_dim, hidden_dim, num_heads, num_layers, max_length)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the transformer backbone.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, sequence_length, feature_dim).

Returns:

Output tensor of shape (batch_size, sequence_length, embedding_dim).

Return type:

Tensor

get_config()

Get the configuration dictionary for this backbone.

Returns:

Configuration dictionary containing all parameters needed to recreate this backbone instance.

Return type:

dict[str, Any]

classmethod from_config(config)

Create a TransformerBackbone instance from a configuration dictionary.

Parameters:

config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the backbone instance.

Returns:

New TransformerBackbone instance created from the configuration.

Return type:

TransformerBackbone

Modules

  • backbones

  • factory – Model factory utilities for LISBET.

  • heads

  • info

  • models – Multi-task model for different tasks.

  • modules_extra – Extra modules for various tasks, such as positional embedding and MLP.