lisbet.modeling

PyTorch models and their extensions. The transformer model is based on ViT [1] and its reference implementation in JAX/Flax, available at google-research/vision_transformer.

Notes

[a] Early versions of LISBET used TensorFlow/Keras.

References

[1] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929 [cs]. http://arxiv.org/abs/2010.11929

class lisbet.modeling.FrameClassificationHead(output_token_idx, input_dim, num_classes, hidden_dim=None)

Frame-level classification head.

This head selects a specific token from the sequence (typically the last one) and applies a classification layer to predict frame-level labels.

Parameters:

output_token_idx (int) – Index of the token to use for classification (e.g., -1 for last token).
input_dim (int) – Dimension of the input embeddings (formerly emb_dim).
num_classes (int) – Number of output classes (formerly out_dim).
hidden_dim (int | None) – Dimension of the hidden layer. If None, uses a single linear layer. If provided, uses an MLP with the specified hidden dimension.

output_token_idx

Index of the token used for classification.

Type:: int

logits

Classification layer (either Linear or MLP).

Type:: nn.Module

__init__(output_token_idx, input_dim, num_classes, hidden_dim=None): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the frame classification head.

Parameters:: x (Tensor) – Input tensor of shape (batch_size, sequence_length, input_dim).
Returns:: Classification logits of shape (batch_size, num_classes).
Return type:: Tensor
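The select-then-classify behaviour described above can be sketched in plain PyTorch. This is an illustrative re-implementation, not LISBET's source: the class name `SketchFrameHead` is hypothetical, and the MLP variant is assumed to be Linear → ReLU → Linear, since the activation used by LISBET's MLP is not documented on this page.

```python
import torch
from torch import nn


class SketchFrameHead(nn.Module):
    """Hypothetical stand-in for FrameClassificationHead (illustration only)."""

    def __init__(self, output_token_idx, input_dim, num_classes, hidden_dim=None):
        super().__init__()
        self.output_token_idx = output_token_idx
        if hidden_dim is None:
            # Single linear layer when no hidden dimension is given
            self.logits = nn.Linear(input_dim, num_classes)
        else:
            # Assumed MLP layout: Linear -> ReLU -> Linear
            self.logits = nn.Sequential(
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_classes),
            )

    def forward(self, x):
        # x: (batch_size, sequence_length, input_dim)
        token = x[:, self.output_token_idx, :]  # (batch_size, input_dim)
        return self.logits(token)  # (batch_size, num_classes)


x = torch.randn(4, 200, 32)  # 4 windows, 200 frames each, 32-dim embeddings
linear_head = SketchFrameHead(output_token_idx=-1, input_dim=32, num_classes=5)
mlp_head = SketchFrameHead(output_token_idx=-1, input_dim=32, num_classes=5, hidden_dim=64)
print(linear_head(x).shape)  # torch.Size([4, 5])
print(mlp_head(x).shape)  # torch.Size([4, 5])
```

Both variants produce the same output shape; `hidden_dim` only changes the capacity of the mapping from the selected token to the class logits.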

get_config()

Get the configuration dictionary for this head.

Returns:: Configuration dictionary containing all parameters needed to recreate this head instance.
Return type:: dict[str, Any]

classmethod from_config(config)

Create a FrameClassificationHead instance from a configuration dictionary.

Parameters:: config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the head instance.
Returns:: New FrameClassificationHead instance created from the configuration.
Return type:: FrameClassificationHead
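The `get_config` / `from_config` pair is meant to round-trip a head through a plain dictionary. A minimal sketch of that pattern, assuming the configuration simply mirrors the constructor arguments; the exact keys LISBET stores are not documented here, and `TinyHead` is hypothetical.

```python
from typing import Any

import torch
from torch import nn


class TinyHead(nn.Module):
    """Hypothetical head illustrating the get_config/from_config round trip."""

    def __init__(self, input_dim: int, num_classes: int):
        super().__init__()
        self.input_dim = input_dim
        self.num_classes = num_classes
        self.fc = nn.Linear(input_dim, num_classes)

    def get_config(self) -> dict[str, Any]:
        # Only the constructor arguments; trained weights are not included
        return {"input_dim": self.input_dim, "num_classes": self.num_classes}

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "TinyHead":
        # Rebuild a fresh (untrained) module from the stored arguments
        return cls(**config)


head = TinyHead(input_dim=32, num_classes=5)
clone = TinyHead.from_config(head.get_config())
clone.load_state_dict(head.state_dict())  # weights travel separately, via the state dict
same = torch.equal(clone.fc.weight, head.fc.weight)
print(same)  # True
```

In this sketch the configuration describes only the architecture, so a rebuilt instance starts with fresh weights until a state dict is loaded into it.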

class lisbet.modeling.WindowClassificationHead(input_dim, num_classes, hidden_dim=None)

Window-level classification head.

This head performs global max pooling over the sequence dimension and applies a classification layer to predict window-level labels.

Parameters:

input_dim (int) – Dimension of the input embeddings (formerly emb_dim).
num_classes (int) – Number of output classes (formerly out_dim).
hidden_dim (int | None) – Dimension of the hidden layer. If None, uses a single linear layer. If provided, uses an MLP with the specified hidden dimension.

logits

Classification layer (either Linear or MLP).

Type:: nn.Module

__init__(input_dim, num_classes, hidden_dim=None): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the window classification head.

Parameters:: x (Tensor) – Input tensor of shape (batch_size, sequence_length, input_dim).
Returns:: Classification logits of shape (batch_size, num_classes).
Return type:: Tensor
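Global max pooling over the sequence dimension reduces a `(batch_size, sequence_length, input_dim)` tensor to `(batch_size, input_dim)` before classification. The sketch below reproduces that reduction in plain PyTorch, assuming the single-linear-layer case (`hidden_dim=None`); variable names are illustrative, not LISBET's.

```python
import torch
from torch import nn

batch_size, sequence_length, input_dim, num_classes = 2, 3, 4, 6

x = torch.randn(batch_size, sequence_length, input_dim)
classifier = nn.Linear(input_dim, num_classes)  # the hidden_dim=None case

# Global max pooling: keep the maximum of each feature across all frames
pooled = x.max(dim=1).values  # (batch_size, input_dim)
logits = classifier(pooled)  # (batch_size, num_classes)

print(pooled.shape, logits.shape)  # torch.Size([2, 4]) torch.Size([2, 6])
```

Because the maximum is taken independently per feature, shuffling the frames inside a window leaves `pooled` unchanged, a property that selecting a single token (as in FrameClassificationHead) does not have.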

get_config()

Get the configuration dictionary for this head.

Returns:: Configuration dictionary containing all parameters needed to recreate this head instance.
Return type:: dict[str, Any]

classmethod from_config(config)

Create a WindowClassificationHead instance from a configuration dictionary.

Parameters:: config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the head instance.
Returns:: New WindowClassificationHead instance created from the configuration.
Return type:: WindowClassificationHead

class lisbet.modeling.EmbeddingHead(output_token_idx)

Embedding head for extracting behavior embeddings.

This head selects a specific token from the sequence (typically the last one) and returns it as the behavior embedding without any additional transformation.

Parameters:: output_token_idx (int) – Index of the token to use for embedding extraction (e.g., -1 for last token).

output_token_idx

Index of the token used for embedding extraction.

Type:: int

__init__(output_token_idx): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the embedding head.

Parameters:: x (Tensor) – Input tensor of shape (batch_size, sequence_length, embedding_dim).
Returns:: Embedding tensor of shape (batch_size, embedding_dim).
Return type:: Tensor

get_config()

Get the configuration dictionary for this head.

Returns:: Configuration dictionary containing all parameters needed to recreate this head instance.
Return type:: dict[str, Any]

classmethod from_config(config)

Create an EmbeddingHead instance from a configuration dictionary.

Parameters:: config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the head instance.
Returns:: New EmbeddingHead instance created from the configuration.
Return type:: EmbeddingHead

lisbet.modeling.model_info(model_path): Print information about a LISBET model config file.

class lisbet.modeling.MultiTaskModel(backbone, task_heads, model_id='lisbet_model')

Multi-task model that combines a backbone with multiple task-specific heads.

This model enables training and inference across multiple tasks using a shared backbone representation. Each task has its own dedicated head that processes the backbone output.

Parameters:

backbone (BackboneInterface) – The backbone model that processes input sequences and produces shared representations.
task_heads (dict[str, Module]) – Dictionary mapping task IDs to their corresponding task-specific heads.
model_id (str) – Identifier for the model instance, useful for logging or saving. Defaults to “lisbet_model”.

backbone

The shared backbone model.

Type:: BackboneInterface

task_heads

Dictionary of task-specific heads.

Type:: nn.ModuleDict

model_id

Identifier for the model instance, useful for logging or saving. Defaults to “lisbet_model”.

Type:: str

__init__(backbone, task_heads, model_id='lisbet_model'): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x, task_id)

Forward pass through the model for a specific task.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, sequence_length, input_dim).
task_id (str) – Identifier for the task to use. Must be a key in task_heads.

Returns:

Task-specific output tensor. Shape depends on the specific task head.

Return type:

Tensor

Raises:

KeyError – If task_id is not found in the available task heads.
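The shared-backbone, per-task-head routing can be sketched as follows. This is a simplified stand-in, not LISBET's `MultiTaskModel`: the class names `SketchMultiTask` and `LastToken`, the task IDs `"behavior"` and `"embedding"`, and the use of a single `nn.Linear` as the backbone are all hypothetical. It keeps the documented contract that an unknown `task_id` raises `KeyError`, which `nn.ModuleDict` lookup does on its own.

```python
import torch
from torch import nn


class SketchMultiTask(nn.Module):
    """Hypothetical shared-backbone model with one head per task."""

    def __init__(self, backbone, task_heads):
        super().__init__()
        self.backbone = backbone
        # ModuleDict registers every head so its parameters are trained and saved
        self.task_heads = nn.ModuleDict(task_heads)

    def forward(self, x, task_id):
        features = self.backbone(x)  # shared representation for all tasks
        return self.task_heads[task_id](features)  # KeyError if task_id is unknown

    def get_task_ids(self):
        return list(self.task_heads.keys())


class LastToken(nn.Module):
    """Hypothetical embedding-style head: return the last token unchanged."""

    def forward(self, x):
        return x[:, -1, :]


model = SketchMultiTask(
    backbone=nn.Linear(8, 16),  # stand-in backbone: (B, T, 8) -> (B, T, 16)
    task_heads={
        "behavior": nn.Sequential(LastToken(), nn.Linear(16, 3)),
        "embedding": LastToken(),
    },
)
x = torch.randn(2, 10, 8)
print(model.get_task_ids())  # ['behavior', 'embedding']
print(model(x, "behavior").shape)  # torch.Size([2, 3])
print(model(x, "embedding").shape)  # torch.Size([2, 16])
```

Only the selected head runs on each call, so the backbone cost is paid once per forward pass regardless of how many tasks are registered.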

get_task_ids()

Get the list of available task IDs.

Returns:: List of task IDs that can be used with this model.
Return type:: list[str]

get_config()

Get the configuration dictionary for this model.

Returns:: Configuration dictionary containing backbone config and task head configs.
Return type:: dict[str, Any]

classmethod from_config(config, backbone_registry=None, head_registry=None)

Create a MultiTaskModel instance from a configuration dictionary.

Parameters:

config (dict[str, Any]) – Configuration dictionary containing backbone and task head configs.
backbone_registry (dict[str, type] | None) – Registry mapping backbone type names to their classes. If None, uses a default registry.
head_registry (dict[str, type] | None) – Registry mapping head type names to their classes. If None, uses a default registry.

Returns:

New MultiTaskModel instance created from the configuration.

Return type:

MultiTaskModel

Raises:

ValueError – If backbone or head types are not found in the registries.
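Registry-based reconstruction maps a type name stored in the configuration to a class and then builds that class from its own configuration. The sketch below shows one way to do this under explicit assumptions: the config layout (a `"backbone"` entry and a `"task_heads"` mapping, each spec holding `"type"` and `"config"` keys), the helper names `resolve` and `build_from_config`, and the class `LinearBlock` are all hypothetical, since LISBET's actual config schema is not documented on this page. Unknown type names raise `ValueError`, matching the documented behaviour.

```python
from typing import Any

from torch import nn


class LinearBlock(nn.Module):
    """Hypothetical component exposing the from_config convention."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.fc(x)

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "LinearBlock":
        return cls(**config)


def resolve(registry: dict[str, type], type_name: str, kind: str) -> type:
    # Mirror the documented behaviour: unknown type names raise ValueError
    if type_name not in registry:
        raise ValueError(f"Unknown {kind} type: {type_name!r}")
    return registry[type_name]


def build_from_config(config, backbone_registry, head_registry):
    spec = config["backbone"]
    backbone = resolve(backbone_registry, spec["type"], "backbone").from_config(spec["config"])
    heads = nn.ModuleDict({
        task_id: resolve(head_registry, h["type"], "head").from_config(h["config"])
        for task_id, h in config["task_heads"].items()
    })
    return backbone, heads


registry = {"linear": LinearBlock}
config = {
    "backbone": {"type": "linear", "config": {"in_dim": 8, "out_dim": 16}},
    "task_heads": {"behavior": {"type": "linear", "config": {"in_dim": 16, "out_dim": 3}}},
}
backbone, heads = build_from_config(config, registry, registry)
print(list(heads.keys()))  # ['behavior']
```

Keeping the registries as plain dictionaries lets callers plug in custom backbone or head classes without modifying the model code, which is presumably why `from_config` accepts them as optional arguments.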

class lisbet.modeling.LSTMBackbone(feature_dim, embedding_dim, hidden_dim, num_layers)

LSTM backbone for sequence modeling.

Parameters:

feature_dim (int) – Dimension of the input features.
embedding_dim (int) – Dimension of the output embeddings.
hidden_dim (int) – Dimension of the LSTM hidden state.
num_layers (int) – Number of LSTM layers.

__init__(feature_dim, embedding_dim, hidden_dim, num_layers): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the LSTM backbone.

Parameters:: x (Tensor) – Input tensor of shape (batch_size, sequence_length, feature_dim).
Returns:: Output tensor of shape (batch_size, sequence_length, embedding_dim).
Return type:: Tensor
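One plausible way to satisfy the documented shape contract (`feature_dim` in, `embedding_dim` out, one vector per frame) is a stacked `nn.LSTM` followed by a per-frame linear projection from `hidden_dim` to `embedding_dim`. This is an assumption for illustration only: the page does not say whether LISBET's LSTM is bidirectional or how it projects, and `SketchLSTMBackbone` is hypothetical.

```python
import torch
from torch import nn


class SketchLSTMBackbone(nn.Module):
    """Hypothetical LSTM backbone matching the documented input/output shapes."""

    def __init__(self, feature_dim, embedding_dim, hidden_dim, num_layers):
        super().__init__()
        # batch_first=True so inputs are (batch_size, sequence_length, feature_dim)
        self.lstm = nn.LSTM(feature_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        # Per-frame projection from the LSTM hidden state to the embedding size
        self.proj = nn.Linear(hidden_dim, embedding_dim)

    def forward(self, x):
        out, _ = self.lstm(x)  # (batch_size, sequence_length, hidden_dim)
        return self.proj(out)  # (batch_size, sequence_length, embedding_dim)


backbone = SketchLSTMBackbone(feature_dim=10, embedding_dim=32, hidden_dim=64, num_layers=2)
y = backbone(torch.randn(4, 50, 10))
print(y.shape)  # torch.Size([4, 50, 32])
```

Since the output keeps one embedding per frame, any of the heads above can be attached to it unchanged.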

get_config()

Get the configuration dictionary for this backbone.

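Returns:: Configuration dictionary containing all parameters needed to recreate this backbone instance.
Return type:: dict[str, Any]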

classmethod from_config(config)

Create an LSTMBackbone instance from a configuration dictionary.

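Parameters:: config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the backbone instance.
Returns:: New LSTMBackbone instance created from the configuration.
Return type:: LSTMBackbone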

training: bool

class lisbet.modeling.TransformerBackbone(feature_dim, embedding_dim, hidden_dim, num_heads, num_layers, max_length)

Transformer backbone for sequence modeling.

A transformer-based backbone that processes input sequences using self-attention mechanisms. The backbone includes frame embedding, positional embedding, transformer encoder layers, and layer normalization.

Parameters:

feature_dim (int) – Dimension of the input features.
embedding_dim (int) – Dimension of the output embeddings.
hidden_dim (int) – Dimension of the feedforward network inside transformer layers.
num_heads (int) – Number of attention heads in the multi-head attention mechanism.
num_layers (int) – Number of transformer encoder layers.
max_length (int) – Maximum sequence length for positional embeddings.

frame_embedder

Linear layer for embedding input frames.

Type:: nn.Linear

pos_embedder

Positional embedding module.

Type:: PosEmbedding

transformer_encoder

Stack of transformer encoder layers.

Type:: nn.TransformerEncoder

layer_norm

Layer normalization applied to the output.

Type:: nn.LayerNorm

__init__(feature_dim, embedding_dim, hidden_dim, num_heads, num_layers, max_length): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass through the transformer backbone.

Parameters:: x (Tensor) – Input tensor of shape (batch_size, sequence_length, feature_dim).
Returns:: Output tensor of shape (batch_size, sequence_length, embedding_dim).
Return type:: Tensor
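The pipeline described in the class summary (frame embedding, positional embedding, encoder stack, final layer normalization) can be sketched with standard PyTorch modules. Assumptions: the positional embedding is a learned table of `max_length` vectors added to the frame embeddings and initialized ViT-style with a small normal distribution (the real `PosEmbedding` module may differ), and the encoder uses `batch_first=True` with PyTorch's default dropout and activation. `SketchTransformerBackbone` and its `pos_table` attribute are hypothetical.

```python
import torch
from torch import nn


class SketchTransformerBackbone(nn.Module):
    """Hypothetical transformer backbone following the documented attribute layout."""

    def __init__(self, feature_dim, embedding_dim, hidden_dim, num_heads, num_layers, max_length):
        super().__init__()
        self.frame_embedder = nn.Linear(feature_dim, embedding_dim)
        # Stand-in for the documented pos_embedder: one learned vector per position
        self.pos_table = nn.Parameter(torch.empty(1, max_length, embedding_dim))
        nn.init.normal_(self.pos_table, std=0.02)
        layer = nn.TransformerEncoderLayer(
            d_model=embedding_dim,
            nhead=num_heads,
            dim_feedforward=hidden_dim,  # hidden_dim sizes the feedforward sublayer
            batch_first=True,
        )
        self.transformer_encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        seq_len = x.shape[1]  # must not exceed max_length
        h = self.frame_embedder(x) + self.pos_table[:, :seq_len, :]
        h = self.transformer_encoder(h)
        return self.layer_norm(h)


backbone = SketchTransformerBackbone(
    feature_dim=10, embedding_dim=32, hidden_dim=128, num_heads=4, num_layers=2, max_length=200
)
backbone.eval()  # disable dropout for a deterministic forward pass
y = backbone(torch.randn(4, 50, 10))
print(y.shape)  # torch.Size([4, 50, 32])
```

Note that PyTorch's multi-head attention requires `embedding_dim` to be divisible by `num_heads`, so those two parameters cannot be chosen independently.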

get_config()

Get the configuration dictionary for this backbone.

Returns:: Configuration dictionary containing all parameters needed to recreate this backbone instance.
Return type:: dict[str, Any]

classmethod from_config(config)

Create a TransformerBackbone instance from a configuration dictionary.

Parameters:: config (dict[str, Any]) – Configuration dictionary containing all parameters needed to create the backbone instance.
Returns:: New TransformerBackbone instance created from the configuration.
Return type:: TransformerBackbone

training: bool

Modules

`backbones`
`factory`	Model factory utilities for LISBET.
`heads`
`info`
`models`	Multi-task model for different tasks.
`modules_extra`	Extra modules for various tasks, such as positional embedding and MLP.