lisbet.unsupervised

LISBET module for sequence segmentation and dimensionality reduction.

Functions

`reduce_umap`(data_path[, num_dims, ...])	Dimensionality reduction using UMAP.
`segment_hmm`(data_path[, min_n_components, ...])	Segment time series data using Hidden Markov Models.

lisbet.unsupervised.segment_hmm(data_path, min_n_components=2, max_n_components=32, num_iter=10, data_filter=None, fit_frac=None, hmm_seed=None, n_jobs=-1, pretrained_path=None, output_path=None)

Segment time series data using Hidden Markov Models.

This function fits one or more hidden Markov models (HMMs) to the embeddings and uses the fitted models to segment the data into discrete states.

Parameters:

data_path (str or Path) – Path to the directory containing LISBET embeddings.
min_n_components (int, default=2) – Minimum number of states to use in the HMM.
max_n_components (int, default=32) – Maximum number of states to use in the HMM.
num_iter (int, default=10) – Maximum number of iterations for the Baum-Welch algorithm.
data_filter (callable, optional) – Function to filter the data before fitting.
fit_frac (float, optional) – Fraction of data to use for fitting. If None, use all data.
hmm_seed (int, optional) – Random seed for reproducibility.
n_jobs (int, default=-1) – Number of parallel jobs to run, -1 means using all processors.
pretrained_path (str or Path, optional) – Path to the directory containing pretrained HMM models. If None, models are trained from scratch.
output_path (str or Path, optional) – Path to save the results. If None, results are not saved to disk.

Returns:

predictions – Dictionary mapping the number of states to the predicted segments for each sequence.

Return type:

dict

Raises:

ValueError – If min_n_components or max_n_components is smaller than 2, or if max_n_components is smaller than min_n_components.
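
Example (a minimal sketch, not taken from the LISBET sources): the helper `check_n_components` below re-implements the documented ValueError conditions so that they can be checked before launching a long fit, and the hypothetical wrapper `run_segmentation` shows one way to call `segment_hmm` with a fixed seed. The state range 4 to 8 and the seed are illustrative assumptions. The structure of each dictionary value is not specified here, so the loop only prints its type.

```python
from pathlib import Path


def check_n_components(min_n_components=2, max_n_components=32):
    """Re-implement the documented ValueError conditions (not LISBET's own code)."""
    if min_n_components < 2 or max_n_components < 2:
        raise ValueError("min_n_components and max_n_components must be at least 2")
    if max_n_components < min_n_components:
        raise ValueError("max_n_components must not be smaller than min_n_components")


def run_segmentation(data_path, output_path=None):
    """Hypothetical wrapper around segment_hmm; requires LISBET to be installed."""
    # Imported lazily so this sketch can be loaded without LISBET present.
    from lisbet.unsupervised import segment_hmm

    check_n_components(4, 8)
    predictions = segment_hmm(
        Path(data_path),
        min_n_components=4,  # illustrative value
        max_n_components=8,  # illustrative value
        num_iter=10,
        hmm_seed=0,  # fixed seed for reproducibility
        output_path=output_path,  # None: results are not saved to disk
    )
    # Per the documentation, the keys are numbers of states.
    for n_states in sorted(predictions):
        print(n_states, type(predictions[n_states]))
    return predictions
```

Calling `run_segmentation("path/to/embeddings")` (a placeholder path) would fit and apply the models; `check_n_components(1, 4)` raises ValueError without touching any data.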

lisbet.unsupervised.reduce_umap(data_path, num_dims=2, num_neighbors=60, data_filter=None, sample_size=None, sample_seed=None, umap_seed=None, output_path=None)

Dimensionality reduction using UMAP.
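
Example (a hedged sketch): the keyword defaults below are copied from the `reduce_umap` signature; `UMAP_DEFAULTS`, `build_umap_kwargs`, and `run_umap` are hypothetical names introduced only for illustration. The return value of `reduce_umap` is not documented in this section, so the wrapper simply passes it through.

```python
from pathlib import Path

# Defaults copied from the documented signature of reduce_umap.
UMAP_DEFAULTS = {
    "num_dims": 2,
    "num_neighbors": 60,
    "data_filter": None,
    "sample_size": None,
    "sample_seed": None,
    "umap_seed": None,
    "output_path": None,
}


def build_umap_kwargs(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(UMAP_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    return {**UMAP_DEFAULTS, **overrides}


def run_umap(data_path, **overrides):
    """Hypothetical wrapper around reduce_umap; requires LISBET to be installed."""
    # Imported lazily so this sketch can be loaded without LISBET present.
    from lisbet.unsupervised import reduce_umap

    return reduce_umap(Path(data_path), **build_umap_kwargs(**overrides))
```

For instance, `run_umap("path/to/embeddings", num_dims=3, umap_seed=0)` (a placeholder path) would request a three-dimensional embedding with a fixed UMAP seed.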