Annotating new data using selected prototypes

In the original design of the discovery-driven pipeline, prototype selection was intended as the final step following the HMM scan. However, some use cases may require labeling a new dataset using previously selected prototypes, without re-running the HMM scan or re-analyzing all prototypes.

Re-running the HMM scan on new data, even when it includes the original dataset, typically results in a permutation of the prototype labels. This permutation can, in principle, be corrected through automated label alignment, but the process would add complexity and slow down workflows. More importantly, the inclusion of new data can introduce new prototypes if novel behavioral patterns are detected. While this dynamic discovery is a valuable feature of the pipeline, it complicates analyses where the goal is specifically to track known prototypes across additional datasets, for example to study behavior consistency or investigate associated brain circuit dynamics.
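
To make the label alignment mentioned above concrete, the following is a minimal sketch and not part of LISBET. It assumes two label sequences over the same frames, one from the original scan and one from a re-run, and searches for the relabeling that maximizes frame-wise agreement. The function name `align_labels` and the toy data are hypothetical.

```python
from itertools import permutations


def align_labels(reference, relabeled, n_labels):
    """Return a dict mapping each new label to the reference label it best matches."""
    # overlap[new][ref] counts frames labeled `new` in the re-run and `ref` originally.
    overlap = [[0] * n_labels for _ in range(n_labels)]
    for ref, new in zip(reference, relabeled):
        overlap[new][ref] += 1

    # Exhaustive search over permutations: only practical for a handful of labels.
    best_perm, best_score = None, -1
    for perm in permutations(range(n_labels)):
        score = sum(overlap[new][ref] for new, ref in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score
    return dict(enumerate(best_perm))


# Toy example: the re-run found the same segments under permuted labels.
reference = [0, 0, 1, 1, 2, 2, 2, 0]
relabeled = [2, 2, 0, 0, 1, 1, 1, 2]
mapping = align_labels(reference, relabeled, n_labels=3)
aligned = [mapping[label] for label in relabeled]
```

The exhaustive search grows factorially with the number of labels. For realistic prototype counts, the same assignment problem is usually solved with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` applied to the negated overlap matrix. Neither approach handles prototypes that exist in only one of the two runs, which is the harder case described above.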

To address this need, we have implemented two methods for annotating new data using the prototypes selected in the original dataset. The first method is to train a LISBET classifier on the selected prototypes and then use it to label new data. The second method uses cached HMMs to annotate new data, but it is not recommended for most users due to its complexity and potential safety issues.

Note that the cached HMMs method reproduces the original prototype labels exactly, whereas the classifier approach only approximates them. In our experience, the advantages of the classifier approach outweigh this drawback: it is less error-prone and yields a reusable model that can be shared for future use.

Alternative: Using Cached HMMs

For advanced users, LISBET can annotate new data using cached HMM models. This method is not recommended for most users because of its complexity and the security risks of loading pickle-based files.

If you wish to proceed:

  1. Ensure you have the original HMM .joblib files saved from the initial scan.

  2. Run:

    $ betman segment_motifs \
        --pretrained_path=PATH_TO_HMM_MODELS \
        --output_path=NEW_OUTPUT_PATH \
        datasets/NewDataset
    

    You can then extract the relevant prototype columns from the output annotation files.
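
The column extraction in the last step could be sketched as follows. This is only an illustration under stated assumptions: it assumes the annotation files are CSV files with a header row, and the column names (`frame`, `motif_0`, ...) and the helper `extract_columns` are hypothetical. Check the actual layout of your output files before adapting it.

```python
import csv
import io


def extract_columns(csv_file, wanted):
    """Yield one dict per row, keeping only the requested columns."""
    reader = csv.DictReader(csv_file)
    missing = [name for name in wanted if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    for row in reader:
        yield {name: row[name] for name in wanted}


# Hypothetical annotation content; the real files may be laid out differently.
example = io.StringIO(
    "frame,motif_0,motif_1,motif_2\n"
    "0,1,0,0\n"
    "1,0,0,1\n"
)
selected = list(extract_columns(example, ["motif_0", "motif_2"]))
```

With a real file, pass an open file handle (opened with `newline=""`) in place of the `io.StringIO` object. Values are returned as strings, as read by the `csv` module.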

Warning

Loading pickle/joblib files can execute arbitrary code, so it is unsafe if the source is untrusted. Only use this method with files you generated yourself. DO NOT LOAD PICKLE FILES FROM UNTRUSTED SOURCES.
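
One way to confirm that a cached file is still the one you generated is to record a checksum right after the original scan and compare it before reuse. The sketch below is not a LISBET feature. The helper names and the file name `hmm_example.joblib` are hypothetical, and the demo writes placeholder bytes instead of a real model.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path, expected_digest):
    """True if the file content still matches the recorded digest."""
    return sha256_of(path) == expected_digest


# Demo on a throwaway file (placeholder bytes, not a real HMM).
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "hmm_example.joblib")  # hypothetical name
    with open(model_path, "wb") as handle:
        handle.write(b"placeholder bytes, not a real model")

    recorded = sha256_of(model_path)  # store this alongside your results
    unchanged = is_unmodified(model_path, recorded)

    with open(model_path, "ab") as handle:
        handle.write(b"tampered")  # simulate a later modification
    changed = is_unmodified(model_path, recorded)
```

A matching checksum only shows that the file has not changed since you recorded the digest. It does not make a file from an untrusted source safe to load.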
