Command Line Interfaces

sl-manage

This Command-Line Interface (CLI) allows managing session and project data acquired in the Sun lab.

This CLI is intended to run on the Sun lab remote compute server(s) and should not be called by the end-user directly. Instead, commands from this CLI are designed to be accessed through the bindings in the sl-experiment and sl-forgery libraries.

sl-manage [OPTIONS] COMMAND [ARGS]...

project

This group provides commands for managing the data of a Sun lab project.

Commands from this group are used to support all interactions with the data stored on the Sun lab remote compute server(s).

sl-manage project [OPTIONS] COMMAND [ARGS]...

Options

-pp, --project-path <project_path>

Required The absolute path to the project-specific directory where raw session data is stored.

-pdr, --processed-data-root <processed_data_root>

The absolute path to the directory that stores the processed data from all Sun lab projects, if it is different from the root directory included in the ‘project-path’ argument value.

manifest

Generates the manifest .feather file that captures the current state of the target project’s data.

The manifest file contains a comprehensive snapshot of the project’s available data. It includes information about the management and processing pipelines that have been applied to each session’s data, as well as descriptive information about each session. The manifest file is used as the entry-point for all interactions with the Sun lab data stored on the remote compute server(s).

sl-manage project manifest [OPTIONS]
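
As a hypothetical example (the project path is a placeholder), the manifest for a project stored under the server’s shared data directory is generated by passing the group option before the command name, per the group synopsis above:

sl-manage project -pp /local/workdir/sun_data/MyProject manifest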

session

This group provides commands for managing the data of a Sun lab data acquisition session.

Commands from this group are used to support data processing and dataset-formation (forging) on remote compute servers.

sl-manage session [OPTIONS] COMMAND [ARGS]...

Options

-sp, --session-path <session_path>

Required The absolute path to the root session directory to process. This directory must contain the ‘raw_data’ subdirectory.

-pdr, --processed-data-root <processed_data_root>

The absolute path to the directory that stores the processed data from all Sun lab projects, if it is different from the root directory included in the ‘session-path’ argument value.

-id, --manager-id <manager_id>

Required The unique identifier of the process that manages this runtime.

Default:

0

-r, --reset-tracker

Determines whether to forcibly reset the tracker file for the target session management pipeline before the processing runtime. This flag should only be used in exceptional cases to recover from improper runtime terminations.

archive

Prepares the target session for long-term storage by moving all session data to the storage volume.

This command is primarily intended to run on remote compute servers that use slow HDD volumes to maximize data integrity and fast NVMe volumes to maximize data processing speed. For such systems, moving all sessions that are no longer actively processed or analyzed to the slow drive volume frees up the processing volume space and ensures long-term data integrity.

sl-manage session archive [OPTIONS]

checksum

Resolves the data integrity checksum for the target session’s ‘raw_data’ directory.

This command can be used to verify the integrity of the session’s ‘raw_data’ directory using an existing checksum or to re-generate the checksum to reflect the current state of the directory. It only works with the ‘raw_data’ session directory and ignores all other directories. Primarily, this command is used to verify the integrity of the session’s data as it is transferred from data acquisition systems to long-term storage destinations.

sl-manage session checksum [OPTIONS]

Options

-rc, --recalculate-checksum

Determines whether to recalculate and overwrite the cached checksum value for the processed session. When the command is called with this flag, it effectively re-checksums the data instead of verifying its integrity.

prepare

Prepares the target session for data processing by moving all session data to the working volume.

This command is intended to run on remote compute servers that use slow HDD volumes to maximize data integrity and fast NVMe volumes to maximize data processing speed. For such systems, moving the data to the fast volume before processing results in a measurable processing time decrease.

sl-manage session prepare [OPTIONS]
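
As a hedged illustration of a typical session life cycle on the server, the commands below use placeholder paths and the default manager ID; per the group synopsis above, group options precede the command name. Here, checksum verifies the transferred data, prepare moves it to the working volume, and archive returns it to the storage volume once the project concludes:

sl-manage session -sp /local/storage/sun_data/MyProject/A001/session_01 -id 0 checksum
sl-manage session -sp /local/storage/sun_data/MyProject/A001/session_01 -id 0 prepare
sl-manage session -sp /local/storage/sun_data/MyProject/A001/session_01 -id 0 archive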

sl-configure

This Command-Line Interface (CLI) allows configuring major components of the Sun lab data acquisition, processing, and analysis workflow, such as acquisition systems and compute server(s).

sl-configure [OPTIONS] COMMAND [ARGS]...

directory

Sets the input directory as the Sun lab working directory, creating any missing path components.

This command serves as the initial entry-point for setting up any machine (PC) to work with Sun lab libraries and data. After the working directory is configured, this and all other Sun lab libraries automatically use it to store the configuration and runtime data required to perform any requested task. This allows all Sun lab libraries to behave consistently across different user machines and runtime contexts.

sl-configure directory [OPTIONS]

Options

-d, --directory <directory>

Required The absolute path to the directory used to cache Sun lab configuration and local runtime data.
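
A minimal hypothetical invocation (the directory path is a placeholder):

sl-configure directory -d /home/user/sun_lab_data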

server

Generates a service or user server access credentials file.

This command is used to set up access to the lab’s remote compute server(s). The Server class uses the data stored inside the generated credentials .yaml file to connect to and execute remote jobs on the target compute server(s). Depending on the configuration, this command generates either the ‘user_credentials.yaml’ or ‘service_credentials.yaml’ file.

sl-configure server [OPTIONS]

Options

-u, --username <username>

Required The username to use for server authentication.

-p, --password <password>

Required The password to use for server authentication.

-s, --service

Determines whether the credentials file is created for a service account. This determines the name of the generated file. Do not provide this flag unless creating a service credentials file.

-h, --host <host>

Required The host name or IP address of the server.

Default:

'cbsuwsun.biohpc.cornell.edu'

-sr, --storage-root <storage_root>

Required The absolute path to the root storage server directory. Typically, this is the path to the top-level (root) directory of the HDD RAID volume.

Default:

'/local/storage'

-wr, --working-root <working_root>

Required The absolute path to the root working server directory. Typically, this is the path to the top-level (root) directory of the NVMe RAID volume. If the server uses the same volume for both storing and working with data, set this to the same path as the ‘storage-root’ argument.

Default:

'/local/workdir'

-sd, --shared-directory <shared_directory>

Required The name of the shared directory used to store all Sun lab project data on all server volumes.

Default:

'sun_data'
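
A hypothetical invocation that accepts the default host and directory layout and only supplies the account credentials (the username and password are placeholders):

sl-configure server -u my_username -p my_password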

system

Generates the configuration file for the specified data acquisition system.

This command is typically used when setting up new data acquisition systems in the lab. The sl-experiment library uses the created file to load the acquisition system configuration data during data acquisition runtimes. If the system uses multiple machines (PCs), the configuration file only needs to be created on the machine that runs the sl-experiment library and manages the acquisition runtime. Once the system configuration .yaml file is created via this command, edit the file to modify the acquisition system configuration at any time.

sl-configure system [OPTIONS]

Options

-s, --system <system>

Required The type (name) of the data acquisition system for which to generate the configuration file.

Default:

'mesoscope-vr'

Options:

mesoscope-vr
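
For example, to generate the configuration file for the mesoscope-vr system (the only currently supported option):

sl-configure system -s mesoscope-vr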

Tools

This package provides helper tools used to automate routine operations, such as transferring or verifying the integrity of the data. The tools from this package are used by most other data processing libraries in the lab.

class sl_shared_assets.tools.ProjectManifest(manifest_file)

Bases: object

Wraps the contents of a Sun lab project manifest .feather file and exposes methods for visualizing and working with the data stored inside the file.

This class functions as a high-level API for working with Sun lab projects. It is used both to visualize the current state of various projects and during automated data processing to determine which processing steps to apply to different sessions.

Parameters:

manifest_file (Path) – The path to the .feather manifest file that stores the target project’s state data.

_data

Stores the manifest data as a Polars DataFrame.

_animal_string

Determines whether animal IDs are stored as strings or unsigned integers.

property animals: tuple[str, ...]

Returns all unique animal IDs stored inside the manifest file.

This property provides a tuple of the IDs of all animals participating in the target project.

get_session_info(session)

Returns a Polars DataFrame that stores detailed information for the specified session.

Since session IDs are unique, it is expected that filtering by session ID is enough to get the requested information.

Parameters:

session (str) – The ID of the session for which to retrieve the data.

Returns:

A Polars DataFrame with the following columns: ‘animal’, ‘date’, ‘notes’, ‘session’, ‘type’, ‘system’, ‘complete’, ‘integrity’, ‘suite2p’, ‘behavior’, ‘video’, ‘archived’.

Return type:

DataFrame

get_sessions(animal=None, exclude_incomplete=True)

Returns requested session IDs based on selected filtering criteria.

This method provides a tuple of sessions based on the specified filters. If no animal is specified, returns sessions for all animals in the project.

Parameters:
  • animal (str | int | None, default: None) – An optional animal ID to filter the sessions. If set to None, the method returns sessions for all animals.

  • exclude_incomplete (bool, default: True) – Determines whether to exclude sessions not marked as ‘complete’ from the output list.

Return type:

tuple[str, ...]

Returns:

The tuple of session IDs matching the filter criteria.

Raises:

ValueError – If the specified animal is not found in the manifest file.

print_data()

Prints the entire contents of the manifest file to the terminal.

Return type:

None

print_notes(animal=None)

Prints only animal, session, and notes data from the manifest file.

This data view is optimized for experimenters to check what sessions have been recorded for each animal in the project and refresh their memory on the outcomes of each session using experimenter notes.

Parameters:

animal (str | int | None, default: None) – The ID of the animal for which to display the data. If an ID is provided, this method will only display the data for that animal. Otherwise, it will display the data for all animals.

Return type:

None

print_summary(animal=None)

Prints a summary view of the manifest file to the terminal, excluding the ‘experimenter notes’ data for each session.

This data view is optimized for tracking which processing steps have been applied to each session inside the project.

Parameters:

animal (str | int | None, default: None) – The ID of the animal for which to display the data. If an ID is provided, this method will only display the data for that animal. Otherwise, it will display the data for all animals.

Return type:

None

property sessions: tuple[str, ...]

Returns all session IDs stored inside the manifest file.

This property provides a tuple of all sessions, independent of the participating animal, that were recorded as part of the target project. Use the get_sessions() method to retrieve a filtered tuple of session IDs.
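
A minimal usage sketch for this class, assuming a previously generated manifest file at a placeholder path:

from pathlib import Path

from sl_shared_assets.tools import ProjectManifest

manifest = ProjectManifest(Path("/local/workdir/sun_data/MyProject/MyProject_manifest.feather"))

# Lists all animals and retrieves the complete sessions recorded for the first animal.
print(manifest.animals)
sessions = manifest.get_sessions(animal=manifest.animals[0], exclude_incomplete=True)

# Displays detailed information for each selected session.
for session in sessions:
    print(manifest.get_session_info(session))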

sl_shared_assets.tools.archive_session(session_path, manager_id, reset_tracker=False, processed_data_root=None)

Prepares the target session for long-term (cold) storage.

This function is primarily designed to be used on remote compute servers that use different data volumes for storage and processing. It should be called for sessions that are no longer frequently processed or accessed to move all session data to the (slow) storage volume and free up the fast processing volume for working with other data. Typically, this function is used exactly once during each session’s life cycle: when the session’s project is officially concluded.

Parameters:
  • session_path (Path) – The path to the session directory to be processed.

  • manager_id (int) – The unique identifier of the manager process that manages the runtime.

  • processed_data_root (Path | None, default: None) – The path to the root directory used to store the processed data from all Sun lab projects, if different from the ‘session_path’ root.

  • reset_tracker (bool, default: False) – Determines whether to reset the tracker file before executing the runtime. This allows recovering from deadlocked runtimes, but otherwise should not be used to ensure runtime safety.

Return type:

None

Notes

This function reverses the effect of running the prepare_session() function.

sl_shared_assets.tools.calculate_directory_checksum(directory, num_processes=None, batch=False, save_checksum=True)

Calculates xxHash3-128 checksum for the input directory, which includes the data of all contained files and the directory structure information.

Checksums are used to verify the data integrity during transmission within machines (from one storage volume to another) and between machines. The function can be configured to write the generated checksum as a hexadecimal string to the ax_checksum.txt file stored at the highest level of the input directory.

Note

This function uses multiprocessing to efficiently parallelize checksum calculation for multiple files. In combination with xxHash3, this achieves a significant speedup over other common checksum options, such as MD5 and SHA256. Note that xxHash3 is not suitable for security purposes and is only used to ensure data integrity.

The returned checksum accounts for both the contents of each file and the layout of the input directory structure.

Parameters:
  • directory (Path) – The Path to the directory to be checksummed.

  • num_processes (int | None, default: None) – The number of CPU processes to use for parallelizing checksum calculation. If set to None, the function defaults to using (logical CPU count - 4).

  • batch (bool, default: False) – Determines whether the function is called as part of batch-processing multiple directories. This is used to optimize progress reporting to avoid cluttering the terminal.

  • save_checksum (bool, default: True) – Determines whether the checksum should be saved (written) to a .txt file.

Return type:

str

Returns:

The xxHash3-128 checksum for the input directory as a hexadecimal string.
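
A brief usage sketch, assuming a placeholder session directory:

from pathlib import Path

from sl_shared_assets.tools import calculate_directory_checksum

# Checksums the raw_data directory and writes the result to its ax_checksum.txt file.
checksum = calculate_directory_checksum(
    directory=Path("/local/storage/sun_data/MyProject/A001/session_01/raw_data"),
    num_processes=None,  # Defaults to (logical CPU count - 4) processes.
    save_checksum=True,
)
print(checksum)  # The xxHash3-128 checksum as a hexadecimal string.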

sl_shared_assets.tools.delete_directory(directory_path)

Removes the input directory and all its subdirectories using parallel processing.

This function outperforms default approaches, such as a subprocess call to rm -rf or shutil.rmtree, for directories with a comparatively small number of large files. For example, this is the case for the mesoscope frame directories, which are deleted ~6 times faster with this method than with shutil.rmtree. Potentially, it may also outperform these approaches for other comparatively shallow directories.

Notes

This function is often combined with the transfer_directory function to remove the source directory after it has been transferred.

Parameters:

directory_path (Path) – The path to the directory to delete.

Return type:

None

sl_shared_assets.tools.generate_project_manifest(raw_project_directory, manager_id, processed_data_root=None)

Builds and saves the project manifest .feather file under the specified output directory.

This function evaluates the input project directory and builds the ‘manifest’ file for the project. The file includes the descriptive information about every session stored inside the input project folder and the state of the session’s data processing (which processing pipelines have been applied to each session). The file is created under the input raw project directory and uses the following name pattern: ProjectName_manifest.feather.

Notes

The manifest file is primarily used to capture and move project state information between machines, typically in the context of working with data stored on a remote compute server or cluster.

Parameters:
  • raw_project_directory (Path) – The path to the root project directory used to store raw session data.

  • manager_id (int) – The unique identifier of the manager process that manages the runtime.

  • processed_data_root (Path | None, default: None) – The path to the root directory (volume) used to store processed data for all Sun lab projects if it is different from the parent of the ‘raw_project_directory’.

Return type:

None
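
A minimal usage sketch with a placeholder project path and manager ID:

from pathlib import Path

from sl_shared_assets.tools import generate_project_manifest

# Creates MyProject_manifest.feather under the input raw project directory.
generate_project_manifest(
    raw_project_directory=Path("/local/workdir/sun_data/MyProject"),
    manager_id=0,
)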

sl_shared_assets.tools.prepare_session(session_path, manager_id, processed_data_root, reset_tracker=False)

Prepares the target session for data processing and dataset integration.

This function is primarily designed to be used on remote compute servers that use different data volumes for storage and processing. Since storage volumes are often slow, the session data needs to be copied to the fast volume before executing processing pipelines. Typically, this function is used exactly once during each session’s life cycle: when it is first transferred to the remote compute server.

Parameters:
  • session_path (Path) – The path to the session directory to be processed.

  • manager_id (int) – The unique identifier of the manager process that manages the runtime.

  • processed_data_root (Path | None) – The path to the root directory used to store the processed data from all Sun lab projects, if different from the ‘session_path’ root.

  • reset_tracker (bool, default: False) – Determines whether to reset the tracker file before executing the runtime. This allows recovering from deadlocked runtimes, but otherwise should not be used to ensure runtime safety.

Return type:

None

Notes

This function reverses the effect of running the archive_session() function.
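
A sketch of the paired prepare / archive life cycle, using a placeholder session path and manager ID:

from pathlib import Path

from sl_shared_assets.tools import archive_session, prepare_session

session = Path("/local/storage/sun_data/MyProject/A001/session_01")

# Moves the session data to the fast working volume before processing.
prepare_session(session_path=session, manager_id=0, processed_data_root=None)

# ... processing pipelines run here ...

# Returns the session data to the slow storage volume once the project concludes.
archive_session(session_path=session, manager_id=0)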

sl_shared_assets.tools.resolve_checksum(session_path, manager_id, processed_data_root=None, reset_tracker=False, regenerate_checksum=False)

Verifies the integrity of the session’s data by generating the checksum of the raw_data directory and comparing it against the checksum stored in the ax_checksum.txt file.

Primarily, this function is used to verify data integrity after transferring it from the data acquisition system PC to the remote server for long-term storage.

Notes

Any session that does not successfully pass checksum verification (or recreation) is automatically excluded from all further automatic processing steps.

Since version 5.0.0, this function also supports recalculating and overwriting the checksum stored inside the ax_checksum.txt file. This allows this function to re-checksum session data, which is helpful if the experimenter deliberately alters the session’s data post-acquisition (for example, to comply with new data storage guidelines).

Parameters:
  • session_path (Path) – The path to the session directory to be processed.

  • manager_id (int) – The unique identifier of the manager process that manages the runtime.

  • processed_data_root (None | Path, default: None) – The path to the root directory used to store the processed data from all Sun lab projects, if different from the ‘session_path’ root.

  • reset_tracker (bool, default: False) – Determines whether to reset the tracker file before executing the runtime. This allows recovering from deadlocked runtimes, but otherwise should not be used to ensure runtime safety.

  • regenerate_checksum (bool, default: False) – Determines whether to update the checksum stored in the ax_checksum.txt file before carrying out the verification. In this case, the verification necessarily succeeds and the session’s reference checksum is changed to reflect the current state of the session data.

Return type:

None
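
A brief verification sketch with a placeholder session path:

from pathlib import Path

from sl_shared_assets.tools import resolve_checksum

# Verifies the raw_data checksum against the value cached in ax_checksum.txt.
resolve_checksum(
    session_path=Path("/local/storage/sun_data/MyProject/A001/session_01"),
    manager_id=0,
    regenerate_checksum=False,  # Set to True to re-checksum deliberately altered data.
)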

sl_shared_assets.tools.transfer_directory(source, destination, num_threads=1, verify_integrity=False, remove_source=False)

Copies the contents of the input directory tree from source to destination while preserving the folder structure.

Notes

This method recreates the moved directory hierarchy on the destination if the hierarchy does not exist. This is done before copying the files.

The method executes a multithreaded copy operation and, by default, does not remove the source data after the copy is complete.

If the method is configured to verify transferred data integrity, it generates an xxHash3-128 checksum of the data before and after the transfer and compares the two checksums to detect data corruption.

Parameters:
  • source (Path) – The path to the directory that needs to be moved.

  • destination (Path) – The path to the destination directory where to move the contents of the source directory.

  • num_threads (int, default: 1) – The number of threads to use for parallel file transfer. This number should be set depending on the type of transfer (local or remote) and is not guaranteed to improve transfer performance. For local transfers, setting this number above 1 will likely provide a performance boost. For remote transfers using a single TCP/IP socket (such as the non-multichannel SMB protocol), the number should be set to 1. Setting this value to a number below 1 instructs the function to use all available CPU cores.

  • verify_integrity (bool, default: False) – Determines whether to perform integrity verification for the transferred files.

  • remove_source (bool, default: False) – Determines whether to remove the source directory and all of its contents after the transfer is complete and optionally verified.

Raises:

RuntimeError – If the transferred files do not pass the xxHash3-128 checksum integrity verification.

Return type:

None
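
A short sketch of a local transfer with integrity verification and source cleanup (the paths are placeholders):

from pathlib import Path

from sl_shared_assets.tools import transfer_directory

# A local volume-to-volume transfer, where multiple threads likely improve performance.
transfer_directory(
    source=Path("/local/workdir/sun_data/MyProject/A001/session_01"),
    destination=Path("/local/storage/sun_data/MyProject/A001/session_01"),
    num_threads=4,
    verify_integrity=True,  # Compares pre- and post-transfer xxHash3-128 checksums.
    remove_source=True,  # Deletes the source directory after a successful transfer.
)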

Data and Configuration Assets

This package provides the classes used to store data acquired at all stages of the Sun lab data workflow and to configure various elements and pipelines making up the overall workflow. Many classes in this package are designed to be saved to disk as .yaml files and restored from the .yaml files as needed.

class sl_shared_assets.data_classes.AcquisitionSystems(*values)

Bases: StrEnum

Defines the set of data acquisition systems used in the Sun lab and supported by all data-related libraries.

MESOSCOPE_VR = 'mesoscope-vr'

The Mesoscope-VR data acquisition system. It is built around the 2-Photon Random Access Mesoscope (2P-RAM) and relies on Unity-backed virtual reality task environments to conduct experiments.

class sl_shared_assets.data_classes.DrugData(lactated_ringers_solution_volume_ml, lactated_ringers_solution_code, ketoprofen_volume_ml, ketoprofen_code, buprenorphine_volume_ml, buprenorphine_code, dexamethasone_volume_ml, dexamethasone_code)

Bases: object

Stores the information about all drugs administered to the subject before, during, and immediately after the surgical intervention.

buprenorphine_code: str

Stores the manufacturer code or internal reference code for buprenorphine. This code is used to identify the buprenorphine batch in additional datasheets and lab ordering documents.

buprenorphine_volume_ml: float

Stores the volume of buprenorphine diluted with saline administered during surgery, in ml.

dexamethasone_code: str

Stores the manufacturer code or internal reference code for dexamethasone. This code is used to identify the dexamethasone batch in additional datasheets and lab ordering documents.

dexamethasone_volume_ml: float

Stores the volume of dexamethasone diluted with saline administered during surgery, in ml.

ketoprofen_code: str

Stores the manufacturer code or internal reference code for ketoprofen. This code is used to identify the ketoprofen batch in additional datasheets and lab ordering documents.

ketoprofen_volume_ml: float

Stores the volume of ketoprofen diluted with saline administered during surgery, in ml.

lactated_ringers_solution_code: str

Stores the manufacturer code or internal reference code for Lactated Ringer’s Solution (LRS). This code is used to identify the LRS batch in additional datasheets and lab ordering documents.

lactated_ringers_solution_volume_ml: float

Stores the volume of Lactated Ringer’s Solution (LRS) administered during surgery, in ml.

class sl_shared_assets.data_classes.ExperimentState(experiment_state_code, system_state_code, state_duration_s, initial_guided_trials, recovery_failed_trial_threshold, recovery_guided_trials)

Bases: object

Encapsulates the information used to set and maintain the desired experiment and system state.

Broadly, each experiment runtime can be conceptualized as a two-state system. The first is the experiment task, which reflects the behavior goal, the rules for achieving the goal, and the reward for achieving the goal. The second is the data acquisition system state, which is a snapshot of all hardware module states that make up the system that acquires the data and controls the task environment. Overall, the experiment state is about ‘what the animal is doing’, while the system state is about ‘what the hardware is doing’.

Note

This class is acquisition-system-agnostic. It can be used to define the ExperimentConfiguration class for any valid data acquisition system.

experiment_state_code: int

The integer code of the experiment state. Experiment states do not have a predefined meaning. Instead, each project is expected to define and follow its own experiment state code mapping. Typically, the experiment state code is used to denote major experiment stages, such as ‘baseline’, ‘task’, ‘cooldown’, etc. Note, the same experiment state code can be used by multiple sequential ExperimentState instances to change the system states while maintaining the same experiment state.

initial_guided_trials: int

The number of trials (laps) at the onset of the experiment state for which to enable lick guidance. This determines the number of trials, counting from the onset of the experiment state, during which the animal receives water rewards for entering the reward zone. Once the specified number of guided trials passes, the system disables guidance, requiring the animal to lick in the reward zone to get water rewards.

recovery_failed_trial_threshold: int

Specifies the number of failed (non-rewarded) trials (laps), after which the system will re-enable lick guidance for the ‘recovery_guided_trials’ number of following trials. Note, engaging the recovery guided trial system requires the specified number of failed trials to occur sequentially.

recovery_guided_trials: int

Specifies the number of trials (laps) for which the system should re-enable lick guidance when the animal sequentially fails the ‘recovery_failed_trial_threshold’ number of trials. This field works similarly to the ‘initial_guided_trials’ field, but is triggered by repeated performance failures rather than experiment state onset. After the animal runs this many guided trials, the system automatically disables guidance for the following trials.

state_duration_s: float

The time, in seconds, to maintain the experiment and system state combination specified by this instance.

system_state_code: int

One of the supported system state-codes. Note, the meaning of each system state code depends on the specific data acquisition and experiment control system used by the project. For details on available system-states, see the sl-experiment library documentation.
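
A hedged construction sketch; all codes and values below are placeholders whose meaning is defined by each project and acquisition system:

from sl_shared_assets.data_classes import ExperimentState

task_state = ExperimentState(
    experiment_state_code=2,  # Project-defined, e.g. 'task'.
    system_state_code=1,  # System-specific; see the sl-experiment documentation.
    state_duration_s=1200.0,  # Maintains this state combination for 20 minutes.
    initial_guided_trials=10,  # Guides the first 10 trials of the state.
    recovery_failed_trial_threshold=5,  # 5 sequential failures re-engage guidance.
    recovery_guided_trials=3,  # Guidance then lasts for 3 trials.
)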

class sl_shared_assets.data_classes.ExperimentTrial(cue_sequence, trial_length_cm, trial_reward_size_ul, reward_zone_start_cm, reward_zone_end_cm, guidance_trigger_location_cm)

Bases: object

Encapsulates information about a single experiment trial.

All Virtual Reality tasks can be broadly conceptualized as repeating motifs (sequences) of wall cues, associated with a specific goal, for which animals receive water rewards. Since some experiments can use multiple trial types as part of the same experiment session, multiple instances of this class can be used to specify supported trial structures and trial parameters for a given experiment.

cue_sequence: list[int]

Specifies the sequence of wall cues experienced by the animal while running this trial. Note, the cues must be specified as integer-codes, where each code has the same meaning as in the ‘cue_map’ dictionary of the main ExperimentConfiguration class for that experiment.

guidance_trigger_location_cm: float

Specifies the location of the invisible boundary (wall) with which the animal must collide to elicit an automated water reward during guided trials.

reward_zone_end_cm: float

Specifies the ending boundary of the trial reward zone, in centimeters.

reward_zone_start_cm: float

Specifies the starting boundary of the trial reward zone, in centimeters.

trial_length_cm: float

The length of the trial cue sequence in centimeters.

trial_reward_size_ul: float

The volume of water, in microliters, to be dispensed when the animal successfully completes the trial task.
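
A hedged sketch of a single trial structure; the cue codes and distances are placeholders that must agree with the experiment’s ‘cue_map’:

from sl_shared_assets.data_classes import ExperimentTrial

standard_trial = ExperimentTrial(
    cue_sequence=[1, 2, 3, 2, 1],  # Integer codes defined in the 'cue_map' dictionary.
    trial_length_cm=150.0,
    trial_reward_size_ul=5.0,
    reward_zone_start_cm=120.0,
    reward_zone_end_cm=140.0,
    guidance_trigger_location_cm=125.0,  # Invisible collision wall for guided rewards.
)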

class sl_shared_assets.data_classes.ImplantData(implant, implant_target, implant_code, implant_ap_coordinate_mm, implant_ml_coordinate_mm, implant_dv_coordinate_mm)

Bases: object

Stores the information about a single implantation procedure performed during the surgical intervention.

Multiple ImplantData instances are used at the same time if the surgery involved multiple implants.

implant: str

The descriptive name of the implant.

implant_ap_coordinate_mm: float

Stores implant’s antero-posterior stereotactic coordinate, in millimeters, relative to bregma.

implant_code: str

The manufacturer code or internal reference code for the implant. This code is used to identify the implant in additional datasheets and lab ordering documents.

implant_dv_coordinate_mm: float

Stores implant’s dorsal-ventral stereotactic coordinate, in millimeters, relative to bregma.

implant_ml_coordinate_mm: float

Stores implant’s medial-lateral stereotactic coordinate, in millimeters, relative to bregma.

implant_target: str

The name of the brain region or cranium section targeted by the implant.

class sl_shared_assets.data_classes.InjectionData(injection, injection_target, injection_volume_nl, injection_code, injection_ap_coordinate_mm, injection_ml_coordinate_mm, injection_dv_coordinate_mm)

Bases: object

Stores the information about a single injection performed during surgical intervention.

Multiple InjectionData instances are used at the same time if the surgery involved multiple injections.

injection: str

The descriptive name of the injection.

injection_ap_coordinate_mm: float

Stores injection’s antero-posterior stereotactic coordinate, in millimeters, relative to bregma.

injection_code: str

The manufacturer code or internal reference code for the injected substance. This code is used to identify the substance in additional datasheets and lab ordering documents.

injection_dv_coordinate_mm: float

Stores injection’s dorsal-ventral stereotactic coordinate, in millimeters, relative to bregma.

injection_ml_coordinate_mm: float

Stores injection’s medial-lateral stereotactic coordinate, in millimeters, relative to bregma.

injection_target: str

The name of the brain region targeted by the injection.

injection_volume_nl: float

The volume of substance, in nanoliters, delivered during the injection.

class sl_shared_assets.data_classes.LickTrainingDescriptor(experimenter, mouse_weight_g, minimum_reward_delay_s, maximum_reward_delay_s, maximum_water_volume_ml, maximum_training_time_m, maximum_unconsumed_rewards=1, dispensed_water_volume_ml=0.0, pause_dispensed_water_volume_ml=0.0, experimenter_given_water_volume_ml=0.0, preferred_session_water_volume_ml=0.0, incomplete=False, experimenter_notes='Replace this with your notes.')

Bases: YamlConfig

Stores the task and outcome information specific to lick training sessions that use the Mesoscope-VR system.

dispensed_water_volume_ml: float = 0.0

Stores the total water volume, in milliliters, dispensed during runtime. This excludes the water volume dispensed during the paused (idle) state.

experimenter: str

The ID of the experimenter running the session.

experimenter_given_water_volume_ml: float = 0.0

The additional volume of water, in milliliters, administered by the experimenter to the animal after the session.

experimenter_notes: str = 'Replace this with your notes.'

This field is not set during runtime. It is expected that each experimenter replaces this field with their notes made during runtime.

incomplete: bool = False

If this field is set to True, the session is marked as ‘incomplete’ and automatically excluded from all further Sun lab automated processing and analysis.

maximum_reward_delay_s: int

Stores the maximum delay, in seconds, that can separate the delivery of two consecutive water rewards.

maximum_training_time_m: int

Stores the maximum time, in minutes, the system is allowed to run the training for.

maximum_unconsumed_rewards: int = 1

Stores the maximum number of consecutive rewards that can be delivered without the animal consuming them. If the animal receives this many rewards without licking (consuming) them, reward delivery is paused until the animal consumes the rewards.

maximum_water_volume_ml: float

Stores the maximum volume of water the system is allowed to dispense during training.

minimum_reward_delay_s: int

Stores the minimum delay, in seconds, that can separate the delivery of two consecutive water rewards.

mouse_weight_g: float

The weight of the animal, in grams, at the beginning of the session.

pause_dispensed_water_volume_ml: float = 0.0

Stores the total water volume, in milliliters, dispensed during the paused (idle) state.

preferred_session_water_volume_ml: float = 0.0

The volume of water, in milliliters, the animal should receive during the session runtime if its performance matches the experimenter-specified threshold.

class sl_shared_assets.data_classes.MesoscopeAdditionalFirmware(headbar_port='/dev/ttyUSB0', lickport_port='/dev/ttyUSB1', wheel_port='/dev/ttyUSB2', unity_ip='127.0.0.1', unity_port=1883)

Bases: object

Stores the configuration parameters for all firmware and hardware components not assembled in the Sun lab.

headbar_port: str = '/dev/ttyUSB0'

The USB port used by the HeadBar Zaber motor controllers (devices).

lickport_port: str = '/dev/ttyUSB1'

The USB port used by the LickPort Zaber motor controllers (devices).

unity_ip: str = '127.0.0.1'

The IP address of the MQTT broker used to communicate with the Unity game engine.

unity_port: int = 1883

The port number of the MQTT broker used to communicate with the Unity game engine.

wheel_port: str = '/dev/ttyUSB2'

The USB port used by the (running) Wheel Zaber motor controllers (devices).

class sl_shared_assets.data_classes.MesoscopeCameras(face_camera_index=0, left_camera_index=0, right_camera_index=2, face_camera_quantization_parameter=15, body_camera_quantization_parameter=15, display_face_camera_frames=True, display_body_camera_frames=True)

Bases: object

Stores the configuration parameters for the cameras used by the Mesoscope-VR system to record behavior videos.

body_camera_quantization_parameter: int = 15

The quantization parameter used by the left and right body cameras to encode acquired frames as video files. See the ‘face_camera_quantization_parameter’ field for more information on what this parameter does.

display_body_camera_frames: bool = True

Determines whether to display the frames grabbed from the left and right body cameras during runtime.

display_face_camera_frames: bool = True

Determines whether to display the frames grabbed from the face camera during runtime.

face_camera_index: int = 0

The index of the face camera in the list of all available Harvester-managed cameras.

face_camera_quantization_parameter: int = 15

The quantization parameter used by the face camera to encode acquired frames as video files. This controls how much data is discarded when encoding each video frame, directly affecting the encoding speed, the resultant video file size, and the video quality.

left_camera_index: int = 0

The index of the left body camera (from animal’s perspective) in the list of all available OpenCV-managed cameras.

right_camera_index: int = 2

The index of the right body camera (from animal’s perspective) in the list of all available OpenCV-managed cameras.

class sl_shared_assets.data_classes.MesoscopeExperimentConfiguration(cue_map=<factory>, cue_offset_cm=10.0, unity_scene_name='IvanScene', experiment_states=<factory>, trial_structures=<factory>)

Bases: YamlConfig

Stores the configuration of a single experiment runtime that uses the Mesoscope-VR data acquisition system.

Primarily, this includes the sequence of experiment and system states that define the flow of the experiment runtime and the configuration of various trials supported by the experiment runtime. During runtime, the main runtime control function traverses the sequence of states stored in this class instance start-to-end in the exact order specified by the user. Together with custom Unity projects, which define the task logic (how the system responds to animal interactions with the VR system), this class allows flexibly implementing a wide range of experiments using the Mesoscope-VR system.

Each project should define one or more experiment configurations and save them as .yaml files inside the project ‘configuration’ folder. The name for each configuration file is defined by the user and is used to identify and load the experiment configuration when the ‘sl-experiment’ CLI command exposed by the sl-experiment library is executed.

Notes

This class is designed exclusively for the Mesoscope-VR system. Any other system needs to define a separate ExperimentConfiguration class to specify its experiment runtimes and additional data.

To create a new experiment configuration, use the ‘sl-create-experiment’ CLI command.

cue_map: dict[int, float]

A dictionary that maps each integer-code associated with a wall cue used in the Virtual Reality experiment environment to its length in real-world centimeters. It is used to map each VR cue to the distance the animal needs to travel to fully traverse the wall cue region from start to end.

cue_offset_cm: float = 10.0

Specifies the positive offset distance, in centimeters, by which the animal’s running track is shifted relative to VR wall cue sequence. Due to how the VR environment is revealed to the animal, most runtimes need to shift the animal slightly forward relative to the VR cue sequence origin (0), to prevent it from seeing the area before the first VR wall cue when the task starts and when the animal is teleported to the beginning of the track. This offset statically shifts the entire track (in centimeters) against the set of VR wall cues used during runtime. Storing this static offset as part of experiment configuration is crucial for correctly mapping what the animal sees during runtime to the real-world distance it travels on the running wheel.

experiment_states: dict[str, ExperimentState]

A dictionary that uses human-readable state-names as keys and ExperimentState instances as values. Each ExperimentState instance represents a phase of the experiment.

trial_structures: dict[str, ExperimentTrial]

A dictionary that uses human-readable trial structure names as keys and ExperimentTrial instances as values. Each ExperimentTrial instance specifies the Virtual Reality layout and runtime parameters associated with a single type of trials supported by the experiment runtime.

unity_scene_name: str = 'IvanScene'

The name of the Virtual Reality task (Unity Scene) used during experiment. This is used as an extra security measure to ensure that Unity game engine is running the correct scene when starting the experiment runtime.
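
A hedged sketch assembling a configuration from the classes above; all values are placeholders, and ‘task_state’ and ‘standard_trial’ refer to the ExperimentState and ExperimentTrial sketches shown earlier:

from sl_shared_assets.data_classes import MesoscopeExperimentConfiguration

experiment = MesoscopeExperimentConfiguration(
    cue_map={1: 30.0, 2: 30.0, 3: 30.0},  # Maps each cue code to its length in cm.
    cue_offset_cm=10.0,
    unity_scene_name="IvanScene",
    experiment_states={"task": task_state},  # Traversed start-to-end, in order.
    trial_structures={"standard": standard_trial},
)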

class sl_shared_assets.data_classes.MesoscopeExperimentDescriptor(experimenter, mouse_weight_g, maximum_unconsumed_rewards=1, dispensed_water_volume_ml=0.0, pause_dispensed_water_volume_ml=0.0, experimenter_given_water_volume_ml=0.0, preferred_session_water_volume_ml=0.0, incomplete=False, experimenter_notes='Replace this with your notes.')

Bases: YamlConfig

Stores the task and outcome information specific to experiment sessions that use the Mesoscope-VR system.

dispensed_water_volume_ml: float = 0.0

Stores the total water volume, in milliliters, dispensed during runtime. This excludes the water volume dispensed during the paused (idle) state.

experimenter: str

The ID of the experimenter running the session.

experimenter_given_water_volume_ml: float = 0.0

The additional volume of water, in milliliters, administered by the experimenter to the animal after the session.

experimenter_notes: str = 'Replace this with your notes.'

This field is not set during runtime. It is expected that each experimenter will replace this field with their notes made during runtime.

incomplete: bool = False

If this field is set to True, the session is marked as ‘incomplete’ and automatically excluded from all further Sun lab automated processing and analysis.

maximum_unconsumed_rewards: int = 1

Stores the maximum number of consecutive rewards that can be delivered without the animal consuming them. If the animal receives this many rewards without licking (consuming) them, reward delivery is paused until the animal consumes the rewards.

mouse_weight_g: float

The weight of the animal, in grams, at the beginning of the session.

pause_dispensed_water_volume_ml: float = 0.0

Stores the total water volume, in milliliters, dispensed during the paused (idle) state.

preferred_session_water_volume_ml: float = 0.0

The volume of water, in milliliters, the animal should receive during the session runtime if its performance matches the experimenter-specified threshold.

class sl_shared_assets.data_classes.MesoscopeHardwareState(cm_per_pulse=None, maximum_break_strength=None, minimum_break_strength=None, lick_threshold=None, valve_scale_coefficient=None, valve_nonlinearity_exponent=None, torque_per_adc_unit=None, screens_initially_on=None, recorded_mesoscope_ttl=None, system_state_codes=None)

Bases: YamlConfig

Stores configuration parameters (states) of the Mesoscope-VR system hardware modules used during training or experiment runtime.

This information is used to read and decode the data saved to the .npz log files during runtime as part of data processing.

Notes

This class stores ‘static’ Mesoscope-VR system configuration that does not change during experiment or training session runtime. This is in contrast to MesoscopeExperimentConfiguration class, which reflects the ‘dynamic’ state of the Mesoscope-VR system during each experiment.

This class partially overlaps with the MesoscopeSystemConfiguration class, which is also stored in the raw_data folder of each session. The primary reason to keep both classes is to ensure that the math (rounding) used during runtime matches the math (rounding) used during data processing. MesoscopeSystemConfiguration does not apply any rounding or otherwise attempt to be repeatable, in contrast to the hardware modules that read and apply its parameters. Reading values from this class guarantees that the read value exactly matches the value used during runtime.

Notes

All fields in this dataclass initialize to None. During log processing, any log associated with a hardware module that provides the data stored in a field will be processed, unless that field is None. Therefore, setting any field in this dataclass to None also functions as a flag for whether to parse the log associated with the module that provides this field’s information.

This class is automatically configured by _MesoscopeVRSystem class from the sl-experiment library to facilitate proper log parsing.

cm_per_pulse: float | None = None

EncoderInterface instance property. Stores the conversion factor used to translate encoder pulses into real-world centimeters. This conversion factor is fixed for each data acquisition system and does not change between experiments.

lick_threshold: int | None = None

LickInterface instance property. Determines the threshold, in 12-bit Analog to Digital Converter (ADC) units, above which an interaction value reported by the lick sensor is considered a lick (compared to noise or non-lick touch).

maximum_break_strength: float | None = None

BreakInterface instance property. Stores the breaking torque, in Newton centimeters, applied by the break to the edge of the running wheel when it is engaged at 100% strength.

minimum_break_strength: float | None = None

BreakInterface instance property. Stores the breaking torque, in Newton centimeters, applied by the break to the edge of the running wheel when it is engaged at 0% strength (completely disengaged).

recorded_mesoscope_ttl: bool | None = None

TTLInterface instance property. A boolean flag that determines whether the processed session recorded brain activity data with the mesoscope. If so, the processing attempts to parse the Mesoscope frame scanning TTL pulse data to synchronize the Mesoscope data with the behavior data.

screens_initially_on: bool | None = None

ScreenInterface instance property. Stores the initial state of the Virtual Reality screens at the beginning of the session runtime.

system_state_codes: dict[str, int] | None = None

A _MesoscopeVRSystem instance property. A dictionary that maps integer state-codes used by the Mesoscope-VR system to communicate its states (system states) to human-readable state names.

torque_per_adc_unit: float | None = None

TorqueInterface instance property. Stores the conversion factor used to translate torque values reported by the sensor as 12-bit Analog to Digital Converter (ADC) units, into real-world Newton centimeters (N·cm) of torque that had to be applied to the edge of the running wheel to produce the observed ADC value.

valve_nonlinearity_exponent: float | None = None

ValveInterface instance property. To dispense precise water volumes during runtime, ValveInterface uses a power law equation applied to the valve calibration data to determine how long to keep the valve open. This stores the nonlinearity_exponent of the power law equation that describes the relationship between valve open time and dispensed water volume, derived from calibration data.

valve_scale_coefficient: float | None = None

ValveInterface instance property. To dispense precise water volumes during runtime, ValveInterface uses a power law equation applied to the valve calibration data to determine how long to keep the valve open. This stores the scale_coefficient of the power law equation that describes the relationship between valve open time and dispensed water volume, derived from calibration data.

class sl_shared_assets.data_classes.MesoscopeMicroControllers(actor_port='/dev/ttyACM0', sensor_port='/dev/ttyACM1', encoder_port='/dev/ttyACM2', debug=False, minimum_break_strength_g_cm=43.2047, maximum_break_strength_g_cm=1152.1246, wheel_diameter_cm=15.0333, lick_threshold_adc=400, lick_signal_threshold_adc=300, lick_delta_threshold_adc=300, lick_averaging_pool_size=1, torque_baseline_voltage_adc=2046, torque_maximum_voltage_adc=2750, torque_sensor_capacity_g_cm=720.0779, torque_report_cw=True, torque_report_ccw=True, torque_signal_threshold_adc=100, torque_delta_threshold_adc=70, torque_averaging_pool_size=1, wheel_encoder_ppr=8192, wheel_encoder_report_cw=False, wheel_encoder_report_ccw=True, wheel_encoder_delta_threshold_pulse=15, wheel_encoder_polling_delay_us=500, cm_per_unity_unit=10.0, screen_trigger_pulse_duration_ms=500, auditory_tone_duration_ms=300, valve_calibration_pulse_count=200, sensor_polling_delay_ms=1, valve_calibration_data=((15000, 1.1), (30000, 3.0), (45000, 6.25), (60000, 10.9)))

Bases: object

Stores the configuration parameters for the microcontrollers used by the Mesoscope-VR system.

actor_port: str = '/dev/ttyACM0'

The USB port used by the Actor Microcontroller.

auditory_tone_duration_ms: int = 300

The time, in milliseconds, to sound the auditory tone when water rewards are delivered to the animal.

cm_per_unity_unit: float = 10.0

The length of each Unity ‘unit’ in real-world centimeters recorded by the running wheel encoder.

debug: bool = False

Determines whether to run the managed acquisition system in the ‘debug mode’. This mode should be disabled during most runtimes. It is used during initial system calibration and testing and prints a large amount of otherwise redundant information to the terminal.

encoder_port: str = '/dev/ttyACM2'

The USB port used by the Encoder Microcontroller.

lick_averaging_pool_size: int = 1

The number of lick sensor readouts to average together to produce the final lick sensor readout value. Note, when using a Teensy controller, this number is multiplied by the built-in analog readout averaging (default is 4).

lick_delta_threshold_adc: int = 300

The minimum absolute difference in raw analog units recorded by a 12-bit Analog-to-Digital-Converter (ADC) for the change to be reported to the PC. This is used to prevent reporting repeated non-lick or lick readouts to the PC, conserving communication bandwidth.

lick_signal_threshold_adc: int = 300

The minimum voltage, in raw analog units recorded by a 12-bit Analog-to-Digital-Converter (ADC), reported to the PC as a non-zero value. Voltages below this level are interpreted as ‘no-lick’ noise and are always pulled to 0.

lick_threshold_adc: int = 400

The threshold voltage, in raw analog units recorded by a 12-bit Analog-to-Digital-Converter (ADC), interpreted as the animal’s tongue contacting the sensor. Note, a 12-bit ADC only supports values between 0 and 4095, so setting the threshold above 4095 results in no licks being reported to Unity.

maximum_break_strength_g_cm: float = 1152.1246

The maximum torque applied by the running wheel break, in gram-centimeters (g·cm). This is the torque the break delivers at maximum voltage (break is fully engaged).

minimum_break_strength_g_cm: float = 43.2047

The minimum torque applied by the running wheel break, in gram-centimeters (g·cm). This is the torque the break delivers at minimum voltage (break is disabled).

screen_trigger_pulse_duration_ms: int = 500

The duration of the HIGH phase of the TTL pulse used to toggle the VR screens between ON and OFF states.

sensor_polling_delay_ms: int = 1

The delay, in milliseconds, between any two successive readouts of any sensor other than the encoder. Note, the encoder uses a dedicated parameter, as the encoder needs to be sampled at a higher frequency than all other sensors.

sensor_port: str = '/dev/ttyACM1'

The USB port used by the Sensor Microcontroller.

torque_averaging_pool_size: int = 1

The number of torque sensor readouts to average together to produce the final torque sensor readout value. Note, when using a Teensy controller, this number is multiplied by the built-in analog readout averaging (default is 4).

torque_baseline_voltage_adc: int = 2046

The voltage level, in raw analog units measured by 3.3v Analog-to-Digital-Converter (ADC) at 12-bit resolution after the AD620 amplifier, that corresponds to no (0) torque readout. Usually, for a 3.3v ADC, this would be around 2046 (the midpoint, ~1.65 V).

torque_delta_threshold_adc: int = 70

The minimum absolute difference in raw analog units recorded by a 12-bit Analog-to-Digital-Converter (ADC) for the change to be reported to the PC. This is used to prevent reporting repeated static torque readouts to the PC, conserving communication bandwidth.

torque_maximum_voltage_adc: int = 2750

The voltage level, in raw analog units measured by 3.3v Analog-to-Digital-Converter (ADC) at 12-bit resolution after the AD620 amplifier, that corresponds to the absolute maximum torque detectable by the sensor. At most, this value can be 4095 (~3.3 V).

torque_report_ccw: bool = True

Determines whether the sensor should report torque in the Counter-Clockwise (CCW) direction. This direction corresponds to the animal trying to move the wheel forward.

torque_report_cw: bool = True

Determines whether the sensor should report torque in the Clockwise (CW) direction. This direction corresponds to the animal trying to move the wheel backward.

torque_sensor_capacity_g_cm: float = 720.0779

The maximum torque detectable by the sensor, in gram-centimeters (g·cm).

torque_signal_threshold_adc: int = 100

The minimum voltage, in raw analog units recorded by a 12-bit Analog-to-Digital-Converter (ADC), reported to the PC as a non-zero value. Voltages below this level are interpreted as noise and are always pulled to 0.

valve_calibration_data: dict[int | float, int | float] | tuple[tuple[int | float, int | float], ...] = ((15000, 1.1), (30000, 3.0), (45000, 6.25), (60000, 10.9))

A dictionary or tuple of tuples that maps water delivery solenoid valve open times, in microseconds, to the dispensed volumes of water, in microliters. During training and experiment runtimes, this data is used by the ValveModule to translate the requested reward volumes into the times the valve needs to be open to deliver the desired volume of water.

valve_calibration_pulse_count: int = 200

The number of times to cycle opening and closing (pulsing) the valve during each calibration runtime. This determines how many reward deliveries are used at each calibrated time-interval to produce the average dispensed water volume readout used to calibrate the valve.

wheel_diameter_cm: float = 15.0333

The diameter of the running wheel connected to the break and torque sensor, in centimeters.

wheel_encoder_delta_threshold_pulse: int = 15

The minimum difference, in encoder pulse counts, between two encoder readouts for the change to be reported to the PC. This is used to prevent reporting idle readouts and filter out sub-threshold noise.

wheel_encoder_polling_delay_us: int = 500

The delay, in microseconds, between any two successive encoder state readouts.

wheel_encoder_ppr: int = 8192

The resolution of the managed quadrature encoder, in Pulses Per Revolution (PPR). This is the number of quadrature pulses the encoder emits per full 360-degree rotation.

wheel_encoder_report_ccw: bool = True

Determines whether to report encoder rotation in the CCW (positive) direction. This corresponds to the animal moving forward on the wheel.

wheel_encoder_report_cw: bool = False

Determines whether to report encoder rotation in the CW (negative) direction. This corresponds to the animal moving backward on the wheel.

class sl_shared_assets.data_classes.MesoscopePaths(google_credentials_path=PosixPath('/media/Data/Experiments/sl-surgery-log-0f651e492767.json'), root_directory=PosixPath('/media/Data/Experiments'), server_storage_directory=PosixPath('/home/cybermouse/server/storage/sun_data'), server_working_directory=PosixPath('/home/cybermouse/server/workdir/sun_data'), nas_directory=PosixPath('/home/cybermouse/nas/rawdata'), mesoscope_directory=PosixPath('/home/cybermouse/scanimage/mesodata'), harvesters_cti_path=PosixPath('/opt/mvIMPACT_Acquire/lib/x86_64/mvGenTLProducer.cti'))

Bases: object

Stores the filesystem configuration parameters for the Mesoscope-VR data acquisition system.

google_credentials_path: Path = PosixPath('/media/Data/Experiments/sl-surgery-log-0f651e492767.json')

The path to the locally stored .JSON file that contains the service account credentials used to read and write Google Sheet data. This is used to access and work with various Google Sheet files used by Sun lab projects, eliminating the need to manually synchronize the data in various Google sheets and other data files.

harvesters_cti_path: Path = PosixPath('/opt/mvIMPACT_Acquire/lib/x86_64/mvGenTLProducer.cti')

The path to the GeniCam CTI file used to connect to Harvesters-managed cameras.

mesoscope_directory: Path = PosixPath('/home/cybermouse/scanimage/mesodata')

The absolute path to the root ScanImagePC (mesoscope-connected PC) directory where all mesoscope-acquired data is aggregated during acquisition runtime. This directory should be locally accessible (mounted) using a network sharing protocol, such as SMB.

nas_directory: Path = PosixPath('/home/cybermouse/nas/rawdata')

The absolute path to the directory where the raw data from all projects is stored on the Synology NAS. This directory should be locally accessible (mounted) using a network sharing protocol, such as SMB.

root_directory: Path = PosixPath('/media/Data/Experiments')

The absolute path to the directory where all projects are stored on the local host-machine (VRPC).

server_storage_directory: Path = PosixPath('/home/cybermouse/server/storage/sun_data')

The absolute path to the directory where the raw data from all projects is stored on the BioHPC server. This directory should be locally accessible (mounted) using a network sharing protocol, such as SMB.

server_working_directory: Path = PosixPath('/home/cybermouse/server/workdir/sun_data')

The absolute path to the directory where the processed data from all projects is stored on the BioHPC server. This directory should be locally accessible (mounted) using a network sharing protocol, such as SMB.

class sl_shared_assets.data_classes.MesoscopePositions(mesoscope_x=0.0, mesoscope_y=0.0, mesoscope_roll=0.0, mesoscope_z=0.0, mesoscope_fast_z=0.0, mesoscope_tip=0.0, mesoscope_tilt=0.0, laser_power_mw=0.0, red_dot_alignment_z=0.0)

Bases: YamlConfig

Stores the positions of real and virtual Mesoscope objective axes reused between experiment sessions that use the Mesoscope-VR system.

This class is designed to help the experimenter move the Mesoscope to the same imaging plane across imaging sessions. It stores both the physical (real) positions of the objective along the motorized X, Y, Z, and Roll axes and the virtual (ScanImage software) positions along the tip, tilt, and fastZ (virtual zoom) axes.

laser_power_mw: float = 0.0

The laser excitation power at the sample, in milliwatts.

mesoscope_fast_z: float = 0.0

The ScanImage FastZ (virtual Z-axis) position, in micrometers.

mesoscope_roll: float = 0.0

The Mesoscope objective Roll-axis position, in degrees.

mesoscope_tilt: float = 0.0

The ScanImage Tilt position, in degrees.

mesoscope_tip: float = 0.0

The ScanImage Tip position, in degrees.

mesoscope_x: float = 0.0

The Mesoscope objective X-axis position, in micrometers.

mesoscope_y: float = 0.0

The Mesoscope objective Y-axis position, in micrometers.

mesoscope_z: float = 0.0

The Mesoscope objective Z-axis position, in micrometers.

red_dot_alignment_z: float = 0.0

The Mesoscope objective Z-axis position, in micrometers, used for the red-dot alignment procedure.

class sl_shared_assets.data_classes.MesoscopeSystemConfiguration(name='mesoscope-vr', paths=<factory>, sheets=<factory>, cameras=<factory>, microcontrollers=<factory>, additional_firmware=<factory>)

Bases: YamlConfig

Stores the hardware and filesystem configuration parameters for the Mesoscope-VR data acquisition system used in the Sun lab.

This class is specifically designed to encapsulate the configuration parameters for the Mesoscope-VR system. It expects the system to be configured according to the specifications available from the sl_experiment repository (https://github.com/Sun-Lab-NBB/sl-experiment) and should be used exclusively by the VRPC machine (main Mesoscope-VR PC).

Notes

Each SystemConfiguration class is uniquely tied to a specific hardware configuration used in the lab. This class will only work with the Mesoscope-VR system. Any other data acquisition and runtime management system in the lab should define its own SystemConfiguration class to specify its own hardware and filesystem configuration parameters.

additional_firmware: MesoscopeAdditionalFirmware

Stores the configuration parameters for all firmware and hardware components not assembled in the Sun lab.

cameras: MesoscopeCameras

Stores the configuration parameters for the cameras used by the Mesoscope-VR system to record behavior videos.

microcontrollers: MesoscopeMicroControllers

Stores the configuration parameters for the microcontrollers used by the Mesoscope-VR system.

name: str = 'mesoscope-vr'

Stores the descriptive name of the data acquisition system.

paths: MesoscopePaths

Stores the filesystem configuration parameters for the Mesoscope-VR data acquisition system.

save(path)

Saves class instance data to disk as a ‘mesoscope_system_configuration.yaml’ file.

This method converts certain class variables to yaml-safe types (for example, Path objects -> strings) and saves class data to disk as a .yaml file. The method is intended to be used solely by the set_system_configuration_file() function and should not be called from any other context.

Parameters:

path (Path) – The path to the .yaml file to save the data to.

Return type:

None

sheets: MesoscopeSheets

Stores the IDs of Google Sheets used by the Mesoscope-VR data acquisition system.

class sl_shared_assets.data_classes.ProcedureData(surgery_start_us, surgery_end_us, surgeon, protocol, surgery_notes, post_op_notes, surgery_quality=0)

Bases: object

Stores the general information about the surgical intervention.

post_op_notes: str

Stores surgeon’s notes taken during the post-surgery recovery period.

protocol: str

Stores the experiment protocol number (ID) used during the surgery.

surgeon: str

Stores the name or ID of the surgeon. If the intervention was carried out by multiple surgeons, all participating surgeon names and IDs are stored as part of the same string.

surgery_end_us: int

Stores the date and time when the surgery ended, as the number of microseconds elapsed since UTC epoch onset.

surgery_notes: str

Stores surgeon’s notes taken during the surgery.

surgery_quality: int = 0

Stores the quality of the surgical intervention as a numeric level: 0 indicates an unusable (bad) result, 1 indicates a usable result that is not good enough to be included in a publication, 2 indicates a publication-grade result, and 3 indicates a high-tier publication-grade result.

surgery_start_us: int

Stores the date and time when the surgery started, as the number of microseconds elapsed since UTC epoch onset.

class sl_shared_assets.data_classes.ProcessedData(processed_data_path=PosixPath('.'), camera_data_path=PosixPath('.'), mesoscope_data_path=PosixPath('.'), behavior_data_path=PosixPath('.'), root_path=PosixPath('.'))

Bases: object

Stores the paths to the directories and files that make up the ‘processed_data’ session-specific directory.

The processed_data directory stores the data generated by various processing pipelines from the raw data (contents of the raw_data directory). Processed data represents an intermediate step between raw data and the dataset used in the data analysis, but is not itself designed to be analyzed.

behavior_data_path: Path = PosixPath('.')

Stores the path to the directory that contains the non-video and non-brain-activity data extracted from .npz log files by the sl-behavior log processing pipeline.

camera_data_path: Path = PosixPath('.')

Stores the path to the directory that contains video tracking data generated by the Sun lab DeepLabCut-based video processing pipeline(s).

make_directories()

Ensures that all major subdirectories and the root directory exist, creating any missing directories.

This method is called each time the (wrapper) SessionData class is instantiated and allowed to generate missing data directories.

Return type:

None

mesoscope_data_path: Path = PosixPath('.')

Stores the path to the directory that contains processed brain activity (cell) data generated by sl-suite2p processing pipelines (single-day and multi-day). This directory is only used by sessions acquired with the Mesoscope-VR system.

processed_data_path: Path = PosixPath('.')

Stores the path to the root processed_data directory of the session. This directory stores the processed session data, generated from raw_data directory contents by various data processing pipelines.

resolve_paths(root_directory_path)

Resolves all paths managed by the class instance based on the input root directory path.

This method is called each time the (wrapper) SessionData class is instantiated to regenerate the managed path hierarchy on any machine that instantiates the class.

Parameters:

root_directory_path (Path) – The path to the top-level directory of the session. Typically, this path is assembled using the following hierarchy: root/project/animal/session_id

Return type:

None

root_path: Path = PosixPath('.')

Stores the path to the root directory of the volume that stores processed data from all Sun lab projects. Primarily, this is necessary for pipelines working with the data on the remote compute server to efficiently move it between storage and working (processing) volumes.

class sl_shared_assets.data_classes.RawData(raw_data_path=PosixPath('.'), camera_data_path=PosixPath('.'), mesoscope_data_path=PosixPath('.'), behavior_data_path=PosixPath('.'), zaber_positions_path=PosixPath('.'), session_descriptor_path=PosixPath('.'), hardware_state_path=PosixPath('.'), surgery_metadata_path=PosixPath('.'), session_data_path=PosixPath('.'), experiment_configuration_path=PosixPath('.'), mesoscope_positions_path=PosixPath('.'), window_screenshot_path=PosixPath('.'), system_configuration_path=PosixPath('.'), checksum_path=PosixPath('.'), telomere_path=PosixPath('.'), ubiquitin_path=PosixPath('.'), nk_path=PosixPath('.'), root_path=PosixPath('.'))

Bases: object

Stores the paths to the directories and files that make up the ‘raw_data’ session-specific directory.

The raw_data directory stores the data acquired during the session data acquisition runtime, before and after preprocessing. Since preprocessing does not irreversibly alter the data, any data in that folder is considered ‘raw,’ even if preprocessing losslessly re-compresses the data for efficient transfer.

Notes

Sun lab data management strategy primarily relies on keeping multiple redundant copies of the raw_data for each acquired session. Typically, one copy is stored on the lab’s processing server and the other is stored on the NAS.

behavior_data_path: Path = PosixPath('.')

Stores the path to the directory that contains all non-video behavior data acquired during the session. Primarily, this includes the .npz log files that store serialized data acquired by all hardware components of the data acquisition system other than cameras and brain activity data acquisition devices (such as the Mesoscope).

camera_data_path: Path = PosixPath('.')

Stores the path to the directory that contains all camera data acquired during the session. Primarily, this includes .mp4 video files from each recorded camera.

checksum_path: Path = PosixPath('.')

Stores the path to the ax_checksum.txt file. This file is generated as part of packaging the data for transmission and stores the xxHash-128 checksum of the data. It is used to verify that the transmission did not damage or otherwise alter the data.

experiment_configuration_path: Path = PosixPath('.')

Stores the path to the experiment_configuration.yaml file. This file contains the snapshot of the experiment runtime configuration used by the session. This file is only created for experiment sessions.

hardware_state_path: Path = PosixPath('.')

Stores the path to the hardware_state.yaml file. This file contains the partial snapshot of the calibration parameters used by the data acquisition system modules during the session. Primarily, it is used during data processing to interpret the raw data stored inside .npz log files.

make_directories()

Ensures that all major subdirectories and the root directory exist, creating any missing directories.

This method is called each time the (wrapper) SessionData class is instantiated and allowed to generate missing data directories.

Return type:

None

mesoscope_data_path: Path = PosixPath('.')

Stores the path to the directory that contains all Mesoscope data acquired during the session. Primarily, this includes the mesoscope-acquired .tiff files (brain activity data) and the MotionEstimator.me file (motion estimation data). This directory is created for all sessions, but is only used (filled) by the sessions that use the Mesoscope-VR system to acquire brain activity data.

mesoscope_positions_path: Path = PosixPath('.')

Stores the path to the mesoscope_positions.yaml file. This file contains the snapshot of the positions used by the Mesoscope at the end of the session. This includes both the physical position of the mesoscope objective and the ‘virtual’ tip, tilt, and fastZ positions set via ScanImage software. This file is only created for sessions that use the Mesoscope-VR system to acquire brain activity data.

nk_path: Path = PosixPath('.')

Stores the path to the nk.bin file. This file is used by the sl-experiment library to mark sessions undergoing runtime initialization. Since runtime initialization is a complex process that may encounter a runtime error, the marker is used to discover sessions that failed to initialize. Since uninitialized sessions by definition do not contain any valuable data, they are marked for immediate deletion from all managed destinations.

raw_data_path: Path = PosixPath('.')

Stores the path to the root raw_data directory of the session. This directory stores all raw data during acquisition and preprocessing. Note that preprocessing does not alter the raw data, so at any point in time all data inside the folder is considered ‘raw’.

resolve_paths(root_directory_path)

Resolves all paths managed by the class instance based on the input root directory path.

This method is called each time the (wrapper) SessionData class is instantiated to regenerate the managed path hierarchy on any machine that instantiates the class.

Parameters:

root_directory_path (Path) – The path to the top-level directory of the session. Typically, this path is assembled using the following hierarchy: root/project/animal/session_id

Return type:

None

root_path: Path = PosixPath('.')

Stores the path to the root directory of the volume that stores raw data from all Sun lab projects. Primarily, this is necessary for pipelines working with the data on the remote compute server to efficiently move it between storage and working (processing) volumes.

session_data_path: Path = PosixPath('.')

Stores the path to the session_data.yaml file. This path is used by the SessionData instance to save itself to disk as a .yaml file. In turn, the cached data is reused to reinstate the same data hierarchy across all supported destinations, enabling various libraries to interface with the session data.

session_descriptor_path: Path = PosixPath('.')

Stores the path to the session_descriptor.yaml file. This file is filled jointly by the data acquisition system and the experimenter. It contains session-specific information, such as the specific task parameters and the notes made by the experimenter during runtime. Each supported session type uses a unique SessionDescriptor class to define the format and content of the session_descriptor.yaml file.

surgery_metadata_path: Path = PosixPath('.')

Stores the path to the surgery_metadata.yaml file. This file contains the most up-to-date information about the surgical intervention(s) performed on the animal prior to the session.

system_configuration_path: Path = PosixPath('.')

Stores the path to the system_configuration.yaml file. This file contains the exact snapshot of the data acquisition system configuration parameters used to acquire session data.

telomere_path: Path = PosixPath('.')

Stores the path to the telomere.bin file. This file is statically generated at the end of the session’s data acquisition based on experimenter feedback to mark sessions that ran in-full with no issues. Sessions without a telomere.bin file are considered ‘incomplete’ and are excluded from all automated processing, as they may contain corrupted, incomplete, or otherwise unusable data.

ubiquitin_path: Path = PosixPath('.')

Stores the path to the ubiquitin.bin file. This file is primarily used by the sl-experiment library to mark local session data directories for deletion (purging). Typically, it is created once the data is safely moved to the long-term storage destinations (NAS and Server) and the integrity of the moved data is verified on at least one destination. During ‘sl-purge’ sl-experiment runtimes, the library discovers and removes all session data marked with ‘ubiquitin.bin’ files from the machine that runs the command.

window_screenshot_path: Path = PosixPath('.')

Stores the path to the .png screenshot of the ScanImagePC screen. At a minimum, the screenshot should contain the image of the imaging plane and the red-dot alignment window. This is used to generate a visual snapshot of the cranial window alignment and cell appearance for each experiment session. This file is only created for sessions that use the Mesoscope-VR system to acquire brain activity data.

zaber_positions_path: Path = PosixPath('.')

Stores the path to the zaber_positions.yaml file. This file contains the snapshot of all Zaber motor positions at the end of the session. Zaber motors are used to position the LickPort, HeadBar, and Wheel Mesoscope-VR modules to support proper brain activity recording and behavior during the session. This file is only created for sessions that use the Mesoscope-VR system.

class sl_shared_assets.data_classes.RunTrainingDescriptor(experimenter, mouse_weight_g, final_run_speed_threshold_cm_s, final_run_duration_threshold_s, initial_run_speed_threshold_cm_s, initial_run_duration_threshold_s, increase_threshold_ml, run_speed_increase_step_cm_s, run_duration_increase_step_s, maximum_water_volume_ml, maximum_training_time_m, maximum_unconsumed_rewards=1, maximum_idle_time_s=0.0, dispensed_water_volume_ml=0.0, pause_dispensed_water_volume_ml=0.0, experimenter_given_water_volume_ml=0.0, preferred_session_water_volume_ml=0.0, incomplete=False, experimenter_notes='Replace this with your notes.')

Bases: YamlConfig

Stores the task and outcome information specific to run training sessions that use the Mesoscope-VR system.

dispensed_water_volume_ml: float = 0.0

Stores the total water volume, in milliliters, dispensed during runtime. This excludes the water volume dispensed during the paused (idle) state.

experimenter: str

The ID of the experimenter running the session.

experimenter_given_water_volume_ml: float = 0.0

The additional volume of water, in milliliters, administered by the experimenter to the animal after the session.

experimenter_notes: str = 'Replace this with your notes.'

This field is not set during runtime. It is expected that each experimenter will replace this field with their notes made during runtime.

final_run_duration_threshold_s: float

Stores the final running duration threshold, in seconds, that was active at the end of training.

final_run_speed_threshold_cm_s: float

Stores the final running speed threshold, in centimeters per second, that was active at the end of training.

incomplete: bool = False

If this field is set to True, the session is marked as ‘incomplete’ and automatically excluded from all further Sun lab automated processing and analysis.

increase_threshold_ml: float

Stores the volume of water delivered to the animal, in milliliters, that triggers the increase in the running speed and duration thresholds.

initial_run_duration_threshold_s: float

Stores the initial running duration threshold, in seconds, used during training.

initial_run_speed_threshold_cm_s: float

Stores the initial running speed threshold, in centimeters per second, used during training.

maximum_idle_time_s: float = 0.0

Stores the maximum time, in seconds, the animal can dip below the running speed threshold and still receive the reward. This allows animals that ‘run’ by taking a series of large steps, briefly dipping below the speed threshold at the end of each step, to still earn water rewards.

maximum_training_time_m: int

Stores the maximum time, in minutes, the system is allowed to run the training for.

maximum_unconsumed_rewards: int = 1

Stores the maximum number of consecutive rewards that can be delivered without the animal consuming them. If the animal receives this many rewards without licking (consuming) them, reward delivery is paused until the animal consumes the rewards.

maximum_water_volume_ml: float

Stores the maximum volume of water the system is allowed to dispense during training.

mouse_weight_g: float

The weight of the animal, in grams, at the beginning of the session.

pause_dispensed_water_volume_ml: float = 0.0

Stores the total water volume, in milliliters, dispensed during the paused (idle) state.

preferred_session_water_volume_ml: float = 0.0

The volume of water, in milliliters, the animal should receive during the session runtime if its performance matches the experimenter-specified threshold.

run_duration_increase_step_s: float

Stores the value, in seconds, used by the system to increment the duration threshold each time the animal receives the ‘increase_threshold’ volume of water.

run_speed_increase_step_cm_s: float

Stores the value, in centimeters per second, used by the system to increment the running speed threshold each time the animal receives the ‘increase_threshold’ volume of water.
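
A short arithmetic sketch of how the threshold fields interact, assuming the documented linear increment behavior and made-up values:

    # Hypothetical values; real values come from the session's task configuration.
    initial_run_speed_threshold_cm_s = 0.5
    run_speed_increase_step_cm_s = 0.05
    increase_threshold_ml = 0.25  # water volume that triggers each increment
    dispensed_water_volume_ml = 0.75

    # Three increments have been earned, so the active threshold is
    # 0.5 + 3 * 0.05 = 0.65 cm/s.
    increments = int(dispensed_water_volume_ml / increase_threshold_ml)
    current_speed_threshold = initial_run_speed_threshold_cm_s + increments * run_speed_increase_step_cm_s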

class sl_shared_assets.data_classes.SessionData(project_name, animal_id, session_name, session_type, acquisition_system=AcquisitionSystems.MESOSCOPE_VR, experiment_name=None, python_version='3.11.13', sl_experiment_version='3.0.0', raw_data=<factory>, processed_data=<factory>, source_data=<factory>, archived_data=<factory>, tracking_data=<factory>)

Bases: YamlConfig

Stores and manages the data layout of a single Sun lab data acquisition session.

The primary purpose of this class is to maintain the session data structure across all supported destinations and to provide a unified data access interface shared by all Sun lab libraries. The class can be used to either generate a new session or load the layout of an already existing session. When the class is used to create a new session, it generates the new session’s name using the current UTC timestamp, accurate to microseconds. This ensures that each session ‘name’ is unique and preserves the overall session order.

Notes

This class is specifically designed for working with the data from a single session, performed by a single animal as part of a specific experiment. The class is used to manage both raw and processed data. It follows the data through the acquisition, preprocessing, and processing stages of the Sun lab data workflow. This class serves as an entry point for all interactions with the managed session’s data.

acquisition_system: str | AcquisitionSystems = 'mesoscope-vr'

Stores the name of the data acquisition system that acquired the data. Has to be set to one of the supported acquisition systems, defined in the AcquisitionSystems enumeration exposed by the sl-shared-assets library.

animal_id: str

Stores the unique identifier of the animal that participates in the session.

archived_data: ProcessedData

Similar to the ‘source_data’ field, stores absolute paths to the same data as the ‘processed_data’ field, but with all paths resolved relative to the ‘raw_data’ root. These paths are used as part of the session data archiving process to collect all session data (raw and processed) on the slow ‘storage’ volume of the remote compute server.

classmethod create(project_name, animal_id, session_type, experiment_name=None, session_name=None, python_version='3.11.13', sl_experiment_version='2.0.0')

Creates a new SessionData object and generates the new session’s data structure on the local PC.

This method is intended to be called exclusively by the sl-experiment library to create new training or experiment sessions and generate the session data directory tree.

Notes

To load an already existing session data structure, use the load() method instead.

This method automatically dumps the data of the created SessionData instance into the session_data.yaml file inside the root ‘raw_data’ directory of the created hierarchy. It also finds and dumps other configuration files, such as experiment_configuration.yaml and system_configuration.yaml, into the same ‘raw_data’ directory. If the session’s runtime is interrupted unexpectedly, the acquired data can still be processed using these pre-saved class instances.

Parameters:
  • project_name (str) – The name of the project for which the session is carried out.

  • animal_id (str) – The ID code of the animal participating in the session.

  • session_type (SessionTypes | str) – The type of the session. Has to be one of the supported session types exposed by the SessionTypes enumeration.

  • experiment_name (str | None, default: None) – The name of the experiment executed during the session. This optional argument is only used for experiment sessions. Note! The name passed to this argument has to match the name of the experiment configuration .yaml file.

  • session_name (str | None, default: None) – An optional session name override. Generally, this argument should not be provided for most sessions. When provided, the method uses this name instead of generating a new timestamp-based name. This is only used during the ‘ascension’ runtime to convert old data structures to the modern lab standards.

  • python_version (str, default: '3.11.13') – The string that specifies the Python version used to collect session data. Has to be specified using the major.minor.patch version format.

  • sl_experiment_version (str, default: '2.0.0') – The string that specifies the version of the sl-experiment library used to collect session data. Has to be specified using the major.minor.patch version format.

Return type:

SessionData

Returns:

An initialized SessionData instance that stores the layout of the newly created session’s data.
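
A minimal usage sketch with made-up project and animal identifiers, mirroring how the sl-experiment library might call this method:

    from sl_shared_assets.data_classes import SessionData

    # Generates the session's directory tree and dumps session_data.yaml into
    # the new 'raw_data' directory. The names below are hypothetical.
    session_data = SessionData.create(
        project_name="example_project",
        animal_id="A001",
        session_type="run training",  # one of the SessionTypes values
    )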

experiment_name: str | None = None

Stores the name of the experiment performed during the session. If the session_type field indicates that the session is an experiment, this field communicates the specific experiment configuration used by the session. During runtime, this name is used to load the specific experiment configuration data stored in a .yaml file with the same name. If the session is not an experiment session, this field should be left as Null (None).

classmethod load(session_path, processed_data_root=None)

Loads the SessionData instance from the target session’s session_data.yaml file.

This method is used to load the data layout information of an already existing session. Primarily, this is used when processing session data. Due to how SessionData is stored and used in the lab, this method always loads the data layout from the session_data.yaml file stored inside the ‘raw_data’ session subfolder. Currently, all interactions with Sun lab data require access to the ‘raw_data’ folder of each session.

Notes

To create a new session, use the create() method instead.

Parameters:
  • session_path (Path) – The path to the root directory of an existing session, e.g.: root/project/animal/session.

  • processed_data_root (Path | None, default: None) – If processed data is kept on a drive different from the one that stores raw data, provide the path to the root project directory (directory that stores all Sun lab projects) on that drive. The method will automatically resolve the project/animal/session/processed_data hierarchy using this root path. If raw and processed data are kept on the same drive, keep this set to None.

Return type:

SessionData

Returns:

An initialized SessionData instance for the session whose data is stored at the provided path.

Raises:

FileNotFoundError – If multiple or no ‘session_data.yaml’ file instances are found under the input session path directory.
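
A minimal loading sketch with a made-up session path:

    from pathlib import Path
    from sl_shared_assets.data_classes import SessionData

    # Loads the layout from the session_data.yaml file stored under 'raw_data'.
    # The path below is hypothetical; it follows the root/project/animal/session hierarchy.
    session_data = SessionData.load(
        session_path=Path("/server/storage/sun_data/example_project/A001/2024-11-01-12-00-00-000000"),
    )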

processed_data: ProcessedData

Stores absolute paths to all directories and files that jointly make up the session’s processed data hierarchy. Processed data encompasses all data generated from the raw data as part of data processing.

project_name: str

Stores the name of the project for which the session was acquired.

python_version: str = '3.11.13'

Stores the Python version that was used to acquire session data.

raw_data: RawData

Stores absolute paths to all directories and files that jointly make up the session’s raw data hierarchy. This hierarchy is initially resolved by the acquisition system that acquires the session and is used to store all data acquired during the session runtime.

runtime_initialized()

Ensures that the ‘nk.bin’ marker file is removed from the session’s raw_data folder.

The ‘nk.bin’ marker is generated as part of the SessionData initialization (creation) process to mark sessions that did not fully initialize during runtime. This service method is designed to be called by the sl-experiment library classes to remove the ‘nk.bin’ marker when it is safe to do so. It should not be called by end-users.

Return type:

None

save()

Saves the instance data to the ‘raw_data’ directory of the managed session as a ‘session_data.yaml’ file.

This is used to save the data stored in the instance to disk so that it can be reused during further stages of data processing. The method is intended to only be used by the SessionData instance itself during its create() method runtime.

Return type:

None

session_name: str

Stores the name (timestamp-based ID) of the session.

session_type: str | SessionTypes

Stores the type of the session. Has to be set to one of the supported session types, defined in the SessionTypes enumeration exposed by the sl-shared-assets library.

sl_experiment_version: str = '3.0.0'

Stores the version of the sl-experiment library that was used to acquire the session data.

source_data: RawData

Stores absolute paths to the same data as the ‘raw_data’ field, but with all paths resolved relative to the ‘processed_data’ root. On systems that use the same root for processed and raw data, the source and raw directories are identical. On systems that use different root directories for processed and raw data, the source and raw directories are different. This is used to optimize data processing on the remote compute server by temporarily copying all session data to the fast processed data volume.

tracking_data: TrackingData

Stores absolute paths to all directories and files that jointly make up the session’s tracking data hierarchy. This hierarchy is used during all stages of data processing to track processing progress and to ensure that only a single manager process can modify the session’s data at any given time.

class sl_shared_assets.data_classes.SessionLock(file_path, _manager_id=-1)

Bases: YamlConfig

Provides thread-safe session locking to ensure exclusive access during data processing.

This class manages a lock file that tracks which manager process currently has exclusive access to a session’s data. It prevents race conditions when multiple manager processes attempt to modify session data simultaneously.

The lock is identified by a manager process ID, allowing distributed processing across multiple jobs while maintaining data integrity.

acquire(manager_id)

Acquires the session lock for exclusive access.

Parameters:

manager_id (int) – The unique identifier of the manager process requesting the lock.

Raises:
  • TimeoutError – If the .lock file cannot be acquired for a long period of time due to being held by another process.

  • RuntimeError – If the lock is held by another process and forcing lock acquisition is disabled.

Return type:

None

file_path: Path

Stores the absolute path to the .yaml file that stores the lock state on disk.

force_release()

Forcibly releases the lock regardless of ownership.

This method should only be used for emergency recovery of deadlocked sessions. It can be called by any process to unlock the session whose lock is managed by this instance.

Raises:

TimeoutError – If the .lock file cannot be acquired for a long period of time due to being held by another process.

Return type:

None

property is_locked: bool

Returns True if the session is currently locked by any process, False otherwise.

Raises:

TimeoutError – If the .lock file cannot be acquired for a long period of time due to being held by another process.

property owner: int | None

Returns the unique identifier of the manager process that holds the lock if the session is locked or None if the session is unlocked.

Raises:

TimeoutError – If the .lock file cannot be acquired for a long period of time due to being held by another process.

release(manager_id)

Releases the session lock.

Parameters:

manager_id (int) – The unique identifier of the manager process releasing the lock.

Raises:
  • TimeoutError – If the .lock file cannot be acquired for a long period of time due to being held by another process.

  • RuntimeError – If the lock is held by another process.

Return type:

None
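
A minimal acquire/release sketch, assuming an already-loaded SessionData instance and a made-up manager ID:

    from sl_shared_assets.data_classes import SessionLock

    lock = SessionLock(file_path=session_data.tracking_data.session_lock_path)
    manager_id = 0  # hypothetical manager process identifier
    lock.acquire(manager_id)
    try:
        ...  # modify the session's data here
    finally:
        lock.release(manager_id)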

class sl_shared_assets.data_classes.SessionTypes(*values)

Bases: StrEnum

Defines the set of data acquisition session types supported by various data acquisition systems used in the Sun lab.

A data acquisition session broadly encompasses a recording session carried out to acquire experiment data, to train the animal for upcoming experiments, or to assess the quality of a surgical or other pre-experiment intervention.

Notes

This enumeration does not differentiate between different acquisition systems. Different acquisition systems support different session types and may not be suited for acquiring some of the session types listed in this enumeration.

LICK_TRAINING = 'lick training'

Mesoscope-VR session designed to teach animals to use the water delivery port while being head-fixed.

MESOSCOPE_EXPERIMENT = 'mesoscope experiment'

Mesoscope-VR experiment session. The session uses the Unity game engine to run experiments in virtual reality task environments and collects brain activity data using the Mesoscope.

RUN_TRAINING = 'run training'

Mesoscope-VR session designed to teach animals how to run on the treadmill while being head-fixed.

WINDOW_CHECKING = 'window checking'

A special Mesoscope-VR session designed to evaluate the suitability of the given animal to be included in the experiment dataset. Specifically, the session involves using the Mesoscope to check the quality of the cell activity data.
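
Because SessionTypes is a StrEnum, its members compare equal to their string values, which is why string-typed fields such as SessionData.session_type can hold either form. For example:

    from sl_shared_assets.data_classes import SessionTypes

    assert SessionTypes.RUN_TRAINING == "run training"
    assert str(SessionTypes.LICK_TRAINING) == "lick training"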

class sl_shared_assets.data_classes.SubjectData(id, ear_punch, sex, genotype, date_of_birth_us, weight_g, cage, location_housed, status)

Bases: object

Stores the ID information of the surgical intervention’s subject (animal).

cage: int

Stores the number of the cage used to house the subject after surgery.

date_of_birth_us: int

Stores the date of birth of the subject as the number of microseconds elapsed since UTC epoch onset.

ear_punch: str

Stores the ear tag location of the subject.

genotype: str

Stores the genotype of the subject.

id: int

Stores the unique ID (name) of the subject. Assumes all animals are given a numeric ID, rather than a string name.

location_housed: str

Stores the location used to house the subject after the surgery.

sex: str

Stores the sex of the subject.

status: str

Stores the current status of the subject (alive / deceased).

weight_g: float

Stores the weight of the subject pre-surgery, in grams.

class sl_shared_assets.data_classes.SurgeryData(subject, procedure, drugs, implants, injections)

Bases: YamlConfig

Stores the data about a single animal surgical intervention.

This class aggregates other dataclass instances that store specific data about the surgical procedure. Primarily, it is used to save the data as a .yaml file to the ‘raw_data’ directory of every session, for each animal used in every lab project. This way, the surgery data is always stored alongside the behavior and brain activity data collected during the session.

drugs: DrugData

Stores the data about the substances subcutaneously injected into the subject before, during and immediately after the surgical intervention.

implants: list[ImplantData]

Stores the data for all cranial and transcranial implants introduced to the subject during the surgical intervention.

injections: list[InjectionData]

Stores the data about all substances infused into the brain of the subject during the surgical intervention.

procedure: ProcedureData

Stores general data about the surgical intervention.

subject: SubjectData

Stores the ID information about the subject (mouse).

class sl_shared_assets.data_classes.TrackingData(tracking_data_path=PosixPath('.'), session_lock_path=PosixPath('.'))

Bases: object

Stores the paths to the directories and files that make up the ‘tracking_data’ session-specific directory.

The ‘tracking_data’ directory was added in version 5.0.0 to store the ProcessingTracker instance data and .lock files for pipelines and tasks used to work with session data after acquisition.

make_directories()

Ensures that all major subdirectories and the root directory exist, creating any missing directories.

This method is called each time the (wrapper) SessionData class is instantiated and allowed to generate missing data directories.

Return type:

None

resolve_paths(root_directory_path)

Resolves all paths managed by the class instance based on the input root directory path.

This method is called each time the (wrapper) SessionData class is instantiated to regenerate the managed path hierarchy on any machine that instantiates the class.

Parameters:

root_directory_path (Path) – The path to the top-level directory of the session. Typically, this path is assembled using the following hierarchy: root/project/animal/session_id

Return type:

None

session_lock_path: Path = PosixPath('.')

Stores the path to the session_lock.yaml file for the session. This file is used to ensure that only a single manager process has exclusive access to the session’s data on the remote compute server. This ensures that multiple data processing pipelines can safely run for the same session without compromising session data integrity. This file is intended to be used through the SessionLock class.

tracking_data_path: Path = PosixPath('.')

Stores the path to the root tracking_data directory of the session. This directory stores the .yaml ProcessingTracker files and the .lock FileLock files that jointly ensure that the session’s data is accessed in a process- and thread-safe way while being processed by multiple different processes and pipelines.

class sl_shared_assets.data_classes.WindowCheckingDescriptor(experimenter, surgery_quality=0, incomplete=True, experimenter_notes='Replace this with your notes.')

Bases: YamlConfig

Stores the outcome information specific to window checking sessions that use the Mesoscope-VR system.

Notes

Window Checking sessions differ from all other session types. Their purpose is not to generate data, but rather to assess the suitability of the particular animal for inclusion in training and experiment cohorts. These sessions are automatically excluded from all automated data processing and analysis.

experimenter: str

The ID of the experimenter running the session.

experimenter_notes: str = 'Replace this with your notes.'

The notes on the quality of the cranial window and animal’s suitability for the target project.

incomplete: bool = True

Window checking sessions are always considered ‘incomplete’, as they do not contain the full range of information collected as part of a ‘standard’ behavior training or experiment session.

surgery_quality: int = 0

The quality of the cranial window and surgical intervention on a scale from 0 (non-usable) to 3 (high-tier publication grade) inclusive.

class sl_shared_assets.data_classes.ZaberPositions(headbar_z=0, headbar_pitch=0, headbar_roll=0, lickport_z=0, lickport_y=0, lickport_x=0, wheel_x=0)

Bases: YamlConfig

Stores Zaber motor positions reused between experiment sessions that use the Mesoscope-VR system.

The class is specifically designed to store, save, and load the positions of the LickPort, HeadBar, and Wheel motors (axes). It is used to both store Zaber motor positions for each session for future analysis and to restore the Zaber motors to the same positions across consecutive runtimes for the same project and animal combination.

Notes

By default, the class initializes all fields to 0, which is the position of the home sensor for each motor. The class assumes that the motor groups are assembled and arranged in a way that ensures all motors can safely move to the home sensor positions from any runtime configuration.

headbar_pitch: int = 0

The absolute position, in native motor units, of the HeadBar pitch-axis motor.

headbar_roll: int = 0

The absolute position, in native motor units, of the HeadBar roll-axis motor.

headbar_z: int = 0

The absolute position, in native motor units, of the HeadBar z-axis motor.

lickport_x: int = 0

The absolute position, in native motor units, of the LickPort x-axis motor.

lickport_y: int = 0

The absolute position, in native motor units, of the LickPort y-axis motor.

lickport_z: int = 0

The absolute position, in native motor units, of the LickPort z-axis motor.

wheel_x: int = 0

The absolute position, in native motor units, of the running wheel platform x-axis motor.

sl_shared_assets.data_classes.create_system_configuration_file(system)

Creates the .yaml configuration file for the requested Sun lab data acquisition system and configures the local machine (PC) to use this file for all future acquisition-system-related calls.

This function is used to initially configure or override the existing configuration of any data acquisition system used in the lab.

Notes

This function creates the configuration file inside the shared Sun lab working directory on the local machine. It assumes that the user has configured (created) the directory before calling this function.

A data acquisition system can consist of multiple machines (PCs). The configuration file is typically only present on the ‘main’ machine that manages all runtimes.

Parameters:

system (AcquisitionSystems | str) – The name (type) of the data acquisition system for which to create the configuration file. Must be one of the following supported options: mesoscope-vr.

Raises:

ValueError – If the input acquisition system name (type) is not recognized.

Return type:

None
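
A minimal usage sketch:

    from sl_shared_assets.data_classes import create_system_configuration_file

    # Writes the configuration file into the shared Sun lab working directory
    # and registers it for all future acquisition-system-related calls.
    create_system_configuration_file(system="mesoscope-vr")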

sl_shared_assets.data_classes.get_credentials_file_path(service=False)

Resolves and returns the path to the requested .yaml file that stores access credentials for the Sun lab remote compute server.

Depending on the configuration, either returns the path to the ‘user_credentials.yaml’ file (default) or the ‘service_credentials.yaml’ file.

Notes

Assumes that the local working directory has been configured before calling this function.

Parameters:

service (bool, default: False) – Determines whether this function resolves and returns the path to the ‘service_credentials.yaml’ file (if True) or the ‘user_credentials.yaml’ file (if False).

Raises:
  • FileNotFoundError – If either the ‘service_credentials.yaml’ or the ‘user_credentials.yaml’ files do not exist in the local Sun lab working directory.

  • ValueError – If both credential files exist, but the requested credentials file is not configured.

Return type:

Path
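
For example:

    from sl_shared_assets.data_classes import get_credentials_file_path

    user_credentials_path = get_credentials_file_path()  # user_credentials.yaml
    service_credentials_path = get_credentials_file_path(service=True)  # service_credentials.yaml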

sl_shared_assets.data_classes.get_system_configuration_data()

Resolves the path to the local data acquisition system configuration file and loads the configuration data as a SystemConfiguration instance.

This service function is used by all Sun lab data acquisition runtimes to load the system configuration data from the locally stored configuration file. It supports resolving and returning the data for all data acquisition systems currently used in the lab.

Return type:

MesoscopeSystemConfiguration

Returns:

The initialized SystemConfiguration class instance for the local data acquisition system that stores the loaded configuration parameters.

Raises:

FileNotFoundError – If the local machine does not have a valid data acquisition system configuration file.
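
A minimal usage sketch:

    from sl_shared_assets.data_classes import get_system_configuration_data

    system_configuration = get_system_configuration_data()
    # For example, resolve the local root directory of the Mesoscope-VR system.
    print(system_configuration.paths.root_directory)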

sl_shared_assets.data_classes.get_working_directory()

Resolves and returns the path to the local Sun lab working directory.

This service function is primarily used when working with Sun lab data stored on remote compute server(s) to establish local working directories for various jobs and pipelines.

Return type:

Path

Returns:

The path to the local working directory.

Raises:

FileNotFoundError – If the local machine does not have the Sun lab data directory, or the local working directory does not exist (has not been configured).

sl_shared_assets.data_classes.set_working_directory(path)

Sets the specified directory as the Sun lab working directory for the local machine (PC).

This function is used as the first step for configuring any machine to work with the data stored on the remote compute server(s). All lab libraries use this directory for caching configuration data and runtime working (intermediate) data.

Notes

The path to the working directory is stored inside the user’s data directory so that all Sun lab libraries can automatically access and use the same working directory.

If the input path does not point to an existing directory, the function will automatically generate the requested directory.

After setting up the working directory, the user should use other commands from the ‘sl-configure’ CLI to generate the remote compute server access credentials and / or acquisition system configuration files.

Parameters:

path (Path) – The path to the directory to set as the local Sun lab working directory.

Return type:

None
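
A minimal configuration sketch with a made-up directory path:

    from pathlib import Path
    from sl_shared_assets.data_classes import set_working_directory, get_working_directory

    set_working_directory(Path("/home/user/sun_lab_workdir"))  # hypothetical path
    working_directory = get_working_directory()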

Server

This package provides the classes and methods used by all Sun lab libraries to work with the data stored on remote compute servers, such as the BioHPC server. It provides tools for submitting and monitoring jobs, running complex processing pipelines and interactively working with the data via a Jupyter lab server.

class sl_shared_assets.server.Job(job_name, output_log, error_log, working_directory, conda_environment, cpus_to_use=10, ram_gb=10, time_limit=60)

Bases: object

Aggregates the data of a single SLURM-managed job to be executed on the Sun lab’s remote compute server.

This class provides the API for constructing any server-side job in the Sun lab. Internally, it wraps an instance of a Slurm class to package the job data into the format expected by the SLURM job manager. All jobs managed by this class should be submitted via an initialized Server class’s ‘submit_job’ method to be executed on the server.

Notes

The initialization method of the class contains the arguments for configuring the SLURM and Conda environments used by the job. Do not submit additional SLURM or Conda commands via the ‘add_command’ method, as this may produce unexpected behavior.

Each job can be conceptualized as a sequence of shell instructions to execute on the remote compute server. For the lab, that means that the bulk of each job consists of calls to the various CLIs exposed by data processing or analysis pipelines installed in the Conda environment on the server. Beyond that, the job contains commands for activating the target Conda environment and, in some cases, for doing other preparatory or cleanup work. The source code of a ‘remote’ job is typically identical to what a human operator would type in a ‘local’ terminal to run the same job on their PC.

A key feature of server-side jobs is that they are executed on virtual machines managed by SLURM. Since the server has far more compute and memory resources than any individual job is likely to need, each job typically requests a subset of these resources. Upon execution, SLURM creates an isolated environment with the requested resources and runs the job in that environment.

Since all jobs are expected to use the CLIs from python packages (pre)installed on the BioHPC server, make sure that the target environment is installed and configured before submitting jobs to the server. See notes in ReadMe to learn more about configuring server-side conda environments.

Parameters:
  • job_name (str) – The descriptive name of the SLURM job to be created. Primarily, this name is used in terminal printouts to identify the job to human operators.

  • output_log (Path) – The absolute path to the .txt file on the processing server, where to store the standard output data of the job.

  • error_log (Path) – The absolute path to the .txt file on the processing server, where to store the standard error data of the job.

  • working_directory (Path) – The absolute path to the directory where temporary job files will be stored. During runtime, classes from this library use that directory to store files such as the job’s shell script. All such files are automatically removed from the directory at the end of an error-free runtime.

  • conda_environment (str) – The name of the conda environment to activate on the server before running the job logic. The environment should contain the necessary Python packages and CLIs to support running the job’s logic.

  • cpus_to_use (int, default: 10) – The number of CPUs to use for the job.

  • ram_gb (int, default: 10) – The amount of RAM to allocate for the job, in Gigabytes.

  • time_limit (int, default: 60) – The maximum time limit for the job, in minutes. If the job is still running at the end of this time period, it will be forcibly terminated. It is highly advised to always set adequate maximum runtime limits to prevent jobs from hogging the server in case of runtime or algorithm errors.

remote_script_path

Stores the path to the job’s shell script file on the remote server that runs the command.

job_id

Stores the unique job identifier assigned by the SLURM manager to this job when it is accepted for execution. This field is initialized to None and is overwritten by the Server class that submits the job.

job_name

Stores the descriptive name of the SLURM job.

_command

Stores the managed SLURM command object.

add_command(command)

Adds the input command string to the end of the managed SLURM job command list.

This method is a wrapper around simple_slurm’s ‘add_cmd’ method. It is used to iteratively build the shell command sequence of the job.

Parameters:

command (str) – The command string to add to the command list, e.g.: ‘python main.py --input 1’.

Return type:

None

property command_script: str

Translates the managed job data into a shell-script-writable string and returns it to caller.

This method is used by the Server class to translate the job into the format that can be submitted to and executed on the remote compute server. Do not call this method manually unless you know what you are doing. The returned string is safe to dump into a .sh (shell script) file and move to the BioHPC server for execution.
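
A minimal job-construction sketch; all paths, names, and the command below are hypothetical. The constructed job still has to be submitted via an initialized Server class’s ‘submit_job’ method:

    from pathlib import Path
    from sl_shared_assets.server import Job

    job = Job(
        job_name="example_job",
        output_log=Path("/home/user/workdir/logs/out.txt"),
        error_log=Path("/home/user/workdir/logs/err.txt"),
        working_directory=Path("/home/user/workdir"),
        conda_environment="processing_env",
        cpus_to_use=4,
        ram_gb=16,
        time_limit=120,
    )
    # Iteratively build the job's shell command sequence.
    job.add_command("python main.py --input 1")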

class sl_shared_assets.server.JupyterJob(job_name, output_log, error_log, working_directory, conda_environment, notebook_directory, port=9999, cpus_to_use=2, ram_gb=32, time_limit=120, jupyter_args='')

Bases: Job

Specialized Job instance designed to launch a Jupyter notebook server on SLURM.

This class extends the base Job class to include Jupyter-specific configuration and commands for starting a notebook server in a SLURM environment. Using this specialized job allows users to set up remote Jupyter servers while still benefiting from SLURM’s job management and fair-share scheduling policies.

Notes

Jupyter servers directly compete for resources with headless data processing jobs. Therefore, it is important to minimize the resource footprint and the runtime of each Jupyter server, if possible.

Parameters:
  • job_name (str) – The descriptive name of the Jupyter SLURM job to be created. Primarily, this name is used in terminal printouts to identify the job to human operators.

  • output_log (Path) – The absolute path to the .txt file on the processing server, where to store the standard output data of the job.

  • error_log (Path) – The absolute path to the .txt file on the processing server, where to store the standard error data of the job.

  • working_directory (Path) – The absolute path to the directory where temporary job files will be stored. During runtime, classes from this library use that directory to store files such as the job’s shell script. All such files are automatically removed from the directory at the end of an error-free runtime.

  • conda_environment (str) – The name of the conda environment to activate on the server before running the job logic. The environment should contain the necessary Python packages and CLIs to support running the job’s logic. For Jupyter jobs, this necessarily includes the Jupyter notebook and jupyterlab packages.

  • port (int, default: 9999) – The connection port number for the Jupyter server. Do not change the default value unless you know what you are doing, as the server has most common communication ports closed for security reasons.

  • notebook_directory (Path) – The directory to use as Jupyter’s root. During runtime, Jupyter will only have access to items stored in or under this directory. For most runtimes, this should be set to the user’s root data or working directory.

  • cpus_to_use (int, default: 2) – The number of CPUs to allocate to the Jupyter server. Keep this value as small as possible to avoid interfering with headless data processing jobs.

  • ram_gb (int, default: 32) – The amount of RAM, in GB, to allocate to the Jupyter server. Keep this value as small as possible to avoid interfering with headless data processing jobs.

  • time_limit (int, default: 120) – The maximum Jupyter server uptime, in minutes. Set this to the expected duration of your jupyter session.

  • jupyter_args (str, default: '') – Stores additional arguments to pass to the Jupyter notebook initialization command.

port

Stores the connection port of the managed Jupyter server.

notebook_dir

Stores the absolute path, on the remote server, to the directory used as Jupyter’s root.

connection_info

Stores the JupyterConnectionInfo instance after the Jupyter server is instantiated.

host

Stores the hostname of the remote server.

user

Stores the username used to connect with the remote server.

connection_info_file

The absolute path, on the remote server, to the file that stores the Jupyter connection information.

_command

Stores the shell command for launching the Jupyter server.

parse_connection_info(info_file)

Parses the connection information file created by the Jupyter job on the server.

Use this method to parse the connection file fetched from the server to finalize setting up the Jupyter server job.

Parameters:

info_file (Path) – The path to the .txt file generated by the remote server that stores the Jupyter connection information to be parsed.

Return type:

None

print_connection_info()

Constructs and displays, in the terminal, the command for setting up the SSH tunnel to the server and the localhost link for viewing the Jupyter server.

The SSH command should be used via a separate terminal or subprocess call to establish the secure SSH tunnel to the Jupyter server. Once the SSH tunnel is established, the printed localhost URL can be used to view the server from the local machine.

Return type:

None
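
A minimal Jupyter job sketch; all paths and names are hypothetical:

    from pathlib import Path
    from sl_shared_assets.server import JupyterJob

    jupyter_job = JupyterJob(
        job_name="jupyter_session",
        output_log=Path("/home/user/workdir/logs/jupyter_out.txt"),
        error_log=Path("/home/user/workdir/logs/jupyter_err.txt"),
        working_directory=Path("/home/user/workdir"),
        conda_environment="analysis_env",
        notebook_directory=Path("/home/user/data"),
        time_limit=60,
    )
    # After the job is submitted and the connection file is fetched from the
    # server, parse it and display the SSH tunneling instructions:
    # jupyter_job.parse_connection_info(info_file=local_connection_file)
    # jupyter_job.print_connection_info()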

class sl_shared_assets.server.ProcessingPipeline(pipeline_type, server, manager_id, jobs, remote_tracker_path, local_tracker_path, session, animal, project, keep_job_logs=False, pipeline_status=ProcessingStatus.RUNNING, _pipeline_stage=0)

Bases: object

Encapsulates access to a processing pipeline running on the remote compute server.

This class functions as an interface for all data processing pipelines running on Sun lab compute servers. It is pipeline-type-agnostic and works for all data processing pipelines supported by this library. After instantiation, the class automatically handles all interactions with the server necessary to run the remote processing pipeline and verify the runtime outcome via the runtime_cycle() method that has to be called cyclically until the pipeline is complete.

Notes

Each pipeline may be executed in one or more stages, each stage using one or more parallel jobs. As such, each pipeline can be seen as an execution graph that sequentially submits batches of jobs to the remote server. The processing graph for each pipeline is fully resolved at the instantiation of this class instance, so each instance contains the necessary data to run the entire processing pipeline.

The minimum self-contained unit of the processing pipeline is a single job. Since jobs can depend on the output of other jobs, they are organized into stages based on the dependency graph between jobs. Combined with cluster management software, such as SLURM, this class can efficiently execute processing pipelines on scalable compute clusters.

animal: str

The ID of the animal whose data is being processed by the tracked pipeline.

property is_running: bool

Returns True if the pipeline is currently running, False otherwise.

jobs: dict[int, tuple[tuple[Job, Path], ...]]

Stores the dictionary that maps pipeline processing stage integer-codes to tuples of (Job, Path) pairs. Each pair stores a Job object and the path to its remote working directory; the jobs for each stage are submitted to the server as a batch.

keep_job_logs: bool = False

Determines whether to keep the logs for the jobs making up the pipeline execution graph or (default) to remove them after the pipeline successfully ends its runtime. If the pipeline fails to complete its runtime, the logs are kept regardless of this setting.

local_tracker_path: Path

The path to the pipeline’s processing tracker .yaml file on the local machine. The remote file is pulled to this location when the instance verifies the outcome of each tracked pipeline’s processing stage.

manager_id: int

The unique identifier for the manager process that constructs and manages the runtime of the tracked pipeline. This is used to ensure that only a single pipeline instance can work with each session’s data at the same time on the remote server.

pipeline_status: ProcessingStatus | int = 0

Stores the current status of the tracked remote pipeline. This field is updated each time the runtime_cycle() instance method is called.

pipeline_type: ProcessingPipelines

Stores the name of the processing pipeline managed by this instance. Primarily, this is used to identify the pipeline to the user in terminal messages and logs.

project: str

The name of the project whose data is being processed by the tracked pipeline.

remote_tracker_path: Path

The path to the pipeline’s processing tracker .yaml file stored on the remote compute server.

runtime_cycle()

Checks the current status of the tracked pipeline and, if necessary, submits additional batches of jobs to the remote server to progress the pipeline.

This method is the main entry point for all interactions with the processing pipeline managed by this instance. It checks the current state of the pipeline, advances the pipeline’s processing stage, and submits the necessary jobs to the remote server. The runtime manager process should call this method repeatedly (cyclically) to run the pipeline until the ‘is_running’ property of the instance returns False.

Return type:

None

Notes

While the ‘is_running’ property indicates whether the pipeline is still running, the manager process should access the ‘status’ instance property to resolve the pipeline’s final outcome (success or failure).

server: Server

Stores the reference to the Server object that maintains bidirectional communication with the remote server running the pipeline.

session: str

The ID of the session whose data is being processed by the tracked pipeline.

property status: ProcessingStatus

Returns the current status of the pipeline packaged into a ProcessingStatus instance.
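
Below is a minimal sketch of the manager-side execution loop, assuming ‘pipeline’ is an already-constructed ProcessingPipeline instance; the polling interval is an arbitrary choice.

import time

from sl_shared_assets.server import ProcessingStatus

# Cycle the pipeline until it finishes. Each call checks the state of the
# submitted jobs and, when a stage completes, submits the next batch.
while True:
    pipeline.runtime_cycle()
    if not pipeline.is_running:
        break
    time.sleep(30)  # Arbitrary polling interval.

# Resolve the final outcome via the 'status' property.
if pipeline.status == ProcessingStatus.SUCCEEDED:
    print(f"{pipeline.pipeline_type} pipeline completed successfully.")
else:
    print(f"{pipeline.pipeline_type} pipeline ended with status {pipeline.status.name}.")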

class sl_shared_assets.server.ProcessingPipelines(*values)

Bases: StrEnum

Defines the set of processing pipelines currently supported in the Sun lab.

All processing pipelines currently supported by the lab codebase are defined in this enumeration. Primarily, the elements from this enumeration are used in terminal messages and data logging entries to identify the pipelines to the user.

Notes

The elements in this enumeration match the elements in the TrackerFileNames enumeration, since each valid processing pipeline has an associated ProcessingTracker file instance.

The order of pipelines in this enumeration loosely follows the sequence in which they are executed during the lifetime of the Sun lab data on the remote compute server.

ARCHIVING = 'data archiving'

Data archiving pipeline. To conserve the (limited) space on the fast working volume, once the data has been processed and integrated into a stable dataset, the processed data folder is moved to the storage volume and all folders under the root session folder on the processed data volume are deleted.

BEHAVIOR = 'behavior processing'

Behavior processing pipeline. This pipeline is used to process .npz log files to extract the animal behavior data acquired during a single session (day). The processed logs also contain the timestamps used to synchronize behavior data with video and mesoscope frame data, as well as the experiment configuration and task information.

CHECKSUM = 'checksum resolution'

Checksum resolution pipeline. Primarily, it is used to verify that the raw data has been transferred to the remote storage server from the main acquisition system PC intact. This pipeline is sometimes also used to regenerate (re-checksum) the data stored on the remote compute server.

FORGING = 'dataset forging'

Dataset creation (forging) pipeline. This pipeline typically runs after the multi-day pipeline. It extracts and integrates the processed data from various sources such as brain activity, behavior, videos, etc., into a unified dataset.

MANIFEST = 'manifest generation'

Project manifest generation pipeline. This pipeline is generally not used in most runtime contexts. It allows manually regenerating the project manifest .feather file, which is typically only needed during testing. All other pipelines automatically regenerate the manifest at the end of their runtime.

MULTIDAY = 'multi-day suite2p processing'

Multi-day suite2p processing (cell tracking) pipeline. This pipeline is used to track cells processed with the single-day suite2p pipelines across multiple days. It is executed for all sessions marked for integration into the same dataset as the first step of dataset creation.

PREPARATION = 'processing preparation'

Data processing preparation pipeline. Since the compute server uses a two-volume design with a slow (HDD) storage volume and a fast (NVME) working volume, to optimize data processing performance, the data needs to be transferred to the working volume before processing. This pipeline copies the raw data for the target session from the storage volume to the working volume.

SUITE2P = 'single-day suite2p processing'

Single-day suite2p pipeline. This pipeline is used to extract the cell activity data from 2-photon imaging data acquired during a single session (day).

VIDEO = 'video processing'

DeepLabCut (Video) processing pipeline. This pipeline is used to extract animal pose estimation data from the behavior video frames acquired during a single session (day).

class sl_shared_assets.server.ProcessingStatus(*values)

Bases: IntEnum

Maps integer-based processing pipeline status (state) codes to human-readable names.

This enumeration is used to track and communicate the progress of Sun lab processing pipelines as they are executed by the remote compute server. Specifically, the codes from this enumeration are used by the ProcessingPipeline class to communicate the status of the managed pipelines to external processes.

Notes

The status codes from this enumeration track the state of the pipeline as a whole, instead of tracking the state of each job that comprises the pipeline.

ABORTED = 3

The pipeline execution has been aborted prematurely, either by the manager process or due to an overriding request from another user.

FAILED = 2

The server has failed to complete the pipeline due to a runtime error.

RUNNING = 0

The pipeline is currently running on the remote server. It may be executing (in progress) or waiting for the required resources to become available (queued).

SUCCEEDED = 1

The server has successfully completed the processing pipeline.

class sl_shared_assets.server.ProcessingTracker(file_path, _complete=False, _encountered_error=False, _running=False, _manager_id=-1, _job_count=1, _completed_jobs=0)

Bases: YamlConfig

Wraps the .yaml file that tracks the state of a data processing pipeline and provides tools for communicating the state between multiple processes in a thread-safe manner.

This class is used by all data processing pipelines running on the remote compute server(s) to prevent race conditions and ensure that pipelines have exclusive access to the processed data. It is also used to evaluate the status (success / failure) of each pipeline as it is executed by the remote server.

Note

In library version 4.0.0, the processing trackers were refactored to work similarly to ‘lock’ files. That is, when a pipeline starts running on the remote server, its tracker is switched into the ‘running’ (locked) state until the pipeline completes, aborts, or encounters an error. While the tracker is locked, all modifications to the tracker or the processed data have to originate from the same process that started the pipeline that locked the tracker file. This feature supports running complex processing pipelines that use multiple concurrent and / or sequential processing jobs on the remote server.

The method documentation of this class frequently refers to a ‘manager process’. A ‘manager process’ is the highest-level process that manages the tracked pipeline. When a pipeline runs on remote compute servers, the manager process is typically the process running on the non-server machine (user PC) that submits the remote processing jobs to the compute server (via SSH or a similar protocol). The worker process(es) that run the processing job(s) on the remote compute servers are NOT considered manager processes.

abort()

Resets the runtime tracker file to the default state.

This method can be used to reset the runtime tracker file regardless of the current runtime state. Unlike other instance methods, it can be called from any manager process, even if the runtime is already locked by another process. It is intended only for emergencies, to ‘unlock’ a deadlocked runtime.

Return type:

None

property encountered_error: bool

Returns True if the tracker wrapped by the instance indicates that the processing runtime has aborted due to encountering an error.

error(manager_id)

Configures the tracker file to indicate that the tracked processing runtime encountered an error and failed to complete.

This method fulfills two main purposes. First, it ‘unlocks’ the runtime, allowing other manager processes to interface with the tracked runtime. Second, it updates the tracker file to reflect that the runtime was interrupted due to an error, which is used by the manager processes to detect and handle processing failures.

Parameters:

manager_id (int) – The unique xxHash-64 hash identifier of the manager process which attempts to report that the runtime tracked by this tracker file has encountered an error.

Raises:

TimeoutError – If the .lock file for the target .YAML file cannot be acquired within the timeout period.

Return type:

None

file_path: Path

Stores the path to the .yaml file used to cache the tracker data on disk. The class instance functions as a wrapper around the data stored inside the specified .yaml file.

property is_complete: bool

Returns True if the tracker wrapped by the instance indicates that the processing runtime has been completed successfully and that the runtime is not currently ongoing.

property is_running: bool

Returns True if the tracker wrapped by the instance indicates that the processing runtime is currently ongoing.

start(manager_id, job_count=1)

Configures the tracker file to indicate that a manager process is currently executing the tracked processing runtime.

Calling this method effectively ‘locks’ the tracked session and processing runtime combination to only be accessible from the manager process that calls this method. Calling this method for an already running runtime managed by the same process does not have any effect, so it is safe to call this method at the beginning of each processing job that makes up the runtime.

Parameters:
  • manager_id (int) – The unique xxHash-64 hash identifier of the manager process which attempts to start the runtime tracked by this tracker file.

  • job_count (int, default: 1) – The total number of jobs to be executed as part of the tracked pipeline. This is used to make the stop() method properly track the end of the pipeline as a whole, rather than the end of intermediate jobs. Primarily, this is used by multi-job pipelines where all jobs are submitted as part of a single stage and the job completion order cannot be known in advance.

Raises:

TimeoutError – If the .lock file for the target .YAML file cannot be acquired within the timeout period.

Return type:

None

stop(manager_id)

Configures the tracker file to indicate that the tracked processing runtime has been completed successfully.

This method ‘unlocks’ the runtime, allowing other manager processes to interface with the tracked runtime. It also configures the tracker file to indicate that the runtime has been completed successfully, which is used by the manager processes to detect and handle processing completion.

Parameters:

manager_id (int) – The unique xxHash-64 hash identifier of the manager process which attempts to report that the runtime tracked by this tracker file has been completed successfully.

Raises:

TimeoutError – If the .lock file for the target .YAML file cannot be acquired within the timeout period.

Return type:

None
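
Below is a minimal sketch of how a manager process might drive a single-job runtime through the tracker; the tracker file path is hypothetical, and run_processing_job() is a placeholder for the actual processing logic.

from pathlib import Path

from sl_shared_assets.server import ProcessingTracker, generate_manager_id

tracker = ProcessingTracker(file_path=Path("/data/session/behavior_processing_tracker.yaml"))
manager_id = generate_manager_id()  # Unique xxHash-64-derived identifier.

tracker.start(manager_id=manager_id, job_count=1)  # 'Locks' the runtime.
try:
    run_processing_job()  # Placeholder for the tracked processing logic.
except Exception:
    tracker.error(manager_id=manager_id)  # Records the failure and unlocks the runtime.
    raise
else:
    tracker.stop(manager_id=manager_id)  # Records success and unlocks the runtime.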

class sl_shared_assets.server.Server(credentials_path)

Bases: object

Encapsulates access to a Sun lab processing server.

This class provides the API that allows accessing the remote processing server to create and submit various SLURM-managed jobs to the server. It functions as the central interface used by all processing pipelines in the lab to execute costly data processing on the server.

Notes

All lab processing pipelines expect the data to be stored on the server and all processing logic to be packaged and installed into dedicated conda environments on the server.

This class assumes that the target server has SLURM job manager installed and accessible to the user whose credentials are used to connect to the server as part of this class instantiation.

Parameters:

credentials_path (Path) – The path to the locally stored .yaml file that contains the server hostname and access credentials.

_open

Tracks whether the connection to the server is open or not.

_client

Stores the initialized SSHClient instance used to interface with the server.

abort_job(job)

Aborts the target job if it is currently running on the server.

Use this method to immediately abort running or queued jobs without waiting for the timeout guard. If the job is queued, this method will remove it from the SLURM queue. If the job is already terminated, this method will do nothing.

Parameters:

job (Job | JupyterJob) – The Job object that needs to be aborted.

Return type:

None

close()

Closes the SSH connection to the server.

This method has to be called before destroying the class instance to ensure proper resource cleanup.

Return type:

None
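
Below is a minimal connection lifecycle sketch; it assumes that instantiating Server opens the connection and that a credentials file exists at the (hypothetical) path shown.

from pathlib import Path

from sl_shared_assets.server import Server

server = Server(credentials_path=Path("~/server_credentials.yaml").expanduser())
try:
    print(f"Connected to {server.host} as {server.user}.")  # Basic sanity check.
finally:
    server.close()  # Always release the SSH connection when done.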

create_directory(remote_path, parents=True)

Creates the specified directory tree on the managed remote server via SFTP.

This method creates directories on the remote server, with options to create parent directories and handle existing directories gracefully.

Parameters:
  • remote_path (Path) – The path to the directory to create on the remote server, given relative to the server’s filesystem root.

  • parents (bool, default: True) – Determines whether to create missing parent directories. If False and the parent directories do not exist, the method raises a FileNotFoundError.

Return type:

None

Notes

If the target directory already exists, this method silently treats it as a success and returns without raising an error.

create_job(job_name, conda_environment, cpus_to_use=10, ram_gb=10, time_limit=60)

Creates and returns a new Job instance.

Use this method to generate Job objects for all headless jobs that need to be run on the remote server. The generated Job is a precursor that requires further configuration by the user before it can be submitted to the server for execution.

Parameters:
  • job_name (str) – The descriptive name of the SLURM job to be created. Primarily, this name is used in terminal printouts to identify the job to human operators.

  • conda_environment (str) – The name of the conda environment to activate on the server before running the job logic. The environment should contain the necessary Python packages and CLIs to support running the job’s logic.

  • cpus_to_use (int, default: 10) – The number of CPUs to use for the job.

  • ram_gb (int, default: 10) – The amount of RAM to allocate for the job, in Gigabytes.

  • time_limit (int, default: 60) – The maximum time limit for the job, in minutes. If the job is still running at the end of this time period, it will be forcibly terminated. It is highly advised to always set adequate maximum runtime limits to prevent jobs from hogging the server in case of runtime or algorithm errors.

Return type:

Job

Returns:

The initialized Job instance pre-filled with SLURM configuration data and conda activation commands. Modify the returned instance with any additional commands necessary for the job to fulfill its intended purpose. Note that the Job must be submitted via submit_job() before the server executes it.
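
Below is a minimal sketch of the headless-job workflow, assuming an open Server instance. The job name and conda environment name are hypothetical, and the step that adds shell commands to the Job is elided, as the Job configuration API is not covered in this section.

import time

job = server.create_job(
    job_name="behavior-processing",     # Hypothetical job name.
    conda_environment="sl_processing",  # Hypothetical conda environment name.
    cpus_to_use=10,
    ram_gb=10,
    time_limit=60,
)

# Configure the returned Job with the shell commands it should execute here
# (the Job configuration API is not shown in this section).

job = server.submit_job(job, verbose=True)  # Fills in the SLURM job ID.

while not server.job_complete(job):  # Poll until the job finishes or fails.
    time.sleep(30)  # Arbitrary polling interval.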

property dlc_projects_directory: Path

Returns the absolute path to the shared directory that stores all DeepLabCut projects.

exists(remote_path)

Returns True if the target file or directory exists on the remote server.

Return type:

bool

property host: str

Returns the hostname or IP address of the server accessible through this class.

job_complete(job)

Returns True if the job managed by the input Job instance has been completed or terminated its runtime due to an error.

If the job is still running or is waiting inside the execution queue, the method returns False.

Parameters:

job (Job | JupyterJob) – The Job object whose status needs to be checked.

Raises:

ValueError – If the input Job object does not contain a valid job_id, suggesting that it has not been submitted to the server.

Return type:

bool

launch_jupyter_server(job_name, conda_environment, notebook_directory, cpus_to_use=2, ram_gb=32, time_limit=240, port=0, jupyter_args='')

Launches a Jupyter notebook server on the target remote Sun lab server.

Use this method to run interactive Jupyter sessions on the remote server under SLURM control. Unlike create_job(), this method automatically submits the job for execution as part of its runtime. Therefore, the returned JupyterJob instance should only be used to query information about how to connect to the remote Jupyter server.

Parameters:
  • job_name (str) – The descriptive name of the Jupyter SLURM job to be created. Primarily, this name is used in terminal printouts to identify the job to human operators.

  • conda_environment (str) – The name of the conda environment to activate on the server before running the job logic. The environment should contain the necessary Python packages and CLIs to support running the job’s logic. For Jupyter jobs, this necessarily includes the Jupyter notebook and jupyterlab packages.

  • notebook_directory (Path) – The directory to use as Jupyter’s root. During runtime, Jupyter will only have GUI access to items stored in or under this directory. For most runtimes, this should be set to the user’s root data or working directory.

  • cpus_to_use (int, default: 2) – The number of CPUs to allocate to the Jupyter server. Keep this value as small as possible to avoid interfering with headless data processing jobs.

  • ram_gb (int, default: 32) – The amount of RAM, in GB, to allocate to the Jupyter server. Keep this value as small as possible to avoid interfering with headless data processing jobs.

  • time_limit (int, default: 240) – The maximum Jupyter server uptime, in minutes. Set this to the expected duration of your Jupyter session.

  • port (int, default: 0) – The connection port number for the Jupyter server. If set to 0 (default), a random port number between 8888 and 9999 is assigned to this connection to reduce the possibility of colliding with other user sessions.

  • jupyter_args (str, default: '') – Additional arguments to pass to the Jupyter server initialization command.

Return type:

JupyterJob

Returns:

The initialized JupyterJob instance that stores information on how to connect to the created Jupyter server. Do NOT re-submit the job to the server, as this is done as part of this method’s runtime.

Raises:
  • TimeoutError – If the target Jupyter server doesn’t start within 120 minutes of this method being called.

  • RuntimeError – If the job submission fails for any reason.
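
Below is a minimal sketch, assuming an open Server instance; the job name, environment name, and notebook directory are hypothetical.

from pathlib import Path

jupyter_job = server.launch_jupyter_server(
    job_name="interactive-analysis",  # Hypothetical job name.
    conda_environment="sl_jupyter",   # Must provide the jupyter notebook and jupyterlab packages.
    notebook_directory=Path("/local/workdir/sun_data"),  # Hypothetical Jupyter root.
)

# The job is already submitted; query the returned instance for the SSH tunnel
# command and the localhost URL used to access the server.
jupyter_job.print_connection_info()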

property processed_data_root: Path

Returns the absolute path to the directory used to store the processed data for all Sun lab projects on the server accessible through this class.

pull_directory(local_directory_path, remote_directory_path)

Recursively downloads the entire target directory from the remote server to the local machine.

Parameters:
  • local_directory_path (Path) – The path to the local directory where the remote directory will be copied.

  • remote_directory_path (Path) – The path to the directory on the remote server to be downloaded.

Return type:

None

pull_file(local_file_path, remote_file_path)

Copies the specified file from the remote server to the local machine.

Parameters:
  • local_file_path (Path) – The path to the local instance of the file (where to copy the file).

  • remote_file_path (Path) – The path to the target file on the remote server (the file to be copied).

Return type:

None

push_directory(local_directory_path, remote_directory_path)

Recursively uploads the entire target directory from the local machine to the remote server.

Parameters:
  • local_directory_path (Path) – The path to the local directory to be uploaded.

  • remote_directory_path (Path) – The path on the remote server where the directory will be copied.

Return type:

None

push_file(local_file_path, remote_file_path)

Copies the specified file from the local machine to the remote server.

Parameters:
  • local_file_path (Path) – The path to the file that needs to be copied to the remote server.

  • remote_file_path (Path) – The path to the file on the remote server (where to copy the file).

Return type:

None
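
Below is a minimal sketch of the single-file transfer helpers, assuming an open Server instance; all paths are hypothetical.

from pathlib import Path

# Upload a local configuration file to the server...
server.push_file(
    local_file_path=Path("/home/user/ops.yaml"),
    remote_file_path=Path("/local/workdir/sun_data/ops.yaml"),
)

# ...and later retrieve a result file produced on the server.
server.pull_file(
    local_file_path=Path("/home/user/results.feather"),
    remote_file_path=Path("/local/workdir/sun_data/results.feather"),
)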

property raw_data_root: Path

Returns the absolute path to the directory used to store the raw data for all Sun lab projects on the server accessible through this class.

remove(remote_path, is_dir, recursive=False)

Removes the specified file or directory from the remote server.

Parameters:
  • remote_path (Path) – The path to the file or directory on the remote server to be removed.

  • is_dir (bool) – Determines whether the input path represents a directory or a file.

  • recursive (bool, default: False) – If True and is_dir is True, recursively deletes all contents of the directory before removing it. If False, only removes empty directories (standard rmdir behavior).

Return type:

None
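
Below is a minimal sketch that combines exists() and remove(), assuming an open Server instance; the target path is hypothetical.

from pathlib import Path

target = Path("/local/workdir/sun_data/project/animal/session/temp_data")
if server.exists(target):
    # Recursively delete the directory together with all of its contents.
    server.remove(remote_path=target, is_dir=True, recursive=True)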

submit_job(job, verbose=True)

Submits the input job to the managed BioHPC server via the SLURM job manager.

This method submits various jobs for execution via the SLURM-managed BioHPC cluster. As part of its runtime, the method translates the Job object into a shell script, moves the script to the target working directory on the server, and instructs the server to execute the shell script (via SLURM).

Parameters:
  • job (Job | JupyterJob) – The Job object that contains all job data.

  • verbose (bool, default: True) – Determines whether to notify the user about non-error states of the job submission task. Typically, this is disabled when batch-submitting jobs (for example, as part of running a processing pipeline) and enabled when submitting single jobs.

Return type:

Job | JupyterJob

Returns:

The job object whose ‘job_id’ attribute has been updated with the SLURM-assigned job ID, if the job was successfully submitted.

Raises:

RuntimeError – If job submission to the server fails.

property suite2p_configurations_directory: Path

Returns the absolute path to the shared directory that stores all sl-suite2p runtime configuration files.

property user: str

Returns the username used to authenticate with the server.

property user_data_root: Path

Returns the absolute path to the directory used to store user-specific data on the server accessible through this class.

property user_working_root: Path

Returns the absolute path to the user-specific working (fast) directory on the server accessible through this class.

class sl_shared_assets.server.ServerCredentials(username='YourNetID', password='YourPassword', host='cbsuwsun.biohpc.cornell.edu', storage_root='/local/storage', working_root='/local/workdir', shared_directory_name='sun_data')

Bases: YamlConfig

This class stores the hostname and credentials used to log into the BioHPC cluster to run Sun lab processing pipelines.

Primarily, this is used as part of the sl-experiment library runtime to start data processing once the data is transferred to the BioHPC server during preprocessing. However, the same file can be used together with the Server class API to run any computation jobs on the lab’s BioHPC server.

host: str = 'cbsuwsun.biohpc.cornell.edu'

The hostname or IP address of the server to connect to.

password: str = 'YourPassword'

The password to use for server authentication.

processed_data_root: str

The path to the root directory used to store the processed data from all Sun lab projects on the target server.

raw_data_root: str

The path to the root directory used to store the raw data from all Sun lab projects on the target server.

shared_directory_name: str = 'sun_data'

Stores the name of the shared directory used to store all Sun lab project data on the storage and working server volumes.

storage_root: str = '/local/storage'

The path to the root storage (slow) server directory. Typically, this is the path to the top-level (root) directory of the HDD RAID volume.

user_data_root: str

The path to the root directory of the user on the target server. Unlike raw and processed data roots, which are shared between all Sun lab users, each user_data directory is unique for every server user.

user_working_root: str

The path to the root user working directory on the target server. This directory is unique for every user.

username: str = 'YourNetID'

The username to use for server authentication.

working_root: str = '/local/workdir'

The path to the root working (fast) server directory. Typically, this is the path to the top-level (root) directory of the NVME RAID volume. If the server uses the same volume for both storage and working directories, enter the same path under both ‘storage_root’ and ‘working_root’.

class sl_shared_assets.server.TrackerFileNames(*values)

Bases: StrEnum

Defines the set of processing tracker .yaml files used by the Sun lab data preprocessing, processing, and dataset formation pipelines to track the progress of the remotely executed pipelines.

This enumeration standardizes the names for all processing tracker files used in the lab. It is designed to be used via the get_processing_tracker() function to generate ProcessingTracker instances.

Notes

The elements in this enumeration match the elements in the ProcessingPipelines enumeration, since each valid ProcessingPipeline instance has an associated ProcessingTracker file instance.

ARCHIVING = 'data_archiving_tracker.yaml'

This file is used to track the state of the data archiving pipeline.

BEHAVIOR = 'behavior_processing_tracker.yaml'

This file is used to track the state of the behavior log processing pipeline.

CHECKSUM = 'checksum_resolution_tracker.yaml'

This file is used to track the state of the checksum resolution pipeline.

FORGING = 'dataset_forging_tracker.yaml'

This file is used to track the state of the dataset creation (forging) pipeline.

MANIFEST = 'manifest_generation_tracker.yaml'

This file is used to track the state of the project manifest generation pipeline.

MULTIDAY = 'multiday_processing_tracker.yaml'

This file is used to track the state of the multiday suite2p processing pipeline.

PREPARATION = 'processing_preparation_tracker.yaml'

This file is used to track the state of the data processing preparation pipeline.

SUITE2P = 'suite2p_processing_tracker.yaml'

This file is used to track the state of the single-day suite2p processing pipeline.

VIDEO = 'video_processing_tracker.yaml'

This file is used to track the state of the video (DeepLabCut) processing pipeline.
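
Since TrackerFileNames is a StrEnum, its elements can be used directly in path operations. Below is a minimal sketch that wraps a behavior tracker file in a ProcessingTracker instance; the session directory layout is hypothetical, and production code should prefer the get_processing_tracker() helper mentioned above.

from pathlib import Path

from sl_shared_assets.server import ProcessingTracker, TrackerFileNames

session_directory = Path("/local/storage/sun_data/project/animal/session")
tracker = ProcessingTracker(file_path=session_directory / TrackerFileNames.BEHAVIOR)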

sl_shared_assets.server.generate_manager_id()

Generates and returns a unique integer identifier that can be used to identify the manager process that calls this function.

The identifier is generated from the current timestamp, accurate to microseconds, and a random number between 1 and 9999999999999, which makes each call practically guaranteed to produce a unique value. The generated identifier string is converted to an integer value using the xxHash-64 algorithm before it is returned to the caller.

Return type:

int

Notes

This function should be used to generate manager process identifiers for working with ProcessingTracker instances from sl-shared-assets version 4.0.0 and above.
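
The sketch below is a conceptual reconstruction of the scheme described above, not the library’s private implementation; the exact seed format and the xxhash dependency are assumptions.

import random
from datetime import datetime, timezone

import xxhash  # Assumed dependency providing the xxHash-64 algorithm.

# Conceptual reconstruction: hash a microsecond-accurate timestamp combined
# with a random salt into a 64-bit integer identifier.
seed = (
    f"{datetime.now(timezone.utc).isoformat(timespec='microseconds')}"
    f"-{random.randint(1, 9_999_999_999_999)}"
)
identifier = xxhash.xxh64(seed.encode()).intdigest()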

sl_shared_assets.server.generate_server_credentials(output_directory, username, password, service=False, host='cbsuwsun.biohpc.cornell.edu', storage_root='/local/storage', working_root='/local/workdir', shared_directory_name='sun_data')

Generates a new server access credentials .yaml file under the specified directory, using the input information.

This function provides a convenience interface for generating new server access credential files. Depending on configuration, it either creates user access credentials files or service access credentials files.

Parameters:
  • output_directory (Path) – The directory where to save the generated server_credentials.yaml file.

  • username (str) – The username to use for server authentication.

  • password (str) – The password to use for server authentication.

  • service (bool, default: False) – Determines whether the generated credentials file stores the data for a user or a service account.

  • host (str, default: 'cbsuwsun.biohpc.cornell.edu') – The hostname or IP address of the server to connect to.

  • storage_root (str, default: '/local/storage') – The path to the root storage (slow) server directory. Typically, this is the path to the top-level (root) directory of the HDD RAID volume.

  • working_root (str, default: '/local/workdir') – The path to the root working (fast) server directory. Typically, this is the path to the top-level (root) directory of the NVME RAID volume. If the server uses the same volume for both storage and working directories, enter the same path under both ‘storage_root’ and ‘working_root’.

  • shared_directory_name (str, default: 'sun_data') – The name of the shared directory used to store all Sun lab project data on the storage and working server volumes.

Return type:

None
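
Below is a minimal usage sketch; all argument values are placeholders.

from pathlib import Path

from sl_shared_assets.server import generate_server_credentials

generate_server_credentials(
    output_directory=Path("~/.sl_assets").expanduser(),  # Hypothetical output directory.
    username="netid123",             # Placeholder NetID.
    password="not-a-real-password",  # Placeholder password.
    service=False,                   # Generate user (not service) credentials.
)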