Command Line Interfaces
sl-manage
This Command-Line Interface (CLI) allows managing session and project data acquired in the Sun lab.
This CLI is intended to run on the Sun lab remote compute server(s) and should not be called by the end-user directly. Instead, commands from this CLI are designed to be accessed through the bindings in the sl-experiment and sl-forgery libraries.
sl-manage [OPTIONS] COMMAND [ARGS]...
project
This group provides commands for managing the data of a Sun lab project.
Commands from this group are used to support all interactions with the data stored on the Sun lab remote compute server(s).
sl-manage project [OPTIONS] COMMAND [ARGS]...
Options
- -pp, --project-path <project_path>
Required The absolute path to the project-specific directory where raw session data is stored.
- -pdr, --processed-data-root <processed_data_root>
The absolute path to the directory that stores the processed data from all Sun lab projects, if it is different from the root directory included in the ‘project-path’ argument value.
manifest
Generates the manifest .feather file that captures the current state of the target project’s data.
The manifest file contains a comprehensive snapshot of the available project data. It includes information about the management and processing pipelines that have been applied to each session’s data, as well as descriptive information about each session. The manifest file is used as the entry-point for all interactions with the Sun lab data stored on the remote compute server(s).
sl-manage project manifest [OPTIONS]
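For example, assuming a hypothetical project stored under /local/storage/sun_data/MyProject, the manifest could be regenerated with:

sl-manage project -pp /local/storage/sun_data/MyProject manifest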
session
This group provides commands for managing the data of a Sun lab data acquisition session.
Commands from this group are used to support data processing and dataset-formation (forging) on remote compute servers.
sl-manage session [OPTIONS] COMMAND [ARGS]...
Options
- -sp, --session-path <session_path>
Required The absolute path to the root session directory to process. This directory must contain the ‘raw_data’ subdirectory.
- -pdr, --processed-data-root <processed_data_root>
The absolute path to the directory that stores the processed data from all Sun lab projects, if it is different from the root directory included in the ‘session-path’ argument value.
- -id, --manager-id <manager_id>
Required The unique identifier of the process that manages this runtime.
- Default:
0
- -r, --reset-tracker
Determines whether to forcibly reset the tracker file for the target session management pipeline before the processing runtime begins. This flag should only be used in exceptional cases to recover from improper runtime terminations.
archive
Prepares the target session for long-term storage by moving all session data to the storage volume.
This command is primarily intended to run on remote compute servers that use slow HDD volumes to maximize data integrity and fast NVME volumes to maximize data processing speed. For such systems, moving all sessions that are no longer actively processed or analyzed to the slow drive volume frees up the processing volume space and ensures long-term data integrity.
sl-manage session archive [OPTIONS]
checksum
Resolves the data integrity checksum for the target session’s ‘raw_data’ directory.
This command can be used to verify the integrity of the session’s ‘raw_data’ directory using an existing checksum or to re-generate the checksum to reflect the current state of the directory. It only works with the ‘raw_data’ session directory and ignores all other directories. Primarily, this command is used to verify the integrity of the session’s data as it is transferred from data acquisition systems to long-term storage destinations.
sl-manage session checksum [OPTIONS]
Options
- -rc, --recalculate-checksum
Determines whether to recalculate and overwrite the cached checksum value for the processed session. When the command is called with this flag, it effectively re-checksums the data instead of verifying its integrity.
lock
Acquires the lock for the target session’s data.
This command is used to ensure that the target session’s data can only be accessed from the specified manager process. Calling this command is a prerequisite for all other session data management, processing, or dataset formation commands. If this command is called as part of a runtime, the ‘unlock’ command must be called at the end of that runtime to properly release the session’s data lock. This command respects the ‘--reset-tracker’ flag of the ‘session’ command group and, if this flag is present, forcibly resets the session lock file before re-acquiring it for the specified manager process.
sl-manage session lock [OPTIONS]
prepare
Prepares the target session for data processing by moving all session data to the working volume.
This command is intended to run on remote compute servers that use slow HDD volumes to maximize data integrity and fast NVME volumes to maximize data processing speed. For such systems, moving the data to the fast volume before processing results in a measurable processing time decrease.
sl-manage session prepare [OPTIONS]
unlock
Releases the lock for the target session’s data.
This command is used to reverse the effect of the ‘lock’ command, allowing other manager processes to work with the session’s data. This command can only be called from the same manager process used to acquire the session’s data lock.
sl-manage session unlock [OPTIONS]
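For example, a typical management sequence on the server first locks the session, runs the desired operation, and then releases the lock (the paths and manager ID below are hypothetical):

sl-manage session -sp /local/storage/sun_data/MyProject/A001/session1 -id 42 lock
sl-manage session -sp /local/storage/sun_data/MyProject/A001/session1 -id 42 checksum
sl-manage session -sp /local/storage/sun_data/MyProject/A001/session1 -id 42 unlock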
sl-configure
This Command-Line Interface (CLI) allows configuring major components of the Sun lab data acquisition, processing, and analysis workflow, such as acquisition systems and compute server(s).
sl-configure [OPTIONS] COMMAND [ARGS]...
directory
Sets the input directory as the Sun lab working directory, creating any missing path components.
This command serves as the initial entry point for setting up any machine (PC) to work with Sun lab libraries and data. After the working directory is configured, all calls to this and all other Sun lab libraries automatically use this directory to store the configuration and runtime data required to perform any requested task. This allows all Sun lab libraries to behave consistently across different user machines and runtime contexts.
sl-configure directory [OPTIONS]
Options
- -d, --directory <directory>
Required The absolute path to the directory used to cache Sun lab configuration and local runtime data.
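For example, to designate a hypothetical ~/sun_lab_data directory as the working directory:

sl-configure directory -d /home/user/sun_lab_data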
server
Generates a service or user server access credentials file.
This command is used to set up access to the lab’s remote compute server(s). The Server class uses the data stored inside the generated credentials .yaml file to connect to and execute remote jobs on the target compute server(s). Depending on the configuration, this command generates either the ‘user_credentials.yaml’ or ‘service_credentials.yaml’ file.
sl-configure server [OPTIONS]
Options
- -u, --username <username>
Required The username to use for server authentication.
- -p, --password <password>
Required The password to use for server authentication.
- -s, --service
Determines whether the credentials file is created for a service account. This determines the name of the generated file. Do not provide this flag unless creating a service credentials file.
- -h, --host <host>
Required The host name or IP address of the server.
- Default:
'cbsuwsun.biohpc.cornell.edu'
- -sr, --storage-root <storage_root>
Required The absolute path to the root storage server directory. Typically, this is the path to the top-level (root) directory of the HDD RAID volume.
- Default:
'/local/storage'
- -wr, --working-root <working_root>
Required The absolute path to the root working server directory. Typically, this is the path to the top-level (root) directory of the NVME RAID volume. If the server uses the same volume for both storing and working with data, set this to the same path as the ‘storage-root’ argument.
- Default:
'/local/workdir'
- -sd, --shared-directory <shared_directory>
Required The name of the shared directory used to store all Sun lab project data on all server volumes.
- Default:
'sun_data'
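Because the remaining options have default values, a minimal invocation might look like this (the username and password below are placeholders):

sl-configure server -u myuser -p mypassword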
system
Generates the configuration file for the specified data acquisition system.
This command is typically used when setting up new data acquisition systems in the lab. The sl-experiment library uses the created file to load the acquisition system configuration data during data acquisition runtimes. If the system uses multiple machines (PCs), the configuration file only needs to be created on the machine that runs the sl-experiment library and manages the acquisition runtime. Once the system configuration .yaml file is created via this command, edit the file to modify the acquisition system configuration at any time.
sl-configure system [OPTIONS]
Options
- -s, --system <system>
Required The type (name) of the data acquisition system for which to generate the configuration file.
- Default:
<AcquisitionSystems.MESOSCOPE_VR: 'mesoscope-vr'>
- Options:
mesoscope-vr
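For example, to generate the configuration file for the Mesoscope-VR system:

sl-configure system -s mesoscope-vr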
Tools
This package provides helper tools used to automate routine operations, such as transferring data or verifying its integrity. The tools from this package are used by most other data processing libraries in the lab.
- class sl_shared_assets.tools.ProjectManifest(manifest_file)
Bases:
object
Wraps the contents of a Sun lab project manifest .feather file and exposes methods for visualizing and working with the data stored inside the file.
This class functions as a high-level API for working with Sun lab projects. It is used both to visualize the current state of various projects and during automated data processing to determine which processing steps to apply to different sessions.
- Parameters:
manifest_file (Path) – The path to the .feather manifest file that stores the target project’s state data.
- _data
Stores the manifest data as a Polars DataFrame.
- _animal_string
Determines whether animal IDs are stored as strings or unsigned integers.
- property animals: tuple[str, ...]
Returns all unique animal IDs stored inside the manifest file.
This provides a tuple of all animal IDs participating in the target project.
- get_session_info(session)
Returns a Polars DataFrame that stores detailed information for the specified session.
Since session IDs are unique, it is expected that filtering by session ID is enough to get the requested information.
- Parameters:
session (str) – The ID of the session for which to retrieve the data.
- Returns:
A Polars DataFrame with the following columns: ‘animal’, ‘date’, ‘notes’, ‘session’, ‘type’, ‘system’, ‘complete’, ‘integrity’, ‘suite2p’, ‘behavior’, ‘video’, ‘archived’.
- Return type:
DataFrame
- get_sessions(animal=None, exclude_incomplete=True)
Returns requested session IDs based on selected filtering criteria.
This method provides a tuple of sessions based on the specified filters. If no animal is specified, returns sessions for all animals in the project.
- Parameters:
animal (str | int | None, default: None) – An optional animal ID to filter the sessions. If set to None, the method returns sessions for all animals.
exclude_incomplete (bool, default: True) – Determines whether to exclude sessions not marked as ‘complete’ from the output list.
- Return type:
tuple[str, ...]
- Returns:
The tuple of session IDs matching the filter criteria.
- Raises:
ValueError – If the specified animal is not found in the manifest file.
- print_data()
Prints the entire contents of the manifest file to the terminal.
- Return type:
None
- print_notes(animal=None)
Prints only animal, session, and notes data from the manifest file.
This data view is optimized for experimenters to check what sessions have been recorded for each animal in the project and refresh their memory on the outcomes of each session using experimenter notes.
- Parameters:
animal (str | int | None, default: None) – The ID of the animal for which to display the data. If an ID is provided, this method will only display the data for that animal. Otherwise, it will display the data for all animals.
- Return type:
None
- print_summary(animal=None)
Prints a summary view of the manifest file to the terminal, excluding the ‘experimenter notes’ data for each session.
This data view is optimized for tracking which processing steps have been applied to each session inside the project.
- Parameters:
animal (str | int | None, default: None) – The ID of the animal for which to display the data. If an ID is provided, this method will only display the data for that animal. Otherwise, it will display the data for all animals.
- Return type:
None
- property sessions: tuple[str, ...]
Returns all session IDs stored inside the manifest file.
This property provides a tuple of all sessions, independent of the participating animal, that were recorded as part of the target project. Use the get_sessions() method to get a filtered tuple of sessions.
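A minimal usage sketch, assuming a hypothetical manifest file path and animal ID:

from pathlib import Path

from sl_shared_assets.tools import ProjectManifest

# Load the manifest generated for a hypothetical project.
manifest = ProjectManifest(manifest_file=Path("/local/storage/sun_data/MyProject/MyProject_manifest.feather"))

# Display the processing-state summary for one animal.
manifest.print_summary(animal="A001")

# Retrieve the IDs of all complete sessions recorded for that animal.
complete_sessions = manifest.get_sessions(animal="A001", exclude_incomplete=True)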
- sl_shared_assets.tools.acquire_lock(session_path, manager_id, processed_data_root=None, reset_lock=False)
Acquires the target session’s data lock for the specified manager process.
Calling this function locks the target session’s data to make it accessible only for the specified manager process.
Notes
Each time this function is called, the release_lock() function must also be called to release the lock file.
- Parameters:
session_path (Path) – The path to the session directory to be locked.
manager_id (int) – The unique identifier of the manager process that acquires the lock.
reset_lock (bool, default: False) – Determines whether to reset the lock file before executing the runtime. This allows recovering from deadlocked runtimes but should otherwise not be used, so that the lock performs its intended function of limiting access to the session’s data.
processed_data_root (Path | None, default: None) – The path to the root directory used to store the processed data from all Sun lab projects, if different from the ‘session_path’ root.
- Return type:
None
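A minimal sketch of the intended acquire/release pattern, using a hypothetical session path and manager ID:

from pathlib import Path

from sl_shared_assets.tools import acquire_lock, release_lock

session = Path("/local/workdir/sun_data/MyProject/A001/session1")  # hypothetical
manager = 42  # hypothetical manager process identifier

acquire_lock(session_path=session, manager_id=manager)
try:
    ...  # work with the session's data
finally:
    # Every acquire_lock() call must be paired with a release_lock() call.
    release_lock(session_path=session, manager_id=manager)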
- sl_shared_assets.tools.archive_session(session_path, manager_id, reset_tracker=False, processed_data_root=None)
Prepares the target session for long-term (cold) storage.
This function is primarily designed to be used on remote compute servers that use different data volumes for storage and processing. It should be called for sessions that are no longer frequently processed or accessed to move all session data to the (slow) storage volume and free up the fast processing volume for working with other data. Typically, this function is used exactly once during each session’s life cycle: when the session’s project is officially concluded.
- Parameters:
session_path (Path) – The path to the session directory to be processed.
manager_id (int) – The unique identifier of the manager process that manages the runtime.
reset_tracker (bool, default: False) – Determines whether to reset the tracker file before executing the runtime. This allows recovering from deadlocked runtimes but should otherwise not be used, to ensure runtime safety.
processed_data_root (Path | None, default: None) – The path to the root directory used to store the processed data from all Sun lab projects, if different from the ‘session_path’ root.
- Return type:
None
Notes
This function reverses the result of running the prepare_session() function.
- sl_shared_assets.tools.calculate_directory_checksum(directory, num_processes=None, batch=False, save_checksum=True)
Calculates the xxHash3-128 checksum for the input directory, covering the data of all contained files and the directory structure information.
Checksums are used to verify the data integrity during transmission within machines (from one storage volume to another) and between machines. The function can be configured to write the generated checksum as a hexadecimal string to the ax_checksum.txt file stored at the highest level of the input directory.
Note
This function uses multiprocessing to efficiently parallelize checksum calculation for multiple files. In combination with xxHash3, this achieves a significant speedup over other common checksum options, such as MD5 and SHA256. Note that xxHash3 is not suitable for security purposes and is only used to ensure data integrity.
The returned checksum accounts for both the contents of each file and the layout of the input directory structure.
- Parameters:
directory (Path) – The path to the directory to be checksummed.
num_processes (int | None, default: None) – The number of CPU processes to use for parallelizing the checksum calculation. If set to None, the function defaults to using (logical CPU count - 4).
batch (bool, default: False) – Determines whether the function is called as part of batch-processing multiple directories. This is used to optimize progress reporting and avoid cluttering the terminal.
save_checksum (bool, default: True) – Determines whether the checksum should be saved (written) to a .txt file.
- Return type:
str
- Returns:
The xxHash3-128 checksum for the input directory as a hexadecimal string.
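For example, checksumming a hypothetical session’s raw_data directory:

from pathlib import Path

from sl_shared_assets.tools import calculate_directory_checksum

# Writes the checksum to ax_checksum.txt inside the directory (save_checksum=True
# by default) and returns it as a hexadecimal string.
checksum = calculate_directory_checksum(directory=Path("/local/storage/sun_data/MyProject/A001/session1/raw_data"))
print(checksum)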
- sl_shared_assets.tools.delete_directory(directory_path)
Removes the input directory and all its subdirectories using parallel processing.
This function outperforms default approaches, such as a subprocess call to rm -rf or shutil.rmtree, for directories with a comparatively small number of large files. For example, this is the case for mesoscope frame directories, which this method deletes ~6 times faster than shutil.rmtree. It may also outperform these approaches for other comparatively shallow directories.
Notes
This function is often combined with the transfer_directory function to remove the source directory after it has been transferred.
- Parameters:
directory_path (Path) – The path to the directory to delete.
- Return type:
None
- sl_shared_assets.tools.generate_project_manifest(raw_project_directory, manager_id, processed_data_root=None)
Builds and saves the project manifest .feather file under the specified output directory.
This function evaluates the input project directory and builds the ‘manifest’ file for the project. The file includes descriptive information about every session stored inside the input project folder and the state of each session’s data processing (which processing pipelines have been applied to each session). The file is created under the input raw project directory and uses the following name pattern: ProjectName_manifest.feather.
Notes
The manifest file is primarily used to capture and move project state information between machines, typically in the context of working with data stored on a remote compute server or cluster.
- Parameters:
raw_project_directory (Path) – The path to the root project directory used to store raw session data.
manager_id (int) – The unique identifier of the manager process that manages the runtime.
processed_data_root (Path | None, default: None) – The path to the root directory (volume) used to store processed data for all Sun lab projects, if it is different from the parent of the ‘raw_project_directory’.
- Return type:
None
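For example, regenerating the manifest for a hypothetical project:

from pathlib import Path

from sl_shared_assets.tools import generate_project_manifest

# Creates MyProject_manifest.feather under the input project directory.
generate_project_manifest(
    raw_project_directory=Path("/local/storage/sun_data/MyProject"),
    manager_id=42,  # hypothetical manager process identifier
)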
- sl_shared_assets.tools.prepare_session(session_path, manager_id, processed_data_root, reset_tracker=False)
Prepares the target session for data processing and dataset integration.
This function is primarily designed to be used on remote compute servers that use different data volumes for storage and processing. Since storage volumes are often slow, the session data needs to be copied to the fast volume before executing processing pipelines. Typically, this function is used exactly once during each session’s life cycle: when it is first transferred to the remote compute server.
- Parameters:
session_path (Path) – The path to the session directory to be processed.
manager_id (int) – The unique identifier of the manager process that manages the runtime.
processed_data_root (Path | None) – The path to the root directory used to store the processed data from all Sun lab projects, if different from the ‘session_path’ root.
reset_tracker (bool, default: False) – Determines whether to reset the tracker file before executing the runtime. This allows recovering from deadlocked runtimes but should otherwise not be used, to ensure runtime safety.
- Return type:
None
Notes
This function reverses the result of running the archive_session() function.
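A sketch of the intended life cycle, using a hypothetical session path and manager ID:

from pathlib import Path

from sl_shared_assets.tools import archive_session, prepare_session

session = Path("/local/storage/sun_data/MyProject/A001/session1")  # hypothetical

# Move the session's data to the fast working volume before processing it...
prepare_session(session_path=session, manager_id=42, processed_data_root=None)

# ...and back to the slow storage volume once the session is no longer actively used.
archive_session(session_path=session, manager_id=42)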
- sl_shared_assets.tools.release_lock(session_path, manager_id, processed_data_root=None)
Releases the target session’s data lock if it is owned by the specified manager process.
Calling this function unlocks the session’s data, making it possible for other manager processes to acquire the lock and work with the session’s data. This step has to be performed by every manager process as part of its shutdown sequence if the manager called the acquire_lock() function.
- Parameters:
session_path (Path) – The path to the session directory to be unlocked.
manager_id (int) – The unique identifier of the manager process that releases the lock.
processed_data_root (Path | None, default: None) – The path to the root directory used to store the processed data from all Sun lab projects, if different from the ‘session_path’ root.
- Return type:
None
- sl_shared_assets.tools.resolve_checksum(session_path, manager_id, processed_data_root=None, reset_tracker=False, regenerate_checksum=False)
Verifies the integrity of the session’s data by generating the checksum of the raw_data directory and comparing it against the checksum stored in the ax_checksum.txt file.
Primarily, this function is used to verify data integrity after transferring it from the data acquisition system PC to the remote server for long-term storage.
Notes
Any session that does not successfully pass checksum verification (or recreation) is automatically excluded from all further automatic processing steps.
Since version 5.0.0, this function also supports recalculating and overwriting the checksum stored inside the ax_checksum.txt file. This allows this function to re-checksum session data, which is helpful if the experimenter deliberately alters the session’s data post-acquisition (for example, to comply with new data storage guidelines).
- Parameters:
session_path (Path) – The path to the session directory to be processed.
manager_id (int) – The unique identifier of the manager process that manages the runtime.
processed_data_root (Path | None, default: None) – The path to the root directory used to store the processed data from all Sun lab projects, if different from the ‘session_path’ root.
reset_tracker (bool, default: False) – Determines whether to reset the tracker file before executing the runtime. This allows recovering from deadlocked runtimes but should otherwise not be used, to ensure runtime safety.
regenerate_checksum (bool, default: False) – Determines whether to update the checksum stored in the ax_checksum.txt file before carrying out the verification. In this case, the verification necessarily succeeds, and the session’s reference checksum is changed to reflect the current state of the session data.
- Return type:
None
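For example, verifying a hypothetical session’s raw_data integrity after a transfer:

from pathlib import Path

from sl_shared_assets.tools import resolve_checksum

resolve_checksum(
    session_path=Path("/local/storage/sun_data/MyProject/A001/session1"),
    manager_id=42,  # hypothetical manager process identifier
)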
- sl_shared_assets.tools.transfer_directory(source, destination, num_threads=1, verify_integrity=False, remove_source=False)
Copies the contents of the input directory tree from source to destination while preserving the folder structure.
Notes
This method recreates the moved directory hierarchy on the destination if the hierarchy does not exist. This is done before copying the files.
The method executes a multithreaded copy operation and, by default, does not remove the source data after the copy is complete.
If the method is configured to verify the transferred data’s integrity, it generates the xxHash3-128 checksum of the data before and after the transfer and compares the two checksums to detect data corruption.
- Parameters:
source (Path) – The path to the directory that needs to be moved.
destination (Path) – The path to the destination directory where to move the contents of the source directory.
num_threads (int, default: 1) – The number of threads to use for parallel file transfer. This number should be set depending on the type of transfer (local or remote) and is not guaranteed to provide improved transfer performance. For local transfers, setting this number above 1 will likely provide a performance boost. For remote transfers using a single TCP/IP socket (such as the non-multichannel SMB protocol), the number should be set to 1. Setting this value to a number below 1 instructs the function to use all available CPU cores.
verify_integrity (bool, default: False) – Determines whether to perform integrity verification for the transferred files.
remove_source (bool, default: False) – Determines whether to remove the source directory and all of its contents after the transfer is complete and optionally verified.
- Raises:
RuntimeError – If the transferred files do not pass the xxHash3-128 checksum integrity verification.
- Return type:
None
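For example, a hypothetical local transfer from the storage volume to the working volume:

from pathlib import Path

from sl_shared_assets.tools import transfer_directory

transfer_directory(
    source=Path("/local/storage/sun_data/MyProject/A001/session1"),
    destination=Path("/local/workdir/sun_data/MyProject/A001/session1"),
    num_threads=8,          # local transfers typically benefit from multiple threads
    verify_integrity=True,  # compare xxHash3-128 checksums before and after the transfer
    remove_source=False,
)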
Data and Configuration Assets
This package provides the classes used to store data acquired in the Sun lab and to configure all elements and pipelines making up the lab’s data workflow. Many classes from this package are designed to be saved to disk as .yaml files and restored from the .yaml files as needed.
- class sl_shared_assets.data_classes.AcquisitionSystems(*values)
Bases:
StrEnum
Stores the names for all data acquisition systems currently used in the Sun lab.
- MESOSCOPE_VR = 'mesoscope-vr'
The Mesoscope-VR data acquisition system. It is built around the 2-Photon Random Access Mesoscope (2P-RAM) and relies on Unity-backed virtual reality task environments to conduct experiments.
- class sl_shared_assets.data_classes.DrugData(lactated_ringers_solution_volume_ml, lactated_ringers_solution_code, ketoprofen_volume_ml, ketoprofen_code, buprenorphine_volume_ml, buprenorphine_code, dexamethasone_volume_ml, dexamethasone_code)
Bases:
object
Stores the information about all medical substances (drugs) administered to the subject before, during, and immediately after the surgical intervention.
- buprenorphine_code: str
Stores the manufacturer code or internal reference code for buprenorphine. This code is used to identify the buprenorphine batch in additional datasheets and lab ordering documents.
- buprenorphine_volume_ml: float
Stores the volume of buprenorphine diluted with saline administered during surgery, in ml.
- dexamethasone_code: str
Stores the manufacturer code or internal reference code for dexamethasone. This code is used to identify the dexamethasone batch in additional datasheets and lab ordering documents.
- dexamethasone_volume_ml: float
Stores the volume of dexamethasone diluted with saline administered during surgery, in ml.
- ketoprofen_code: str
Stores the manufacturer code or internal reference code for ketoprofen. This code is used to identify the ketoprofen batch in additional datasheets and lab ordering documents.
- ketoprofen_volume_ml: float
Stores the volume of ketoprofen diluted with saline administered during surgery, in ml.
- lactated_ringers_solution_code: str
Stores the manufacturer code or internal reference code for Lactated Ringer’s Solution (LRS). This code is used to identify the LRS batch in additional datasheets and lab ordering documents.
- lactated_ringers_solution_volume_ml: float
Stores the volume of Lactated Ringer’s Solution (LRS) administered during surgery, in ml.
- class sl_shared_assets.data_classes.ExperimentState(experiment_state_code, system_state_code, state_duration_s, initial_guided_trials, recovery_failed_trial_threshold, recovery_guided_trials)
Bases:
object
Stores the information used to set and maintain the desired experiment and system state.
Broadly, each experiment runtime can be conceptualized as a two-state system. The first is the experiment task state, which reflects the behavior goal, the rules for achieving the goal, and the reward for achieving the goal. The second is the data acquisition system state, which is a snapshot of all hardware module states that make up the system that acquires the data and controls the task environment.
Note
This class is acquisition-system-agnostic. All data acquisition systems use this class as part of their specific ExperimentConfiguration class instances.
- experiment_state_code: int
The integer code of the experiment state. Note, each experiment is expected to define and follow its own experiment state code mapping. Typically, the experiment state code is used to denote major experiment stages, such as ‘baseline’, ‘task’, ‘cooldown’, etc. The same experiment state code can be used by multiple sequential ExperimentState instances to change the data acquisition system states while maintaining the same experiment state.
- initial_guided_trials: int
The number of trials (laps), counting from the onset of the experiment state, during which the animal should receive water rewards for entering the reward zone. Once the specified number of guided trials passes, the system disables guidance, requiring the animal to lick in the reward zone to get rewards.
- recovery_failed_trial_threshold: int
The number of sequentially failed (non-rewarded) trials (laps) after which the system should re-enable lick guidance for the following ‘recovery_guided_trials’ number of trials.
- recovery_guided_trials: int
The number of trials (laps) for which the system should re-enable lick guidance when the animal sequentially fails the ‘recovery_failed_trial_threshold’ number of trials. This field works similarly to the ‘initial_guided_trials’ field but is triggered by repeated performance failures rather than experiment state onset.
- state_duration_s: float
The time, in seconds, to maintain the experiment and system state combination specified by this instance during runtime.
- system_state_code: int
One of the supported data acquisition system state-codes. Note, the meaning of each system state code depends on the specific data acquisition system used during the experiment.
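A minimal construction sketch, with hypothetical state codes, duration, and trial counts:

from sl_shared_assets.data_classes import ExperimentState

# A hypothetical 10-minute 'task' phase that starts with 5 guided trials and
# re-enables guidance for 3 trials after 4 consecutive failed trials.
task_state = ExperimentState(
    experiment_state_code=2,
    system_state_code=1,
    state_duration_s=600.0,
    initial_guided_trials=5,
    recovery_failed_trial_threshold=4,
    recovery_guided_trials=3,
)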
- class sl_shared_assets.data_classes.ExperimentTrial(cue_sequence, trial_length_cm, trial_reward_size_ul, reward_zone_start_cm, reward_zone_end_cm, guidance_trigger_location_cm)
Bases:
object
Stores the information about a single experiment trial.
All Virtual Reality (VR) tasks can be broadly conceptualized as repeating motifs (sequences) of VR environment wall cues, associated with a specific goal, for which animals receive water rewards. Each complete motif is typically interpreted as a single experiment trial.
Notes
Since some experiments use multiple distinct trial types as part of the same experiment session, multiple instances of this class can be used by an ExperimentConfiguration class instance to represent multiple used trial types.
- cue_sequence: list[int]
The sequence of Virtual Reality environment wall cues experienced by the animal while running this trial. Note, the cues must be specified as integer-codes matching the codes used in the ‘cue_map’ dictionary of the ExperimentConfiguration class for the experiment.
- guidance_trigger_location_cm: float
The location of the invisible boundary (wall) with which the animal must collide to trigger water reward delivery during guided trials.
- reward_zone_end_cm: float
The ending boundary of the trial reward zone, in centimeters.
- reward_zone_start_cm: float
The starting boundary of the trial reward zone, in centimeters.
- trial_length_cm: float
The length of the trial cue sequence in centimeters.
- trial_reward_size_ul: float
The volume of water, in microliters, dispensed when the animal successfully completes the trial’s task.
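A minimal construction sketch, with hypothetical cue codes and distances:

from sl_shared_assets.data_classes import ExperimentTrial

# A hypothetical 200-cm trial whose reward zone spans the last 20 cm of the track.
trial = ExperimentTrial(
    cue_sequence=[1, 2, 3, 4],  # integer codes matching the experiment's cue_map
    trial_length_cm=200.0,
    trial_reward_size_ul=5.0,
    reward_zone_start_cm=180.0,
    reward_zone_end_cm=200.0,
    guidance_trigger_location_cm=185.0,
)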
- class sl_shared_assets.data_classes.ImplantData(implant, implant_target, implant_code, implant_ap_coordinate_mm, implant_ml_coordinate_mm, implant_dv_coordinate_mm)
Bases:
object
Stores information about a single implantation procedure performed during the surgical intervention.
Multiple ImplantData instances can be used at the same time if the surgery involves multiple implants.
- implant: str
Stores the descriptive name of the implant.
- implant_ap_coordinate_mm: float
Stores the implant’s antero-posterior stereotactic coordinate, in millimeters, relative to bregma.
- implant_code: str
Stores the manufacturer code or internal reference code for the implant. This code is used to identify the implant in additional datasheets and lab ordering documents.
- implant_dv_coordinate_mm: float
Stores the implant’s dorsal-ventral stereotactic coordinate, in millimeters, relative to bregma.
- implant_ml_coordinate_mm: float
Stores the implant’s medial-lateral stereotactic coordinate, in millimeters, relative to bregma.
- implant_target: str
Stores the name of the brain region or cranium section targeted by the implant.
- class sl_shared_assets.data_classes.InjectionData(injection, injection_target, injection_volume_nl, injection_code, injection_ap_coordinate_mm, injection_ml_coordinate_mm, injection_dv_coordinate_mm)
Bases:
object
Stores information about a single injection performed during the surgical intervention.
Multiple InjectionData instances can be used at the same time if the surgery involves multiple injections.
- injection: str
Stores the descriptive name of the injection.
- injection_ap_coordinate_mm: float
Stores the injection’s antero-posterior stereotactic coordinate, in millimeters, relative to bregma.
- injection_code: str
Stores the manufacturer code or internal reference code for the injected substance. This code is used to identify the substance in additional datasheets and lab ordering documents.
- injection_dv_coordinate_mm: float
Stores the injection’s dorsal-ventral stereotactic coordinate, in millimeters, relative to bregma.
- injection_ml_coordinate_mm: float
Stores the injection’s medial-lateral stereotactic coordinate, in millimeters, relative to bregma.
- injection_target: str
Stores the name of the brain region targeted by the injection.
- injection_volume_nl: float
Stores the volume of substance, in nanoliters, delivered during the injection.
- class sl_shared_assets.data_classes.LickTrainingDescriptor(experimenter, mouse_weight_g, minimum_reward_delay_s, maximum_reward_delay_s, maximum_water_volume_ml, maximum_training_time_m, maximum_unconsumed_rewards=1, dispensed_water_volume_ml=0.0, pause_dispensed_water_volume_ml=0.0, experimenter_given_water_volume_ml=0.0, preferred_session_water_volume_ml=0.0, incomplete=False, experimenter_notes='Replace this with your notes.')
Bases:
YamlConfig
Stores the task and outcome information specific to lick training sessions that use the Mesoscope-VR system.
- dispensed_water_volume_ml: float = 0.0
Stores the total water volume, in milliliters, dispensed during runtime. This excludes the water volume dispensed during the paused (idle) state.
- experimenter: str
The ID of the experimenter running the session.
- experimenter_given_water_volume_ml: float = 0.0
The additional volume of water, in milliliters, administered by the experimenter to the animal after the session.
- experimenter_notes: str = 'Replace this with your notes.'
This field is not set during runtime. It is expected that each experimenter replaces this field with their notes made during runtime.
- incomplete: bool = False
If this field is set to True, the session is marked as ‘incomplete’ and automatically excluded from all further Sun lab automated processing and analysis.
- maximum_reward_delay_s: int
Stores the maximum delay, in seconds, that can separate the delivery of two consecutive water rewards.
- maximum_training_time_m: int
Stores the maximum time, in minutes, the system is allowed to run the training for.
- maximum_unconsumed_rewards: int = 1
Stores the maximum number of consecutive rewards that can be delivered without the animal consuming them. If the animal receives this many rewards without licking (consuming) them, reward delivery is paused until the animal consumes the rewards.
- maximum_water_volume_ml: float
Stores the maximum volume of water the system is allowed to dispense during training.
- minimum_reward_delay_s: int
Stores the minimum delay, in seconds, that can separate the delivery of two consecutive water rewards.
- mouse_weight_g: float
The weight of the animal, in grams, at the beginning of the session.
- pause_dispensed_water_volume_ml: float = 0.0
Stores the total water volume, in milliliters, dispensed during the paused (idle) state.
- preferred_session_water_volume_ml: float = 0.0
The volume of water, in milliliters, the animal should receive during the session runtime if its performance matches the experimenter-specified threshold.
- class sl_shared_assets.data_classes.MesoscopeAdditionalFirmware(headbar_port='/dev/ttyUSB0', lickport_port='/dev/ttyUSB1', wheel_port='/dev/ttyUSB2', unity_ip='127.0.0.1', unity_port=1883)
Bases:
object
Stores the configuration parameters for all firmware and hardware components not assembled in the Sun lab.
- headbar_port: str = '/dev/ttyUSB0'
The USB port used by the HeadBar Zaber motor controllers (devices).
- lickport_port: str = '/dev/ttyUSB1'
The USB port used by the LickPort Zaber motor controllers (devices).
- unity_ip: str = '127.0.0.1'
The IP address of the MQTT broker used to communicate with the Unity game engine.
- unity_port: int = 1883
The port number of the MQTT broker used to communicate with the Unity game engine.
- wheel_port: str = '/dev/ttyUSB2'
The USB port used by the (running) Wheel Zaber motor controllers (devices).
- class sl_shared_assets.data_classes.MesoscopeCameras(face_camera_index=0, left_camera_index=0, right_camera_index=2, face_camera_quantization_parameter=15, body_camera_quantization_parameter=15, display_face_camera_frames=True, display_body_camera_frames=True)
Bases:
object
Stores the configuration parameters for the cameras used by the Mesoscope-VR system to record behavior videos.
- body_camera_quantization_parameter: int = 15
The quantization parameter used by the left and right body cameras to encode acquired frames as video files.
- display_body_camera_frames: bool = True
Determines whether to display the frames grabbed from the left and right body cameras during runtime.
- display_face_camera_frames: bool = True
Determines whether to display the frames grabbed from the face camera during runtime.
- face_camera_index: int = 0
The index of the face camera in the list of all available Harvester-managed cameras.
- face_camera_quantization_parameter: int = 15
The quantization parameter used by the face camera to encode acquired frames as video files.
- left_camera_index: int = 0
The index of the left body camera (from the animal’s perspective) in the list of all available OpenCV-managed cameras.
- right_camera_index: int = 2
The index of the right body camera (from the animal’s perspective) in the list of all available OpenCV-managed cameras.
- class sl_shared_assets.data_classes.MesoscopeExperimentConfiguration(cue_map=<factory>, cue_offset_cm=10.0, unity_scene_name='IvanScene', experiment_states=<factory>, trial_structures=<factory>)
Bases:
YamlConfig
Stores the configuration of an experiment runtime that uses the Mesoscope-VR data acquisition system.
During runtime, the acquisition system executes the sequence of states stored in this class instance. Together with custom Unity projects, which define the task environment and logic, this class allows flexibly implementing a wide range of experiments using the Mesoscope-VR system.
- cue_map: dict[int, float]
Maps each integer-code associated with the experiment’s Virtual Reality (VR) environment wall cue to its length in real-world centimeters. It is used to map each VR cue to the distance the animal needs to travel to fully traverse the wall cue region from start to end.
- cue_offset_cm: float = 10.0
Specifies the offset distance, in centimeters, by which the animal’s running track is shifted relative to the Virtual Reality (VR) environment’s wall cue sequence. Due to how the VR environment is displayed to the animal, most runtimes need to shift the animal slightly forward relative to the VR cue sequence origin (0) to prevent it from seeing the portion of the environment before the first VR wall cue. This offset statically shifts the entire track (in centimeters) against the set of VR wall cues used during runtime.
- experiment_states: dict[str, ExperimentState]
Maps human-readable experiment state names to corresponding ExperimentState instances. Each ExperimentState instance represents a phase of the experiment. During runtime, the phases are executed in the same order as specified in this dictionary.
- trial_structures: dict[str, ExperimentTrial]
Maps human-readable trial structure names to corresponding ExperimentTrial instances. Each ExperimentTrial instance specifies the Virtual Reality (VR) environment layout and task parameters associated with a single type of trial supported by the experiment runtime.
- unity_scene_name: str = 'IvanScene'
The name of the Virtual Reality task (Unity Scene) used during the experiment.
- class sl_shared_assets.data_classes.MesoscopeExperimentDescriptor(experimenter, mouse_weight_g, maximum_unconsumed_rewards=1, dispensed_water_volume_ml=0.0, pause_dispensed_water_volume_ml=0.0, experimenter_given_water_volume_ml=0.0, preferred_session_water_volume_ml=0.0, incomplete=False, experimenter_notes='Replace this with your notes.')
Bases:
YamlConfig
Stores the task and outcome information specific to experiment sessions that use the Mesoscope-VR system.
- dispensed_water_volume_ml: float = 0.0
Stores the total water volume, in milliliters, dispensed during runtime. This excludes the water volume dispensed during the paused (idle) state.
- experimenter: str
The ID of the experimenter running the session.
- experimenter_given_water_volume_ml: float = 0.0
The additional volume of water, in milliliters, administered by the experimenter to the animal after the session.
- experimenter_notes: str = 'Replace this with your notes.'
This field is not set during runtime. It is expected that each experimenter will replace this field with their notes made during runtime.
- incomplete: bool = False
If this field is set to True, the session is marked as ‘incomplete’ and automatically excluded from all further Sun lab automated processing and analysis.
- maximum_unconsumed_rewards: int = 1
Stores the maximum number of consecutive rewards that can be delivered without the animal consuming them. If the animal receives this many rewards without licking (consuming) them, reward delivery is paused until the animal consumes the rewards.
- mouse_weight_g: float
The weight of the animal, in grams, at the beginning of the session.
- pause_dispensed_water_volume_ml: float = 0.0
Stores the total water volume, in milliliters, dispensed during the paused (idle) state.
- preferred_session_water_volume_ml: float = 0.0
The volume of water, in milliliters, the animal should receive during the session runtime if its performance matches the experimenter-specified threshold.
- class sl_shared_assets.data_classes.MesoscopeHardwareState(cm_per_pulse=None, maximum_break_strength=None, minimum_break_strength=None, lick_threshold=None, valve_scale_coefficient=None, valve_nonlinearity_exponent=None, torque_per_adc_unit=None, screens_initially_on=None, recorded_mesoscope_ttl=None, system_state_codes=None)
Bases:
YamlConfig
Stores configuration parameters (states) of the Mesoscope-VR system hardware modules used during training or experiment runtime.
This information is used to read and decode the data saved to the .npz log files during runtime as part of data processing.
Notes
This class stores the ‘static’ Mesoscope-VR system configuration that does not change during experiment or training session runtime. This is in contrast to the MesoscopeExperimentConfiguration class, which reflects the ‘dynamic’ state of the Mesoscope-VR system during each experiment.
This class partially overlaps with the MesoscopeSystemConfiguration class, which is also stored in the raw_data folder of each session. The primary reason to keep both classes is to ensure that the math (rounding) used during runtime matches the math (rounding) used during data processing. MesoscopeSystemConfiguration does not do any rounding or otherwise attempt to be repeatable, which is in contrast to hardware modules that read and apply those parameters. Reading values from this class guarantees the read value exactly matches the value used during runtime.
Notes
All fields in this dataclass initialize to None. During log processing, any log associated with a hardware module that provides the data stored in a field will be processed, unless that field is None. Therefore, setting any field in this dataclass to None also functions as a flag for whether to parse the log associated with the module that provides this field’s information.
This class is automatically configured by the _MesoscopeVRSystem class from the sl-experiment library to facilitate proper log parsing.
- cm_per_pulse: float | None = None
EncoderInterface instance property. Stores the conversion factor used to translate encoder pulses into real-world centimeters. This conversion factor is fixed for each data acquisition system and does not change between experiments.
- lick_threshold: int | None = None
LickInterface instance property. Determines the threshold, in 12-bit Analog-to-Digital Converter (ADC) units, above which an interaction value reported by the lick sensor is considered a lick (compared to noise or a non-lick touch).
- maximum_break_strength: float | None = None
BreakInterface instance property. Stores the breaking torque, in Newton centimeters, applied by the break to the edge of the running wheel when it is engaged at 100% strength.
- minimum_break_strength: float | None = None
BreakInterface instance property. Stores the breaking torque, in Newton centimeters, applied by the break to the edge of the running wheel when it is engaged at 0% strength (completely disengaged).
- recorded_mesoscope_ttl: bool | None = None
TTLInterface instance property. A boolean flag that determines whether the processed session recorded brain activity data with the mesoscope. If True, the processing runtime attempts to parse the Mesoscope frame-scanning TTL pulse data to synchronize the Mesoscope data with the behavior data.
- screens_initially_on: bool | None = None
ScreenInterface instance property. Stores the initial state of the Virtual Reality screens at the beginning of the session runtime.
- system_state_codes: dict[str, int] | None = None
A _MesoscopeVRSystem instance property. A dictionary that maps the integer state-codes used by the Mesoscope-VR system to communicate its states (system states) to human-readable state names.
- torque_per_adc_unit: float | None = None
TorqueInterface instance property. Stores the conversion factor used to translate torque values, reported by the sensor as 12-bit Analog-to-Digital Converter (ADC) units, into the real-world Newton centimeters (N·cm) of torque that had to be applied to the edge of the running wheel to produce the observed ADC value.
- valve_nonlinearity_exponent: float | None = None
ValveInterface instance property. To dispense precise water volumes during runtime, ValveInterface uses a power-law equation applied to the valve calibration data to determine how long to keep the valve open. This field stores the nonlinearity_exponent of the power-law equation that describes the relationship between the valve open time and the dispensed water volume, derived from the calibration data.
- valve_scale_coefficient: float | None = None
ValveInterface instance property. To dispense precise water volumes during runtime, ValveInterface uses a power-law equation applied to the valve calibration data to determine how long to keep the valve open. This field stores the scale_coefficient of the power-law equation that describes the relationship between the valve open time and the dispensed water volume, derived from the calibration data.
- class sl_shared_assets.data_classes.MesoscopeMicroControllers(actor_port='/dev/ttyACM0', sensor_port='/dev/ttyACM1', encoder_port='/dev/ttyACM2', debug=False, minimum_break_strength_g_cm=43.2047, maximum_break_strength_g_cm=1152.1246, wheel_diameter_cm=15.0333, lick_threshold_adc=400, lick_signal_threshold_adc=300, lick_delta_threshold_adc=300, lick_averaging_pool_size=1, torque_baseline_voltage_adc=2046, torque_maximum_voltage_adc=2750, torque_sensor_capacity_g_cm=720.0779, torque_report_cw=True, torque_report_ccw=True, torque_signal_threshold_adc=100, torque_delta_threshold_adc=70, torque_averaging_pool_size=1, wheel_encoder_ppr=8192, wheel_encoder_report_cw=False, wheel_encoder_report_ccw=True, wheel_encoder_delta_threshold_pulse=15, wheel_encoder_polling_delay_us=500, cm_per_unity_unit=10.0, screen_trigger_pulse_duration_ms=500, auditory_tone_duration_ms=300, valve_calibration_pulse_count=200, sensor_polling_delay_ms=1, valve_calibration_data=((15000, 1.1), (30000, 3.0), (45000, 6.25), (60000, 10.9)))
Bases:
object
Stores the configuration parameters for the microcontrollers used by the Mesoscope-VR system.
- actor_port: str = '/dev/ttyACM0'
The USB port used by the Actor Microcontroller.
- auditory_tone_duration_ms: int = 300
The time, in milliseconds, to sound the auditory tone when water rewards are delivered to the animal.
- cm_per_unity_unit: float = 10.0
The length of each Unity ‘unit’ in real-world centimeters recorded by the running wheel encoder.
- debug: bool = False
Determines whether the acquisition system is running in ‘debug mode’. This mode is used during the initial system calibration and testing. It should be disabled during all non-testing sessions to maximize the system’s runtime performance.
- encoder_port: str = '/dev/ttyACM2'
The USB port used by the Encoder Microcontroller.
- lick_averaging_pool_size: int = 1
The number of lick sensor readouts to average together to produce the final lick sensor readout value. Note, when using a Teensy controller, this number is multiplied by the built-in analog readout averaging (default is 4).
- lick_delta_threshold_adc: int = 300
The minimum absolute difference, in raw analog units recorded by a 12-bit Analog-to-Digital Converter (ADC), for the change to be reported to the PC.
- lick_signal_threshold_adc: int = 300
The minimum voltage, in raw analog units recorded by a 12-bit Analog-to-Digital Converter (ADC), reported to the PC as a non-zero value. Voltages below this level are interpreted as ‘no-lick’ noise and are always pulled to 0.
- lick_threshold_adc: int = 400
The threshold voltage, in raw analog units recorded by a 12-bit Analog-to-Digital Converter (ADC), interpreted as the animal’s tongue contacting the sensor.
- maximum_break_strength_g_cm: float = 1152.1246
The maximum torque applied by the running wheel break, in gram centimeters. This is the torque the break delivers at the maximum operational voltage.
- minimum_break_strength_g_cm: float = 43.2047
The minimum torque applied by the running wheel break, in gram centimeters. This is the torque the break delivers at the minimum operational voltage.
- screen_trigger_pulse_duration_ms: int = 500
The duration of the HIGH phase of the TTL pulse used to toggle the VR screens between the ON and OFF states.
- sensor_polling_delay_ms: int = 1
The delay, in milliseconds, between any two successive readouts of any sensor other than the encoder. Note, the encoder uses a dedicated parameter, as the encoder needs to be sampled at a higher frequency than all other sensors.
- sensor_port: str = '/dev/ttyACM1'
The USB port used by the Sensor Microcontroller.
- torque_averaging_pool_size: int = 1
The number of torque sensor readouts to average together to produce the final torque sensor readout value. Note, when using a Teensy controller, this number is multiplied by the built-in analog readout averaging (default is 4).
- torque_baseline_voltage_adc: int = 2046
The voltage level, in raw analog units measured by a 12-bit Analog-to-Digital Converter (ADC) after the AD620 amplifier, that corresponds to the no-torque (0) readout.
- torque_delta_threshold_adc: int = 70
The minimum absolute difference, in raw analog units recorded by a 12-bit Analog-to-Digital Converter (ADC), for the change to be reported to the PC.
- torque_maximum_voltage_adc: int = 2750
The voltage level, in raw analog units measured by a 12-bit Analog-to-Digital Converter (ADC) after the AD620 amplifier, that corresponds to the absolute maximum torque detectable by the sensor.
- torque_report_ccw: bool = True
Determines whether the sensor should report torque in the Counter-Clockwise (CCW) direction. This direction corresponds to the animal trying to move backward on the wheel.
- torque_report_cw: bool = True
Determines whether the sensor should report torque in the Clockwise (CW) direction. This direction corresponds to the animal trying to move forward on the wheel.
- torque_sensor_capacity_g_cm: float = 720.0779
The maximum torque detectable by the sensor, in gram centimeters (g cm).
- torque_signal_threshold_adc: int = 100
The minimum voltage, in raw analog units recorded by a 12-bit Analog-to-Digital Converter (ADC), reported to the PC as a non-zero value. Voltages below this level are interpreted as noise and are always pulled to 0.
- valve_calibration_data: dict[int | float, int | float] | tuple[tuple[int | float, int | float], ...] = ((15000, 1.1), (30000, 3.0), (45000, 6.25), (60000, 10.9))
A tuple of tuples that maps water delivery solenoid valve open times, in microseconds, to the dispensed volume of water, in microliters. During training and experiment runtimes, this data is used by the ValveModule to translate the requested reward volumes into the times the valve needs to be open to deliver the desired volume of water.
- valve_calibration_pulse_count: int = 200
The number of times to cycle opening and closing (pulsing) the valve during each calibration runtime. This determines how many reward deliveries are used at each calibrated time-interval to produce the average dispensed water volume readout used to calibrate the valve.
- wheel_diameter_cm: float = 15.0333
The diameter of the running wheel, in centimeters.
- wheel_encoder_delta_threshold_pulse: int = 15
The minimum difference, in encoder pulse counts, between two encoder readouts for the change to be reported to the PC.
- wheel_encoder_polling_delay_us: int = 500
The delay, in microseconds, between any two successive encoder state readouts.
- wheel_encoder_ppr: int = 8192
The resolution of the managed quadrature encoder, in Pulses Per Revolution (PPR). This is the number of quadrature pulses the encoder emits per full 360-degree rotation.
- wheel_encoder_report_ccw: bool = True
Determines whether to report encoder rotation in the CCW (positive) direction. This corresponds to the animal moving forward on the wheel.
- wheel_encoder_report_cw: bool = False
Determines whether to report encoder rotation in the CW (negative) direction. This corresponds to the animal moving backward on the wheel.
- class sl_shared_assets.data_classes.MesoscopePaths(google_credentials_path=PosixPath('/media/Data/Experiments/sl-surgery-log-0f651e492767.json'), root_directory=PosixPath('/media/Data/Experiments'), server_storage_directory=PosixPath('/home/cybermouse/server/storage/sun_data'), server_working_directory=PosixPath('/home/cybermouse/server/workdir/sun_data'), nas_directory=PosixPath('/home/cybermouse/nas/rawdata'), mesoscope_directory=PosixPath('/home/cybermouse/scanimage/mesodata'), harvesters_cti_path=PosixPath('/opt/mvIMPACT_Acquire/lib/x86_64/mvGenTLProducer.cti'))
Bases:
object
Stores the filesystem configuration parameters for the Mesoscope-VR data acquisition system.
Notes
All directories specified in this instance must be mounted to the local PC’s filesystem using an SMB or an equivalent protocol.
- google_credentials_path: Path = PosixPath('/media/Data/Experiments/sl-surgery-log-0f651e492767.json')
The absolute path to the locally stored .JSON file that contains the service account credentials used to read and write Google Sheet data. This is used to access and work with the Google Sheet files used in the Sun lab.
- harvesters_cti_path: Path = PosixPath('/opt/mvIMPACT_Acquire/lib/x86_64/mvGenTLProducer.cti')
The path to the GeniCam CTI file used to connect to Harvesters-managed cameras.
- mesoscope_directory: Path = PosixPath('/home/cybermouse/scanimage/mesodata')
The absolute path to the root ScanImagePC (mesoscope-connected PC) local-filesystem-mounted directory where all mesoscope-acquired data is aggregated during acquisition.
- nas_directory: Path = PosixPath('/home/cybermouse/nas/rawdata')
The absolute path to the local-filesystem-mounted directory where the raw data from all projects is stored on the NAS (backup long-term storage destination).
- root_directory: Path = PosixPath('/media/Data/Experiments')
The absolute path to the directory where all projects are stored on the main data acquisition system PC.
- server_storage_directory: Path = PosixPath('/home/cybermouse/server/storage/sun_data')
The absolute path to the local-filesystem-mounted directory where the raw data from all projects is stored on the remote compute server.
- server_working_directory: Path = PosixPath('/home/cybermouse/server/workdir/sun_data')
The absolute path to the local-filesystem-mounted directory where the processed data from all projects is stored on the remote compute server.
- class sl_shared_assets.data_classes.MesoscopePositions(mesoscope_x=0.0, mesoscope_y=0.0, mesoscope_roll=0.0, mesoscope_z=0.0, mesoscope_fast_z=0.0, mesoscope_tip=0.0, mesoscope_tilt=0.0, laser_power_mw=0.0, red_dot_alignment_z=0.0)
Bases:
YamlConfig
Stores the positions of real and virtual Mesoscope objective axes reused between experiment sessions that use the Mesoscope-VR system.
This class is designed to help the experimenter move the Mesoscope to the same imaging plane across imaging sessions. It stores both the physical (real) positions of the objective along the motorized X, Y, Z, and Roll axes and the positions of the virtual (ScanImage software) Tip, Tilt, and FastZ (virtual zoom) axes.
-
laser_power_mw:
float= 0.0 The laser excitation power at the sample, in milliwatts.
-
mesoscope_fast_z:
float= 0.0 The ScanImage FastZ (virtual Z-axis) position, in micrometers.
-
mesoscope_roll:
float= 0.0 The Mesoscope objective Roll-axis position, in degrees.
-
mesoscope_tilt:
float= 0.0 The ScanImage Tilt position, in degrees.
-
mesoscope_tip:
float= 0.0 The ScanImage Tip position, in degrees.
-
mesoscope_x:
float= 0.0 The Mesoscope objective X-axis position, in micrometers.
-
mesoscope_y:
float= 0.0 The Mesoscope objective Y-axis position, in micrometers.
-
mesoscope_z:
float= 0.0 The Mesoscope objective Z-axis position, in micrometers.
-
red_dot_alignment_z:
float= 0.0 The Mesoscope objective Z-axis position, in micrometers, used for the red-dot alignment procedure.
- class sl_shared_assets.data_classes.MesoscopeSystemConfiguration(name='mesoscope-vr', paths=<factory>, sheets=<factory>, cameras=<factory>, microcontrollers=<factory>, additional_firmware=<factory>)
Bases:
YamlConfig
Stores the hardware and filesystem configuration parameters for the Mesoscope-VR data acquisition system.
This class is specifically designed to encapsulate the configuration parameters for the Mesoscope-VR system. It expects the system to be configured according to the specifications outlined in the sl-experiment repository (https://github.com/Sun-Lab-NBB/sl-experiment) and should be used exclusively on the VRPC machine (main Mesoscope-VR PC).
-
additional_firmware:
MesoscopeAdditionalFirmware Stores the configuration parameters for all firmware and hardware components not assembled in the Sun lab.
-
cameras:
MesoscopeCameras Stores the configuration parameters for the cameras used by the Mesoscope-VR system to record behavior videos.
-
microcontrollers:
MesoscopeMicroControllers Stores the configuration parameters for the microcontrollers used by the Mesoscope-VR system.
-
name:
str= 'mesoscope-vr' Stores the descriptive name of the data acquisition system.
-
paths:
MesoscopePaths Stores the filesystem configuration parameters for the Mesoscope-VR data acquisition system.
- save(path)
Saves class instance data to disk as a .yaml file.
This method converts certain class variables to yaml-safe types (for example, Path objects -> strings) and saves class data to disk as a .yaml file. The method is intended to be used solely by the create_system_configuration_file() function and should not be called from any other context.
- Parameters:
path (Path) – The path to the .yaml file to save the data to.
- Return type:
None
-
sheets:
MesoscopeSheets Stores the IDs of Google Sheets used by the Mesoscope-VR data acquisition system.
- class sl_shared_assets.data_classes.ProcedureData(surgery_start_us, surgery_end_us, surgeon, protocol, surgery_notes, post_op_notes, surgery_quality=0)
Bases:
object
Stores general information about the surgical intervention.
-
post_op_notes:
str Stores surgeon’s notes taken during the post-surgery recovery period.
-
protocol:
str Stores the experiment protocol number (ID) used during the surgery.
-
surgeon:
str Stores the name or ID of the surgeon. If the intervention was carried out by multiple surgeons, the data for all participants is stored as part of the same string.
-
surgery_end_us:
int Stores the surgery’s stop date and time as microseconds elapsed since UTC epoch onset.
-
surgery_notes:
str Stores surgeon’s notes taken during the surgery.
-
surgery_quality:
int= 0 Stores the quality of the surgical intervention as a numeric level. 0 indicates unusable (bad) result, 1 indicates usable result that does not meet the publication threshold, 2 indicates publication-grade result, 3 indicates high-tier publication grade result.
-
surgery_start_us:
int Stores the surgery’s start date and time as microseconds elapsed since UTC epoch onset.
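Both timestamp fields store microseconds elapsed since the UTC epoch onset. A standard-library sketch of producing and reading back such values:
from datetime import datetime, timezone

def utc_now_us() -> int:
    """Returns the current UTC time as microseconds elapsed since the UTC epoch onset."""
    return int(datetime.now(timezone.utc).timestamp() * 1_000_000)

# Round-trip the value back to a human-readable datetime for verification.
stamp = utc_now_us()
print(datetime.fromtimestamp(stamp / 1_000_000, tz=timezone.utc))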
- class sl_shared_assets.data_classes.ProcessedData(processed_data_path=PosixPath('.'), camera_data_path=PosixPath('.'), mesoscope_data_path=PosixPath('.'), behavior_data_path=PosixPath('.'), root_path=PosixPath('.'))
Bases:
object
Stores the paths to the directories and files that make up the ‘processed_data’ session-specific directory.
The processed_data directory stores the data generated by various processing pipelines from the raw data (contents of the raw_data directory). Processed data represents an intermediate step between raw data and the dataset used in the data analysis, but is not itself designed to be analyzed.
-
behavior_data_path:
Path= PosixPath('.') Stores the path to the directory that contains the non-video and non-brain-activity data extracted from .npz log files by the sl-behavior log processing pipeline.
-
camera_data_path:
Path= PosixPath('.') Stores the path to the directory that contains video tracking data generated by the Sun lab DeepLabCut-based video processing pipeline(s).
- make_directories()
Ensures that all major subdirectories and the root directory exist, creating any missing directories.
This method is called each time the (wrapper) SessionData class is instantiated and allowed to generate missing data directories.
- Return type:
None
-
mesoscope_data_path:
Path= PosixPath('.') Stores the path to the directory that contains the processed brain activity (cell) data generated by sl-suite2p processing pipelines (single-day and multi-day). This directory is only used by sessions acquired with the Mesoscope-VR system.
-
processed_data_path:
Path= PosixPath('.') Stores the path to the root processed_data directory of the session. This directory stores the processed session data, generated from raw_data directory contents by various data processing pipelines.
- resolve_paths(root_directory_path)
Resolves all paths managed by the class instance based on the input root directory path.
This method is called each time the (wrapper) SessionData class is instantiated to regenerate the managed path hierarchy on any machine that instantiates the class.
- Parameters:
root_directory_path (Path) – The path to the top-level directory of the session. Typically, this path is assembled using the following hierarchy: root/project/animal/session_id
- Return type:
None
-
root_path:
Path= PosixPath('.') Stores the path to the root directory of the volume that stores processed data from all Sun lab projects. Primarily, this is necessary for pipelines working with the data on the remote compute server to efficiently move it between storage and working (processing) volumes.
- class sl_shared_assets.data_classes.RawData(raw_data_path=PosixPath('.'), camera_data_path=PosixPath('.'), mesoscope_data_path=PosixPath('.'), behavior_data_path=PosixPath('.'), zaber_positions_path=PosixPath('.'), session_descriptor_path=PosixPath('.'), hardware_state_path=PosixPath('.'), surgery_metadata_path=PosixPath('.'), session_data_path=PosixPath('.'), experiment_configuration_path=PosixPath('.'), mesoscope_positions_path=PosixPath('.'), window_screenshot_path=PosixPath('.'), system_configuration_path=PosixPath('.'), checksum_path=PosixPath('.'), telomere_path=PosixPath('.'), ubiquitin_path=PosixPath('.'), nk_path=PosixPath('.'), root_path=PosixPath('.'))
Bases:
object
Stores the paths to the directories and files that make up the ‘raw_data’ session-specific directory.
The raw_data directory stores the data acquired during the session data acquisition runtime, before and after preprocessing. Since preprocessing does not irreversibly alter the data, any data in that folder is considered ‘raw,’ even if preprocessing losslessly re-compresses the data for efficient transfer.
Notes
The Sun lab data management strategy primarily relies on keeping multiple redundant copies of the raw_data directory for each acquired session. Typically, one copy is stored on the lab’s processing server and the other is stored on the NAS.
-
behavior_data_path:
Path= PosixPath('.') Stores the path to the directory that contains all non-video behavior data acquired during the session. Primarily, this includes the .npz log files that store serialized data acquired by all hardware components of the data acquisition system other than cameras and brain activity data acquisition devices (such as the Mesoscope).
-
camera_data_path:
Path= PosixPath('.') Stores the path to the directory that contains all camera data acquired during the session. Primarily, this includes .mp4 video files from each recorded camera.
-
checksum_path:
Path= PosixPath('.') Stores the path to the ax_checksum.txt file. This file is generated as part of packaging the data for transmission and stores the xxHash-128 checksum of the data. It is used to verify that the transmission did not damage or otherwise alter the data.
-
experiment_configuration_path:
Path= PosixPath('.') Stores the path to the experiment_configuration.yaml file. This file contains the snapshot of the experiment runtime configuration used by the session. This file is only created for experiment sessions.
-
hardware_state_path:
Path= PosixPath('.') Stores the path to the hardware_state.yaml file. This file contains the partial snapshot of the calibration parameters used by the data acquisition system modules during the session. Primarily, it is used during data processing to interpret the raw data stored inside .npz log files.
- make_directories()
Ensures that all major subdirectories and the root directory exist, creating any missing directories.
This method is called each time the (wrapper) SessionData class is instantiated and allowed to generate missing data directories.
- Return type:
None
-
mesoscope_data_path:
Path= PosixPath('.') Stores the path to the directory that contains all Mesoscope data acquired during the session. Primarily, this includes the mesoscope-acquired .tiff files (brain activity data) and the MotionEstimator.me file (motion estimation data). This directory is created for all sessions, but is only used (filled) by the sessions that use the Mesoscope-VR system to acquire brain activity data.
-
mesoscope_positions_path:
Path= PosixPath('.') Stores the path to the mesoscope_positions.yaml file. This file contains the snapshot of the positions used by the Mesoscope at the end of the session. This includes both the physical position of the mesoscope objective and the ‘virtual’ tip, tilt, and fastZ positions set via ScanImage software. This file is only created for sessions that use the Mesoscope-VR system to acquire brain activity data.
-
nk_path:
Path= PosixPath('.') Stores the path to the nk.bin file. This file is used by the sl-experiment library to mark sessions undergoing runtime initialization. Since runtime initialization is a complex process that may encounter a runtime error, the marker is used to discover sessions that failed to initialize. Since uninitialized sessions by definition do not contain any valuable data, they are marked for immediate deletion from all managed destinations.
-
raw_data_path:
Path= PosixPath('.') Stores the path to the root raw_data directory of the session. This directory stores all raw data during acquisition and preprocessing. Note, preprocessing does not alter raw data, so at any point in time all data inside the folder is considered ‘raw’.
- resolve_paths(root_directory_path)
Resolves all paths managed by the class instance based on the input root directory path.
This method is called each time the (wrapper) SessionData class is instantiated to regenerate the managed path hierarchy on any machine that instantiates the class.
- Parameters:
root_directory_path (Path) – The path to the top-level directory of the session. Typically, this path is assembled using the following hierarchy: root/project/animal/session_id
- Return type:
None
-
root_path:
Path= PosixPath('.') Stores the path to the root directory of the volume that stores raw data from all Sun lab projects. Primarily, this is necessary for pipelines working with the data on the remote compute server to efficiently move it between storage and working (processing) volumes.
-
session_data_path:
Path= PosixPath('.') Stores the path to the session_data.yaml file. This path is used by the SessionData instance to save itself to disk as a .yaml file. In turn, the cached data is reused to reinstate the same data hierarchy across all supported destinations, enabling various libraries to interface with the session data.
-
session_descriptor_path:
Path= PosixPath('.') Stores the path to the session_descriptor.yaml file. This file is filled jointly by the data acquisition system and the experimenter. It contains session-specific information, such as the specific task parameters and the notes made by the experimenter during runtime. Each supported session type uses a unique SessionDescriptor class to define the format and content of the session_descriptor.yaml file.
-
surgery_metadata_path:
Path= PosixPath('.') Stores the path to the surgery_metadata.yaml file. This file contains the most up-to-date information about the surgical intervention(s) performed on the animal prior to the session.
-
system_configuration_path:
Path= PosixPath('.') Stores the path to the system_configuration.yaml file. This file contains the exact snapshot of the data acquisition system configuration parameters used to acquire session data.
-
telomere_path:
Path= PosixPath('.') Stores the path to the telomere.bin file. This file is statically generated at the end of the session’s data acquisition based on experimenter feedback to mark sessions that ran in-full with no issues. Sessions without a telomere.bin file are considered ‘incomplete’ and are excluded from all automated processing, as they may contain corrupted, incomplete, or otherwise unusable data.
-
ubiquitin_path:
Path= PosixPath('.') Stores the path to the ubiquitin.bin file. This file is primarily used by the sl-experiment library to mark local session data directories for deletion (purging). Typically, it is created once the data is safely moved to the long-term storage destinations (NAS and Server) and the integrity of the moved data is verified on at least one destination. During ‘sl-purge’ sl-experiment runtimes, the library discovers and removes all session data marked with ‘ubiquitin.bin’ files from the machine that runs the command.
-
window_screenshot_path:
Path= PosixPath('.') Stores the path to the .png screenshot of the ScanImagePC screen. As a minimum, the screenshot should contain the image of the imaging plane and the red-dot alignment window. This is used to generate a visual snapshot of the cranial window alignment and cell appearance for each experiment session. This file is only created for sessions that use the Mesoscope-VR system to acquire brain activity data.
-
zaber_positions_path:
Path= PosixPath('.') Stores the path to the zaber_positions.yaml file. This file contains the snapshot of all Zaber motor positions at the end of the session. Zaber motors are used to position the LickPort, HeadBar, and Wheel Mesoscope-VR modules to support proper brain activity recording and behavior during the session. This file is only created for sessions that use the Mesoscope-VR system.
- class sl_shared_assets.data_classes.RunTrainingDescriptor(experimenter, mouse_weight_g, final_run_speed_threshold_cm_s, final_run_duration_threshold_s, initial_run_speed_threshold_cm_s, initial_run_duration_threshold_s, increase_threshold_ml, run_speed_increase_step_cm_s, run_duration_increase_step_s, maximum_water_volume_ml, maximum_training_time_m, maximum_unconsumed_rewards=1, maximum_idle_time_s=0.0, dispensed_water_volume_ml=0.0, pause_dispensed_water_volume_ml=0.0, experimenter_given_water_volume_ml=0.0, preferred_session_water_volume_ml=0.0, incomplete=False, experimenter_notes='Replace this with your notes.')
Bases:
YamlConfig
Stores the task and outcome information specific to run training sessions that use the Mesoscope-VR system.
-
dispensed_water_volume_ml:
float= 0.0 Stores the total water volume, in milliliters, dispensed during runtime. This excludes the water volume dispensed during the paused (idle) state.
-
experimenter:
str The ID of the experimenter running the session.
-
experimenter_given_water_volume_ml:
float= 0.0 The additional volume of water, in milliliters, administered by the experimenter to the animal after the session.
-
experimenter_notes:
str= 'Replace this with your notes.' This field is not set during runtime. It is expected that each experimenter will replace this field with their notes made during runtime.
-
final_run_duration_threshold_s:
float Stores the final running duration threshold, in seconds, that was active at the end of training.
-
final_run_speed_threshold_cm_s:
float Stores the final running speed threshold, in centimeters per second, that was active at the end of training.
-
incomplete:
bool= False If this field is set to True, the session is marked as ‘incomplete’ and automatically excluded from all further Sun lab automated processing and analysis.
-
increase_threshold_ml:
float Stores the volume of water delivered to the animal, in milliliters, that triggers the increase in the running speed and duration thresholds.
-
initial_run_duration_threshold_s:
float Stores the initial running duration threshold, in seconds, used during training.
-
initial_run_speed_threshold_cm_s:
float Stores the initial running speed threshold, in centimeters per second, used during training.
-
maximum_idle_time_s:
float= 0.0 Stores the maximum time, in seconds, the animal can dip below the running speed threshold to still receive the reward. This allows animals that ‘run’ by taking a series of large steps, briefly dipping below the speed threshold at the end of each step, to still get water rewards.
-
maximum_training_time_m:
int Stores the maximum time, in minutes, the system is allowed to run the training for.
-
maximum_unconsumed_rewards:
int= 1 Stores the maximum number of consecutive rewards that can be delivered without the animal consuming them. If the animal receives this many rewards without licking (consuming) them, reward delivery is paused until the animal consumes the rewards.
-
maximum_water_volume_ml:
float Stores the maximum volume of water the system is allowed to dispense during training.
-
mouse_weight_g:
float The weight of the animal, in grams, at the beginning of the session.
-
pause_dispensed_water_volume_ml:
float= 0.0 Stores the total water volume, in milliliters, dispensed during the paused (idle) state.
-
preferred_session_water_volume_ml:
float= 0.0 The volume of water, in milliliters, the animal should receive during the session runtime if its performance matches the experimenter-specified threshold.
-
run_duration_increase_step_s:
float Stores the value, in seconds, used by the system to increment the duration threshold each time the animal receives ‘increase_threshold’ volume of water.
-
run_speed_increase_step_cm_s:
float Stores the value, in centimeters per second, used by the system to increment the running speed threshold each time the animal receives ‘increase_threshold’ volume of water.
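The fields above imply a simple threshold schedule: each time the animal earns another increase_threshold_ml of water, both thresholds step up by their respective increments. A sketch of that schedule, assuming a plain stepwise rule (the runtime's exact update logic, e.g., capping at the final thresholds, may differ):
def current_thresholds(
    dispensed_ml: float,
    initial_speed_cm_s: float,
    initial_duration_s: float,
    increase_threshold_ml: float,
    speed_step_cm_s: float,
    duration_step_s: float,
) -> tuple[float, float]:
    """Computes the speed and duration thresholds active after dispensing the given water volume."""
    increments = int(dispensed_ml // increase_threshold_ml)
    speed = initial_speed_cm_s + increments * speed_step_cm_s
    duration = initial_duration_s + increments * duration_step_s
    return speed, duration

# After 0.45 ml with a 0.1 ml increase threshold, both thresholds have stepped up 4 times.
print(current_thresholds(0.45, 5.0, 0.5, 0.1, 0.5, 0.05))  # (7.0, 0.7)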
- class sl_shared_assets.data_classes.SessionData(project_name, animal_id, session_name, session_type, acquisition_system=AcquisitionSystems.MESOSCOPE_VR, experiment_name=None, python_version='3.11.13', sl_experiment_version='3.0.0', raw_data=<factory>, processed_data=<factory>, source_data=<factory>, archived_data=<factory>, tracking_data=<factory>)
Bases:
YamlConfig
Stores and manages the data layout of a single Sun lab data acquisition session.
The primary purpose of this class is to maintain the session data structure across all supported destinations and to provide a unified data access interface shared by all Sun lab libraries. It is specifically designed for working with the data from a single session, performed by a single animal under the specific project. The class is used to manage both raw and processed data: it follows the data through acquisition, preprocessing, and processing stages of the Sun lab data workflow. This class serves as an entry point for all interactions with the managed session’s data.
Notes
The class is not designed to be instantiated directly. Instead, use the create() method to generate a new session or load() method to access the data of an already existing session.
When the class is used to create a new session, it generates the new session’s name using the current UTC timestamp, accurate to microseconds. This ensures that each session ‘name’ is unique and preserves the overall session order.
-
acquisition_system:
str|AcquisitionSystems= 'mesoscope-vr' Stores the name of the data acquisition system that acquired the data. Has to be set to one of the supported acquisition systems, defined in the AcquisitionSystems enumeration exposed by the sl-shared-assets library.
-
animal_id:
str Stores the unique identifier of the animal that participates in the session.
-
archived_data:
ProcessedData Similar to the ‘source_data’ field, stores absolute paths to the same data as the ‘processed_data’ field, but with all paths resolved relative to the ‘raw_data’ root. These paths are used as part of the session data archiving process to collect all session data (raw and processed) on the slow ‘storage’ volume of the remote compute server.
- classmethod create(project_name, animal_id, session_type, python_version, sl_experiment_version, experiment_name=None, session_name=None)
Creates a new SessionData object and generates the new session’s data structure on the local PC.
This method is intended to be called exclusively by the sl-experiment library to create new training or experiment sessions and generate the session data directory tree.
Notes
To load an already existing session data structure, use the load() method instead.
This method automatically dumps the data of the created SessionData instance into the session_data.yaml file inside the root ‘raw_data’ directory of the created hierarchy. It also finds and dumps other configuration files, such as experiment_configuration.yaml and system_configuration.yaml into the same ‘raw_data’ directory. If the session’s runtime is interrupted unexpectedly, the acquired data can still be processed using these pre-saved class instances.
- Parameters:
project_name (str) – The name of the project for which the session is carried out.
animal_id (str) – The ID code of the animal participating in the session.
session_type (SessionTypes|str) – The type of the session. Has to be one of the supported session types exposed by the SessionTypes enumeration.
python_version (str) – The string that specifies the Python version used to collect session data. Has to be specified using the major.minor.patch version format.
sl_experiment_version (str) – The string that specifies the version of the sl-experiment library used to collect session data. Has to be specified using the major.minor.patch version format.
experiment_name (str|None, default: None) – The name of the experiment executed during the session. This optional argument is only used for experiment sessions. Note! The name passed to this argument has to match the name of the experiment configuration .yaml file.
session_name (str|None, default: None) – An optional session name override. Generally, this argument should not be provided for most sessions. When provided, the method uses this name instead of generating a new timestamp-based name. This is only used during the ‘ascension’ runtime to convert old data structures to the modern lab standards.
- Return type:
SessionData
- Returns:
An initialized SessionData instance that stores the layout of the newly created session’s data.
-
experiment_name:
str|None= None Stores the name of the experiment performed during the session. If the session_type field indicates that the session is an experiment, this field communicates the specific experiment configuration used by the session. During runtime, this name is used to load the specific experiment configuration data stored in a .yaml file with the same name. If the session is not an experiment session, this field should be left as Null (None).
- classmethod load(session_path, processed_data_root=None)
Loads the SessionData instance from the target session’s session_data.yaml file.
This method is used to load the data layout information of an already existing session. Primarily, this is used when processing session data. Due to how SessionData is stored and used in the lab, this method always loads the data layout from the session_data.yaml file stored inside the ‘raw_data’ session subfolder. Currently, all interactions with Sun lab data require access to the ‘raw_data’ folder of each session.
Notes
To create a new session, use the create() method instead.
- Parameters:
session_path (Path) – The path to the root directory of an existing session, e.g.: root/project/animal/session.
processed_data_root (Path|None, default: None) – If processed data is kept on a drive different from the one that stores raw data, provide the path to the root project directory (directory that stores all Sun lab projects) on that drive. The method will automatically resolve the project/animal/session/processed_data hierarchy using this root path. If raw and processed data are kept on the same drive, keep this set to None.
- Return type:
SessionData
- Returns:
An initialized SessionData instance for the session whose data is stored at the provided path.
- Raises:
FileNotFoundError – If multiple or no ‘session_data.yaml’ file instances are found under the input session path directory.
-
processed_data:
ProcessedData Stores absolute paths to all directories and files that jointly make the session’s processed data hierarchy. Processed data encompasses all data generated from the raw data as part of data processing.
-
project_name:
str Stores the name of the project for which the session was acquired.
-
python_version:
str= '3.11.13' Stores the Python version that was used to acquire session data.
-
raw_data:
RawData Stores absolute paths to all directories and files that jointly make the session’s raw data hierarchy. This hierarchy is initially resolved by the acquisition system that acquires the session and used to store all data acquired during the session runtime.
- runtime_initialized()
Ensures that the ‘nk.bin’ marker file is removed from the session’s raw_data folder.
The ‘nk.bin’ marker is generated as part of the SessionData initialization (creation) process to mark sessions that did not fully initialize during runtime. This service method is designed to be called by the sl-experiment library classes to remove the ‘nk.bin’ marker when it is safe to do so. It should not be called by end-users.
- Return type:
None
- save()
Saves the instance data to the ‘raw_data’ directory of the managed session as a ‘session_data.yaml’ file.
This is used to save the data stored in the instance to disk so that it can be reused during further stages of data processing. The method is intended to only be used by the SessionData instance itself during its create() method runtime.
- Return type:
None
-
session_name:
str Stores the name (timestamp-based ID) of the session.
-
session_type:
str|SessionTypes Stores the type of the session. Has to be set to one of the supported session types, defined in the SessionTypes enumeration exposed by the sl-shared-assets library.
-
sl_experiment_version:
str= '3.0.0' Stores the version of the sl-experiment library that was used to acquire the session data.
-
source_data:
RawData Stores absolute paths to the same data as the ‘raw_data’ field, but with all paths resolved relative to the ‘processed_data’ root. On systems that use the same root for processed and raw data, the source and raw directories are identical. On systems that use different root directories for processed and raw data, the source and raw directories are different. This is used to optimize data processing on the remote compute server by temporarily copying all session data to the fast processed data volume.
-
tracking_data:
TrackingData Stores absolute paths to all directories and files that jointly make the session’s tracking data hierarchy. This hierarchy is used during all stages of data processing to track the processing progress and ensure only a single manager process can modify the session’s data at any given time, ensuring access safety.
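A typical interaction with an existing session starts with the load() method described above. The sketch below uses only the documented API; all paths are illustrative:
from pathlib import Path

from sl_shared_assets.data_classes import SessionData

# Load the session layout from the session_data.yaml file stored inside its raw_data directory.
session = SessionData.load(
    session_path=Path("/server/storage/sun_data/my_project/1234/2024-01-01-12-00-00-000000"),
    # Only required when processed data lives on a different volume than raw data.
    processed_data_root=Path("/server/workdir/sun_data"),
)

# The loaded instance exposes resolved path hierarchies for all managed data categories.
print(session.raw_data.raw_data_path)
print(session.processed_data.processed_data_path)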
- class sl_shared_assets.data_classes.SessionLock(file_path, _manager_id=-1)
Bases:
YamlConfig
Provides thread-safe session locking to ensure exclusive access during data processing.
This class manages a lock file that tracks which manager process currently has exclusive access to a data acquisition session’s data. It prevents race conditions when multiple manager processes attempt to modify session data simultaneously. Primarily, this class is used on remote compute server(s).
Notes
The lock owner is identified by a manager process ID, allowing distributed processing across multiple jobs while maintaining data integrity.
- acquire(manager_id)
Acquires the session access lock.
- Parameters:
manager_id (int) – The unique identifier of the manager process requesting the lock.
- Raises:
TimeoutError – If the .lock file cannot be acquired for a long period of time due to being held by another process.
RuntimeError – If the lock is held by another process and forcing lock acquisition is disabled.
- Return type:
None
- check_owner(manager_id)
Ensures that the managed session is locked for processing by the specified manager process.
This method is used by worker functions to ensure it is safe to interact with the session’s data. It is designed to abort the runtime with an error if the session’s lock file is owned by a different manager process.
- Parameters:
manager_id (int) – The unique identifier of the manager process attempting to access the session’s data.
- Raises:
TimeoutError – If the .lock file cannot be acquired for a long period of time due to being held by another process.
ValueError – If the lock file is held by a different manager process.
- Return type:
None
-
file_path:
Path Stores the absolute path to the .yaml file that stores the lock state on disk.
- force_release()
Forcibly releases the session access lock regardless of ownership.
This method should only be used for emergency recovery from improper processing shutdowns. It can be called by any process to unlock any session, but it does not attempt to terminate the processes that the lock’s owner might have deployed to work with the session’s data.
- Raises:
TimeoutError – If the .lock file cannot be acquired for a long period of time due to being held by another process.
- Return type:
None
- release(manager_id)
Releases the session access lock.
- Parameters:
manager_id (int) – The unique identifier of the manager process releasing the lock.
- Raises:
TimeoutError – If the .lock file cannot be acquired for a long period of time due to being held by another process.
RuntimeError – If the lock is held by another process.
- Return type:
None
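The expected lock lifecycle for a manager process is acquire, verify, work, release. A sketch using the documented methods (the lock file path and manager ID are illustrative):
from pathlib import Path

from sl_shared_assets.data_classes import SessionLock

MANAGER_ID = 42  # Illustrative unique identifier of the manager process.

lock = SessionLock(
    file_path=Path("/server/workdir/sun_data/my_project/1234/session/tracking_data/session_lock.yaml")
)

lock.acquire(manager_id=MANAGER_ID)
try:
    # Worker functions verify ownership before touching the session's data.
    lock.check_owner(manager_id=MANAGER_ID)
    # ... process the session's data ...
finally:
    lock.release(manager_id=MANAGER_ID)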
- class sl_shared_assets.data_classes.SessionTypes(*values)
Bases:
StrEnum
Stores the data acquisition session types supported by all data acquisition systems used in the Sun lab.
A data acquisition session is typically carried out to acquire experiment data, train the animal for the upcoming experiment sessions, or to assess the quality of surgical or other pre-experiment interventions. After acquisition, the session is treated as a uniform package whose components can be accessed via the SessionData class.
Notes
Different acquisition systems support different session types and may not be suited for acquiring some of the session types listed in this enumeration.
- LICK_TRAINING = 'lick training'
A Mesoscope-VR session designed to teach animals to use the water delivery port while being head-fixed.
- MESOSCOPE_EXPERIMENT = 'mesoscope experiment'
A Mesoscope-VR experiment session. The session uses Unity game engine to run virtual reality tasks and collects brain activity data using 2-Photon Random Access Mesoscope (2P-RAM).
- RUN_TRAINING = 'run training'
A Mesoscope-VR session designed to teach animals to run on the treadmill while being head-fixed.
- WINDOW_CHECKING = 'window checking'
A Mesoscope-VR session designed to evaluate the quality of the cranial window implantation procedure and the suitability of the animal for being imaged with the Mesoscope. The session uses the Mesoscope to assess the quality of the cell activity data.
- class sl_shared_assets.data_classes.SubjectData(id, ear_punch, sex, genotype, date_of_birth_us, weight_g, cage, location_housed, status)
Bases:
object
Stores information about the subject of the surgical intervention (animal).
-
cage:
int Stores the unique identifier (number) for the cage used to house the subject after surgery.
-
date_of_birth_us:
int Stores the date of birth of the subject as the number of microseconds elapsed since UTC epoch onset.
-
ear_punch:
str Stores the location and the number of ear-tags used to distinguish the animal from its cage-mates.
-
genotype:
str Stores the genotype of the subject.
-
id:
int Stores the unique ID (name) of the subject. Assumes all animals are given a numeric ID, rather than a string name.
-
location_housed:
str Stores the location (room) used to house the subject.
-
sex:
str Stores the sex of the subject.
-
status:
str Stores the current status of the subject (alive / deceased).
-
weight_g:
float Stores the pre-surgery weight of the subject, in grams.
- class sl_shared_assets.data_classes.SurgeryData(subject, procedure, drugs, implants, injections)
Bases:
YamlConfig
Stores the data about the surgical intervention performed on an animal before data acquisition session(s).
Primarily, this class is used to ensure that each data acquisition session contains a copy of the surgical intervention data as a .yaml file. In turn, this improves the experimenter’s experience during data analysis by making it possible to quickly reference the surgical intervention data.
-
drugs:
DrugData Stores information about the medical substances administered to the subject before, during, and immediately after the surgical intervention.
-
implants:
list[ImplantData] Stores information about cranial and transcranial implants introduced to the subject as part of the surgical intervention.
-
injections:
list[InjectionData] Stores information about substances infused into the brain of the subject as part of the surgical intervention.
-
procedure:
ProcedureData Stores general information about the surgical intervention.
-
subject:
SubjectData Stores information about the subject (mouse).
- class sl_shared_assets.data_classes.TrackingData(tracking_data_path=PosixPath('.'), session_lock_path=PosixPath('.'))
Bases:
object
Stores the paths to the directories and files that make up the ‘tracking_data’ session-specific directory.
This directory was added in version 5.0.0 to store the ProcessingTracker files and .lock files for pipelines and tasks used to work with the session’s data after acquisition.
- make_directories()
Ensures that all major subdirectories and the root directory exist, creating any missing directories.
This method is called each time the (wrapper) SessionData class is instantiated and allowed to generate missing data directories.
- Return type:
None
- resolve_paths(root_directory_path)
Resolves all paths managed by the class instance based on the input root directory path.
This method is called each time the (wrapper) SessionData class is instantiated to regenerate the managed path hierarchy on any machine that instantiates the class.
- Parameters:
root_directory_path (Path) – The path to the top-level directory of the session. Typically, this path is assembled using the following hierarchy: root/project/animal/session_id
- Return type:
None
-
session_lock_path:
Path= PosixPath('.') Stores the path to the session’s session_lock.yaml file. This file is used to ensure that only a single manager process has exclusive access to the session’s data on the remote compute server. This allows multiple data processing pipelines to safely run for the same session without compromising session data integrity. This file is intended to be used through the SessionLock class.
-
tracking_data_path:
Path= PosixPath('.') Stores the path to the root tracking_data directory of the session. This directory stores the .yaml ProcessingTracker files and the .lock FileLock files that jointly ensure that session’s data is accessed in a thread-safe way while being processed by multiple different processes and pipelines.
- class sl_shared_assets.data_classes.WindowCheckingDescriptor(experimenter, surgery_quality=0, incomplete=True, experimenter_notes='Replace this with your notes.')
Bases:
YamlConfig
Stores the outcome information specific to window checking sessions that use the Mesoscope-VR system.
Notes
Window Checking sessions are different from all other sessions. Unlike other sessions, their purpose is not to generate data but rather to assess the suitability of the particular animal to be included in training and experiment cohorts. These sessions are automatically excluded from any automated data processing and analysis.
-
experimenter:
str The ID of the experimenter running the session.
-
experimenter_notes:
str= 'Replace this with your notes.' The notes on the quality of the cranial window and animal’s suitability for the target project.
-
incomplete:
bool= True Window checking sessions are always considered ‘incomplete’, as they do not contain the full range of information collected as part of a ‘standard’ behavior training or experiment session.
-
surgery_quality:
int= 0 The quality of the cranial window and surgical intervention on a scale from 0 (non-usable) to 3 (high-tier publication grade) inclusive.
- class sl_shared_assets.data_classes.ZaberPositions(headbar_z=0, headbar_pitch=0, headbar_roll=0, lickport_z=0, lickport_y=0, lickport_x=0, wheel_x=0)
Bases:
YamlConfig
Stores Zaber motor positions reused between experiment sessions that use the Mesoscope-VR system.
The class is specifically designed to store, save, and load the positions of the LickPort, HeadBar, and Wheel motors (axes). It is used to both store Zaber motor positions for each session for future analysis and to restore the Zaber motors to the same positions across consecutive runtimes for the same project and animal combination.
Notes
By default, the class initializes all fields to 0, which is the position of the home sensor for each motor. The class assumes that the motor groups are assembled and arranged in a way that ensures all motors can safely move to the home sensor positions from any runtime configuration.
-
headbar_pitch:
int= 0 The absolute position, in native motor units, of the HeadBar pitch-axis motor.
-
headbar_roll:
int= 0 The absolute position, in native motor units, of the HeadBar roll-axis motor.
-
headbar_z:
int= 0 The absolute position, in native motor units, of the HeadBar z-axis motor.
-
lickport_x:
int= 0 The absolute position, in native motor units, of the LickPort x-axis motor.
-
lickport_y:
int= 0 The absolute position, in native motor units, of the LickPort y-axis motor.
-
lickport_z:
int= 0 The absolute position, in native motor units, of the LickPort z-axis motor.
-
wheel_x:
int= 0 The absolute position, in native motor units, of the running wheel platform x-axis motor.
- sl_shared_assets.data_classes.create_system_configuration_file(system)
Creates the .yaml configuration file for the requested Sun lab data acquisition system and configures the local machine (PC) to use this file for all future acquisition-system-related calls.
This function is used to initially configure or override the existing configuration of any data acquisition system used in the lab.
Notes
This function creates the configuration file inside the shared Sun lab working directory on the local machine. It assumes that the user has configured (created) the directory before calling this function.
A data acquisition system can consist of multiple machines (PCs). The configuration file is typically only present on the ‘main’ machine that manages all runtimes.
- Parameters:
system (AcquisitionSystems|str) – The name (type) of the data acquisition system for which to create the configuration file. Must be one of the following supported options: mesoscope-vr.
- Raises:
ValueError – If the input acquisition system name (type) is not recognized.
- Return type:
None
- sl_shared_assets.data_classes.get_credentials_file_path(service=False)
Resolves and returns the path to the requested .yaml file that stores access credentials for the Sun lab remote compute server.
Depending on the configuration, either returns the path to the ‘user_credentials.yaml’ file (default) or the ‘service_credentials.yaml’ file.
Notes
Assumes that the local working directory has been configured before calling this function.
- Parameters:
service (bool, default: False) – Determines whether this function must evaluate and return the path to the ‘service_credentials.yaml’ file (if true) or the ‘user_credentials.yaml’ file (if false).
- Raises:
FileNotFoundError – If either the ‘service_credentials.yaml’ or the ‘user_credentials.yaml’ files do not exist in the local Sun lab working directory.
ValueError – If both credential files exist, but the requested credentials file is not configured.
- Return type:
Path
- sl_shared_assets.data_classes.get_system_configuration_data()
Resolves the path to the local data acquisition system configuration file and loads the configuration data as a SystemConfiguration instance.
This service function is used by all Sun lab data acquisition runtimes to load the system configuration data from the locally stored configuration file. It supports resolving and returning the data for all data acquisition systems currently used in the lab.
- Return type:
- Returns:
The initialized SystemConfiguration class instance for the local data acquisition system that stores the loaded configuration parameters.
- Raises:
FileNotFoundError – If the local machine does not have a valid data acquisition system configuration file.
- sl_shared_assets.data_classes.get_working_directory()
Resolves and returns the path to the local Sun lab working directory.
This service function is primarily used when working with Sun lab data stored on remote compute server(s) to establish local working directories for various jobs and pipelines.
- Return type:
Path
- Returns:
The path to the local working directory.
- Raises:
FileNotFoundError – If the local machine does not have the Sun lab data directory, or the local working directory does not exist (has not been configured).
- sl_shared_assets.data_classes.set_working_directory(path)
Sets the specified directory as the Sun lab working directory for the local machine (PC).
This function is used as the first step for configuring any machine to work with the data stored on the remote compute server(s). All lab libraries use this directory for caching configuration data and runtime working (intermediate) data.
Notes
The path to the working directory is stored inside the user’s data directory so that all Sun lab libraries can automatically access and use the same working directory.
If the input path does not point to an existing directory, the function will automatically generate the requested directory.
After setting up the working directory, the user should use other commands from the ‘sl-configure’ CLI to generate the remote compute server access credentials and / or acquisition system configuration files.
- Parameters:
path (Path) – The path to the directory to set as the local Sun lab working directory.
- Return type:
None
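A sketch of the one-time machine configuration flow using the functions documented above (the directory path is illustrative):
from pathlib import Path

from sl_shared_assets.data_classes import get_working_directory, set_working_directory

# One-time setup: register the local Sun lab working directory. The directory is created if missing.
set_working_directory(Path("/home/user/sun_lab_data"))

# Afterwards, any Sun lab library running on this machine resolves the same directory.
print(get_working_directory())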
Server
This package provides the classes and methods used by all Sun lab libraries to work with the data stored on remote compute servers.
- class sl_shared_assets.server.Job(job_name, output_log, error_log, working_directory, conda_environment, cpus_to_use=10, ram_gb=10, time_limit=60)
Bases:
object
Aggregates the data of a single SLURM-managed job to be executed on the Sun lab’s remote compute server.
This class provides the API for constructing any server-side job in the Sun lab. Internally, it wraps an instance of a Slurm class to package the job data into the format expected by the SLURM job manager. All jobs managed by this class instance should be submitted to an initialized Server instance’s submit_job() method to be executed on the server.
Notes
The initialization method of the class contains the arguments for configuring the SLURM and Conda environments used by the job. Do not submit additional SLURM or Conda commands via the ‘add_command’ method, as this may produce unexpected behavior.
Each job can be conceptualized as a sequence of shell instructions to execute on the remote compute server. For the lab, that means that the bulk of the command consists of calling various CLIs exposed by data processing or analysis pipelines, installed in the calling user’s Conda environments on the server. The Job instance also contains commands for activating the target conda environment and, in some cases, doing other preparatory or cleanup work. The source code of a ‘remote’ job is typically identical to what a human operator would type in a ‘local’ terminal to run the same job on their PC.
A key feature of server-side jobs is that they are executed on virtual machines managed by SLURM. Since the server has a lot more compute and memory resources than likely needed by individual jobs, each job typically requests a subset of these resources. Upon being executed, SLURM creates an isolated environment with the requested resources and runs the job in that environment.
- Parameters:
job_name (str) – The descriptive name of the SLURM job to be created. Primarily, this name is used in terminal printouts to identify the job to human operators.
output_log (Path) – The absolute path to the .txt file on the processing server, where to store the standard output data of the job.
error_log (Path) – The absolute path to the .txt file on the processing server, where to store the standard error data of the job.
working_directory (Path) – The absolute path to the directory where temporary job files will be stored. During runtime, classes from this library use that directory to store files such as the job’s shell script. All such files are automatically removed from the directory at the end of an error-free runtime.
conda_environment (str) – The name of the conda environment to activate on the server before running the job logic. The environment should contain the necessary Python packages and CLIs to support running the job’s logic.
cpus_to_use (int, default: 10) – The number of CPUs to use for the job.
ram_gb (int, default: 10) – The amount of RAM to allocate for the job, in Gigabytes.
time_limit (int, default: 60) – The maximum time limit for the job, in minutes. If the job is still running at the end of this time period, it will be forcibly terminated. It is highly advised to always set adequate maximum runtime limits to prevent jobs from hogging the server in case of runtime or algorithm errors.
- remote_script_path
Stores the path to the script file relative to the root of the remote server that runs the command.
- job_id
Stores the unique job identifier assigned by the SLURM manager to this job when it is accepted for execution. This field is initialized to None and is overwritten by the Server class that submits the job.
- job_name
Stores the descriptive name of the SLURM job.
- _command
Stores the managed SLURM command object.
- add_command(command)
Adds the input command string to the end of the managed SLURM job command list.
This method is a wrapper around simple-slurm’s add_cmd() method. It is used to iteratively build the shell command sequence for the managed job.
- Parameters:
command (str) – The command string to add to the command list, e.g.: ‘python main.py --input 1’.
- Return type:
None
- property command_script: str
Translates the managed job data into a shell-script-writable string and returns it to caller.
This method is used by the Server class to translate the job into the format that can be submitted to and executed on the remote compute server. The returned string is safe to dump into a .sh (shell script) file and move to the remote compute server for execution.
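A sketch of assembling a job with the documented API. The paths, environment name, and CLI call are illustrative; the finished job would be passed to an initialized Server instance's submit_job() method, as described above:
from pathlib import Path

from sl_shared_assets.server import Job

job = Job(
    job_name="checksum-session",
    output_log=Path("/server/workdir/logs/checksum_out.txt"),
    error_log=Path("/server/workdir/logs/checksum_err.txt"),
    working_directory=Path("/server/workdir/tmp"),
    conda_environment="sl_processing",
    cpus_to_use=4,
    ram_gb=16,
    time_limit=30,
)

# Build the shell command sequence one instruction at a time.
job.add_command("sl-manage session checksum -sp /server/storage/sun_data/my_project/1234/session -id 0")

# The command_script property yields the shell-script-ready text submitted to the server.
print(job.command_script)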
- class sl_shared_assets.server.JupyterJob(job_name, output_log, error_log, working_directory, conda_environment, notebook_directory, port=9999, cpus_to_use=2, ram_gb=32, time_limit=120, jupyter_args='')
Bases:
Job
Aggregates the data of a specialized job used to launch a Jupyter notebook server under SLURM’s control.
This class extends the base Job class to include specific configuration and commands for starting a Jupyter notebook server in a SLURM environment. Using this specialized job allows users to set up remote Jupyter servers while benefitting from SLURM’s job scheduling and resource management policies.
Notes
Jupyter servers directly compete for resources with headless data processing jobs. Therefore, it is important to minimize the resource footprint and the runtime of each Jupyter server, if possible.
- Parameters:
job_name (str) – The descriptive name of the Jupyter SLURM job to be created. Primarily, this name is used in terminal printouts to identify the job to human operators.
output_log (Path) – The absolute path to the .txt file on the processing server, where to store the standard output data of the job.
error_log (Path) – The absolute path to the .txt file on the processing server, where to store the standard error data of the job.
working_directory (Path) – The absolute path to the directory where to store temporary job files.
conda_environment (str) – The name of the conda environment to activate on the server before running the job. The environment should contain the necessary Python packages and CLIs to support running the job’s logic. For Jupyter jobs, this necessarily includes the Jupyter notebook and jupyterlab packages.
port (int, default: 9999) – The connection port to use for the Jupyter server.
notebook_directory (Path) – The root directory where to run the Jupyter notebook. During runtime, the notebook will only have access to items stored under this directory. For most runtimes, this should be set to the user’s root working directory.
cpus_to_use (int, default: 2) – The number of CPUs to allocate to the Jupyter server.
ram_gb (int, default: 32) – The amount of RAM, in GB, to allocate to the Jupyter server.
time_limit (int, default: 120) – The maximum Jupyter server uptime, in minutes.
jupyter_args (str, default: '') – Stores additional arguments to pass to the jupyter notebook initialization command.
- port
Stores the connection port for the managed Jupyter server.
- notebook_dir
Stores the absolute path to the directory used to run the Jupyter notebook, relative to the remote server root.
- connection_info
Stores the JupyterConnectionInfo instance after the Jupyter server is instantiated.
- host
Stores the hostname of the remote server.
- user
Stores the username used to connect with the remote server.
- connection_info_file
Stores the absolute path to the file that contains the connection information for the initialized Jupyter session, relative to the remote server root.
- _command
Stores the shell command for launching the Jupyter server.
- parse_connection_info(info_file)
Parses the connection information file created by the Jupyter job on the remote server.
This method is used to finalize the remote Jupyter session initialization by parsing the connection session instructions from the temporary storage file created by the remote Job running on the server. After this method’s runtime, the print_connection_info() method can be used to print the connection information to the terminal.
- Parameters:
info_file (Path) – The path to the .txt file generated by the remote server that stores the Jupyter connection information to be parsed.
- Return type:
None
- print_connection_info()
Constructs and displays the command to set up the SSH tunnel to the server and the link to the localhost server view in the terminal.
The SSH command should be used via a separate terminal or subprocess call to establish the secure SSH tunnel to the Jupyter server. Once the SSH tunnel is established, the printed localhost URL can be used to view the server from the local machine’s browser.
- Return type:
None
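A sketch of the typical JupyterJob workflow using the documented API (paths and names are illustrative; job submission itself goes through a Server instance):
from pathlib import Path

from sl_shared_assets.server import JupyterJob

job = JupyterJob(
    job_name="analysis-notebook",
    output_log=Path("/server/workdir/logs/jupyter_out.txt"),
    error_log=Path("/server/workdir/logs/jupyter_err.txt"),
    working_directory=Path("/server/workdir/tmp"),
    conda_environment="sl_analysis",
    notebook_directory=Path("/server/workdir/sun_data"),
    port=9999,
    time_limit=120,
)

# After the submitted job writes its connection file on the server, parse and display it:
# job.parse_connection_info(info_file=Path("/local/tmp/connection_info.txt"))
# job.print_connection_info()  # prints the SSH tunnel command and the localhost URL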
- class sl_shared_assets.server.ProcessingPipeline(pipeline_type, server, manager_id, jobs, remote_tracker_path, local_tracker_path, session, animal, project, keep_job_logs=False, pipeline_status=ProcessingStatus.RUNNING, _pipeline_stage=0)
Bases:
object
Provides an interface to construct and execute data processing pipelines on the target remote compute server.
This class functions as an interface for all data processing pipelines running on Sun lab compute servers. It is pipeline-type-agnostic and works for all data processing pipelines used in the lab. After instantiation, the class automatically handles all interactions with the server necessary to run the remote processing pipeline and verify the runtime outcome via the runtime_cycle() method that has to be called cyclically until the pipeline is complete.
Notes
Each pipeline is executed as a series of one or more stages with each stage using one or more parallel jobs. Therefore, each pipeline can be seen as an execution graph that sequentially submits batches of jobs to the remote server. The processing graph for each pipeline is fully resolved at the instantiation of this class, so each instance contains the necessary data to run the entire processing pipeline.
The minimum self-contained unit of the processing pipeline is a single job. Since jobs can depend on the output of other jobs, they are organized into stages based on the dependency graph between jobs. Combined with cluster management software, such as SLURM, this class can efficiently execute processing pipelines on scalable compute clusters.
-
animal:
str Stores the ID of the animal whose data is being processed by the tracked pipeline.
- property is_running: bool
Returns True if the pipeline is currently running, False otherwise.
-
jobs:
dict[int,tuple[tuple[Job,Path],...]] Stores the dictionary that maps pipeline processing stage integer-codes to tuples of two-element (Job, Path) pairs. Each pair stores the Job object and the path to its remote working directory, to be submitted to the server as part of executing that stage.
-
keep_job_logs:
bool= False Determines whether to keep the logs for the jobs making up the pipeline execution graph or (default) to remove them after the pipeline successfully ends its runtime. If the pipeline fails to complete its runtime, the logs are kept regardless of this setting.
-
local_tracker_path:
Path Stores the path to the pipeline’s processing tracker .yaml file on the local machine. The remote file is pulled to this location when the instance verifies the outcome of the tracked processing pipeline.
-
manager_id:
int The unique identifier for the manager process that constructs and manages the runtime of the tracked pipeline.
-
pipeline_status:
ProcessingStatus|int= 0 Stores the current status of the tracked remote pipeline. This field is updated each time the runtime_cycle() instance method is called.
-
pipeline_type:
ProcessingPipelines Stores the name of the processing pipeline managed by this instance. Primarily, this is used to identify the pipeline to the user in terminal messages and logs.
-
project:
str Stores the name of the project whose data is being processed by the tracked pipeline.
-
remote_tracker_path:
Path Stores the path to the pipeline’s processing tracker .yaml file stored on the remote compute server.
- runtime_cycle()
Checks the current status of the tracked pipeline and, if necessary, submits additional batches of jobs to the remote server to progress the pipeline.
This method is the main entry point for all interactions with the processing pipeline managed by this instance. It checks the current state of the pipeline, advances the pipeline’s processing stage, and submits the necessary jobs to the remote server. The runtime manager process should call this method repeatedly (cyclically) for as long as the ‘is_running’ property of the instance returns True.
- Return type:
None
Notes
While the ‘is_running’ property can be used to determine whether the pipeline is still running, to resolve the final status of the pipeline (success or failure), the manager process should access the ‘status’ instance property.
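As an illustration, a manager process might drive the pipeline with a polling loop similar to the sketch below; the pipeline construction arguments are omitted and the 30-second poll interval is an arbitrary choice, not part of this API:

import time

from sl_shared_assets.server import ProcessingStatus

# 'pipeline' stands for an already-constructed instance of this class.
while True:
    pipeline.runtime_cycle()  # Advances the stage / submits jobs as needed.
    if not pipeline.is_running:
        break
    time.sleep(30)  # Avoids hammering the server with status queries.

# 'is_running' only signals that the pipeline stopped; read the terminal
# outcome from the 'status' property.
succeeded = pipeline.status == ProcessingStatus.SUCCEEDED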
- server: Server
Stores the reference to the Server object used to interface with the remote server running the pipeline.
- session: str
Stores the ID of the session whose data is being processed by the tracked pipeline.
- property status: ProcessingStatus
Returns the current status of the pipeline packaged into a ProcessingStatus instance.
- class sl_shared_assets.server.ProcessingPipelines(*values)
Bases:
StrEnum
Stores the names of the data processing pipelines currently used in the lab.
Notes
The elements in this enumeration match the elements in the TrackerFileNames enumeration, since each valid ProcessingPipeline instance has an associated ProcessingTracker file instance.
The order of pipelines in this enumeration loosely follows the sequence in which they are executed during the Sun lab data workflow.
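Because member names match across the two enumerations, the tracker file for any pipeline can be resolved by a name-based lookup, as in this brief sketch (TrackerFileNames is documented below):

from sl_shared_assets.server import ProcessingPipelines, TrackerFileNames

# Maps a pipeline to its associated tracker file by shared member name:
tracker_file_name = TrackerFileNames[ProcessingPipelines.SUITE2P.name].value
# -> 'suite2p_processing_tracker.yaml'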
- ARCHIVING = 'data archiving'
Data archiving pipeline. To conserve the (limited) space on the remote compute server’s fast working volume, once the data has been processed and integrated into a stable dataset, the processed data folder is moved to the storage volume. After the data is moved, all folders under the root session folder on the processed data volume are deleted to free up the processing volume space.
- BEHAVIOR = 'behavior processing'
Behavior processing pipeline. This pipeline is used to process .npz log files to extract animal behavior data acquired during a single session (day).
- CHECKSUM = 'checksum resolution'
Checksum resolution pipeline. Primarily, it is used to verify that the raw data has been transferred to the remote storage server from the main acquisition system PC intact. This pipeline is also used to regenerate (re-checksum) the data stored on the remote compute server.
- FORGING = 'dataset forging'
Dataset creation (forging) pipeline. This pipeline typically runs after the multi-day pipeline. It extracts and integrates the processed data from all sources into a unified dataset.
- MANIFEST = 'manifest generation'
Project manifest generation pipeline. This pipeline is rarely used on its own: it allows manually regenerating the project manifest .feather file and is typically only needed during testing. All other pipelines automatically regenerate the manifest at the end of their runtime.
- MULTIDAY = 'multi-day suite2p processing'
Multi-day suite2p processing (cell tracking) pipeline. This pipeline is used to track cells processed with the single-day suite2p pipelines across multiple days.
- PREPARATION = 'processing preparation'
Data processing preparation pipeline. Since the compute server uses a two-volume design with a slow (HDD) storage volume and a fast (NVME) working volume, to optimize data processing performance, the data needs to be transferred to the working volume before processing. This pipeline copies the raw data for the target session from the storage volume to the working volume.
- SUITE2P = 'single-day suite2p processing'
Single-day suite2p pipeline. This pipeline is used to extract the cell activity data from 2-photon imaging data acquired during a single session (day).
- VIDEO = 'video processing'
DeepLabCut (Video) processing pipeline. This pipeline is used to extract animal pose estimation data from the behavior video frames acquired during a single session (day).
- class sl_shared_assets.server.ProcessingStatus(*values)
Bases:
IntEnum
Maps integer-based processing pipeline status (state) codes to human-readable names.
The codes from this enumeration are used by the ProcessingPipeline class to communicate the status of the managed pipelines to manager processes that oversee the execution of each pipeline.
Notes
The status codes from this enumeration track the state of the pipeline as a whole, instead of tracking the state of each job that comprises the pipeline.
- ABORTED = 3
The pipeline execution has been aborted prematurely, either by the manager process or due to an overriding request from another user.
- FAILED = 2
The server has failed to complete the pipeline due to a runtime error.
- RUNNING = 0
The pipeline is currently running on the remote server. It may be executing (in progress) or waiting for the required resources to become available (queued).
- SUCCEEDED = 1
The server has successfully completed the processing pipeline.
- class sl_shared_assets.server.ProcessingTracker(file_path, _complete=False, _encountered_error=False, _running=False, _manager_id=-1, _job_count=1, _completed_jobs=0)
Bases:
YamlConfig
Wraps the .yaml file that tracks the state of a data processing pipeline and provides tools for communicating this state between multiple processes in a thread-safe manner.
This class is used by all data processing pipelines running on the remote compute server(s) to prevent race conditions. It is also used to evaluate the status (success / failure) of each pipeline as they are executed by the remote server.
Note
The method documentation for this class frequently refers to a ‘manager process’. A ‘manager process’ is the highest-level process that manages the tracked pipeline. When a pipeline runs on remote compute servers, the manager process is typically the process running on the non-server machine (user PC) that submits the remote processing jobs to the compute server. The worker process(es) that run the processing job(s) on the remote compute servers are not considered manager processes.
Processing trackers work like ‘lock’ files. When a pipeline starts running on the remote server, its tracker is switched into the ‘running’ (locked) state until the pipeline completes, aborts, or encounters an error. While the tracker is locked, all modifications to it have to originate from the same manager process that started the pipeline. This feature supports running complex processing pipelines that use multiple concurrent and / or sequential processing jobs on the remote server.
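A minimal lifecycle sketch, assuming the tracker file lives inside a session directory (the path below is a placeholder):

from pathlib import Path

from sl_shared_assets.server import (
    ProcessingTracker,
    TrackerFileNames,
    generate_manager_id,
)

# Placeholder session directory; real trackers live alongside the session data.
session_directory = Path("/local/workdir/sun_data/project/animal/session")
tracker = ProcessingTracker(file_path=session_directory / TrackerFileNames.SUITE2P.value)
manager_id = generate_manager_id()

tracker.start(manager_id=manager_id, job_count=1)  # Locks the pipeline.
try:
    ...  # Run or submit the processing job(s) here.
    tracker.stop(manager_id=manager_id)  # Marks the job (and pipeline) complete.
except Exception:
    tracker.error(manager_id=manager_id)  # Unlocks the tracker and flags the failure.
    raise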
- abort()
Resets the pipeline tracker file to the default state.
This method resets the pipeline tracker file regardless of the current pipeline state. Unlike other instance methods, it can be called from any manager process, even if the pipeline is already locked by another process. It is only intended for emergencies, to unlock a deadlocked pipeline.
- Return type:
None
- property encountered_error: bool
Returns True if the tracker wrapped by the instance indicates that the processing pipeline has aborted due to encountering an error.
- error(manager_id)
Configures the tracker file to indicate that the tracked processing pipeline encountered an error and failed to complete.
This method unlocks the pipeline, allowing other manager processes to interface with the tracked pipeline. It also updates the tracker file to reflect that the pipeline was interrupted due to an error, which is used by the manager processes to detect and handle processing failures.
- Parameters:
manager_id (int) – The unique identifier of the manager process which attempts to report that the pipeline tracked by this tracker file has encountered an error.
- Raises:
TimeoutError – If the .lock file for the target .yaml file cannot be acquired within the timeout period.
- Return type:
None
- file_path: Path
Stores the path to the .yaml file used to cache the tracker data on disk. The class instance functions as a wrapper around the data stored inside the specified .yaml file.
- property is_complete: bool
Returns True if the tracker wrapped by the instance indicates that the processing pipeline has been completed successfully and that the pipeline is not currently ongoing.
- property is_running: bool
Returns True if the tracker wrapped by the instance indicates that the processing pipeline is currently ongoing.
- start(manager_id, job_count=1)
Configures the tracker file to indicate that a manager process is currently executing the tracked processing pipeline.
Calling this method locks the tracked session and processing pipeline combination to only be accessible from the manager process that calls this method. Calling this method for an already running pipeline managed by the same process does not have any effect, so it is safe to call this method at the beginning of each processing job that makes up the pipeline.
- Parameters:
manager_id (int) – The unique identifier of the manager process which attempts to start the pipeline tracked by this tracker file.
job_count (int, default: 1) – The total number of jobs to be executed as part of the tracked pipeline.
- Raises:
TimeoutError – If the .lock file for the target .YAML file cannot be acquired within the timeout period.
- Return type:
None
- stop(manager_id)
Configures the tracker file to indicate that the tracked processing pipeline has been completed successfully.
This method unlocks the pipeline, allowing other manager processes to interface with the tracked pipeline. It also configures the tracker file to indicate that the pipeline has been completed successfully, which is used by the manager processes to detect and handle processing completion.
Notes
This method tracks how many of the tracked pipeline's jobs have been completed and only marks the pipeline as complete once all of its processing jobs have finished.
- Parameters:
manager_id (int) – The unique identifier of the manager process which attempts to report that the pipeline tracked by this tracker file has been completed successfully.
- Raises:
TimeoutError – If the .lock file for the target .yaml file cannot be acquired within the timeout period.
- Return type:
None
- class sl_shared_assets.server.Server(credentials_path)
Bases:
object
Establishes and maintains a bidirectional interface that allows working with a remote compute server.
This class provides the API that allows accessing the remote processing server. Primarily, the class is used to submit SLURM-managed jobs to the server and monitor their execution status. It functions as the central interface used by many data workflow pipelines in the lab to execute costly data processing on the server.
Notes
This class assumes that the target server has the SLURM job manager installed and accessible to the user whose credentials are used to connect to the server during class instantiation.
- Parameters:
credentials_path (Path) – The path to the locally stored .yaml file that contains the server hostname and access credentials.
- _open
Tracks whether the connection to the server is open or not.
- _client
Stores the initialized SSHClient instance used to interface with the server.
- abort_job(job)
Aborts the target job if it is currently running on the server.
If the job is currently running, this method forcibly terminates its runtime. If the job is queued for execution, this method removes it from the SLURM queue. If the job is already terminated, this method will do nothing.
- Parameters:
job (Job | JupyterJob) – The Job object that needs to be aborted.
- Return type:
None
- close()
Closes the SSH connection to the server.
This method has to be called before destroying the class instance to ensure proper resource cleanup.
- Return type:
None
- create_directory(remote_path, parents=True)
Creates the specified directory tree on the managed remote server.
- Parameters:
remote_path (Path) – The absolute path to the directory to create on the remote server, relative to the server root.
parents (bool, default: True) – Determines whether to create missing parent directories. If False and the parent directories do not exist, a FileNotFoundError is raised.
- Return type:
None
Notes
If the directory already exists, this method silently treats it as a successful runtime end-point.
- property dlc_projects_directory: Path
Returns the absolute path to the shared directory that stores all DeepLabCut projects.
- exists(remote_path)
Returns True if the target file or directory exists on the remote server.
- Return type:
bool
- property host: str
Returns the hostname or IP address of the server accessible through this class.
- job_complete(job)
Returns True if the job managed by the input Job instance has been completed or terminated its runtime due to an error.
If the job is still running or queued for runtime, the method returns False.
- Parameters:
job (Job | JupyterJob) – The Job object whose status needs to be checked.
- Raises:
ValueError – If the input Job object does not contain a valid job_id, suggesting that it has not been submitted to the server.
- Return type:
bool
- launch_jupyter_server(job_name, conda_environment, notebook_directory, cpus_to_use=2, ram_gb=32, time_limit=240, port=0, jupyter_args='')
Launches a remote Jupyter notebook session (server) on the target remote compute server.
This method allows running interactive Jupyter sessions on the remote server under SLURM control.
- Parameters:
job_name (str) – The descriptive name of the Jupyter SLURM job to be created.
conda_environment (str) – The name of the conda environment to activate on the server before running the job logic. The environment should contain the necessary Python packages and CLIs to support running the job's logic. For Jupyter jobs, this necessarily includes the jupyter notebook and jupyterlab packages.
notebook_directory (Path) – The root directory where to run the Jupyter notebook. During runtime, the notebook will only have access to items stored under this directory. For most runtimes, this should be set to the user's root working directory.
cpus_to_use (int, default: 2) – The number of CPUs to allocate to the Jupyter server.
ram_gb (int, default: 32) – The amount of RAM, in GB, to allocate to the Jupyter server.
time_limit (int, default: 240) – The maximum Jupyter server uptime, in minutes.
port (int, default: 0) – The connection port number for the Jupyter server. If set to 0 (default), a random port number between 8888 and 9999 is assigned to this connection to reduce the possibility of colliding with other user sessions.
jupyter_args (str, default: '') – Stores additional arguments to pass to the jupyter notebook initialization command.
- Return type:
JupyterJob
- Returns:
The initialized JupyterJob instance that stores information on how to connect to the created Jupyter server. Do NOT re-submit the job to the server, as this is done as part of this method’s runtime.
- Raises:
TimeoutError – If the target Jupyter server doesn’t start within 120 minutes of this method being called.
RuntimeError – If the job submission fails for any reason.
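For example, an interactive session could be requested as sketched below; the credentials path, environment name, and resource values are placeholders:

from pathlib import Path

from sl_shared_assets.server import Server

server = Server(credentials_path=Path("~/server_credentials.yaml").expanduser())
jupyter_job = server.launch_jupyter_server(
    job_name="exploration_notebook",
    conda_environment="analysis",  # Must include the jupyter notebook and jupyterlab packages.
    notebook_directory=server.user_working_root,
    time_limit=480,  # Minutes.
)
# 'jupyter_job' stores the connection details; do NOT re-submit it via submit_job().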
- property processed_data_root: Path
Returns the absolute path to the directory used to store the processed data for all Sun lab projects on the server accessible through this class.
- pull_directory(local_directory_path, remote_directory_path)
Recursively downloads the entire target directory from the remote server to the local machine.
- Parameters:
local_directory_path (Path) – The path to the local directory where the remote directory will be copied.
remote_directory_path (Path) – The path to the directory on the remote server to be downloaded.
- Return type:
None
- pull_file(local_file_path, remote_file_path)
Moves the specified file from the remote server to the local machine.
- Parameters:
local_file_path (Path) – The path to the local instance of the file (where to copy the file).
remote_file_path (Path) – The path to the target file on the remote server (the file to be copied).
- Return type:
None
- push_directory(local_directory_path, remote_directory_path)
Recursively uploads the entire target directory from the local machine to the remote server.
- Parameters:
local_directory_path (Path) – The path to the local directory to be uploaded.
remote_directory_path (Path) – The path on the remote server where the directory will be copied.
- Return type:
None
- push_file(local_file_path, remote_file_path)
Moves the specified file from the local machine to the remote server.
- Parameters:
local_file_path (Path) – The path to the file that needs to be copied to the remote server.
remote_file_path (Path) – The path to the file on the remote server (where to copy the file).
- Return type:
None
- property raw_data_root: Path
Returns the absolute path to the directory used to store the raw data for all Sun lab projects on the server accessible through this class.
- remove(remote_path, is_dir, recursive=False)
Removes the specified file or directory from the remote server.
- Parameters:
remote_path (Path) – The path to the file or directory on the remote server to be removed.
is_dir (bool) – Determines whether the input path represents a directory or a file.
recursive (bool, default: False) – If True and is_dir is True, recursively deletes all contents of the directory before removing it. If False, only removes empty directories (standard rmdir behavior).
- Return type:
None
- submit_job(job, verbose=True)
Submits the input job to the managed remote compute server via the SLURM job manager.
This method functions as the entry point for all headless jobs that are executed on the remote compute server.
- Parameters:
job (Job | JupyterJob) – The initialized Job instance that contains the remote job's data.
verbose (bool, default: True) – Determines whether to notify the user about non-error states of the job submission process.
- Return type:
Job | JupyterJob
- Returns:
The job object whose ‘job_id’ attribute has been updated to include the SLURM-assigned job ID if the job was successfully submitted.
- Raises:
RuntimeError – If the job cannot be submitted to the server for any reason.
- property suite2p_configurations_directory: Path
Returns the absolute path to the shared directory that stores all sl-suite2p runtime configuration files.
- property user: str
Returns the username used to authenticate with the server.
- property user_data_root: Path
Returns the absolute path to the directory used to store user-specific data on the server accessible through this class.
- property user_working_root: Path
Returns the absolute path to the user-specific working (fast) directory on the server accessible through this class.
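A typical end-to-end interaction with this class might look like the sketch below; the Job construction and all paths are placeholders, since Job configuration is documented elsewhere:

import time
from pathlib import Path

from sl_shared_assets.server import Server

server = Server(credentials_path=Path("~/server_credentials.yaml").expanduser())
try:
    job = ...  # A pre-configured Job instance; construction is documented elsewhere.
    job = server.submit_job(job, verbose=True)  # Fills in the SLURM-assigned job_id.
    while not server.job_complete(job):
        time.sleep(10)  # Polls the SLURM queue at a modest interval.
    server.pull_directory(
        local_directory_path=Path("./results"),
        remote_directory_path=server.user_working_root / "results",  # Placeholder path.
    )
finally:
    server.close()  # Required for proper resource cleanup.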
- class sl_shared_assets.server.ServerCredentials(username='YourNetID', password='YourPassword', host='cbsuwsun.biohpc.cornell.edu', storage_root='/local/storage', working_root='/local/workdir', shared_directory_name='sun_data')
Bases:
YamlConfig
This class stores the information used to interface with Sun lab's remote compute servers.
- host: str = 'cbsuwsun.biohpc.cornell.edu'
The hostname or IP address of the server to connect to.
- password: str = 'YourPassword'
The password to use for server authentication.
- processed_data_root: str
The path to the root directory used to store the processed data from all Sun lab projects on the target server.
- raw_data_root: str
The path to the root directory used to store the raw data from all Sun lab projects on the target server.
- shared_directory_name: str = 'sun_data'
Stores the name of the shared directory used to store all Sun lab project data on the storage and working server volumes.
- storage_root: str = '/local/storage'
The path to the root storage (slow) server directory. Typically, this is the path to the top-level (root) directory of the HDD RAID volume.
- user_data_root: str
The path to the root directory of the user on the target server. Unlike the raw and processed data roots, which are shared between all Sun lab users, each user_data directory is unique for every server user.
- user_working_root: str
The path to the root user working directory on the target server. This directory is unique for every user.
- username: str = 'YourNetID'
The username to use for server authentication.
- working_root: str = '/local/workdir'
The path to the root working (fast) server directory. Typically, this is the path to the top-level (root) directory of the NVME RAID volume. If the server uses the same volume for both storage and working directories, enter the same path under both 'storage_root' and 'working_root'.
- class sl_shared_assets.server.TrackerFileNames(*values)
Bases:
StrEnum
Stores the names of the processing tracker .yaml files used by the Sun lab data preprocessing, processing, and dataset formation pipelines to track each pipeline's progress.
Notes
The elements in this enumeration match the elements in the ProcessingPipelines enumeration, since each valid ProcessingPipeline instance has an associated ProcessingTracker file instance.
- ARCHIVING = 'data_archiving_tracker.yaml'
This file is used to track the state of the data archiving pipeline.
- BEHAVIOR = 'behavior_processing_tracker.yaml'
This file is used to track the state of the behavior log processing pipeline.
- CHECKSUM = 'checksum_resolution_tracker.yaml'
This file is used to track the state of the checksum resolution pipeline.
- FORGING = 'dataset_forging_tracker.yaml'
This file is used to track the state of the dataset creation (forging) pipeline.
- MANIFEST = 'manifest_generation_tracker.yaml'
This file is used to track the state of the project manifest generation pipeline.
- MULTIDAY = 'multiday_processing_tracker.yaml'
This file is used to track the state of the multiday suite2p processing pipeline.
- PREPARATION = 'processing_preparation_tracker.yaml'
This file is used to track the state of the data processing preparation pipeline.
- SUITE2P = 'suite2p_processing_tracker.yaml'
This file is used to track the state of the single-day suite2p processing pipeline.
- VIDEO = 'video_processing_tracker.yaml'
This file is used to track the state of the video (DeepLabCut) processing pipeline.
- sl_shared_assets.server.generate_manager_id()
Generates and returns a unique integer value that can be used to identify the manager process that calls this function.
The identifier is generated based on the current timestamp, accurate to microseconds, and a random number between 1 and 9999999999999. This ensures that the identifier is unique for each function call. The generated identifier string is converted to a unique integer value using the xxHash-64 algorithm before it is returned to the caller.
- Return type:
int
Notes
This function should be used to generate manager process identifiers for working with ProcessingTracker instances from sl-shared-assets version 4.0.0 and above.
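The documented derivation can be illustrated with the following sketch; this is a re-creation of the scheme described above, not the library's actual implementation, and it assumes the third-party xxhash package is installed:

import random
from datetime import datetime

import xxhash  # Third-party xxHash bindings, assumed to be available.

# Combine a microsecond-accurate timestamp with a large random number...
identifier = f"{datetime.now().isoformat(timespec='microseconds')}_{random.randint(1, 9_999_999_999_999)}"
# ...and collapse the string into a 64-bit integer with xxHash-64.
manager_id = xxhash.xxh64(identifier.encode()).intdigest()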
- sl_shared_assets.server.generate_server_credentials(output_directory, username, password, service=False, host='cbsuwsun.biohpc.cornell.edu', storage_root='/local/storage', working_root='/local/workdir', shared_directory_name='sun_data')
Generates the server access credentials .yaml file under the specified directory, using input information.
This function provides a convenience interface for generating new server access credential files. Depending on configuration, it either creates user access credentials files or service access credentials files.
- Parameters:
output_directory (Path) – The directory where to save the generated server_credentials.yaml file.
username (str) – The username to use for server authentication.
password (str) – The password to use for server authentication.
service (bool, default: False) – Determines whether the generated credentials file stores the data for a user or a service account.
host (str, default: 'cbsuwsun.biohpc.cornell.edu') – The hostname or IP address of the server to connect to.
storage_root (str, default: '/local/storage') – The path to the root storage (slow) server directory. Typically, this is the path to the top-level (root) directory of the HDD RAID volume.
working_root (str, default: '/local/workdir') – The path to the root working (fast) server directory. Typically, this is the path to the top-level (root) directory of the NVME RAID volume. If the server uses the same volume for both storage and working directories, enter the same path under both 'storage_root' and 'working_root'.
shared_directory_name (str, default: 'sun_data') – The name of the shared directory used to store all Sun lab project data on the storage and working server volumes.
- Return type:
None
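For instance, a user credentials file could be generated as follows; all values below are placeholders:

from pathlib import Path

from sl_shared_assets.server import generate_server_credentials

# Writes server_credentials.yaml into the output directory.
generate_server_credentials(
    output_directory=Path("~/.sl_assets").expanduser(),
    username="netid123",
    password="not-a-real-password",
    storage_root="/local/storage",
    working_root="/local/workdir",
)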