Main Entry Point
Main execution module
This module serves as the primary entry point for the FANTASIA system within the Protein Information System (PIS). It orchestrates the end-to-end workflow, from initializing the reference embeddings database to running the functional annotation pipeline.
Main Functions
initialize: Downloads the reference embeddings and loads them into the database.
run_pipeline: Executes the main FANTASIA pipeline, including sequence embedding and subsequent database lookup.
setup_experiment_directories: Manages directory creation and experiment-specific configuration files.
load_and_merge_config: Loads the base YAML configuration, applies CLI overrides, ensures backward compatibility, and performs early validation checks.
main: CLI entry point that parses arguments, initializes logging, validates services, and dispatches subcommands.
Notes
Designed for CLI usage through the initialize and run subcommands.
Configuration is driven by YAML files and command-line arguments.
- fantasia.main.initialize(conf)
Initialize the FANTASIA environment by downloading and loading reference embeddings.
This function:
Creates the embeddings directory if it does not exist.
Downloads the reference embeddings archive from the configured URL.
Loads the extracted embeddings into the database for subsequent lookup.
- Parameters:
conf (dict) –
Configuration dictionary. Must contain:
base_directory (str): Base directory where embeddings and experiments are stored.
embeddings_url (str): URL of the reference embeddings archive to be downloaded.
- Return type:
None
- Raises:
FileNotFoundError – If the embeddings archive cannot be found or accessed after download.
RuntimeError – If loading the embeddings into the database fails.
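The three steps and the two documented exceptions can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not FANTASIA's implementation: the `embeddings` subdirectory name, the way the archive file name is derived from the URL, and the `fetch` and `load` callables are all hypothetical stand-ins for the real download and database-loading code.

```python
from pathlib import Path


def initialize_sketch(conf, fetch, load):
    """Illustrative flow only; `fetch` and `load` are hypothetical callables."""
    # Assumption: embeddings live in an "embeddings" folder under base_directory.
    embeddings_dir = Path(conf["base_directory"]).expanduser() / "embeddings"
    embeddings_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: the archive is saved under the last path component of the URL.
    archive_name = conf["embeddings_url"].rstrip("/").split("/")[-1]
    archive = embeddings_dir / archive_name
    fetch(conf["embeddings_url"], archive)

    # Mirrors the documented FileNotFoundError when the archive is missing.
    if not archive.exists():
        raise FileNotFoundError(f"Embeddings archive not found: {archive}")

    # Mirrors the documented RuntimeError when loading fails.
    try:
        load(archive)
    except Exception as exc:
        raise RuntimeError("Loading embeddings into the database failed") from exc
```

Passing the download and load steps in as callables keeps the sketch free of network and database dependencies; the real function takes only `conf`.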
- fantasia.main.load_and_merge_config(args, unknown_args)
Load the base YAML configuration and apply CLI overrides.
This function merges different sources of configuration into a normalized dictionary ready for pipeline execution. The process includes:
Loading the YAML configuration file specified by --config.
Applying known CLI arguments as flat overrides.
Parsing unknown CLI key-value pairs (--key value) into overrides.
Mapping selected CLI flags into their canonical nested structure.
Sanitizing taxonomy ID lists.
Restoring legacy compatibility for embedding.types.
Validating redundancy thresholds and taxonomy lists.
- Parameters:
args (argparse.Namespace) –
Namespace of parsed known arguments from argparse. Must include:
config (str): Path to the base YAML configuration.
May also include optional CLI overrides such as device, redundancy_filter, alignment_coverage, and threads.
unknown_args (list of str) – List of additional CLI arguments in the form ["--key", "value", ...]. These are parsed into dictionary entries.
- Returns:
A fully merged and validated configuration dictionary. Keys include:
embedding (dict): Embedding-related settings, including enabled models.
lookup (dict): Lookup and redundancy-related parameters.
taxonomy (dict): Taxonomy filtering options.
Other keys inherited from the YAML file and CLI overrides.
- Return type:
dict
- Raises:
ValueError – If redundancy thresholds are out of range, or taxonomy lists are provided in an invalid format.
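The override and validation steps can be illustrated with a short sketch. It is not the module's actual code: the helper names are hypothetical, a plain dictionary stands in for the loaded YAML file, the precedence order (YAML, then known arguments, then unknown arguments) is an assumption, and so is the `[0, 1]` range used for the redundancy threshold. The nested-structure mapping, taxonomy sanitizing, and legacy `embedding.types` handling are omitted.

```python
def parse_unknown_args(unknown_args):
    """Turn ["--key", "value", ...] into {"key": "value", ...}."""
    overrides = {}
    tokens = iter(unknown_args)
    for token in tokens:
        if not token.startswith("--"):
            raise ValueError(f"Expected an option starting with '--', got {token!r}")
        try:
            overrides[token[2:]] = next(tokens)
        except StopIteration:
            raise ValueError(f"Option {token!r} has no value") from None
    return overrides


def merge_config_sketch(base, known, unknown_args):
    """Merge a base config dict with flat CLI overrides (hypothetical helper)."""
    conf = dict(base)  # stands in for the dictionary loaded from --config
    # Known CLI arguments override only when they were actually supplied.
    conf.update({key: value for key, value in known.items() if value is not None})
    # Unknown "--key value" pairs are applied last (assumed precedence).
    conf.update(parse_unknown_args(unknown_args))

    # Assumption: the redundancy threshold is a fraction between 0 and 1.
    threshold = conf.get("redundancy_filter")
    if threshold is not None and not 0.0 <= float(threshold) <= 1.0:
        raise ValueError(f"redundancy_filter out of range: {threshold}")
    return conf
```

Values parsed from unknown arguments stay strings in this sketch; any type coercion the real function performs is not shown.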
- fantasia.main.main()
Command-line interface (CLI) entry point for FANTASIA.
- This function:
Builds the argument parser and reads CLI inputs.
Loads and merges the configuration from YAML and CLI overrides.
Sets up logging with timestamped log files.
Verifies that required background services are available.
Dispatches execution to the selected subcommand.
Supported Subcommands
initialize: Download and load reference embeddings into the database.
run: Execute the full FANTASIA pipeline (embedding + lookup).
Behavior
If no command is provided, the function prints the help message and exits.
The run command requires at least one embedding model to be enabled in the configuration under embedding.models.
- Parameters:
None
- Return type:
None
- Raises:
ValueError – If no embedding models are enabled, or if redundancy thresholds are invalid.
SystemExit – If the user requests help, or if a fatal error occurs during execution.
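The subcommand dispatch described above can be sketched with the standard `argparse` module. This is an assumed structure rather than the real parser: the program name `fantasia`, the `build_parser` and `dispatch` helpers, the `handlers` mapping, and the exit code `0` when no command is given are all illustrative choices. Logging setup and service checks are left out.

```python
import argparse


def build_parser():
    """Build a parser with the two documented subcommands."""
    parser = argparse.ArgumentParser(prog="fantasia")  # prog name is an assumption
    subparsers = parser.add_subparsers(dest="command")
    for name in ("initialize", "run"):
        sub = subparsers.add_parser(name)
        sub.add_argument("--config", help="Path to the base YAML configuration")
    return parser


def dispatch(argv, handlers):
    """Parse argv and call the handler registered for the chosen subcommand."""
    parser = build_parser()
    # parse_known_args keeps unrecognized "--key value" pairs for later merging.
    args, unknown_args = parser.parse_known_args(argv)
    if args.command is None:
        # No subcommand given: print help and exit.
        parser.print_help()
        raise SystemExit(0)
    return handlers[args.command](args, unknown_args)
```

A call such as `dispatch(["run", "--config", "c.yaml", "--limit", "5"], handlers)` passes `args.config == "c.yaml"` and `unknown_args == ["--limit", "5"]` to the `run` handler.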
- fantasia.main.run_pipeline(conf)
Execute the main FANTASIA pipeline.
This function coordinates the entire functional annotation workflow:
Prepares experiment directories and saves the configuration.
Runs the embedding step unless only_lookup is enabled.
Validates that the embeddings file has been generated.
Performs database lookup using the generated or provided embeddings.
- Parameters:
conf (dict) –
Configuration dictionary. Must include:
base_directory (str): Root path for storing experiments.
only_lookup (bool): If True, skip embedding and use provided input file.
input (str, optional): Path to an existing HDF5 embeddings file (required if only_lookup is True).
Other pipeline settings required by embedding and lookup components.
- Return type:
None
- Raises:
FileNotFoundError – If the embeddings file is missing after the embedding step.
SystemExit – If a fatal error occurs during pipeline execution.
Exception – For any other unexpected runtime errors.
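The branching on `only_lookup` can be sketched as below. The `embed` and `lookup` callables are hypothetical stand-ins for the real embedding and lookup components, and the assumption that the embedding step returns the path of the file it wrote is made for illustration only. Directory preparation is omitted, and the existence check is applied to both branches here, which may differ from the actual function.

```python
import os


def run_pipeline_sketch(conf, embed, lookup):
    """Illustrative control flow; `embed` and `lookup` are hypothetical callables."""
    if conf.get("only_lookup"):
        # Skip embedding and reuse the embeddings file supplied by the user.
        embeddings_path = conf["input"]
    else:
        # Assumption: the embedding step returns the path of the file it wrote.
        embeddings_path = embed(conf)

    # Fail early if the embeddings file is not on disk.
    if not os.path.exists(embeddings_path):
        raise FileNotFoundError(f"Embeddings file not found: {embeddings_path}")

    lookup(conf, embeddings_path)
```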
- fantasia.main.setup_experiment_directories(conf, timestamp)
Prepare and configure directories for a new experiment.
This function:
Expands the base directory and ensures an experiments folder exists.
Creates a unique experiment directory using the provided timestamp.
Stores the experiment configuration into experiment_config.yaml.
Updates the configuration dictionary with the generated experiment path.
- Parameters:
conf (dict) –
Configuration dictionary containing at least:
base_directory (str, optional): Root path for experiments. Defaults to ~/fantasia/ if not provided.
prefix (str, optional): Prefix for experiment naming. Defaults to experiment.
timestamp (str) – Unique identifier (usually YYYYMMDDHHMMSS) appended to the experiment name.
- Returns:
Updated configuration dictionary including the key:
experiment_path (str): Path to the newly created experiment directory.
- Return type:
dict
- Raises:
OSError – If the experiment directory or configuration file cannot be created.
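A rough sketch of the directory setup, using only the standard library, might look like this. Several details are assumptions rather than documented behavior: the `<prefix>_<timestamp>` naming pattern, writing the configuration before adding `experiment_path`, and the hand-rolled flat `key: value` writer, which stands in for a real YAML library and only handles simple scalar values.

```python
from pathlib import Path


def setup_experiment_directories_sketch(conf, timestamp):
    """Create an experiment directory and return an updated copy of conf."""
    # Documented defaults: ~/fantasia/ and "experiment".
    base = Path(conf.get("base_directory", "~/fantasia/")).expanduser()
    prefix = conf.get("prefix", "experiment")

    # Assumption: the directory is named "<prefix>_<timestamp>".
    experiment_path = base / "experiments" / f"{prefix}_{timestamp}"
    experiment_path.mkdir(parents=True, exist_ok=True)

    # Stand-in serializer: flat "key: value" lines, not a full YAML dump.
    lines = [f"{key}: {value}" for key, value in conf.items()]
    config_file = experiment_path / "experiment_config.yaml"
    config_file.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Return a copy so the caller's dictionary is not mutated by the sketch.
    return dict(conf, experiment_path=str(experiment_path))
```

Both `mkdir` and `write_text` raise `OSError` subclasses on failure, which matches the exception documented above.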