FANTASIA Array Job Execution on Greisenwald HPC (GPU + Singularity)

This section explains how to execute multiple FANTASIA jobs in parallel on the Greisenwald HPC system, using SLURM job arrays and GPU acceleration. All services and containers (PostgreSQL, RabbitMQ, FANTASIA) are launched locally per job using Singularity, ensuring isolation and reproducibility.

This method is ideal for batch annotation of multiple input FASTA files, each with independent configuration.

Job Script Summary

The script performs the following for each job in the array:

Reads parameters (FASTA file, output prefix, extra args) from a tab-separated input list
Builds containers if missing
Initializes PostgreSQL with pgvector extension
Launches RabbitMQ
Runs fantasia initialize and fantasia run with job-specific parameters
Performs automatic cleanup

SLURM Directives

#SBATCH --job-name=fantasia
#SBATCH --output=fantasia_%A_%a.out
#SBATCH --error=fantasia_%A_%a.err
#SBATCH --partition=vision
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=64
#SBATCH --mem=128G
#SBATCH --time=3-00:00:00
#SBATCH --array=1-8%1

Explanation:

--array=1-8: launch jobs for lines 1 to 8 in the input list
%1: limit to 1 concurrent job (adjust depending on GPU availability)
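The array range does not have to be hard-coded, because options given on the sbatch command line take precedence over the #SBATCH directives in the script. The following is a minimal sketch that derives the range from the number of lines in the input list. The helper name array_spec is hypothetical and not part of the original script, and it assumes one job per line.

```shell
#!/usr/bin/env bash
# Sketch: build an --array specification such as "1-8%1" from the
# parameter file length. array_spec is a hypothetical helper name.
array_spec() {
    local param_file=$1 max_concurrent=${2:-1} n
    # awk counts the last line even when it lacks a trailing newline
    n=$(awk 'END { print NR }' "$param_file") || return 1
    if [ "$n" -lt 1 ]; then
        echo "no lines in $param_file" >&2
        return 1
    fi
    printf '1-%d%%%d\n' "$n" "$max_concurrent"
}

# Usage (the submission itself requires SLURM):
#   sbatch --array="$(array_spec "$HOME/fantasia_input_list.txt" 1)" greisenwald_array.sh
```

Blank lines in the list still count as array indices here, so the file should contain one entry per line with no gaps.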

Parameter File Format

The job array reads from a plain-text, tab-separated file located at:

PARAM_FILE="$HOME/fantasia_input_list.txt"

Each line must contain:

<input_fasta>    <output_prefix>    <extra_arguments>

Example:

proteomes_uniprot/UP000000589.fasta    MOUSE_nr100_k1    --redundancy_filter 1.0 --taxonomy_ids_to_exclude 10090,9606
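A common mistake with this format is separating the fields with spaces instead of tabs. The sketch below checks the file before submission. The function name validate_param_file is hypothetical, and it assumes that the third field (extra arguments) may be left empty, which the original text does not state.

```shell
#!/usr/bin/env bash
# Sketch: verify that every line has a non-empty input FASTA and output
# prefix separated by a tab. validate_param_file is a hypothetical helper.
# Assumption: the extra-arguments field is optional.
validate_param_file() {
    awk -F'\t' '
        NF < 2 || $1 == "" || $2 == "" {
            printf "line %d: expected <input_fasta><TAB><output_prefix>[<TAB><extra_arguments>]\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1" >&2
}

# Usage:
#   validate_param_file "$HOME/fantasia_input_list.txt" && sbatch greisenwald_array.sh
```

The function reports each offending line number on standard error and returns a non-zero status if any line is malformed.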

Containers and Paths

Paths and image definitions:

REPO_DIR="$HOME/FANTASIA"
WORK_DIR="$HOME/FANTASIA"
SHM_DIR="/tmp/fantasia_pgvector_${SLURM_ARRAY_TASK_ID}"
DB_DIR="$SHM_DIR/data"
DB_SOCKET="$SHM_DIR/socket"
RABBIT_DIR="$HOME/fantasia_rabbitmq_${SLURM_ARRAY_TASK_ID}"
FANTASIA_RUN_DIR="$HOME/fantasia_${SLURM_ARRAY_TASK_ID}"

PGVECTOR_SIF="$WORK_DIR/pgvector.sif"
RABBITMQ_SIF="$WORK_DIR/rabbitmq.sif"
FANTASIA_SIF="$WORK_DIR/fantasia.sif"

If missing, containers are built from Docker images.
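The build-if-missing step could look roughly like the sketch below. The helper name build_if_missing and the SINGULARITY_CMD override are hypothetical, and the docker:// URI in the usage comment is a placeholder, since the Docker images used by the original script are not listed here.

```shell
#!/usr/bin/env bash
# Sketch: build a SIF image from a Docker image only when the file is absent.
# build_if_missing and SINGULARITY_CMD are hypothetical names.
build_if_missing() {
    local sif=$1 docker_uri=$2
    if [ -f "$sif" ]; then
        echo "found $sif, skipping build"
        return 0
    fi
    echo "building $sif from $docker_uri"
    # SINGULARITY_CMD allows substituting another command, e.g. apptainer
    "${SINGULARITY_CMD:-singularity}" build "$sif" "$docker_uri"
}

# Usage (placeholder URI, replace with the actual image):
#   build_if_missing "$WORK_DIR/pgvector.sif" "docker://<registry>/<image>:<tag>"
```

Because the images live in $WORK_DIR and are shared by all array tasks, building them once before submitting the array avoids repeating the build check inside every task.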

Dynamic Parameter Parsing

Each job extracts its corresponding line from the parameter file using:

LINE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$PARAM_FILE")
INPUT=$(echo "$LINE" | cut -f1)
OUTPUT=$(echo "$LINE" | cut -f2)
EXTRA=$(echo "$LINE" | cut -f3-)

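Because each task reads only the line matching its array index, every job runs with its own input file, output prefix, and extra arguments.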
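Note that cut returns the whole line for every field when the line contains no tab, so a space-separated entry would silently set INPUT and OUTPUT to the same string. The following sketch wraps the same sed and cut calls with two guards. The function name parse_task_line is hypothetical and not taken from the original script.

```shell
#!/usr/bin/env bash
# Sketch: parse one line of the parameter file into INPUT, OUTPUT and EXTRA,
# failing early on a missing or malformed line.
# parse_task_line is a hypothetical helper name.
parse_task_line() {
    local param_file=$1 task_id=$2 line tab
    tab=$(printf '\t')
    line=$(sed -n "${task_id}p" "$param_file")
    if [ -z "$line" ]; then
        echo "no entry for task $task_id in $param_file" >&2
        return 1
    fi
    # Without a tab, cut would return the full line for every field
    case $line in
        *"$tab"*) ;;
        *) echo "line $task_id is not tab-separated" >&2; return 1 ;;
    esac
    INPUT=$(printf '%s\n' "$line" | cut -f1)
    OUTPUT=$(printf '%s\n' "$line" | cut -f2)
    EXTRA=$(printf '%s\n' "$line" | cut -f3-)
}

# Usage inside the job script:
#   parse_task_line "$PARAM_FILE" "$SLURM_ARRAY_TASK_ID" || exit 1
```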

Execution Phase

PostgreSQL and RabbitMQ are launched with dedicated folders per job. Then:

singularity exec --nv --bind "$FANTASIA_RUN_DIR:/fantasia" "$FANTASIA_SIF" \
    fantasia initialize

# $EXTRA is left unquoted on purpose so the shell splits it into separate arguments
singularity exec --nv --bind "$FANTASIA_RUN_DIR:/fantasia" "$FANTASIA_SIF" \
    fantasia run --input "$INPUT" --prefix "$OUTPUT" $EXTRA
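The run command relies on the shell splitting the unquoted $EXTRA into separate arguments, which also exposes its contents to filename expansion. One possible alternative, shown here as a sketch rather than as the original script's method, is to split the string into a bash array first. The array name EXTRA_ARGS is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split the extra-arguments string into an array so it can be
# passed as "${EXTRA_ARGS[@]}". EXTRA_ARGS is a hypothetical name.
EXTRA="--redundancy_filter 1.0 --taxonomy_ids_to_exclude 10090,9606"

# read -a splits on whitespace without performing filename expansion
read -r -a EXTRA_ARGS <<< "$EXTRA"

# Print one argument per line to show how the string was split
printf '%s\n' "${EXTRA_ARGS[@]}"

# The run command would then end with:
#   fantasia run --input "$INPUT" --prefix "$OUTPUT" "${EXTRA_ARGS[@]}"
```

Neither approach handles arguments that themselves contain spaces. The tab-separated format would need a quoting convention for that case.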

Cleanup

The script defines a cleanup() function to:

Kill PostgreSQL and RabbitMQ
Remove temporary folders

The function is registered with trap cleanup EXIT, so it runs when the job script exits.
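A cleanup handler of this kind might be sketched as follows. The variables PG_PID and RABBIT_PID are hypothetical and assumed to hold the process IDs of the background services. Removing SHM_DIR and RABBIT_DIR is also an assumption about which folders count as temporary.

```shell
#!/usr/bin/env bash
# Sketch of a cleanup handler. PG_PID and RABBIT_PID are hypothetical
# variables; the choice of directories to remove is an assumption.
cleanup() {
    local pid dir
    for pid in "${PG_PID:-}" "${RABBIT_PID:-}"; do
        # Only signal PIDs that are set and still running
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
        fi
    done
    for dir in "${SHM_DIR:-}" "${RABBIT_DIR:-}"; do
        # Skip unset variables so rm never receives an empty path
        if [ -n "$dir" ]; then
            rm -rf -- "$dir"
        fi
    done
}
trap cleanup EXIT
```

Guarding each variable keeps the handler safe to run even when the script fails before the services have started.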

Launching the Job

Submit the array job with:

sbatch greisenwald_array.sh

Logs for each task will be stored as:

fantasia_<arrayjobid>_<taskid>.out
fantasia_<arrayjobid>_<taskid>.err