HPC Deployment Guide

Step 1: Connect to the HPC via VPN

To access the HPC system, you must first connect to the private network using a VPN.

Instructions:

  1. Open the VPN client configured on your system.

  2. Enter the credentials provided by the system administrator.

  3. Connect to the VPN and verify that the connection is successful.

  4. Once connected, open a terminal and test the connection to the HPC:

    ssh user@hpc.domain.com
    

    Replace user with your username and hpc.domain.com with the HPC server address.

  5. If this is your first time connecting, accept the host key by typing yes when prompted.

Step 2: Reserve Resources on the Cluster

Once inside the cluster, you need to reserve resources to execute the pipeline.

Command Used:

salloc --partition=vision --gres=gpu:1 --cpus-per-task=64 --mem=128G --time=03-00:00:00

Command Breakdown:

  • --partition=vision: Specifies the partition to use.

  • --gres=gpu:1: Reserves 1 GPU for the job.

  • --cpus-per-task=64: Allocates 64 CPU cores per task.

  • --mem=128G: Allocates 128 GB of RAM.

  • --time=03-00:00:00: Sets a maximum runtime of 3 days for the job.

Expected Output:

salloc: Pending job allocation 5134882
salloc: job 5134882 queued and waiting for resources
salloc: job 5134882 has been allocated resources
salloc: Granted job allocation 5134882
salloc: Waiting for resource configuration
salloc: Nodes vision-04 are ready for job

Step 3: Connect to the Assigned Node

After resources are allocated, connect to the assigned node for pipeline execution.

Command:

ssh vision-04

Notes:

  • Replace vision-04 with the node name shown in the salloc output.

  • Once connected to the node, you are ready to configure and execute the pipeline.
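
The node name can also be extracted from the salloc output rather than copied by hand. The sketch below is illustrative only: node_from_salloc is a hypothetical helper, and it assumes the "Nodes ... are ready for job" line has exactly the format shown in the expected output of Step 2. Inside the shell that salloc starts, Slurm normally also exposes the node list in the SLURM_JOB_NODELIST environment variable.

```shell
# Hypothetical helper: print the node name(s) from salloc's "ready" line.
# Assumes the line format shown in the expected output of Step 2.
node_from_salloc() {
    sed -n 's/^salloc: Nodes \(.*\) are ready for job$/\1/p'
}

# Sample text copied from the expected output above.
sample='salloc: Granted job allocation 5134882
salloc: Waiting for resource configuration
salloc: Nodes vision-04 are ready for job'

node=$(printf '%s\n' "$sample" | node_from_salloc)
echo "ssh $node"
```

With the sample text, this prints the same ssh vision-04 command used above.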

Step 3.1: Create Separate Screen Sessions

To keep each service (RabbitMQ, PostgreSQL, and FANTASIA) isolated, it is recommended to run each one in its own screen session:

  1. Create a session for RabbitMQ:

    screen -S rabbitmq
    
    • This creates a new screen session named rabbitmq and attaches to it. To return to an existing session, use screen -r rabbitmq instead.

    • To detach from it (but leave it running), press Ctrl + A followed by D.

  2. Create a session for PostgreSQL:

    screen -S postgres
    
  3. Create a session for FANTASIA:

    screen -S fantasia
    

Managing Screen Sessions:

  • To detach from a session while it keeps running, press Ctrl + A then D.

  • To reattach to a session by name:

    screen -r rabbitmq
    screen -r postgres
    screen -r fantasia
    

This allows you to run each service separately, check logs independently, and ensure that if any service crashes or needs debugging, it will not interrupt the others.
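
If you prefer to create all three sessions in one go without attaching to each, screen can start sessions in detached mode. The following is a minimal sketch: create_sessions and the DRY_RUN switch are hypothetical names introduced here for illustration, and with DRY_RUN=1 the commands are only printed so they can be reviewed first.

```shell
# Sketch: create one detached screen session per service.
# DRY_RUN is an illustrative switch, not a screen option: when it is 1
# the commands are printed instead of executed.
create_sessions() {
    for name in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "screen -dmS $name"
        else
            screen -dmS "$name"   # -d -m: start detached; -S: session name
        fi
    done
}

# Preview the commands for the three services used in this guide.
DRY_RUN=1 create_sessions rabbitmq postgres fantasia
```

Running create_sessions without DRY_RUN=1 creates the sessions, which can then be entered with screen -r as described above.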

Step 4: Load Required Modules

Before running the pipeline or building containers, load the required modules on the node. Modules affect only the shell in which they are loaded, so run these commands in every session that needs them (for example, each screen session from Step 3.1):

module load gcc/13.2.0
module load hdf5/1.14.0
module load singularity/3.11.3
module load cuda/12.0.0
module load openmpi/4.1.1

Notes:

  • Ensure the loaded module versions are compatible with the pipeline.

  • If a module is unavailable, contact the HPC system administrator for assistance.
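
A quick way to confirm that the modules took effect is to check that the tools they provide are on PATH. The sketch below uses a hypothetical helper, missing_tools; the tool names (gcc, singularity, nvcc, mpirun) are assumptions about what these modules provide at your site and may need adjusting.

```shell
# Hypothetical helper: print every named tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools assumed to come from the modules above (names are assumptions;
# adjust to what your site's modules actually provide).
missing=$(missing_tools gcc singularity nvcc mpirun)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All expected tools found."
fi
```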

Step 5: Build and Configure the Singularity Container for PostgreSQL in RAM

Note

It is recommended that you run all PostgreSQL-related commands inside the postgres screen session created in Step 3.1. This ensures PostgreSQL remains isolated from other services.

5.1. Build the Container

Use the following command to build a Singularity container from the official pgvector image:

singularity build pgvector.sif docker://pgvector/pgvector:pg16

5.2. Create Directories in /dev/shm

Since we are running PostgreSQL entirely in RAM, create separate directories in /dev/shm (a tmpfs filesystem):

mkdir -p /dev/shm/pgvector_data
mkdir -p /dev/shm/pgvector_temp

Why /dev/shm?

  • /dev/shm is a volatile filesystem stored in memory. Data here offers very fast I/O, but all data will be lost when the job ends or the node reboots.

  • Plan a backup/restore strategy if you need to preserve important results.
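
Because /dev/shm is backed by RAM, it can help to check how much space is free before initializing the database. This is a rough sketch: free_gib is a hypothetical helper, and REQUIRED_GIB is an illustrative figure rather than a documented FANTASIA requirement.

```shell
# Hypothetical helper: print the free space, in GiB, of the filesystem
# holding the given directory (default: /dev/shm).
free_gib() {
    df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# REQUIRED_GIB is an illustrative figure, not a FANTASIA requirement.
REQUIRED_GIB=50
avail=$(free_gib /dev/shm)
if [ "${avail:-0}" -lt "$REQUIRED_GIB" ]; then
    echo "Only ${avail:-0} GiB free in /dev/shm; need $REQUIRED_GIB GiB"
else
    echo "/dev/shm has ${avail} GiB free"
fi
```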

5.3. Initialize the Database in RAM

Next, initialize a new PostgreSQL cluster within the RAM-based directory:

singularity exec pgvector.sif initdb -D /dev/shm/pgvector_data

5.4. Start the PostgreSQL Server in RAM

Launch the PostgreSQL server, pointing to the RAM directories:

singularity exec pgvector.sif postgres \
    -D /dev/shm/pgvector_data \
    -k /dev/shm/pgvector_temp

Tips:

  • Run this inside your postgres screen session so that PostgreSQL continues running even if you detach (Ctrl + A, D).

  • The -k /dev/shm/pgvector_temp argument configures PostgreSQL to listen on a Unix domain socket located in /dev/shm, which is handy for local connections within the same HPC node.

5.5. Verify Connectivity

From another terminal on the same node (for example, a second screen session, since the server occupies the postgres session in the foreground), test connectivity:

singularity exec pgvector.sif psql -h /dev/shm/pgvector_temp -d postgres

If the connection succeeds, your PostgreSQL instance is live in RAM.
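
When the server is started from a script, it may not accept connections immediately. The sketch below shows a generic retry loop; wait_for is a hypothetical helper, and the usage comment assumes that pg_isready (a standard PostgreSQL client utility) is available inside pgvector.sif.

```shell
# Hypothetical helper: retry a command up to N times, one second apart.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
    tries=$1
    shift
    i=1
    while [ "$i" -le "$tries" ]; do
        "$@" >/dev/null 2>&1 && return 0
        [ "$i" -lt "$tries" ] && sleep 1
        i=$((i + 1))
    done
    return 1
}

# Usage on the node (assumes pg_isready exists in the image):
#   wait_for 30 singularity exec pgvector.sif pg_isready -q -h /dev/shm/pgvector_temp
```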

Step 6: Configure PostgreSQL User, Database, and Restart the Server

Once you have verified the service by running:

singularity exec pgvector.sif psql -h /dev/shm/pgvector_temp -d postgres

you will be inside the PostgreSQL interactive shell (psql). From there, you can create users, databases, and adjust settings as needed.

6.1. Create a User and Database

Run these commands directly in the PostgreSQL shell:

CREATE USER usuario WITH PASSWORD 'clave' SUPERUSER;
CREATE DATABASE "BioData" OWNER usuario;
GRANT ALL PRIVILEGES ON DATABASE "BioData" TO usuario;

ALTER SYSTEM SET shared_buffers = '256GB';
ALTER SYSTEM SET effective_cache_size = '516GB';
ALTER SYSTEM SET work_mem = '1GB';
ALTER SYSTEM SET max_worker_processes = '256';
ALTER SYSTEM SET max_connections = '500';
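
  • Replace usuario and clave with your desired username and password.

  • The ALTER SYSTEM commands above modify server parameters such as memory settings. The values shown exceed the 128 GB requested in Step 2; scale them to fit your allocation, keeping in mind that the data directory in /dev/shm also consumes RAM.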

When finished, exit the PostgreSQL client:

\q
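
The memory settings need to fit inside the job's allocation. The sketch below derives starting values from the allocated memory, using the common PostgreSQL rule of thumb of roughly 25% of RAM for shared_buffers. The 50% figure for effective_cache_size and the helper name pg_memory_settings are assumptions made for illustration, not FANTASIA recommendations, and smaller values may be needed because the data directory in /dev/shm also occupies RAM.

```shell
# Hypothetical helper: print ALTER SYSTEM statements sized from the
# allocated memory in GB (integer division, so values round down).
pg_memory_settings() {
    mem_gb=$1
    echo "ALTER SYSTEM SET shared_buffers = '$((mem_gb / 4))GB';"
    echo "ALTER SYSTEM SET effective_cache_size = '$((mem_gb / 2))GB';"
}

# 128 matches --mem=128G from Step 2.
pg_memory_settings 128
```

For a 128 GB allocation this prints statements setting shared_buffers to 32GB and effective_cache_size to 64GB, which can be pasted into psql in place of the values above.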

6.2. Restarting PostgreSQL

Some configuration changes (for example, shared_buffers, max_connections, and max_worker_processes) require a server restart to take effect. Because the server started in Step 5.4 runs in the foreground of the postgres screen session, run the restart from another shell on the same node:

    singularity exec pgvector.sif pg_ctl -D /dev/shm/pgvector_data restart
    

Once the server has restarted, the new settings are active. The user and database created in 6.1 take effect immediately and do not depend on the restart.

Step 7: Build and Run RabbitMQ

Switch to (or create) the rabbitmq screen session for these commands:

  1. Build the Singularity container for RabbitMQ:

    singularity build rabbitmq.sif docker://rabbitmq:management
    
  2. Create the data directory in your home (or local storage):

    mkdir -p ~/rabbitmq_data
    
  3. Start the RabbitMQ server within the container:

    singularity exec --bind ~/rabbitmq_data:/var/lib/rabbitmq rabbitmq.sif rabbitmq-server
    

You can leave RabbitMQ running in this screen session. Detach with Ctrl + A, D if desired.

Step 8: Build and Configure the Singularity Container for FANTASIA

This step can be done in your main terminal or in the fantasia session:

Build the Container:

singularity build fantasia.sif docker://frapercan/fantasia

Notes:

  • Ensure you have permissions to build containers in the HPC environment.

Step 9: Initialize FANTASIA

The following command initializes the information system from a frozen copy, whose source is set through the --embeddings_url parameter. By default, a late-2024 UniProt mirror is provided through Zenodo.

singularity exec --nv --bind ~/fantasia:/fantasia fantasia.sif python3 -m fantasia.main initialize

Step 10: Run FANTASIA

The following command runs the FANTASIA pipeline inside a Singularity container:

singularity exec --nv --bind ~/fantasia:/fantasia fantasia.sif python3 -m fantasia.main run \
   --input data_sample/sample.fasta \
   --length_filter 50000000 \
   --redundancy_filter 0. \
   --sequence_queue_package 1000 \
   --models esm,prot \
   --distance_threshold esm:1.2,prot:0.7,prost:0.7 \
   --batch_size 1:32,2:32,3:32 \
   --device cuda \
   --base_directory ~/fantasia

Explanation of the Commands

  • --nv: Enables NVIDIA GPU support inside the container, which CUDA requires.

  • --bind ~/fantasia:/fantasia: Mounts your local ~/fantasia directory inside the container at /fantasia.

  • python3 -m fantasia.main run: Executes the main run function of FANTASIA.

Arguments

  • --input: Specifies the input FASTA file containing protein sequences to process. The path is relative to the mounted directory inside the container.

  • --prefix: Sets a prefix for output files. This helps organize results and logs for different runs.

  • --length_filter: Excludes sequences longer than the specified length (here 50,000,000 residues, which in practice lets every protein through).

  • --redundancy_filter: Specifies the redundancy threshold (0.0 in this case). Sequences with redundancy above this threshold will be excluded.

  • --sequence_queue_package: Determines the size of sequence batches (1000 sequences per package). This controls how many sequences are processed in each batch.

  • --models: Comma-separated list of embedding models to enable. The names esm, prost, and prot select ESM, ProstT5, and ProtT5, respectively; the example above enables esm and prot.

  • --distance_threshold: Sets thresholds for distances across different embedding types. The format is a comma-separated list of embedding_type:threshold pairs. For example, esm:1.2,prot:0.7,prost:0.7 sets a threshold of 1.2 for ESM and 0.7 for both ProtT5 and ProstT5.

  • --batch_size: Specifies batch sizes for different embedding types. The format is a comma-separated list of embedding_type:size pairs. For example, esm:32,prot:32,prost:32 sets batch sizes.

  • --device: Specifies the device to use for computation. Options are cuda (for GPU acceleration) or cpu (for CPU-only execution). Default is cuda if available.

  • --base_directory: Specifies the base directory where all experiments, results, and execution parameters will be stored. This is the root location for organizing output files and logs.
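
To choose a sensible --sequence_queue_package value, it can help to know how many sequences the input contains. The sketch below counts FASTA records and estimates the number of packages. count_sequences and packages_needed are hypothetical helpers, the demo sequences are made up, and the estimate assumes packages are filled one after another up to the given size.

```shell
# Hypothetical helpers for sizing a run.
count_sequences() {
    # Each FASTA record starts with a ">" header line.
    grep -c '^>' "$1"
}

packages_needed() {
    # Ceiling division: $1 sequences in packages of up to $2.
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Demo on a tiny throwaway FASTA file (made-up sequences).
demo=$(mktemp)
printf '>seq1\nMKT\n>seq2\nGAV\n>seq3\nLLP\n' > "$demo"
n=$(count_sequences "$demo")
echo "$n sequences, $(packages_needed "$n" 1000) package(s) of up to 1000"
rm -f "$demo"
```

On a real input, call count_sequences with the path passed to the run command, for example data_sample/sample.fasta.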

Output

  • Results will be stored in the directory mounted to /fantasia (e.g., ~/fantasia on your local system).

  • Log messages will be displayed in the terminal, indicating the pipeline’s progress.
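
Since log messages go to the terminal, you may want to keep a copy on disk for long runs. The following sketch uses tee for that. run_logged is a hypothetical helper, and the demo uses a harmless echo in place of the actual FANTASIA command.

```shell
# Hypothetical helper: run a command, showing its output and saving a copy.
# Note: the exit status reported is tee's, not the wrapped command's.
run_logged() {
    log_file=$1
    shift
    "$@" 2>&1 | tee "$log_file"
}

# Demo with a harmless command; on the node, pass the Step 10 command instead.
log=$(mktemp)
run_logged "$log" echo "pipeline started"
```

Inside a screen session, the log file can then be followed from another terminal with tail -f.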