.. _fantasia_hpc_deployment:

=======================================
HPC Deployment Guide
=======================================

Step 1: Connect to the HPC via VPN
==================================

To access the HPC system, you must first connect to the private network using a VPN.

**Instructions:**

1. Open the VPN client configured on your system.
2. Enter the credentials provided by the system administrator.
3. Connect to the VPN and verify that the connection is successful.
4. Once connected, open a terminal and test the connection to the HPC:

   .. code-block:: console

      ssh user@hpc.domain.com

   Replace ``user`` with your username and ``hpc.domain.com`` with the HPC server address.

5. If this is your first time connecting, accept the host key by typing ``yes`` when prompted.

Step 2: Reserve Resources on the Cluster
========================================

Once inside the cluster, you need to reserve resources to execute the pipeline.

**Command Used:**

.. code-block:: console

   salloc --partition=vision --gres=gpu:1 --cpus-per-task=64 --mem=128G --time=03-00:00:00

**Command Breakdown:**

- ``--partition=vision``: Specifies the partition to use.
- ``--gres=gpu:1``: Reserves 1 GPU for the job.
- ``--cpus-per-task=64``: Allocates 64 CPUs for the job.
- ``--mem=128G``: Allocates 128 GB of RAM.
- ``--time=03-00:00:00``: Sets a maximum runtime of 3 days for the job.

**Expected Output:**

.. code-block:: console

   salloc: Pending job allocation 5134882
   salloc: job 5134882 queued and waiting for resources
   salloc: job 5134882 has been allocated resources
   salloc: Granted job allocation 5134882
   salloc: Waiting for resource configuration
   salloc: Nodes vision-04 are ready for job

Step 3: Connect to the Assigned Node
====================================

After resources are allocated, connect to the assigned node for pipeline execution.

**Command:**

.. code-block:: console

   ssh vision-04

**Notes:**

- Replace ``vision-04`` with the node name shown in the ``salloc`` output.
- Once connected to the node, you are ready to configure and execute the pipeline.

Step 3.1: Create Separate Screen Sessions
=========================================

To keep each service (RabbitMQ, PostgreSQL, and FANTASIA) isolated, it is recommended to run each one in its own ``screen`` session:

1. **Create a session for RabbitMQ**:

   .. code-block:: console

      screen -S rabbitmq

   This opens a new screen session named ``rabbitmq``.

2. **Create a session for PostgreSQL**:

   .. code-block:: console

      screen -S postgres

3. **Create a session for FANTASIA**:

   .. code-block:: console

      screen -S fantasia

**Managing Screen Sessions:**

- To detach from a session while it keeps running, press ``Ctrl + A`` then ``D``.
- To reattach to a session by name:

  .. code-block:: console

     screen -r rabbitmq
     screen -r postgres
     screen -r fantasia

This lets you run each service separately and check its logs independently. If one service crashes or needs debugging, the others are not interrupted.

Step 4: Load Required Modules
=============================

Before running the pipeline or building containers, load the necessary modules into the node environment. Loaded modules apply only to the shell where they are loaded and to processes started from it, so load them before creating the screen sessions or repeat the commands in each session that needs them:

.. code-block:: console

   module load gcc/13.2.0
   module load hdf5/1.14.0
   module load singularity/3.11.3
   module load cuda/12.0.0
   module load openmpi/4.1.1

**Notes:**

- Ensure the loaded module versions are compatible with the pipeline.
- If a module is unavailable, contact the HPC system administrator for assistance.

Step 5: Build and Configure the Singularity Container for PostgreSQL in RAM
===========================================================================

.. note::

   It is recommended that you run **all PostgreSQL-related commands** inside the ``postgres`` screen session created in Step 3.1. This ensures PostgreSQL remains isolated from other services.

5.1. Build the Container
------------------------

Use the following command to build a Singularity container from the official pgvector image:

.. code-block:: console

   singularity build pgvector.sif docker://pgvector/pgvector:pg16

5.2. Create Directories in /dev/shm
-----------------------------------

Since PostgreSQL runs entirely in RAM, create separate directories in ``/dev/shm`` (a tmpfs filesystem):

.. code-block:: console

   mkdir -p /dev/shm/pgvector_data
   mkdir -p /dev/shm/pgvector_temp

**Why /dev/shm?**

- ``/dev/shm`` is a volatile filesystem stored in memory. Data here offers very fast I/O, but **all data will be lost** when the job ends or the node reboots.
- Plan a backup/restore strategy if you need to preserve important results.

5.3. Initialize the Database in RAM
-----------------------------------

Next, initialize a new PostgreSQL cluster within the RAM-based directory:

.. code-block:: console

   singularity exec pgvector.sif initdb -D /dev/shm/pgvector_data

5.4. Start the PostgreSQL Server in RAM
---------------------------------------

Launch the PostgreSQL server, pointing to the RAM directories:

.. code-block:: console

   singularity exec pgvector.sif postgres \
       -D /dev/shm/pgvector_data \
       -k /dev/shm/pgvector_temp

**Tips**:

- Run this inside your ``postgres`` screen session so that PostgreSQL continues running even if you detach (``Ctrl + A`` then ``D``).
- The ``-k /dev/shm/pgvector_temp`` argument tells PostgreSQL to create its Unix domain socket in ``/dev/shm/pgvector_temp``, which is handy for local connections within the same HPC node.

5.5. Verify and Configure Permissions
-------------------------------------

In another terminal on the same node (for example, a second screen window), test connectivity:

.. code-block:: console

   singularity exec pgvector.sif psql -h /dev/shm/pgvector_temp -d postgres

If the connection succeeds, your PostgreSQL instance is live in RAM.

Step 6: Configure PostgreSQL User, Database, and Restart the Server
===================================================================

Once you have verified the service by running:

.. code-block:: console

   singularity exec pgvector.sif psql -h /dev/shm/pgvector_temp -d postgres

you will be inside the PostgreSQL interactive shell (``psql``). From there, you can create users, databases, and adjust settings as needed.

6.1. Create a User and Database
-------------------------------

Run these commands directly in the PostgreSQL shell:

.. code-block:: sql

   CREATE USER usuario WITH PASSWORD 'clave' SUPERUSER;
   CREATE DATABASE "BioData" OWNER usuario;
   GRANT ALL PRIVILEGES ON DATABASE "BioData" TO usuario;
   ALTER SYSTEM SET shared_buffers = '256GB';
   ALTER SYSTEM SET effective_cache_size = '516GB';
   ALTER SYSTEM SET work_mem = '1GB';
   ALTER SYSTEM SET max_worker_processes = '256';
   ALTER SYSTEM SET max_connections = '500';

- Replace ``usuario`` and ``clave`` with your desired username and password.
- The ``ALTER SYSTEM`` commands modify server parameters such as memory settings. Adjust the values to the resources of your job: the memory figures shown are larger than the 128 GB requested in Step 2.

When finished, exit the PostgreSQL client:

.. code-block:: console

   \q

6.2. Restarting PostgreSQL
--------------------------

Some configuration changes require a server restart to take effect. Run the restart from a shell on the same node; if the server is occupying the foreground of the ``postgres`` screen session, use a second window or terminal:

.. code-block:: console

   singularity exec pgvector.sif pg_ctl -D /dev/shm/pgvector_data restart

With the server restarted, your new settings and user/database configuration are now active.

Step 7: Build and Run RabbitMQ
==============================

Switch to (or create) the ``rabbitmq`` screen session for these commands:

1. **Build the Singularity container for RabbitMQ**:

   .. code-block:: console

      singularity build rabbitmq.sif docker://rabbitmq:management

2. **Create the data directory** in your home (or local storage):

   .. code-block:: console

      mkdir -p ~/rabbitmq_data

3. **Start the RabbitMQ server** within the container:

   .. code-block:: console

      singularity exec --bind ~/rabbitmq_data:/var/lib/rabbitmq rabbitmq.sif rabbitmq-server

You can leave RabbitMQ running in this screen session. Detach with ``Ctrl + A`` then ``D`` if desired.

Step 8: Build and Configure the Singularity Container for FANTASIA
===================================================================

This step can be done in your main terminal or in the ``fantasia`` session:

**Build the Container:**

.. code-block:: console

   singularity build fantasia.sif docker://frapercan/fantasia

**Notes:**

- Ensure you have permissions to build containers in the HPC environment.

Step 9: Initialize FANTASIA
============================

The following command initializes the information system from a frozen copy of the embeddings. The source can be set with the ``--embeddings_url`` parameter; by default, a late-2024 UniProt mirror hosted on Zenodo is used.

.. code-block:: console

   singularity exec --nv --bind ~/fantasia:/fantasia fantasia.sif python3 -m fantasia.main initialize

Step 10: Run FANTASIA
=============================

The following command runs the FANTASIA pipeline inside a Singularity container:

.. code-block:: console

   singularity exec --nv --bind ~/fantasia:/fantasia fantasia.sif python3 -m fantasia.main run \
       --input data_sample/sample.fasta \
       --length_filter 50000000 \
       --redundancy_filter 0. \
       --sequence_queue_package 1000 \
       --models esm,prot \
       --distance_threshold esm:1.2,prot:0.7,prost:0.7 \
       --batch_size 1:32,2:32,3:32 \
       --device cuda \
       --base_directory ~/fantasia

Explanation of the Commands
==============================

- ``--nv``: Enables NVIDIA GPU (CUDA) support inside the Singularity container.
- ``--bind ~/fantasia:/fantasia``: Mounts your local ``~/fantasia`` directory inside the container at ``/fantasia``.
- ``python3 -m fantasia.main run``: Executes the main ``run`` function of FANTASIA.

Arguments
---------

- ``--input``: Specifies the input FASTA file containing the protein sequences to process. The path is relative to the mounted directory inside the container.
- ``--prefix``: Sets a prefix for output files. This helps organize results and logs for different runs.
- ``--length_filter``: Filters out sequences longer than the specified length (in this case, 50,000,000 residues). Sequences exceeding this length are ignored.
- ``--redundancy_filter``: Specifies the redundancy threshold (0.0 in this case). Sequences with redundancy above this threshold are excluded.
- ``--sequence_queue_package``: Sets how many sequences are grouped into each package sent for processing (1000 in this case).
- ``--models``: Selects the embedding models to run, as a comma-separated list. The names ``esm``, ``prost``, and ``prot`` correspond to ESM, ProstT5, and ProtT5, respectively; the command above enables ``esm`` and ``prot``.
- ``--distance_threshold``: Sets the distance threshold for each embedding type. The format is a comma-separated list of ``embedding_type:threshold`` pairs, for example ``esm:1.2,prot:0.7,prost:0.7``.
- ``--batch_size``: Specifies the batch size for each embedding type. The format is a comma-separated list of ``embedding_type:size`` pairs, for example ``esm:32,prot:32,prost:32``.
- ``--device``: Specifies the device to use for computation. Options are ``cuda`` (for GPU acceleration) or ``cpu`` (for CPU-only execution). Default is ``cuda`` if available.
- ``--base_directory``: Specifies the base directory where all experiments, results, and execution parameters are stored. This is the root location for organizing output files and logs.

**Output**

- Results will be stored in the directory mounted to ``/fantasia`` (e.g., ``~/fantasia`` on your local system).
- Log messages will be displayed in the terminal, indicating the pipeline’s progress.
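
Because a run can occupy the allocation for days, it can be worth checking the comma-separated ``name:value`` arguments (``--distance_threshold``, ``--batch_size``) before launching. The following is a minimal sketch of such a check in Bash. The ``validate_pairs`` function is a hypothetical helper and not part of FANTASIA, and the pattern it accepts (an alphanumeric name, a colon, and a non-negative number) is an assumption based on the examples in this guide rather than on the pipeline's own parser.

.. code-block:: bash

   #!/usr/bin/env bash

   # Hypothetical helper (not part of FANTASIA): returns 0 when the argument
   # is a comma-separated list of name:number pairs, and 1 otherwise.
   validate_pairs() {
       local spec="$1" pair
       local -a pairs
       # Assumed pattern: alphanumeric name, colon, non-negative number.
       local re='^[A-Za-z0-9_]+:[0-9]+(\.[0-9]*)?$'

       # An empty value is never valid.
       [ -n "$spec" ] || return 1

       # Reject leading, trailing, or doubled commas (empty list items).
       case "$spec" in
           ,*|*,|*,,*) return 1 ;;
       esac

       # Split on commas and check every item against the pattern.
       IFS=',' read -r -a pairs <<< "$spec"
       for pair in "${pairs[@]}"; do
           [[ "$pair" =~ $re ]] || return 1
       done
       return 0
   }

   # Example: check the threshold list from Step 10 before starting the run.
   validate_pairs "esm:1.2,prot:0.7,prost:0.7" && echo "format ok"

The function only checks the shape of the value. It does not verify that the names match the models enabled for the run.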