FANTASIA Deployment Guide

This guide provides a step-by-step process for deploying FANTASIA locally.

Prerequisites

Before proceeding, ensure you have the following dependencies installed:

System Requirements

  • Operating System: Linux (Ubuntu recommended)

  • Python: Version 3.10 or higher

  • Docker: Installed and running. If not installed, follow the Docker installation guide and the post-installation steps to run Docker without sudo.

  • CD-HIT: Must be installed and available in the system PATH. You can install it from your package manager (e.g., sudo apt install cd-hit) or compile it from source at the [CD-HIT website](http://weizhong-lab.ucsd.edu/cd-hit/).
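
The system requirements above can be checked before going further. The following is a minimal sketch, not part of FANTASIA: the helper name version_ge is hypothetical, and it assumes GNU sort (for the -V version sort) and that the CD-HIT binary is named cd-hit.

```shell
#!/bin/sh
# Returns 0 if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report any required tool that is not on the PATH.
for tool in python3 docker cd-hit; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool" >&2
done

# Compare the installed Python version against the 3.10 minimum.
if command -v python3 >/dev/null 2>&1; then
    py_version=$(python3 -c 'import platform; print(platform.python_version())')
    if version_ge "$py_version" 3.10; then
        echo "Python $py_version OK"
    else
        echo "Python $py_version is older than 3.10" >&2
    fi
fi
```

A tool being on the PATH does not prove the Docker daemon is running; docker info is one way to confirm that separately.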

Machine Learning Dependencies

  • NVIDIA Driver: Version 550.120 or newer (verify using nvidia-smi).

  • CUDA: Version 12.4 or newer (verify using nvcc --version).
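
The two GPU checks above can be scripted. This sketch assumes the usual nvcc --version output, which contains a phrase of the form "release 12.4"; the helper names cuda_release and cuda_at_least_12_4 are hypothetical. The driver version is only printed, so compare it against 550.120 by eye.

```shell
# Extract the CUDA release (e.g. "12.4") from `nvcc --version` output on stdin.
cuda_release() {
    sed -n -E 's/.*release ([0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

# Returns 0 if CUDA release $1 (major.minor) is at least 12.4.
cuda_at_least_12_4() {
    major=${1%%.*}
    minor=${1#*.}
    [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 4 ]; }
}

# Print the installed driver version, if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=driver_version --format=csv,noheader
fi

# Check the CUDA toolkit release, if nvcc is available.
if command -v nvcc >/dev/null 2>&1; then
    release=$(nvcc --version | cuda_release)
    if [ -n "$release" ] && cuda_at_least_12_4 "$release"; then
        echo "CUDA $release OK"
    else
        echo "CUDA release is older than 12.4 or could not be parsed" >&2
    fi
fi
```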

Database Dependencies

  • PostgreSQL Client: Version 16 or later, required to restore database backups without compatibility issues.

    Warning

    🚨 Important for Ubuntu 22.04 and older 🚨

    PostgreSQL 16 is not available in the default repositories for Ubuntu 22.04 and earlier. If you try to restore a backup with the older pg_restore those repositories provide, you may encounter incompatibility issues.
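
To see which pg_restore is actually installed, a check along these lines may help. It is a sketch that assumes the usual "pg_restore (PostgreSQL) 16.x" version string; the helper name pg_major is hypothetical.

```shell
# Extract the major version from `pg_restore --version` output on stdin,
# e.g. "pg_restore (PostgreSQL) 16.4" -> "16".
pg_major() {
    sed -n -E 's/.*\(PostgreSQL\) ([0-9]+).*/\1/p' | head -n 1
}

if command -v pg_restore >/dev/null 2>&1; then
    major=$(pg_restore --version | pg_major)
    if [ -n "$major" ] && [ "$major" -ge 16 ]; then
        echo "pg_restore major version $major OK"
    else
        echo "pg_restore is older than 16 or its version could not be parsed" >&2
    fi
else
    echo "pg_restore not found on PATH" >&2
fi
```

On Ubuntu 22.04 and older, a newer client is commonly obtained from the PostgreSQL project's own APT repository (PGDG) rather than from the distribution's default repositories.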

Python Environment

  • Poetry: Used for dependency management.

    curl -sSL https://install.python-poetry.org | python3 -
    export PATH="$HOME/.local/bin:$PATH"
    source ~/.bashrc  # or: source ~/.zshrc
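
The export above only affects the current shell, and sourcing the startup file helps only if that file contains the same line. A minimal sketch for persisting it without adding duplicates follows; the function name ensure_poetry_path is hypothetical.

```shell
# Append the PATH export to a shell startup file unless it is already there.
# $1: startup file (e.g. ~/.bashrc or ~/.zshrc)
ensure_poetry_path() {
    line='export PATH="$HOME/.local/bin:$PATH"'
    touch "$1"
    # -x matches whole lines, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$1" || printf '%s\n' "$line" >> "$1"
}

# Usage:
#   ensure_poetry_path ~/.bashrc && . ~/.bashrc
#   poetry --version
```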
    

Cloning the Repository

Clone the repository and navigate into the project directory:

git clone https://github.com/CBBIO/FANTASIA.git
cd FANTASIA

Creating and Activating the Virtual Environment

Use Poetry to manage the virtual environment. Follow these steps:

  1. Ensure Poetry is installed and up to date:

    poetry self update
    
  2. If using Poetry 2.0 or later and you want the poetry shell command, install the shell plugin (poetry shell was moved out of Poetry's core into this plugin in 2.0):

    poetry self add poetry-plugin-shell
    
  3. Create and activate the virtual environment:

    poetry env use <python_version>  # Specify the desired Python version (e.g., 3.12)
    poetry install
    eval "$(poetry env activate)"  # Poetry 2.0+: env activate only prints the activation command, so evaluate it
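
To confirm that the environment is active in the current shell, a check like the following can be used. It is a sketch that relies on the VIRTUAL_ENV variable exported by standard virtual environment activation scripts; the function name in_virtualenv is hypothetical.

```shell
# Returns 0 when a virtual environment is active in the current shell.
# Activation scripts created by venv/virtualenv export VIRTUAL_ENV.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "Active environment: $VIRTUAL_ENV"
else
    echo "No virtual environment is active in this shell" >&2
fi
```

The reported path can be compared with the output of poetry env info --path, which prints the location of the environment Poetry manages for the project.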
    

Note

If using Conda, avoid managing environments with both Poetry and Conda simultaneously to prevent dependency conflicts.

We recommend using PyCharm for development due to its seamless integration with Poetry, making environment management and package handling more intuitive.

Starting Required Services

Start PostgreSQL (with the pgvector extension) and RabbitMQ as Docker containers:

docker run -d --name pgvectorsql \
    -e POSTGRES_USER=usuario \
    -e POSTGRES_PASSWORD=clave \
    -e POSTGRES_DB=BioData \
    -p 5432:5432 \
    pgvector/pgvector:pg16

docker run -d --name rabbitmq \
    -p 15672:15672 \
    -p 5672:5672 \
    rabbitmq:management

You can access the RabbitMQ management interface at: http://localhost:15672 (Default credentials: guest/guest).
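
Containers started with docker run -d may need a few seconds before they accept connections. The sketch below retries a readiness probe until it succeeds; wait_for is a hypothetical helper, while pg_isready and rabbitmq-diagnostics are the client tools shipped with PostgreSQL and RabbitMQ and are assumed to be present in these images.

```shell
# Retry a command until it succeeds, up to $1 attempts, pausing $2 seconds
# between attempts. Returns 1 if every attempt fails.
wait_for() {
    tries=$1
    pause=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$pause"
        fi
    done
    return 1
}

# Usage (container names, user and database as in the commands above):
#   wait_for 30 2 docker exec pgvectorsql pg_isready -U usuario -d BioData
#   wait_for 30 2 docker exec rabbitmq rabbitmq-diagnostics -q ping
```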

Configuration

Before proceeding, create the necessary directories with proper permissions:

mkdir -p ~/fantasia/dumps ~/fantasia/embeddings ~/fantasia/results ~/fantasia/redundancy
chmod -R 755 ~/fantasia

Ensure the following parameters are correctly set in fantasia/config.yaml:

DB_USERNAME: usuario
DB_PASSWORD: clave
DB_HOST: pgvectorsql
DB_PORT: 5432
DB_NAME: BioData

rabbitmq_host: rabbitmq
rabbitmq_user: guest
rabbitmq_password: guest
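
A quick way to catch a missing setting is to look for each key in the file. This sketch only checks that a line starting with each key exists; it does not parse YAML or validate values, and the function name missing_config_keys is hypothetical.

```shell
# Print every required key that is missing from the YAML file given as $1.
# Only checks for a line beginning with "key:"; values are not validated.
missing_config_keys() {
    for key in DB_USERNAME DB_PASSWORD DB_HOST DB_PORT DB_NAME \
               rabbitmq_host rabbitmq_user rabbitmq_password; do
        grep -q "^${key}:" "$1" || echo "$key"
    done
}

# Run the check when the config file is present in the working directory.
if [ -f ./fantasia/config.yaml ]; then
    missing_config_keys ./fantasia/config.yaml
fi
```

If FANTASIA runs directly on the host rather than inside a Docker network shared with the two containers, the names pgvectorsql and rabbitmq may not resolve as hostnames. Because ports 5432 and 5672 are published, localhost may be the value needed in that case; this is an assumption about the setup, not something the guide states.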

Initialization

Download embeddings and initialize the database:

python fantasia/main.py initialize --config ./fantasia/config.yaml

Verify that the embeddings are loaded into:

  • The directory specified in base_directory.

  • The configured PostgreSQL database.
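
Both checks can be done from the shell. The sketch below assumes that base_directory points at ~/fantasia (the directory created in the Configuration step) and that embeddings land in its embeddings subdirectory; substitute the values from your config.yaml. The helper name dir_not_empty is hypothetical.

```shell
# Returns 0 if directory $1 exists and contains at least one entry.
dir_not_empty() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Assumption: base_directory is ~/fantasia; adjust to match config.yaml.
base_dir="$HOME/fantasia"
if dir_not_empty "$base_dir/embeddings"; then
    echo "embeddings directory is populated"
else
    echo "embeddings directory is missing or empty" >&2
fi

# List the tables in the database (no table names are assumed here):
#   docker exec pgvectorsql psql -U usuario -d BioData -c '\dt'
```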

Running the Pipeline

List the available commands and options:

fantasia --help