Deployment Instructions

This section describes the deployment of REEV, both for development and production.

Prerequisites

You will need to fetch some of the data from our S3 server. We recommend the s5cmd tool because it is fast and easy to install and use. You can download it from github.com/peak/s5cmd/releases. For example:

wget -O /tmp/s5cmd_2.1.0_Linux-64bit.tar.gz \
    https://github.com/peak/s5cmd/releases/download/v2.1.0/s5cmd_2.1.0_Linux-64bit.tar.gz
tar -C /tmp -xf /tmp/s5cmd_2.1.0_Linux-64bit.tar.gz
sudo cp /tmp/s5cmd /usr/local/bin/

You will need to install Docker Compose. Note that the “modern” way to do this is to use the docker compose plugin. Instructions can be found on the Docker website.
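Both prerequisites can be checked before continuing. The following is a minimal sketch; check_prereqs is a hypothetical helper name and not part of the REEV repository. It relies only on command -v and on docker compose version, which succeeds only when the Compose plugin is installed.

```shell
# Report whether the tools named in this section are available.
# check_prereqs is a hypothetical helper, not part of the REEV repository.
check_prereqs() {
    local status=0
    if command -v s5cmd >/dev/null 2>&1; then
        echo "s5cmd: found"
    else
        echo "s5cmd: missing"
        status=1
    fi
    # "docker compose version" only succeeds if the Compose plugin is installed.
    if docker compose version >/dev/null 2>&1; then
        echo "docker compose plugin: found"
    else
        echo "docker compose plugin: missing"
        status=1
    fi
    return "$status"
}
```

Calling check_prereqs prints one line per tool and returns a non-zero status if either one is missing.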

Checkout and Configure

First, clone the repository:

git clone git@github.com:bihealth/reev-docker-compose.git

From here on, the commands should be executed from within this repository (cd reev-docker-compose).

We will use the directory .dev within the checkout for storing data and secrets. In a production deployment, these directories should live outside of the checkout, of course.

Now, we create the directories for data storage.

mkdir -p .dev/volumes/pgadmin/data
mkdir -p .dev/volumes/postgres/data
mkdir -p .dev/volumes/rabbitmq/data
mkdir -p .dev/volumes/redis/data
mkdir -p .dev/volumes/reev-static/data
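The same directories can be created in a loop, which makes it easier to point them at a location outside of the checkout. This is a sketch only; DATA_ROOT is an assumed variable name that the repository itself does not define.

```shell
# DATA_ROOT is an assumed variable name; its default matches the commands above.
DATA_ROOT="${DATA_ROOT:-.dev/volumes}"

# Create one data directory per service that persists state.
for service in pgadmin postgres rabbitmq redis reev-static; do
    mkdir -p "${DATA_ROOT}/${service}/data"
done
```

Because mkdir -p is idempotent, the loop can be run again safely after adding a service to the list.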

Next, we set up some “secrets” for the passwords.

mkdir -p .dev/secrets
echo db-password >.dev/secrets/db-password
echo pgadmin-password >.dev/secrets/pgadmin-password
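The literal passwords above are fine for development. Outside of development you may prefer random values. The following sketch assumes that the services read each password from the file as is; SECRETS_DIR and make_secret are hypothetical names introduced for this example.

```shell
# SECRETS_DIR is an assumed variable; its default matches the commands above.
SECRETS_DIR="${SECRETS_DIR:-.dev/secrets}"
mkdir -p "$SECRETS_DIR"

# Write a 32-character hex password to the given file unless it already exists.
make_secret() {
    local file="$1"
    if [ -e "$file" ]; then
        return 0
    fi
    # The subshell keeps the restrictive umask from leaking into the caller.
    (
        umask 077
        head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n' >"$file"
        echo >>"$file"
    )
}

make_secret "${SECRETS_DIR}/db-password"
make_secret "${SECRETS_DIR}/pgadmin-password"
```

Existing files are left untouched, so rerunning the snippet does not change a password that is already in use. With generated values, the pgAdmin password is whatever the file contains rather than the literal string pgadmin-password.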

We now copy the env.tpl file to the default location of the environment file, .env.

cp env.tpl .env

Next, create a docker-compose.override.yml with the contents of the file docker-compose.override.yml-dev. This will disable everything that we assume is running on your host when you are developing. This includes the REEV backend, rabbitmq, celery workers, and postgres.

cp docker-compose.override.yml-dev docker-compose.override.yml
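A plain cp overwrites local changes if you repeat these steps later. As a sketch, a small guard can skip the copy when the destination already exists; copy_if_missing is a hypothetical helper, not part of the repository.

```shell
# Copy a template only when the destination does not exist yet.
# copy_if_missing is a hypothetical helper, not part of the repository.
copy_if_missing() {
    local src="$1" dst="$2"
    if [ -e "$dst" ]; then
        echo "keeping existing ${dst}"
    else
        cp "$src" "$dst"
        echo "created ${dst} from ${src}"
    fi
}
```

For the two files above, this would be called as copy_if_missing env.tpl .env and copy_if_missing docker-compose.override.yml-dev docker-compose.override.yml.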

Download Data

To serve data via the mehari, viguno, and annonars containers, you need to obtain the required datasets. We have prepared significantly reduced datasets (totaling less than 2GB as opposed to hundreds of GB) for development purposes.

We provide a script that sets up the necessary directories, downloads the data, and creates symlinks.

By default, the script verifies SSL certificates when downloading data. If you encounter SSL verification issues or operate in an environment where SSL verification is not required, you can disable SSL verification by setting the NO_VERIFY_SSL variable to 1 when running the script.

To download the data with SSL verification (default behavior):

bash download-data.sh

Note

You can also download the full data with DOWNLOAD=full bash download-data.sh. To use a dataset reduced to exons plus/minus 100bp, use DOWNLOAD=reduced-exomes bash download-data.sh.

To download the data without SSL verification:

NO_VERIFY_SSL=1 bash download-data.sh

Note: Disabling SSL verification can make the connection less secure. Use this option only if you understand the risks and it is necessary for your environment.
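Since the full data is hundreds of GB, it can help to check the free disk space before starting a download. The following is a sketch using POSIX df; free_gib and check_space are hypothetical helpers, and the threshold is left to the caller because this section does not state exact sizes.

```shell
# Print the free space, in whole GiB, of the file system holding a directory.
free_gib() {
    # -P forces the portable one-line-per-file-system format, -k reports KiB.
    df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Fail with a message when fewer than the requested number of GiB are free.
check_space() {
    local dir="$1" needed="$2" free
    free=$(free_gib "$dir")
    if [ "$free" -lt "$needed" ]; then
        echo "only ${free} GiB free in ${dir}, ${needed} GiB wanted" >&2
        return 1
    fi
    echo "${free} GiB free in ${dir}"
}
```

For example, check_space . 2 && bash download-data.sh would start the development download only if at least 2 GiB are free in the current directory.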

Setup Configuration

The next step is to create the configuration files in .dev/config.

mkdir -p .dev/config/nginx
cp utils/nginx/nginx.conf .dev/config/nginx

mkdir -p .dev/config/pgadmin
cp utils/pgadmin/servers.json .dev/config/pgadmin
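Before starting the services, you may want to confirm that the files created so far are in place. The sketch below lists the missing ones; missing_paths is a hypothetical helper, and its path list simply mirrors the commands in this section.

```shell
# List every file created in the steps above that is missing below a checkout.
# missing_paths is a hypothetical helper; the path list mirrors this section.
missing_paths() {
    local root="${1:-.}" path
    for path in \
        .env \
        docker-compose.override.yml \
        .dev/secrets/db-password \
        .dev/secrets/pgadmin-password \
        .dev/config/nginx/nginx.conf \
        .dev/config/pgadmin/servers.json
    do
        [ -e "${root}/${path}" ] || echo "$path"
    done
}
```

Running missing_paths inside the checkout prints nothing when all files exist.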

Startup and Check

Now, you can bring up the docker compose environment (stop with Ctrl+C).

docker compose up
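If you would rather not keep a terminal attached, the services can also run in the background. The wrappers below are a sketch around standard Docker Compose subcommands; the function names are made up for this example.

```shell
# Thin wrappers around standard Docker Compose subcommands.
# The function names are made up for this example.

# Start all services in the background.
reev_start()  { docker compose up --detach; }

# Show the state of the containers.
reev_status() { docker compose ps; }

# Follow the logs, optionally limited to the services given as arguments.
reev_logs()   { docker compose logs --follow --tail 100 "$@"; }

# Stop and remove the containers.
reev_stop()   { docker compose down; }
```

For example, reev_start followed by reev_logs traefik starts everything and then follows only the log of a service named traefik (the service name is an assumption here).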

To verify the results, have a look at the following URLs. These URLs are used by the REEV application.

Note that the development subset only has variants for a few genes, including BRCA1 (the example above).

You will also have the following services, which are useful for introspection during development. For production, you probably don’t want to expose them publicly.

  • flower: login is admin with password flower-password

  • pgAdmin for Postgres DB administration: http://127.0.0.1:3041, login is admin@example.com with password pgadmin-password
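Services may take a moment to answer after startup. As a sketch, the following helper polls a URL with curl until it responds; wait_for_url is a hypothetical name, and the defaults for attempts and delay are arbitrary.

```shell
# Poll a URL until it answers or the attempts run out.
# wait_for_url is a hypothetical helper.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
wait_for_url() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
    while [ "$i" -le "$attempts" ]; do
        # --fail turns HTTP errors (status 400 and above) into a non-zero exit.
        if curl --fail --silent --output /dev/null --max-time 5 "$url"; then
            echo "up: ${url}"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "no answer from ${url}" >&2
    return 1
}
```

For the pgAdmin address listed above, the call would be wait_for_url http://127.0.0.1:3041.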

Service Information

This section describes the services that are started with this Docker Compose.

Traefik

Traefik is a reverse proxy that serves as the main entry point for all services behind HTTP(S). The software is well-documented by its creators. However, it is central to the setup, and much of the additional setup requires touching the Traefik configuration. We thus summarize some important points here.

  • Almost all configuration is done using labels on the traefik container itself or other containers.

  • In the case of using configuration files, you will have to mount them from the host into the container.

  • By default, we use “catch-all” configuration based on regular expressions on the host/domain name.
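To see which labels are in effect, you can inspect the merged Compose configuration. The sketch below assumes that the labels use Traefik's usual traefik. key prefix; show_traefik_labels is a hypothetical helper name.

```shell
# Print the lines of the merged Compose configuration that mention Traefik keys.
# "docker compose config" renders the final configuration after applying
# docker-compose.override.yml and the variables from .env.
show_traefik_labels() {
    docker compose config | grep -n 'traefik\.'
}
```

The line numbers in the output refer to the rendered configuration, not to the files in the checkout.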

Dotty

Dotty (by the REEV authors) provides mapping from c./n./g. notation to SPDI.

Mehari

Mehari (by the REEV authors) provides information about variants and their effect on individual transcripts.

Viguno

Viguno (by the REEV authors) provides HPO/OMIM related information.

Annonars

Annonars (by the REEV authors) provides variant annotation from public databases.

Postgres

We use postgres for the database backend of REEV.

Rabbitmq

We use rabbitmq for message queues.

Redis

Redis is used for storing authentication sessions.

PgAdmin

PgAdmin is a web-based administration tool for Postgres. We provide it for development and debugging but it can also come in handy in production.

Flower

Flower is a web-based application for monitoring and administering Celery.