
Specific Application How-Tos

Here is a collection of instructions for running some applications of interest on the cluster.

Tmux for persistent shell sessions

Tmux is a “terminal multiplexer” which allows multiple terminal sessions to be accessed simultaneously in a single window. Processes can be detached from their controlling terminals, allowing remote sessions to remain active without being visible. This is particularly useful to protect programs running on a remote server (e.g. a sciCORE login node) from connection drops (e.g. when going through a tunnel on the train 😄).

This software is already installed on all sciCORE login nodes. We recommend checking the official manual, and provide here a quick cheat sheet of common operations in Tmux:

tmux new -s <name>  # create new session

tmux ls # list active tmux sessions

tmux a -t <name>  # attach to the target session

ctrl-b  # prefix key, pressed before every command below

ctrl-b c # create new tab in session (asterisk indicates the active tab)

ctrl-b n # next tab
ctrl-b p # previous tab

ctrl-b [ # scroll/copy mode, indicated by a position counter like [0/199]; press q to quit

ctrl-b &  # kills the currently active tab

ctrl-b d # detach and leave tabs running
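
For example, a typical workflow for keeping a long-running program alive across disconnects might look like this (the session name my_session is just a placeholder):

tmux new -s my_session   # start a named session on the login node
# ... start your long-running program inside the session ...
# press ctrl-b d to detach; the program keeps running on the server

tmux ls                  # later, list active sessions
tmux a -t my_session     # reattach and pick up where you left off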

AlphaFold

AlphaFold is a deep-learning-based protein structure prediction program (and accompanying database) developed by DeepMind.

You can run the inference pipeline of AlphaFold on the sciCORE cluster. To explore the AlphaFold versions available, run:

ml spider AlphaFold

Once you choose a version and load it with ml AlphaFold/<version>, two scripts will be available to you: run_alphafold.sh and run_alphafold_multimer.sh. For instance:

$ ml AlphaFold/2.3.2
$ which run_alphafold.sh
/scicore/soft/easybuild/apps/AlphaFold/2.3.2/bin/run_alphafold.sh

Under the hood, these scripts run AlphaFold as if it were run via a Docker container. See the official documentation for information on the parameters you can pass to the script.

use_gpu_relax error

For AlphaFold > 2.1.1 you need to specify if you want the Relax step to run on CPU or GPU.

Running on GPU is faster, but this choice is not hardcoded in the scripts, so you need to specify it when running AlphaFold by providing either --use_gpu_relax=true for GPU or --use_gpu_relax=false for CPU.

The monomer preset is the original model used at CASP14, with no ensembling. Here’s an example SLURM script to run AlphaFold with the monomer model:

#!/bin/bash
#SBATCH --job-name=AlphaFold-Monomer-Example
#SBATCH --time=01:00:00
#SBATCH --mem=64G
#SBATCH --cpus-per-task=8
#SBATCH --partition=a100
#SBATCH --gres=gpu:1

module load AlphaFold/2.3.2

run_alphafold.sh \
    --fasta_paths /scicore/home/group/user/path/to/fasta/fasta.fa \
    --output_dir /scicore/home/group/user/path/to/output/ \
    --max_template_date 2021-09-01 \
    --use_gpu_relax=true
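
Assuming you saved the script above as, say, alphafold_monomer.slurm (the filename is arbitrary), you would submit it to the queue with:

sbatch alphafold_monomer.slurm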

Tip

db_preset can be changed by updating the environment variable DB_PRESET (defaults to full_dbs)

Example:

export DB_PRESET="reduced_dbs"
run_alphafold.sh <remaining parameters>

Similarly, model_preset can be changed by updating the environment variable MODEL_PRESET_MONOMER (defaults to monomer)
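
Example (monomer_ptm is the pTM variant of the monomer model; check that it is supported by the AlphaFold version you loaded):

export MODEL_PRESET_MONOMER="monomer_ptm"
run_alphafold.sh <remaining parameters>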

The multimer preset uses AlphaFold’s Multimer model. Here’s an example SLURM script to run AlphaFold with the multimer model:

#!/bin/bash
#SBATCH --job-name=AlphaFold-Multimer-Example
#SBATCH --time=01:00:00
#SBATCH --mem=64G
#SBATCH --cpus-per-task=8
#SBATCH --partition=a100
#SBATCH --gres=gpu:1

module load AlphaFold/2.2.0

run_alphafold_multimer.sh \
    --fasta_paths /scicore/home/group/user/path/to/fasta/fasta.fa \
    --output_dir /scicore/home/group/user/path/to/output/ \
    --max_template_date 2021-09-01 \
    --use_gpu_relax=true

Info

For multimers, model_preset currently only works with “multimer”

Loading and using LLMs on the cluster

Compute nodes generally do not have access to the Internet, so you may have run into errors when following an LLM tutorial online because the model files could not be fetched at runtime.

The simple solution is to download the model files you need onto the cluster beforehand, and then point to these local files when running your code from a compute node.

For example, if you have huggingface-cli installed in your (virtual) environment and want to download the Qwen/Qwen3-4B model, you can run the following from a transfer node:

MODEL=Qwen/Qwen3-4B
OUTDIR=path/to/your/local/Qwen3-4B

mkdir -p ${OUTDIR}

huggingface-cli download ${MODEL} config.json model*safetensors* tokenizer* \
    --local-dir ${OUTDIR} \
    --local-dir-use-symlinks False

Info

Adapt this command to the virtual environment manager you are using. For example, if using uv, prepend huggingface-cli with uv run.

This will download the model files to the specified OUTDIR.

Then, when running your code from a compute node, you can point to the local files like this:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/your/local/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
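
If you want to be extra sure that transformers never attempts a network call from the compute node, you can additionally set Hugging Face’s offline environment variable before launching your job (optional; pointing to local paths as above is usually enough):

export HF_HUB_OFFLINE=1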

Note

This strategy applies in general for any framework that tries to download files at runtime. Make sure to check the framework’s documentation for instructions on how to pre-fetch files to the local machine.

SRA Toolkit

The Sequence Read Archive (SRA) Toolkit is a collection of tools and libraries for using data in the International Nucleotide Sequence Database Collaboration (INSDC).

The SRA Toolkit is available via the cluster’s module system. To explore the available versions, run:

ml spider SRA-Toolkit

Once you load a version with ml SRA-Toolkit/<version>, commands like fastq-dump and fasterq-dump will be available to you. Check the official documentation for more information on commands.
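
For example, a minimal sketch of fetching the reads for an accession could look like the following (SRR000001 is just a placeholder; replace it with your accession of interest, and run the download from a node with Internet access):

ml SRA-Toolkit/<version>
fasterq-dump SRR000001 --outdir ./fastq --threads 4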

Running PostgreSQL on the cluster

PostgreSQL is a popular object-relational database system, but running it on the cluster requires some configuration due to the lack of root privileges. Below are two approaches that run PostgreSQL in an isolated environment within your user space.

Note

This guide only shows how to run PostgreSQL on the cluster. For information on commands and workflows, please refer to the official documentation or to PostgreSQL tutorials.

Via Apptainer

The Apptainer container system is available by default to all users:

$ which apptainer
/usr/bin/apptainer

Info

See this page for more information on running containers on the cluster.

To run PostgreSQL with Apptainer, you must first pull a PostgreSQL image from a container registry. For example, you can use the official PostgreSQL image from Docker Hub:

apptainer pull docker://postgres:latest

Tip

If you need a specific version of PostgreSQL, you can specify the version tag in the image name, e.g., docker://postgres:17.5.

This should create a file named postgres_latest.sif in your current directory. This is the image that the PostgreSQL commands will run from.

Now, create folders to store your PostgreSQL databases and temporary files, and export their paths so they can be referenced when starting the container:

export PG_DBS_PATH=$HOME/pg_data
export PG_TMP_PATH=$HOME/pg_tmp

mkdir -p $PG_DBS_PATH
mkdir -p $PG_TMP_PATH

Pointing to the image you pulled, you can now start a PostgreSQL instance on the login node:

$ apptainer instance start \
    -B $PG_DBS_PATH:/var/lib/postgresql/data \
    -B $PG_TMP_PATH:/var/run/postgresql \
    postgres_latest.sif \
    postgres
INFO:    instance started successfully

Note

Remember to change the image path (postgres_latest.sif) if you used a different version or save path.

You can verify that the instance is running with:

apptainer instance list

Now, connect to the PostgreSQL server from inside the container:

$ apptainer shell instance://postgres
Apptainer>

Inside the container, all PostgreSQL commands are available to you. For example, you can initialize a new database with:

initdb "./my_db"
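
From there, a minimal, illustrative sketch of starting the server on that data directory and connecting to it could look like this (assuming the defaults of the official image, where the socket directory is /var/run/postgresql, i.e. the bound $PG_TMP_PATH):

# start the server, writing logs to a file in the current directory
pg_ctl -D ./my_db -l pg.log start

# connect through the socket directory bound earlier
psql -h /var/run/postgresql -d postgres

# stop the server when finished working
pg_ctl -D ./my_db stop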

When you’re done working, you should stop the PostgreSQL instance with:

apptainer instance stop postgres

Info

If you stop the postgres container but do not delete the data in $PG_DBS_PATH, your databases are preserved, and you can reuse them in the future by starting the container again.

Via pixi

PostgreSQL is available on conda-forge, so if you have the pixi package manager in your user space, you can install PostgreSQL as a global tool with the following command:

pixi global install postgresql

This exposes commands like createdb, initdb, pg_ctl, psql, … to the user’s PATH:

$ which psql
/scicore/home/pi/user/.pixi/bin/psql

You can then use PostgreSQL commands as you would normally do if you had root access:

# Initialize database at the `my_db` directory
initdb "./my_db"
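
A minimal sketch of a first session after that could look like the following (the database name is just an example):

# start the server on that data directory, logging to a local file
pg_ctl -D ./my_db -l pg.log start

# create a database and connect to it
createdb my_first_db
psql my_first_db

# stop the server when done
pg_ctl -D ./my_db stop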

Tip

This strategy of installing global tools via pixi is generalizable to any software available on conda-forge. The tool will be installed in its own isolated “sandbox” and will not interfere with other tools or packages in your user space. See pixi’s Global Tools page for more information.

Similarly, for software available on PyPI you can use the ‘tool’ concept of the uv package manager.
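
For example (ruff is just an arbitrary tool from PyPI used for illustration):

uv tool install ruff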