Storage & Data Management

Overview of HPC Cluster Data and Storage Management at sciCORE

Data and storage management in an HPC environment like sciCORE is a critical component of high-performance computing workflows. It involves:

  • Efficient handling of large datasets (typically ≥100 GB or thousands of files) to ensure optimal performance and scalability.
  • Choosing the appropriate storage strategy based on the nature of the computation—whether it involves active processing, long-term retention, or group collaboration.
  • Ensuring data safety and accessibility, including routine backups, access controls, and adherence to university IT security policies.
  • Supporting data management plans (DMPs) to facilitate structured and reproducible research, including secure storage, backup policies, and long-term preservation.
  • Providing consultation to help researchers avoid performance bottlenecks and adopt best practices for handling big data.

Proper data management on sciCORE allows researchers to dramatically speed up computational tasks and reduce resource waste, particularly when handling large-scale or sensitive datasets.

Types of storage at the sciCORE HPC clusters

sciCORE provides two main user-accessible storage types:

1. Home Storage /scicore/home

  • Use case: Persistent storage for scripts, software, and datasets.
  • Backup policy: Automatically backed up daily. However, directories containing nobackup in the name are excluded and considered volatile.
  • Group collaboration: Most research groups have a shared GROUP subdirectory for datasets and tools used by all members (/scicore/home/<groupname>/GROUP).

2. Local Scratch Storage

  • Use case: Temporary storage for high-throughput computation (e.g., intermediate files, large simulations).
  • Performance: Faster than home storage, with no backup—data is ephemeral and may be purged.
  • Best practice: Use for performance-critical tasks where I/O speed is essential.
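The scratch-then-copy-back pattern described above can be sketched as a minimal Slurm batch script. This is an illustrative assumption, not sciCORE's documented template: the `$TMPDIR` variable is assumed to point at node-local scratch (check the sciCORE Slurm documentation for the exact location), and `my_analysis` is a placeholder command.

```shell
#!/bin/bash
#SBATCH --job-name=scratch_demo
#SBATCH --time=01:00:00
# Hypothetical sketch: assumes $TMPDIR resolves to node-local scratch,
# as is common on Slurm clusters.

# 1. Copy input data from backed-up home storage to fast local scratch
cp "$HOME/data/input.dat" "$TMPDIR/"

# 2. Run the I/O-heavy computation against scratch
cd "$TMPDIR"
my_analysis input.dat > results.out   # 'my_analysis' is a placeholder

# 3. Copy only the results back to home; scratch may be purged at job end
cp results.out "$HOME/data/"
```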

Data Transfer

For security reasons, FTP access is disabled on sciCORE. Use secure methods such as:

  • ssh
  • scp
  • sftp
  • rsync

Basic Transfer Commands

# Using scp
scp <options> source destination

# From cluster to local
scp username@transfer12.scicore.unibas.ch:/scicore/home/group/username/file.txt /mnt/c/Users/username/Documents/

# From local to cluster
scp /mnt/c/Users/file.txt username@transfer12.scicore.unibas.ch:/scicore/home/group/username/

rsync Examples

# Using rsync to copy/sync a file
rsync <options> source destination

# Copy a file to sciCORE
rsync /mnt/c/Users/username/Documents/backup.tar username@transfer12.scicore.unibas.ch:/scicore/home/group/username/

# Copy a file from sciCORE
rsync username@transfer12.scicore.unibas.ch:/scicore/home/group/username/backup.tar /mnt/c/Users/username/Documents/

Graphical User Interface (GUI) Tools to Access the Cluster on Windows

  • MobaXterm: A versatile SSH client for Windows that includes an integrated SFTP browser, allowing drag-and-drop file transfers between your local machine and the cluster. Ideal for users who want a terminal and file transfer interface in a single application.
  • WinSCP: A dedicated, lightweight client for secure file transfer (SFTP/SCP) between Windows and the cluster. It offers a familiar file explorer-style interface and is a reliable alternative for users focused solely on data transfer.

Synchronize with SwitchDrive

Info

SwitchDrive (drive.switch.ch) is a cloud storage service offering 25 GB of free storage to members of the Swiss academic community. Based on the OwnCloud platform, it allows easy synchronization of files between local systems and the cloud.

Files stored on SwitchDrive can be synced to HPC environments using the owncloudcmd command-line client, making it a useful tool for transferring or backing up research data.

Examples

# Sync entire drive
owncloudcmd local_dir https://drive.switch.ch/

# Sync specific subdirectory
owncloudcmd local_dir/local_sub_dir https://drive.switch.ch/remote.php/webdav/remotedir

Tip

Avoid using --user and --password. Use a .netrc file for automation.
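A minimal sketch of that setup, assuming the owncloudcmd `-n` flag (read credentials from `~/.netrc`); the login and password shown are placeholders:

```shell
# Append a ~/.netrc entry for SwitchDrive (placeholder credentials)
cat >> "$HOME/.netrc" <<'EOF'
machine drive.switch.ch
login myuser
password mypassword
EOF

# .netrc must be private, or clients will refuse to use it
chmod 600 "$HOME/.netrc"

# On the cluster, owncloudcmd can then authenticate non-interactively:
# owncloudcmd -n local_dir https://drive.switch.ch/
```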


Importing External Data (Nextcloud)

sciCORE provides a Nextcloud service for securely receiving data from external collaborators who do not have sciCORE access. Data can be imported efficiently using the rclone tool for command-line synchronization.

Info

Default Nextcloud quota: 200 GB per user. Contact sciCORE if more is needed.

Steps

1. Create & share upload folder

  • Log in at io.scicore.unibas.ch
  • Create folder
  • Share it by email (select “email” option, enable “Can edit”, “File drop”, expiration date)

2. External user uploads

  • Via browser or rclone
  • rclone WebDAV configuration instructions: see step 3 below

3. Configure rclone

rclone config
# Follow interactive prompts:
# type = webdav
# url = https://io.scicore.unibas.ch
# vendor = nextcloud
# user = <sciCORE username>

4. Download from Nextcloud

# List contents
rclone ls nextcloud:remote.php/webdav/upload_folder/

# Check size
rclone size nextcloud:remote.php/webdav/upload_folder

# Download single file
rclone -v copy nextcloud:remote.php/webdav/upload_folder/shared_file ./

# Download entire folder
rclone -v copy nextcloud:remote.php/webdav/upload_folder ./upload_folder

Tip

Avoid retyping passwords:

read -sp "Password:" RCLONE_CONFIG_PASS && echo && export RCLONE_CONFIG_PASS


Share Data from sciCORE to Externals (Nextcloud)

Use rclone to push data from the cluster to Nextcloud.

Upload Examples

# Create remote folder
rclone mkdir nextcloud:remote.php/webdav/folder_to_share

# Upload file
rclone -v copy ./file_to_share nextcloud:remote.php/webdav/folder_to_share/

# Upload directory
rclone -v copy ./subdir_to_share nextcloud:remote.php/webdav/folder_to_share/subdir_to_share

# Sync local to remote
rclone -v sync ./subdir_to_share nextcloud:remote.php/webdav/folder_to_share/subdir_to_share

Enable Sharing via Web Interface

  1. Go to io.scicore.unibas.ch
  2. Click the share icon on the folder
  3. Add collaborators’ emails (select “email” option)
  4. Optional: Set expiration date, disable share

External User rclone Configuration

For links like https://io.scicore.unibas.ch/s/xyz123, external users should configure:

[nextcloud_shared]
type = webdav
url = https://io.scicore.unibas.ch
vendor = nextcloud
user = xyz123
pass = *********

Then use:

rclone ls nextcloud_shared:public.php/webdav
rclone copyto ./file nextcloud_shared:public.php/webdav/

Alternatives to Nextcloud for Sharing

Nextcloud limitations

  • Only supports password-protected folders for sharing
  • Shared links are time-limited, not permanent

For public, persistent sharing, consider:

  • Zenodo - An open-access repository hosted by CERN for sharing research data, publications, and software with DOI support.
  • DRYAD - A curated resource for publishing and sharing research datasets, particularly in the life sciences.
  • re3data.org - A global registry of research data repositories across disciplines, supporting FAIR data principles.
  • SwitchDrive - A cloud storage service offering 25 GB of free, persistent storage to Swiss academic users.
  • FileSender - A secure platform for one-time, large file transfers to collaborators, including those outside your institution.

Tip

Review legal implications when using commercial services like Dropbox, especially with sensitive data.

Deep Storage

Deep storage is an offline solution for storing large volumes of raw data that are not needed on a regular basis. It is more economical than keeping the data in your sciCORE home directory or on a sciCORE share (see the sciCORE user fees).

Deep storage is not connected to the sciCORE cluster compute infrastructure. Data is archived in the University IT-Services tape library. Access latency is significant — typically several weeks.

Info

Two redundant copies are stored on different tapes to ensure data safety.

When to Use Deep Storage

Example: A graduated PhD student leaves the lab, and their raw data must be archived for future reference.

How It Works

  1. Submit a request via the sciCORE user space.
  2. Admin review and processing:
    • The directory is renamed and user permissions are revoked.
    • After successful transfer to tape, the directory is deleted.
    • Metadata is archived to /scicore/home/<PI>/GROUP/deepstorage.

Ownership of the data is transferred to the PI of the research group.

Eligibility Criteria

  • Data must reside in a single parent directory (internal structure is flexible).
  • Size must be between 500 GB and 10 TB.
  • Retrieval should be infrequent; deep storage is not meant for active access.

Retrieving Data

Warning

Deep storage data can be retrieved but this is a slow process (minimum of 2 weeks) and should be treated as an exception, not a regular operation.

Deep Storage vs. Backup

| Feature      | Deep Storage           | Backup                       |
|--------------|------------------------|------------------------------|
| Purpose      | Long-term archiving    | Short-term recovery          |
| Snapshots    | One-time snapshot      | Rolling snapshots over time  |
| Access       | Very slow (weeks)      | Fast (for recent files)      |
| Source data  | Deleted after storage  | Remains in original location |

How to Request Deep Storage

  1. Login to your user space.
  2. Click “Send a request” under the Deep Storage section.
  3. Fill out the form:
    • Select the storage system.
    • Provide the absolute path to the target directory (e.g. /scicore/home/<group>/<user>/<directory>).
    • Complete all required fields (bold in the form).
    • Optionally, add comments for the sciCORE admins.
  4. Submit the form.

A confirmation email will summarize your request. Processing time may vary depending on system load. sciCORE will validate the path before processing.

After Submission

Once processed, the original directory will be renamed to <directory>.toberemoved and deleted after the transfer to tape has been verified.

The metadata directory (/scicore/home/<PI>/GROUP/deepstorage) will contain 5 metadata files per archive:

  • *.json - Metadata submitted by the user
  • *.manifest - Full file listing with permissions and sizes
  • *.md5sum - File list with MD5 checksums
  • *.size - Total compressed archive size (bytes)
  • *.checksum - Checksum of the .tar.bz2 archive stored on tape
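After a retrieval, the *.md5sum listing can be verified with standard tools. A self-contained sketch of the check (the file names here are illustrative, not the actual deep-storage naming):

```shell
# Simulate a restored dataset plus the checksum list produced at archive time
mkdir -p restored && cd restored
echo "raw measurements" > sample.dat
md5sum sample.dat > archive.md5sum

# Verify every listed file against its recorded checksum;
# 'md5sum -c' exits non-zero if any file is missing or altered
md5sum -c archive.md5sum
```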

PI View

PIs can view total deep storage usage in the Cluster Usage section of the user space.

Deep Storage Moves from PUMA SMB Shares

PUMA is not attached to the cluster, so paths cannot be autocompleted. Instead, provide:

smb_share_name/path_to_data

Data Encryption at sciCORE

Encryption converts readable data into a ciphered format that only authorized parties can decode. It is essential when:

  • Exchanging sensitive data
  • Storing sensitive data long-term

Warning

Some providers (e.g. NIH dbGaP) may mandate deletion or encryption of data after research concludes.


Symmetric vs Asymmetric Encryption

When handling sensitive data, especially in research, encryption ensures confidentiality and controlled access. Two main approaches exist—symmetric and asymmetric encryption—each suited to different scenarios depending on the use case, security requirements, and collaboration needs.

| Feature      | Symmetric Encryption                                                     | Asymmetric Encryption                                                                  |
|--------------|--------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| Key type     | One shared secret key                                                    | Public/private key pair                                                                |
| Use case     | Long-term storage                                                        | Secure data exchange                                                                   |
| Common tool  | 7zip                                                                     | GPG                                                                                    |
| Pros         | Fast, simple to implement                                                | No need to share the private key; good for open sharing with identity verification     |
| Cons         | Key must be shared securely                                              | Slower and more complex to set up and manage                                           |
| When to use  | Encrypting archived project files or backups for yourself or internal use | Exchanging sensitive data with external collaborators or submitting to secure repositories |

7zip (Symmetric Encryption)

Best for long-term storage of sensitive data with no ongoing need to share passwords.

Example Use Cases
  • Retaining data for reproducibility (e.g., for publication, patents)
  • Archiving research data where deletion is not feasible

Warning

  • Do not use 7zip to exchange sensitive data
  • Never share your encryption password

GPG (Asymmetric Encryption)

Best for data exchange, especially when subject to data protection laws.

  • Secure transfer with public/private keys
  • Can also be used for long-term storage

Both 7zip and GPG support AES-256, an industry-standard cipher.
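As an illustration of the two modes, here is a hedged sketch using gpg; the file names and passphrase are placeholders, and the 7z and public-key commands are shown commented since they depend on installed tools and available keys:

```shell
echo "sensitive results" > data.txt

# --- Symmetric: one shared passphrase, AES-256 ---
gpg --batch --yes --pinentry-mode loopback --passphrase 'example-passphrase' \
    --symmetric --cipher-algo AES256 -o data.txt.gpg data.txt

# Decrypting requires the same passphrase
gpg --batch --yes --pinentry-mode loopback --passphrase 'example-passphrase' \
    -o data_restored.txt --decrypt data.txt.gpg

# The 7zip equivalent (symmetric AES-256, encrypted file names):
# 7z a -mhe=on -p archive.7z data.txt

# --- Asymmetric: encrypt to a collaborator's public key ---
# gpg --encrypt --recipient collaborator@example.org data.txt
```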


Encrypting Large Datasets

Encrypting large data requires significant time and memory. Avoid running these jobs on login nodes.

Use SLURM Interactive Sessions

Launch an interactive job:

srun --pty bash -i

Once inside a compute node:

# Recommended way to ensure the job continues even if terminal closes
nohup <encryption command> &

Warning

You will need to manually enter your encryption passphrase inside the session.


For assistance with encryption or large data handling, contact: scicore-admin@unibas.ch

Storage Quotas

Online storage is a shared and limited resource on sciCORE. Disk quotas and automatic notifications are used to promote sustainable usage and prevent critical failures due to unexpected data growth.

What Are Quotas?

Quotas help manage disk usage in two dimensions:

  • Block quota: Limits the total size of data on disk.
  • File quota: Limits the number of files and directories stored.

Each quota is configured with:

  • Soft quota: A warning threshold. Users can exceed it temporarily (within a grace period of 15 days), but must take action to reduce usage. Notification is sent.
  • Hard quota: A strict limit. Once reached, write operations (e.g. saving files) are no longer possible.

Quotas in sciCORE

Quotas are enforced in both the sciCORE compute cluster and PUMA storage environment.

  • Quotas apply per volume (cluster) or per share (PUMA), not per user.
  • Only block quotas (soft and hard) are active by default.
  • Default quota: 1 TB soft / 1.25 TB hard, unless specified otherwise.
  • Notifications are sent:
    • To users (cluster volumes)
    • To share owners (PUMA shares)

How to Check Your Quota and Storage Usage

On sciCORE Cluster

Check your group’s home volume
df -h $HOME

Example output:

Filesystem                                                Size  Used Avail Use% Mounted on
toucan-nfsi.cluster.bc2.ch:/export/sci_fs01/home/mygroup   12T    8T    4T  67% /scicore/home/mygroup

Info

Size reflects the hard quota, not the soft limit.

Check a specific project volume
df -h /scicore/projects/myproject

Example output:

Filesystem                                                      Size  Used Avail Use% Mounted on
toucan-nfsi.cluster.bc2.ch:/export/sci_fs02/projects/myproject  100T   80T   20T  80% /scicore/projects/myproject

On PUMA (Remote SMB/NFS Shares)

Windows
  1. Right-click on the share in File Explorer → Properties.
  2. View “Used space” and “Capacity”.
macOS - Graphical
  1. In Finder, go to Go > Computer (or press ⇧⌘C).
  2. Right-click on the share → Get Info.
  3. View “Used” and “Capacity”.
macOS - Terminal
df -h /Volumes/mySMBshare$

Example:

Filesystem                                                 Size   Used  Avail Capacity
//user@scicore-puma.../mySMBshare%24                       45Ti   57Gi   45Ti     1%
Linux (SMB)
df -h /mnt/mySMBshare

Example:

Filesystem                                      Size   Used Avail Use% Mounted on
//scicore-puma.../mySMBshare$                  2.0T   200G  1.8T  10% /mnt/mySMBshare
Linux (NFS)
df -h /mnt/myNFSshare

Example:

Filesystem                                                 Size  Used Avail Use% Mounted on
scicore-puma...:/gpfs/nasfs02/myNFSshare                  2.0T  200G  1.8T  10% /mnt/myNFSshare

Info

Size reflects the hard quota, not the soft limit.


Visual Usage Overview

In the sciCORE user space, authenticated users can monitor storage consumption across:

  • Compute cluster volumes
  • PUMA shares

By default:

  • Users see their own usage.
  • PIs see total group usage.
  • Deputies (designated by the PI) have PI-level visibility.

Tip

Want to assign a deputy? Contact scicore-admin@unibas.ch.

Usage information may take up to 24 hours to update.


What to Do If You Hit Your Quota?

Soft quota reached?

The grace period (15 days) starts. During this time, usage must be reduced to avoid enforcement of the limit.

Strategies to reduce usage

  • Delete unnecessary or temporary data
  • Move old data to Deep Storage
  • Compress directories using .tar.gz, .zip, or cjarchiver (available on the cluster)

If you’re unable to resolve the issue, or need more space, contact: scicore-admin@unibas.ch

Data Cleaning Guidelines

Regular data cleaning is essential to optimize your storage use, reduce group costs, and maintain system sustainability at sciCORE.

Why Clean?

  • Free up precious and limited online storage.
  • Keep storage costs low.
  • Improve your workflow efficiency.

If cleaning is not feasible, consider:

  • Archiving “dormant” data to Deep Storage.
  • Compressing files (.tar.gz, .zip, cjarchiver)
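The compression option can be as simple as tar. A minimal sketch with illustrative directory and file names:

```shell
# Bundle a directory of many small files into one compressed archive
mkdir -p old_results && echo "run output" > old_results/run1.log
tar -czf old_results.tar.gz old_results/

# List the archive contents to verify before removing the originals
tar -tzf old_results.tar.gz

# Only after verification:
# rm -r old_results/
```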

Where to Clean From

Transfer Login Nodes

Use the dedicated transfer login nodes for heavy I/O operations:

ssh -Y login-transfer.scicore.unibas.ch

Warning

Do not use login12.scicore.unibas.ch. Long-running I/O jobs will be terminated.

Compute Node (via Slurm)

For very large datasets (>1 TB or >1 million files), open an interactive shell on a compute node:

srun --pty bash -i

Note

Defaults: --time=06:00:00, --qos=6hours. See the Slurm guide for more.


What to Clean

Use standard Linux tools: find, du, ncdu.

Tip

Only clean data you own or have permission to access. Contact scicore-admin@unibas.ch if you need access to data owned by inactive users.

Here are some examples:

Find old files (last accessed > 1 year)
find $HOME -atime +365
Find large files (> 1 GB)
find $HOME -size +1G
Find files by user or group
find /scicore/home/mygroup/GROUP/ -user username
find /scicore/projects/myproject/ -group groupname
Find files by name
find $HOME -iname '*myoldproject*'
Combine filters
find /scicore/home/mygroup/GROUP/ -user username -size +1G -atime +365
Directory size (single)
du -sh ./myfolder
Directory size (all in current dir)
du -sh *
Browse disk usage with ncdu
ncdu
Save and reload ncdu results
ncdu -1xo- | gzip > export.gz
zcat export.gz | ncdu -f-

How to Clean

Warning

Deleted files are non-recoverable. Double-check before deleting!

Avoid Triggering Unnecessary Backups

Name transient directories *_NOBACKUP to avoid triggering full backups.
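For example (directory name illustrative):

```shell
# Rename a directory of regenerable intermediates so the backup system,
# which excludes names containing 'nobackup', skips it
mkdir -p intermediate_files
mv intermediate_files intermediate_files_NOBACKUP
```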


Deleting Files

Delete single file
rm -i ./myfile   # Prompt confirmation
rm -f ./myfile   # Force delete
Delete all .iso files in a directory
rm -i ./mydir/*.iso
rm -f ./mydir/*.iso
Delete folder and all contents
rm -rf ./mydir        # Force delete recursively
rm -rfv ./mydir       # Verbose mode
rm -ri ./mydir        # Interactive (safer)

Advanced: Selective Deletion

Delete large, old files in a folder
find ./mydir -size +1G -atime +365 -exec rm -f {} \;
find ./mydir -size +1G -atime +365 -print0 | xargs -0 rm   # null-delimited, safe for names with spaces

Note

Need help with cleaning? Contact scicore-admin@unibas.ch