Cluster quick start
This section is a quick start for users who are somewhat familiar with Linux/Unix and want to start using the University of Tartu UTHPC clusters quickly.
If you're not familiar with UTHPC and Linux in general, many beginner tutorials are available online. The UTHPC team is also actively working on better guides.
Request an account
To open an account with UTHPC, please fill out the form here so that the UTHPC team receives all the information it needs to create the account quickly.
Alternatively, you can email your request to email@example.com. If you already have a UT account, please include your username in the email. If you are a student, you must also CC your supervisor.
To request access to Galaxy Tartu Ülikool, please fill out the form here to ensure a quick response. Alternatively, you can send an email to firstname.lastname@example.org, but please also include your UT account username.
For more information about Galaxy Tartu Ülikool, see our galaxy.hpc.ut.ee docs.
You can access the UTHPC cluster from anywhere, but for security reasons please do one of the following:
- Be physically in a university building.
- Connect from a remote location using the UT VPN.
To connect from a Unix-like system such as Linux, macOS or WSL, use the Secure Shell (SSH) protocol to log in to rocket.hpc.ut.ee with your UT credentials:
ssh <username>@rocket.hpc.ut.ee
To connect from a Windows system, please follow the guide for PuTTY or use the Windows Subsystem for Linux (WSL). WSL is highly recommended.
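If you log in often, an SSH host alias saves typing the full host name. The snippet below is a minimal sketch, assuming you run it on your local machine (Linux, macOS or WSL); the alias name `rocket` is an arbitrary choice and `your_ut_username` is a placeholder you must replace with your own UT username.

```shell
# Sketch: add a short "rocket" alias for the UTHPC login node to your local
# SSH configuration. Assumptions: the alias name "rocket" is arbitrary, and
# "your_ut_username" is a placeholder for your own UT username.
ssh_dir="$HOME/.ssh"
ssh_config="$ssh_dir/config"

# SSH refuses to use configuration with overly permissive access rights.
mkdir -p "$ssh_dir"
chmod 700 "$ssh_dir"

# Append the block only once, so running the snippet again changes nothing.
if ! grep -qsx 'Host rocket' "$ssh_config"; then
  cat >> "$ssh_config" <<'EOF'
Host rocket
    HostName rocket.hpc.ut.ee
    User your_ut_username
EOF
fi
chmod 600 "$ssh_config"
```

After replacing the placeholder, `ssh rocket` is equivalent to `ssh your_ut_username@rocket.hpc.ut.ee`, and the alias also works with scp, for example `scp file rocket:/path/to/target_dir/`.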
Your home directory
The home directory, which makes all files and directories available on all cluster nodes, resides on a shared file system called
Quotas limit disk space consumption. There are two types of quotas: directory size and file count. By default, a user has 2 TB of $HOME space and a maximum of 1 million files.
Please keep your home directory tidy by regularly removing old data.
To see your quota, use the myquota command, which shows your limits and current usage.
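When myquota shows you are close to a limit, you need to find what is using the space or the file count. The helpers below are a sketch built only from standard Unix tools (find, du, sort, wc); they are not part of UTHPC's own tooling, and the function names are hypothetical. myquota remains the authoritative source for your limits and usage.

```shell
# Sketch: helpers for finding what uses your quota (not official UTHPC tools).

# Count regular files under a directory (compare with the 1 million file limit).
count_files() {
  find "${1:-$HOME}" -type f | wc -l
}

# List the largest entries directly under a directory, biggest last
# (compare with the 2 TB limit). Note: the * glob does not match hidden
# entries whose names start with a dot, such as ~/.cache.
largest_entries() {
  du -sh "${1:-$HOME}"/* 2>/dev/null | sort -h | tail -n "${2:-10}"
}
```

For example, `count_files ~/my_project` counts the files in one project directory, and `largest_entries "$HOME" 5` shows the five largest entries in your home directory. Both can take a while on very large directory trees.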
There are multiple ways to transfer files between a local machine and the cluster, mainly depending on your local operating system. On a Unix-like OS, you can use command-line tools such as scp or sftp. If you are on Windows or prefer a graphical user interface, FileZilla is one tool you can use.
More comprehensive guides are available here: File Transfer to/out of the cluster
To copy data from your local machine to the cluster, use the secure copy command (scp):
scp /path/to/file <username>@rocket.hpc.ut.ee:/path/to/target_dir/
To retrieve data from the cluster to your local machine:
scp <username>@rocket.hpc.ut.ee:/path/to/file /path/to/target_dir/
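If you need to move a directory with many small files, it often helps to bundle it into one archive first: a single large file usually transfers faster than thousands of small ones, and it counts as one file against the file count quota. The function below is a sketch; `pack_dir` and the names `my_data` and `my_data.tar.gz` are hypothetical placeholders, and it assumes tar is available on both machines (it is standard on Linux, macOS and WSL).

```shell
# Sketch: bundle a directory into a single compressed archive before transfer.
# pack_dir is a hypothetical helper name.
pack_dir() {
  local src_dir="$1" archive="$2"
  # -C switches to the parent directory first, so the archive stores a
  # relative path (my_data/...) instead of the full local path.
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Usage on your local machine:
#   pack_dir ./my_data my_data.tar.gz
#   scp my_data.tar.gz <username>@rocket.hpc.ut.ee:/path/to/target_dir/
# Then unpack it on the cluster:
#   tar -xzf my_data.tar.gz
```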
You can use pre-installed software, or compile and install software on your own. UTHPC uses an environment module system to make software, and specific versions of it, available to users:
For example, here is how to search for and load the ’python’ software.
Check the available ’python’ versions on the cluster:
module av python
Load the desired version of ’python’:
module load python/3.8.6
List loaded modules:
module list
Loaded modules apply only to the current terminal session: a new session starts with no modules loaded. Therefore, it's advisable to load the modules you need inside your job script.
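Because every new session starts from a blank slate, a job script can start from an explicit, clean module environment. The function below is a sketch, not a UTHPC-provided tool: `load_modules` is a hypothetical name, and it assumes the standard `module purge` and `module load` subcommands. The guard makes the failure obvious when the snippet runs somewhere without the module system, such as your laptop.

```shell
# Sketch: unload everything, then load exactly the listed modules.
# load_modules is a hypothetical helper name.
load_modules() {
  # "module" is usually a shell function, so check with type, not which.
  if ! type module >/dev/null 2>&1; then
    echo "module command not found: run this on the UTHPC cluster" >&2
    return 1
  fi
  module purge          # start from a clean environment
  module load "$@"      # load only what the job needs
}

# Usage in a job script (the version is the example one from above):
#   load_modules python/3.8.6
```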
For a more thorough guide on modules, see the Modules guide.
The cluster utilizes a scheduler called Slurm to control job execution and distribute running jobs across available physical resources like memory and CPU cores.
The following example shows how to run your first job. A job script (sbatch file) consists of two main parts: instructions for the scheduler, and the commands the job actually runs with your chosen software. Start with the scheduler instructions:
#!/bin/bash
#SBATCH -J hello_world
#SBATCH --partition=testing
#SBATCH -t 1:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=500

# your code goes below
If you are an ETAIS user, use this variant instead:
#!/bin/bash
#SBATCH -J hello_world
#SBATCH --partition=testing
#SBATCH --allocation="ealloc_905b0_something"
#SBATCH -t 1:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=500

# your code goes below
ETAIS users can only submit jobs when they use the proper allocation. The allocation specifies which ETAIS organization and project is billed for the job.
You can find which allocation name to use on the https://minu.etais.ee website: go to your organization's UTHPC resource, where the job submission instructions are listed.
Then add the part for loading software and running a command:
module load python/3.8.6
python -c 'print("Hello world!")'
The finalized job script looks like this. Save it to a file, for example
#!/bin/bash
#SBATCH -J hello_world
#SBATCH --partition=testing
#SBATCH -t 1:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=500

# your code goes below
module load python/3.8.6
python -c 'print("Hello world!")'
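One way to save the finalized script without opening an editor is a shell heredoc, followed by a syntax check. This is a sketch: the file name `hello_world.sbatch` is a hypothetical choice, and `bash -n` only checks shell syntax; it does not validate the `#SBATCH` directives, which bash treats as comments.

```shell
# Sketch: write the job script to a file (hypothetical name) with a heredoc.
# The quoted 'EOF' keeps the shell from expanding anything inside the script.
cat > hello_world.sbatch <<'EOF'
#!/bin/bash
#SBATCH -J hello_world
#SBATCH --partition=testing
#SBATCH -t 1:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=500

# your code goes below
module load python/3.8.6
python -c 'print("Hello world!")'
EOF

# bash -n parses the script without running it.
bash -n hello_world.sbatch && echo "hello_world.sbatch: syntax OK"
```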
Submit your job
Once you have a job script, you can submit it to the scheduler. The scheduler allocates the requested resources for your job and gives you a job ID. If the requested resources are available, your job starts immediately. Otherwise, the job stays in the queue until sufficient resources become available. To submit your job to Slurm, use the sbatch command with your job script file as its argument. Slurm replies with the job ID:
Submitted batch job 15304092
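Later commands such as squeue and scancel need the job ID, so it is convenient to keep it in a shell variable. The helper below is a sketch (`job_id_from_output` is a hypothetical name) that takes the last word of sbatch's reply. Slurm's `sbatch --parsable` option prints only the job ID (plus `;clustername` on multi-cluster setups), which avoids the parsing step altogether.

```shell
# Sketch: extract the job ID from sbatch's reply,
# e.g. "Submitted batch job 15304092" -> "15304092".
# job_id_from_output is a hypothetical helper name.
job_id_from_output() {
  local line="$1"
  # ${line##* } removes everything up to and including the last space.
  printf '%s\n' "${line##* }"
}

# On the cluster (hello_world.sbatch is a placeholder script name):
#   jobid=$(job_id_from_output "$(sbatch hello_world.sbatch)")
# or, without parsing:
#   jobid=$(sbatch --parsable hello_world.sbatch)
#   echo "Submitted job $jobid"
```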
Running jobs directly on the cluster, without the queue system, is strictly forbidden, and such jobs are killed!
The cluster offers various options for different kinds of jobs. For more information, please review the following sections: Submitting Jobs, GPU Computing, and Interactive Jobs.
Monitor your job
You can inspect the status of your running jobs with the squeue command:
squeue -j 15304092
JOBID     PARTITION  NAME      USER       ST  TIME  NODES  NODELIST(REASON)
15304092  testing    hello_wo  test_user  R   0:10  1      stage43
The status column (ST) shows that the job is ’RUNNING’ (R). It runs on the ’testing’ partition on the node ’stage43’ and has been running for 10 seconds (TIME 0:10).
Be aware that if the requested resources aren't available, the job status is ’PENDING’ (PD). The job then waits in the queue and starts as soon as the requested resources become available.
You can also list all active jobs of a user with:
squeue -u <test_user>
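To wait for a job to finish without re-running squeue by hand, you can poll it in a loop. The function below is a sketch: `wait_for_job` is a hypothetical name, it relies on the standard squeue options `-h` (no header) and `-o %T` (print only the job state), and which states count as still active is an assumption you may want to adjust. Poll sparingly (the default here is every 30 seconds) to avoid loading the scheduler.

```shell
# Sketch: wait until a job is no longer pending or running.
# wait_for_job is a hypothetical helper name.
wait_for_job() {
  local jobid="$1" interval="${2:-30}" state
  while true; do
    # -h drops the header line, -o %T prints only the state (e.g. PENDING).
    # A job that has left Slurm's memory produces no output at all.
    state=$(squeue -h -j "$jobid" -o %T 2>/dev/null)
    case "$state" in
      # States treated as still active (an assumption; adjust as needed).
      PENDING|RUNNING|CONFIGURING|COMPLETING|SUSPENDED)
        echo "Job $jobid is $state"
        sleep "$interval" ;;
      *)
        echo "Job $jobid is no longer pending or running"
        return 0 ;;
    esac
  done
}

# Usage on the cluster: wait_for_job 15304092 60
```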
Cancel your job
You can cancel your job with the scancel command by passing the job ID as an argument.
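For example, cancelling the job from the example output above looks like the sketch below. The job ID 15304092 is the one from that example; replace it with your own. The guard only makes the snippet fail gracefully when it is run outside the cluster.

```shell
# Sketch: cancel the example job; replace the ID with the one sbatch printed.
jobid=15304092

if command -v scancel >/dev/null 2>&1; then
  scancel "$jobid"
else
  echo "scancel not found: run this on the UTHPC cluster" >&2
fi

# Other standard scancel filters:
#   scancel --name=hello_world   # cancel your jobs named hello_world
#   scancel -u "$USER"           # cancel ALL of your jobs; use with care
```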