Introduction - Software and modules

Welcome to the software and modules lab. In this session we will be going over how scientific software is built, how to manage it and how to best make it work for you.

Building software

If you start using the cluster you probably have an idea of what kind of software you will need. In the first part of this session we will go over how to install it for yourself.

We will be using a package called tcl/tk in this course as an example.

First package

There are a few steps that you will need to follow in order to successfully install and use your software. Here they are, in order:

  • Find and download the software
  • Follow the (hopefully included) guide to install it on a Linux system
  • Debug as necessary
  • Set the proper environment variables

To start off, you need to download the software. There are multiple ways to achieve this, such as git, rsync or wget. We will be using wget:

wget http://prdownloads.sourceforge.net/tcl/tcl8.6.13-src.tar.gz

This will download tcl8.6.13-src.tar.gz into your current directory. Unpack it to get the tcl8.6.13 directory with:

tar xzvf tcl8.6.13-src.tar.gz

You now have the software downloaded, so it is time to install it. Installation guides are usually found in a README or README.md file in the root of the unpacked source directory. For Tcl, the guide can be found either on the Tcl website or in the tcl8.6.13/unix/README file.

Complete

First, create directories called lab2 and lab2/tcl in the course directory. We will be using lab2/tcl as the installation destination. Next, move to the tcl8.6.13/unix directory.
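
The directory setup can be sketched as follows. COURSE_DIR is a hypothetical variable standing in for your own course directory; it falls back to the current directory here only so that the snippet runs as-is.

```shell
# COURSE_DIR is a placeholder (assumption): point it at your own course directory.
COURSE_DIR="${COURSE_DIR:-$PWD}"

# -p creates lab2 and lab2/tcl in one go and does not complain if they exist.
mkdir -p "$COURSE_DIR/lab2/tcl"

# Confirm that the installation destination is there.
ls -d "$COURSE_DIR/lab2/tcl"
```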

The Tcl documentation is quite thorough, but we will be following the classic C-style mantra of configure, make, make install. It should go like this:

./configure --prefix=/path/to/course/directory/lab2/tcl
make 
make test
make install

What we did was:

  • ./configure runs the bundled configure script to set variables according to the system. The --prefix flag sets the final destination of the install.
  • make compiles the software
  • make test (optional) runs included tests
  • make install copies the compiled software to the installation prefix
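
If you prefer to run the steps in one go, chaining them with && makes the sequence stop at the first step that fails, which helps when debugging. This is only a sketch: PREFIX is a hypothetical variable holding the placeholder path used above, and the guard simply checks that you are in the directory containing the configure script.

```shell
# PREFIX is a placeholder (assumption): replace it with your own lab2/tcl directory.
PREFIX="${PREFIX:-/path/to/course/directory/lab2/tcl}"

# Run from tcl8.6.13/unix, where the configure script lives.
if [ -x ./configure ]; then
    # Each step only starts if the previous one succeeded.
    ./configure --prefix="$PREFIX" &&
    make &&
    make install
else
    echo "no configure script here: cd into tcl8.6.13/unix first"
fi
```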

The whole process for Tcl should take a couple of minutes and produce a lot of informational output.

Accessing your software

You now have successfully installed Tcl. If you look in the destination tcl directory then you will see some subdirectories.

bin  include  lib  man  share

To run the software we simply need to run bin/tclsh8.6 and it will open a Tcl shell. You can exit it by entering exit. If you try to run the other program in the directory with bin/sqlite3_analyzer then you will most likely encounter an error.

Environment variables

We will be looking at some environment variables that make our software more accessible and, in most cases, usable.

  • $PATH contains the directories that host executable binaries, usually the bin directory in a prefix
  • $LD_LIBRARY_PATH points to the directories hosting shared libraries, usually lib or lib64
  • $CPATH points to the header files that are used by compilers, usually in include. This won't be needed this time but it's good to know
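
Because these variables are colon-separated lists, they are easier to read with one entry per line. A minimal sketch, assuming a Bash-like shell (path_entries is just an illustrative variable name):

```shell
# Print each directory in PATH on its own line, in search order.
echo "$PATH" | tr ':' '\n'

# LD_LIBRARY_PATH is often unset; ${VAR:-} expands to an empty string in that case.
echo "${LD_LIBRARY_PATH:-}" | tr ':' '\n'

# Count the PATH entries.
path_entries=$(echo "$PATH" | tr ':' '\n' | wc -l)
echo "PATH has $path_entries entries"
```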

You can see the contents of the variables with echo $PATH for example. We will be setting the variables to our software with:

export PATH=$PATH:/path/to/course/dir/lab2/tcl/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/course/dir/lab2/tcl/lib

With the PATH variable set you can now run the tclsh8.6 command from anywhere on your command line. Make sure to always use the PATH=$PATH:/path format, as this appends your path to the end of the existing one. Otherwise you would overwrite PATH and lose access to all other commands. Run which tclsh8.6 to see where the executable is located.
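
The order of the entries matters: the shell walks PATH from left to right and uses the first match. The sketch below demonstrates this with two scratch directories and a made-up command called demo_tool (both exist only for this illustration).

```shell
# Two scratch directories, each holding a different demo_tool (hypothetical name).
first=$(mktemp -d)
second=$(mktemp -d)
printf '#!/bin/sh\necho first\n'  > "$first/demo_tool"
printf '#!/bin/sh\necho second\n' > "$second/demo_tool"
chmod +x "$first/demo_tool" "$second/demo_tool"

old_path=$PATH

# Appending both: entries are searched in order, so $first is found before $second.
PATH=$old_path:$first:$second
hash -r                      # forget any remembered command locations
appended=$(demo_tool)

# Prepending $second: it is now searched first and shadows $first.
PATH=$second:$old_path:$first
hash -r
prepended=$(demo_tool)

echo "appended -> $appended, prepended -> $prepended"

# Restore the original PATH and clean up.
PATH=$old_path
rm -r "$first" "$second"
```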

These export commands work only in the current terminal session until you close it. You could append the export lines to your .bashrc file in your home directory to have them available when you log in.
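
When adding lines to .bashrc it is easy to end up with duplicates after a few attempts. One possible way to avoid that is a small helper that appends a line only if it is missing. add_line_once is a hypothetical name, and the example works on a scratch file so that nothing in your home directory is touched until you are ready.

```shell
# Append a line to a file only if that exact line is not there yet.
add_line_once() {
    file=$1
    line=$2
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; use "$HOME/.bashrc" once you are happy with it.
rc_file=$(mktemp)
# Single quotes keep $PATH literal, so it is expanded at login, not now.
add_line_once "$rc_file" 'export PATH=$PATH:/path/to/course/dir/lab2/tcl/bin'
add_line_once "$rc_file" 'export PATH=$PATH:/path/to/course/dir/lab2/tcl/bin'

cat "$rc_file"   # the export line appears only once
```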

This whole approach is a bit tedious but doable. If you only have a couple of software packages to install then it will not be too much of an issue to manage. In the following chapters we will show how all of this can be made trivial for the user.

Build systems

The tcl installation that we looked at was written in C. Here are some high level examples on how other language packages might be installed:

Python packages are usually managed by package managers pip or conda. We will go more in depth with those in the environments/containers session but the general workflow is:

conda install package
#or
pip install package
These will try to install into a central directory and, if that fails, into your default user location. If you wish to specify the destination yourself, use --prefix with conda and --target= with pip.

R has a built-in package manager that installs packages from CRAN:

R
> install.packages("package")
#or
R CMD INSTALL package
There are some other tools, like devtools and Bioconductor, that might need to be enabled separately, but those are a bit more advanced.

With C based languages you can use either the configure, make, make install pipeline demonstrated earlier or the package might come with cmake.

cmake -DCMAKE_INSTALL_PREFIX=/path/to/destination /path/to/source/dir
make 
make install
Usually cmake is run in a separate build directory inside of the downloaded software directory so it is mainly executed with cmake ..
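
An out-of-source CMake build might look like the sketch below. CMAKE_INSTALL_PREFIX is the CMake variable that sets the installation destination; INSTALL_PREFIX is a hypothetical shell variable used here as a stand-in for your own path, and the guard only checks that a CMakeLists.txt is present.

```shell
# INSTALL_PREFIX is a placeholder (assumption) for your installation destination.
INSTALL_PREFIX="${INSTALL_PREFIX:-$PWD/install}"

# Run from the top of the downloaded source tree, where CMakeLists.txt lives.
if [ -f CMakeLists.txt ] && command -v cmake >/dev/null 2>&1; then
    mkdir -p build && cd build &&
    cmake -DCMAKE_INSTALL_PREFIX="$INSTALL_PREFIX" .. &&   # configure
    make &&                                                 # compile
    make install                                            # copy into the prefix
else
    echo "no CMakeLists.txt here (or cmake is missing): nothing to build"
fi
```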

Perl packages are installed with the cpanm package manager.

cpanm PACKAGE::subpackage
#or use perl directly
perl INSTALL.pl
The installation is usually quite package-specific when using perl.

Using modules

Since a lot of users tend to use the same software, having everyone install their own would be unnecessarily repetitive. This is why most HPC centers have a centralized software stack.

Intro to Lmod

One of the most popular ways to make centralized software available is Lmod. Lmod provides a system to manage and load module files.

What is a module file

Below is an example of a module file, in this case for tcl/8.6.12. You will not need to remember what the contents are or write them yourself but it is good to see one to get a better basic understanding of what modules can do for you.

whatis([[Name : tcl]])
whatis([[Version : 8.6.12]])
whatis([[Target : x86_64]])
whatis([[Short description : Tcl (Tool Command Language) is a very powerful but easy to learn dynamic programming language, suitable for a very wide range of uses, including web and desktop applications, networking, administration, testing and many more. Open source and business-friendly, Tcl is a mature yet evolving language that is truly cross platform, easily deployed and highly extensible.]])

help([[Tcl (Tool Command Language) is a very powerful but easy to learn dynamic
programming language, suitable for a very wide range of uses, including
web and desktop applications, networking, administration, testing and
many more. Open source and business-friendly, Tcl is a mature yet
evolving language that is truly cross platform, easily deployed and
highly extensible.]])


depends_on("zlib/1.2.11")

prepend_path("PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/bin", ":")
prepend_path("MANPATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/man", ":")
prepend_path("MANPATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/share/man", ":")
prepend_path("LIBRARY_PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/lib", ":")
prepend_path("LD_LIBRARY_PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/lib", ":")
prepend_path("CPATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/include", ":")
prepend_path("PKG_CONFIG_PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/lib/pkgconfig", ":")
prepend_path("CMAKE_PREFIX_PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/", ":")
setenv("TCL_LIBRARY", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/lib/tcl8.6")
setenv("TCL_ROOT", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye")

execute{cmd="logger -t module -p local6.info DATE=$(date +%FT%T) HOSTNAME=$(hostname -s) USER=$(whoami) JOB=" .. (os.getenv("SLURM_JOB_ID") or "NOJOB") .. " APP=tcl " .. "VER=8.6.12", modeA={"load"}}
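
In shell terms, prepend_path and setenv boil down to the export lines used earlier. The sketch below is a simplified imitation and not Lmod's actual implementation (Lmod also keeps track of its changes so that unloading can undo them). TCL_PREFIX is a shortened, hypothetical stand-in for the long installation path in the module file.

```shell
# Simplified stand-in for Lmod's prepend_path: put a directory at the front
# of a colon-separated variable, creating the variable if it is empty.
prepend_path() {
    var=$1
    dir=$2
    eval "current=\${$var:-}"       # read the current value of the named variable
    if [ -n "$current" ]; then
        export "$var=$dir:$current"
    else
        export "$var=$dir"
    fi
}

TCL_PREFIX=/path/to/tcl-8.6.12      # placeholder, not a real location

prepend_path PATH "$TCL_PREFIX/bin"
prepend_path LD_LIBRARY_PATH "$TCL_PREFIX/lib"
export TCL_ROOT="$TCL_PREFIX"       # counterpart of setenv("TCL_ROOT", ...)

echo "${PATH%%:*}"                  # the first PATH entry is now the tcl bin directory
```

Note that modules prepend, so a loaded module takes precedence over anything already on your PATH.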

Finding, loading and managing modules

Navigating the module system is done with the module command. module is actually a shell function, but it is simpler to treat it as a command in this guide. First of all, we would like to see what modules are available. Please run module avail to get a complete list. This list is huge: we currently have thousands of different package versions available. To search for something specific, append a keyword, as in module avail tcl.

To load a module, use the module load command. In our case you would use module load tcl/8.6.12. The format is the standard name/version. As before, you can run which tclsh8.6 to see where your executable is located. Please note that loaded modules only affect your current terminal session and are unloaded when you close it. You can also use module unload tcl to unload a module, or module purge to unload all modules and start from a clean slate.
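
Put together, a typical session could look like the sketch below. The guard is an addition for this example: it only checks whether the module function exists, so the snippet does nothing harmful on a machine without Lmod. module list, which shows the currently loaded modules, is not mentioned above but is a standard Lmod subcommand.

```shell
# `module` is a shell function provided by Lmod, so this only does real work
# on a system where Lmod is set up; elsewhere it prints a notice.
if type module >/dev/null 2>&1; then
    module avail tcl          # search for matching modules
    module load tcl/8.6.12    # name/version
    module list               # show what is currently loaded
    which tclsh8.6            # should now point into the module's bin directory
    module unload tcl         # unload a single module
    module purge              # or unload everything
else
    echo "module command not found: run this on the cluster"
fi
```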

Using modules in jobs

Complete

To use the loaded software in a job you simply need to add the same loading commands to your job.

...
#SBATCH 
...
module load tcl/8.6.12
sqlite3_analyzer --version
Run a job with these two lines in it and name the job lab2_module_job_<ANON NAME>.
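
One possible shape for such a job script is sketched below. The resource values and the file name lab2_module_job.sh are illustrative assumptions, and YOUR_NAME is a placeholder for the name suffix; adjust the #SBATCH lines to what you learned in the job submission session.

```shell
# Write a minimal job script. The quoted 'EOF' keeps the content literal.
cat > lab2_module_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=lab2_module_job_YOUR_NAME
#SBATCH --time=00:05:00
#SBATCH --mem=1G

# Load exactly what the job needs, then run the program.
module load tcl/8.6.12
sqlite3_analyzer --version
EOF

# Submit only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch lab2_module_job.sh
else
    echo "sbatch not found: submit lab2_module_job.sh on the cluster"
fi
```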

Power of the module system

Compared to the way that we built the tcl package in the first part, loading a module for it makes the task fairly trivial. Loading a module removes the need to add specific paths to your environment variables each time you need a package. Lmod is also very popular, so searching online for solutions to problems should be easy. The next sections cover what to do when the software you need is not in our module system.

Building software with spack

If you do not see your required software with module av, you generally have three options: contact us at support@hpc.ut.ee, build it and set the variables as before, or build it automatically with Spack.

Intro to spack

Spack is a highly powerful package manager designed with HPC in mind. It supports building, deploying and managing software from multiple languages, all fully automated and without requiring admin privileges.

Start by installing Spack into your home directory:

git clone --depth=100 --branch=releases/v0.20 https://github.com/spack/spack.git ~/spack
. ~/spack/share/spack/setup-env.sh
Run the latter command to enable spack. You can also add it to .bashrc to activate it with every terminal session.

First package

Now that you have Spack enabled, you have full access to its functionality. We will be following the Spack getting started guide.

You can start by running spack list to see the available packages. Spack has 7360 packages at the time of writing, so you can also go to the Spack package index to get a simpler (and faster) overview.

We will continue with the tcl package as an example. Run spack info tcl to see information about the package. Next, run spack spec tcl.

Spack specs

Spack specs are collections of descriptors that Spack uses to refer to a specific build configuration. The most important descriptors are:

  • @ denotes the version
  • % specifies the compiler
  • + enables and ~ (or -) disables a variant, name=<value> sets a variant that takes a value
  • ^ specifies a dependency
  • / specifies a hash, this is rarely needed unless you need a very specific spec

An example from the Spack documentation:

mpileaks @1.2:1.4 %gcc@4.7.5 +debug -qt target=x86_64 ^callpath @1.1 %gcc@4.7.2

This refers to:

  • mpileaks at a version between 1.2 and 1.4
  • compiled with GCC version 4.7.5
  • with debug options enabled and without qt support
  • target architecture is x86_64
  • has a dependency package called callpath at version 1.1, built with a different compiler version (GCC 4.7.2)

Specs are very powerful ways to customize how you want your packages built. You will most probably not need any extra descriptors besides the version one, but they are good to know.
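
To see how descriptors change what Spack would build, you can pass different specs to spack spec, which only prints the resolved configuration and installs nothing. A small sketch, assuming Spack has been enabled with setup-env.sh and that a gcc compiler is known to Spack (the guard is an addition for this example):

```shell
# spack spec only resolves and prints the build configuration.
if command -v spack >/dev/null 2>&1; then
    spack spec tcl                 # defaults for everything
    spack spec tcl@8.6.10          # pin the version
    spack spec tcl@8.6.10 %gcc     # also ask for the gcc compiler
else
    echo "spack not found: source ~/spack/share/spack/setup-env.sh first"
fi
```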

Installing and using a package

Run spack install tcl to install Tcl. Run spack find to see all of the packages that you have installed. Next, install a different version of Tcl with spack install tcl@8.6.10 and run spack find again to see the difference.

Simply installing a package does not make it usable yet. Spack offers two ways to load its packages. The first is through modules: Spack automatically creates module files for its packages, and running the setup-env.sh script automatically makes them loadable. If you run module avail tcl again, you will see new tcl modules that are named differently.

The other option is to use spack load <spec>, like spack load tcl@8.6.10 for example. The advantage with this one is that you can use the Spack spec syntax that is more versatile than what Lmod has to offer.
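
A short session using spack load could look like this. It assumes tcl@8.6.10 was installed as described above; the guard is an addition so that the snippet is harmless where Spack is not enabled.

```shell
if command -v spack >/dev/null 2>&1; then
    spack find tcl            # list the installed tcl builds
    spack load tcl@8.6.10     # add this build to PATH and related variables
    which tclsh8.6            # should point into the Spack install tree
    spack unload tcl@8.6.10   # undo the load
else
    echo "spack not found: source ~/spack/share/spack/setup-env.sh first"
fi
```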

What can spack do for you

Spack has enough functionality and options to warrant a separate course. The Spack documentation is vast and well written, so it's well worth checking out. Spack allows you to fine-tune your software, have it ready with a few lines and even reproduce software environments. To get help with Spack you can write to our support or join the Spack Slack channel.

Tips and tricks

The module system can be a bit daunting at first glance. Hopefully this lab has given you a better understanding of the basics.

Here are some tips and tricks that did not fit in the guide that might help you in the future:

  • As you saw in the module file example, loading a module sets a variable called $TCL_ROOT. This references the base directory of the module's installation. Every module sets such a variable, with TCL replaced by the name of the package. If you ever need to reference software with a full path, using this variable is highly recommended: if we change any paths or directory structures in the background, your scripts will not break.
  • Usually software errors are quite descriptive. It is well worth reading them before proceeding: your issue might be a missing directory, a dependent module that is not loaded, and so on. If you are unable to decipher the error then our support email is always available.
  • It is best to keep your environment as clean and simple as possible. Try to load modules inside job scripts instead of using .bashrc. This lessens the probability of there being mismatches and errors.
  • Try to make your setup replicable. This mostly means using environments but those will be covered extensively in the environments and containers lab.
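
As an illustration of the first tip, a script can build full paths from $TCL_ROOT instead of hard-coding the installation directory. This is a sketch that assumes the tcl module has been loaded; the check is an addition that prints a notice when the variable is missing.

```shell
# TCL_ROOT is set by `module load tcl/8.6.12`; without the module it is unset.
if [ -n "${TCL_ROOT:-}" ]; then
    tclsh="$TCL_ROOT/bin/tclsh8.6"
    # tclsh reads commands from standard input when no script is given.
    echo 'puts [info patchlevel]' | "$tclsh"
else
    echo "TCL_ROOT is not set: load the tcl module first"
fi
```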