Introduction - Software and modules¶
Welcome to the software and modules lab. In this session we will be going over how scientific software is built, how to manage it and how to best make it work for you.
Building software¶
If you start using the cluster you probably have an idea of what kind of software you will need. In the first part of this session we will go over how to install it for yourself.
We will be using a package called tcl/tk in this course as an example.
First package¶
There are a few steps that you will need to follow in order to successfully install and use your software. Here they are in order:
- Find and download the software
- Follow the hopefully included guide to install it on a Linux system
- Debug as necessary
- Set the proper environment variables
To start off you need to download the software. There are multiple ways to achieve this, for example using git, rsync or wget. We will be using wget:
wget http://prdownloads.sourceforge.net/tcl/tcl8.6.13-src.tar.gz
This will download tcl8.6.13-src.tar.gz into your current directory. Unpack it to get the tcl8.6.13 directory with:
tar xzvf tcl8.6.13-src.tar.gz
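Before unpacking an unfamiliar archive it can help to list its contents first, so you know which directory it will create. The sketch below is self-contained: instead of the real Tcl download it builds a small stand-in archive in a temporary directory (demo-1.0 is a made-up package name), then lists and unpacks it.

```shell
# Build a tiny stand-in archive so the example runs without network access.
# "demo-1.0" is a hypothetical package name, not a real download.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo-1.0/unix"
echo "build instructions" > "$workdir/demo-1.0/unix/README"
tar czf "$workdir/demo-1.0-src.tar.gz" -C "$workdir" demo-1.0
rm -r "$workdir/demo-1.0"

# List the contents without extracting; the first path component is the
# directory that unpacking will create.
top_dir=$(tar tzf "$workdir/demo-1.0-src.tar.gz" | head -n 1 | cut -d/ -f1)
echo "archive unpacks into: $top_dir"

# Extract into a chosen directory with -C instead of the current one.
mkdir -p "$workdir/src"
tar xzf "$workdir/demo-1.0-src.tar.gz" -C "$workdir/src"
ls "$workdir/src/$top_dir/unix"
```

The same tar tzf and -C options should work on the real tcl8.6.13-src.tar.gz.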
You now have the software downloaded, so the next step is installing it. Installation guides are usually found in a README.md file in the root of the source directory. For Tcl the guide can be found either on the Tcl website or in the tcl8.6.13/unix/README file.
Complete
First, create directories called lab2 and lab2/tcl in the course directory. We will be using lab2/tcl as the installation destination. Next, move to the tcl8.6.13/unix directory.
The Tcl documentation is quite thorough, but we will be following the classic C-style mantra of configure, make, make install. It should go like this:
./configure --prefix=/path/to/course/directory/lab2/tcl
make
make test
make install
What we did was:
- ./configure runs the bundled configure script to set variables according to the system. The --prefix flag sets the final destination of the install.
- make compiles the software
- make test (optional) runs the included tests
- make install copies the compiled software to the installation prefix
The whole process for Tcl should take a couple of minutes and produce a lot of informational output.
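If you repeat these steps often, they can be wrapped in a small shell function that stops at the first failing step. This is only a sketch under assumptions: build_and_install is a made-up helper name, the optional make test step is left out, and the prefix should be given as an absolute path.

```shell
# Run configure, make and make install in order, stopping at the first failure.
# Arguments: $1 = directory containing the configure script
#            $2 = installation prefix (use an absolute path)
build_and_install() {
    mkdir -p "$2" || return 1
    (
        # The subshell keeps the cd from changing the caller's directory.
        cd "$1" || exit 1
        ./configure --prefix="$2" &&
        make &&
        make install
    )
}

# Example call with placeholder paths (adjust to your course directory):
# build_and_install tcl8.6.13/unix /path/to/course/directory/lab2/tcl
```

The function returns a non-zero status if any step fails, so it can be combined with && in longer scripts.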
Accessing your software¶
You have now successfully installed Tcl. If you look in the destination tcl directory you will see some subdirectories:
bin include lib man share
To run the software we simply need to run bin/tclsh8.6 and it will open a Tcl shell. You can exit it by entering exit. If you try to run the other program in the directory with bin/sqlite3_analyzer then you will most likely encounter an error.
Environment variables¶
We will be looking at some environment variables that make our software more accessible and, in most cases, usable.
- $PATH contains the directories that host the executable binaries, usually the bin directory in a prefix
- $LD_LIBRARY_PATH points to the directories hosting libraries, usually lib or lib64
- $CPATH points to the header files that are used by compilers, usually include. This won't be needed this time but it's good to know
You can see the contents of a variable with, for example, echo $PATH. We will be adding our software to the variables with:
export PATH=$PATH:/path/to/course/dir/lab2/tcl/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/course/dir/lab2/tcl/lib
With the PATH variable set you can now run the tclsh8.6 command from anywhere on your command line. Make sure to always use the PATH=$PATH:/path format, as this appends your path to the end of the preexisting one. Otherwise you would overwrite the variable and lose access to all other commands. Run which tclsh8.6 to see where the executable is located.
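The effect of appending to PATH can be tried safely without touching the Tcl installation. The following sketch creates a throwaway directory with one small executable (demo_prefix and hello-lab2 are made-up names) and shows that the command is only found after its bin directory has been appended.

```shell
# Create a throwaway "installation" containing one executable.
# demo_prefix and hello-lab2 are made-up names for this illustration.
demo_prefix=$(mktemp -d)
mkdir -p "$demo_prefix/bin"
printf '#!/bin/sh\necho "hello from lab2"\n' > "$demo_prefix/bin/hello-lab2"
chmod +x "$demo_prefix/bin/hello-lab2"

# Before: the shell cannot find the command.
command -v hello-lab2 || echo "hello-lab2 not found yet"

# Append the new bin directory; the existing entries stay in front.
export PATH="$PATH:$demo_prefix/bin"

# After: the command resolves to the new directory and can be run by name.
command -v hello-lab2
hello-lab2
```

Because the new directory is added at the end, any command of the same name that already exists earlier in PATH would still take precedence.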
These export commands only last for the current terminal session and are lost when you close it. You can append the export lines to the .bashrc file in your home directory to have them available every time you log in.
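When adding lines to .bashrc it is easy to end up with duplicates after a few attempts. One possible way to avoid that is a small helper that appends a line only if it is not already present. The helper name append_once is an assumption made for this sketch, and it is demonstrated on a scratch file rather than the real .bashrc.

```shell
# Append a line to a file only if that exact line is not already there.
# Arguments: $1 = line to add, $2 = target file (for example ~/.bashrc).
append_once() {
    touch "$2" || return 1
    # -q quiet, -x match the whole line, -F treat the text as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Demonstrated on a scratch file; the second call changes nothing.
rc_file=$(mktemp)
append_once 'export PATH=$PATH:/path/to/course/dir/lab2/tcl/bin' "$rc_file"
append_once 'export PATH=$PATH:/path/to/course/dir/lab2/tcl/bin' "$rc_file"
cat "$rc_file"
```

The single quotes keep $PATH unexpanded, so the literal export line is stored and only evaluated when the file is read at login.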
This whole approach is a bit tedious but doable. If you only have a couple of software packages to install then it will not be too much of an issue to manage. In the following chapters we will show how all of this can be made trivial for the user.
Build systems¶
The Tcl installation that we looked at was written in C. Here are some high-level examples of how packages in other languages might be installed:
Python packages are usually managed by the package managers pip or conda. We will go more in depth with those in the environments/containers session but the general workflow is:
conda install package
#or
pip install package
The installation destination can be set with --prefix for conda and --target= for pip. R has a built-in package manager that installs packages from CRAN:
R
> install.packages("package")
#or
R CMD INSTALL package
There are also other package sources, such as devtools and Bioconductor, that might need to be enabled separately, but those are a bit more advanced. With C-based languages you can use either the configure, make, make install pipeline demonstrated earlier, or the package might come with CMake support:
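cmake -DCMAKE_INSTALL_PREFIX=/path/to/destination /path/to/source/dir
make
make install
A common pattern is to run this from a build subdirectory of the source tree, in which case the source path is the parent directory: cmake ..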
Perl packages are installed with the cpanm package manager.
cpanm PACKAGE::subpackage
#or use perl directly
perl INSTALL.pl
Using modules¶
Since a lot of users tend to use the same software, having everyone install their own would be unnecessarily repetitive. This is why most HPC centers have a centralized software stack.
Intro to Lmod¶
One of the most popular ways to make centralised software available is via Lmod. Lmod provides a system to manage and load module files.
What is a module file¶
Below is an example of a module file, in this case for tcl/8.6.12. You will not need to remember what the contents are or write them yourself, but it is good to see one to get a better basic understanding of what modules can do for you.
whatis([[Name : tcl]])
whatis([[Version : 8.6.12]])
whatis([[Target : x86_64]])
whatis([[Short description : Tcl (Tool Command Language) is a very powerful but easy to learn dynamic programming language, suitable for a very wide range of uses, including web and desktop applications, networking, administration, testing and many more. Open source and business-friendly, Tcl is a mature yet evolving language that is truly cross platform, easily deployed and highly extensible.]])
help([[Tcl (Tool Command Language) is a very powerful but easy to learn dynamic
programming language, suitable for a very wide range of uses, including
web and desktop applications, networking, administration, testing and
many more. Open source and business-friendly, Tcl is a mature yet
evolving language that is truly cross platform, easily deployed and
highly extensible.]])
depends_on("zlib/1.2.11")
prepend_path("PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/bin", ":")
prepend_path("MANPATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/man", ":")
prepend_path("MANPATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/share/man", ":")
prepend_path("LIBRARY_PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/lib", ":")
prepend_path("LD_LIBRARY_PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/lib", ":")
prepend_path("CPATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/include", ":")
prepend_path("PKG_CONFIG_PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/lib/pkgconfig", ":")
prepend_path("CMAKE_PREFIX_PATH", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/", ":")
setenv("TCL_LIBRARY", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye/lib/tcl8.6")
setenv("TCL_ROOT", "/gpfs/space/software/cluster_software/spack/linux-centos7-x86_64/gcc-9.2.0/tcl-8.6.12-kyut5qehakkghrvfsfca5itzgupr3zye")
execute{cmd="logger -t module -p local6.info DATE=$(date +%FT%T) HOSTNAME=$(hostname -s) USER=$(whoami) JOB=" .. (os.getenv("SLURM_JOB_ID") or "NOJOB") .. " APP=tcl " .. "VER=8.6.12", modeA={"load"}}
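To get a feel for what the prepend_path lines in the module file do, here is a toy shell counterpart. It only illustrates the effect and is not how Lmod is implemented. The prefix /opt/example/tcl-8.6.12 and the DEMO_ variable names are made up so that the sketch does not alter your real PATH.

```shell
# Toy shell counterpart of the module file's prepend_path(): put a directory
# at the front of a colon-separated variable. Illustration only; the variable
# name is passed through eval, so it must be a trusted, valid name.
prepend_path() {
    var_name=$1
    new_dir=$2
    eval "current=\${$var_name:-}"
    if [ -n "$current" ]; then
        eval "$var_name=\$new_dir:\$current"
    else
        eval "$var_name=\$new_dir"
    fi
    export "$var_name"
}

# Hypothetical prefix standing in for the long Spack path in the module file.
tcl_prefix=/opt/example/tcl-8.6.12
DEMO_PATH=/usr/bin:/bin
prepend_path DEMO_PATH "$tcl_prefix/bin"
prepend_path DEMO_MANPATH "$tcl_prefix/man"
echo "DEMO_PATH=$DEMO_PATH"
echo "DEMO_MANPATH=$DEMO_MANPATH"
```

Unlike the PATH=$PATH:/path form used earlier, prepending places the new directory first, so the module's executables win over same-named commands elsewhere on the system.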
Finding, loading and managing modules¶
Navigating the module system is done with the module command. module is actually a shell function, but it is simpler to treat it as a command in this guide. First of all we would like to see what modules are available. Please run module avail to get a complete list. This list is huge, as we currently have thousands of different package versions available. To search for something specific, append a keyword, as in module avail tcl.
To load a module, use the module load command. In our case you would use module load tcl/8.6.12. The format is the standard name/version. As before, you can run which tclsh8.6 to see where your executable is located. Please note that loaded modules only affect your current terminal session and are unloaded when you close it. You can also use module unload tcl to unload a module or module purge to unload all modules for a clean slate.
Using modules in jobs¶
Complete
To use the software in a job you simply need to add the same module loading commands to your job script.
...
#SBATCH
...
module load tcl/8.6.12
sqlite3_analyzer --version
lab2_module_job_<ANON NAME>.
Power of the module system¶
Compared to the way that we built the tcl package in the first part, loading a module for it makes the task fairly trivial. Loading a module removes the need to add specific paths to your environment variables each time you need a package. Lmod is also very popular, so searching for solutions to problems online should pose no problem. The next sections cover what to do when the software you need is not in our module system.
Building software with spack¶
If you do not see your required software with module av you generally have three options: contact us at support@hpc.ut.ee, build it and set the variables as before, or build it automatically with Spack.
Intro to spack¶
Spack is a highly powerful package manager designed with HPC in mind. It supports building, deploying and managing software from multiple languages, all fully automated and without requiring admin privileges.
Start by installing Spack into your home directory:
git clone --depth=100 --branch=releases/v0.20 https://github.com/spack/spack.git ~/spack
. ~/spack/share/spack/setup-env.sh
Add the setup-env.sh line to your .bashrc to activate it with every terminal session.
First package¶
Now that you have Spack enabled, you have full access to its functionality. We will be following the Spack getting started guide.
You can start by running spack list to see the available packages. Spack has 7360 packages at the time of writing, so you can go to the Spack package index to get a simpler (and faster) overview.
We will continue with the tcl package as an example. Run spack info tcl to see information about the package. Next, run spack spec tcl.
Spack specs¶
Spack specs are collections of descriptors that Spack uses to refer to a specific build configuration. The most important descriptors are:
- @ denotes the version
- % specifies the compiler
- ~, + and name=<value> specify variants
- ^ specifies a dependency
- / specifies a hash; this is rarely needed unless you need a very specific spec
An example from the Spack documentation: mpileaks @1.2:1.4 %gcc@4.7.5 +debug -qt target=x86_64 ^callpath @1.1 %gcc@4.7.2
This refers to:
- mpileaks at a version between 1.2 and 1.4 (inclusive)
- built with GCC version 4.7.5
- with the debug option enabled and without qt support
- target architecture is x86_64
- has a dependency package called callpath at version 1.1, built with a different compiler (GCC 4.7.2)
Specs are very powerful ways to customize how you want your packages built. You will most probably not need any extra descriptors besides the version one, but they are good to know.
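To make the anatomy of a spec more concrete, the following toy function splits a compact spec of the form name[@version][%compiler] into its parts using plain shell parameter expansion. It is not Spack's parser and describe_spec is a made-up name; variants, dependencies, hashes and whitespace are deliberately not handled.

```shell
# Split a compact spec of the form name[@version][%compiler] into its parts.
# Toy illustration only: real Spack specs also allow variants, dependencies,
# hashes and whitespace, none of which are handled here.
describe_spec() {
    spec=$1
    compiler=""
    version=""
    case "$spec" in
        # Everything after the first % is the compiler (with its own version).
        *%*) compiler=${spec#*%}; spec=${spec%%\%*} ;;
    esac
    case "$spec" in
        # What remains is name@version; split at the @.
        *@*) version=${spec#*@}; spec=${spec%%@*} ;;
    esac
    echo "package:  $spec"
    echo "version:  ${version:-unspecified}"
    echo "compiler: ${compiler:-unspecified}"
}

describe_spec "tcl@8.6.10%gcc@9.2.0"
describe_spec "tcl"
```

For the second call both optional parts are reported as unspecified, which mirrors how a bare package name leaves the choice of version and compiler to Spack.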
Installing and using a package¶
Run spack install tcl to install Tcl. Run spack find to see all of the packages that you have installed. Next, install a different version of Tcl with spack install tcl@8.6.10 and run spack find again to see the difference.
Simply installing a package does not make it usable yet. Spack offers two ways to load its packages. The first is through modules: Spack automatically creates module files for its packages, and running the setup-env.sh script makes them loadable. If you run module avail tcl again, you will see new tcl modules that are named differently.
The other option is to use spack load <spec>, for example spack load tcl@8.6.10. The advantage of this approach is that you can use the Spack spec syntax, which is more versatile than what Lmod has to offer.
What can spack do for you¶
Spack has enough functionality and options to warrant a separate course. The Spack documentation is vast and well written, so it's well worth checking out. Spack allows you to fine-tune your software, have it ready with a few lines and even reproduce software environments. To get help with Spack you can write to our support or join the Spack Slack channel.
Tips and tricks¶
The module system can be a bit daunting at first glance. Hopefully this lab has given you a better understanding of the basics.
Here are some tips and tricks that did not fit in the guide that might help you in the future:
- As you saw in the module file example, loading a module sets a variable called $TCL_ROOT. This references the base directory of the module location and gets set with every module, with TCL replaced by the proper name. If you ever need to reference software with a full path then using this variable is highly recommended. If we change any paths or directory structures in the background then your scripts will not break.
- Usually software errors are quite descriptive. It is well worth reading them before proceeding; your issue might be a missing directory, a dependent module that is not loaded, etc. If you are unable to decipher the error then our support email is always available.
- It is best to keep your environment as clean and simple as possible. Try to load modules inside job scripts instead of using .bashrc. This lessens the probability of mismatches and errors.
- Try to make your setup replicable. This mostly means using environments, but those will be covered extensively in the environments and containers lab.
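The first tip can be sketched as a small helper that builds a full path from $TCL_ROOT and fails with a clear message when the module has not been loaded. The function name tclsh_path and the prefix /opt/example/tcl-8.6.12 are assumptions for this illustration; on the cluster the variable would be set by module load instead.

```shell
# Build a full path from the <NAME>_ROOT variable that a loaded module sets,
# and fail with a clear message if the variable is missing.
tclsh_path() {
    if [ -z "${TCL_ROOT:-}" ]; then
        echo "TCL_ROOT is not set; load the tcl module first" >&2
        return 1
    fi
    echo "$TCL_ROOT/bin/tclsh8.6"
}

# Simulate a loaded module with a made-up prefix so the sketch runs anywhere.
TCL_ROOT=/opt/example/tcl-8.6.12
tclsh_path
```

Scripts written this way keep working if the installation is moved, as long as the module continues to set the variable.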