Python environments

Installing software on UTHPC can at times be a chore, but it doesn't have to be. For Python, there are a few options for creating and managing your own local environments with the necessary modules and packages. This page shows how to create virtual environments using pip with Python's virtualenv module, and how to use Anaconda Python (conda) environments.

For all of these options, always use Python through the modules system (see Using Modules). The Python versions installed on the nodes themselves are for system use and don't contain all the necessary packages. Different nodes might also have different versions of Python and different packages, so loading a Python module before your jobs keeps the environment consistent.

Different Python modules might have different configurations. For example, the default python/version modules are bare-bones Python installations and don't contain any virtualenv modules. The name of the executable might also differ: python may not always exist under that name, so calling python2 or python3 explicitly is a good idea.
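Because the executable name can vary between modules, a script can probe for it instead of hard-coding one. The following is a minimal sketch, not part of UTHPC's tooling; the function name first_available is hypothetical.

```shell
# Print the first command from the argument list that exists on PATH.
# Returns non-zero if none of the candidates is available.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (after loading a Python module): prefer python3, fall back to python.
# PYTHON=$(first_available python3 python) || echo "no Python interpreter found" >&2
# "$PYTHON" --version
```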

Using pip and python virtual environments

The terms virtual environment, virtualenv, and venv all denote the same thing: usually a folder in your home directory that can be 'activated' to load a pre-installed set of Python modules and executables. Creating such environments requires the Python virtualenv module, which in UTHPC is usually provided by a module called py-virtualenv.

Below is a simple example of how to create such an environment. After creating the environment, unload the py-virtualenv package, as there might be library location conflicts with the virtualenv:

[user@login1 ~]$ module load any/python/3.9.9
[user@login1 ~]$ virtualenv venv_example
created virtual environment CPython3.9.9.final.0-64 in 8958ms
  creator CPython3Posix(dest=/gpfs/space/home/user/venv_example, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/gpfs/space/home/user/.local/share/virtualenv)
    added seed packages: pip==21.3.1, setuptools==60.2.0, wheel==0.37.1
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

While setting up a simple environment is easy, that is not the main point of maintaining different environments. The idea is to install different modules and software from the pip repositories and to keep a separate environment per project or even per tool. There is no limit on how many environments you can have.

For example, assume you need a package named requests, which isn't available by default in Python. First activate the environment by using the source command on the activate script in the freshly created venv.

[user@login1 ~]$ module unload any/python/3.9.9
[user@login1 ~]$ source venv_example/bin/activate
(venv_example) [user@login1 ~]$ pip install requests
Collecting requests
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 1.4 MB/s
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.4-py2.py3-none-any.whl (153 kB)
     |████████████████████████████████| 153 kB 3.4 MB/s
Collecting chardet<5,>=3.0.2
  Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 7.5 MB/s
Collecting certifi>=2017.4.17
  Downloading certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
     |████████████████████████████████| 147 kB 7.3 MB/s
Collecting idna<3,>=2.5
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 3.1 MB/s
Installing collected packages: urllib3, idna, chardet, certifi, requests
Successfully installed certifi-2020.12.5 chardet-4.0.0 idna-2.10 requests-2.25.1 urllib3-1.26.4

Pip takes care of managing all the dependencies, which means less work for you. With the environment activated, you can freely use pip or other methods to install Python packages without the --user switch, and not worry about conflicts with packages from the module or with locally installed packages. To search for available packages and software, you can use either Google or pypi.org, the repository of pip packages.

Info

To deactivate your environment and return to the regular environment, use the command deactivate.
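To check from a script whether an environment is currently active, you can rely on the VIRTUAL_ENV variable, which the activate script exports and deactivate unsets. The sketch below is only an illustration; the function names in_venv and venv_pip_install are hypothetical.

```shell
# Succeeds only when a virtualenv is active: the activate script exports
# VIRTUAL_ENV, and `deactivate` unsets it again.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Hypothetical wrapper: refuse to run pip outside an activated environment,
# so packages never end up in the module's or the user's site-packages.
venv_pip_install() {
    if ! in_venv; then
        echo "no virtual environment is active; run 'source <venv_name>/bin/activate' first" >&2
        return 1
    fi
    pip install "$@"
}

# Usage:
# venv_pip_install requests
```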

Note

You must set up and create your venv only once. After you have created it with the necessary packages, you only need to run the source command to activate it.
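The create-once, activate-afterwards pattern can be wrapped in a small helper. This is a minimal sketch that assumes the virtualenv command is already on your PATH (for example from a loaded module); the function name ensure_venv is hypothetical.

```shell
# Create the virtual environment only if it does not exist yet, then activate it.
# Assumes the `virtualenv` command is available in the current shell.
ensure_venv() {
    venv_dir=$1
    if [ ! -f "$venv_dir/bin/activate" ]; then
        virtualenv "$venv_dir" || return 1
    fi
    # `.` is the portable spelling of `source`
    . "$venv_dir/bin/activate"
}

# Usage: the first call creates and activates, later calls only activate.
# ensure_venv "$HOME/venv_example"
```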

If your Python package isn't available from the pip/pip3 repository but you have downloaded its source, which contains a setup.py file, pip can install the package directly from the source folder: cd to the folder where the setup.py file resides and execute pip install . (note the trailing dot). Pip also supports installing from version control systems, the --index-url and --extra-index-url options for installing packages from other repositories, --find-links for installing specific packages from local folders, and installing packages directly from .zip, .tar, and .whl files without unpacking them first using pip install <file>. Always take care to install packages only from trusted sources when using anything other than the default pip repository.

Tip

  • To create a new virtual environment: virtualenv <venv_name>
  • To activate a virtual environment: source <venv_name>/bin/activate
  • To deactivate a virtual environment: deactivate
  • To install a specific package version: pip install <package-name>==<version>
  • To install packages listed in a requirements file: pip install -r <requirements-file>
  • To uninstall a package: pip uninstall <package-name>

Using conda environments

Conda is a different Python dependency and environment manager. While pip and conda are quite similar, the commands differ somewhat. Additionally, conda packages are prebuilt, meaning they are only downloaded and extracted, whereas pip packages may be compiled locally. Conda also works standalone, handling both environments and package installation, whereas with Python virtualenvs, virtualenv creates the environment and pip manages the packages.

After loading a conda-capable Python module, such as any/python/3.8.3-conda, you can take a look at how conda manages its environments. First list all environments:

[user@login1 ~]$ conda env list
# conda environments:
#
base                  *  /gpfs/space/software/cluster_software/manual/any/python/conda/3.8

You can see one environment named base; this is akin to a central Python installation. Create a new environment and activate it:

[user@login1 ~]$ conda create -n conda_venv_example
Solving environment: done
## Package Plan ##

  environment location: /gpfs/space/home/user/.conda/envs/conda_venv_example

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate conda_venv_example
#
# To deactivate an active environment, use
#
#     $ conda deactivate

[user@login1 ~]$ conda activate conda_venv_example
(conda_venv_example) [user@login1 ~]$ conda env list
# conda environments:
#
conda_venv_example    *  /gpfs/space/home/user/.conda/envs/conda_venv_example
base                     /gpfs/space/software/cluster_software/manual/any/python/conda/3.8

You can now see that, by default, conda places newly created environments under ~/.conda/envs/. Creating an environment is only half the process, though: the main idea is to allow easy installation of software. For this, conda has different repositories named 'channels', each of which provides different software. You can search for conda packages through Google or anaconda.org, the official repository website.

For example, assume you need a package from the bioconda channel, a repository for bioinformatics tools.

(conda_venv_example) [user@login1 ~]$ conda install -c bioconda samtools
Solving environment: done

## Package Plan ##

  environment location: /gpfs/space/home/user/.conda/envs/conda_venv_example

  added / updated specs:
    - samtools


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2021.1.19  |       h06a4308_1         125 KB
    openssl-1.1.1k             |       h27cfd23_0         3.8 MB
    krb5-1.18.2                |       h173b8e3_0         1.5 MB
    ncurses-6.2                |       he6710b0_1         1.1 MB
    bzip2-1.0.8                |       h7b6447c_0         105 KB
    xz-5.2.5                   |       h7b6447c_0         438 KB
    samtools-1.11              |       h6270b1f_0         383 KB  bioconda
    libssh2-1.9.0              |       h1ba5d50_1         346 KB
    htslib-1.11                |       hd3b49d5_2         1.8 MB  bioconda
    libdeflate-1.7             |       h27cfd23_5          72 KB
    libedit-3.1.20210216       |       h27cfd23_1         190 KB
    libcurl-7.71.1             |       h20c2e04_1         313 KB
    ------------------------------------------------------------
                                           Total:        10.0 MB

The following NEW packages will be INSTALLED:

    _libgcc_mutex:   0.1-main
    bzip2:           1.0.8-h7b6447c_0
    ca-certificates: 2021.1.19-h06a4308_1
    htslib:          1.11-hd3b49d5_2         bioconda
    krb5:            1.18.2-h173b8e3_0
    libcurl:         7.71.1-h20c2e04_1
    libdeflate:      1.7-h27cfd23_5
    libedit:         3.1.20210216-h27cfd23_1
    libgcc-ng:       9.1.0-hdf63c60_0
    libssh2:         1.9.0-h1ba5d50_1
    libstdcxx-ng:    9.1.0-hdf63c60_0
    ncurses:         6.2-he6710b0_1
    openssl:         1.1.1k-h27cfd23_0
    samtools:        1.11-h6270b1f_0         bioconda
    xz:              5.2.5-h7b6447c_0
    zlib:            1.2.11-h7b6447c_3

Proceed ([y]/n)? y


Downloading and Extracting Packages
ca-certificates 2021.1.19: ################### | 100%
openssl 1.1.1k: ############################## | 100%
krb5 1.18.2: ################################# | 100%
ncurses 6.2: ################################# | 100%
bzip2 1.0.8: ##################################| 100%
xz 5.2.5: #################################### | 100%
samtools 1.11: ############################### | 100%
libssh2 1.9.0: ############################### | 100%
htslib 1.11: ################################# | 100%
libdeflate 1.7: ############################## | 100%
libedit 3.1.20210216: #########################| 100%
libcurl 7.71.1: ############################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Tip

  • To create a virtual environment using conda: conda create --name <name-of-environment>
  • To install a conda package under a specific conda virtual environment: conda install -n <name-of-environment> <package-name>
  • To activate your conda virtual environment: conda activate <name-of-environment>
  • To deactivate your conda virtual environment: conda deactivate
  • To list installed packages in your environment: conda list
  • To list environments you have created: conda env list
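The output of conda env list can also be used in scripts, for example to create an environment only when it is missing. The following is a rough sketch based on the output format shown above; the function name conda_env_exists is hypothetical.

```shell
# Succeeds if `conda env list` reports an environment with the given name.
# Comment lines start with `#`, so they never match an environment name.
conda_env_exists() {
    conda env list | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage: create the environment only on the first run (-y skips the prompt).
# conda_env_exists conda_venv_example || conda create -y -n conda_venv_example
```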

Of course, you can install multiple tools and packages into both regular virtualenvs and conda environments. After setting up an environment once, you only need to activate it. In your sbatch jobs the commands are the same. A small example of using samtools from the conda environment in a Slurm job:

#!/bin/bash

#SBATCH --partition=main
#SBATCH --time=10
#SBATCH --mem=10G
#SBATCH --cpus-per-task=4

module load any/python/3.8.3-conda
conda activate conda_venv_example

# -o names the output file (-O would select the output format instead)
samtools sort -n -o output.bam --threads 4 input.bam
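
A job script can be made a little more robust by failing early when a tool is missing and by taking the thread count from Slurm instead of repeating it. This is a sketch under the assumption that Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set; the function names job_threads and require_cmd are hypothetical.

```shell
# Number of threads to use: read SLURM_CPUS_PER_TASK inside a job,
# fall back to 1 when the variable is not set.
job_threads() {
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

# Fail early with a clear message if a required tool is missing from PATH,
# for example because the environment was not activated.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "required command not found: $1" >&2
        return 1
    }
}

# Usage in the job script, after activating the environment:
# require_cmd samtools || exit 1
# samtools sort -n -@ "$(job_threads)" -o output.bam input.bam
```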