Python environments
Installing software on UTHPC can at times be a chore, but it doesn't have to be. For Python, there are a few options for creating and managing your own local environments with the necessary modules and packages. This page shows how to create virtual environments using pip with Python's virtualenv module, and how to create Anaconda Python (conda) environments.
For all of these options, it's recommended that you always use Python through the modules system (see Using Modules). The Python versions available on the nodes themselves are for system use and don't contain all the necessary packages. Also, different nodes might have different versions of Python and different packages, so loading a Python module before running your jobs keeps the environment consistent.
Different Python modules might have different configurations. For example, the default python/version modules are bare-bones Python installations and don't contain any virtualenv modules. The name of the executable might also differ: python may not always exist under that name, so calling python2 or python3 explicitly is a good idea.
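As a quick way to see which executable names resolve after loading a module, the following is a minimal sketch using only the Python standard library. The function name find_interpreters is hypothetical, and the list of candidate names is an assumption taken from the paragraph above.

```python
import shutil
import sys


def find_interpreters(names=("python", "python2", "python3")):
    """Return {name: absolute path or None} for each candidate executable on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Report which interpreter names resolve in the current shell environment.
    for name, path in find_interpreters().items():
        print(f"{name:<8} -> {path or 'not found'}")
    # The interpreter actually running this script, and its version.
    print("running:", sys.executable, sys.version.split()[0])
```

Running it before and after a module load should show how the available names and paths change.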
Using pip and Python virtual environments
Virtual environment, virtualenv and venv all denote the same thing: usually a folder in your home directory, which can be 'activated' to load a pre-installed set of Python modules and executables. Creating such environments requires the Python virtualenv module, which on UTHPC is usually provided by a module called py-virtualenv.
Below is a simple example of how to create such an environment. After creating the environment, unload the py-virtualenv package, as there might be library location conflicts with the virtualenv:
[user@login1 ~]$ module load any/python/3.9.9
[user@login1 ~]$ virtualenv venv_example
created virtual environment CPython3.9.9.final.0-64 in 8958ms
creator CPython3Posix(dest=/gpfs/space/home/user/venv_example, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/gpfs/space/home/user/.local/share/virtualenv)
added seed packages: pip==21.3.1, setuptools==60.2.0, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
While setting up a simple environment is easy, creating it isn't the main point of maintaining different environments. The point is to install the modules and software available from the pip repositories and keep them on a per-project or even per-tool basis. There is no limit on how many environments you can have.
For example, assume you need a package named requests, which isn't available by default in Python. First, activate the environment by using the source command and pointing it at the activate script in the freshly created venv, then install the package with pip:
[user@login1 ~]$ module unload any/python/3.9.9
[user@login1 ~]$ source venv_example/bin/activate
(venv_example) [user@login1 ~]$ pip install requests
Collecting requests
Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
|████████████████████████████████| 61 kB 1.4 MB/s
Collecting urllib3<1.27,>=1.21.1
Downloading urllib3-1.26.4-py2.py3-none-any.whl (153 kB)
|████████████████████████████████| 153 kB 3.4 MB/s
Collecting chardet<5,>=3.0.2
Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
|████████████████████████████████| 178 kB 7.5 MB/s
Collecting certifi>=2017.4.17
Downloading certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
|████████████████████████████████| 147 kB 7.3 MB/s
Collecting idna<3,>=2.5
Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
|████████████████████████████████| 58 kB 3.1 MB/s
Installing collected packages: urllib3, idna, chardet, certifi, requests
Successfully installed certifi-2020.12.5 chardet-4.0.0 idna-2.10 requests-2.25.1 urllib3-1.26.4
Inside an activated virtual environment you can install packages without the --user switch, and not worry about conflicts with packages from the module or with locally installed packages. To search for available packages and software, you can use either Google or pypi.org, the repository of pip packages.
Info
To deactivate your environment and return to the regular environment, use the command deactivate.
Note
You need to create and set up your venv only once. After you have created it and installed the necessary packages, you only need to run the source command to activate it.
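To confirm from within Python that a script is really running inside a virtual environment, a small check like the following can help. This is a sketch, not part of the UTHPC setup. It relies on sys.prefix differing from sys.base_prefix inside a venv, and the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """True when the running interpreter lives in a virtual environment."""
    # venv and recent virtualenv releases keep sys.base_prefix pointing at the
    # base installation; older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Placing such a check at the top of a job script's Python entry point makes a forgotten source command visible early.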
If your specific Python package isn't available through the pip/pip3 repository, but you have downloaded its source and it contains a setup.py file, pip can also install the package directly from the source folder: cd to the folder where the setup.py file resides and execute pip install . (note the trailing dot). pip can also install from version control systems, offers the --index-url (and --extra-index-url) options for installing packages from other repositories and --find-links for installing packages from local folders, and can install packages directly from .zip, .tar and .whl (wheel) files without unpacking them first, using pip install <file>. Always take care to install packages only from trusted sources when using anything other than the default pip repository.
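The options named above can be combined in one command line. The following sketch only builds and prints the argument list, so it has no side effects. The function name pip_install_command, the package name mypkg and the ./wheels folder are hypothetical. The flags themselves (--index-url, --extra-index-url, --find-links) are the pip options mentioned in the text.

```python
import sys


def pip_install_command(target, index_url=None, extra_index_url=None, find_links=None):
    """Build the argument list for `pip install`, run through the current interpreter."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if index_url:
        cmd += ["--index-url", index_url]
    if extra_index_url:
        cmd += ["--extra-index-url", extra_index_url]
    if find_links:
        cmd += ["--find-links", find_links]
    cmd.append(target)  # a package name, an archive file, or "." for a source folder
    return cmd


if __name__ == "__main__":
    # Print rather than run, so nothing gets installed by this sketch.
    print(" ".join(pip_install_command(".")))
    print(" ".join(pip_install_command("mypkg", find_links="./wheels")))
```

Calling pip as a module of the current interpreter (python -m pip) is a design choice here: it makes sure the package lands in the environment that interpreter belongs to.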
Tip
- To create a new virtual environment:
virtualenv <venv_name>
- To activate a virtual environment:
source <venv_name>/bin/activate
- To deactivate a virtual environment:
deactivate
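- To install a specific package version (append --user only when installing outside a virtual environment):
pip install <package-name>==<version>
- To install packages listed in a requirements file (append --user only when installing outside a virtual environment):
pip install -r <requirements-file>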
- To uninstall a package:
pip uninstall <package-name>
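A requirements file is a list of name==version lines. As an illustration of that format, the sketch below lists the packages visible to the running interpreter using the standard-library importlib.metadata module (Python 3.8 or newer, which is an assumption about the loaded module). The function name installed_requirements is hypothetical. pip's own freeze command is the usual tool for this. The sketch only shows what such a file contains.

```python
from importlib import metadata


def installed_requirements():
    """Return sorted 'name==version' strings for distributions visible to this interpreter."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Each printed line has the form accepted by `pip install -r <requirements-file>`.
    for line in installed_requirements():
        print(line)
```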
Using conda environments
Conda is a different Python dependency and environment manager. While pip and conda are quite similar, the commands differ somewhat. Additionally, conda packages are prebuilt, meaning they're only downloaded and extracted, whereas pip may have to compile a package locally when no prebuilt wheel is available. Conda also works as a single standalone tool for both creating environments and installing packages, as opposed to virtualenv creating the venv and pip managing the packages.
After loading a conda-capable Python module, any/python/3.8.3-conda, you can take a look at how conda manages its environments. First, list all environments:
[user@login1 ~]$ conda env list
# conda environments:
#
base * /gpfs/space/software/cluster_software/manual/any/python/conda/3.8
The only environment listed is base, which is akin to a central Python installation. Create a new environment and activate it:
[user@login1 ~]$ conda create -n conda_venv_example
Solving environment: done
## Package Plan ##
environment location: /gpfs/space/home/user/.conda/envs/conda_venv_example
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate conda_venv_example
#
# To deactivate an active environment, use
#
# $ conda deactivate
[user@login1 ~]$ conda activate conda_venv_example
(conda_venv_example) [user@login1 ~]$ conda env list
# conda environments:
#
conda_venv_example * /gpfs/space/home/user/.conda/envs/conda_venv_example
base /gpfs/space/software/cluster_software/manual/any/python/conda/3.8
You can now see that, by default, conda places newly created virtual environments under ~/.conda/envs/. Creating an environment is only half the process, though: the main idea is to allow easy installation of software. For this, conda has different repositories named 'channels', each of which provides different software. You can search for conda packages through Google or anaconda.org, the official repository website.
For example, assume you need samtools, a package from the bioconda channel, a repository for bioinformatics tools.
(conda_venv_example) [user@login1 ~]$ conda install -c bioconda samtools
Solving environment: done
## Package Plan ##
environment location: /gpfs/space/home/user/.conda/envs/conda_venv_example
added / updated specs:
- samtools
The following packages will be downloaded:
package | build
---------------------------|-----------------
ca-certificates-2021.1.19 | h06a4308_1 125 KB
openssl-1.1.1k | h27cfd23_0 3.8 MB
krb5-1.18.2 | h173b8e3_0 1.5 MB
ncurses-6.2 | he6710b0_1 1.1 MB
bzip2-1.0.8 | h7b6447c_0 105 KB
xz-5.2.5 | h7b6447c_0 438 KB
samtools-1.11 | h6270b1f_0 383 KB bioconda
libssh2-1.9.0 | h1ba5d50_1 346 KB
htslib-1.11 | hd3b49d5_2 1.8 MB bioconda
libdeflate-1.7 | h27cfd23_5 72 KB
libedit-3.1.20210216 | h27cfd23_1 190 KB
libcurl-7.71.1 | h20c2e04_1 313 KB
------------------------------------------------------------
Total: 10.0 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex: 0.1-main
bzip2: 1.0.8-h7b6447c_0
ca-certificates: 2021.1.19-h06a4308_1
htslib: 1.11-hd3b49d5_2 bioconda
krb5: 1.18.2-h173b8e3_0
libcurl: 7.71.1-h20c2e04_1
libdeflate: 1.7-h27cfd23_5
libedit: 3.1.20210216-h27cfd23_1
libgcc-ng: 9.1.0-hdf63c60_0
libssh2: 1.9.0-h1ba5d50_1
libstdcxx-ng: 9.1.0-hdf63c60_0
ncurses: 6.2-he6710b0_1
openssl: 1.1.1k-h27cfd23_0
samtools: 1.11-h6270b1f_0 bioconda
xz: 5.2.5-h7b6447c_0
zlib: 1.2.11-h7b6447c_3
Proceed ([y]/n)? y
Downloading and Extracting Packages
ca-certificates 2021.1.19: ################### | 100%
openssl 1.1.1k: ############################## | 100%
krb5 1.18.2: ################################# | 100%
ncurses 6.2: ################################# | 100%
bzip2 1.0.8: ##################################| 100%
xz 5.2.5: #################################### | 100%
samtools 1.11: ############################### | 100%
libssh2 1.9.0: ############################### | 100%
htslib 1.11: ################################# | 100%
libdeflate 1.7: ############################## | 100%
libedit 3.1.20210216: #########################| 100%
libcurl 7.71.1: ############################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
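After an installation like this, it can be useful to confirm that the tool found on PATH is the one inside the active conda environment and not a copy from elsewhere. The sketch below assumes the CONDA_PREFIX environment variable, which conda activate sets. The function name tool_from_active_env is hypothetical.

```python
import os
import shutil


def tool_from_active_env(tool, environ=None):
    """Return the tool's path if it resolves inside the active conda environment, else None."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")  # set by `conda activate`
    found = shutil.which(tool, path=environ.get("PATH"))
    if not prefix or not found:
        return None
    found_abs = os.path.abspath(found)
    prefix_abs = os.path.abspath(prefix)
    # The tool belongs to the environment when its path lies below the prefix.
    inside = os.path.commonpath([found_abs, prefix_abs]) == prefix_abs
    return found_abs if inside else None


if __name__ == "__main__":
    path = tool_from_active_env("samtools")
    print(path or "samtools is not provided by the active conda environment")
```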
Tip
- To create a virtual environment using conda:
conda create --name <name-of-environment>
- To install a conda package under a specific conda virtual environment:
conda install -n <name-of-environment> <package-name>
- To activate your conda virtual environment:
conda activate <name-of-environment>
- To deactivate your conda virtual environment:
conda deactivate
- To list installed packages in your environment:
conda list
- To list environments you have created:
conda env list
Of course, you can install multiple tools and packages into both regular virtualenvs and conda environments. After setting up an environment once, you only need to activate it. In your sbatch jobs the commands are the same. Here is a small example of using samtools from the conda environment in a Slurm job:
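#!/bin/bash
#SBATCH --partition=main
#SBATCH --time=10
#SBATCH --mem=10G
#SBATCH --cpus-per-task=4
# Load the conda-capable Python module and activate the environment created earlier
module load any/python/3.8.3-conda
conda activate conda_venv_example
# Sort by read name; -o names the output file (-O would set the output format instead)
samtools sort -n input.bam -o output.bam --threads=4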
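When a job launches the tool from a Python script, the thread count can be read from the job allocation, so it doesn't have to be repeated by hand. This is a minimal sketch, assuming Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is requested. The function names slurm_threads and samtools_sort_command are hypothetical, and the command uses -o for the output file name.

```python
import os


def slurm_threads(environ=None, default=1):
    """Thread count from SLURM_CPUS_PER_TASK, falling back to `default` outside a job."""
    environ = os.environ if environ is None else environ
    value = environ.get("SLURM_CPUS_PER_TASK", "")
    return int(value) if value.isdigit() and int(value) > 0 else default


def samtools_sort_command(src, dst, threads):
    """Argument list for a name-sorted output file written to `dst`."""
    return ["samtools", "sort", "-n", src, "-o", dst, "--threads", str(threads)]


if __name__ == "__main__":
    cmd = samtools_sort_command("input.bam", "output.bam", slurm_threads())
    # Printed only; inside a job, the list could be handed to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

With --cpus-per-task=4 in the job header, the printed command would request four threads. Outside a job it falls back to one.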