
A more general description of applications and software tools can be found on the wiki page Software. The sections below give details about some of the software available on OCuLUS. The availability of an application may be restricted by the terms of its license.

Choosing a Software Package

We use the TCL-based tool Modules to manage the different software packages.

The following table shows the most used commands.

For more information about Modules, please read the man page or refer to the Modules Project home page.

Command           Purpose
mdlsearch         search for modules whose names contain the given string
module avail      list the available software packages
module list       show the loaded modules
module add        load a module; if no release is given, the highest version is normally loaded
module del        unload a module
module display    show what the module does


Version 8.2.2 of ABINIT is installed. A sequential (abinit-seq), an MPI-parallel (abinit-mpi), and a GPU version (abinit-gpu) of the software are available.

module add abinit


module add AMS.
Example job script: $PC2SW/examples/ams.sh


Version 1.7.7 is installed.

module add bigdft


Blender features a rendering engine called Cycles that offers stunning realistic rendering.

The built-in Cycles rendering engine offers:

  • GPU & CPU rendering
  • Realtime viewport preview
  • HDR lighting support
  • Permissive License for linking with external software
module add blender
blender Gearwheel.blend -o Gearwheel --threads 16 --render-format MPEG -x 1 --background --render-anim


Version 1.0.0 RC3 is installed.

module add caffe/1.0.0

and a CPU-only version

module add caffe/1.0.0_cpu


To compile and execute an OpenACC program, use a GPU node and do the following:

module add cuda capscompilers


Chapel supports a multithreaded parallel programming model at a high level by supporting abstractions for data parallelism, task parallelism, and nested parallelism. It enables optimizations for the locality of data and computation in the program via abstractions for data distribution and data-driven placement of subcomputations. It allows for code reuse and generality through object-oriented concepts and generic programming features. For instance, Chapel allows for the declaration of locales.

module add chapel

chpl test.chpl -L/cm/shared/apps/pc2/OpenMPI/gcc/1.8.4-mt/lib -lmpi

Example batch file for ccsalloc to execute a chapel program:
#! /bin/sh
#Starts a.out on 3 processors

#CCS -n 3
### Prepare the Chapel hostfile
nHosts=`cat $GASNET_NODEFILE | wc -l`

### Now start the chapel application
a.out -nl $nHosts

Clang with OpenMP for GPU-offloading

Clang with OpenMP support for GPU-offloading is available on OCuLUS. To use the latest Clang 11.1.0 please load the environment with

module load compiler/clang

The compiler options that enable OpenMP support for GPU-offloading are -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda.

Sample codes and detailed documentation can be found at https://github.com/pc2/OMP-Offloading.


Versions 4.1 and 5.1 are installed.


module add cp2k/4.1
ccsalloc -I -t 1h -n 4 ompi -- cp2k.popt -i MyInp.inp


Critic2 is installed. Critic2 is a program for the analysis of quantum mechanical calculation results in molecules and periodic solids.

It can be used by loading the environment module:

module load chem/critic2


Dalton2016.2 is installed.

NOTE: Users have to accept the Dalton2016 license terms.

module add dalton


ccsalloc --res=rset=2:ncpus=16:mpiprocs=1:ompthreads=1,place=scatter:excl dalton -N 32 dft_stex_1 OCS_aug-cc-pCVTZ



This is a Singularity-Image containing FEniCS 2018.1. Ubuntu Xenial is running inside the image. The following packages have been installed after the FEniCS installation:

  • PETSc
  • SLEPc
  • matplotlib


If you want to use FEniCS interactively do this on the frontend:

module add fenics # Load fenics-module
singularity shell $FENICS_IMG # Spawn a shell inside the image
singularity exec $FENICS_IMG <command> # Execute command inside image

Submit via ccsalloc

An example submit script can be found under $PC2SW/examples/fenics.sh

cp $PC2SW/examples/fenics.sh . # Copy the script to your work-directory
ccsalloc fenics.sh <command>   # Read fenics.sh for usage example

The specified command and all its options will be executed in the image.


Different versions of the FFTW3 library (single and double precision) are installed.

mdlsearch fftw


Access is restricted. Please apply for access.

The source code of Gamess-US can be downloaded at https://www.msg.chem.iastate.edu/gamess/download.html.

module use $PC2SW/CHEM_PHYS_SW/modules/
module add gamess-us

Refer to $PC2SW/examples/gamess-us.sh for a template job-script.


Access is restricted. Please apply for access.

Installed versions are G03-B.01, G09-B.01, G09-D.01, G16-B.01, and G16-C.01

module add g16

loads G16 Rev. C.01

Refer to $PC2SW/examples for template job-scripts.

Here is useful information for running Gaussian computation with the checkpoint file.

GCC with GPU offloading

To use GCC 9.2.0-offload with OpenMP and OpenACC support for offloading computation on GPU, load the environment with

module load gcc/9.2.0-offload

The compiler option is -fopenmp -foffload=nvptx-none.

Sample codes and detailed documentation can be found on https://github.com/pc2/OMP-Offloading
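As a rough sketch of how the option strings quoted for GCC (this section) and for Clang (the Clang section) fit into a compile line, the helper below only prints the command as a dry run; it does not invoke a compiler. The function name offload_cmd and the file names saxpy.c and saxpy are hypothetical placeholders; only the flags are taken from this page.

```shell
#!/bin/sh
# Dry-run helper: print (do not execute) an OpenMP-offload compile line.
# offload_cmd and the file names are hypothetical; the flags are the ones
# quoted on this page for gcc/9.2.0-offload and compiler/clang.
offload_cmd() (
  compiler=$1; src=$2; out=$3
  case $compiler in
    gcc)   flags="-fopenmp -foffload=nvptx-none" ;;
    clang) flags="-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda" ;;
    *)     echo "unknown compiler: $compiler" >&2; exit 1 ;;
  esac
  echo "$compiler $flags -o $out $src"
)

offload_cmd gcc saxpy.c saxpy
offload_cmd clang saxpy.c saxpy
```

Load the matching module first (gcc/9.2.0-offload or compiler/clang) before running the printed command on a GPU node.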


Use the ghmm module to activate the environment

module add ghmm

Python bindings are available:

$ module add python
$ python
Python 2.7.6 (default, Dec  6 2013, 18:06:23) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ghmm


Installed version is 5.5

module add grace


Several versions are installed.

mdlsearch gromacs

An example job script can be found in $PC2SW/examples


HOOMD-blue is a general-purpose particle simulation toolkit. Version 1.1.1 for Intel Xeon E5 and nVidia GPU is installed. Versions 1.0.0, 1.0.1, and 1.0.5 are also available.

module add hoomd


Sometimes you will get this message:

An MPI process has executed an operation involving a call to the fork() system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged.

The process that invoked fork was: ...

If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0.

Here is an example that disables the warning:

ccsalloc -t 5m --res=rset=1:ncpus=1:mem=4g:vmem=100g:gpus=1:tesla=t ompi --mca mpi_warn_on_fork 0 -- hoomd test.py


If hoomd is started on a node without a GPU, you will get this message:

*Warning*: (Rank ...): NVIDIA driver not installed or is too old, ignoring any GPUs in the system.

You can ignore this warning; hoomd will run on the CPU.

Intel Cluster Studio

Several versions of Intel Parallel Studio Cluster Edition are installed.

mdlsearch ^intel/[12]

Example: Start an INTEL-MPI/OpenMP program on 4 chunks (2 MPI-processes per chunk) and 8 OMP threads per process. Refer to $PC2SW/examples for template job-scripts.

ccsalloc --res=rset=4:ncpus=16:mpiprocs=2:ompthreads=8 impi.sh ./program
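The resource string in the example above can be derived from the job layout. The following is a minimal sketch, assuming that every core of a chunk is used, so that ncpus equals mpiprocs times ompthreads; build_rset is a hypothetical helper, not a CCS command, and the script only prints the resulting command line.

```shell
#!/bin/sh
# Build a ccsalloc resource string for a hybrid MPI/OpenMP job.
# build_rset is a hypothetical helper; the rset syntax is copied from the
# example on this page. Assumption: ncpus = mpiprocs * ompthreads.
build_rset() (
  chunks=$1; mpiprocs=$2; ompthreads=$3
  ncpus=$((mpiprocs * ompthreads))   # cores per chunk = processes x threads
  echo "rset=${chunks}:ncpus=${ncpus}:mpiprocs=${mpiprocs}:ompthreads=${ompthreads}"
)

# 4 chunks, 2 MPI processes per chunk, 8 OpenMP threads per process
echo "ccsalloc --res=$(build_rset 4 2 8) impi.sh ./program"
```

If a job deliberately leaves cores idle, set ncpus by hand instead of deriving it.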

The Intel® MPI Library introduces thread safe libraries at level MPI_THREAD_MULTIPLE. Several threads can make the Intel MPI library calls simultaneously. Use the compiler driver -mt_mpi option to link the thread safe version of the Intel MPI Library.

Set the I_MPI_DEBUG environment variable to "4". The Intel MPI Library will report process pinning information.


Jupyter notebooks

See Jupyter.


LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.

It has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.

LAMMPS runs in parallel using message-passing techniques with a spatial decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.

module add lammps

Execute an example instance of $LAMMPS_EXAMPLEDIR:

ccsalloc -n 16 --stdin=in.crack  ompi -- lmp_mpi

LAMMPS 15May2015 is installed with the CUDA package.

lmp_mpi_cuda -c on


LIKWID is a tool suite of command-line applications for performance-oriented programmers.

module load devel/likwid

The uncore counters might not be accessible. Please contact us at PC2-support if you need uncore counters.


MAGMA, Matrix Algebra on GPU and Multicore Architectures, version 1.7.0 is installed.

module add magma


Only available to members of Paderborn University.

We provide several releases. Please try mdlsearch matlab to see the available releases.

module add matlab loads the highest release.

Licenses for the Parallel Toolbox and Distributed Computing Server are available.

NOTE: Our DCE license is working only with Matlab 2016b or lower. Newer Matlab versions cannot use DCE.

If you want to use the Distributed Computing Server, you have to load the profile $PC2SW/examples/MATLAB/MDCE.settings.

It is necessary that you have specified your default group and your email address in the CCS rc-file $HOME/.ccsrc/uirc.

To create this file, call $CCS/bin/ccsgenrcfile. Refer also to the related CCS FAQ.

Examples for parallel/distributed and GPU usage can be found in $PC2SW/examples/MATLAB

If you are using MATLAB on a front-end node, please do not run heavy compute jobs locally. Please use "nice -n 10 matlab"

For more information, have a look at: http://www.mathworks.de/products/distriben/


Installed versions are 2.11 and 2.10b2 (IB, SMP, CUDA, PHI).

InfiniBand and single-threaded NAMD processes

example (assuming a node has at least 16 cores):

ccsalloc -n 8 ./charm.sh

File: charm.sh


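# NODES, NODELIST, CHARMRUN, PROG, IFILE and OFILE must be set before this point
# (their definitions are not shown here). The inner loop requires bash.
PPE=16  # processes per node; we assume 16 cores per node
NP=0

# Write one "host" line per process and count the processes.
echo "group main ++shell ssh" > $NODELIST
for NODE in $NODES; do
  for (( c=1; c<=${PPE}; c++ )); do
    echo host $NODE >> $NODELIST
    NP=`expr $NP + 1`
  done
done

${CHARMRUN} ++p ${NP} ++nodelist ${NODELIST} ${PROG} ${IFILE} > ${OFILE}

# Show the end of the NAMD output.
tail -n 30 ${OFILE}

exit 0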
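The nodelist logic of charm.sh can be tried out without a cluster. The sketch below writes the same file format for a list of node names and prints the process count to pass to ++p; write_nodelist and the node names node01 and node02 are hypothetical, and the loop is written in plain POSIX sh rather than with the bash-style counter.

```shell
#!/bin/sh
# Write a charmrun nodelist with PPE "host" lines per node.
# write_nodelist and the node names are hypothetical; the file format
# follows charm.sh on this page.
write_nodelist() (
  file=$1; ppe=$2; shift 2
  echo "group main ++shell ssh" > "$file"
  np=0
  for node in "$@"; do
    c=1
    while [ "$c" -le "$ppe" ]; do
      echo "host $node" >> "$file"
      np=$((np + 1))
      c=$((c + 1))
    done
  done
  echo "$np"    # total number of processes, to be passed to ++p
)

list=$(mktemp)
np=$(write_nodelist "$list" 16 node01 node02)
echo "++p $np"
rm -f "$list"
```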

InfiniBand and multi-threaded NAMD processes

The Linux-x86_64-ibverbs-smp binaries are based on "smp" builds of Charm++ that can be used with multiple threads on either a single machine like a multicore build, or across a network. SMP builds combine multiple worker threads and an extra communication thread into a single process. Since one core per process is used for the communication thread SMP builds are typically slower than non-SMP builds. The advantage of SMP builds is that many data structures are shared among the threads, reducing the per-core memory footprint when scaling large simulations to large numbers of cores.

example (assuming a node has at least 16 cores):

ccsalloc -n 8 ./charm-smp.sh

File: charm-smp.sh


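# NODES, NODELIST, CHARMRUN, PROG, IFILE, OFILE and PPN must be set before this
# point (their definitions are not shown here).
# PPN is the number of cores per node minus one, because of the communication thread.
NP=0

echo "group main ++shell ssh" > $NODELIST
for NODE in $NODES; do
  echo host $NODE >> $NODELIST
  NP=`expr $NP + 1`
done
# Total number of worker threads: number of nodes times PPN.
NP=`expr $NP \* $PPN`

$CHARMRUN ++p ${NP} ++nodelist ${NODELIST} ${PROG} ++ppn ${PPN} ${IFILE} > ${OFILE}

exit 0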
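The arithmetic behind the ++p and ++ppn values in charm-smp.sh can be checked in isolation. This is a small sketch, assuming one NAMD process per node with one core reserved for its communication thread; smp_layout is a hypothetical helper and the node and core counts are example inputs.

```shell
#!/bin/sh
# Compute the charmrun thread layout for an SMP build.
# smp_layout is a hypothetical helper. Assumption: one process per node,
# and one core of each node is reserved for the communication thread.
smp_layout() (
  nodes=$1; cores=$2
  ppn=$((cores - 1))                 # worker threads per process
  echo "++p $((nodes * ppn)) ++ppn $ppn"
)

# 8 nodes with 16 cores each
smp_layout 8 16
```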

InfiniBand, SMP, nVIDIA GPU

example (uses one core and the Tesla card of a GPU node):

module add namd/2.11/ib-smp-cuda
ccsalloc --res=rset=1:tesla=t:gpus=1:ncpus=1:mem=4g:vmem=85g ./run_namd.sh 1

File: run_namd.sh

# module add namd/...
# call "run_namd <PPE>"
# NODES, NODELIST, CHARMRUN, PPE, IFILE and OFILE must be set before the loop
# (their definitions are not shown here). The inner loop requires bash.

PROG=`which namd2`
ENV_SCRIPT=`which namd_env_script.sh`

cd $HOME
NP=0
echo "group main ++shell ssh" > $NODELIST
for NODE in $NODES; do
  for (( c=1; c<=${PPE}; c++ )); do
    echo host $NODE >> $NODELIST
    NP=`expr $NP + 1`
  done
done
$CHARMRUN ++p ${NP} ++verbose ++nodelist ${NODELIST} ++runscript ${ENV_SCRIPT} ${PROG} +idlepoll ${IFILE} > ${OFILE}
exit 0

Note: In this example, the NAMD process uses two cores.


Nek5000 is a fast and scalable high-order solver for computational fluid dynamics. It can be used by loading the environment module:

 module load cae/Nek5000/19.0-intel-2020b


Several versions are installed. Try mdlsearch netcdf.

 module add data/netcdf data/hdf5

NVIDIA HPC SDK

The NVIDIA HPC SDK includes compilers, libraries and software tools for developing HPC applications. It can be used by loading the environment module:

module load pgi/nvhpc/21.7

Sample code and job script can be found in $PC2SW/examples/nvhpc.


Access is restricted. Please apply for access. Installed versions are 6.0 and 6.3R2, both built with OpenMPI. Refer to $PC2SW/examples for an example run script.


Several versions are installed. Try mdlsearch openblas.


Several versions are installed. Try mdlsearch openfoam. An example job script can be found in $PC2SW/examples.


Several versions of OpenMPI are installed (try: mdlsearch openmpi). For more information about the installation of the OpenMPI version execute ompi_info.

Example: Start an OpenMPI/OpenMP program on 4 chunks (2 MPI-processes per chunk) and 8 OMP threads per process.

ccsalloc --res=rset=4:ncpus=16:mpiprocs=2:ompthreads=8 ompi -- ./program

If you want to use the Intel-based version 1.10.2, try the following:

module add openmpi/intel/1.10.2_mt
ccsalloc -I --res=rset=2:ncpus=16:mpiprocs=1:ompthreads=16,place=scatter:excl ompi -V intel/1.10.2_mt --map-by node -- ./program


OpenSpeedShop versions 2.0.2 and 2.1 are installed. User guides are located in $PC2SW/OpenSpeedShop.


Access is restricted. Please apply for access.

To use the latest version of ORCA

module load orca/5.0.3

After loading the ORCA module, the manual can be found in the directory $ORCA_PATH.

A template Slurm job script ($PC2SW/examples/orca.sh) is provided for submitting ORCA computations:

orca.sh orca_input_file[.inp] walltime [ORCA-version] [xTB-version]
  • orca_input_file[.inp] is the name of the ORCA input file (the .inp extension is optional).
  • walltime is the compute walltime.
  • [ORCA-version] is the ORCA version (optional).
  • [xTB-version] is the xTB version (optional).

For example, the following command submits a calculation for caffeine.inp with the walltime of 2 hours, the ORCA version 5.0.3 and the xTB version 6.2.3:

orca.sh caffeine.inp 2h 503 623
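Judging from this example, the version arguments are the dotted version numbers with the dots removed (5.0.3 becomes 503, 6.2.3 becomes 623). The sketch below builds the command line under that assumption; compact_version and orca_submit_cmd are hypothetical helpers, and the script only prints the command instead of submitting anything.

```shell
#!/bin/sh
# Build the orca.sh command line from dotted version numbers.
# Both helpers are hypothetical. Assumption (inferred from the example on
# this page): orca.sh expects versions with the dots removed.
compact_version() { echo "$1" | tr -d '.'; }

orca_submit_cmd() (
  input=$1; walltime=$2; orca=$3; xtb=$4
  echo "orca.sh $input $walltime $(compact_version "$orca") $(compact_version "$xtb")"
)

orca_submit_cmd caffeine.inp 2h 5.0.3 6.2.3
```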

More example calculations can be found in $PC2SW/examples/orca_xtb.


Several ParaView versions are installed.

mdlsearch paraview


PETSc (Portable, Extensible Toolkit for Scientific Computation) versions 3.6.1 and 3.8.3 are installed.

module add petsc


PGI Compiler Suite (C, C++, Fortran) with OpenACC and support for OpenMPI, netcdf, and nVidia is installed.

To search for all available PGI modules use

 mdlsearch pgi


 module load pgi/compiler/20.1 system/CUDA/10.2.89-GCC-8.3.0
 pgcc -ta=tesla:cc35 -o example $PC2SW/examples/OpenACC/example.c
 ccsalloc -I -t5m --res=rset=1:ncpus=1:tesla=1 example

The target codes for the GPUs are:

GPU          Compiler Target
Tesla K20    -ta=tesla:cc35
GTX1080      -ta=tesla:cc60
RTX2080      -ta=tesla:cc75
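The table can be turned into a small lookup for build scripts. This is only a sketch: pgi_target is a hypothetical helper, the GPU names must be spelled as in the table, and the printed pgcc line reuses the example file name from above without running the compiler.

```shell
#!/bin/sh
# Map a GPU model from the table on this page to its PGI target flag.
# pgi_target is a hypothetical helper, not part of the PGI tools.
pgi_target() {
  case $1 in
    "Tesla K20") echo "-ta=tesla:cc35" ;;
    GTX1080)     echo "-ta=tesla:cc60" ;;
    RTX2080)     echo "-ta=tesla:cc75" ;;
    *)           echo "no target known for: $1" >&2; return 1 ;;
  esac
}

echo "pgcc $(pgi_target GTX1080) -o example example.c"
```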

Try also pgaccelinfo on a GPU node

To get more information use

module help pgi/compiler/20.1

Documentation for PGI 20.1 is available at: http://www.pgroup.com/resources/docs.htm


Parallel Linear Algebra Software for Multicore Architectures (PLASMA) version 2.8.0 compiled with Intel Compiler is installed.

module add plasma

Quantum ESPRESSO

Quantum ESPRESSO is a suite for first-principles electronic-structure calculations and materials modeling. The latest version 6.8 built with both the foss-2021a and intel-2021a toolchains is installed on OCULUS.

  • to use Quantum ESPRESSO built with the foss-2021a toolchain please use
module load chem/QuantumESPRESSO/6.8-foss-2021a
  • to use Quantum ESPRESSO built with the intel-2021a toolchain please use
module load chem/QuantumESPRESSO/6.8-intel-2021a

An example for molecular dynamics simulation of silicon by using Quantum ESPRESSO with 2 compute nodes (32 CPU cores in total) on OCULUS can be found in $PC2SW/examples/QuantumESPRESSO.


Versions 3.0.0, 3.2.1, and 3.2.4 of the R-project are installed.

See also http://cran.r-project.org/web/views/HighPerformanceComputing.html


Several Scalasca versions are installed. A user guide is located in $PC2SW/SCALASCA.

mdlsearch scalasca


Installed version of Scilab is 5.4.1.

module add scilab


Documentation is in $PC2SW/examples/SINGULARITY

module add singularity


The USPEX license forbids distribution of the software. Thus, we are not allowed to install it for all users on the cluster. The following instructions show how to run USPEX 10.4 with VASP 5.4.4 on Oculus. The mode for submission of jobs is whichCluster=0, which assumes that USPEX is running in a job. If you need one of the other submission modes, i.e., whichCluster=1 or whichCluster=2 (for running multiple VASP-calculations in parallel), please let us know.

Compile VASP 5.4.4 with instructions at https://wikis.uni-paderborn.de/pc2doc/Oculus-Software-VASP. The path in which the VASP-binaries (vasp_std, ...) are located is denoted as VASPBINDIR in the following, i.e., vasp_std is at $VASPBINDIR/vasp_std.

Installing USPEX

    • Download and unpack USPEX 10.4.
    • Run "bash install.sh".
    • Choose 2 for terminal installation.
    • Read and agree to the license conditions.
    • Enter a directory as the installation path, e.g. /scratch/hpc-prf-PROJECTNAME/USPEX. This directory is referred to as USPEXINSTALLDIR below.
    • Wait for the installation to finish.
    • Check that the file $USPEXINSTALLDIR/install/application/USPEX exists.

Testing the Basic USPEX Installation

  • To test, please try:
module load toolchain/foss/2018b
module load lang/Python/3.6.6-foss-2018b
  • You should get output like:
[rschade@fe1 EX01-3D_Si_vasp]$ $USPEXINSTALLDIR/application/USPEX 
/bin/bash: synclient: Kommando nicht gefunden.
*                                                       *
  _|    _|     _|_|_|   _|_|_|     _|_|_|_|   _|      _| 
  _|    _|   _|         _|    _|   _|           _|  _|   
  _|    _|     _|_|     _|_|_|     _|_|_|         _|     
  _|    _|         _|   _|         _|           _|  _|   
    _|_|     _|_|_|     _|         _|_|_|_|   _|      _| 
*                                                       *
** USPEX v10.4                           Oganov's Lab! **
  • You can ignore the following errors.

Installing Python-related Stuff

module load toolchain/foss/2018b
module load lang/Python/3.6.6-foss-2018b
pip3 install --user numpy
git clone https://github.com/spglib/spglib.git
cd spglib/python 
python3 setup.py install --user

Testing for USPEX+VASP

  • Copy $USPEXDIR/application/archive/examples/EX01-3D_Si_vasp.tgz to some directory and unpack it.
  • Change to the directory EX01-3D_Si_vasp.
  • Change the lines between "% commandExecutable" and "% EndExecutable" to:
mpirun -genv I_MPI_DEBUG=4 -genvall -machinefile $CCS_NODEFILE vasp_std > log_VASP
  • Change whichCluster from 1 to 0.
  • Change numParallelCalcs from 30 to 1.
  • Create job.sh as a job script with the content:
#CCS --stdout=uspex_test.out
#CCS --stderr=uspex_test.err
#CCS --res=rset=16:ncpus=1:mpiprocs=1:ompthreads=1 
#CCS -t 24h

module purge
module add default-environment
module load lang/Python/3.6.6-foss-2018b
module load toolchain/foss/2018b
module load intel/19.0.4_compilers


export PATH=$PATH:$USPEXINSTALLDIR/application/archive:$VASPBINDIR:.

export USPEXPATH=$USPEXINSTALLDIR/application/archive/src
which python3
which USPEX
which vasp_std
rm still_running
bash EX01-3D_Si_vasp.sh
  • Make it executable: chmod +x job.sh
  • Submit it as a job.
  • You should see results in results1/OUTPUT.txt.
  • For actual calculations, you should adapt the resource settings of your job.
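The three which calls in job.sh print the resolved paths but do not stop the job if a tool is missing. A possible stricter variant is sketched below; check_tools is a hypothetical helper that reports each tool and returns a non-zero status if any of them is not found in PATH.

```shell
#!/bin/sh
# Pre-flight check: verify that the required executables are in PATH.
# check_tools is a hypothetical helper; the tool names are the ones that
# job.sh on this page looks up with "which".
check_tools() (
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  exit "$missing"
)

check_tools python3 USPEX vasp_std || echo "fix PATH before starting USPEX" >&2
```

In a job script, the last line could end with exit 1 instead of a message so that the job aborts early.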


Access is restricted. Please apply for access. Several versions are installed.

mdlsearch turbomole

A template job script can be found in $PC2SW/examples.


Several versions are installed.

mdlsearch valgrind


Please note that, due to the licensing model of VASP, we cannot provide a compiled version to every user.

Please apply for a license.

As for Noctua (Guide to compile VASP on Noctua), we provide a guide to compile VASP for Oculus.


vTune is part of the installed Intel Parallel Studio versions. The sampling driver to enable the Hardware Event-Based Sampling (EBS) analysis is available on the frontend fe2 and some compute nodes (request OpenCCS resource vtune).


module add intel/19.0.1
ccsalloc --res=rset=1:ncpus=16:vtune=true <yourscript>

To start the GUI on the frontend:

module add intel/19.0.1


xTB is a semiempirical extended tight-binding program package. It can be used by loading the environment module:

module load chem/xtb/6.2.3-foss-2020b

xTB uses OpenMP parallelization. For the calculation of large molecular systems the stack memory size may need to be increased. The following settings can be added in your job script to increase the stack size for large calculations.

ulimit -s unlimited
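A job-script fragment for large xTB runs might look like the sketch below. Only ulimit -s unlimited comes from this page; OMP_STACKSIZE and OMP_NUM_THREADS are standard OpenMP environment variables, but the values 4G and 16 are assumptions that should be adapted to the job.

```shell
#!/bin/sh
# Environment for large xTB calculations (sketch).
# Only "ulimit -s unlimited" is taken from this page; the OpenMP values
# below are assumed examples, not site recommendations.
ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit" >&2
export OMP_STACKSIZE=4G      # stack size of each OpenMP thread (assumed value)
export OMP_NUM_THREADS=16    # assumed: one thread per core of a 16-core node
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_STACKSIZE=$OMP_STACKSIZE"
```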