
How to find Software on Noctua

The path to the installed software is stored in the environment variable PC2SW. To check whether a piece of software is already available, we recommend first searching the module files with module available. You can also look into the software directory with ls $PC2SW. If the software you need is not installed, you have several options:

  • If the software you need is available in EasyBuild, then we can quickly install it for you. You can find the software supported by EasyBuild at [1].
  • Prepare a Singularity container on your local computer and copy it to Noctua. (You can also convert a Docker container to a Singularity container.)
  • We provide extra support for FPGA-related projects and projects in the field of computational chemistry and physics. Please let us know if you need any software or support in these fields (Mail to Special Advisors).
  • You can compile it yourself on the cluster. We are happy to help you if you encounter problems (Mail to PC2-support).
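
The two lookups described above (module files and the $PC2SW directory) can be wrapped in small helper functions. This is only a sketch: the function names find_module and find_in_pc2sw are hypothetical and not part of the cluster setup, and the redirection assumes that module available writes its listing to standard error, as it does on many installations.

```shell
# Search the module listing for a name, ignoring case.
# Assumption: `module available` prints to stderr, so stderr is
# merged into stdout before filtering.
find_module() {
    local term="$1"
    module available 2>&1 | grep -i -- "$term"
}

# Search the top level of the software directory $PC2SW for a name,
# ignoring case. Prints matching entries, one per line.
find_in_pc2sw() {
    local term="$1"
    ls "$PC2SW" | grep -i -- "$term"
}
```

For example, find_module cp2k lists all module files whose name contains "cp2k", and find_in_pc2sw orca lists matching entries in the software directory.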

Chemistry/Physics Software


CP2K

Hybrid MPI-OpenMP version (psmp) compiled with the Intel compiler

The latest hybrid MPI-OpenMP-version of CP2K (version 7.1) is installed on Noctua. The libraries enabled while building the cp2k.psmp executable are:

cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm max_contr=4 mkl

If a CP2K version with additional libraries is needed for your simulation, please contact PC2-support.

To use CP2K (version 7.1) please load the following module in your job script:

module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/7.1

In addition, the following environment variables are important for running CP2K simulations:

  • export OMP_NUM_THREADS=1 to disable OpenMP
  • export OMP_NUM_THREADS=[NUMBER_OF_THREADS] to use [NUMBER_OF_THREADS] threads per MPI rank
  • If you experience segmentation faults with OMP_NUM_THREADS greater than one, then please also use:
    • export OMP_STACKSIZE=100m, which sets the stack size for individual threads to 100 MB.
    • Depending on your calculation, you might need more than 100 MB for each thread. You can try larger values in your calculations.

Here is an example job script:

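#!/bin/bash
#SBATCH -J JobName
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N [NUMBER_OF_NODES]
#SBATCH --ntasks-per-node=[NUMBER_OF_MPI_RANKS_PER_NODE]
#SBATCH -t 00:30:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/7.1

export OMP_NUM_THREADS=[NUMBER_OF_THREADS]
export OMP_STACKSIZE=100m

# standard output goes to "out", error messages go to "err"
mpirun -genv I_MPI_DEBUG=4 cp2k.psmp molecule.inp > out 2> err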
  • Please replace [Project_Name] and [Job_Partition] with the project that you belong to and a suitable partition for this job.
  • For the best compute efficiency, [NUMBER_OF_NODES], [NUMBER_OF_MPI_RANKS_PER_NODE] and [NUMBER_OF_THREADS] should be chosen so that
    • [NUMBER_OF_NODES] times [NUMBER_OF_MPI_RANKS_PER_NODE] is the square of an integer.
    • [NUMBER_OF_MPI_RANKS_PER_NODE] times [NUMBER_OF_THREADS] equals the number of CPU cores per node (e.g. 40 on one compute node in Noctua).
  • To find good choices of [NUMBER_OF_NODES], [NUMBER_OF_MPI_RANKS_PER_NODE] and [NUMBER_OF_THREADS], you can also use the MPI/OpenMP-hybrid Execution planner.
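
The two conditions above can also be checked by hand with a few lines of shell. The following is a minimal sketch, not the Execution planner itself: the function name cp2k_layouts is hypothetical, and the default of 40 cores per node is taken from the Noctua example above.

```shell
# Print hybrid MPI/OpenMP layouts for a given number of nodes.
# A layout is printed when
#   ranks_per_node * threads == cores per node, and
#   nodes * ranks_per_node is the square of an integer.
# $1: number of nodes, $2: CPU cores per node (default 40)
cp2k_layouts() {
    local nodes="$1"
    local cores="${2:-40}"
    local ranks=1
    local threads total root
    while [ "$ranks" -le "$cores" ]; do
        if [ $((cores % ranks)) -eq 0 ]; then
            threads=$((cores / ranks))
            total=$((nodes * ranks))
            # integer square root by linear search (totals are small)
            root=0
            while [ $(((root + 1) * (root + 1))) -le "$total" ]; do
                root=$((root + 1))
            done
            if [ $((root * root)) -eq "$total" ]; then
                echo "nodes=$nodes ranks_per_node=$ranks threads=$threads total_ranks=$total"
            fi
        fi
        ranks=$((ranks + 1))
    done
}
```

Calling cp2k_layouts 2 prints the layouts with 2 ranks per node and 20 threads (4 ranks in total) and with 8 ranks per node and 5 threads (16 ranks in total). Which of the valid layouts is fastest still has to be tested for your workload.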

FPGA-Accelerated CP2K for DFT with the Submatrix Method

  • details of the implementation in
  • To switch on FPGA usage, use the option SUBMATRIX_SIGN_METHOD NEWTONSCHULZ_fpga (or SUBMATRIX_SIGN_METHOD NEWTONSCHULZ_fpga_debug for debug details during the calculation).
  • The calculation must be done with one MPI rank per socket so that each MPI rank can access one of the two FPGAs in a node.
  • An example input file can be found at /cm/shared/apps/pc2/CHEM_PHYS_SW/CP2K/FPGA/cp2k/H2O-dft-ls.inp on Noctua.
  • Performance:
    • For the above example (water, DZVP basis) each FPGA (Intel Stratix 10 GX 2800) achieves a floating-point performance of 2.8-3.2 TFlops (at a power usage of 90-95 W). Due to the massively-parallel design of the submatrix method, many FPGAs can be used efficiently. The transfers from the CPU to the FPGAs create some overhead and we are actively working on reducing it.
  • Example jobscript:
#SBATCH -p fpga
#SBATCH -A pc2-mitarbeiter
#SBATCH -t 2:00:00
#SBATCH --constraint=19.4.0_hpc

module use /cm/shared/apps/pc2/CHEM_PHYS_SW/modules
module load chem/CP2K/submatrix_fpga
export OMP_PROC_BIND=master
export OMP_PLACES=sockets
mpirun --bind-to socket --map-by ppr:1:socket --report-bindings -mca btl ^openib -mca pml cm --output-filename output cp2k.psmp H2O-dft-ls.inp


Gaussian

  • Access is restricted. Please apply for access.
  • Use module load g03, module load g09 or module load g16 to load the newest revision.
  • Useful information for running Gaussian computation with the checkpoint file.


  • GROMACS 2019:
    • Use module load bio/GROMACS/2019-foss-2018b
    • Examples can be found under $PC2SW/examples/

i-PI v2

  • please contact PC2-support because the setup depends on the chosen backend and size of simulation


Octopus

  • 10.0
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/10_0
    • details: octopus 10.0, max-dim=3 mpi sse2 avx libxc4 metis mpi2
    • Testsuite successful (Result "Everything seems to be OK")
  • 9.1
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/9_1
    • details: octopus 9.1, max-dim=3 mpi sse2 avx metis mpi2
  • 8.4
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/8_4
    • details: octopus 8.4, max-dim=3 mpi sse2 avx metis mpi2

Orca ($PC2SW/ORCA)

Access is restricted. Please apply for access.

To use the latest version of ORCA, load the module:

module load orca/4.2.1

After loading the ORCA module, the manual can be found in the directory $ORCA_PATH.

A template Slurm jobscript is in $PC2SW/examples. This script can be used to submit an ORCA computation with the arguments orca_input_file walltime_in_minutes:
  • orca_input_file is the name of the ORCA input file.
  • walltime_in_minutes is the compute time given in minutes.

Other Slurm settings can be automatically generated by using


SALMON

  • 1.2.1
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/salmon/1.2.1
    • example jobscript at $PC2SW/CHEM_PHYS_SW/SALMON/1_2_1/SALMON-v.1.2.1/
    • results of the test suite at $PC2SW/CHEM_PHYS_SW/SALMON/1_2_1/SALMON-v.1.2.1/tests.log


Turbomole

  • Access is restricted. Please apply for access.
  • Use module spider turbomole to find the available versions.



GPAW

  • 21.1.0
    • example jobscript:
#SBATCH -J gpawtest
#SBATCH -A pc2-mitarbeiter
#SBATCH -p batch
#SBATCH -t 02:00:00

module reset
module load chem/GPAW/21.1.0-foss-2020a-ASE-3.21.1
mpirun --report-bindings -mca btl self,vader -mca mtl psm2 python3
    • If you plan to perform calculations with more than 8 compute nodes, please let us know so that we can prepare a version that is better optimized for this case.
    • details:
      • gpaw-setups-0.9.20000
      • MPI enabled yes (OpenMPI)
      • scalapack yes
      • Elpa yes; version: 2020.11.001
      • FFTW yes
    • GPAW tests: successful
    • GPAW benchmarks:


Elk

  • 6.8.4
    • example jobscript:
#SBATCH -J elktest
#SBATCH -A [account name]
#SBATCH -p batch
#SBATCH --ntasks-per-node=[MPI-RANKS per Node]
#SBATCH -t 02:00:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/ELK/6_8_4


mpirun -genv I_MPI_DEBUG=4 elk
    • Elk supports three levels of parallelism at the same time (see also section 4.1.1):
      • MPI: set MPI ranks per node with sbatch argument --ntasks-per-node
      • Internal OpenMP: set threads with export OMP_NUM_THREADS= or the input option maxthd
      • Threading in BLAS/LAPACK libraries: set threads with export MKL_NUM_THREADS= or the input option maxthdmkl
    • The best choice of parameters for the parallelization has to be determined for your workload. The product of OMP_NUM_THREADS, MKL_NUM_THREADS and --ntasks-per-node should equal the number of CPU cores on a node, i.e. 40 in the case of Noctua. We recommend first varying the number of MPI ranks and OMP_NUM_THREADS while not using threading inside the BLAS/LAPACK libraries (i.e., MKL_NUM_THREADS=1 or maxthdmkl=1).
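
The rule that the three thread counts must multiply to the number of cores can be turned into a small helper that derives OMP_NUM_THREADS from the other two values. This is a sketch under assumptions: the function name elk_thread_settings is hypothetical, and the defaults (MKL_NUM_THREADS=1, 40 cores per node) follow the recommendation and the Noctua value given above.

```shell
# Print the export lines for a given number of MPI ranks per node so that
#   ranks_per_node * OMP_NUM_THREADS * MKL_NUM_THREADS == cores per node.
# $1: MPI ranks per node (the value passed to --ntasks-per-node)
# $2: MKL threads per OpenMP thread (default 1)
# $3: CPU cores per node (default 40)
elk_thread_settings() {
    local ranks="$1"
    local mkl="${2:-1}"
    local cores="${3:-40}"
    local per_rank=$((ranks * mkl))
    # Refuse layouts that would leave cores unused or oversubscribed.
    if [ $((cores % per_rank)) -ne 0 ]; then
        echo "ranks * MKL threads = $per_rank does not divide $cores cores" >&2
        return 1
    fi
    echo "export OMP_NUM_THREADS=$((cores / per_rank))"
    echo "export MKL_NUM_THREADS=$mkl"
}
```

In a jobscript with --ntasks-per-node=10, the line eval "$(elk_thread_settings 10)" would set OMP_NUM_THREADS=4 and MKL_NUM_THREADS=1 before the mpirun call.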


LIGGGHTS

  • 3.8.0
    • example jobscript:
#SBATCH -J liggghtstest
#SBATCH -A [account name]
#SBATCH -p batch
#SBATCH -t 02:00:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/LIGGGHTS-PUBLIC/3.8.0


mpirun --report-bindings lmp_auto < input

Math and Data Analysis Software

  • Matlab (module load matlab/2018a)
    • Only available for members of Paderborn University.
  • Python 2.7/3.7
  • Julia (module load lang/Julia/1.4.2-linux-x86_64)
    • Usage in Jupyter Notebooks on Noctua
      • Please note that on Oculus you can use the Jupyterspawner, which automates most of the steps below.
      • initial setup:
        • start julia and enter using Pkg, then Pkg.add("IJulia")
        • run module load tools/IPython/7.2.0-foss-2018b-Python-3.6.6
        • run jupyter-notebook --no-browser --generate-config
        • run ipython -c "from notebook.auth import passwd; passwd()", copy the result, i.e. a string that looks like sha1:255adaa93208:9c18f63e8fee61f297a417cdb6444af684a979e5
        • open ~/.jupyter/ and:
          • change the line #c.NotebookApp.password = to c.NotebookApp.password = "sha1:...." with the copied string
          • change the line #c.NotebookApp.password_required = False to c.NotebookApp.password_required = True
      • Usage:
        • run module load tools/IPython/7.2.0-foss-2018b-Python-3.6.6
        • run jupyter-notebook --port=PORT --no-browser with a PORT number (1024<PORT<65536) of your choice. If no error shows up, the notebook is now running on the machine that you are logged into, i.e., one of the frontends. So please don't do long calculations there. If you would like to run a Jupyter notebook on a compute node, please contact
        • Forward the selected port to your local computer with SSH local port forwarding (ssh -L).
        • Open http://localhost:PORT in a local web browser, then select New on the right side and choose Julia 1.4.2. The resulting Jupyter notebook is a Julia notebook.
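
The port-forwarding step can be sketched as a small function to run on your local computer. The function name open_jupyter_tunnel and the host name in the usage note are hypothetical. The destination must be the same frontend on which jupyter-notebook was started.

```shell
# Forward a local port to the same port on the frontend where
# jupyter-notebook is running.
# $1: the PORT passed to jupyter-notebook
# $2: SSH destination of that frontend, e.g. user@host
open_jupyter_tunnel() {
    local port="$1"
    local dest="$2"
    # The port range follows the condition 1024 < PORT < 65536 stated above.
    if [ "$port" -le 1024 ] || [ "$port" -ge 65536 ]; then
        echo "PORT must satisfy 1024 < PORT < 65536" >&2
        return 1
    fi
    # -N: do not run a remote command, -L: local port forwarding
    ssh -N -L "${port}:localhost:${port}" "$dest"
}
```

For example, open_jupyter_tunnel 8888 user@frontend.example.org keeps the tunnel open until it is interrupted with Ctrl-C. While it is running, http://localhost:8888 on the local computer reaches the notebook.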

Container Frameworks

  • Singularity (module load singularity)
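
Converting a Docker container to a Singularity container on your local computer and copying it to the cluster could look like the following sketch. The function name docker_to_noctua, the image name and the host name in the usage note are hypothetical. The .sif file format assumes Singularity version 3 or newer on the local computer.

```shell
# Run on your local computer: build a Singularity image from a Docker
# image and copy it to the cluster.
# $1: Docker image reference, e.g. ubuntu:20.04
# $2: name of the Singularity image file to create
# $3: scp destination on the cluster, e.g. user@host:
docker_to_noctua() {
    local docker_image="$1"
    local sif_file="$2"
    local dest="$3"
    # The docker:// prefix tells singularity to pull from a Docker registry.
    singularity build "$sif_file" "docker://${docker_image}" || return 1
    scp "$sif_file" "$dest"
}
```

For example, docker_to_noctua ubuntu:20.04 ubuntu.sif user@noctua.example.org: builds ubuntu.sif and copies it to the home directory. On the cluster, the container can then be used after module load singularity, for instance with singularity exec ubuntu.sif cat /etc/os-release.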


all tools

  • module use $PC2SW/CHEM_PHYS_SW/modules
  • then see module av

Tools from Intel Parallel Studio XE

All tools from Intel Parallel Studio XE are available on Noctua:

TeX Live 2019 (LaTeX)

  • module use $PC2SW/CHEM_PHYS_SW/modules
  • module load tools/texlive_2019


Likwid

module use $PC2SW/CHEM_PHYS_SW/modules
module load tools/likwid_5.1.0
  • If you want to use Likwid in your job, please deactivate the Job-Specific-Monitoring by adding the option --collectors=off to sbatch, srun or your jobscript as
#SBATCH --collectors=off

This is necessary because both Likwid and the Job-Specific-Monitoring need access to the hardware performance counters, and issues such as incorrect values can occur if two programs access them at the same time.

  • To use the Likwid Marker API, compile your program with, for example,
 g++ -fopenmp -DLIKWID_PERFMON -L$LIKWID_LIB -I$LIKWID_INCLUDE program.cpp -o program.x -llikwid
  • You can execute it with likwid-perfctr for example with
 likwid-perfctr -C 0 -g BRANCH -m ./program.x

Programming Environments

On Noctua the following programming environments are available.

Intel Parallel Studio XE

Cray Programming Environment (Cray PE)

  • Cray Debugger (gdb4hpc)

GNU Tools


  • TotalView is an advanced debugger for HPC applications. To use TotalView on Noctua, load the module:
module load totalview/20.20.1
  • For the best performance and lowest overhead while debugging an MPI-parallel application with TotalView on Noctua, we recommend using TotalView remote debugging with the Intel MPI library. We have prepared a short tutorial for TotalView remote debugging.