Noctua-Software

How to find Software on Noctua

The path to the installed software is stored in the environment variable PC2SW. If you want to check whether some software is already available, we recommend first looking through the module files with module avail. Additionally, you can look into the software directory with ls $PC2SW (see the example below the following list of options). If the software you need is not installed, you have several options:

  • If the software you need is available in EasyBuild, then we can quickly install it for you. You can find the software supported by EasyBuild at [1].
  • Prepare a Singularity container on your local computer and copy it to Noctua. (You can also convert a Docker container to a Singularity container; see the example in the Container Frameworks section below.)
  • We provide extra support for FPGA-related projects and projects in the field of computational chemistry and physics. Please let us know if you need any software or support in these fields (Mail to Special Advisors)
  • You can compile it yourself on the cluster. We are happy to help you if you encounter problems (Mail to PC2-support).
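For example, a quick check whether a package is already installed could look like this (the package name gromacs is only used as an illustrative example):

# search the module files (module avail prints to stderr, hence 2>&1)
module avail 2>&1 | grep -i gromacs
# search the software directory
ls $PC2SW | grep -i gromacs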

Chemistry/Physics Software

CP2K

Hybrid MPI-OpenMP-version (psmp) compiled with the Intel Compiler

CP2K 7.1

WARNING: Users have experienced memory leaks with the FULL_ALL preconditioner in this version. If you encounter this problem, please use the version built with the GNU compiler described below.

The hybrid MPI-OpenMP-version of CP2K (version 7.1) is installed on Noctua. The libraries enabled while building the cp2k.psmp executable are:

cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm max_contr=4 mkl

If a CP2K version with additional libraries is needed for your simulation, please contact PC2-support.

To use CP2K (version 7.1) please load the following module in your job script:

module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/7.1

In addition, the following environment variables are important for running CP2K simulations:

  • export OMP_NUM_THREADS=1 to disable OpenMP
  • export OMP_NUM_THREADS=[NUMBER_OF_THREADS] to use [NUMBER_OF_THREADS] threads per MPI rank
  • If you experience segmentation faults with OMP_NUM_THREADS greater than one, then please also use:
    • export OMP_STACKSIZE=100m, which sets the stack size for individual threads to 100 MB.
    • Depending on your calculation, you might need more than 100 MB for each thread. You can try larger values in your calculations.

Here is an example job script:

#!/bin/bash
#SBATCH -J JobName
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N [NUMBER_OF_NODES]
#SBATCH --ntasks-per-node=[NUMBER_OF_MPI_RANKS_PER_NODE]
#SBATCH -t 00:30:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/7.1

export OMP_NUM_THREADS=[NUMBER_OF_THREADS]
export OMP_STACKSIZE=100m

mpirun -genv I_MPI_DEBUG=4 cp2k.psmp molecule.inp > out 2> err
  • Please replace [Project_Name] and [Job_Partition] with the project that you belong to and a suitable partition for this job.
  • For the best compute efficiency, [NUMBER_OF_NODES], [NUMBER_OF_MPI_RANKS_PER_NODE] and [NUMBER_OF_THREADS] should be chosen so that (an example is given below this list)
    • [NUMBER_OF_NODES] times [NUMBER_OF_MPI_RANKS_PER_NODE] is the square of an integer.
    • [NUMBER_OF_MPI_RANKS_PER_NODE] times [NUMBER_OF_THREADS] equals the number of CPU cores per node (e.g. 40 per compute node on Noctua)
  • To find good choices of [NUMBER_OF_NODES], [NUMBER_OF_MPI_RANKS_PER_NODE] and [NUMBER_OF_THREADS] you can also use the MPI/OpenMP-hybrid Execution planner (https://github.com/cp2k/cp2k/tree/master/tools/plan_mpi_omp).
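As an illustration (an assumed layout, not a tuned recommendation): with 4 nodes and 4 MPI ranks per node the total number of ranks is 16, i.e. the square of 4, and 10 OpenMP threads per rank fill the 40 cores of each node:

#SBATCH -N 4
#SBATCH --ntasks-per-node=4

export OMP_NUM_THREADS=10   # 4 ranks x 10 threads = 40 cores per node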

Hybrid MPI-OpenMP-version (psmp) compiled with the GNU Compiler

CP2K 7.1 and 8.1

The hybrid MPI-OpenMP-version of CP2K (version 7.1 and 8.1) is installed on Noctua. The libraries enabled while building the cp2k.psmp executable are:

for CP2K 7.1: cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm spglib

for CP2K 8.1: cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm spglib libvori libbqb

If a CP2K version with additional libraries is needed for your simulation, please contact PC2-support.

To use this CP2K version please load the following module in your job script:

module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/7.1_gnu

or

module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/8.1_gnu

In addition, the following environment variables are important for running CP2K simulations:

  • export OMP_NUM_THREADS=1 to disable OpenMP
  • export OMP_NUM_THREADS=[NUMBER_OF_THREADS] to use [NUMBER_OF_THREADS] threads per MPI rank
  • If you experience segmentation faults with OMP_NUM_THREADS greater than one, then please also use:
    • export OMP_STACKSIZE=100m, which sets the stack size for individual threads to 100 MB.
    • Depending on your calculation, you might need more than 100 MB for each thread. You can try larger values in your calculations.

Here is an example job script:

#!/bin/bash
#SBATCH -J JobName
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N [NUMBER_OF_NODES]
#SBATCH --ntasks-per-node=[NUMBER_OF_MPI_RANKS_PER_NODE]
#SBATCH -t 00:30:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/7.1_gnu
#or
#module load chem/CP2K/8.1_gnu

export OMP_NUM_THREADS=[NUMBER_OF_THREADS]
export OMP_STACKSIZE=100m

mpirun --report-bindings cp2k.psmp molecule.inp > out 2> err
  • Please replace [Project_Name] and [Job_Partition] with the project that you belong to and a suitable partition for this job.
  • For the best compute efficiency, [NUMBER_OF_NODES], [NUMBER_OF_MPI_RANKS_PER_NODE] and [NUMBER_OF_THREADS] should be chosen so that
    • [NUMBER_OF_NODES] times [NUMBER_OF_MPI_RANKS_PER_NODE] is the square of an integer.
    • [NUMBER_OF_MPI_RANKS_PER_NODE] times [NUMBER_OF_THREADS] equals the number of CPU cores per node (e.g. 40 per compute node on Noctua)
  • To find good choices of [NUMBER_OF_NODES], [NUMBER_OF_MPI_RANKS_PER_NODE] and [NUMBER_OF_THREADS] you can also use the MPI/OpenMP-hybrid Execution planner (https://github.com/cp2k/cp2k/tree/master/tools/plan_mpi_omp).

FPGA-Accelerated CP2K for DFT with the Submatrix Method

  • details of the implementation in
  • To switch on FPGA usage, use the option SUBMATRIX_SIGN_METHOD NEWTONSCHULZ_fpga (or SUBMATRIX_SIGN_METHOD NEWTONSCHULZ_fpga_debug for debug details during the calculation).
  • The calculation must be done with one MPI rank per socket so that each MPI rank can access one of the two FPGAs in a node.
  • An example input file can be found at /cm/shared/apps/pc2/CHEM_PHYS_SW/CP2K/FPGA/cp2k/H2O-dft-ls.inp on Noctua.
  • Performance:
    • For the above example (water, DZVP basis) each FPGA (Intel Stratix 10 GX 2800) achieves a floating-point performance of 2.8-3.2 TFlops (at a power usage of 90-95 W). Due to the massively-parallel design of the submatrix method, many FPGAs can be used efficiently. The transfers from the CPU to the FPGAs create some overhead and we are actively working on reducing it.
  • Example jobscript:
#!/bin/bash
#SBATCH -p fpga
#SBATCH -A pc2-mitarbeiter
#SBATCH -N 2
#SBATCH -t 2:00:00
#SBATCH --constraint=19.4.0_hpc

module use /cm/shared/apps/pc2/CHEM_PHYS_SW/modules
module load chem/CP2K/submatrix_fpga
export OMP_NUM_THREADS=20
export OMP_PROC_BIND=master
export OMP_PLACES=sockets
mpirun --bind-to socket --map-by ppr:1:socket --report-bindings -mca btl ^openib -mca pml cm --output-filename output cp2k.psmp H2O-dft-ls.inp
 

Gaussian

  • Access is restricted. Please apply for access.
  • use module load g03, module load g09 or module load g16 to load the newest revision.
  • Useful information for running Gaussian computations with the checkpoint file.

Gromacs

  • Gromacs 2019:
    • use module load bio/GROMACS/2019-foss-2018b
    • examples under $PC2SW/examples/gromacs_slurm.sh
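If you prefer to write your own job script instead of starting from the example above, a minimal sketch could look as follows. The project, partition, resource numbers and the input file topol.tpr are placeholders, and it is assumed that this module provides the MPI-enabled gmx_mpi binary:

#!/bin/bash
#SBATCH -J gromacstest
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N 1
#SBATCH --ntasks-per-node=40
#SBATCH -t 02:00:00

module reset
module load bio/GROMACS/2019-foss-2018b

# topol.tpr is a placeholder run-input file
mpirun gmx_mpi mdrun -s topol.tpr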

i-PI v2

  • please contact PC2-support because the setup depends on the chosen backend and the size of the simulation

Octopus

  • 10.5
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/10_5
    • NOTE: This version supports hybrid parallelization with MPI and OpenMP. If you have only used MPI parallelization so far and want to continue to do so, please make sure to set the number of threads to 1 with "export OMP_NUM_THREADS=1" in your job script.
    • details: octopus 10.5, max-dim=3 openmp mpi sse2 avx libxc4 metis mpi2
  • 10.0
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/10_0
    • details: octopus 10.0, max-dim=3 mpi sse2 avx libxc4 metis mpi2
    • Testsuite successful (Result "Everything seems to be OK")
  • 9.1
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/9_1
    • details: octopus 9.1, max-dim=3 mpi sse2 avx metis mpi2
  • 8.4
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/8_4
    • details: octopus 8.4, max-dim=3 mpi sse2 avx metis mpi2
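A minimal job script sketch for Octopus (version 10.5 shown; project, partition and resource numbers are placeholders, and pure MPI parallelization is assumed):

#!/bin/bash
#SBATCH -J octopustest
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N 1
#SBATCH --ntasks-per-node=40
#SBATCH -t 02:00:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/OCTOPUS/10_5

# pure MPI run, so disable OpenMP as recommended above
export OMP_NUM_THREADS=1

# Octopus reads its input from the file "inp" in the working directory
mpirun octopus > out 2> err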

Orca ($PC2SW/ORCA)

Access is restricted. Please apply for access.

To use the latest version of ORCA

module load orca/5.0.1

After loading the ORCA module, the manual can be found in the directory $ORCA_PATH.

A template Slurm jobscript ($PC2SW/examples/orca.sh) is prepared for you to submit ORCA computation:

orca.sh orca_input_file[.inp] walltime [ORCA-version] [xTB-version]
  • orca_input_file[.inp] is the name of ORCA input file ([.inp] is optional).
  • walltime is the compute walltime.
  • [ORCA-version] is the ORCA version (optional).
  • [xTB-version] is the xTB version (optional).

For example, the following command submits a calculation for caffeine.inp with the walltime of 2 hours, the ORCA version 5.0.1 and the xTB version 6.2.3:

orca.sh caffeine.inp 2h 501 623

More example calculations can be found in $PC2SW/examples/orca_xtb.

Quantum ESPRESSO

Quantum ESPRESSO is a suite for first-principles electronic-structure calculations and materials modeling. The latest version 6.7 built with both the foss-2020b and intel-2020b toolchains is installed on Noctua.

  • to use Quantum ESPRESSO built with the foss-2020b toolchain please use
module load chem/QuantumESPRESSO/6.7-foss-2020b
  • to use Quantum ESPRESSO built with the intel-2020b toolchain please use
module load chem/QuantumESPRESSO/6.7-intel-2020b

Quantum ESPRESSO supports hybrid MPI/OpenMP parallelization. An example of a molecular dynamics simulation of silicon using 2 compute nodes of Noctua, 4 MPI ranks per compute node and 10 OpenMP threads per MPI rank can be found in $PC2SW/examples/QuantumESPRESSO.
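A job script sketch matching this layout (project and partition are placeholders, and silicon.in is a placeholder input file):

#!/bin/bash
#SBATCH -J qe-si-md
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH -t 02:00:00

module reset
module load chem/QuantumESPRESSO/6.7-foss-2020b

# 4 MPI ranks x 10 OpenMP threads = 40 cores per node
export OMP_NUM_THREADS=10

mpirun pw.x -in silicon.in > out 2> err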

Salmon

  • 1.2.1
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/salmon/1.2.1
    • example jobscript at $PC2SW/CHEM_PHYS_SW/SALMON/1_2_1/SALMON-v.1.2.1/salmon_job.sh
    • results of testsuits at $PC2SW/CHEM_PHYS_SW/SALMON/1_2_1/SALMON-v.1.2.1/tests.log

Turbomole

  • Access is restricted. Please apply for access.
  • use module spider turbomole to find the available versions

VASP

GPAW

  • 21.1.0
    • example jobscript:
#!/bin/bash
#SBATCH -N 1
#SBATCH -J gpawtest
#SBATCH -A pc2-mitarbeiter
#SBATCH -p batch
#SBATCH -t 02:00:00

module reset
module load chem/GPAW/21.1.0-foss-2020a-ASE-3.21.1
export OMP_NUM_THREADS=1
mpirun --report-bindings -mca btl self,vader -mca mtl psm2 python3 input.py
 
    • If you plan to perform calculations with more than 8 compute nodes, please let us know so that we can prepare a version that is better optimized for this case.
    • details:
      • gpaw-setups-0.9.20000
      • MPI enabled yes (OpenMPI)
      • scalapack yes
      • Elpa yes; version: 2020.11.001
      • FFTW yes
    • Gpaw tests: successful
    • GPAW benchmarks:

ELK

  • 6.8.4
    • example jobscript:
#!/bin/bash
#SBATCH -N 1
#SBATCH -J elktest
#SBATCH -A [account name]
#SBATCH -p batch
#SBATCH --ntasks-per-node=[MPI-RANKS per Node]
#SBATCH -t 02:00:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/ELK/6_8_4

export OMP_NUM_THREADS=[NUMBER OF OPENMP THREADS per Rank]
export MKL_NUM_THREADS=[NUMBER OF OPENMP THREADS per Rank in BLAS/LAPACK inside the MKL]

export OMP_STACKSIZE=64M
mpirun -genv I_MPI_DEBUG=4 elk
 
    • Elk supports three levels of parallelism at the same time (see also http://elk.sourceforge.net/elk.pdf section 4.1.1):
      • MPI: set MPI ranks per node with sbatch argument --ntasks-per-node
      • Internal OpenMP: set threads with export OMP_NUM_THREADS= or option maxthd in elk.in
      • Threading in BLAS/LAPACK libraries: set threads with export MKL_NUM_THREADS= or option maxthdmkl in elk.in
    • The best choice of parameters for the parallelization has to be determined for your workload. The product of OMP_NUM_THREADS, MKL_NUM_THREADS and --ntasks-per-node should equal the number of CPU cores on a node, i.e. 40 in the case of Noctua (an example is given below this list). We recommend first varying the number of MPI ranks and OMP_NUM_THREADS while not using threading inside the BLAS/LAPACK libraries (i.e., MKL_NUM_THREADS=1 or maxthdmkl=1).
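For example, an assumed starting point (not a tuned setting) that uses all 40 cores of a Noctua node is 10 MPI ranks per node with 4 OpenMP threads each and no threading in the MKL:

#SBATCH --ntasks-per-node=10

export OMP_NUM_THREADS=4    # 10 ranks x 4 threads = 40 cores
export MKL_NUM_THREADS=1    # no threading inside BLAS/LAPACK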

LIGGGHTS-PUBLIC

  • 3.8.0
    • example jobscript:
#!/bin/bash
#SBATCH -N 1
#SBATCH -J liggghtstest
#SBATCH -A [account name]
#SBATCH -p batch
#SBATCH -t 02:00:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/LIGGGHTS-PUBLIC/3.8.0

export OMP_NUM_THREADS=1

export OMP_STACKSIZE=64M
mpirun --report-bindings lmp_auto < input
 

GULP

  • 6.0
    • How to build for Noctua with the Intel compilers (mpiifort):
      • load the Intel module: "module load intel/20.02_compilers"
      • modify the mkgulp file in Src:
        • add "-xHost" after -O3/-O2/-O1 in lines 224-226
        • change "mpif90" in line 243 to "mpiifort"
        • change "mpicc" in line 244 to "mpiicc"
      • run "bash mkgulp -j 10 -m -t gulp -c intel" and get a coffee
      • there should now be an executable named "gulp" in the Src directory
    • example jobscript:
#!/bin/bash
#SBATCH -N 1
#SBATCH -J gulptest
#SBATCH -A [account name]
#SBATCH -p batch
#SBATCH -t 01:00:00

module reset
module load intel/20.02_compilers

export OMP_NUM_THREADS=1
GULPDIR=[Put here the directory that contains the Src-Directory]

mpirun -genv I_MPI_DEBUG=4  $GULPDIR/Src/gulp input
 

xTB

xTB is a semiempirical extended tight-binding program package. It can be used by loading the environment module:

module load chem/xtb/6.2.3-foss-2020b

xTB uses OpenMP parallelization. For calculations on large molecular systems, the stack memory size may need to be increased. The following settings can be added to your job script to increase the stack size for large calculations.

ulimit -s unlimited
export OMP_STACKSIZE=4G
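A minimal job script sketch for an xTB calculation on a single node. The project, partition and the input file molecule.xyz are placeholders, and using all 40 cores with OpenMP is only an assumed choice:

#!/bin/bash
#SBATCH -J xtbtest
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH -t 01:00:00

module reset
module load chem/xtb/6.2.3-foss-2020b

# xTB is OpenMP-parallel; use all cores of the node
export OMP_NUM_THREADS=40
# increase the stack size for large systems (see above)
ulimit -s unlimited
export OMP_STACKSIZE=4G

# geometry optimization of a placeholder structure
xtb molecule.xyz --opt > out 2> err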

Math and Data Analysis Software

  • Matlab (module load matlab/2018a)
    • Only available for members of Paderborn University.
  • Python 2.7/3.7
  • Julia (module load lang/Julia/1.4.2-linux-x86_64)
    • Usage in Jupyter Notebooks on Noctua
      • Please note that on Oculus you can use the Jupyter spawner, which automates most of the steps below.
      • initial setup:
        • start julia and enter using Pkg, then Pkg.add("IJulia")
        • run module load tools/IPython/7.2.0-foss-2018b-Python-3.6.6
        • run jupyter-notebook --no-browser --generate-config
        • run ipython -c "from notebook.auth import passwd; passwd()", copy the result, i.e. a string that looks like sha1:255adaa93208:9c18f63e8fee61f297a417cdb6444af684a979e5
        • open ~/.jupyter/jupyter_notebook_config.py and:
          • change the line #c.NotebookApp.password = to c.NotebookApp.password = "sha1:...." with the copied string
          • change the line #c.NotebookApp.password_required = False to c.NotebookApp.password_required = True
      • Usage:
        • run module load tools/IPython/7.2.0-foss-2018b-Python-3.6.6
        • run jupyter-notebook --port=PORT --no-browser with a PORT number (1024<PORT<65536) of your choice. If no error shows up, this notebook is now running on the machine that you are logged in to, i.e., one of the frontends. So please don't run long calculations there. If you would like to run a Jupyter notebook on a compute node, please contact pc2-support@uni-paderborn.de.
        • Forward the selected port to your local computer with SSH port forwarding (ssh -L); an example command is given below this list.
        • Open http://localhost:PORT in a local web browser, then select New on the right side and choose Julia 1.4.2. The resulting Jupyter notebook is a Julia notebook.
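An example of the SSH port forwarding mentioned above, run on your local computer (PORT, your user name and the Noctua frontend hostname are placeholders; use the login node that you normally connect to):

ssh -L PORT:localhost:PORT [username]@[noctua-frontend]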

Container Frameworks

  • Singularity (module load singularity)
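As mentioned in the overview at the top of this page, you can build a Singularity container from a Docker image on your local computer and copy it to Noctua. A minimal sketch (the image python:3.9 and the frontend hostname are only illustrative placeholders):

# on your local computer: build a Singularity image from a Docker image
singularity pull python_3.9.sif docker://python:3.9

# copy the image to Noctua
scp python_3.9.sif [username]@[noctua-frontend]:~/

# on Noctua: load the module and run a command inside the container
module load singularity
singularity exec ~/python_3.9.sif python3 --version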

Tools

all tools

  • module use $PC2SW/CHEM_PHYS_SW/modules
  • then see module av

Tools from Intel Parallel Studio XE

All tools from Intel Parallel Studio XE are available on Noctua.

TexLive 2019 (LaTeX)

  • module use $PC2SW/CHEM_PHYS_SW/modules
  • module load tools/texlive_2019

LIKWID

module use $PC2SW/CHEM_PHYS_SW/modules
module load tools/likwid_5.1.0
  • If you want to use Likwid in your job, please deactivate the Job-Specific-Monitoring by adding the option --collectors=off to sbatch, srun or your jobscript:
#SBATCH --collectors=off

This is necessary because Likwid and the Job-Specific-Monitoring both need access to the hardware performance counters; if two programs access them at the same time, issues such as incorrect values can occur.

  • To compile a program that uses the Likwid Marker API, you can use for example
 g++ -fopenmp -DLIKWID_PERFMON -L$LIKWID_LIB -I$LIKWID_INCLUDE program.cpp -o program.x -llikwid
  • You can then execute it with likwid-perfctr, for example
 likwid-perfctr -C 0 -g BRANCH -m ./program.x
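Putting the pieces together, a job script for a Likwid measurement could look like this sketch (project, partition and the program name are placeholders; --collectors=off disables the Job-Specific-Monitoring as described above):

#!/bin/bash
#SBATCH -J likwidtest
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N 1
#SBATCH -t 00:30:00
#SBATCH --collectors=off

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load tools/likwid_5.1.0

# measure the BRANCH event group on core 0 with marker regions enabled
likwid-perfctr -C 0 -g BRANCH -m ./program.x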

Programming Environments

On Noctua the following programming environments are available.

Intel Parallel Studio XE

Cray Programming Environment (Cray PE)

  • Cray Debugger (gdb4hpc)

GNU Tools

TotalView

  • TotalView is an advanced debugger for HPC applications. To use TotalView on Noctua, run
module load totalview/20.20.1
  • For the best performance and lowest overhead while debugging an MPI parallel application using TotalView on Noctua, we recommend using TotalView remote debugging with Intel MPI library. We have prepared a short tutorial for TotalView remote debugging.

ARM FORGE

  • ARM FORGE is a toolsuite for software development. To use it on Noctua, run
module load ARM/20.1.2
  • ARM FORGE contains two components:
    • ARM DDT: a powerful parallel debugger. To use it run ddt after loading the module file.
    • ARM MAP: a scalable low-overhead profiler. To use it run map after loading the module file.