Noctua-Software


How to find Software on Noctua

The path to the installed software is stored in the environment variable PC2SW. If you want to check whether some software is already available, we recommend first looking among the module files with module avail. Additionally, you can look into the software directory with ls $PC2SW; example commands for both are shown after the list below. If the software you need is not installed, you have several options:

  • If the software you need is available in Easybuild, then we can quickly install it for you. You can find the software supported by Easybuild at [1].
  • Prepare a Singularity container on your local computer and copy it to Noctua. (You can also convert a Docker container to a Singularity container.)
  • We provide extra support for FPGA-related projects and projects in the field of computational chemistry and physics. Please let us know if you need any software or support in these fields (Mail to Special Advisors)
  • You can compile it yourself on the cluster. We are happy to help you if you encounter problems (Mail to PC2-support).
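
For example, to check whether a particular package is already installed, you could run the following commands on a front end ("gromacs" is just an illustrative name here):

module avail                 # list all available module files
ls $PC2SW                    # list the contents of the software directory
ls $PC2SW | grep -i gromacs  # search the software directory for a specific package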

Chemistry/Physics Software

CP2K

Hybrid MPI-OpenMP-version (psmp) compiled with the Intel Compiler

The latest hybrid MPI-OpenMP-version of CP2K (version 7.1) is installed on Noctua. The libraries enabled while building the cp2k.psmp executable are:

cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm max_contr=4 mkl

If a CP2K version with additional libraries is needed for your simulation, please contact PC2-support.

To use CP2K (version 7.1) please load the following module in your job script:

module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/7.1

In addition, the following environment variables are important for running CP2K simulations:

  • export OMP_NUM_THREADS=1 to disable OpenMP
  • export OMP_NUM_THREADS=[NUMBER_OF_THREADS] to use [NUMBER_OF_THREADS] threads per MPI rank
  • If you experience segmentation faults with OMP_NUM_THREADS greater than one, then please also use:
    • export OMP_STACKSIZE=100m, which sets the stack size for individual threads to 100 MB.
    • Depending on your calculation, you might need more than 100 MB for each thread. You can try larger values in your calculations.

Here is an example job script:

#!/bin/bash
#SBATCH -J JobName
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N [NUMBER_OF_NODES]
#SBATCH --ntasks-per-node=[NUMBER_OF_MPI_RANKS_PER_NODE]
#SBATCH -t 00:30:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/7.1

export OMP_NUM_THREADS=[NUMBER_OF_THREADS]
export OMP_STACKSIZE=100m

mpirun -genv I_MPI_DEBUG=4 cp2k.psmp molecule.inp > out 2> err
  • Please replace [Project_Name] and [Job_Partition] with the project that you belong to and a suitable partition for this job.
  • For the best compute efficiency, [NUMBER_OF_NODES], [NUMBER_OF_MPI_RANKS_PER_NODE] and [NUMBER_OF_THREADS] should be chosen so that
    • [NUMBER_OF_NODES] times [NUMBER_OF_MPI_RANKS_PER_NODE] is the square of an integer, and
    • [NUMBER_OF_MPI_RANKS_PER_NODE] times [NUMBER_OF_THREADS] equals the number of CPU cores per node (e.g. 40 on a Noctua compute node); a filled-in example is shown after this list.
  • To find good choices of [NUMBER_OF_NODES], [NUMBER_OF_MPI_RANKS_PER_NODE] and [NUMBER_OF_THREADS] you can also use the MPI/OpenMP-hybrid Execution planner (https://github.com/cp2k/cp2k/tree/master/tools/plan_mpi_omp).
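
As an illustration of these two rules, consider a job with 4 nodes and 4 MPI ranks per node (4 x 4 = 16 ranks in total, and 16 is the square of 4) and 10 OpenMP threads per rank (4 x 10 = 40 cores per node). The values below are only a sketch, not a tuned recommendation for your specific system:

#!/bin/bash
#SBATCH -J JobName
#SBATCH -A [Project_Name]
#SBATCH -p [Job_Partition]
#SBATCH -N 4                        # 4 nodes
#SBATCH --ntasks-per-node=4         # 4 x 4 = 16 MPI ranks in total (a square of an integer)
#SBATCH -t 00:30:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/CP2K/7.1

export OMP_NUM_THREADS=10           # 4 ranks/node x 10 threads = 40 cores per node
export OMP_STACKSIZE=100m

mpirun -genv I_MPI_DEBUG=4 cp2k.psmp molecule.inp > out 2> err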

FPGA-Accelerated CP2K for DFT with the Submatrix Method

  • details of the implementation in
  • To switch on FPGA usage, use the option SUBMATRIX_SIGN_METHOD NEWTONSCHULZ_fpga (or SUBMATRIX_SIGN_METHOD NEWTONSCHULZ_fpga_debug for debug details during the calculation).
  • The calculation must be done with one MPI rank per socket so that each MPI rank can access one of the two FPGAs in a node.
  • An example input file can be found at /cm/shared/apps/pc2/CHEM_PHYS_SW/CP2K/FPGA/cp2k/H2O-dft-ls.inp on Noctua.
  • Performance:
    • For the above example (water, DZVP basis) each FPGA (Intel Stratix 10 GX 2800) achieves a floating-point performance of 2.8-3.2 TFlops (at a power usage of 90-95 W). Due to the massively-parallel design of the submatrix method, many FPGAs can be used efficiently. The transfers from the CPU to the FPGAs create some overhead and we are actively working on reducing it.
  • Example jobscript:
#!/bin/bash
#SBATCH -p fpga
#SBATCH -A pc2-mitarbeiter
#SBATCH -N 2
#SBATCH -t 2:00:00
#SBATCH --constraint=19.4.0_hpc

module use /cm/shared/apps/pc2/CHEM_PHYS_SW/modules
module load chem/CP2K/submatrix_fpga
export OMP_NUM_THREADS=20
export OMP_PROC_BIND=master
export OMP_PLACES=sockets
mpirun --bind-to socket --map-by ppr:1:socket --report-bindings -mca btl ^openib -mca pml cm --output-filename output cp2k.psmp H2O-dft-ls.inp
 

Gaussian

  • Access is restricted. Please apply for access.
  • use module load g03, module load g09 or module load g16

Gromacs

  • Gromacs 2019:
    • use module load bio/GROMACS/2019-foss-2018b
    • examples under $PC2SW/examples/gromacs_slurm.sh (a minimal sketch is also shown below)
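
A minimal GROMACS job script might look like the following. This is only a sketch: the binary name gmx_mpi, the input file topol.tpr and the resource values are assumptions, so please compare with the cluster's own example at $PC2SW/examples/gromacs_slurm.sh:

#!/bin/bash
#SBATCH -J gmxtest
#SBATCH -A [Project_Name]
#SBATCH -p batch
#SBATCH -N 1
#SBATCH --ntasks-per-node=40
#SBATCH -t 01:00:00

module reset
module load bio/GROMACS/2019-foss-2018b

# gmx_mpi and topol.tpr are placeholders for this sketch
mpirun gmx_mpi mdrun -deffnm topol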

i-PI v2

  • please contact PC2-support because the setup depends on the chosen backend and size of simulation

Octopus

  • 10.0
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/10_0
    • details: octopus 10.0, max-dim=3 mpi sse2 avx libxc4 metis mpi2
    • Test suite successful (result: "Everything seems to be OK"); a minimal job script sketch is shown after this list
  • 9.1
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/9_1
    • details: octopus 9.1, max-dim=3 mpi sse2 avx metis mpi2
  • 8.4
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/OCTOPUS/8_4
    • details: octopus 8.4, max-dim=3 mpi sse2 avx metis mpi2
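
A minimal Octopus job script (shown here for version 10.0) might look like the following. This is only a sketch: it assumes that the MPI-enabled binary is called octopus and that it reads the input file inp from the working directory; the account, partition and resource values are placeholders:

#!/bin/bash
#SBATCH -J octopustest
#SBATCH -A [Project_Name]
#SBATCH -p batch
#SBATCH -N 1
#SBATCH --ntasks-per-node=40
#SBATCH -t 02:00:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/OCTOPUS/10_0

# octopus reads the file "inp" in the current working directory (assumption of this sketch)
mpirun octopus > out.log 2> err.log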

Orca ($PC2SW/ORCA)

Access is restricted. Please apply for access.

Salmon

  • 1.2.1
    • module use $PC2SW/CHEM_PHYS_SW/modules
    • module load chem/salmon/1.2.1
    • example jobscript at $PC2SW/CHEM_PHYS_SW/SALMON/1_2_1/SALMON-v.1.2.1/salmon_job.sh
    • results of the test suite at $PC2SW/CHEM_PHYS_SW/SALMON/1_2_1/SALMON-v.1.2.1/tests.log

Turbomole

  • Access is restricted. Please apply for access.
  • use module spider turbomole to find the available versions

VASP

GPAW

  • 20.1.0
    • example jobscript:
#!/bin/bash
#SBATCH -N 1
#SBATCH -J gpawtest
#SBATCH -A pc2-mitarbeiter
#SBATCH -p batch
#SBATCH -t 02:00:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/GPAW/20_1_0
export OMP_NUM_THREADS=1
mpirun --report-bindings -mca btl self,vader -mca mtl psm2 python3 input.py
 
    • If you plan to perform calculations with more than 8 compute nodes, please let us know so that we can prepare a version that is better optimized for this case.
    • details:
      • gpaw-setups-0.9.20000
      • MPI enabled yes (OpenMPI)
      • scalapack yes
      • Elpa yes; version: 20191110
      • FFTW yes
    • GPAW tests: successful
    • GPAW benchmarks:

ELK

  • 6.8.4
    • example jobscript:
#!/bin/bash
#SBATCH -N 1
#SBATCH -J elktest
#SBATCH -A [account name]
#SBATCH -p batch
#SBATCH --ntasks-per-node=[MPI-RANKS per Node]
#SBATCH -t 02:00:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/ELK/6_8_4

export OMP_NUM_THREADS=[NUMBER OF OPENMP THREADS per Rank]
export MKL_NUM_THREADS=[NUMBER OF OPENMP THREADS per Rank in BLAS/LAPACK inside the MKL]

export OMP_STACKSIZE=64M
mpirun -genv I_MPI_DEBUG=4 elk
 
    • Elk supports three levels of parallelism at the same time (see also http://elk.sourceforge.net/elk.pdf section 4.1.1):
      • MPI: set MPI ranks per node with sbatch argument --ntasks-per-node
      • Internal OpenMP: set threads with export OMP_NUM_THREADS= or option maxthd in elk.in
      • Threading in BLAS/LAPACK libraries: set threads with export MKL_NUM_THREADS= or option maxthdmkl in elk.in
    • The best choice of parallelization parameters has to be determined for your workload. The product of OMP_NUM_THREADS, MKL_NUM_THREADS and the number of MPI ranks per node (--ntasks-per-node) should equal the number of CPU cores on a node, i.e. 40 in the case of Noctua. We recommend first varying the number of MPI ranks and OMP_NUM_THREADS while not using threading inside the BLAS/LAPACK libraries (i.e., MKL_NUM_THREADS=1 or maxthdmkl=1); a filled-in example is shown below.
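
For illustration, one possible starting point on a 40-core Noctua node is 10 MPI ranks per node with 4 OpenMP threads each and no MKL threading (10 x 4 x 1 = 40). These values are only a sketch, not a tuned recommendation:

#!/bin/bash
#SBATCH -N 1
#SBATCH -J elktest
#SBATCH -A [account name]
#SBATCH -p batch
#SBATCH --ntasks-per-node=10      # 10 MPI ranks per node
#SBATCH -t 02:00:00

module reset
module use $PC2SW/CHEM_PHYS_SW/modules
module load chem/ELK/6_8_4

export OMP_NUM_THREADS=4          # 10 ranks x 4 threads = 40 CPU cores per node
export MKL_NUM_THREADS=1          # no threading inside BLAS/LAPACK for this first try

export OMP_STACKSIZE=64M
mpirun -genv I_MPI_DEBUG=4 elk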

Math and Data Analysis Software

  • Matlab (module load matlab/2018a)
  • Python 2.7/3.7
  • Julia (module load lang/Julia/1.4.2-linux-x86_64)
    • Usage in Jupyter Notebooks on Noctua
      • Please note that on Oculus you can use the Jupyter spawner, which automates most of the steps below.
      • initial setup:
        • start julia and enter using Pkg, then Pkg.add("IJulia")
        • run module load tools/IPython/7.2.0-foss-2018b-Python-3.6.6
        • run jupyter-notebook --no-browser --generate-config
        • run ipython -c "from notebook.auth import passwd; passwd()", copy the result, i.e. a string that looks like sha1:255adaa93208:9c18f63e8fee61f297a417cdb6444af684a979e5
        • open ~/.jupyter/jupyter_notebook_config.py and:
          • change the line #c.NotebookApp.password = to c.NotebookApp.password = "sha1:...." with the copied string
          • change the line #c.NotebookApp.password_required = False to c.NotebookApp.password_required = True
      • Usage:
        • run module load tools/IPython/7.2.0-foss-2018b-Python-3.6.6
        • run jupyter-notebook --port=PORT --no-browser with a port number (1024 < PORT < 65536) of your choice. If no error shows up, the notebook is now running on the machine that you are logged into, i.e., one of the frontends, so please don't run long calculations there. If you would like to run a Jupyter notebook on a compute node, please contact pc2-support@uni-paderborn.de.
        • Forward the selected port to your local computer with SSH port forwarding (ssh -L); see the example after this list.
        • Open a web browser on your local computer at http://localhost:PORT, then select New on the right side and select Julia 1.4.2. The resulting Jupyter notebook is a Julia notebook.
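
A minimal sketch of the port forwarding, run on your local computer (the port number, user name and front-end host name are placeholders; use the values for your account and the front end you log into):

# forward local port 8888 to port 8888 on the front end where the notebook runs
ssh -L 8888:localhost:8888 username@noctua-frontend
# then open http://localhost:8888 in your local web browser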

Container Frameworks

  • Singularity (module load singularity); basic example commands are shown below
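
A short sketch of typical usage; the image name and command are only illustrative:

module load singularity
# pull a Docker image and convert it into a Singularity image file
singularity pull mycontainer.sif docker://ubuntu:20.04
# run a command inside the resulting container
singularity exec mycontainer.sif cat /etc/os-release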

Tools

all tools

  • module use $PC2SW/CHEM_PHYS_SW/modules
  • then see module av

Tools from Intel Parallel Studio XE

All tools from Intel Parallel Studio XE are available on Noctua.

TeX Live 2019 (LaTeX)

  • module use $PC2SW/CHEM_PHYS_SW/modules
  • module load tools/texlive_2019

LIKWID

module use $PC2SW/CHEM_PHYS_SW/modules
module load tools/likwid_4.3.4
  • Please deactivate the Job-Specific Monitoring by adding the option --collectors=off to sbatch, srun or your job script as
#SBATCH --collectors=off

if you want to use LIKWID in your job. This is necessary because LIKWID and the Job-Specific Monitoring both need access to the hardware performance counters; if two programs access them at the same time, issues such as incorrect values can occur. A combined job script sketch is shown at the end of this section.

  • You can compile a program instrumented with the LIKWID marker API (enabled via -DLIKWID_PERFMON) against the LIKWID library, for example with
 g++ -fopenmp -DLIKWID_PERFMON -L$LIKWID_LIB -I$LIKWID_INCLUDE program.cpp -o program.x -llikwid
  • You can then execute it with likwid-perfctr, for example with
 likwid-perfctr -C 0 -g BRANCH -m ./program.x
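
Putting the pieces together, a job script using LIKWID could look like the following sketch (account, partition and resources are placeholders):

#!/bin/bash
#SBATCH -J likwidtest
#SBATCH -A [Project_Name]
#SBATCH -p batch
#SBATCH -N 1
#SBATCH -t 00:10:00
#SBATCH --collectors=off   # disable the Job-Specific Monitoring so LIKWID can use the counters

module use $PC2SW/CHEM_PHYS_SW/modules
module load tools/likwid_4.3.4

likwid-perfctr -C 0 -g BRANCH -m ./program.x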

Programming Environments

On Noctua the following programming environments are available.

Intel Parallel Studio XE

Cray Programming Environment (Cray PE)

  • Cray Debugger (gdb4hpc)

GNU Tools

TotalView

  • TotalView is an advanced debugger for HPC applications. To use TotalView on Noctua, load the module
module load totalview/20.20.1
  • For the best performance and lowest overhead when debugging an MPI-parallel application with TotalView on Noctua, we recommend using TotalView remote debugging with the Intel MPI library. We have prepared a short tutorial for TotalView remote debugging.