OCuLUS-Software

A more general description of applications and software tools can be found on the Wiki-Page Software. The following sections give details about some of the software available on OCuLUS. The availability of an application can be restricted by the terms of its license.

Choosing a Software Package

We use the TCL-based tool Modules to manage the different software packages.

The following table shows the most commonly used commands.

For more information about Modules, please read the man page or refer to the Modules Project home page.

Command          Purpose
mdlsearch        search for modules containing the given string in their name
module avail     list the available software packages
module list      show the loaded modules
module add       load a module; if no release is given, the highest version is normally loaded
module del       unload a module
module display   show what the module does
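
For example, a typical session to find and load a package could look like this (GROMACS, which is installed on OCuLUS, serves purely as an illustration):

mdlsearch gromacs          # list releases whose module name contains "gromacs"
module add gromacs         # load the highest GROMACS release
module list                # verify that the module is loaded
module display gromacs     # show what the module sets up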

ABINIT

Version 8.2.2 of ABINIT is installed. A sequential (abinit-seq), an MPI-parallel (abinit-mpi), and a GPU version (abinit-gpu) of the software are available.

module add abinit
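
A hedged sketch of an interactive MPI run, following the ccsalloc patterns used elsewhere on this page (MyCalc.files is a hypothetical ABINIT .files file read from standard input):

ccsalloc -I -t 1h -n 4 --stdin=MyCalc.files ompi -- abinit-mpi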

BigDFT

Version 1.7.7 is installed.

module add bigdft

BLENDER

Blender features a built-in rendering engine called Cycles that offers realistic rendering. Cycles provides:

  • GPU & CPU rendering
  • Realtime viewport preview
  • HDR lighting support
  • Permissive License for linking with external software

module add blender
blender --background Gearwheel.blend -o Gearwheel --threads 16 --render-format MPEG -x 1 --render-anim

CAFFE

Version 1.0.0 RC3 is installed.

module add caffe/1.0.0

and a CPU-only version

module add caffe/1.0.0_cpu

CAPSCompilers

To compile and execute an OpenACC program, use a GPU node and do the following:

module add cuda capscompilers

CHAPEL

Chapel supports a multithreaded parallel programming model at a high level by supporting abstractions for data parallelism, task parallelism, and nested parallelism. It enables optimizations for the locality of data and computation in the program via abstractions for data distribution and data-driven placement of subcomputations. It allows for code reuse and generality through object-oriented concepts and generic programming features. For instance, Chapel allows for the declaration of locales.

module add chapel

chpl test.chpl -L/cm/shared/apps/pc2/OpenMPI/gcc/1.8.4-mt/lib -lmpi


Example batch file for ccsalloc to execute a Chapel program:
#! /bin/sh
#Starts a.out on 3 processors

#CCS -n 3
### Prepare the Chapel hostfile
export GASNET_NODEFILE=$CCS_NODEFILE
nHosts=`cat $GASNET_NODEFILE | wc -l`

### Now start the chapel application
./a.out -nl $nHosts

Clang with OpenMP for GPU-offloading

Clang with OpenMP support for GPU-offloading is available on OCuLUS. To use the latest Clang 11.1.0 please load the environment with

module load compiler/clang

The compiler options that enable OpenMP support for GPU-offloading are -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda.
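
A minimal compile invocation could look like the following (saxpy.c is a hypothetical source file containing an OpenMP target region):

module load compiler/clang
clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -o saxpy saxpy.c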

Sample codes and detailed documentation can be found at https://github.com/pc2/OMP-Offloading.

CP2K

Versions 4.1 and 5.1 are installed.

Examples:

module add cp2k/4.1
ccsalloc -I -t 1h -n 4 ompi -- cp2k.popt -i MyInp.inp

Critic2

Critic2 is installed. Critic2 is a program for the analysis of quantum mechanical calculation results in molecules and periodic solids.

It can be used by loading the environment module:

module load chem/critic2
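
A minimal run could look like this (analysis.cri is a hypothetical Critic2 input file):

critic2 analysis.cri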

DALTON

Dalton2016.2 is installed.

NOTE: Users have to accept the Dalton2016 license terms.

module add dalton

Example:

ccsalloc --res=rset=2:ncpus=16:mpiprocs=1:ompthreads=1,place=scatter:excl dalton -N 32 dft_stex_1 OCS_aug-cc-pCVTZ

ESPRESSO

 mdlsearch espresso 
prints the available releases

Submit example script espresso.sh:

#! /bin/sh

module purge
module load DefaultModules 
module load chem/QuantumESPRESSO/6.6-foss-2019b
ccsworker ompi -V mpi/OpenMPI/3.1.4-GCC-8.3.0 -- pw.x -input $*
exit $?

Example call: ccsalloc -t 2h --res=rset=2:ncpus=16:mem=40g,place=:excl espresso.sh myinput

FEniCS

Description

This is a Singularity image containing FEniCS 2018.1, with Ubuntu Xenial running inside the image. The following packages have been installed after the FEniCS installation:

  • PETSc
  • SLEPc
  • matplotlib

Interactive

If you want to use FEniCS interactively do this on the frontend:


module add fenics # Load fenics-module
singularity shell $FENICS_IMG # Spawn a shell inside the image
OR
singularity exec $FENICS_IMG <command> # Execute command inside image


Submit via ccsalloc

An example submit script can be found under $PC2SW/examples/fenics.sh

cp $PC2SW/examples/fenics.sh . # Copy the script to your work-directory
ccsalloc fenics.sh <command>   # Read fenics.sh for usage example

The specified command and all its options will be executed in the image.

FFTW

Different versions of the FFTW3 library (single- and double-precision) are installed.

mdlsearch fftw
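
A hedged sketch of linking against the library (mytransform.c and the chosen module are placeholders; use module display to check which paths the module provides):

module add fftw
gcc -o mytransform mytransform.c -lfftw3 -lm     # double-precision interface
gcc -o mytransform mytransform.c -lfftw3f -lm    # single-precision interface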

Gamess-US

Access is restricted. Please apply for access.

The source code of Gamess-US can be downloaded at https://www.msg.chem.iastate.edu/gamess/download.html.

module use $PC2SW/CHEM_PHYS_SW/modules/
module add gamess-us

Refer to $PC2SW/examples/gamess-us.sh for a template job-script.

Gaussian

Access is restricted. Please apply for access.

Installed versions are G03-B.01, G09-B.01, G09-D.01, G16-B.01, and G16-C.01.

module add g16

loads G16 Rev. C.01

Refer to $PC2SW/examples for template job-scripts.

Here is useful information for running Gaussian computations with checkpoint files.

GCC with GPU offloading

To use GCC 9.2.0-offload with OpenMP and OpenACC support for offloading computation to GPUs, load the environment with

module load gcc/9.2.0-offload

The compiler options are -fopenmp -foffload=nvptx-none.
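
A minimal compile invocation could look like the following (saxpy.c again stands in for your own source file):

module load gcc/9.2.0-offload
gcc -O2 -fopenmp -foffload=nvptx-none -o saxpy saxpy.c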

Sample codes and detailed documentation can be found at https://github.com/pc2/OMP-Offloading.

GHMM

Use the ghmm module to activate the environment

module add ghmm

Python bindings are available:

$ module add python
$ python
Python 2.7.6 (default, Dec  6 2013, 18:06:23) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>  import ghmm
....

GRACE

Installed version is 5.5

module add grace
xmgrace

GROMACS

Several versions are installed.

mdlsearch gromacs

An example job script can be found in $PC2SW/examples

HOOMD

HOOMD-blue is a general-purpose particle simulation toolkit. Version 1.1.1 for Intel Xeon E5 and nVidia GPU is installed. Versions 1.0.0, 1.0.1, and 1.0.5 are also available.

module add hoomd


NOTE-1:

Sometimes you will get this message:

An MPI process has executed an operation involving a call to the fork() system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged.

The process that invoked fork was: ...

If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0.

Here is an example of how to disable the warning:

ccsalloc -t 5m --res=rset=1:ncpus=1:mem=4g:vmem=100g:gpus=1:tesla=t ompi --mca mpi_warn_on_fork 0 -- hoomd test.py

NOTE-2:

If hoomd is started on a node without a GPU, you will get this message:

*Warning*: (Rank ...): NVIDIA driver not installed or is too old, ignoring any GPUs in the system.

Please ignore the warning; hoomd will run on the CPU.

Intel Cluster Studio

Several versions of Intel Parallel Studio Cluster Edition are installed.

mdlsearch ^intel/[12]

Example: Start an INTEL-MPI/OpenMP program on 4 chunks (2 MPI-processes per chunk) and 8 OMP threads per process. Refer to $PC2SW/examples for template job-scripts.

ccsalloc --res=rset=4:ncpus=16:mpiprocs=2:ompthreads=8 impi.sh ./program

The Intel® MPI Library provides thread-safe libraries at level MPI_THREAD_MULTIPLE, i.e., several threads can make Intel MPI calls simultaneously. Use the compiler driver option -mt_mpi to link the thread-safe version of the Intel MPI Library.

Set the I_MPI_DEBUG environment variable to 4 to make the Intel MPI Library report process pinning information.
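
A hedged sketch of building and launching such a thread-safe hybrid program (hybrid.c is a hypothetical MPI/OpenMP source file; mpiicc is the Intel MPI compiler driver for C):

module add intel/19.0.1
mpiicc -mt_mpi -qopenmp -o hybrid hybrid.c
ccsalloc --res=rset=4:ncpus=16:mpiprocs=2:ompthreads=8 impi.sh ./hybrid

Setting I_MPI_DEBUG=4 inside the job script (e.g. in impi.sh) should make the library print the pinning information mentioned above.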

Jupyter notebooks

See Jupyter.

LAMMPS

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.

It has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.

LAMMPS runs in parallel using message-passing techniques with a spatial decomposition of the simulation domain. The code is designed to be modified or extended with new functionality.

module add lammps

Execute an example from $LAMMPS_EXAMPLEDIR:

ccsalloc -n 16 --stdin=in.crack  ompi -- lmp_mpi

LAMMPS 15May2015 is installed with the CUDA package.

lmp_mpi_cuda -c on

likwid

likwid is a tool suite of command-line applications for performance-oriented programmers.

module load devel/likwid

The uncore counters might not be accessible. Please contact us at PC2-support if you need uncore counters.
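
A hedged sketch of typical usage (FLOPS_DP is an example performance group; which groups exist depends on the CPU, see likwid-perfctr -a):

module load devel/likwid
likwid-topology                            # print the node topology
likwid-perfctr -C 0-3 -g FLOPS_DP ./a.out  # measure the FLOPS_DP group on cores 0-3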

MAGMA

MAGMA, Matrix Algebra on GPU and Multicore Architectures, version 1.7.0 is installed.

module add magma

MATLAB

Only available for members of the Paderborn University.

We provide several releases; try mdlsearch matlab to see which ones are available.

module add matlab loads the highest release.

Licenses for the Parallel Toolbox and Distributed Computing Server are available.

NOTE: Our DCE license works only with MATLAB 2016b or lower; newer MATLAB versions cannot use DCE.

If you want to use the Distributed Computing Server, you have to load the profile $PC2SW/examples/MATLAB/MDCE.settings.

It is necessary that you have specified your default group and your email address in the CCS rc-file $HOME/.ccsrc/uirc.

To create this file, call $CCS/bin/ccsgenrcfile. Refer also to the related CCS FAQ.


Examples for parallel/distributed and GPU usage can be found in $PC2SW/examples/MATLAB

If you are using MATLAB on a front-end node, please do not run heavy compute jobs locally. Please use "nice -n 10 matlab".

For more information, have a look at: http://www.mathworks.de/products/distriben/

NAMD

Installed versions are 2.11 and 2.10b2 (IB, SMP, CUDA, PHI).

InfiniBand and single-threaded NAMD processes

example (assuming a node has at least 16 cores):

ccsalloc -n 8 ./charm.sh

File: charm.sh
#!/bin/bash

IFILE="apoa1.namd"
OFILE="apoa1.log"
PROG=${PC2SW}/NAMD/2.11/Linux-x86_64-ibverbs/namd2
CHARMRUN=${PC2SW}/NAMD/2.11/Linux-x86_64-ibverbs/charmrun

PPE=16  #we assume 16 cores per node
NODES=`cat $CCS_NODEFILE`
NODELIST=namd2.nodelist
NP=0
  echo "group main ++shell ssh" > $NODELIST
  for NODE in $NODES
   do
    for (( c=1; c<=${PPE}; c++ ))
     do
      echo host $NODE >> $NODELIST
      NP=`expr $NP + 1`
     done
   done

${CHARMRUN} ++p ${NP} ++nodelist ${NODELIST} ${PROG} ${IFILE} > ${OFILE}

tail -n 30 ${OFILE}

exit 0

InfiniBand and multi-threaded NAMD processes

The Linux-x86_64-ibverbs-smp binaries are based on "smp" builds of Charm++ that can be used with multiple threads, either on a single machine like a multicore build or across a network. SMP builds combine multiple worker threads and an extra communication thread into a single process. Since one core per process is used for the communication thread, SMP builds are typically slower than non-SMP builds. The advantage of SMP builds is that many data structures are shared among the threads, reducing the per-core memory footprint when scaling large simulations to large numbers of cores.

example (assuming a node has at least 16 cores):

ccsalloc -n 8 ./charm-smp.sh

File: charm-smp.sh
#!/bin/bash

IFILE="apoa1.namd"
OFILE="apoa1.log"
PROG=${PC2SW}/NAMD/2.11/Linux-x86_64-ibverbs-smp/namd2
CHARMRUN=${PC2SW}/NAMD/2.11/Linux-x86_64-ibverbs-smp/charmrun

#  number of cores minus one, because of the communication thread
PPN=15
NODES=`cat $CCS_NODEFILE`
NODELIST=namd2.nodelist
NP=0
  echo "group main ++shell ssh" > $NODELIST
  for NODE in $NODES
   do
      echo host $NODE >> $NODELIST
      NP=`expr $NP + 1`
   done
      NP=`expr $NP \* $PPN`

$CHARMRUN ++p ${NP} ++nodelist ${NODELIST} ${PROG} ++ppn ${PPN} ${IFILE} > ${OFILE}

exit 0

InfiniBand, SMP, nVIDIA GPU

example (uses one core and the Tesla card of a GPU node):

module add namd/2.11/ib-smp-cuda
ccsalloc --res=rset=1:tesla=t:gpus=1:ncpus=1:mem=4g:vmem=85g ./run_namd.sh 1

File: run_namd.sh
#!/bin/bash

# module add namd/...
# call "run_namd <PPE>"

HOME=`pwd`/input/apoa1
IFILE="apoa1.namd"
OFILE=apoa1.log
PROG=`which namd2`
CHARMRUN=charmrun
ENV_SCRIPT=`which namd_env_script.sh`

PPE=$1
cd $HOME
NODES=`cat $CCS_NODEFILE`
NODELIST=namd2.nodelist
NP=0
  echo "group main ++shell ssh" > $NODELIST
  for NODE in $NODES
   do
    for (( c=1; c<=${PPE}; c++ ))
     do
      echo host $NODE >> $NODELIST
      NP=`expr $NP + 1`
     done
   done
$CHARMRUN ++p ${NP} ++verbose ++nodelist ${NODELIST} ++runscript ${ENV_SCRIPT} ${PROG} +idlepoll ${IFILE} > ${OFILE}
exit 0

Note: In this example, the NAMD process uses two cores.

netCDF

Several versions are installed. Try mdlsearch netcdf.

 module add data/netcdf data/hdf5

NWChem

Access is restricted. Please apply for access. Installed versions are 6.0 and 6.3R2, both built with OpenMPI. Refer to $PC2SW/examples for an example run script.

OpenBLAS

Several versions are installed. Try mdlsearch openblas.

OpenFOAM

Several versions are installed. Try mdlsearch openfoam. An example jobscript can be found in $PC2SW/examples.

OpenMPI

Several versions of OpenMPI are installed (try: mdlsearch openmpi). For more information about an installed OpenMPI version, execute ompi_info.

Example: Start an OpenMPI/OpenMP program on 4 chunks (2 MPI-processes per chunk) and 8 OMP threads per process.

ccsalloc --res=rset=4:ncpus=16:mpiprocs=2:ompthreads=8 ompi -- ./program

If you want to use the Intel-based version 1.10.2, try the following:

module add openmpi/intel/1.10.2_mt
ccsalloc -I --res=rset=2:ncpus=16:mpiprocs=1:ompthreads=16,place=scatter:excl ompi -V intel/1.10.2_mt --map-by node -- ./program

OpenSpeedShop

OpenSpeedShop versions 2.0.2 and 2.1 are installed. User guides are located in $PC2SW/OpenSpeedShop.

ORCA

Access is restricted. Please apply for access.

mdlsearch orca

The manual is located in directory $ORCA_PATH. A template jobscript is in $PC2SW/examples.

ParaView

Several ParaView versions are installed.

mdlsearch paraview

PETSc

PETSc (Portable, Extensible Toolkit for Scientific Computation) versions 3.6.1 and 3.8.3 are installed.

module add petsc

PGI-Compiler

The PGI Compiler Suite (C, C++, Fortran) with OpenACC and support for OpenMPI, netCDF, and nVidia GPUs is installed.

To search for all available PGI modules use

 mdlsearch pgi

Example

 module load pgi/compiler/20.1 system/CUDA/10.2.89-GCC-8.3.0
 pgcc -ta=tesla:cc35 -o example $PC2SW/examples/OpenACC/example.c
 ccsalloc -I -t5m --res=rset=1:ncpus=1:tesla=1 example

The target codes for the GPUs are:

GPU        Compiler Target
Tesla K20  -ta=tesla:cc35
GTX1080    -ta=tesla:cc60
RTX2080    -ta=tesla:cc75

Also try pgaccelinfo on a GPU node.

To get more information use

module help pgi/compiler/20.1

Documentation for PGI 20.1 is available at: http://www.pgroup.com/resources/docs.htm

PLASMA

Parallel Linear Algebra Software for Multicore Architectures (PLASMA) version 2.8.0 compiled with Intel Compiler is installed.

module add plasma

R-Project

Versions 3.0.0, 3.2.1, and 3.2.4 of the R-project are installed.

See also http://cran.r-project.org/web/views/HighPerformanceComputing.html

SCALASCA

Several Scalasca versions are installed. A user guide is located in $PC2SW/SCALASCA.

mdlsearch scalasca

Scilab

Installed version of Scilab is 5.4.1.

module add scilab

SINGULARITY

Documentation is in $PC2SW/examples/SINGULARITY

module add singularity

USPEX

The USPEX license forbids distribution of the software, so we are not allowed to install it for all users on the cluster. The following instructions show how to run USPEX 10.4 with VASP 5.4.4 on OCuLUS. The submission mode used here is whichCluster=0, which assumes that USPEX runs inside a job. If you need one of the other submission modes, i.e., whichCluster=1 or whichCluster=2 (for running multiple VASP calculations in parallel), please let us know.

Compile VASP 5.4.4 following the instructions at https://wikis.uni-paderborn.de/pc2doc/Oculus-Software-VASP. The path in which the VASP binaries (vasp_std, ...) are located is denoted as VASPBINDIR in the following, i.e., vasp_std is at $VASPBINDIR/vasp_std.

Installing USPEX

    • Download and unpack USPEX 10.4.
    • Run "bash install.sh".
    • Choose 2 for terminal installation.
    • Read and agree to the license conditions.
    • Enter a directory as the installation path, e.g. /scratch/hpc-prf-PROJECTNAME/USPEX; this directory is referred to as USPEXINSTALLDIR below.
    • Wait for the installation to finish.
    • Check that the file $USPEXINSTALLDIR/install/application/USPEX exists.

Testing the Basic USPEX Installation

  • To test, please try:
module load toolchain/foss/2018b
module load lang/Python/3.6.6-foss-2018b
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$USPEXINSTALLDIR/v91/bin/glnxa64/:$USPEXINSTALLDIR/v91/runtime/glnxa64/
$USPEXINSTALLDIR/application/USPEX
  • You should get output like:
[rschade@fe1 EX01-3D_Si_vasp]$ $USPEXINSTALLDIR/application/USPEX 
/bin/bash: synclient: Kommando nicht gefunden.
*********************************************************
*                                                       *
  _|    _|     _|_|_|   _|_|_|     _|_|_|_|   _|      _| 
  _|    _|   _|         _|    _|   _|           _|  _|   
  _|    _|     _|_|     _|_|_|     _|_|_|         _|     
  _|    _|         _|   _|         _|           _|  _|   
    _|_|     _|_|_|     _|         _|_|_|_|   _|      _| 
*                                                       *
** USPEX v10.4                           Oganov's Lab! **
*********************************************************
...
  • You can ignore error messages such as the synclient line shown above.

Installing Python-related Stuff

module load toolchain/foss/2018b
module load lang/Python/3.6.6-foss-2018b
pip3 install --user numpy
git clone https://github.com/spglib/spglib.git
cd spglib/python 
python3 setup.py install --user

Testing for USPEX+VASP

  • Copy $USPEXDIR/application/archive/examples/EX01-3D_Si_vasp.tgz to some directory and unpack it.
  • Change to the directory EX01-3D_Si_vasp.
  • Change the lines between "% commandExecutable" and "% EndExecutable" to:
mpirun -genv I_MPI_DEBUG=4 -genvall -machinefile $CCS_NODEFILE vasp_std > log_VASP
  • Change whichCluster from 1 to 0.
  • Change numParallelCalcs from 30 to 1.
  • Create job.sh as a job script with the content:
#!/bin/bash
#CCS --stdout=uspex_test.out
#CCS --stderr=uspex_test.err
#CCS --name USPEX_TEST
#CCS --res=rset=16:ncpus=1:mpiprocs=1:ompthreads=1 
#CCS -t 24h

module purge
module add default-environment
module load lang/Python/3.6.6-foss-2018b
module load toolchain/foss/2018b
module load intel/19.0.4_compilers

VASPBINDIR=!!Write VASPBINDIR here!!
USPEXINSTALLDIR=!!Write USPEXDIR here!!

export PATH=$PATH:$USPEXINSTALLDIR/application/archive:$VASPBINDIR:.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$USPEXINSTALLDIR/v91/bin/glnxa64/:$USPEXINSTALLDIR/v91/runtime/glnxa64/

export MCRROOT=$USPEXINSTALLDIR
export USPEXPATH=$USPEXINSTALLDIR/application/archive/src
export CCS_NODEFILE
which python3
which USPEX
which vasp_std
USPEX -v
rm still_running
bash EX01-3D_Si_vasp.sh
  • Make it executable: chmod +x job.sh
  • Submit it as a job.
  • You should see results in results1/OUTPUT.txt.
  • For actual calculations you should adapt the resource settings (#CCS --res line) of your job.

Turbomole

Access is restricted. Please apply for access. Several versions are installed.

mdlsearch turbomole

A template job script can be found in $PC2SW/examples.

Valgrind

Several versions are installed.

mdlsearch valgrind

VASP

Please note that, due to the licensing model of VASP, we cannot provide a compiled version to every user.

Please apply for a license.

As for Noctua (see the guide to compile VASP on Noctua), we provide a guide to compile VASP for OCuLUS.

vTune

vTune is part of the installed Intel Parallel Studio versions. The sampling driver to enable the Hardware Event-Based Sampling (EBS) analysis is available on the frontend fe2 and some compute nodes (request OpenCCS resource vtune).

Example:

module add intel/19.0.1
ccsalloc --res=rset=1:ncpus=16:vtune=true <yourscript>

To start the GUI on the frontend:

module add intel/19.0.1
amplxe-gui

xTB

xTB is a semiempirical extended tight-binding program package. It can be used by loading the environment module:

module load chem/xtb/6.2.3-foss-2019b

xTB uses OpenMP parallelization. For the calculation of large molecular systems, the stack memory size may need to be increased. The following settings can be added to your job script to increase the stack size for large calculations.

ulimit -s unlimited
export OMP_STACKSIZE=4G
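
A hedged sketch of a complete job script combining these settings (molecule.xyz, the thread count, and the resource request are placeholders for your own calculation):

#!/bin/bash
#CCS -t 1h
#CCS --res=rset=1:ncpus=16:ompthreads=16

module load chem/xtb/6.2.3-foss-2019b
ulimit -s unlimited
export OMP_STACKSIZE=4G
export OMP_NUM_THREADS=16
xtb molecule.xyz --opt > xtb.out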