OCuLUS-Software
A more general description of applications and software tools can be found on the Wiki page Software. Below are details about some of the software available on OCuLUS. The availability of an application can be restricted by the terms of its license.
Contents
- 1 Choosing a Software Package
- 2 ABINIT
- 3 BigDFT
- 4 BLENDER
- 5 CAFFE
- 6 CAPSCompilers
- 7 CHAPEL
- 8 Clang with OpenMP for GPU-offloading
- 9 CP2K
- 10 DALTON
- 11 ESPRESSO
- 12 FEniCS
- 13 FFTW
- 14 Gamess-US
- 15 Gaussian
- 16 GCC with GPU offloading
- 17 GHMM
- 18 GRACE
- 19 GROMACS
- 20 HOOMD
- 21 Intel Cluster Studio
- 22 Jupyter notebooks
- 23 LAMMPS
- 24 likwid
- 25 MAGMA
- 26 MATLAB
- 27 NAMD
- 28 netCDF
- 29 NWChem
- 30 OpenBLAS
- 31 OpenFOAM
- 32 OpenMPI
- 33 OpenSpeedShop
- 34 ORCA
- 35 ParaView
- 36 PETSc
- 37 PGI-Compiler
- 38 PLASMA
- 39 R-Project
- 40 SCALASCA
- 41 Scilab
- 42 SINGULARITY
- 43 USPEX
- 44 Turbomole
- 45 Valgrind
- 46 VASP
- 47 vTune
- 48 xTB
Choosing a Software Package
We use the Tcl-based tool Modules to manage the different software packages.
The following table shows the most commonly used commands.
For more information about Modules, please read the man page or refer to the Modules Project home page.
Command | Purpose |
---|---|
mdlsearch | search for modules whose names contain the given string |
module avail | list the available software packages |
module list | show the loaded modules |
module add | load a module. If no release is given, the highest version is normally loaded. |
module del | unload a module |
module display | show what the module does |
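For example, a typical session to find, inspect, and load a package could look like this (the package name here is only a placeholder):

mdlsearch fftw        # find all modules whose names contain "fftw"
module add fftw       # load the highest available version
module list           # verify which modules are currently loaded
module display fftw   # show what the module changes in the environment
module del fftw       # unload the module again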
ABINIT
Version 8.2.2 of ABINIT is installed. A sequential (abinit-seq), an MPI-parallel (abinit-mpi), and a GPU version (abinit-gpu) of the software are available.
module add abinit
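As a sketch, a small MPI-parallel run could be submitted like this (the input/output file names and the resource request are assumptions, not tested settings):

module add abinit
ccsalloc -I -t 1h -n 4 ompi -- abinit-mpi < my_calc.files > my_calc.log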
BigDFT
Version 1.7.7 is installed.
module add bigdft
BLENDER
Blender features a rendering engine called Cycles that offers stunning realistic rendering.
The built-in Cycles rendering engine offers:
- GPU & CPU rendering
- Realtime viewport preview
- HDR lighting support
- Permissive License for linking with external software
module add blender
blender Gearwheel.blend -o Gearwheel --threads 16 --render-format MPEG -x 1 --background --render-anim
CAFFE
Version 1.0.0 RC3 is installed.
module add caffe/1.0.0
and a CPU-only version
module add caffe/1.0.0_cpu
CAPSCompilers
To compile and execute an OpenACC program, use a GPU node and do the following:
module add cuda capscompilers
CHAPEL
Chapel supports a multithreaded parallel programming model at a high level by supporting abstractions for data parallelism, task parallelism, and nested parallelism. It enables optimizations for the locality of data and computation in the program via abstractions for data distribution and data-driven placement of subcomputations. It allows for code reuse and generality through object-oriented concepts and generic programming features. For instance, Chapel allows for the declaration of locales.
module add chapel
chpl test.chpl -L/cm/shared/apps/pc2/OpenMPI/gcc/1.8.4-mt/lib -lmpi

Example batch file for ccsalloc to execute a Chapel program:

#! /bin/sh
#Starts a.out on 3 processors
#CCS -n 3
### Prepare the Chapel hostfile
export GASNET_NODEFILE=$CCS_NODEFILE
nHosts=`cat $GASNET_NODEFILE | wc -l`
### Now start the chapel application
a.out -nl $nHosts
Clang with OpenMP for GPU-offloading
There are two versions of Clang with OpenMP support for GPU-offloading available on OCuLUS. To use the latest Clang 11.0.0 please load the environment with
module load clang/11.0.0
To use Clang 10.0.1 please load the environment with
module load clang/10.0.1
The compiler options that enable OpenMP support for GPU-offloading are -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda.
Sample codes and detailed documentation can be found on https://github.com/pc2/OMP-Offloading.
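A minimal compile-and-run sketch (the source file name and the resource request are assumptions):

module load clang/11.0.0
clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -O2 -o offload_test offload_test.c
ccsalloc -I -t 5m --res=rset=1:ncpus=1:gpus=1:tesla=t ./offload_test   # run on a GPU node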
CP2K
Versions 4.1 and 5.1 are installed.
Examples:
module add cp2k/4.1
ccsalloc -I -t 1h -n 4 ompi -- cp2k.popt -i MyInp.inp
DALTON
Dalton2016.2 is installed.
NOTE: Users have to accept the Dalton2016 license terms.
module add dalton
Example:
ccsalloc --res=rset=2:ncpus=16:mpiprocs=1:ompthreads=1,place=scatter:excl dalton -N 32 dft_stex_1 OCS_aug-cc-pCVTZ
ESPRESSO
mdlsearch espresso prints the available releases.
Submit example script espresso.sh:
#! /bin/sh
module purge
module load DefaultModules
module load chem/QuantumESPRESSO/6.6-foss-2019b
ccsworker ompi -V mpi/OpenMPI/3.1.4-GCC-8.3.0 -- pw.x -input $*
exit $?
Example call: ccsalloc -t 2h --res=rset=2:ncpus=16:mem=40g,place=:excl espresso.sh myinput
FEniCS
Description
This is a Singularity image containing FEniCS 2018.1. Ubuntu Xenial runs inside the image. The following packages were installed in addition to FEniCS:
- PETSc
- SLEPc
- matplotlib
Interactive
If you want to use FEniCS interactively do this on the frontend:
module add fenics                        # Load fenics-module
singularity shell $FENICS_IMG            # Spawn a shell inside the image
OR
singularity exec $FENICS_IMG <command>   # Execute command inside image
Submit via ccsalloc
An example submit script can be found under $PC2SW/examples/fenics.sh
cp $PC2SW/examples/fenics.sh .   # Copy the script to your work-directory
ccsalloc fenics.sh <command>     # Read fenics.sh for usage example
The specified command and all its options will be executed in the image.
FFTW
Different versions of the FFTW3 library (single and double precision) are installed.
mdlsearch fftw
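As a sketch, a program can be linked against the double- or single-precision library roughly as follows (the source file names are placeholders; depending on the loaded module, additional -I/-L paths from the module environment may be required):

module add fftw
gcc -O2 -o my_fft my_fft.c -lfftw3 -lm       # double precision
gcc -O2 -o my_fftf my_fftf.c -lfftw3f -lm    # single precision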
Gamess-US
Access is restricted. Please apply for access.
The source code of Gamess-US can be downloaded at https://www.msg.chem.iastate.edu/gamess/download.html.
module use $PC2SW/CHEM_PHYS_SW/modules/
module add gamess-us
Refer to $PC2SW/examples/gamess-us.sh for a template job-script.
Gaussian
Access is restricted. Please apply for access.
Installed versions are G03 Revision B.01, G09 Revision B.01 and D.01, and G16.
module add g16
Refer to $PC2SW/examples for template job-scripts.
GCC with GPU offloading
To use GCC 9.2.0-offload with OpenMP and OpenACC support for offloading computation to GPUs, load the environment with
module load gcc/9.2.0-offload
The compiler options are -fopenmp -foffload=nvptx-none.
Sample codes and detailed documentation can be found on https://github.com/pc2/OMP-Offloading
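Analogously to the Clang section above, a compile sketch (the source file names are placeholders) could be:

module load gcc/9.2.0-offload
gcc -fopenmp -foffload=nvptx-none -O2 -o omp_offload omp_offload.c     # OpenMP offloading
gcc -fopenacc -foffload=nvptx-none -O2 -o acc_offload acc_offload.c    # OpenACC offloading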
GHMM
Use the ghmm module to activate the environment
module add ghmm
Python bindings are available:
$ module add python
$ python
Python 2.7.6 (default, Dec 6 2013, 18:06:23)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ghmm
....
GRACE
Installed version is 5.5
module add grace
xmgrace
GROMACS
Several versions are installed.
mdlsearch gromacs
An example job script can be found in $PC2SW/examples
HOOMD
HOOMD-blue is a general-purpose particle simulation toolkit. Version 1.1.1 for Intel Xeon E5 and nVidia GPU is installed. Versions 1.0.0, 1.0.1, and 1.0.5 are also available.
module add hoomd
NOTE-1:
Sometimes you will get this message:
An MPI process has executed an operation involving a call to the fork() system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged.
The process that invoked fork was: ...
If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0.
Here is an example of how to disable the warning:
ccsalloc -t 5m --res=rset=1:ncpus=1:mem=4g:vmem=100g:gpus=1:tesla=t ompi --mca mpi_warn_on_fork 0 -- hoomd test.py
NOTE-2:
If hoomd is started on a node without a GPU, you will get this message:
*Warning*: (Rank ...): NVIDIA driver not installed or is too old, ignoring any GPUs in the system.
Please ignore the warning; hoomd will run on the CPU.
Intel Cluster Studio
Several versions of Intel Parallel Studio Cluster Edition are installed.
mdlsearch ^intel/[12]
Example: Start an INTEL-MPI/OpenMP program on 4 chunks (2 MPI-processes per chunk) and 8 OMP threads per process. Refer to $PC2SW/examples for template job-scripts.
ccsalloc --res=rset=4:ncpus=16:mpiprocs=2:ompthreads=8 impi.sh ./program
The Intel® MPI Library provides thread-safe libraries at level MPI_THREAD_MULTIPLE, so several threads can make Intel MPI Library calls simultaneously. Use the compiler driver option -mt_mpi to link the thread-safe version of the Intel MPI Library.
Set the I_MPI_DEBUG environment variable to 4; the Intel MPI Library will then report process-pinning information.
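Put together, a hybrid MPI/OpenMP build and run with the thread-safe library and pinning output could look like this sketch (source and program names are placeholders; I_MPI_DEBUG may also be set inside impi.sh):

module add intel/19.0.1
mpiicc -qopenmp -mt_mpi -o hybrid hybrid.c    # link the thread-safe Intel MPI library
export I_MPI_DEBUG=4                          # report process-pinning information
ccsalloc --res=rset=4:ncpus=16:mpiprocs=2:ompthreads=8 impi.sh ./hybrid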
NOTE:
- The Intel mpif90 uses gfortran by default. To enable the Intel Fortran compiler, use the mpif90 switch -fc=ifort
- Intel Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
Jupyter notebooks
See Jupyter.
LAMMPS
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.
It has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.
LAMMPS runs in parallel using message-passing techniques with a spatial decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
module add lammps
Execute an example instance from $LAMMPS_EXAMPLEDIR:
ccsalloc -n 16 --stdin=in.crack ompi -- lmp_mpi
LAMMPS 15May2015 is installed with the CUDA package.
lmp_mpi_cuda -c on
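A GPU run could, as a sketch, be requested like this (the input file and the resource request are assumptions):

module add lammps
ccsalloc -t 1h --res=rset=1:ncpus=16:gpus=1:tesla=t --stdin=in.crack ompi -- lmp_mpi_cuda -c on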
likwid
likwid is a tool suite with command-line applications for performance-oriented programmers.
module load devel/likwid
The uncore counters might not be accessible. Please contact us at PC2-support if you need uncore counters.
MAGMA
MAGMA, Matrix Algebra on GPU and Multicore Architectures, version 1.7.0 is installed.
module add magma
MATLAB
We provide several releases. Please try mdlsearch matlab to see the available releases.
module add matlab loads the highest release.
Licenses for the Parallel Toolbox and Distributed Computing Server are available.
NOTE: Our DCE license works only with MATLAB 2016b or older. Newer MATLAB versions cannot use DCE.
If you want to use the Distributed Computing Server, you have to load the profile $PC2SW/examples/MATLAB/MDCE.settings.
It is necessary that you have specified your default group and your email address in the CCS rc-file $HOME/.ccsrc/uirc.
To create this file call $CCS/bin/ccsgenrcfile. Refer also to the related CCS FAQ
Examples for parallel/distributed and GPU usage can be found in $PC2SW/examples/MATLAB
If you are using MATLAB on a front-end node, please do not run heavy compute jobs locally. Please use "nice -n 10 matlab"
For more information, have a look at: http://www.mathworks.de/products/distriben/
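For non-interactive use, a batch run could be sketched like this (the script name, runtime, and resource request are assumptions):

module add matlab
ccsalloc -t 2h --res=rset=1:ncpus=16 matlab -nodisplay -nosplash -r "run('myscript.m'); exit"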
NAMD
Installed versions are 2.11 and 2.10b2 (IB, SMP, CUDA, PHI).
InfiniBand and single-threaded NAMD processes
example (assuming a node has at least 16 cores):
ccsalloc -n 8 ./charm.sh

File: charm.sh

#!/bin/bash
IFILE="apoa1.namd"
OFILE="apoa1.log"
PROG=${PC2SW}/NAMD/2.11/Linux-x86_64-ibverbs/namd2
CHARMRUN=${PC2SW}/NAMD/2.11/Linux-x86_64-ibverbs/charmrun
PPE=16 # we assume 16 cores per node
NODES=`cat $CCS_NODEFILE`
NODELIST=namd2.nodelist
NP=0
echo "group main ++shell ssh" > $NODELIST
for NODE in $NODES
do
  for (( c=1; c<=${PPE}; c++ ))
  do
    echo host $NODE >> $NODELIST
    NP=`expr $NP + 1`
  done
done
${CHARMRUN} ++p ${NP} ++nodelist ${NODELIST} ${PROG} ${IFILE} > ${OFILE}
tail -n 30 ${OFILE}
exit 0
InfiniBand and multi-threaded NAMD processes
The Linux-x86_64-ibverbs-smp binaries are based on "smp" builds of Charm++ that can be used with multiple threads, either on a single machine like a multicore build or across a network. SMP builds combine multiple worker threads and an extra communication thread into a single process. Since one core per process is used for the communication thread, SMP builds are typically slower than non-SMP builds. The advantage of SMP builds is that many data structures are shared among the threads, reducing the per-core memory footprint when scaling large simulations to large numbers of cores.
example (assuming a node has at least 16 cores):
ccsalloc -n 8 ./charm-smp.sh

File: charm-smp.sh

#!/bin/bash
IFILE="apoa1.namd"
OFILE="apoa1.log"
PROG=${PC2SW}/NAMD/2.11/Linux-x86_64-ibverbs-smp/namd2
CHARMRUN=${PC2SW}/NAMD/2.11/Linux-x86_64-ibverbs-smp/charmrun
# number of cores minus one, because of communication thread
PPN=15
NODES=`cat $CCS_NODEFILE`
NODELIST=namd2.nodelist
NP=0
echo "group main ++shell ssh" > $NODELIST
for NODE in $NODES
do
  echo host $NODE >> $NODELIST
  NP=`expr $NP + 1`
done
NP=`expr $NP \* $PPN`
$CHARMRUN ++p ${NP} ++nodelist ${NODELIST} ${PROG} ++ppn ${PPN} ${IFILE} > ${OFILE}
exit 0
InfiniBand, SMP, nVIDIA GPU
example (uses one core and the Tesla card of a GPU node):
module add namd/2.11/ib-smp-cuda
ccsalloc --res=rset=1:tesla=t:gpus=1:ncpus=1:mem=4g:vmem=85g ./run_namd.sh 1

File: run_namd.sh

#!/bin/bash
# module add namd/...
# call "run_namd <PPE>"
HOME=`pwd`/input/apoa1
IFILE="apoa1.namd"
OFILE=apoa1.log
PROG=`which namd2`
CHARMRUN=charmrun
ENV_SCRIPT=`which namd_env_script.sh`
PPE=$1
cd $HOME
NODES=`cat $CCS_NODEFILE`
NODELIST=namd2.nodelist
NP=0
echo "group main ++shell ssh" > $NODELIST
for NODE in $NODES
do
  for (( c=1; c<=${PPE}; c++ ))
  do
    echo host $NODE >> $NODELIST
    NP=`expr $NP + 1`
  done
done
$CHARMRUN ++p ${NP} ++verbose ++nodelist ${NODELIST} ++runscript ${ENV_SCRIPT} ${PROG} +idlepoll ${IFILE} > ${OFILE}
exit 0
Note: In this example, the NAMD process uses two cores.
netCDF
Several versions are installed. Try mdlsearch netcdf.
module add data/netcdf data/hdf5
NWChem
Access is restricted. Please apply for access. Installed versions are 6.0 and 6.3R2, both built with OpenMPI. Refer to $PC2SW/examples for an example run script.
OpenBLAS
Several versions are installed. Try mdlsearch openblas.
OpenFOAM
Several versions are installed. Try mdlsearch openfoam. An example job script can be found in $PC2SW/examples.
OpenMPI
Several versions of OpenMPI are installed (try: mdlsearch openmpi). For more information about an installed OpenMPI version, execute ompi_info.
Example: Start an OpenMPI/OpenMP program on 4 chunks (2 MPI-processes per chunk) and 8 OMP threads per process.
ccsalloc --res=rset=4:ncpus=16:mpiprocs=2:ompthreads=8 ompi -- ./program
If you want to use the Intel-based version 1.10.2, try the following:
module add openmpi/intel/1.10.2_mt
ccsalloc -I --res=rset=2:ncpus=16:mpiprocs=1:ompthreads=16,place=scatter:excl ompi -V intel/1.10.2_mt --map-by node -- ./program
OpenSpeedShop
OpenSpeedShop versions 2.0.2 and 2.1 are installed. User guides are located in $PC2SW/OpenSpeedShop.
ORCA
Access is restricted. Please apply for access.
mdlsearch orca
The manual is located in directory $ORCA_PATH. A template jobscript is in $PC2SW/examples.
ParaView
Several ParaView versions are installed.
mdlsearch paraview
PETSc
PETSc (Portable, Extensible Toolkit for Scientific Computation) versions 3.6.1 and 3.8.3 are installed.
module add petsc
PGI-Compiler
The PGI Compiler Suite (C, C++, Fortran) with OpenACC and support for OpenMPI, netCDF, and nVidia GPUs is installed.
To search for all available PGI modules use
mdlsearch pgi
Example
module load pgi/compiler/20.1 system/CUDA/10.2.89-GCC-8.3.0
pgcc -ta=tesla:cc35 -o example $PC2SW/examples/OpenACC/example.c
ccsalloc -I -t5m --res=rset=1:ncpus=1:tesla=1 example
The target codes for the GPUs are:
GPU | Compiler Target |
---|---|
Tesla K20 | -ta=tesla:cc35 |
GTX1080 | -ta=tesla:cc60 |
RTX2080 | -ta=tesla:cc75 |
Try also pgaccelinfo on a GPU node
To get more information use
module help pgi/compiler/20.1
Documentation for PGI 20.1 is available at: http://www.pgroup.com/resources/docs.htm
PLASMA
Parallel Linear Algebra Software for Multicore Architectures (PLASMA) version 2.8.0 compiled with Intel Compiler is installed.
module add plasma
R-Project
Versions 3.0.0, 3.2.1, and 3.2.4 of the R-project are installed.
See also http://cran.r-project.org/web/views/HighPerformanceComputing.html
SCALASCA
Several Scalasca versions are installed. A user guide is located in $PC2SW/SCALASCA.
mdlsearch scalasca
Scilab
The installed version of Scilab is 5.4.1.
module add scilab
SINGULARITY
Documentation is in $PC2SW/examples/SINGULARITY
module add singularity
USPEX
The USPEX license forbids distribution of the software. Thus, we are not allowed to install it for all users on the cluster. The following instructions show how to run USPEX 10.4 with VASP 5.4.4 on OCuLUS. The job-submission mode is whichCluster=0, which assumes that USPEX itself runs inside a job. If you need one of the other submission modes, i.e., whichCluster=1 or whichCluster=2 (for running multiple VASP calculations in parallel), please let us know.
Compile VASP 5.4.4 following the instructions at https://wikis.uni-paderborn.de/pc2doc/Oculus-Software-VASP. The path in which the VASP binaries (vasp_std, ...) are located is denoted as VASPBINDIR in the following, i.e., vasp_std is at $VASPBINDIR/vasp_std.
Installing USPEX
- Download and unpack USPEX 10.4.
- run "bash install.sh"
- choose 2 for terminal installation
- Read and agree to the license conditions.
- Type some directory as the installation path, e.g. /scratch/hpc-prf-PROJECTNAME/USPEX, this directory shall be known as USPEXINSTALLDIR
- wait...
- Check if the file $USPEXINSTALLDIR/install/application/USPEX exists.
Testing the Basic USPEX Installation
- To test, please try:
module load toolchain/foss/2018b
module load lang/Python/3.6.6-foss-2018b
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$USPEXINSTALLDIR/v91/bin/glnxa64/:$USPEXINSTALLDIR/v91/runtime/glnxa64/
$USPEXINSTALLDIR/application/USPEX
- You should get output like:
[rschade@fe1 EX01-3D_Si_vasp]$ $USPEXINSTALLDIR/application/USPEX
/bin/bash: synclient: Kommando nicht gefunden.
*********************************************************
            ** USPEX v10.4 Oganov's Lab! **
*********************************************************
...
- You can ignore the following errors.
- The Python packages numpy and spglib that USPEX needs can be installed with:

module load toolchain/foss/2018b
module load lang/Python/3.6.6-foss-2018b
pip3 install --user numpy
git clone https://github.com/spglib/spglib.git
cd spglib/python
python3 setup.py install --user
Testing for USPEX+VASP
- Copy $USPEXINSTALLDIR/application/archive/examples/EX01-3D_Si_vasp.tgz to some directory and unpack it.
- Change to the directory EX01-3D_Si_vasp.
- Change the lines between "% commandExecutable" and "% EndExecutable" to:
mpirun -genv I_MPI_DEBUG=4 -genvall -machinefile $CCS_NODEFILE vasp_std > log_VASP
- Change whichCluster from 1 to 0.
- Change numParallelCalcs from 30 to 1.
- Create job.sh as a job script with the content:
#!/bin/bash
#CCS --stdout=uspex_test.out
#CCS --stderr=uspex_test.err
#CCS --name USPEX_TEST
#CCS --res=rset=16:ncpus=1:mpiprocs=1:ompthreads=1
#CCS -t 24h
module purge
module add default-environment
module load lang/Python/3.6.6-foss-2018b
module load toolchain/foss/2018b
module load intel/19.0.4_compilers
VASPBINDIR=!!Write VASPBINDIR here!!
USPEXINSTALLDIR=!!Write USPEXINSTALLDIR here!!
export PATH=$PATH:$USPEXINSTALLDIR/application/archive:$VASPBINDIR:.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$USPEXINSTALLDIR/v91/bin/glnxa64/:$USPEXINSTALLDIR/v91/runtime/glnxa64/
export MCRROOT=$USPEXINSTALLDIR
export USPEXPATH=$USPEXINSTALLDIR/application/archive/src
export CCS_NODEFILE
which python3
which USPEX
which vasp_std
USPEX -v
rm still_running
bash EX01-3D_Si_vasp.sh
- Make it executable: chmod +x job.sh
- Submit it as a job.
- You should see results in results1/OUTPUT.txt.
- For actual calculations, you should adapt the resource settings (the #CCS --res line) of your job script.
Turbomole
Access is restricted. Please apply for access. Several versions are installed.
mdlsearch turbomole
A template job script can be found in $PC2SW/examples.
Valgrind
Several versions are installed.
mdlsearch valgrind
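A memory-check run under the batch system might, for example, look like this (program name, runtime, and resources are placeholders):

module add valgrind
ccsalloc -I -t 30m --res=rset=1:ncpus=1:mem=4g valgrind --leak-check=full ./program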
VASP
Please note that, due to the licensing model of VASP, we cannot provide a compiled version to every user.
Please apply for a license.
As for Noctua (Guide to compile VASP on Noctua), we provide a guide to compile VASP for OCuLUS.
vTune
vTune is part of the installed Intel Parallel Studio versions. The sampling driver that enables Hardware Event-Based Sampling (EBS) analysis is available on the frontend fe2 and on some compute nodes (request the OpenCCS resource vtune).
Example:
module add intel/19.0.1
ccsalloc --res=rset=1:ncpus=16:vtune=true <yourscript>
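Inside <yourscript>, a command-line collection could be started roughly like this (the analysis type and result directory are assumptions):

amplxe-cl -collect hotspots -result-dir vtune_results -- ./program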
To start the GUI on the frontend:
module add intel/19.0.1
amplxe-gui
xTB
xTB is a semiempirical extended tight-binding program package. It can be used by loading the environment module:
module load chem/xtb/6.2.3-foss-2019b
xTB uses OpenMP parallelization. For calculations on large molecular systems, the stack memory size may need to be increased. The following settings can be added to your job script to increase the stack size for large calculations.
ulimit -s unlimited
export OMP_STACKSIZE=4G
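A complete job script could be sketched as follows (the molecule file, thread count, runtime, and resource request are placeholders):

#!/bin/bash
#CCS -t 2h
#CCS --res=rset=1:ncpus=16:mem=40g
module purge
module load chem/xtb/6.2.3-foss-2019b
ulimit -s unlimited           # unlimited stack for large systems
export OMP_STACKSIZE=4G       # per-thread OpenMP stack size
export OMP_NUM_THREADS=16     # match the requested ncpus
xtb molecule.xyz --opt > xtb.out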