OCuLUS



Login frontend: fe.pc2.upb.de

Brief Instructions

Status and Upcoming Events

Architecture

Workload Manager

File Systems

Available Software


Tuning Hints

System Specific Environment Variables

These variables are set automatically and can be used in scripts. Refer also to the OCuLUS file systems page.

Environment Variable   Purpose
HOME                   Absolute path to your PC² home directory
PC2GROUPS              Absolute path to the PC² group directory
PC2SW                  Absolute path to the OCuLUS software directory
PC2SCRATCH             Absolute path to the OCuLUS local scratch directory, provided as a parallel file system
PC2SYSNAME             The name of the system: "OCULUS"
PC2WORK                Absolute path to your PC²-wide scratch directory
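For example, a job script can combine these variables to stage data on the scratch file system. The following is a minimal sketch; my_app and the file names are placeholders, not part of the system:

 #!/bin/bash
 # Sketch: stage input to the parallel scratch file system, run there,
 # and copy the results back to the home directory afterwards.
 WORKDIR="$PC2SCRATCH/myjob.$$"        # per-job directory on scratch
 mkdir -p "$WORKDIR"
 cp "$HOME/input.dat" "$WORKDIR/"
 cd "$WORKDIR" || exit 1
 echo "Running on $PC2SYSNAME"         # prints: Running on OCULUS
 "$HOME/bin/my_app" input.dat > output.dat   # hypothetical application
 cp output.dat "$HOME/"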

FAQ

No host found which provides the requested resources: perhaps a collision with a node freepool

A node freepool limits access to a node. The constraints are part of the node's properties.

One can inspect these properties by calling ccsinfo -n --state=%H%p%m.
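For example, to look at the properties of a single node (the hostname node001 below is only a placeholder):

 ccsinfo -n --state=%H%p%m | grep node001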

More details can be found in Appendix K of the OpenCCS User Manual.

OpenCCS did not write a job trace file

It may happen that OpenCCS is temporarily unable to write the trace file to the specified directory. In such cases, OpenCCS writes the file to $CCS/tmp/OCULUS/TRACES instead.
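If a trace file seems to be missing, check the fallback location as well; a sketch, assuming the trace file was originally requested in $HOME/traces:

 ls -l "$HOME/traces"             # hypothetical original target directory
 ls -l "$CCS/tmp/OCULUS/TRACES"   # OpenCCS fallback location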

My parallel (MPI) job does not work anymore

This may be due to incorrect Access Control List entries (ACEs) on your HOME and/or .ssh directory. Please send an email to pc2-support(at)upb.de.
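To inspect the current entries before contacting support, you can use getfacl (assuming the standard ACL tools are installed on the frontend):

 getfacl "$HOME" "$HOME/.ssh"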

Which node types are available?

Type        Nodes  CPU type                 Cores  Memory  Accelerator
normal      552    two Intel Xeon E5-2670   16     64 GB   -
washington  20     two Intel Xeon E5-2670   16     256 GB  -
tesla       32     two Intel Xeon E5-2670   16     64 GB   1 NVIDIA K20 (Kepler)
gtx1080     8      two Intel Xeon E5-2670   16     64 GB   2 NVIDIA GeForce GTX 1080 Ti
smp         4      four Intel Xeon E5-4670  32     1 TB    -

'ccsinfo -n' shows 'only local jobs = true'. What does this mean?

Such a node only accepts jobs that run completely on that node. Jobs using more than one node are not mapped to it.

Cannot get 16 cores on a GPU node

Nodes hosting GPU cards keep one core per GPU free for jobs requesting GPUs. Hence, jobs not requesting a GPU card will get at most 14 or 15 cores on these nodes.

HowTos

How to (un)select specific node types?

Normally, you won't care about this question, because you just request cores, memory, accelerators, or licenses, and CCS takes care of the mapping. However, for benchmarking purposes it may be useful to (un)select specific node types. For this purpose, we provide resources of type Boolean to (un)select node types. ccsinfo -a shows the available resources:

Name            Type, Amount               Default    Purpose
                Flags Used/Online/Max
=============================================================
ncpus           U,C   4920/8704/8704       1          number of cores
nodes           U,C   0/525/617            1          number of exclusively used nodes
mem             S,C   18.92t/40.18t/40.18t 3.94g      physical memory
vmem            S,C   23.50t/50.51t/50.51t 4.89g      virtual memory
cput            T,    -                    N/A        CPU time
walltime        T,J   -                    N/A        walltime
hostname        A,    -                    N/A        hostname
arch            A,    -                    N/A        host architecture
mpiprocs        U,    -                    N/A        number of mpi processes per chunk
ompthreads      U,    -                    N/A        number of threads per chunk
mdce            U,CJ  0/256/256            N/A        Matlab Distributed Computing Environment licenses
norm            B,    -                    N/A        64GB compute node
gtx1080         U,C   0/8/16               2          nVIDIA GeForce GTX1080 Ti (11GB RAM)
rack            U,    -                    N/A        rack number
smp             B,    -                    N/A        SMP node
tesla           U,C   0/32/32              1          Tesla K20xm card
wash            B,    -                    N/A        washington node

For example, if you want to run a job only on the washington nodes, set wash=true:

 --res=rset=2:ncpus=5:wash=t   

requests 2 chunks, each with 5 cores and wash==true.

To exclude washington and smp nodes use:

 --res=rset=2:ncpus=5:wash=f:smp=f
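A complete submission could then look like the following sketch, where ccsalloc is the OpenCCS submission command and my_benchmark.sh is a hypothetical script:

 ccsalloc --res=rset=2:ncpus=5:wash=f:smp=f ./my_benchmark.sh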

How to allocate a Tesla accelerator

For the Tesla K20, we provide the consumable resource tesla. This prevents more than one job from being scheduled to a card at the same time.

Hence, to request 2 chunks, each with 8 cores and one Tesla card, use:

 --res=rset=2:ncpus=8:tesla=1

For offload jobs CCS sets the environment variable:

 CUDA_VISIBLE_DEVICES=0

For jobs mapped to a GPU node but not requesting the Tesla accelerator, CCS sets the environment variable:

 CUDA_VISIBLE_DEVICES=1024

which is an invalid value.
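A job script can exploit this marker to fail early when it was started without a GPU. A minimal sketch, assuming a hypothetical application my_cuda_app:

 # Abort if CCS assigned the invalid marker instead of a real device id,
 # i.e. the job did not request a Tesla card.
 if [ "$CUDA_VISIBLE_DEVICES" = "1024" ]; then
     echo "No GPU allocated to this job" >&2
     exit 1
 fi
 "$HOME/bin/my_cuda_app"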

Note that sufficient vmem has to be allocated, e.g.:

 --res=rset=1:ncpus=1:tesla=1:vmem=85g:mem=4g

How to allocate a GTX1080 accelerator

For the GTX1080, we provide the consumable resource gtx1080. This prevents more than one job from being scheduled to a card at the same time.

Hence, to request 2 chunks, each with 8 cores and one GTX1080 card, use:

 --res=rset=2:ncpus=8:gtx1080=1

For offload jobs CCS sets the environment variable:

 CUDA_VISIBLE_DEVICES=0

For jobs mapped to a GPU node but not requesting the GTX1080 accelerator, CCS sets the environment variable:

 CUDA_VISIBLE_DEVICES=1024

which is an invalid value.

Note that sufficient vmem has to be allocated, e.g.:

 --res=rset=1:ncpus=1:gtx1080=1:vmem=85g:mem=4g
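Since each gtx1080 node hosts two cards (see the node table above), a single job can also request both cards of a node by setting gtx1080=2; the core count in this sketch is only illustrative:

 --res=rset=1:ncpus=2:gtx1080=2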