- 1 System Specific Environment Variables
- 2 FAQ
- 2.1 Input/output error (5) on files in the directories $HOME, $PC2WORK, or $PC2GROUPS
- 2.2 OpenCCS did not write a job trace file
- 2.3 My parallel (MPI) job does not work anymore
- 2.4 Which node types are available?
- 2.5 'ccsinfo -n' shows 'only local jobs = true'. What does it mean?
- 2.6 Cannot get 16 cores on a gpu or phi node
- 3 HowTos
System Specific Environment Variables
They are set automatically and can be used in scripts. Refer also to OCuLUS file systems.
|Variable||Description|
|HOME||Absolute path to your PC² home directory|
|PC2GROUPS||Absolute path to the PC² group directory|
|PC2SW||Absolute path to the OCULUS software directory|
|PC2SCRATCH||Absolute path to the OCULUS local scratch directory, provided as a parallel file system|
|PC2SYSNAME||The name of the system: "OCULUS"|
|PC2WORK||Absolute path to your PC² wide scratch directory|
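Since job scripts rely on these variables, it can help to verify them before any work starts. The following is a minimal sketch, not part of the system: `check_pc2_env` is a hypothetical helper name, and it only tests that each variable from the table is non-empty.

```shell
#!/bin/sh
# check_pc2_env is a hypothetical helper, not provided by OCULUS.
# It reports which variables from the table above are unset or empty.
check_pc2_env() {
    missing=""
    for name in HOME PC2GROUPS PC2SW PC2SCRATCH PC2SYSNAME PC2WORK; do
        # Indirect lookup: yields the value of the variable called $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "unset:$missing"
        return 1
    fi
    echo "ok"
}
```

At the top of a job script, `check_pc2_env || exit 1` would stop the job early instead of letting it write to an empty path.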
Input/output error (5) on files in the directories $HOME, $PC2WORK, or $PC2GROUPS
This error is often caused by editing the file with vi, which can corrupt the file's NFS4 ACLs.
It does not matter whether you edit the file on OCULUS or on a host outside OCULUS that has mounted /upb/departments/pc2.
As a workaround, add the following line to your $HOME/.vimrc:
set backupcopy=yes
For example:
echo "set backupcopy=yes" >> $HOME/.vimrc
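Running the echo command more than once appends the line repeatedly. A minimal sketch of an idempotent variant follows; `ensure_backupcopy` is a hypothetical helper name, and the optional file argument exists only so the function can be tried on a copy first.

```shell
# ensure_backupcopy is a hypothetical helper; without an argument it
# targets $HOME/.vimrc.
ensure_backupcopy() {
    vimrc="${1:-$HOME/.vimrc}"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex).
    if ! grep -qxF 'set backupcopy=yes' "$vimrc" 2>/dev/null; then
        echo 'set backupcopy=yes' >> "$vimrc"
    fi
}
```

Calling `ensure_backupcopy` with no argument adds the setting to $HOME/.vimrc only if it is not already there.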
OpenCCS did not write a job trace file
OpenCCS may be temporarily unable to write the trace file to the specified directory. In that case, OpenCCS writes the file to $CCS/tmp/OCULUS/TRACES instead.
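A quick way to look for a missing trace file is to list the fallback directory, newest entries first. This is a minimal sketch: `list_fallback_traces` is a hypothetical helper name, and it assumes that $CCS is set in your environment, as the path above implies.

```shell
# list_fallback_traces is a hypothetical helper, not an OpenCCS command.
list_fallback_traces() {
    if [ -z "${CCS:-}" ]; then
        echo "CCS is not set" >&2
        return 1
    fi
    dir="$CCS/tmp/OCULUS/TRACES"
    if [ -d "$dir" ]; then
        # Newest files first, so a recently finished job is at the top.
        ls -t "$dir"
    else
        echo "no fallback trace directory at $dir" >&2
        return 1
    fi
}
```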
My parallel (MPI) job does not work anymore
This may be caused by incorrect access control list entries (ACEs) on your HOME and/or .ssh directory. Please send an email to pc2-gurus(at)upb.de.
Which node types are available?
|Node type||Nodes||CPUs per node||Cores per node||Memory per node||Accelerator|
|normal||552||two Intel Xeon E5-2670||16||64GB||-|
|washington||20||two Intel Xeon E5-2670||16||256GB||-|
|tesla||32||two Intel Xeon E5-2670||16||64GB||nVIDIA K20 (Kepler)|
|phi||8||two Intel Xeon E5-2670||16||64GB||Intel Xeon Phi (5110P)|
|smp||4||four Intel Xeon E5-4670||32||1TB||-|
'ccsinfo -n' shows 'only local jobs = true'. What does it mean?
This node accepts only jobs that run completely on that node. Jobs using more than one node are not mapped to it.
Cannot get 16 cores on a gpu or phi node
Nodes hosting a Tesla or a Xeon Phi card keep one core free for jobs requesting a tesla/phi card. Hence, jobs not requesting a tesla/phi card get at most 15 cores on those nodes.
How to (un)select specific node types?
Normally, you do not need to care about this, because you just request cores, memory, accelerators, or licences, and CCS takes care of the mapping. However, for benchmarking purposes it may be useful to (un)select specific node types. For this purpose, we provide resources of type Boolean to (un)select node types. ccsinfo -a shows the available resources:
Name        Type,  Amount                Default  Purpose
            Flags  Used/Online/Max
======================================================================
ncpus       U,C    4920/8704/8704        1        number of cores
nodes       U,C    0/525/617             1        number of exclusively used nodes
mem         S,C    18.92t/40.18t/40.18t  3.94g    physical memory
vmem        S,C    23.50t/50.51t/50.51t  4.89g    virtual memory
cput        T,     -                     N/A      CPU time
walltime    T,J    -                     N/A      walltime
hostname    A,     -                     N/A      hostname
arch        A,     -                     N/A      host architecture
mpiprocs    U,     -                     N/A      number of mpi processes per chunk
ompthreads  U,     -                     N/A      number of threads per chunk
mdce        U,CJ   0/256/256             N/A      Matlab Distributed Computing Environment licenses
norm        B,     -                     N/A      64GB compute node
phi         U,C    0/0/8                 N/A      Intel Xeon Phi card
rack        U,     -                     N/A      rack number
smp         B,     -                     N/A      SMP node
tesla       U,C    0/32/32               N/A      Tesla K20xm card
wash        B,     -                     N/A      washington node
For example, to run a job only on the washington nodes, set wash=true. A request for 2 chunks, each with 5 cores and wash=true, is then mapped to washington nodes only.
To exclude washington and smp nodes, set wash=false and smp=false.
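The two selections can be written as chunk specifications. The sketch below uses the count:resource=value notation of the chunk (4:ncpus=8:mem=32g) quoted later on this page. `chunk_spec` is a hypothetical helper, and using wash=false and smp=false to unselect node types is an assumption based on the Boolean resources listed by ccsinfo -a. How the resulting string is passed to ccsalloc is not shown here; take it from the OpenCCS documentation.

```shell
# chunk_spec is a hypothetical helper: it joins a chunk count and any
# number of resource=value pairs with ':' (count:res=val:res=val).
chunk_spec() {
    spec="$1"
    shift
    for pair in "$@"; do
        spec="$spec:$pair"
    done
    echo "$spec"
}

# 2 chunks, each with 5 cores, on washington nodes only.
chunk_spec 2 ncpus=5 wash=true
# 2 chunks, each with 5 cores, excluding washington and smp nodes
# (assumed syntax for unselecting a Boolean resource).
chunk_spec 2 ncpus=5 wash=false smp=false
```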
How to allocate a Tesla accelerator
For Tesla K20, we provide the consumable resource tesla. This prevents more than one job from being scheduled to a card at the same time.
Hence, to request 2 chunks each with 8 cpus and one Tesla card use:
For offload jobs CCS sets the environment variable:
For jobs mapped on a GPU node but not requesting the Tesla accelerator CCS sets the environment variable:
which is an invalid value.
Typically sufficient vmem has to be allocated.
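As an illustration, the request described above (2 chunks, each with 8 cores and one Tesla card) can be combined with an explicit vmem amount in one chunk specification. This is a sketch under assumptions: the notation follows the chunk (4:ncpus=8:mem=32g) quoted later on this page, the vmem value is a made-up placeholder, and treating vmem as a per-chunk amount is an assumption.

```shell
chunks=2
cores=8
# Hypothetical value; size it to what the application really needs.
vmem=8g

# One Tesla card per chunk via the consumable resource tesla.
tesla_spec="${chunks}:ncpus=${cores}:tesla=1:vmem=${vmem}"
echo "$tesla_spec"
```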
How to allocate an Intel Xeon-Phi accelerator for offloading executables
For Intel Xeon Phi, we provide the consumable resource phi. This prevents more than one job from being scheduled to a card at the same time.
To run an offload job using 8 cores on the phi host and the MIC card use:
For offload jobs CCS sets the environment variables:
MKL_MIC_ENABLE=1 and OFFLOAD_DEVICES=0
For jobs mapped on a node with a Xeon Phi card but not requesting the Xeon Phi CCS sets the environment variable:
which is an invalid value.
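A job script can check these variables before starting an offload executable, so that a job without a card fails early. The following is a minimal sketch: `phi_offload_ready` is a hypothetical helper name, and it assumes that a usable OFFLOAD_DEVICES is a non-negative device index or a comma-separated list of such indices.

```shell
# phi_offload_ready is a hypothetical helper, not set up by CCS.
phi_offload_ready() {
    # MKL automatic offload must be switched on.
    [ "${MKL_MIC_ENABLE:-}" = "1" ] || return 1
    # Assumption: valid values contain only digits and commas (0 or 0,1).
    # An empty value or any other character counts as "no card".
    case "${OFFLOAD_DEVICES:-}" in
        ''|*[!0-9,]*) return 1 ;;
    esac
    return 0
}
```

In a job script, `phi_offload_ready || { echo "no Xeon Phi assigned" >&2; exit 1; }` would abort before the offload code runs.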
How to allocate an Intel Xeon-Phi accelerator for "native" executables
To run a native job using a MIC card use:
This will request one MIC in native mode.
To run a job on "normal" nodes and a MIC card use:
This will request one MIC in native mode and 4 chunks on the "normal" nodes.
OpenCCS assigns the boot host to one of the hosts which satisfy the last specified chunk. In this example, the boot host will be one of (4:ncpus=8:mem=32g).
For detailed information refer to the Xeon-Phi pages or $CCS/examples/MIC.