- 1 System Specific Environment Variables
- 2 FAQ
- 2.1 How to get information about current resource usage
- 2.2 How to convert job scripts from another WLM (SLURM, PBS, Torque, ...)
- 2.3 How to run job-chains (one job after another)
- 2.4 How to search for module files
- 2.5 ccsinfo reqID does not find the data
- 2.6 No host found which provides the requested resources:Perhaps a collision with a node freepool
- 2.7 OpenCCS did not write a job trace file
- 2.8 Which node types are available?
- 2.9 'ccsinfo -n' shows 'only local jobs = true'. What does it mean?
- 2.10 Cannot get 16 cores on a gpu node
- 3 HowTos
System Specific Environment Variables
These variables are set automatically and can be used in scripts. See also OCuLUS file systems.
|HOME||Absolute path to your PC² wide home directory|
|PC2DATA||Absolute path to the PC² wide group data directories. Mounted read-only on the compute nodes and read-write on the frontends.|
|PC2PFS||Absolute path to the group directories on the OCULUS parallel file system BeeGFS|
|PC2SCRATCH||Absolute path to PC² wide group scratch directories. Mounted read-write on the compute nodes and the frontends.|
|PC2SW||Absolute path to the OCULUS software directory|
|PC2SYSNAME||The name of the system: "OCULUS"|
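As a minimal sketch of how these variables can be used in a script, the following function prints each of them, or "(unset)" when a variable is not defined (for example when the script is run outside the cluster). The function name show_pc2_env is only an illustrative choice.

```shell
# Print every system-specific variable from the table above.
# A variable that is not defined is reported as "(unset)".
show_pc2_env() {
    for name in HOME PC2DATA PC2PFS PC2SCRATCH PC2SW PC2SYSNAME; do
        # Indirect lookup: copy the content of the variable named $name.
        eval "value=\${$name-}"
        printf '%s=%s\n' "$name" "${value:-(unset)}"
    done
}

show_pc2_env
```

Referring to paths through these variables, for example "$PC2SW/examples", keeps a job script free of hard-coded directory names.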
How to get information about current resource usage
- The command pc2status also shows this information.
- ccsinfo -u prints output like this:
Allocated Resources of Group: hpc-prf-hell
Resource   Limit   Allocated   (% of Limit)
====================================================================
ncpus      2048    228         ( 11.13%)
mem        N/A     885.13g     (    N/A)
vmem       N/A     1.07t       (    N/A)
tesla      1       1           (100.00%)
How to convert job scripts from another WLM (SLURM, PBS, Torque, ...)
How to run job-chains (one job after another)
Job chains are useful if one job depends on the output of another job.
Example: We have 3 jobs: job1.sh, job2.sh, and job3.sh
Job3 depends on job2 and job2 depends on job1.
So, we have to start them one after another.
Because CCS does not support job-chains directly, we provide the script $PC2SW/examples/submitCCSJobchain.sh.
Calling it without a parameter prints a help text.
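The dependency rule itself (start the next job only after the previous one has finished) can be illustrated with a plain shell function. This is only a sketch of the idea: it runs the scripts one after another in the current session, it does not submit anything to CCS, and it is not the implementation of submitCCSJobchain.sh. The name run_chain and the choice to stop at the first failing step are assumptions.

```shell
# Run the given scripts strictly in order.
# Assumption: a failed step means the following steps must not start.
run_chain() {
    for step in "$@"; do
        echo "starting $step"
        sh "$step" || { echo "$step failed, stopping chain" >&2; return 1; }
    done
}

# Usage: run_chain job1.sh job2.sh job3.sh
```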
How to search for module files
We provide the command mdlsearch to search for module files. For example, to search for all fftw module files, run mdlsearch fftw.
ccsinfo reqID does not find the data
OpenCCS holds the data of completed jobs for 30 minutes in its memory. After that time the data is removed and ccsinfo prints this message:
[kel@fe2]$ ccsinfo 4229599
ERROR:Inquiry denied for request (4229599):Unknown request 4229599
The command ccstracejob prints the log and accounting data of such jobs. Refer to the man page of ccstracejob for detailed information.
No host found which provides the requested resources:Perhaps a collision with a node freepool
A node freepool limits the access to a node. The constraints are part of the node's properties.
You can inspect these properties by calling ccsinfo -n --state=%H%p%m.
More details are in Appendix K of the OpenCCS User-Manual.
OpenCCS did not write a job trace file
It may happen that OpenCCS is temporarily not able to write the trace file to the specified directory. In such cases OpenCCS writes the file to $CCS/tmp/OCULUS/TRACES.
Which node types are available?
|Node type||Nodes||CPUs||Cores||Memory||Accelerators|
|normal||552||two Intel Xeon E5-2670||16||64GB||-|
|washington||20||two Intel Xeon E5-2670||16||256GB||-|
|tesla||14||two Intel Xeon E5-2670||16||64GB||1 nVIDIA K20 (Kepler)|
|tesla||7||two Intel Xeon E5-2670||16||64GB||2 nVIDIA K20 (Kepler)|
|gtx1080||13||two Intel Xeon E5-2670||16||64GB||1 nVIDIA GeForce GTX-1080 Ti|
|gtx1080||2||two Intel Xeon E5-2670||16||64GB||2 nVIDIA GeForce GTX-1080 Ti|
|rtx2080||1||two Intel Xeon E5-2670||16||64GB||2 nVIDIA GeForce RTX-2080 Ti|
|smp||4||four Intel Xeon E5-4670||32||1TB||-|
'ccsinfo -n' shows 'only local jobs = true'. What does it mean?
This node only accepts jobs that run completely on that node. Jobs using more than one node are not mapped to this node.
Cannot get 16 cores on a gpu node
Nodes hosting GPU cards keep one core per GPU free for jobs requesting GPUs. Hence, jobs that do not request a GPU card get at most 15 cores (nodes with one GPU) or 14 cores (nodes with two GPUs) on those nodes.
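The arithmetic behind this limit can be written down as a small helper. This is only a sketch of the rule stated above (one reserved core per GPU); the function name max_cores_without_gpu is hypothetical.

```shell
# max_cores_without_gpu TOTAL_CORES GPUS_IN_NODE
# One core per GPU stays reserved for jobs that request a GPU.
max_cores_without_gpu() {
    echo $(( $1 - $2 ))
}

max_cores_without_gpu 16 1   # node with one GPU  -> prints 15
max_cores_without_gpu 16 2   # node with two GPUs -> prints 14
```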
Install Python packages in your home directory
Choose the Python release you want to use by loading the corresponding module, then install the package with pip:
$ pip install --user numpy
This installs the numpy package in your home directory under $HOME/.local/. Afterwards, make sure that $HOME/.local/bin is in your PATH variable and that $HOME/.local/lib/pythonx.y/site-packages/ is in your PYTHONPATH.
Make sure to replace the x.y part with the actual version of Python you are using. For instance:
$ export PATH=$HOME/.local/bin:$PATH
$ export PYTHONPATH=$HOME/.local/lib/python2.7/site-packages/:$PYTHONPATH
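To avoid typing the x.y part by hand, the version can be read from the loaded interpreter. The following is a sketch under the assumption that the loaded module provides the interpreter under the name python; the helper names user_site_dir and python_xy are hypothetical.

```shell
# Build the user site-packages path for a given Python version (x.y).
user_site_dir() {
    echo "$HOME/.local/lib/python$1/site-packages"
}

# Ask the interpreter for its own x.y version.
# Assumption: the loaded module makes it available as "python".
python_xy() {
    python -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# Usage (after loading the Python module):
# export PATH="$HOME/.local/bin:$PATH"
# export PYTHONPATH="$(user_site_dir "$(python_xy)")${PYTHONPATH:+:$PYTHONPATH}"
```

The ${PYTHONPATH:+:$PYTHONPATH} form appends the old value only if one exists, which avoids a trailing colon in PYTHONPATH.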
How to (un)select specific node types?
Normally, you do not need to care about this, because you just request cores, memory, accelerators, or licences, and CCS takes care of the mapping. However, for benchmarking purposes it may be useful to (un)select specific node types. For this purpose, we provide resources of type Boolean to (un)select node types. ccsinfo -a shows the available resources:
Name        Type,  Amount                Default  Purpose
            Flags  Used/Online/Max
=============================================================
ncpus       U,C    4920/8704/8704        1        number of cores
nodes       U,C    0/525/617             1        number of exclusively used nodes
mem         S,C    18.92t/40.18t/40.18t  3.94g    physical memory
vmem        S,C    23.50t/50.51t/50.51t  4.89g    virtual memory
cput        T,     -                     N/A      CPU time
walltime    T,J    -                     N/A      walltime
hostname    A,     -                     N/A      hostname
arch        A,     -                     N/A      host architecture
mpiprocs    U,     -                     N/A      number of mpi processes per chunk
ompthreads  U,     -                     N/A      number of threads per chunk
mdce        U,CJ   0/256/256             N/A      Matlab Distributed Computing Environment licenses
norm        B,     -                     N/A      64GB compute node
gtx1080     U,C    0/8/16                2        nVIDIA GeForce GTX1080 Ti (11GB RAM)
rtx2080     U,C    0/2/2                 0        NVIDIA RTX2080Ti GPU
rack        U,     -                     N/A      rack number
smp         B,     -                     N/A      SMP node
tesla       U,C    0/32/32               1        Tesla K20xm card
wash        B,     -                     N/A      washington node
For example, to run a job only on the washington nodes, set wash=true:
requests 2 chunks each with 5 cores and wash==true.
To exclude washington and smp nodes use:
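As a sketch of how such requests can be composed, the helper below builds a resource string from a chunk count and a list of resource settings. It assumes that ccsalloc takes resource requests in the form --res=rset=<chunks>:<resource>=<value>:...; please verify the exact syntax in the OpenCCS User-Manual before relying on it. The helper name ccs_res is hypothetical.

```shell
# ccs_res CHUNKS RESOURCE=VALUE...
# Assumption: ccsalloc accepts "--res=rset=<chunks>:<res>=<value>:...".
ccs_res() {
    spec="rset=$1"
    shift
    for kv in "$@"; do
        spec="$spec:$kv"
    done
    echo "--res=$spec"
}

ccs_res 2 ncpus=5 wash=true              # 2 chunks, 5 cores each, washington only
ccs_res 2 ncpus=5 wash=false smp=false   # exclude washington and smp nodes
```

Under the same assumption, the result would be passed to ccsalloc together with the job script, for example ccsalloc $(ccs_res 2 ncpus=5 wash=true) followed by the script name.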
How to allocate a Tesla accelerator
For Tesla K20, we provide the consumable resource tesla. This prevents more than one job from being scheduled on a card at the same time.
Hence, to request 2 chunks each with 8 cpus and one Tesla card use:
For offload jobs CCS sets the environment variable:
For jobs mapped on a GPU node but not requesting the Tesla accelerator CCS sets the environment variable:
which is an invalid value.
Typically sufficient vmem has to be allocated.
How to allocate a GTX 1080 TI or RTX 2080 TI accelerator
For GTX 1080 TI, we provide the consumable resource gtx1080. For RTX 2080 TI, we provide the consumable resource rtx2080.
This prevents more than one job from being scheduled on a card at the same time.
Hence, to request 2 chunks each with 8 cpus and one GTX 1080 TI card use:
CCS automatically sets the environment variable CUDA_VISIBLE_DEVICES accordingly, so that your job only uses the allocated accelerator. This avoids interference with other users who use the second accelerator in the node. Please do not set this environment variable manually in your job script or your program.
For jobs mapped on a GPU node but not requesting an accelerator CCS sets the environment variable:
which is an invalid value.
Typically sufficient vmem has to be allocated.
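For debugging it can help to log which binding CCS chose at the start of a job. The following sketch only reads CUDA_VISIBLE_DEVICES and never changes it; the function name report_gpu_binding is hypothetical.

```shell
# Report the GPU binding set by CCS without modifying it.
report_gpu_binding() {
    if [ -n "${CUDA_VISIBLE_DEVICES-}" ]; then
        echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
    else
        echo "CUDA_VISIBLE_DEVICES is not set"
    fi
}

report_gpu_binding
```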
How to keep Java from using too many CPU cores
Even if the user sets the number of threads within a Java program, Java uses additional threads for garbage collection (GC). Thus, a job running a Java program can exceed its allowed CPU usage on OCuLUS if the number of compute threads equals the value of ncpus in the job resource specification. The number of GC threads can be controlled with the command line arguments -XX:ParallelGCThreads and -XX:ConcGCThreads. For example,
java -XX:ParallelGCThreads=2 -XX:ConcGCThreads=1 HelloWorld
uses 2 threads for the parallel GC and one thread for the concurrent garbage collector.
If you want to be on the safe side, the number of (ParallelGCThreads)+(ConcGCThreads)+(number of compute threads) should equal the number of requested cpu cores (ncpus).
It is also possible to use the argument -XX:+UseSerialGC
to select a serial garbage collector that uses exactly one thread. With UseSerialGC the safe choice would be (number of compute threads)+1=ncpus.
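The thread budget described above can be computed before starting the program. This is a sketch of the arithmetic only; the function name compute_threads and the value ncpus=8 are example assumptions, and HelloWorld is the class from the example above.

```shell
# compute_threads NCPUS PARALLEL_GC_THREADS CONC_GC_THREADS
# Threads left for computation after reserving cores for the GC threads.
compute_threads() {
    echo $(( $1 - $2 - $3 ))
}

ncpus=8   # example value: the ncpus requested for the job
pgc=2     # -XX:ParallelGCThreads
cgc=1     # -XX:ConcGCThreads

threads=$(compute_threads "$ncpus" "$pgc" "$cgc")
echo "java -XX:ParallelGCThreads=$pgc -XX:ConcGCThreads=$cgc HelloWorld"
echo "compute threads available: $threads"

# With -XX:+UseSerialGC exactly one core is reserved for the collector.
echo "compute threads with serial GC: $(( ncpus - 1 ))"
```

With these example values, 5 compute threads remain for the parallel and concurrent collectors and 7 for the serial collector.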