OCuLUS-FAQ
Inhaltsverzeichnis
- 1 FAQs
- 1.1 My job was aborted with the message: "... usage .. exceeded limit"
- 1.2 How to get information about current resource usage
- 1.3 How to convert job scripts from another WLM (SLURM, PBS, Torque, ...)
- 1.4 How to run job-chains (one job after another)
- 1.5 How to search for module files
- 1.6 ccsinfo reqID does not find the data
- 1.7 No host found which provides the requested resources:Perhaps a collision with a node freepool
- 1.8 OpenCCS did not write a job trace file
- 1.9 Which node types are available?
- 1.10 'ccsinfo -n' shows 'only local jobs = true'. What does it mean?
- 1.11 Cannot get 16 cores on a gpu node
- 2 HowTos
FAQs
My job was aborted with the message: "... usage .. exceeded limit"
If you request resources in shared mode, CCS will observe your job and abort it if the job uses more than the requested resources.
This is to ensure that all jobs running on a node will be able to use their requested resources.
Chapter 5.6 of the User-Manual explains this limit enforcement in more detail.
If you cannot estimate the needed resources, it is a good starting point to allocate nodes exclusively and to activate
- Email notification (ccsalloc -m ...) or
- writing of a job trace file (ccsalloc --trace ...)
After the job has ended, you will get a report of the used resources.
ccsinfo <reqID> prints the current resource usage while the job is running.
ccstracejob <reqID> prints this information after the job has ended. See also the FAQ below.
E.g., to allocate 4 nodes exclusively you may use:
- ccsalloc -n 4
- ccsalloc --res=rset=4:ncpus=1,place=:excl
The latter gives you more options to specify the characteristics of a node.
In the Brief Instructions you will find many examples of how to use ccsalloc --res.
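For instance, a hedged sketch that combines exclusive placement with one of the Boolean node-type resources listed further below (here wash=t for the 16-core washington nodes); check the Brief Instructions for the exact syntax:

```
# sketch: 4 exclusive nodes, restricted to washington nodes (wash=t)
ccsalloc --res=rset=4:ncpus=16:wash=t,place=:excl
```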
How to get information about current resource usage
- You may use the command pc2status, which also gives you this information.
- ccsinfo -u gives you something like this:
```
Allocated Resources of Group: hpc-prf-hell
Resource   Limit   Allocated (% of Limit)
====================================================================
ncpus      2048    228       ( 11.13%)
mem        N/A     885.13g   ( N/A)
vmem       N/A     1.07t     ( N/A)
gpus       10      1         ( 10.00%)
```
How to convert job scripts from another WLM (SLURM, PBS, Torque, ...)
Read the WLM-Rosetta and the Oculus Brief Instructions.
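As a rough, hedged illustration of the idea (the authoritative option mapping is in WLM-Rosetta):

```
# SLURM-style request: 2 nodes with 16 tasks each, e.g.
#   #SBATCH --nodes=2
#   #SBATCH --ntasks-per-node=16
# roughly corresponds to the CCS resource request
ccsalloc --res=rset=2:ncpus=16
```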
How to run job-chains (one job after another)
Job chains are useful if one job depends on the output of another job.
Example: We have 3 jobs: job1.sh, job2.sh, and job3.sh
Job3 depends on job2 and job2 depends on job1.
So, we have to start them one after another.
Because CCS does not support job-chains directly, we provide the script $PC2SW/examples/submitCCSJobchain.sh.
Calling it without a parameter prints a help text.
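A hedged sketch of how this might look; the actual parameters of submitCCSJobchain.sh may differ, so call it without arguments to see the real interface:

```
# hypothetical invocation: submit job1.sh, job2.sh, job3.sh as a chain,
# each job starting only after its predecessor has finished
$PC2SW/examples/submitCCSJobchain.sh job1.sh job2.sh job3.sh
```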
How to search for module files
We provide the command mdlsearch to search for module files.
```
$ mdlsearch
purpose: search for module files
usage:   mdlsearch ITEM
         ITEM is not case sensitive.
examples:
         mdlsearch fftw     prints all modules which names contain fftw
         mdlsearch "^fftw"  prints all modules which names start with fftw
```
ccsinfo reqID does not find the data
OpenCCS holds the data of completed jobs for 30 minutes in its memory. After that time the data is removed and ccsinfo prints this message:
```
[kel@fe2]$ ccsinfo 4229599
ERROR:Inquiry denied for request (4229599):Unknown request 4229599
```
The command ccstracejob allows you to print log and accounting data of such jobs. Refer to the man page of ccstracejob for detailed information.
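For example, to retrieve the data of the job shown above:

```
$ ccstracejob 4229599
```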
No host found which provides the requested resources:Perhaps a collision with a node freepool
A node freepool limits the access to a node. The constraints are part of the node's properties.
One can inspect these properties by calling ccsinfo -n --state=%H%p%m.
More details are in the OpenCCS User-Manual, Appendix K.
OpenCCS did not write a job trace file
It may happen that OpenCCS is temporarily not able to write the trace file to the specified directory. In such cases OpenCCS writes the file to $CCS/tmp/OCULUS/TRACES.
Which node types are available?
Type | Nodes | CPU-Type | Cores | Memory | Accelerator |
---|---|---|---|---|---|
normal | 552 | two Intel Xeon E5-2670 | 16 | 64GB | - |
washington | 20 | two Intel Xeon E5-2670 | 16 | 256GB | - |
tesla | 1 | two Intel Xeon E5-2670 | 16 | 64GB | 1 nVIDIA K20 (Kepler) |
tesla | 7 | two Intel Xeon E5-2670 | 16 | 64GB | 2 nVIDIA K20 (Kepler) |
gtx1080 | 14 | two Intel Xeon E5-2670 | 16 | 64GB | 1 nVIDIA GeForce GTX-1080 Ti |
gtx1080 | 2 | two Intel Xeon E5-2670 | 16 | 64GB | 2 nVIDIA GeForce GTX-1080 Ti |
rtx2080 | 15 | two Intel Xeon E5-2670 | 16 | 64GB | 1 nVIDIA GeForce RTX-2080 Ti |
rtx2080 | 2 | two Intel Xeon E5-2670 | 16 | 64GB | 2 nVIDIA GeForce RTX-2080 Ti |
smp | 4 | four Intel Xeon E5-4670 | 32 | 1TB | - |
'ccsinfo -n' shows 'only local jobs = true'. What does it mean?
Such a node only accepts jobs which run completely on that node. Jobs using more than one node are not mapped to it.
Cannot get 16 cores on a gpu node
Nodes hosting GPU cards keep one core per GPU free for jobs requesting GPUs. Hence, jobs not requesting a GPU card will only get at most 14 or 15 cores on those nodes.
HowTos
Install Python packages in your home directory
To accelerate the file-I/O and avoid quota problems with $HOME, you may establish softlinks from $HOME/.local and $HOME/.cache to your $PC2PFS group directory.
Example:

```
# save the original directories
mv $HOME/.local $HOME/.local-sic
mv $HOME/.cache $HOME/.cache-sic
# create the new directories in $PC2PFS
cd $PC2PFS/MYGROUP/MYDIR
mkdir -p homelocal homecache
# create the softlinks
cd $HOME
ln -s $PC2PFS/MYGROUP/MYDIR/homelocal .local
ln -s $PC2PFS/MYGROUP/MYDIR/homecache .cache
```
Choose the Python release you want to use by loading the related module (e.g. module load lang/Python/3.7.4-GCCcore-8.3.0).
You may search for Python modules by using mdlsearch python. Then install, for example, numpy:
$ pip install --user numpy
This will install the numpy package in your home directory: $HOME/.local/. Once that is done, you will need to make sure:
$HOME/.local/bin is in your PATH variable and $HOME/.local/lib/pythonx.y/site-packages/ is in your PYTHONPATH.
Make sure to replace the x.y part with the actual version of Python you are using. For instance:
```
$ export PATH=$HOME/.local/bin:$PATH
$ export PYTHONPATH=$HOME/.local/lib/python3.7/site-packages/:$PYTHONPATH
```
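To check that the package is now picked up from $HOME/.local, you can, for example, run:

```
$ python -c "import numpy; print(numpy.__version__, numpy.__file__)"
```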
How to (un)select specific node types?
Normally, you won't care about this question, because you just request cores, memory, accelerators, or licenses, and CCS takes care of the mapping. However, for benchmarking purposes it may be useful to (un)select specific node types. For this purpose, we provide resources of type Boolean. ccsinfo -a shows all available resources:
```
Name        Type,  Amount                Default  Purpose
            Flags  Used/Online/Max
=============================================================
ncpus       U,C    4775/8784/9264        1        Number of cores
nodes       U,C    447/545/619           1        Number of nodes
mem         S,C    23.77t/40.63t/42.91t  3.93g    Physical memory
vmem        S,C    25.18t/52.04t/55.09t  4.93g    Virtual memory
cput        T,     -                     N/A      CPU time
walltime    T,J    -                     N/A      Walltime
hostname    A,     -                     N/A      Hostname
arch        A,     -                     N/A      Host architecture
mpiprocs    U,     -                     N/A      Number of MPI processes per chunk
ompthreads  U,     -                     N/A      Number of threads per chunk
amd         B,     -                     N/A      node with AMD CPU
gpunode     B,     -                     N/A      GPU node
gpus        U,C    13/48/59              0        GPU
gtx1080     B,     -                     N/A      NVIDIA GTX1080Ti GPU
ibswitch    V,     -                     N/A      Infiniband-switch number
mdce        U,CJ   0/256/256             false    Matlab Distributed Computing Environment licenses
norm        B,     -                     N/A      62GiByte compute node
rack        U,     -                     N/A      Rack number
rtx2080     B,     -                     N/A      NVIDIA RTX2080Ti GPU
smp         B,     -                     N/A      SMP node
tesla       B,     -                     N/A      NVIDIA Tesla K20xm GPU
vtune       B,     -                     N/A      node equipped with VTune HW-performance counter
wash        B,     -                     N/A      Washington node
```
Examples:
- run a job only on the 62GiByte compute nodes
- --res=rset=2:ncpus=5:norm=t
- run a job only on the washington nodes
- --res=rset=2:ncpus=5:wash=t
- Requesting a GPU of any type
- --res=rset=ncpus=8:mem=40g:gpus=1
- Requesting a GPU of any type but no Tesla
- --res=rset=ncpus=8:mem=40g:gpus=1:tesla=f
- Excluding GPU nodes:
- --res=rset=2:ncpus=5:gpunode=f
How to allocate / avoid AMD CPUs
Since AMD CPUs are compatible with Intel CPUs, CCS does not distinguish between Intel and AMD CPUs. However, if you explicitly want to use AMD CPUs, use:
--res=rset=2:amd=t:ncpus=5
If you explicitly do NOT want to use AMD CPUs, use:
--res=rset=2:amd=f:ncpus=5
How to allocate GPUs
For GPUs, we provide the consumable resource gpus. This prevents more than one job from being scheduled to the same card at the same time. The Boolean resources tesla, gtx1080, and rtx2080 may be used to (un)select specific GPU types.
Hence, to request 2 chunks, each with 8 CPUs and one Tesla card, use:
--res=rset=2:ncpus=8:gpus=1:tesla=true
For offload jobs CCS sets the environment variable:
CUDA_VISIBLE_DEVICES=0
For jobs mapped to a GPU node but not requesting a GPU, CCS sets the environment variable:
CUDA_VISIBLE_DEVICES=1024
which is an invalid value.
For jobs mapped to a non-GPU node, CCS does not set the environment variable CUDA_VISIBLE_DEVICES at all.
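A minimal sketch of how a job script could distinguish these three cases, based on the behaviour described above:

```
# check which GPU situation CCS has signalled to the job
if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
    echo "running on a non-GPU node"
elif [ "$CUDA_VISIBLE_DEVICES" = "1024" ]; then
    echo "GPU node, but no GPU was requested"
else
    echo "GPU(s) assigned: $CUDA_VISIBLE_DEVICES"
fi
```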
Typically, sufficient vmem also has to be allocated:
- Tesla: --res=rset=1:ncpus=1:tesla=t:gpus=1:mem=8g:vmem=85g
- GTX1080: --res=rset=1:ncpus=1:gtx1080=t:gpus=1:mem=4g
- RTX2080: --res=rset=1:ncpus=1:rtx2080=t:gpus=1:mem=4g
How to keep Java from using too many CPU cores
Even if the user sets the number of threads within a Java program, Java uses additional threads for garbage collection (GC). Thus, a job running a Java program can exceed its allowed CPU usage on Oculus if the number of compute threads is identical to the choice of ncpus in the job resource specification. The number of GC threads can be controlled with the command line arguments
-XX:ParallelGCThreads
and
-XX:ConcGCThreads
For example,
java -XX:ParallelGCThreads=2 -XX:ConcGCThreads=1 HelloWorld
uses 2 threads for the parallel GC and one thread for the concurrent GC.
If you want to be on the safe side, the sum (ParallelGCThreads) + (ConcGCThreads) + (number of compute threads) should equal the number of requested CPU cores (ncpus).
It is also possible to use the argument
-XX:+UseSerialGC
to use the serial garbage collector, which uses exactly one thread. With UseSerialGC the safe choice would be (number of compute threads) + 1 = ncpus.
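For example, a hedged sketch for a job requesting ncpus=4; MyApp and its thread option are placeholders for your own program:

```
# 3 compute threads + 1 serial GC thread = 4 cores = ncpus
java -XX:+UseSerialGC MyApp --threads 3
```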