- 23.01.19, 08:00 - 18:00 Maintenance of the university file server infrastructure (hosting $HOME, $PC2DATA, and $PC2SCRATCH)
System Status Messages
- 03.01.19,13:00 - 04.01.19,11:00 Noctua: Lustre metadata problems. Access to directories may fail.
- 08.11.18,08:00 - 09.11.18,12:00: OCULUS: BeeGFS maintenance. The system will be offline and $PC2PFS will not be available.
- 14.09.18,09:00 Faulty connection to the Campus storage, leading to I/O errors and job start failures.
- 30.08.18,11:00 - 04.09.18,11:00 Faulty connection to the Campus storage, leading to I/O errors and job start failures.
- 29.08.18,08:00 - 29.08.18,11:00 Faulty connection to the Campus storage, leading to I/O errors and job start failures.
- 24.08.18,16:00 - 27.08.18,11:00 Faulty connection to the Campus storage, leading to I/O errors and job start failures.
- 18.06.18,08:00 - 26.06.18,11:00 The whole PC² was offline for major maintenance in preparation for our new cluster Noctua. We:
* Updated our firewall systems.
* Improved our network infrastructure to increase redundancy.
* Improved our power supply and cooling infrastructure.
* Reorganized the directory structure of /upb/departments/pc2.
* Moved from user-based work and scratch directories to group-based work and scratch directories.
The directories involved are $PC2GROUPS, $PC2WORK, $PC2SCRATCH, and $PC2SW.
During the whole maintenance:
* All PC² clusters (ARMINIUS, HTC, OCULUS, PLING3) were offline.
* The directories $HOME, $PC2GROUPS, $PC2WORK, $PC2SCRATCH, and $PC2SW were also not accessible.
- 06.06.2018, 13:00-14:00 The connection to the Campus storage was faulty, leading to I/O errors.
- 01.02.2018 - 08.02.2018 OCULUS: The second frontend is locked due to a workshop. Please use the first frontend, fe.pc2.uni-paderborn.de.
- 14.12.2017,17:30 ARMINIUS, HTC, and OCULUS are online again
- 13.12.2017,18:30 ARMINIUS, HTC, and OCULUS will hopefully be online again tomorrow at about 5 pm (depending on the successful repair of the university power supply).
- 12.12.2017,11:00 ARMINIUS, HTC, and OCULUS were affected by a power failure in our compute center. Many compute nodes are offline.
- 16.10.2017,10:00 - 17.10.2017,13:00 OCuLUS: Maintenance: OpenCCS update to 0.9.8-2 and BeeGFS file system check
- 09.10.2017,10:00 - 16:00 Due to a complete failure of the cooling system in our computer rooms, one of the seven file servers of the OCuLUS parallel file system ($PC2SCRATCH, /scratch) died. File chunks stored on the defective node were not accessible.
- 06.10.2017,10:00 - 06.10.2017,14:20 HTCluster: Maintenance: OpenCCS update to 0.9.8-2
- 24.07.2017, 7:00-11:00 SOLVED: OCuLUS: Running jobs continued, but new jobs were not accepted due to a problem with scheduler communication.
- 22.06.2017,10:00 - 22.06.2017,12:00 HTCluster: Maintenance
- 19.06.2017,10:00 - 20.06.2017,13:00 Arminius: Maintenance
- 05.06.2017,10:00 - 07.06.2017,13:00 OCuLUS: Maintenance
- 06.04.17 - 10.04.17: HTCluster maintenance
- 20.03.2017,14:00-21.03.2017,12:00 OCULUS,PLING3: Access to $HOME, $PC2WORK, and $PC2GROUPS was faulty.
- 17.03.2017,14:00 The defective storage has been repaired and the parallel file system ($PC2SCRATCH, /scratch) is completely online again.
- 06.03.2017,10:00 One of the seven file servers of the OCuLUS parallel file system ($PC2SCRATCH, /scratch) died during the weekend. Only file chunks stored on the defective node are inaccessible. We are waiting for a replacement motherboard.
- 30.12.2016,08:00 - 13:00 Update of the PC² firewall. External connections are affected, and we expect short outages. All systems will be drained during this period.
- 18.10.2016,17:00: OCuLUS is online again. Refer to the Newsticker for more information on what has changed.
- 26.09.2016,07:00: OCuLUS (including the frontends and the parallel file system /scratch, aka $PC2SCRATCH) is not available until 18.10.2016 due to a complete system software upgrade.
- 06.09.2016,15:00: OCULUS: The parallel filesystem /scratch (aka $PC2SCRATCH) is online again.
- 01.09.2016,13:00: ARMINIUS is online again.
- 29.08.2016,14:00: OCULUS: The parallel filesystem /scratch (aka $PC2SCRATCH) is offline. Jobs may fail.
- 29.08.2016,12:00: ARMINIUS: offline due to damaged power circuits.
- 26.08.2016: OCULUS: Parts of the parallel filesystem /scratch (aka $PC2SCRATCH) are damaged, leading to run time errors for some users. We are currently checking the file system. However, we may have to shut down $PC2SCRATCH in a few days to be able to repair it.
- 19.08.2016: hostname fe.pc2.upb.de is available again.
- 18.08.2016: The hostname fe.pc2.upb.de temporarily cannot be resolved due to an upgrade of the university network. Please use fe-1.cv2012.pc2.uni-paderborn.de instead.
- 16.08.2016,07:00 - 17.08.2016,13:00 OCuLUS: Maintenance: OpenCCS update.
- 08.06.2016 08:00 - 11:30 Maintenance of the Campus Storage System (major update). Access to $HOME, $PC2WORK, and $PC2GROUPS will fail. ARMINIUS, OCULUS, and HTC will be drained (no jobs will be started during this period).
- 15.03.2016,07:00 - 16.03.16,11:00 Maintenance of the PC2 firewall. Arminius, OCuLUS, and HTC are offline during the maintenance.
- 11.01.2016,10:00 - 12.01.2016,11:00 OCuLUS: Maintenance
- 09.11.2015 - 30.11.2015 Arminius: Operating system upgrade and relocation of the hardware to other racks.
- 18.09.2015 12:00 All systems are up and running.
- 18.09.2015 10:00 - 12:00 We were offline due to hardware maintenance of a core switch in the University.
- 17.09.2015 07:00 - 08:00 Software maintenance of a core switch in the University. Jobs were affected by LDAP outages.
- 11.09.2015 11:00 HTC is online again. Thank you for your patience.
- 08.09.2015 12:00 HTC is not available due to a physical relocation of the nodes. We hope to be online again today.
- 27.05.2015 16:30, all systems are up and running.
- 27.05.2015 Power grid failure in Paderborn: Arminius, OCuLUS, and HTC were affected. Services are recovering.
- 17.04.2015 13:00 - 16:00: Maintenance of core Ethernet switches in the University network. Connection interruptions and job failures are possible.
- 06.02.15 All systems are up and running.
- 05.02.15 08:00 - 06.02.15,12:00 HTC: Maintenance: Torque upgrade to 4.2.9. System will be offline.
- 02.02.15 10:00-13:00 OCuLUS: /scratch filesystem (PC2SCRATCH) was corrupt due to an XFS failure on storage02. Parts of the filesystem were not available.
- 20.01.15 15:00 - 17:00 Maintenance of the Campus Storage System (Firmware update). $HOME, $PC2WORK, $PC2GROUPS may have short outages.
- 14.01.15 All systems are up and running.
- 13.01.15 16:00 OCuLUS and HTC are online again.
- 13.01.15 10:00 - 14.01.15,13:00 ARMINIUS: Due to a failure in the power supply of the whole university, many compute nodes switched off. The system is offline.
- 13.01.15 10:00 - 16:00 OCuLUS, HTC: Due to a failure in the power supply of the whole university, many compute nodes switched off.
- 06.01.15 10:00 - 07.01.15,20:00 OCuLUS: Maintenance of the parallel file system. PC2SCRATCH and OCuLUS are offline.
- 10.12.14 18:00 All systems are up and running.
- 10.12.14 08:00 - 10.12.14,18:00 Arminius: Maintenance.
- 31.10.14 All systems are up and running.
- 30.10.14 22:30 - 31.10.14,15:30 Due to a failure in the university network, basic services such as LDAP and mail were not working correctly. All our systems were affected.
- 22.10.14 OCuLUS: /scratch filesystem (PC2SCRATCH) has been fixed. All systems are up and running.
- 20.10.14 OCuLUS: The /scratch filesystem (PC2SCRATCH) is corrupt due to failed RAID controller hardware in a metadata server. Parts of the filesystem are not available. The outage will last at least until 22.10.14.
- 09.10.14 The new High-Throughput Cluster is online.
- 01.09.14 08:00 - 01.09.14,15:00 Arminius: Maintenance. System not available
- 12.08.14 10:00 - 19:00 OCuLUS: Offline due to an upgrade of the parallel file system (PC2SCRATCH). The frontends are also offline.
- 17.07.14 - 18.07.14,16:30 Arminius offline due to network problems.
- 15.07.14 - 17.07.14,15:00: OCuLUS and Arminius: Connections to PC² frequently dropped due to network problems. Both clusters were not accessible.
- 08.07.14 08:00 - 08.07.14,15:00 Maintenance of the PC² firewall. External connections may be affected.
- 27.06.14 - 08.07.14: Problems in some campus core switches, connections to PC² file-I/O and LDAP requests were affected.
- 22.06.14 15:50 - 23.06.14,12:00 OCuLUS: Offline due to massive network problems.
- 12.06.14 10:00 - 12.06.14,14:20 OCuLUS: Maintenance. Running jobs were not affected.
- 06.05.14 11:00 - 06.05.14,18:00 Arminius: Upgrade of OpenCCS.
- 04.04.14 10:00 - 04.04.14,18:00 OCuLUS: Upgrade of OpenCCS. Running jobs are not affected.
- 26.02.14 08:00 - 27.02.14,16:00 Arminius: Whole cluster down due to a power failure.
- 18.02.14 12:00 All systems online
- 10.02.14 10:00 - 18.02.14,12:00 OCuLUS: Maintenance: Switch to Campus Storage. The system will not be available.
- 13.01.14 10:00 - 10.02.14,11:00 Arminius: Maintenance: Switch to Campus Storage. System is not available
- 13.12.13 16:30 - 17:15 Arminius: Both frontends are not accessible due to a hardware problem. Jobs are running.
- 23.10.13 10:00 - 11.11.13,09:30 OCuLUS: Maintenance
- 29.08.13 10:00 - 30.08.13,17:00 Arminius: Maintenance
- 20.08.13 10:00 - 21.08.13,12:00 OCuLUS: Maintenance
- 19.06.13 All systems running
- 18.06.13 10:00 OCuLUS offline until further notice due to network problems
- 14.06.13 19:15 OCuLUS up and running
- 11.06.13 11:55 OCuLUS offline until further notice due to a power supply problem
- 10.06.13 09:30 - 11.06.13,12:00 OCuLUS: Maintenance of the parallel file system
- 02.05.13 All systems running
- 30.04.13 Arminius offline due to an OpenCCS upgrade and maintenance of the file server
- 27.03.13 All systems running