Main Page

From PC2 Doc


Upcoming Events

System Status Messages

  • 24.05.19, 13:00 - 17:00: OCULUS: Maintenance. All compute nodes will be rebooted.
  • 15.04.2019, 09:00 - 10:00 Emergency patch for the university file server infrastructure (hosting $HOME, $PC2DATA, and $PC2SCRATCH) to improve stability.
    • During the maintenance, connection failures may occur and may affect running jobs on our clusters.
  • 01.04.2019,08:00 - 08.04.2019,12:00 ARMINIUS, HTC, Noctua, OCuLUS: Maintenance: Transition to the new project model.
    • All systems are offline. On OCULUS, we will also run a BeeGFS file system check, so $PC2PFS on OCULUS is offline as well.
  • 25.02.19, 06:00 - 08.03.19,18:00 Maintenance of Noctua. The system, including the frontends and the parallel file system, will be offline. Due to technical problems, we have to extend the maintenance until at least 05.03.19 (possibly longer).
  • 22.02.19: After the update of the university file server, we see problems related to file locking and I/O performance (especially on OCULUS). We are working on these issues.
  • 19.02.19, 08:00 - 19:40 Maintenance of the university file server infrastructure (hosting $HOME, $PC2DATA, and $PC2SCRATCH)
  • 23.01.19, 08:00 - 14:30 Maintenance of the university file server infrastructure (hosting $HOME, $PC2DATA, and $PC2SCRATCH)
  • 03.01.19,13:00 - 04.01.19,11:00 Noctua: Lustre metadata problems. Access to directories may fail.
  • 08.11.18,08:00 - 09.11.18,12:00: OCULUS: BeeGFS maintenance. The system will be offline. $PC2PFS will not be available.
  • 14.09.18,09:00 Faulty connection to the Campus storage, leading to I/O errors and job start failures.
  • 30.08.18,11:00 - 04.09.18,11:00 Faulty connection to the Campus storage, leading to I/O errors and job start failures.
  • 29.08.18,08:00 - 29.08.18,11:00 Faulty connection to the Campus storage, leading to I/O errors and job start failures.
  • 24.08.18,16:00 - 27.08.18,11:00 Faulty connection to the Campus storage, leading to I/O errors and job start failures.
  • 18.06.18,08:00 - 26.06.18,11:00 The whole PC² was offline due to major maintenance in preparation for our new cluster Noctua. We:
    • Updated our firewall systems.
    • Improved our network infrastructure to get more redundancy.
    • Improved our power supply and cooling infrastructure.
    • Reorganized the directory structure of /upb/departments/pc2:
      • We moved from user-based work and scratch directories to group-based work and scratch directories. The related directories are $PC2GROUPS, $PC2WORK, $PC2SCRATCH, and $PC2SW.
    During the whole maintenance:
    • All PC² clusters (ARMINIUS, HTC, OCULUS, PLING3) were offline.
    • The directories $HOME, $PC2GROUPS, $PC2WORK, $PC2SCRATCH, and $PC2SW were also not accessible.
  • 06.06.2018, 13:00-14:00 The connection to the Campus storage was faulty, leading to I/O errors.
  • 01.02.2018 - 08.02.2018 OCULUS: The second frontend is locked due to a workshop. Please use the first frontend, fe.pc2.uni-paderborn.de.
  • 14.12.2017,17:30 ARMINIUS, HTC, and OCULUS are online again
  • 13.12.2017,18:30 ARMINIUS, HTC, and OCULUS will hopefully be online again tomorrow at about 5 pm (depending on the successful repair of the university power supply).
  • 12.12.2017,11:00 ARMINIUS, HTC, and OCULUS were affected by a power failure in our compute center. Many compute nodes are offline.
  • 16.10.2017,10:00 - 17.10.2017,13:00 OCuLUS: Maintenance: OpenCCS update to 0.9.8-2 and BeeGFS file system check
  • 09.10.2017,10:00 - 16:00 Due to a complete failure of the cooling system in our computer rooms, one of the seven file servers of the OCuLUS parallel file system ($PC2SCRATCH, /scratch) died. File chunks stored on the defective node were not accessible.
  • 06.10.2017,10:00 - 06.10.2017,14:20 HTCluster: Maintenance: OpenCCS update to 0.9.8-2
  • 24.07.2017, 7:00-11:00 SOLVED: OCuLUS: Running jobs continue, but new jobs are not accepted due to a problem with scheduler communication.
  • 22.06.2017,10:00 - 22.06.2017,12:00 HTCluster: Maintenance
  • 19.06.2017,10:00 - 20.06.2017,13:00 Arminius: Maintenance
  • 05.06.2017,10:00 - 07.06.2017,13:00 OCuLUS: Maintenance
  • 06.04.17 - 10.04.17: HTCluster maintenance
  • 20.03.2017,14:00-21.03.2017,12:00 OCULUS, PLING3: Access to $HOME, $PC2WORK, and $PC2GROUPS was faulty.
  • 17.03.2017,14:00 The defective storage has been repaired and the parallel file system ($PC2SCRATCH, /scratch) is completely online again.
  • 06.03.2017,10:00 One of the seven file servers of the OCuLUS parallel file system ($PC2SCRATCH, /scratch) died during the weekend. Only file chunks stored on the defective node are not accessible. We are waiting for a replacement motherboard.
  • 30.12.2016,08:00 - 13:00 Update of the PC² firewall. External connections are affected, and we expect short outages. All systems will be drained during this period.
  • 18.10.2016,17:00: OCuLUS is online again. Refer to the Newsticker for more information on what has changed.
  • 26.09.2016,07:00: OCuLUS (including the frontends and the parallel file system /scratch, aka $PC2SCRATCH) is not available until 18.10.2016 due to a complete system software upgrade.
  • 06.09.2016,15:00: OCULUS: The parallel filesystem /scratch (aka $PC2SCRATCH) is online again.
  • 01.09.2016,13:00: ARMINIUS is online again.
  • 29.08.2016,14:00: OCULUS: The parallel filesystem /scratch (aka $PC2SCRATCH) is offline. Jobs may fail.
  • 29.08.2016,12:00: ARMINIUS: offline due to damaged power circuits.
  • 26.08.2016: OCULUS: Parts of the parallel filesystem /scratch (aka $PC2SCRATCH) are damaged, leading to runtime errors for some users. We are currently checking the file system. However, we may have to shut down $PC2SCRATCH in a few days to be able to repair it.
  • 19.08.2016: hostname fe.pc2.upb.de is available again.
  • 18.08.2016: hostname fe.pc2.upb.de temporarily cannot be resolved due to an upgrade in the university network. Please use fe-1.cv2012.pc2.uni-paderborn.de instead.
  • 16.08.2016,07:00 - 17.08.2016,13:00 OCuLUS: Maintenance: OpenCCS update.
  • 08.06.2016 08:00 - 11:30 Maintenance of the Campus Storage System (major update). Access to $HOME, $PC2WORK, and $PC2GROUPS will fail. ARMINIUS, OCULUS, and HTC will be drained (no jobs will be started during this period).
  • 15.03.2016,07:00 - 16.03.16,11:00 Maintenance of the PC2 firewall. Arminius, OCuLUS, and HTC are offline during the maintenance.
  • 11.01.2016,10:00 - 12.01.2016,11:00 OCuLUS: Maintenance
  • 09.11.2015 - 30.11.2015 Arminius: Upgrade of the operating system and relocation of the hardware to other racks.
  • 18.09.2015 12:00 All systems are up and running.
  • 18.09.2015 10:00 - 12:00 We were offline due to the HW-Maintenance of a core switch in the University.
  • 17.09.2015 07:00 - 08:00 SW-Maintenance of a core switch in the University. Jobs were affected due to LDAP outages.
  • 11.09.2015 11:00 HTC is online again. Thank you for your patience.
  • 08.09.2015 12:00 HTC is not available due to a physical relocation of the nodes. We hope to be online again today.
  • 27.05.2015 16:30, all systems are up and running.
  • 27.05.2015 Power grid failure in Paderborn. Arminius, OCuLUS, and HTC were affected. Services are recovering.
  • 17.04.2015 13:00 - 16:00: Maintenance of core Ethernet switches in the University network. Connection interruptions and job failures are possible.
  • 06.02.15 All systems are up and running.
  • 05.02.15 08:00 - 06.02.15,12:00 HTC: Maintenance: Torque upgrade to 4.2.9. System will be offline.
  • 02.02.15 10:00-13:00 OCuLUS: /scratch filesystem (PC2SCRATCH) was corrupt due to an XFS failure on storage02. Parts of the filesystem were not available.
  • 20.01.15 15:00 - 17:00 Maintenance of the Campus Storage System (Firmware update). $HOME, $PC2WORK, $PC2GROUPS may have short outages.
  • 14.01.15 All systems are up and running.
  • 13.01.15 16:00 OCuLUS and HTC are online again.
  • 13.01.15 10:00 - 14.01.15,13:00 ARMINIUS: Due to a failure in the power supply of the whole university, many compute nodes switched off. The system is offline.
  • 13.01.15 10:00 - 16:00 OCuLUS, HTC: Due to a failure in the power supply of the whole university, many compute nodes switched off.
  • 06.01.15 10:00 - 07.01.15,20:00 OCuLUS: Maintenance of the parallel file system. PC2SCRATCH and OCuLUS are offline.
  • 10.12.14 18:00 All systems are up and running.
  • 10.12.14 08:00 - 10.12.14,18:00 Arminius: Maintenance.
  • 31.10.14 All systems are up and running.
  • 30.10.14 22:30 - 31.10.14,15:30 Due to a failure in the university network, basic services such as LDAP and mail were not working correctly. All our systems were affected.
  • 22.10.14 OCuLUS: /scratch filesystem (PC2SCRATCH) has been fixed. All systems are up and running.
  • 20.10.14 OCuLUS: /scratch filesystem (PC2SCRATCH) is corrupt due to a failed RAID controller in a metadata server. Parts of the filesystem are not available.
    • The outage will take at least until 22.10.14
  • 09.10.14 The new High-Throughput Cluster is online.
  • 01.09.14 08:00 - 01.09.14,15:00 Arminius: Maintenance. System not available
  • 12.08.14 10:00 - 19:00 OCuLUS: Offline due to an upgrade of the parallel file system (PC2SCRATCH). The frontends are also offline.
  • 17.07.14 - 18.07.14,16:30 Arminius offline due to network problems.
  • 15.07.14 - 17.07.14,15:00: OCuLUS and Arminius: The connections to PC² are often broken due to network problems. Both clusters were not accessible.
  • 08.07.14 08:00 - 08.07.14,15:00 Maintenance of the PC² firewall. External connections may be affected.
  • 27.06.14 - 08.07.14: Problems in some campus core switches, connections to PC² file-I/O and LDAP requests were affected.
  • 22.06.14 15:50 - 23.06.14,12:00 OCuLUS: Offline due to massive network problems.
  • 12.06.14 10:00 - 12.06.14,14:20 OCuLUS: Maintenance. Running jobs were not affected.
  • 06.05.14 11:00 - 06.05.14,18:00 Arminius: Upgrade of OpenCCS.
  • 04.04.14 10:00 - 04.04.14,18:00 OCuLUS: Upgrade of OpenCCS. Running jobs are not affected.
  • 26.02.14 08:00 - 27.02.14,16:00 Arminius: Whole cluster down due to a power failure.
  • 18.02.14 12:00 All systems online
  • 10.02.14 10:00 - 18.02.14,12:00 OCuLUS: Maintenance: Switch to Campus Storage. The system will not be available.
  • 13.01.14 10:00 - 10.02.14,11:00 Arminius: Maintenance: Switch to Campus Storage. System is not available
  • 13.12.13 16:30 - 17:15 Arminius: Both frontends are not accessible due to a hardware problem. Jobs are still running.
  • 23.10.13 10:00 - 11.11.13,09:30 OCuLUS: Maintenance
  • 29.08.13 10:00 - 30.08.13,17:00 Arminius: Maintenance
  • 20.08.13 10:00 - 21.08.13,12:00 OCuLUS: Maintenance
  • 19.06.13 All systems running
  • 18.06.13 10:00 OCuLUS offline until further notice due to network problems
  • 14.06.13 19:15 OCuLUS up and running
  • 11.06.13 11:55 OCuLUS offline until further notice due to a power supply problem
  • 10.06.13 09:30 - 11.06.13,12:00 OCuLUS: Maintenance of the parallel file system
  • 02.05.13 All systems running
  • 30.04.13 Arminius offline due to an OpenCCS upgrade and maintenance of the file server
  • 27.03.13 All systems running

Newsticker

Systems

  • Noctua - HPC Cluster with FPGA Accelerators
  • OCuLUS - HPC Cluster with GPU Accelerators and Large Shared-Memory Nodes
  • HARP2 - Xeon+FPGA Cluster in the Intel Hardware Accelerator Research Program (restricted access to systems and documentation)
  • Available file systems

Applications and Software Tools

Workload Management Systems

How to Become a PC² User