Recent Service Interruptions

Revision as of 16:54, 17 January 2013 by Petervc (talk | contribs) ([Actuele storingen en Onderhoud][Current Service Interruptions and Maintenance])

{{#customtitle:Recent Service Interruptions}}

Current Service Interruptions and Maintenance

New Radius server for ru-wlan and eduroam (wireless)

On Monday, January 28th 2013 at 8:00 pm, one of the servers used by the wireless network of the RU will be replaced. This replacement affects you as a user of the wireless networks ru-wlan and eduroam: a new certificate will appear when you connect. You can simply accept it, after which the connection should work. If it does not, first remove your old Eduroam or RU-WLAN settings and then set up the connection again.

Specifically for iPhone / iPad users: We recommend that you first remove your old Eduroam or RU-WLAN profile before activating the new connection without a profile. If that unexpectedly fails, please review the information on for iPhone/iPads. If necessary, you can also download a new profile from that site.

Marcel Kuppens 17 jan 2013 10:53 (CET)

Report a problem

Use this form to report less urgent problems. For urgent problems, call 56666 (helpdesk).

Recently Resolved Service Interruptions and Maintenance

LDAP server upgraded

  Date         : 20121214
  Affected     : Users with a Fedora based desktop PC

Older Fedora desktop PCs may experience startup problems after an upgrade of one of our LDAP servers. A fix is available and has been applied. If you still encounter this problem, please contact C&CZ.

Mail problems after supplying password to phishers

  Begin        : 20121116 04:45
  End          : 20121117 ca 12:00
  Affected     : Users of horde webmail and users wanting to send mail to e.g.

Horde webmail again appeared to be misused for sending spam. This could happen because a naive user gave the Science password to phishers/spammers. We first stopped horde; early Friday morning we disabled the account of the naive user and restarted horde. Saturday morning it appeared that this short spam outbreak had caused administrators of to add our mail server to their blacklist. Therefore we switched the IP number of this mail server Saturday morning.

Homeserver bundle will be rebooted

  Begin         : 2012-10-24 ~ 12:45
  End           : 2012-10-24 ~ 13:00
  Affected      : all FNWI users with a home directory on fileserver bundle

Because the file server refuses to accept a spare disk, it needs a reboot.

Homeserver bundle unavailable

  Begin         : 2012-10-22 12:15
  End           : 2012-10-22 13:00
  Affected      : all FNWI users with a home directory on fileserver bundle

At the moment, we are solving the problem.

Services unavailable due to power and network outage

  Begin         : 20121018 03:00
  End           : 20121018 10:00
  Affected      : all users until 09:30; afterwards: "bundle" home directories, wireless, "plus" network shares and several websites

During the night from Wednesday to Thursday, a power outage resulted in a network outage in the basement computing facilities. At about 09:15, power was restored to the network equipment using a bypass that circumvents the UPS. Further checks showed that most servers had not lost power, so most services became available again automatically. Network drivers on "bundle" had to be restarted to restore access to home directories for a large number of users. Furthermore, several websites had to be restarted, which made it possible for PCs to boot properly. During the day, an unrelated issue with the RAID storage of "plus" was fixed as well, restoring access to the following network shares: sofie, ams*, molchem, mb*, encapson, milkun4, snn, neuropi, digicd, carta, ... Because wireless devices were unable to acquire IP addresses and thus could not access the network, a split-brain situation was diagnosed in the DHCP service; it was resolved around 13:00.

Announced downtime: home server "pile" down for reboot

  Begin        : 20121012 07:00
  End          : 20121012 09:00
  Affected     : Users with homedirectory server "pile" (as can be seen on

Next Friday morning, the home server "pile" will be rebooted. There are problems with the snapshots, which could make a reboot take more time. Therefore we schedule the reboot for early next Friday.

Peage top-up unit near Huygens restaurant in maintenance

In order to test new software, the Peage top-up unit near the Huygens restaurant was switched to maintenance mode. This unit is not used often yet, so this will not have caused problems. Students who wanted to top up their Peage account could do so only elsewhere on campus. See the Peage website; locations are the halls of the Erasmus, Spinoza and Library buildings.

Eduroam incoming doesn't work for iPhone/iPad/iPod

  Begin         : spring 2012 (?)
  End           : 20121005
  Affected      : incoming Eduroam users with an iPhone/iPad/iPod

The UCI network management reports that at this moment the incoming version of Eduroam doesn't work for iPhone/iPad/iPod. A solution is being worked on. Eduroam incoming means that one uses the wireless network of a remote institute, with authentication (login/password) being checked by RU or Science.

Horde webmail server down because of spam

  Begin        : 20120925 23:05
  End          : 20120926 10:20
  Affected     : Users of horde webmail

Yesterday evening, horde webmail appeared to be misused for sending spam. This could happen because a naive user gave the Science password to spammers. First we stopped horde. This morning we disabled the account of the naive user and restarted horde.

Disk server "Stack" offline

  Begin        : 20120924 06:30
  End          : 20120924 09:35
  Affected     : Users of disk volumes on file server Stack.

Disk server "Plenty" offline

  Begin        : 20120924 06:30
  End          : 20120924 09:00
  Affected     : Users of disk volumes on file server Plenty. The S and T disks that are used in the PC rooms.

During the weekly reboot (Monday mornings), the server got stuck in the BIOS.

Announced downtime: home server "pile" down for replacement

  Begin        : 20120724 07:00
  End          : 20120724 09:00 (ca)
  Affected     : Users with homedirectory server "pile" (as can be seen on

Next Tuesday morning, the home server "pile" will be replaced by a new, more powerful server. Because the data have been synchronized with the new server, there will not be much downtime.

Postponed: Announced downtime: home server "pile" down for replacement

The downtime below has been postponed, because we had a few questions on the new server, that could not be answered in time. To be continued...

  Begin        : 20120724 07:00
  End          : 20120724 09:00 (ca)
  Affected     : Users with homedirectory server "pile" (as can be seen on

Next Tuesday morning, the home server "pile" will be replaced by a new, more powerful server. Because the data have been synchronized with the new server, there will not be much downtime. The new server should be very dependable: hardware RAID-6, double processors and power supplies and a 5-year support contract from the supplier. The performance has improved, e.g. by using hardware RAID with a 1 GB write cache with battery backup.

Partly announced downtime for mailman + horde webmail server

  Begin        : 20120712 09:09
  End          : 20120712 14:00 (ca)
  Affected     : Users of horde webmail and/or mailman mailing lists

This morning, horde webmail appeared to be misused for sending spam. This could happen because naive users gave their Science password to spammers. After we found out who the users were and had them change their password, we decided to also replace a defective CPU fan. Therefore the Mailman mailing lists will also be down from 13:00 to 14:00 hours.

SMTP server blacklisted by MS Live Hotmail

  Begin        : 20120711 03:08
  End          : 20120711 14:55
  Affected     : Science mail users trying to send mail to MS-domains:,, ...

This morning, users reported that mail from to hotmail users was being bounced by hotmail. We asked the hotmail administrators to resolve this quickly, but when that took too long, we changed the IP number of our SMTP server.

Planned service interruption: file server with problems

  Begin        : 20120622 17:03
  End          : 20120624 19:30
  Affected     : stack fileservices

A hardware failure of a boot disk of the fileserver stack was reported Friday morning June 22. We decided to repair this after working hours. Thus at approximately 17:00 the defective boot disk was removed from the machine and replaced by a spare one. Enabling the disk, making it bootable, restoring file systems and rebooting the machine (after removing all snapshots) took a lot of time. When this was resolved Friday evening, the NFS/SMB fileservice was not active on the mounted filesystems. It took a reboot Sunday evening to resolve all problems.

Tracelab server poly defective

  Begin       : 20120621 14:12
  End         : 20120621 17:15
  Affected    : Tracelab for users. For administrators also Prism&Deploy and the WDS-service

A hardware failure of the server poly was reported at 2012-06-21 14:12. After a restart of the machine, it stopped working again. No more recoveries were attempted and an identical spare machine was outfitted with the disks from the defective server. Disks had to be synchronized before making the machine available again.

Servers without electric power

  Begin       : 20120607 13:45
  End         : 20120607 15:30
  Affected    : e-mail and users of the fileservers bundle, heap and stack

A power failure in a rack in a server room brought some C&CZ servers down. After less than two hours all problems were dealt with. Affected systems were mainly: postvak (Science mail server), bundle (user homedisk), heap/stack (network discs), resser/kookpunt/brievenbus/rustug (mail transport smtp servers).

Planned Service: website-databases and maybe Linux clients

20 Apr 2012 17:00 - 17:15

A defective hard disc has been replaced in a server, but the server needs to be rebooted to ensure that this is reboot proof. The MySQL database of roughly 70 websites will therefore be down for a short time. Since this server also provides the Kerberos authentication for Linux clients, Linux clients might encounter service interruptions during a short period.

Windows server "plenty" with xpsoftware unavailable

Thursday July 7, around 13:00 hours, the server "plenty" could not be reached. Because this server serves the "xpsoftware" share for the Managed Windows PCs, all these PCs had a problem. After the server was restarted and the disks had been checked, it was available again at 14:26.

Downtime Science servers: Sunday July 3, 09:00 - 12:00 hours

In order to improve the cooling of a server room, we plan to move three racks of Science servers a few meters on Sunday morning, July 3. We will have to switch off a lot of servers temporarily. Therefore several services will be unavailable some time starting July 3, 09:00 hours. We expect the downtime will last until 10:00 hours for servers with a lot of different users. The cn compute cluster will probably be fully operational again at 12:00 hours.

The servers/services affected are:

fileservers: plenty/pile/bundle with shares like:
             amsbackup2 bbb-priv botany bsweet comsol exoarchief gi3 hfml-data ifl iris
             lambiek mestrelab mi1/2/3 molchem2 molphtec morph multimedia olsen pcb planthgl
             sdisk share snn2 spmdata1 tdisk tece temp wallpaper xpcursus xpsoftware
potkast: films via Blackboard
ts2: Windows Terminal Server
lilo1: Linux Login Server, alternative: lilo/lilo2
cn compute cluster
horde webmail
License server for: Comsol

With apologies for the inconvenience

Peter van Campen 22 jun 2011 09:57 (UTC)

Network outage June 22, 10:55-11:30

This morning, a UPS (battery power supply) in the network hub for Huygens South went down, which made a set of network switches lose power. Because of this, users in Huygens wing 1 and spin-off companies lost their connection to the network. After bypassing the UPS, everything was up and running again at 11:30. We are still searching for the exact origin of this outage.

New SSH keys for new login servers

The LInux LOgin server lilo has been replaced. The name now points to the new machine lilo2, because that one is faster than the other login server lilo1. It is therefore normal to be asked once to accept the new SSH key.

Planned Service: Limited computer services

12 Feb 2011 7:00 - 11:00

A backup cooling system will be installed in our main computer room. Therefore the air conditioning system must be switched off, which means that most of the computer facilities in this room must be shut down. This includes the cluster nodes cn00 through cn53 and many of the web- and file- (network share) servers. It is advised to expect a very limited service level. We will try to keep all home directories and the mail system available. For detailed information about the impact please contact C&CZ.

Printer lp5

24 Jan 2011 - 11 Mar 2011

Printer lp5 has been moved to HG00.089. You can't use this printer at the moment because there's a problem with the power supply unit.

Fixed phone problem

7 Mar 2011

You can't reach certain fixed phones at the university right now. Mobile phones and Skype do work, though.

Mailserver blacklisted

4 Feb 2011 9:00 - 12:00

One of our mail servers has been sending loads of spam after a successful phishing attack. Since then, our server has been blacklisted on several domains. Currently this affects the delivery of email to @hotmail and @live addresses.