Recent Service Interruptions

From Cncz
Revision as of 11:38, 19 October 2012 by Erik Joost Visser (talk | contribs) ([Recent Verholpen Storingen en Onderhoud][Recently Resolved Service Interruptions and Maintainance])



Current Service Interruptions and Maintenance



Report a problem

Use this form to report less urgent problems. For urgent problems, call 56666 (helpdesk).

Recently Resolved Service Interruptions and Maintenance

Announced downtime: home server "pile" down for reboot

  Begin        : 20121012 07:00
  End          : 20121012 09:00
  Affected     : Users with home directory server "pile" (as can be seen on http://DIY.science.ru.nl)

Next Friday morning, the home server "pile" will be rebooted. There are problems with the snapshots, which could make the reboot take longer than usual. We have therefore scheduled the reboot for early next Friday.

Peage top-up unit near Huygens restaurant in maintenance

To test new software, the Peage top-up unit near the Huygens restaurant was switched to maintenance mode. Because this unit is not yet used often, this will not have caused problems. Students who wanted to top up their Peage account could only do so elsewhere on campus. According to the Peage website (http://www.ru.nl/peage), the other locations are the halls of the Erasmus, Spinoza and Library buildings.

Eduroam incoming doesn't work for iPhone/iPad/iPod

  Begin         : spring 2012 (?)
  End           : 20121005
  Affected      : incoming Eduroam users with an iPhone/iPad/iPod

The UCI network management reports that, at this moment, the incoming version of Eduroam doesn't work for iPhone/iPad/iPod. A solution is being worked on. Eduroam incoming means that one uses the wireless network of a remote institute, with the authentication (login/password) being checked by RU or Science.

Horde webmail server down because of spam

  Begin        : 20120925 23:05
  End          : 20120926 10:20
  Affected     : Users of horde webmail

Yesterday evening, horde webmail was found to be misused for sending spam. This was possible because a naive user had given the Science password to spammers. First we stopped horde. This morning we disabled the account of that user and restarted horde.
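The Begin and End fields in these entries appear to use a compact `YYYYMMDD HH:MM` layout. As a minimal sketch, assuming that layout and local times without a time zone (so daylight-saving changes are ignored), the length of an outage could be computed as follows. The helper name `outage_duration` is hypothetical and not part of any C&CZ tooling.

```python
from datetime import datetime, timedelta

# Assumed layout of the Begin/End fields, e.g. "20120925 23:05".
STAMP_FORMAT = "%Y%m%d %H:%M"


def outage_duration(begin: str, end: str) -> timedelta:
    """Return the time between two Begin/End stamps (hypothetical helper)."""
    start = datetime.strptime(begin.strip(), STAMP_FORMAT)
    stop = datetime.strptime(end.strip(), STAMP_FORMAT)
    if stop < start:
        raise ValueError("End lies before Begin")
    return stop - start


# Values taken from the Horde webmail entry above.
print(outage_duration("20120925 23:05", "20120926 10:20"))  # prints 11:15:00
```

Entries whose fields carry extra text, such as "(ca)" or "spring 2012 (?)", would not parse with this format and would need to be handled separately.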

Disk server "Stack" offline

  Begin        : 20120924 06:30
  End          : 20120924 09:35
  Affected     : Users of disk volumes on file server Stack.

Disk server "Plenty" offline

  Begin        : 20120924 06:30
  End          : 20120924 09:00
  Affected     : Users of disk volumes on file server Plenty. The S and T disks that are used in the PC rooms.

During the weekly reboot (Monday mornings), the server got stuck in the BIOS.

Announced downtime: home server "pile" down for replacement

  Begin        : 20120724 07:00
  End          : 20120724 09:00 (ca)
  Affected     : Users with home directory server "pile" (as can be seen on http://DIY.science.ru.nl)

Next Tuesday morning, the home server "pile" will be replaced by a new, more powerful server. Because the data have been synchronized with the new server, there will not be much downtime.

Postponed: Announced downtime: home server "pile" down for replacement

The downtime below has been postponed, because we had a few questions about the new server that could not be answered in time. To be continued...

  Begin        : 20120724 07:00
  End          : 20120724 09:00 (ca)
  Affected     : Users with home directory server "pile" (as can be seen on http://DIY.science.ru.nl)

Next Tuesday morning, the home server "pile" will be replaced by a new, more powerful server. Because the data have been synchronized with the new server, there will not be much downtime. The new server should be very dependable: hardware RAID-6, dual processors and power supplies, and a 5-year support contract from the supplier. Performance has also improved, e.g. through hardware RAID with a 1 GB battery-backed write cache.

Partly announced downtime for mailman + horde webmail server

  Begin        : 20120712 09:09
  End          : 20120712 14:00 (ca)
  Affected     : Users of horde webmail and/or mailman mailing lists

This morning, horde webmail was found to be misused for sending spam. This was possible because naive users had given their Science password to spammers. After we found out who the users were and had them change their passwords, we decided to also replace a defective CPU fan. Therefore the Mailman mailing lists will also be down from 13:00 to 14:00 hours.


SMTP server blacklisted by MS Live Hotmail

  Begin        : 20120711 03:08
  End          : 20120711 14:55
  Affected     : Science mail users trying to send mail to MS-domains: hotmail.com, live.com, ...

This morning, users reported that mail from smtp.science.ru.nl to hotmail users was being bounced by hotmail. We asked the hotmail administrators to lift the block quickly, but when this took too long, we changed the IP address of our SMTP server.

Planned service interruption: file server with problems

  Begin        : 20120622 17:03
  End          : 20120624 19:30
  Affected     : stack fileservices

A hardware failure of a boot disk of the fileserver stack was reported Friday morning June 22. We decided to repair this after working hours. Thus at approximately 17:00 the defective boot disk was removed from the machine and replaced by a spare one. Enabling the disk, making it bootable, restoring file systems and rebooting the machine (after removing all snapshots) took a lot of time. When this was resolved Friday evening, the NFS/SMB fileservice was not active on the mounted filesystems. It took a reboot Sunday evening to resolve all problems.

Tracelab server poly defective

  Begin       : 20120621 14:12
  End         : 20120621 17:15
  Affected    : Tracelab for users. For administrators also Prism&Deploy and the WDS-service

A hardware failure of the server poly was reported at 2012-06-21 14:12. After a restart of the machine, it stopped working again. No more recoveries were attempted and an identical spare machine was outfitted with the disks from the defective server. Disks had to be synchronized before making the machine available again.

Servers without electric power

  Begin       : 20120607 13:45
  End         : 20120607 15:30
  Affected    : e-mail and users of the fileservers bundle, heap and stack

A power failure in a rack in a server room brought some C&CZ servers down. All problems were dealt with in less than two hours. The systems mainly affected were: postvak (Science mail server), bundle (user home disk), heap/stack (network disks), resser/kookpunt/brievenbus/rustug (mail transport SMTP servers).

Planned Service: website-databases and maybe Linux clients

20 Apr 2012 17:00 - 17:15

A defective hard disk has been replaced in a server, but the server needs to be rebooted to verify that the replacement survives a reboot. The MySQL database of roughly 70 websites will therefore be down for a short time. Since this server also provides the Kerberos authentication for Linux clients, Linux clients might encounter service interruptions for a short period.

Windows server "plenty" with xpsoftware unavailable

On Thursday July 7, around 13:00 hours, the server "plenty" could not be reached. Because this server serves the "xpsoftware" share for the Managed Windows PCs, all these PCs had a problem. After the server was restarted and the disks had been checked, it was available again at 14:26.


Downtime Science servers: Sunday July 3, 09:00 - 12:00 hours

In order to improve the cooling of a server room, we plan to move three racks of Science servers a few meters on Sunday morning, July 3. We will have to switch off a lot of servers temporarily. Therefore several services will be unavailable for some time starting July 3, 09:00 hours. We expect the downtime to last until 10:00 hours for servers with a lot of different users. The cn compute cluster will probably be fully operational again at 12:00 hours.

The servers/services affected are:

fileservers: plenty/pile/bundle with shares like:
             amsbackup2 bbb-priv botany bsweet comsol exoarchief gi3 hfml-data ifl iris
             lambiek mestrelab mi1/2/3 molchem2 molphtec morph multimedia olsen pcb planthgl
             sdisk share snn2 spmdata1 tdisk tece temp wallpaper xpcursus xpsoftware
potkast: films via Blackboard
ts2: Windows Terminal Server
lilo1: Linux Login Server, alternative: lilo/lilo2
cn compute cluster
horde webmail
License server for: Comsol

With apologies for the inconvenience
C&CZ

Peter van Campen 22 jun 2011 09:57 (UTC)

Network outage June 22, 10:55-11:30

This morning, a UPS (battery power supply) in the network hub for Huygens South went down, which made a set of network switches lose power. Because of this, users in Huygens wing 1 and the spin-off companies lost their connection to the network. After the UPS was bypassed, everything was up and running again at 11:30. We are still searching for the exact cause of this outage.

New SSH keys for new login servers

The LInux LOgin server lilo has been replaced. The name now points to the new machine lilo2, because that one is faster than the other login server lilo1. It is therefore normal to be asked once to accept the new SSH key.

Planned Service: Limited computer services

12 Feb 2011 7:00 - 11:00

A backup cooling system will be installed in our main computer room. Therefore the air conditioning system must be switched off, which means that most of the computer facilities in this room must be shut down. This includes the cluster nodes cn00 through cn53 and many of the web- and file- (network share) servers. It is advised to expect a very limited service level. We will try to keep all home directories and the mail system available. For detailed information about the impact please contact C&CZ.

Printer lp5

24 Jan 2011 - 11 Mar 2011

Printer lp5 has been moved to HG00.089. You can't use this printer at the moment, because there is a problem with the power supply unit.

Fixed phone problem

7 Mar 2011

Certain fixed phones at the university can't be reached right now. Mobile phones and Skype do work, though.

Mailserver blacklisted

4 Feb 2011 9:00 - 12:00

One of our mail servers has been sending large amounts of spam after a successful phishing attack. Since then, our server has been blacklisted by several domains. Currently this affects the delivery of email to @hotmail and @live addresses.