File storage needs reboot…

The file storage server needs a reboot. This should only take a few minutes. Some web services may be temporarily affected.

Expired certificate

Oops. Our mail.messagingengine.com SSL certificate expired. We did have an updated certificate ready, but with the distraction of the server4 outage, we forgot to restart the services that needed to load the new certificate. We’ve done that now, so anyone who was having problems should be fine again.

Filestorage down

The filestorage subsystem is down temporarily as we upgrade the kernel on the machine where the files are stored.

You will still be able to navigate filestorage, but you will not be able to upload or download files. Websites will also be unavailable during this time. We estimate it will be offline for about 5 minutes.

Bron.

Update: Sorry for not updating this immediately. We were offline for about 10 minutes in total because we had to reboot twice, and our big drive arrays take about 5 minutes to mount!

Restores complete

Everyone now has their emails restored. The mail queues are being cleared. I will post an update when we have an ETA for email delivery being complete.

Update: Sorry, we didn’t update this sooner. Mail queues were cleared about 5 hours after this began. All users and services have been restored. We’ve put together a Server 4 outage FAQ to explain what happened and what we’re doing to avoid it happening again.

Mail delivery

90% of users are now restored. We have suspended further restores briefly, in order to allow the mail queue to clear.

Update: The mail queue is clearing quickly, and the remaining restores have been resumed.

Mail delivery to server4 suspended during restores

The restores have been going more slowly than they did in our benchmarking, and we’ve worked out why: running normal email delivery at the same time as restoring users slows the process down enormously. So, for the next 12 hours (at least), we are going to suspend delivery of queued emails to server4.

Also, we’re going to split the load a bit more by restoring users to two additional servers. In 12 hours we’ll review the status of the restores and decide when email delivery to server4 can safely be resumed.

server4 users coming back online

server4 users are being restored and are coming back online. We should have an updated estimate shortly for when all users will be restored. If you find you can log in, then your email has already been restored, and you can go ahead and use your account in any way that you need to.

Email sent in the last day or so may not turn up right away – the mail queues will take a while to empty. We’ll update this blog when the queues are clear.

Update on server4 outage

We’ve been working hard trying to get server4 back up. The good news is that you won’t lose any mail. New mail is being queued, and old mail is safe. The bad news is that the file system is corrupted. We have to restore from backups onto a fresh file system.

The first user restores will be complete in around 10 hours. All users should be restored around 45 hours after that. We’ll restore the smaller mailboxes first, so that 75% of users will have their mail restored around 6 hours after the start of the process.

server4 outage

One of our servers (server4) will be down for the next hour and a half, to resolve a problem with one of its disks. This disk has been in a bad way today, so people on this server will have noticed slowness.

Sorry about the outage – we’ll keep you updated on progress.

Server down

One of our IMAP servers is down. We are investigating.

Update: The machine is back up and everything is running fine again.