The file storage server needs a reboot. This should only take a few minutes. Some web services may be temporarily affected.
Oops. Our mail.messagingengine.com SSL certificate expired. We did actually have an updated certificate, but with the distraction of the server 4 outage, we forgot to restart the services that required the new certificate. We’ve done that now, so anyone having problems should be fine again.
The filestorage subsystem is down temporarily as we upgrade the kernel on the machine where the files are stored.
You will still be able to navigate filestorage, but not upload or download files. Websites will also not work during this time. We estimate it will be offline for about 5 minutes.
Bron.
Update: Sorry for not updating this immediately. We were offline for about 10 minutes in total because we had to reboot twice, and our big drive arrays take about 5 minutes to mount!
Everyone now has their emails restored. The mail queues are being cleared. I will post an update when we have an ETA on email delivery being complete.
Update: Sorry, didn’t update this. Mail queues were cleared about 5 hours after this began. All users and services have been restored. We’ve put together a Server 4 outage FAQ to explain what happened and what we’re doing in the future to avoid it happening again.
90% of users are now restored. We have suspended further restores briefly, in order to allow the mail queue to clear.
Update: The mail queue is clearing quickly, and the remaining restores have been resumed.
The restores have been going slower than they did in our benchmarking, and we’ve worked out why. The problem is that running normal email delivery at the same time as restoring users slows the process down enormously. So, for the next 12 hours (at least) we are going to suspend deliveries of queued emails to server4.
Also, we’re going to split the load a bit more by restoring users to 2 additional servers. In 12 hours we’ll review the status of the restorations and decide when email delivery can be safely restored to server4.
server4 users are being restored and are coming back online. We should have an updated estimate for all users to be restored shortly. If you find you can log in, then your email has already been restored, and you can go ahead and use your account in any way that you need to.
Email sent in the last day or so may not turn up right away – the mail queues will take a while to empty. We’ll update this blog when the queues are clear.
We’ve been working hard trying to get server4 back up. The good news is that you won’t lose any mail. New mail is being queued, and old mail is safe. The bad news is that the file system is corrupted. We have to restore from backups onto a fresh file system.
The first user restores will be complete in around 10 hours. All users should be restored around 45 hours after that, although we’ll restore the smaller mailboxes first, so that 75% of users will have their mail restored around 6 hours after the start of the process.
One of our servers (server4) will be down for the next hour and a half, to resolve a problem with one of its disks. This disk has been in a bad way today, so people on this server will have noticed slowness.
Sorry about the outage – we’ll keep you updated on progress.
One of our IMAP servers is down. We are investigating.
Update: The machine is back up and everything is running fine again.