Status (by robm at Fri Oct 30 16:56 UTC)
One IMAP server is down, affecting email access for some users. We’re investigating.
Update (by robm at Fri Oct 30 17:09 UTC)
Everything should be running again.
Status (by robm at Wed Oct 28 21:28 UTC)
Due to a server failure, some services are currently significantly slower than normal (web mostly). We’re working on this.
Update (by robm at Fri Oct 30 00:06 UTC)
Forgot to update this. We had a few problems during the slowness and failing over of services, but everything was restored to full working order after an hour or two.
Status (by robm at Tue Oct 27 01:55 UTC)
Some users will not be able to access their email due to an IMAP server being down. We’re investigating.
Update (by robm at Tue Oct 27 02:27 UTC)
Server restored. All services should be normal again.
Status (by brong at Sun Oct 25 02:00 UTC)
One of the disk units attached to a server has frozen – I’m getting the techs to reboot it before switching services, so a small percentage of users will be offline for about 10-15 minutes.
Update (by brong at Sun Oct 25 02:46 UTC)
Ok – everything is running again.
Status (by robm at Thu Oct 22 06:10 UTC)
A BIOS upgrade on our beta server seems to have gone badly wrong, causing the server to not want to reboot. We’re working on trying to fix it, but it’s taking longer than expected, so it’s been down almost 24 hours now, and might be for another 24.
For now, just use the regular server.
Status (by robm at Wed Oct 21 18:06 UTC)
One of our IMAP servers is down, affecting email access for some users. We’re looking into it.
Update (by robm at Wed Oct 21 19:12 UTC)
All services have been restored.
Status (by robm at Tue Oct 13 22:55 UTC)
The web interface is currently down. We’re investigating.
Update (by robm at Tue Oct 13 23:17 UTC)
Everything should be working again now.
Status (by rjlov at Fri Oct 9 00:16 UTC)
We’re experiencing a large load spike, and the web interface is currently down. It should be up again in five to ten minutes.
Update (by rjlov at Fri Oct 9 00:32 UTC)
The web interface is back up and running again now.
Status (by brong at Wed Oct 7 21:08 UTC)
A pattern of web connections has caused very high load on our web servers, which has created outages for some users. We’re investigating how the requests are overloading our servers.
Until this is fixed you may see very slow requests through the web interface, and possibly timeouts that redirect you here.
IMAP, POP3 and SMTP are unaffected.
Update (by brong at Wed Oct 7 23:23 UTC)
Everything seems to be working properly. There were a bunch of connections all from the same address range that managed to freeze the backend for 15 seconds, so we’re figuring out what they were doing and making sure it doesn’t happen again! No sign of anything other than a denial of service.
It’s not even slashdot! It looks like power has failed to one of our cabinets, and it’s the one that we can’t afford to lose because it contains both the primary DB server and the firewall machines. Growth has broken our “can lose any cabinet and keep functioning” initial plan!
Techs in New York are investigating for us, so we’ll just have to sit tight and wait for an update at this point.
It’s not even thanks to slashdot either, looks totally unrelated to us being in the press, but it’s awful timing.
I’ll keep you updated as soon as I know anything more,
Bron.
UPDATE (sorry, I’m using the blog directly, so no nice timestamps, but it’s about 1/2 hour later)
All running again. The problem was a failed network switch. We’re investigating why it failed. A reboot fixed it, but unless we can find the underlying cause we’ll probably just replace the whole unit. Other plans for tomorrow: a reshuffling of the cabinet. Some of these high-flying servers will be demoted to the backbench for sure (does that joke work outside Australia? Don’t know if you use the same political terminology as us. Google it if you’re unsure).
Bron ( slightly crazy from sleep dep, but not too bad, it’s not too far after midnight yet )
P.S. everything looks happy except the mx servers. They really didn’t like being without database access for so long, so I’m babying them back to happiness now, hence the slow update. We’ve actually been back up a while.