One of the frontend servers is down

The symptoms you will see: some connections will fail, but if you try again you will eventually get through.

There’s nothing we can do about this until the server is back. The techs in New York are working on it now. We suspect a cable came loose when we briefly shut the server down to remove a card, and they need to find that cable and plug it back in firmly!

Update: everything is running again now.

5-minute outages on servers 1 and 2

For the other half of the upgrade, we’re installing the new kernel on servers 1 and 2. Each should be down for about 5 minutes while it reboots.

Update: all went cleanly. Each server was down for a few minutes, and both are running fine now.

server4 very slow for 10 minutes

Due to a programming error, one of the servers was significantly overloaded for 10 minutes causing very slow response times. This has now been fixed.

5-minute outages on servers 3 and 4

Servers 3 and 4 are being rebooted to upgrade to the new 2.6.12.3 kernel. There will be about a 5-minute outage on each one.

Update: done. It took a little while to get s3 ready for its reboot, but it was only down for a couple of minutes.

Server4 being rebooted with new kernel

To help avoid overloads like the one the other day, we’re going to reboot server4 with the new kernel that’s been running on server3 for the past couple of weeks. We’re happy it’s stable now and it offers some better drivers which should make things run more smoothly.

Users on s4 will experience an outage of approximately 5 minutes.

Bron.

Edit: it came back up fine and everything’s working again.

S4 overload caused short outage

The fallout from the earlier problem with mx3 included an insane number of messages being delivered to FastMail staff, whose accounts are all currently on s4. This caused s4 to overload and slow down. Sorry I haven’t been able to post until now, but I was busy writing a script to nuke about 12,000 emails out of the queue on s4 without killing any legitimate email, only the error notifications.

It’s all fixed now, and the message storm is over. Fingers crossed I can sleep now!

Bron.

Some incoming mail delayed

For the past 24 hours, one of our servers has been holding on to incoming email rather than delivering it, so some emails may be delayed by up to a day. They will all reach your accounts soon, but if you were expecting an email and it hasn’t arrived, this may be why.

See this forum thread for all the gory details.

Slow response time

There currently appears to be an issue with our network connection that is causing slow and intermittent response times. We’re investigating…

Update: After working through this with our network provider, we’ve now fixed the issue.

server3 being moved

One of our servers is being moved to another cabinet and will be down for about 15 to 30 minutes. It is currently the server with the fewest users, so this should only affect a small percentage of users. We hope to have it moved and back up as soon as possible, and will update this ticket as soon as it is.

Update: The server is now up and running again and all services restored.

Update 2: We’ve had to take the server down again for 10 minutes to fix something we got wrong. It should be back shortly. Sorry about that.

Update 3: All services are back and working now.

Site down

The entire FastMail site is currently down. The problem appears to be our frontend servers, which are not correctly accepting connections. We’re trying to work out what is going on.

We’ve found a bottleneck in our code and we’ve put a simple cache in place to tide us over for the next few days until we come up with a real solution.

My deepest apologies for the problem. The underlying cause was a long-running query I was running. I had hoped it would finish before the US day started, but it didn’t quite make it. It slowed down other database queries enough that logins didn’t respond fast enough for client programs, which then tried to connect again, adding yet another query to the load on the server!

A much more scalable proxy tool has recently become available (just this week!) and I’ll be evaluating it as a very high priority.

Bron.
