Problem with Send/Delete/Move (any button click)

Earlier today there appeared to be a problem with POST-type web requests timing out. A POST request is used whenever you click a button in the web interface. A small number of users, all on LAN or DSL connections and in completely different parts of the world, were affected by this.

We’re not too sure what caused this problem, but it seemed to be a low-level network issue related to MTU values in the TCP/IP protocol. We have not yet determined whether the problem occurred in our network or on an external Internet router.

Update: Our data centre has found the problem and resolved it. It was due to an MTU problem at a peering point.
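
The post does not say exactly how the MTU problem caused the timeouts. One common pattern, assumed here purely for illustration, is that a link on the path accepts smaller packets than the sender expects and the oversized packets are silently dropped. Small requests still fit in one small packet, while a larger POST body fills full-size packets that never arrive. The sketch below uses assumed values (a 1500-byte sender MTU, a 1400-byte path MTU, 20-byte IPv4 and TCP headers without options); none of these figures come from the incident itself, and the function names are hypothetical.

```python
IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes, TCP header without options (assumption)


def max_segment_size(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def packet_sizes(payload_len, sender_mtu):
    """Sizes of the IP packets a sender would emit for a payload."""
    mss = max_segment_size(sender_mtu)
    sizes = []
    remaining = payload_len
    while remaining > 0:
        chunk = min(remaining, mss)
        sizes.append(chunk + IP_HEADER + TCP_HEADER)
        remaining -= chunk
    return sizes


def would_stall(payload_len, sender_mtu, path_mtu):
    """True if any packet is larger than the smallest MTU on the path."""
    return any(size > path_mtu for size in packet_sizes(payload_len, sender_mtu))


# A short request fits; a 5000-byte POST body produces 1500-byte packets.
print(would_stall(300, 1500, 1400))   # False
print(would_stall(5000, 1500, 1400))  # True
```

Under these assumptions only requests large enough to produce full-size packets are affected, which would be consistent with button clicks failing while ordinary page views kept working.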

Problem with .biz domains

Currently the .biz administrator has a problem with their records for reallyfast.biz and veryfast.biz that is causing these domains to fail in some areas of the Internet. We regret the inconvenience this is causing customers using these domains – unfortunately there is nothing we can do until the .biz administrator fixes the problem. We have contacted them and are awaiting a reply.

Technical details: reallyfast.biz and veryfast.biz are currently showing incorrect NS info on c.gtld.biz through f.gtld.biz:
----
reallyfast.biz. 900 IN NS NS1.MESSAGINGENGINE.COM.
reallyfast.biz. 900 IN NS NS1.ZONEEDIT.COM.
reallyfast.biz. 900 IN NS NS2.MESSAGINGENGINE.COM.
reallyfast.biz. 900 IN NS NS5.ZONEEDIT.COM.
----

Only a.gtld.biz and b.gtld.biz are currently showing the correct info:
----
reallyfast.biz. 900 IN NS NS1.MESSAGINGENGINE.COM.
reallyfast.biz. 900 IN NS NS2.MESSAGINGENGINE.COM.
----

We have asked ZoneEdit to add appropriate records in their name servers to allow the domains to resolve. We are awaiting a reply.

Update: ZoneEdit have very kindly created the necessary records, allowing these names to resolve correctly.
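
For readers who want to see the mismatch mechanically, here is a minimal sketch that compares the two sets of NS records quoted in the technical details. It only parses the text of the records as listed in this post and makes no DNS queries. The function and variable names are hypothetical.

```python
def ns_targets(zone_lines):
    """Return the set of NS target names found in zone-file style lines."""
    targets = set()
    for line in zone_lines:
        fields = line.split()
        # Expected layout: owner, TTL, class, type, target
        if len(fields) == 5 and fields[3].upper() == "NS":
            targets.add(fields[4].lower())
    return targets


# Records reported for c.gtld.biz through f.gtld.biz (incorrect).
seen_on_c_to_f = [
    "reallyfast.biz. 900 IN NS NS1.MESSAGINGENGINE.COM.",
    "reallyfast.biz. 900 IN NS NS1.ZONEEDIT.COM.",
    "reallyfast.biz. 900 IN NS NS2.MESSAGINGENGINE.COM.",
    "reallyfast.biz. 900 IN NS NS5.ZONEEDIT.COM.",
]

# Records reported for a.gtld.biz and b.gtld.biz (correct).
seen_on_a_and_b = [
    "reallyfast.biz. 900 IN NS NS1.MESSAGINGENGINE.COM.",
    "reallyfast.biz. 900 IN NS NS2.MESSAGINGENGINE.COM.",
]

# Name servers that appear only in the incorrect answers.
unexpected = ns_targets(seen_on_c_to_f) - ns_targets(seen_on_a_and_b)
print(sorted(unexpected))  # ['ns1.zoneedit.com.', 'ns5.zoneedit.com.']
```

The two extra ZoneEdit name servers are the ones a resolver could be referred to by mistake, which is why having ZoneEdit serve appropriate records for the domains lets the names resolve either way.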

Update on recent outages

On the night of August the 19th, FastMail.FM went down due to corruption of the main database. The primary backup of the main database is a realtime replica. This can provide near instantaneous recovery from errors caused by hardware failure, and many types of software failure. Unfortunately, in this case the corruption was also replicated to the primary backup.

As a result, we had to recover from the secondary backup, which turned out to be a very lengthy process, causing FastMail.FM to be down all night. We also found that the secondary backup had consistency problems in the addresses and billing tables. This meant that we had to use an outdated addresses table until the consistency problems were resolved, so addresses that you had recently added to your address book were not available for a couple of days. Furthermore, all billing-related services (such as renewals, upgrades, and signups) were down for this time. Because so much mail was queued up on our secondary US and European mail servers overnight, the mail queue took much of the day to clear, resulting in mail delivery delays.

All of these problems are now fully resolved, with the exception that some groups stored in some users’ address books are missing some addresses. We are currently in the process of e-mailing the small number of users affected by this.

After four years of operation with no extended, unscheduled outages of this type, this week’s power outage and database corruption problems are most disappointing. We understand the inconvenience that this caused many of you, and we are doing everything that we can to ensure that it does not happen again. We have already instituted a new secondary backup policy for the database, which involves taking a complete nightly backup and storing it on our backup server in Texas. This will result in much simpler and faster restores than using incremental backups to tape. If we ever have to restore from secondary backup again, we would not expect it to take more than one hour.
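
To illustrate why a realtime replica could not help in this case while a complete nightly backup can, here is a toy model. It is only a sketch: the class, its methods, and the sample data are hypothetical and say nothing about how our database or replication is actually implemented.

```python
import copy


class ToyDatabase:
    """Toy model: a primary store, a realtime replica, and a nightly snapshot."""

    def __init__(self):
        self.primary = {}
        self.replica = {}    # mirrors every write as it happens
        self.snapshot = {}   # only changes when a snapshot is taken

    def write(self, key, value):
        self.primary[key] = value
        # Replication applies the same write immediately, good or bad.
        self.replica[key] = value

    def take_nightly_snapshot(self):
        # A complete, independent copy of the data at this moment.
        self.snapshot = copy.deepcopy(self.primary)


db = ToyDatabase()
db.write("address:1", "alice@example.com")
db.take_nightly_snapshot()
db.write("address:1", "<corrupted>")  # corruption arrives as an ordinary write

print(db.replica["address:1"])   # <corrupted>
print(db.snapshot["address:1"])  # alice@example.com
```

The replica protects against losing the primary machine, but it faithfully copies bad data too. The snapshot is older, yet it is untouched by anything written after it was taken.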

All queues cleared

All mail that was queued up by the backup servers during the recent outage has been delivered.

‘Server down’ screen

Some people are reporting that the server still appears to be down when it is running fine. This sometimes occurs when proxies incorrectly cache the ‘Server down’ page.

To fix this, go to www.fastmail.fm, then hold down the Ctrl key and press F5 to force a refresh of this page.
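
For the technically curious: a forced refresh typically works because the browser adds request headers asking caches along the way not to answer from a stored copy. The sketch below builds such a request with Python's standard library. The exact headers a given browser sends vary, so treat the two used here as a common example, not a description of any particular browser. The function name is hypothetical.

```python
import urllib.request


def build_uncached_request(url):
    """Build a GET request that asks caches to revalidate with the origin."""
    return urllib.request.Request(
        url,
        headers={
            "Cache-Control": "no-cache",  # HTTP/1.1 caches
            "Pragma": "no-cache",         # older HTTP/1.0 caches
        },
    )


request = build_uncached_request("http://www.fastmail.fm/")
# To actually fetch the page: urllib.request.urlopen(request)
print(request.header_items())
```

A proxy that honours these headers should check with our server again instead of replaying a stored copy of the ‘Server down’ page.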

Options screen restored

Most services on the Options screen are now available. Payment-related services will not be available for another 24 hours.

Mail queues are being processed

Please note that all mail that was received by the backup servers during the outage is still in the process of being delivered to your mailbox. It is currently not known how long it will take for all queues to clear. Once all mail has been delivered we will update this page.

IMAP/POP/SMTP online

IMAP, POP, and SMTP access are now available, so you can access your email using your favourite email client.

Web interface online

You can now access your email through http://www.fastmail.fm. We have disabled signups, payments, and the Options screen for 24 hours to ensure smooth operation of the system.

Once the server has settled down, we’ll bring up IMAP/POP (in around 10-15 minutes). Then, once the server recovers from all the IMAP clients automatically logging in at the same time, we’ll start clearing the email queues and delivering queued email.

Your Address Book will be out of date, without your newer addresses, until we update it in the next 24 hours.

DB reload complete

We have now completed reloading the database. Everything ran smoothly, except for the Addresses table, which contains your saved e-mail addresses – recently saved addresses were not recovered. Therefore we will need to load these from the secondary backups, which will take around 24 hours. In the meantime we will bring everything back up, and everything should work just fine except that you will find your address book missing some entries (which should be back tomorrow).

So, we are now running through our test procedures, and after completing those and a complete snapshot of our database, we will bring services back online. This will take around 90 minutes. It will take some hours for all mail queues to clear after that, since a lot of mail has queued up on our backup servers.
