Verizon email delivery issues

Around December 24-25 we had some issues with email being delivered from Verizon to FastMail.

What was happening was that spammers were generating a large amount of spam and sending it to a number of different services, but forging the “from” address so that it appeared to come from some of our domains (e.g. addresses at @fastmail.fm, @eml.cc, etc.).

When email is sent to most systems, they either accept it for known users or immediately reject it for unknown users. However, there is another approach, where a service always accepts the email and only checks whether the recipient is valid after it has been accepted. If it finds the recipient is invalid, it then generates a bounce message and sends it to the from address of the original message.

The problem with this approach is that, because spammers forge the from address, the bounce email is sent to an innocent and completely unrelated third party: in this case, us. This is called backscatter (or outscatter). Because of this problem, it’s regarded as poor practice to configure your email servers that way; you should reject email to unknown users immediately at the receiving stage. In other words, Verizon haven’t configured their email servers very well.
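To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not how any real mail server is implemented; the user table, function names and addresses are all made up for illustration. The only real-world details are the SMTP reply codes (250 for success, 550 with enhanced status 5.1.1 for an unknown mailbox).

```python
# Hypothetical mailbox table; a real server would consult its user database.
KNOWN_USERS = {"alice@example.com", "bob@example.com"}


def rcpt_reject_early(recipient):
    """Answer the RCPT TO command during the SMTP session itself."""
    if recipient.lower() in KNOWN_USERS:
        return 250, "2.1.5 Recipient OK"
    # The sending server hears about the failure directly; no bounce is created.
    return 550, "5.1.1 User unknown"


def accept_then_bounce(envelope_from, recipient):
    """Accept everything, validate later, and bounce to the envelope sender."""
    if recipient.lower() in KNOWN_USERS:
        return None  # delivered normally, nothing to send back
    # The envelope sender is unverified, so a forged address receives this bounce.
    return {
        "to": envelope_from,
        "subject": "Undelivered Mail Returned to Sender",
        "body": f"Could not deliver to {recipient}",
    }


code, text = rcpt_reject_early("nobody@example.com")
bounce = accept_then_bounce("victim@forged.example", "nobody@example.com")
print(code, text)
print(bounce["to"])
```

With early rejection the spammer's own connection gets the error. With accept-then-bounce, the message lands on whoever owns the forged address.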

Now because the spammers forge lots of random from addresses, when the backscatter from Verizon comes back to us, it looks like Verizon is actually trying to deliver email to lots of random addresses at our server, which is very much what a dictionary harvest attack looks like.

So this is exactly what happened with Verizon. A spammer sent them lots of emails, which they accepted. Most of the recipients were invalid, so they generated bounce emails for most of them and tried to send those to us. To us it looked like they were attacking us, so we blacklisted their servers.
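The kind of check that trips in this situation can be sketched roughly as follows. This is only an illustration of the general idea (count how many delivery attempts from one source go to addresses that don't exist), not our actual detection code; the class name and both thresholds are assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration.
MIN_ATTEMPTS = 20
MAX_UNKNOWN_RATIO = 0.8


class HarvestDetector:
    """Blacklist sources whose recipients are mostly unknown addresses."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.unknown = defaultdict(int)
        self.blacklist = set()

    def record(self, source_ip, recipient_exists):
        """Record one delivery attempt; return True if the source is blocked."""
        self.attempts[source_ip] += 1
        if not recipient_exists:
            self.unknown[source_ip] += 1
        total = self.attempts[source_ip]
        ratio = self.unknown[source_ip] / total
        if total >= MIN_ATTEMPTS and ratio >= MAX_UNKNOWN_RATIO:
            self.blacklist.add(source_ip)
        return source_ip in self.blacklist


detector = HarvestDetector()
for _ in range(25):
    # Backscatter aimed at forged, non-existent addresses.
    blocked = detector.record("192.0.2.10", recipient_exists=False)
print(blocked)
```

From the receiving side, a flood of bounces to random forged addresses is indistinguishable from a harvest attempt under a rule like this, which is why the sending servers end up blocked.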

Now normally we have mechanisms in place to try and stop this blacklisting happening for known legitimate email service providers, but unfortunately in this case, Verizon have a slightly odd naming convention for their outgoing email servers, so it wasn’t stopped.
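As a rough illustration of how a naming convention can defeat this kind of protection, assume the exemption works by matching a server's hostname against a list of patterns. That is an assumption made for the sketch, as are the patterns and hostnames (all under the reserved `.example` domain); it is not a description of our real rules or of Verizon's real server names.

```python
import fnmatch

# Hypothetical patterns for the outgoing servers of trusted providers.
TRUSTED_PATTERNS = ["mail*.provider.example", "smtp*.provider.example"]


def is_trusted(hostname, patterns=TRUSTED_PATTERNS):
    """Return True if the hostname matches any trusted pattern."""
    host = hostname.lower().rstrip(".")
    return any(fnmatch.fnmatchcase(host, pattern) for pattern in patterns)


conventional = is_trusted("mail3.provider.example")
unusual = is_trusted("relay-x17.provider.example")

# Adding a broader rule covers the unusual naming scheme as well.
extended = TRUSTED_PATTERNS + ["*.provider.example"]
unusual_after_fix = is_trusted("relay-x17.provider.example", extended)

print(conventional, unusual, unusual_after_fix)
```

A server whose name doesn't fit the expected patterns falls through the exemption and is treated like any other source until a rule is added for it.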

When we worked all this out around Dec 25, we added some extra rules to permanently whitelist Verizon’s outgoing email servers so that this shouldn’t happen again.

One of our IMAP servers is rebooting

Status (by brong at Thu Dec 25 21:59 UTC)
imap4 had a kernel issue which turned all filesystems read-only. I haven’t failed over to the replicas yet, but will in about 10 minutes if it’s not up and running nicely again.

Merry Christmas from the randomness of hardware…

Update (by brong at Thu Dec 25 22:17 UTC)
The problem has been narrowed down to one of the two attached drive units being offline. I’ve put in a ticket with NYI to check its power and reboot it if necessary. I can’t access it via the network either.

Update (by brong at Fri Dec 26 00:26 UTC)
We have been unable to recover the drive unit – not entirely sure what’s going on, and rather than try to fix it while everyone is in holiday mode, I have configured new replicas for all the affected slots and started them duplicating all users to the new replicas.

In a few hours, we will be fully replicated again, with no references to the failed unit. Thankfully we have enough standby capacity (empty slots on other machines) to do this with minimal fuss. I’m really happy about the forward planning that gives us this sort of spare capacity!

Copy and Delete broken briefly

Status (by brong at Mon Dec 8 02:46 UTC)
A bug fix in one piece of Cyrus code exposed a bug in another part, which caused crashes when copying messages.

I’ve rolled out the fix now, but over the past couple of hours users may have noticed copies failing.

Bron.

Emails delivered for some users with older timestamp

Status (by robm at Thu Dec 4 11:47 UTC)
We just upgraded one of our servers. Unfortunately, in the upgrade process the clock on the server was set back 6 hours. We didn’t notice this immediately, and a number of emails were delivered to users’ mailboxes in the meantime. Because the web interface and most email software display the "received" time of an email, and because the received time comes from the system clock, emails delivered for users on the affected server appear to have been delivered 6 hours earlier than they really were (e.g. an email actually delivered at 5AM EST would appear in the web interface and email software to have been delivered at 11PM EST the day before).
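The arithmetic behind that example can be checked with a few lines of Python. The calendar date used here is arbitrary and chosen only for illustration; the 5AM time and the 6 hour offset come from the description above, and EST is modelled as a fixed UTC-5 offset.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")
CLOCK_ERROR = timedelta(hours=6)  # the server clock was 6 hours behind

# Arbitrary example date; only the time of day comes from the text above.
actual_delivery = datetime(2024, 1, 10, 5, 0, tzinfo=EST)

# The "received" time is stamped from the (wrong) system clock.
recorded = actual_delivery - CLOCK_ERROR

print("actual:  ", actual_delivery.isoformat())
print("recorded:", recorded.isoformat())
```

The recorded time comes out at 23:00 on the previous day, which is why affected emails can sort below messages that genuinely arrived earlier.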

Unfortunately this is not easy for us to fix up. Fortunately it only affected a small number of users, and for the affected users, it only moved emails a few hours back, so they shouldn’t have been lost in a mailbox.
