08 Dec, 2024

Microsoft's disgraceful implementation of its corporate email security scanning tool causes endless headaches

Microsoft's Office 365 ATP Safe Links feature scans links so aggressively that it generates massive, misleading bot traffic, significantly skewing site analytics and email marketing metrics and imposing unnecessary costs on site owners.

In the last article on email delivery, I talked about how ALL the links in an email were being clicked within pretty much the same second.

After some digging, it turns out this is a Microsoft feature called Office 365 Advanced Threat Protection (ATP) Safe Links, which scans and loads every link to check for scams and malware.

Of course, who can blame organizations and Microsoft for wanting to protect their users from the endless shitty malicious threats on the Internet?

I didn't blame Microsoft for wanting to protect its users. Protecting users is a good thing, and security is everyone's concern.

But now I blame Microsoft for the worst implementation of this feature ever.

Some braindead engineering team at Microsoft responsible for this product has decided to scan EACH and every email for EACH and every link. And the scanning is done by automatically loading each link in a browser, resulting in a full site load - repeatedly, thousands of times.

Let's say the Network Operator has sent just 1,000 mass emails to a bunch of corporate customers with this security feature enabled, and each mail has 15 links pointing to various posts on the Network site.

Obviously each mail is identical: they all have exactly the same 15 links.

The security analyzer will then fully load the site in a browser 15,000 times (1,000 mails × 15 links) in a span of, say, 50 seconds, ie 300 full site loads per SECOND.

Let that sink in. I'm repeating this for emphasis:

15,000 full site loads at 300 loads per second - and all of it is bogus, fake, bot traffic.

Now imagine one is sending 20,000 mass mails - there is nothing extraordinary about this number. Many services around the world have tens of thousands of subscribers, and some have millions. So, conservatively, say 20,000 mails are sent out, and say it takes 600 secs (ie a full 10 mins) to send them.

Microsoft's UTTERLY CRAPPY SECURITY TOOL will load the full site 300,000 times (20,000 × 15), at 500 full site loads per second - and all of it is bogus, fake, bot traffic.
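Both scenarios use the same back-of-the-envelope calculation. Here is a minimal sketch of it. It assumes exactly one full page load per link per email, with the loads spread evenly over the window. Real scanner traffic is likely burstier. `scan_load` is a hypothetical helper.

```python
def scan_load(emails: int, links_per_email: int, window_s: float) -> tuple[int, float]:
    """Return (total full-page loads, average loads per second), assuming every
    link in every email is fetched once, evenly spread over window_s seconds."""
    total_loads = emails * links_per_email
    return total_loads, total_loads / window_s


# Scenario 1: 1,000 emails x 15 links, scanned over ~50 seconds
print(scan_load(1_000, 15, 50))    # (15000, 300.0)
# Scenario 2: 20,000 emails x 15 links, sent over 600 seconds
print(scan_load(20_000, 15, 600))  # (300000, 500.0)
```

The load grows linearly with list size. Doubling the subscriber count doubles the number of fake page loads, even though the scanner keeps seeing the same 15 URLs.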

And it doesn't stop there - even after relentlessly loading the site zillions of times as the mails are received by the security tool, there still seem to be more attempts to load the site at various times later.

So what's the impact of Microsoft's disgraceful implementation?

Incorrect site analytics

Since this is all bogus, fake bot traffic, it completely skews all site analytics. And since it happens right after mass emails are sent out, site owners will look at their site stats (on Google Analytics or Control Panel or wherever) and be extremely pleased that their email campaign resulted in so much traffic. Nothing could be further from the truth.

Misleading email marketing metrics

The most critical impact is on the email marketing metrics - open rate, click-through rate, click-to-open rate, etc. Since Microsoft in all its stupidity decided to open every email and click on every link, these fake opens and clicks inflate all the email metrics and make the picture look rosier than it is. If the metrics are incorrect, then all decisions based on them are also incorrect.
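To make these metrics concrete, the sketch below computes them for the raw and filtered numbers of the campaign comparison later in this article. It assumes the table's "At least 1 click" row counts unique clickers, and that click-through and click-to-open rates use unique clickers rather than total clicks. Platforms differ on these definitions, and `email_metrics` is a hypothetical helper.

```python
def email_metrics(sent: int, opens: int, clickers: int, total_clicks: int) -> dict:
    """Common campaign ratios: rates in percent, clicks_per_clicker as a plain ratio.

    Assumes 'opens' counts unique openers and 'clickers' counts recipients
    with at least one click.
    """
    return {
        "open_rate": round(100 * opens / sent, 1),
        "click_through_rate": round(100 * clickers / sent, 1),
        "click_to_open_rate": round(100 * clickers / opens, 1),
        "clicks_per_clicker": round(total_clicks / clickers, 1),
    }


# Figures from the campaign comparison table in this article
print(email_metrics(345, 297, 26, 165))  # raw metrics
print(email_metrics(345, 107, 10, 33))   # spurious activity removed
```

On the raw numbers this gives an open rate of 86.1% and a click-through rate of 7.5%. With spurious activity removed, those fall to 31.0% and 2.9%, and clicks per clicker fall from 6.3 to 3.3.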

Extreme spike load on servers

Imagine the impact on the servers of this massive, utterly needless load, with the site being loaded zillions of times. I can well imagine normal CMS (eg WordPress) sites on any standard hosting infrastructure (eg HostGator, GoDaddy, Hostinger, Namecheap etc etc) simply dying under the load for 15 mins. This is malicious bot-like activity, bordering on a DDoS attack - this in itself is criminal on the part of Microsoft.

Cost to serve the site

Finally: the bandwidth cost of serving all these needless site loads. If one has a JS-heavy site, all the site assets need to be served before the site comes up in the browser. If mass mails are sent out to the audience every week, all this fake traffic may have a direct impact on costs.
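The bandwidth cost scales with the number of fake loads times the weight of a full page load. Here is a rough sketch. The 2 MB page weight is a hypothetical figure for a JS-heavy page, not a measurement. The sketch also assumes the scanner re-downloads every asset on every load, with no caching on its side.

```python
def wasted_bandwidth_gb(page_loads: int, page_weight_mb: float) -> float:
    """Bandwidth in decimal GB consumed by page_loads full loads of a page
    weighing page_weight_mb megabytes, including all JS/CSS/image assets."""
    return page_loads * page_weight_mb / 1000


# Hypothetical: the 300,000 scanner loads from the 20,000-email example,
# against an assumed 2 MB page weight.
per_campaign = wasted_bandwidth_gb(300_000, 2.0)
print(per_campaign)       # 600.0 GB per campaign
print(per_campaign * 52)  # 31200.0 GB over a year of weekly campaigns
```

Plug in your own page weight (for example from the browser's network tab) and your hosting or CDN price per GB to estimate the real cost.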

Calisthenics to mitigate Microsoft's odious implementation

Given the security tool's behavior, we've decided to treat it like an Internet Asshole rather than a decent Internet neighbour. So we have put in place a strategy for each impact:

Extreme spike load on servers - somewhat mitigated by blocking API loads at the very edge. Loading of the front-end site itself has not yet been blocked; that is the next step.

Incorrect site analytics - automatically solved by the above. Since the site doesn't get very far in loading, analytics are never triggered.

Misleading email marketing metrics - we are detecting these fake opens and clicks and removing them from the tracking results, which leads to more accurate metrics.
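The article doesn't say how scanner requests are recognised at the edge. This minimal sketch assumes you have attributed a set of IP ranges to the scanner from your own access logs. The edge rule then rejects only API calls from those ranges and still serves the page itself. `SCANNER_NETWORKS`, `API_PREFIX` and `should_block` are hypothetical, and in practice this logic would live in whatever CDN or reverse proxy sits in front of the site.

```python
import ipaddress

# HYPOTHETICAL ranges: replace with networks attributed to the scanner from
# your own access logs. These are RFC 5737 documentation ranges, NOT
# Microsoft's real address space.
SCANNER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical prefix for the site's API routes.
API_PREFIX = "/api/"


def should_block(client_ip: str, path: str) -> bool:
    """Reject API calls from suspected scanner addresses.

    Page requests are still served, mirroring the partial mitigation above;
    blocking the front-end too would mean dropping the path check.
    """
    if not path.startswith(API_PREFIX):
        return False
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address never matches an IPv4 network here (membership is False).
    return any(addr in net for net in SCANNER_NETWORKS)


print(should_block("192.0.2.44", "/api/posts"))   # True
print(should_block("192.0.2.44", "/"))            # False
print(should_block("203.0.113.9", "/api/posts"))  # False
```

IP ranges go stale, so any such list needs periodic review against fresh logs.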

Here's a recent campaign comparison:

Metric	Raw metrics	Spurious removed
Sent	345	345
Opens	297 (86%)	107 (31%)
At least 1 click	26	10
Total clicks	165	33

Certainly far more believable!
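The detection rule behind these numbers isn't described. One plausible heuristic follows from the symptom noted at the start of this article: every link in an email "clicked" within the same second. The sketch below flags any burst in which one recipient hits at least `min_links` distinct URLs within `window_s` seconds. Isolated clicks from the same recipient are kept. The `Click` record, the thresholds and both function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    recipient: str
    url: str
    t: float  # seconds since the campaign was sent


def spurious_clicks(clicks, min_links=3, window_s=2.0):
    """Flag every click in a burst where one recipient 'clicks' at least
    min_links distinct URLs within window_s seconds."""
    by_recipient = defaultdict(list)
    for c in clicks:
        by_recipient[c.recipient].append(c)

    flagged = set()
    for cs in by_recipient.values():
        cs.sort(key=lambda c: c.t)
        start = 0
        for end in range(len(cs)):
            # Slide the window's left edge until it spans at most window_s.
            while cs[end].t - cs[start].t > window_s:
                start += 1
            burst = cs[start:end + 1]
            if len({c.url for c in burst}) >= min_links:
                flagged.update(burst)
    return flagged


def genuine_clicks(clicks, **kwargs):
    """Drop flagged clicks; isolated clicks from the same recipient survive."""
    flagged = spurious_clicks(clicks, **kwargs)
    return [c for c in clicks if c not in flagged]


# Recipient "a": five links hit within half a second, then one later click.
clicks = [Click("a", f"https://example.com/post-{i}", 3.0 + i / 10) for i in range(5)]
clicks += [Click("a", "https://example.com/post-1", 900.0),
           Click("b", "https://example.com/post-0", 120.0)]
print(len(genuine_clicks(clicks)))  # 2
```

The thresholds are a trade-off. A very fast human reader opening several tabs at once could be flagged, and a scanner that spaces out its fetches would slip through.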

But I'm not 100% sure that everything will work hunky-dory - treating MS like a bot may well break something else entirely. Perhaps the Microsoft email system will stop delivering emails to recipients altogether!

PS1: The correct method instead of Microsoft's shameful and FUBAR implementation

This could have been done elegantly by Microsoft. Once the 1st email has been received by the Microsoft system and its 15 links have been loaded and found to be non-malicious (or whatever floats their boat), the status of these 15 links should be cached. Then, when the same link is seen again (and again and again), there is no need to go through the entire rigmarole again.
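Here is a minimal in-memory sketch of that caching idea. It is an illustration, not how Safe Links is actually built. `VerdictCache`, `scan` and the one-hour TTL are hypothetical. The TTL is one simple way to bound how stale a cached verdict can get, since a page can change after it was first checked.

```python
import time
from typing import Callable


class VerdictCache:
    """Cache one scan verdict per URL for ttl_s seconds, so identical links
    in thousands of identical emails trigger one fetch, not thousands."""

    def __init__(self, scan: Callable[[str], bool], ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._scan = scan      # expensive check; True means "safe"
        self._ttl_s = ttl_s
        self._clock = clock    # injectable so expiry can be tested
        self._verdicts: dict[str, tuple[bool, float]] = {}

    def is_safe(self, url: str) -> bool:
        now = self._clock()
        cached = self._verdicts.get(url)
        if cached is not None and now - cached[1] < self._ttl_s:
            return cached[0]   # fresh verdict: no page load at all
        verdict = self._scan(url)  # runs only on a miss or after expiry
        self._verdicts[url] = (verdict, now)
        return verdict


scans = []

def fake_scan(url: str) -> bool:
    scans.append(url)  # stands in for a full browser load of the page
    return True

cache = VerdictCache(fake_scan)
links = [f"https://example.com/post-{i}" for i in range(15)]
for _ in range(1_000):      # 1,000 identical emails
    for link in links:
        cache.is_safe(link)
print(len(scans))  # 15 page loads instead of 15,000
```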

This is engineering 101. So fuck you, Microsoft. Is it any wonder that companies like Microsoft are so roundly despised?

PS2: Email "open" metrics are the worst

The only silver lining to Microsoft's idiotic implementation is that the fake clicks come only from organizations that 1. use Microsoft 365 as their email system, AND 2. have turned on Office 365 Advanced Threat Protection (ATP) Safe Links. Of course, that's a LOT of organizations, but individual users (eg outlook.com) are not affected.

But email "open" metrics are another story, where Microsoft and Google are equal-opportunity abusers. Microsoft and Google bots sometimes "open" mails, and this again leads to misleading email marketing metrics.

As a digression, email newsletter platforms like beehiiv, Substack etc all know about this issue and choose to look the other way. Obviously! If newsletter platforms showed that your REAL open rate is low, you would be discouraged and stop using this channel - which means a loss of business & revenue for these platforms.

Email sending platforms like Mailchimp, Brevo and ActiveCampaign / Postmark are somewhat better. Depending on the plan, they may allow one to filter out bot traffic.

A few years ago, I had a discussion with Postmark support about this issue while I was evaluating them:

Q: One of the things I've found is that any email sent to Gmail instantly triggers an open when they do whatever jugglery they do with images. However, this is a spurious open, since it hasn't actually been opened by a hooman. Q1 - do you take care of this? Q2 - how then do you distinguish between this spurious open and a real open if it happens nearly simultaneously?

Response from Postmark:

Hi Amit,

Thanks for getting in touch!

As you correctly noted, these are false opens. After Gmail/G Suite accepts a message for delivery, they're doing something in their filtering process which triggers our tracking pixel to register the email as opened. Most likely they're either scanning the message for spam filtering or they're uploading our tracking pixel to their CDN for faster email loading.

The challenge is that the list of IPs that could cause this changes over time. Because of this, we don't have something in place to tackle this right now, though we have a feature request on our end to try and detect this behavior.

I'll also mention filtering true opens is hard, because opens as-a-whole is sometimes an unreliable metric. We already know that there are tons of scenarios where opens happen that can't be detected, and there are plenty of situations even outside of Gmail where bots, spam filters, apps, etc. fetch open tracking images in a way we can't easily detect. That means it's already expected you'll need to take engagement data with a grain of salt.

If you're concerned about behaviour like this, the best defence I know against accidentally counting them is to consider excluding opens that occur immediately after sending.

If you have any other questions around this, please let me know!
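Postmark's suggestion of excluding opens that occur immediately after sending is easy to apply to your own tracking data. This minimal sketch assumes you have a per-recipient send time and timestamped open events. The function name and the 60-second threshold are hypothetical choices, not figures from Postmark.

```python
from datetime import datetime, timedelta


def filter_early_opens(sent_at, opens, min_delay=timedelta(seconds=60)):
    """Drop opens recorded within min_delay of the message being sent.

    sent_at maps recipient -> send time; opens is a list of
    (recipient, open_time) tuples. The 60-second default is an assumed
    threshold.
    """
    return [(r, t) for r, t in opens if t - sent_at[r] >= min_delay]


t0 = datetime(2024, 12, 1, 9, 0, 0)
sent_at = {"a@example.com": t0, "b@example.com": t0}
opens = [
    ("a@example.com", t0 + timedelta(seconds=2)),  # near-instant: treated as a machine fetch
    ("b@example.com", t0 + timedelta(seconds=3)),  # near-instant: treated as a machine fetch
    ("b@example.com", t0 + timedelta(hours=2)),    # later: kept as a plausible human open
]
kept = filter_early_opens(sent_at, opens)
print(len(kept), len({r for r, _ in kept}))  # 1 1
```

As Postmark notes, this is imperfect in both directions. A human who opens within the first minute is dropped, and bot fetches that arrive later still count.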