
Beating Spam Detection Bots (Again)

December 5, 2021

Written by Shil Sinha


When we last blogged about spam detection bots four years ago, we described how we reduced the number of false meeting confirmations our customers received. Our solution then was to point the links in our embedded email applications at an API that serves a static HTML page. That page runs JavaScript, which in turn makes the request to the actual email app API.
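
The 2017 approach can be sketched roughly as follows. This is a minimal illustration, not our production code: the function names are hypothetical, and the choice of a POST via fetch is an assumption (the original post only says the page's JavaScript makes the request). The point is that a scanner which fetches the link without executing JavaScript never reaches the real API.

```typescript
// Escape a string so it can be embedded safely inside an inline <script> block:
// JSON.stringify produces a valid JS string literal, and escaping < and >
// prevents a value such as "</script>" from closing the element early.
function toScriptLiteral(value: string): string {
  return JSON.stringify(value).replace(/</g, "\\u003c").replace(/>/g, "\\u003e");
}

// Hypothetical interstitial page: the email link points here, and only a client
// that executes JavaScript goes on to call the real email app API.
function renderInterstitialPage(apiUrl: string): string {
  return `<!DOCTYPE html>
<html>
  <body>
    <noscript>Please enable JavaScript to record your response.</noscript>
    <script>
      fetch(${toScriptLiteral(apiUrl)}, { method: "POST" }).then(function (res) {
        document.body.textContent = res.ok ? "Response recorded." : "Something went wrong.";
      });
    </script>
  </body>
</html>`;
}
```

As the rest of this post explains, the weakness is exactly the assumption baked into this design: a bot that runs JavaScript executes the fetch just like a human's browser would.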

And yet in 2021, spam detection bots continue to affect one of our core business services. To recap the issue: Mixmax lets users embed interactive applications such as polls, surveys, and calendars in emails. These applications include either a <form> element or buttons whose links encode a particular response and point to Mixmax APIs. Because these URLs are present in the email’s body, they’re often visited by email scanning software, and unless the scanner is identified as a bot, its visit is recorded as a response the recipient never actually gave.
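
To make the problem concrete, here is a hypothetical sketch of a button rendered as a plain link whose query string encodes the recipient's choice. The base URL, path, and `choice` parameter are illustrative assumptions, not Mixmax's real API; what matters is that the response lives entirely in the URL, so any client that issues a GET for it, human or scanner, effectively "clicks" the button.

```typescript
// Hypothetical: build the link behind one button of an embedded poll.
// The path and parameter names are illustrative only.
function buildResponseLink(baseUrl: string, appId: string, choice: string): string {
  const url = new URL(`/apps/${encodeURIComponent(appId)}/respond`, baseUrl);
  url.searchParams.set("choice", choice);
  return url.toString();
}

// What a naive handler would do on a GET: read the choice straight from the URL.
// It cannot tell a recipient's click apart from a scanner prefetching the link.
function choiceFromLink(link: string): string | null {
  return new URL(link).searchParams.get("choice");
}
```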

We knew back in 2017 that our solution wasn’t perfect. Recently, though, we’ve seen a surge in reports of false meeting confirmations, indicating that some bots do load HTML and run JavaScript (and therefore get around our 2017 solution). We needed a more robust way to ignore bot traffic. In this fifth post of our Mixmax Advent series, we’ll share our newest idea for dealing with spam detection bots.

A New Approach

To get a better sense of the traffic we’d need to filter out, we took a closer look at requests that customer reports told us were bot traffic. It turned out that the vast majority of them originated from EC2 instances (the call was coming from inside the house!).
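
The post doesn't say exactly how the EC2 origin was established. One way to check a finding like this is against the IP ranges AWS publishes in `ip-ranges.json`, whose `prefixes` entries carry `ip_prefix` and `service` fields. The sketch below assumes that format, handles IPv4 only, and uses made-up prefixes from the documentation address ranges in its test, not real AWS ranges.

```typescript
// Subset of an IPv4 entry in AWS's published ip-ranges.json.
interface AwsIpPrefix {
  ip_prefix: string; // e.g. "198.51.100.0/24"
  service: string; // e.g. "EC2" or "AMAZON"
}

// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// True if `ip` falls inside the CIDR block, e.g. inCidr("10.1.2.3", "10.0.0.0/8").
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`invalid CIDR: ${cidr}`);
  }
  // JavaScript masks shift counts to 5 bits, so handle /0 explicitly.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// True if the source address belongs to any prefix AWS labels as EC2.
function isEc2Address(ip: string, prefixes: AwsIpPrefix[]): boolean {
  return prefixes.some((p) => p.service === "EC2" && inCidr(ip, p.ip_prefix));
}
```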

A common strategy for dealing with bot traffic is to place a WAF (web application firewall) in front of the load balancers serving an API to filter out unwanted requests. Conveniently for us, Amazon’s WAF service provides a rule that matches traffic originating from cloud hosting providers, including EC2 instances. Inconveniently for us, the APIs we were trying to protect were also called by our other services, which run inside AWS themselves, so enabling that WAF rule would have blocked our internal requests as well.

To get around this, we added a second load balancer to serve all requests originating from within our VPC. We could then enable the WAF rule for only the external-facing load balancer, leaving internal traffic unaffected. 
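
The logic of the split can be captured in a toy model. This is not AWS or Terraform configuration, and every name in it is hypothetical; it only shows why attaching the hosting-provider rule to the external load balancer alone leaves internal callers untouched even though they also originate from AWS addresses.

```typescript
// Toy model of the two-load-balancer split (not real AWS configuration).
type LoadBalancer = "external" | "internal";

// The hosting-provider WAF rule is attached only to the external load balancer.
const wafRuleEnabledOn: ReadonlySet<LoadBalancer> = new Set<LoadBalancer>(["external"]);

// Callers inside our VPC are routed to the internal load balancer;
// everyone else (recipients and email scanners alike) reaches the external one.
function loadBalancerFor(callerInVpc: boolean): LoadBalancer {
  return callerInVpc ? "internal" : "external";
}

// A request is blocked only if it passes through a load balancer with the rule
// attached AND matches the hosting-provider rule (e.g. it comes from EC2).
function isBlocked(callerInVpc: boolean, fromHostingProvider: boolean): boolean {
  return wafRuleEnabledOn.has(loadBalancerFor(callerInVpc)) && fromHostingProvider;
}
```

An internal service running on EC2 (`isBlocked(true, true)`) gets through, while an external scanner on EC2 (`isBlocked(false, true)`) does not.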

After a few infrastructure changes in Terraform and some ECS service redeploys, we were ready to enable the WAF on our newly configured external-facing load balancer. At first the approach appeared to work, until we started getting reports of recipients who were unable to respond through emailapps.mixmax.com. We disabled the WAF and went back to the drawing board. Again.

Next Steps

Another common strategy for blocking unwanted traffic is a CAPTCHA. Unfortunately, CAPTCHAs are often as frustrating for humans as they are for robots, and one would certainly hurt our customers’ calendaring experience.

For the last time, I’M NOT A ROBOT

Alternatively, we considered a confirmation page with a button that makes an AJAX request to the email app’s API when clicked. This wouldn’t block bot traffic as effectively as a CAPTCHA, but it would be easier to implement (and less annoying for our human users). Still, a confirmation page would add significant friction to what is otherwise a one-click process for booking a meeting, especially since most requests don’t appear to come from bots. Instead, we decided that our ideal solution should first detect probable bot traffic and serve the confirmation page only in those cases.
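
A confirmation page of this kind might look like the following minimal sketch. The function name, markup, and the use of `fetch` with POST as the AJAX call are assumptions for illustration. The contrast with the 2017 page is that nothing is sent on load: the request to the email app API is only issued from the button's click handler.

```typescript
// Hypothetical confirmation page for suspected bot traffic. Nothing runs on
// load; the email app API is only called after the button is clicked.
function renderConfirmationPage(apiUrl: string): string {
  // Embed the URL as an escaped JS string literal so "</script>" cannot break out.
  const target = JSON.stringify(apiUrl).replace(/</g, "\\u003c").replace(/>/g, "\\u003e");
  return `<!DOCTYPE html>
<html>
  <body>
    <p>Please confirm your response.</p>
    <button id="confirm" type="button">Confirm</button>
    <script>
      document.getElementById("confirm").addEventListener("click", function () {
        this.disabled = true; // avoid duplicate submissions
        fetch(${target}, { method: "POST" }).then(function (res) {
          document.body.textContent = res.ok ? "Response recorded." : "Something went wrong.";
        });
      });
    </script>
  </body>
</html>`;
}
```

A bot that merely loads the page and runs its scripts registers the click handler but never fires it, which is what makes this sturdier than the 2017 interstitial.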

Fortunately, WAFs can do more than block requests that match a given set of conditions: they can also modify a request before passing it along to the load balancer. In our case, this means we can configure the WAF to add a header to every request suspected of coming from a bot, and update our APIs to return the confirmation page only for requests that carry that header.
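
On the API side, the check could be as small as the sketch below. The header name `bot-suspect` is a hypothetical choice; AWS WAF prefixes the names of custom request headers it inserts with `x-amzn-waf-`, so the application would see `x-amzn-waf-bot-suspect`. The headers type mirrors the shape of Node's `IncomingHttpHeaders`, and the returned actions are illustrative labels rather than our actual handler code.

```typescript
// Hypothetical header name. AWS WAF prefixes custom request header names with
// "x-amzn-waf-", so a header configured as "bot-suspect" arrives like this.
const BOT_SUSPECT_HEADER = "x-amzn-waf-bot-suspect";

// Same shape as Node's IncomingHttpHeaders.
type RequestHeaders = Record<string, string | string[] | undefined>;

type EmailAppAction =
  | { kind: "record-response" } // normal one-click flow
  | { kind: "serve-confirmation-page" }; // suspected bot: require an explicit click

function decideAction(headers: RequestHeaders): EmailAppAction {
  // HTTP header names are case-insensitive, so compare in lowercase.
  const flagged = Object.entries(headers).some(
    ([name, value]) => name.toLowerCase() === BOT_SUSPECT_HEADER && value !== undefined,
  );
  return flagged ? { kind: "serve-confirmation-page" } : { kind: "record-response" };
}
```

Because the WAF only annotates the request instead of blocking it, a false positive costs a legitimate recipient one extra click rather than leaving them unable to respond.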

While we have yet to deploy these changes to production, we’re optimistic about the approach. Leaving the final decision about suspected bot requests to our own APIs gives us more flexibility than relying entirely on a WAF, and we can iterate on the confirmation page to make it easier for people and harder for robots. (That said, check back in with us in four years.)

Interested in working on problems like these? Check out our careers page!
