Protecting your Rails App from Spam & Bots

Written by Ari Summer · Last updated July 16, 2023

These days, it’s common for SaaS products to offer free trials or free tiers without requiring a credit card up front. This can be great for conversions and low-touch sales, but comes with the cost of increased exposure to spam, unwanted signups, and platform abuse.

There’s no one-size-fits-all solution for fighting spam, and the details will often depend on your application and features. However, over the years I’ve learned that there are some general strategies you can use to reduce your exposure. Spammers will go after easy targets and the harder you make it for them, the better.

To prevent spam effectively, you’ll need:

  1. Observability through monitoring
  2. The ability to block spam manually and/or automatically

Preventing spam can be an evolving game of whack-a-mole: what you learn from observing will drive what you block, how you block it, and what you monitor next.

[Figure: Fighting Spam Flow Diagram]

Monitoring & Observing

The first step in preventing spam is identifying spam. Not all spam is immediately evident and having the systems in place to monitor behavior and identify patterns will make it much easier to react when it happens. Let’s start with logging!

Logging

Logs are often an integral part of an observable system, and it’s no different here. You’ll want to add metadata, such as IP address, geographic location (if possible), user identifier (if the user is signed in), user agent, and referrer, to every log statement. Paired with the ability to filter and query logs, this lets you investigate behavior based on these attributes. For example, you could answer the following questions:

  • Are spammers originating from certain geographic locations?
  • Is spam coming from the same IP addresses?
  • Is spam coming from a specific referrer? Maybe there’s an ad campaign that is driving spam traffic to the site.
  • What actions do spammers take? Are there actions that stand out compared to normal users?
  • Are there commonalities in request headers, such as the user agent?
  • If a form submission is involved, how quickly do they submit the form after first visiting the page? If it’s very fast, it could be a bot.
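To make questions like these answerable, each log line needs to carry the request metadata in a form you can filter on. Here is a minimal sketch in plain Ruby, assuming a Rack-style `env` hash and writing one JSON object per line. The helper names (`request_log_context`, `build_json_logger`, `log_event`) are hypothetical, and in a real Rails app you would more likely lean on `config.log_tags` or a structured logging gem.

```ruby
require "json"
require "logger"
require "time"

# Builds the metadata attached to every log line.
# `env` is a Rack-style environment hash; `user_id` is nil for anonymous visitors.
def request_log_context(env, user_id: nil)
  {
    ip: env["REMOTE_ADDR"],
    user_agent: env["HTTP_USER_AGENT"],
    referrer: env["HTTP_REFERER"],
    user_id: user_id
  }
end

# Returns a Logger that writes one JSON object per line so that
# log tooling can filter and query on individual fields.
def build_json_logger(io)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, msg|
    payload = msg.is_a?(Hash) ? msg : { message: msg.to_s }
    JSON.generate({ level: severity, time: time.utc.iso8601 }.merge(payload)) + "\n"
  end
  logger
end

# Logs a named event together with the request metadata.
def log_event(logger, event, env, user_id: nil)
  logger.info({ event: event }.merge(request_log_context(env, user_id: user_id)))
end
```

Calling `log_event(logger, "signup.attempt", request.env)` from a controller would then produce a line you can query by `ip`, `referrer`, or `user_agent`. Geographic location is left out here because it depends on whichever IP lookup service you use.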

Custom Metrics

The ability to track and record custom metrics or events can be really helpful. You can use something like CloudWatch metric filters to turn log data into metrics, or StatsD to track arbitrary statistics. As an example, if you’re using rack-attack to block or throttle requests, you can subscribe to its events and monitor them by writing to a log that has metric filters attached. The same goes for other preventative measures: you should log and record each event.

Once you have the metrics in your monitoring system, you can create graphs, dashboards, and alerts. With graphs and dashboards in place, you can quickly see if something deserves your attention. For example, let’s hypothetically say you’re throttling signup form submissions based on IP address, graphing each throttle event, and the graph looks like this:

[Figure: Signup Form Throttle Events]

We can immediately see that at the top of every hour, we’re throttling 100 signup events. In this hypothetical example, we could have a bot that’s attempting to create a bunch of users via our signup form at the top of every hour. Now that we are aware, we can inspect our logs to see if we can gather any information about the requests being made. Using this information we can add more protection or adjust our throttling limits. For example, if the same IP address is being used, we could block that IP with WAF rules.

Alerts

Obviously, you won’t be able to stare at dashboards and graphs all day. Setting up alerts so that you’re notified (via Slack, email, text message, etc.) of suspicious or abnormal activity will allow you to act quickly before the problem gets worse. Using the signup form example from above, you could create an alert that gets triggered when there are more than N throttle events in a given time window.
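In practice your monitoring system evaluates the alert condition for you, but the logic is simple enough to sketch. The example below assumes you have the timestamps of throttle events on hand, groups them into hourly buckets, and reports the hours that exceed a threshold. The function names and the one-hour window are assumptions for illustration.

```ruby
SECONDS_PER_HOUR = 3600

# Groups throttle event timestamps into hourly buckets.
# Returns a hash of { start_of_hour (UTC Time) => number_of_events }.
def throttle_counts_by_hour(timestamps)
  timestamps
    .map { |time| time.to_i / SECONDS_PER_HOUR * SECONDS_PER_HOUR }
    .tally
    .transform_keys { |epoch| Time.at(epoch).utc }
end

# Returns the hours in which more than `threshold` throttle events occurred.
def hours_over_threshold(timestamps, threshold:)
  throttle_counts_by_hour(timestamps)
    .select { |_hour, count| count > threshold }
    .keys
end
```

Any hour returned by `hours_over_threshold` is a candidate for a notification. The right threshold depends on your normal traffic, so it is worth starting high and lowering it as you learn what a typical day looks like.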

Data Analysis

Whether it’s an admin dashboard or SQL client, you’ll want the ability to analyze user-generated content in your system. Spam from user-generated content often has patterns such as the same link or text being used over and over again. Armed with this knowledge, you can suspend users, remove the harmful content, and put some systems (blocking, monitoring, etc.) in place to prevent it from happening again.
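As one example of this kind of analysis, the sketch below finds links that several different users have posted. It assumes posts are available as hashes with `:user_id` and `:body` keys (hypothetical names), and it uses a deliberately simple URL pattern rather than a full URL grammar.

```ruby
# Matches http(s) URLs. Deliberately simple; it may include trailing punctuation.
URL_PATTERN = %r{https?://[^\s<>"']+}i

# Returns { link => number_of_distinct_users } for every link posted by at
# least `min_users` different users, most widely shared first.
def repeated_links(posts, min_users: 3)
  users_by_link = Hash.new { |hash, link| hash[link] = [] }

  posts.each do |post|
    post[:body].scan(URL_PATTERN).uniq.each do |link|
      users_by_link[link] << post[:user_id]
    end
  end

  users_by_link
    .transform_values { |user_ids| user_ids.uniq.size }
    .select { |_link, count| count >= min_users }
    .sort_by { |_link, count| -count }
    .to_h
end
```

A link that shows up across many unrelated accounts is a strong hint that those accounts belong to the same spammer, and it gives you a concrete string to search for when cleaning up.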

Blocking Spam

As you learn and gain more insight over time, your methods for blocking spam and abusive behavior will evolve. However, there are some standard techniques that will come in handy.

Suspending Users

If you have spam users or bots signing up for user accounts, you’ll want the ability to block account access. In order to achieve this, I’d suggest adding the ability to “suspend” accounts such that they still exist in your system/database but are inaccessible (i.e. can’t be used to access the site or sign in while they are in a suspended state).

There are some advantages to this approach over hard deleting accounts. First, it’s less destructive and if you make a mistake by suspending a legitimate user, you can always unsuspend them, allowing them to regain access to their account and data. Second, it’s common for email addresses to be used as unique usernames. If you delete an account, the email for that account could be reused to create another account. By suspending an account, you’re preventing them from reusing that email address to create another account. Email addresses are relatively cheap, but you’re still making it harder for spammers, because you’re requiring that they use a new email address every time an account gets suspended.
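The behavior described above can be sketched with an in-memory stand-in for a users table. This assumes a nullable `suspended_at` attribute, and the `User` struct and `UserDirectory` class are hypothetical names. In a Rails app the same idea would be a column on the users table plus a check in your authentication flow.

```ruby
# Hypothetical stand-in for a user record with a nullable `suspended_at` column.
User = Struct.new(:email, :suspended_at) do
  def suspended?
    !suspended_at.nil?
  end

  def suspend!(now: Time.now)
    self.suspended_at = now
  end

  # Reversible: the account and its data were never deleted.
  def unsuspend!
    self.suspended_at = nil
  end
end

# Hypothetical in-memory user store keyed by normalized email address.
class UserDirectory
  def initialize
    @users = {}
  end

  # Returns the new user, or nil if the email is already taken.
  # Suspended accounts still occupy their email address.
  def sign_up(email)
    key = email.strip.downcase
    return nil if @users.key?(key)

    @users[key] = User.new(key, nil)
  end

  # Returns the user only if the account exists and is not suspended.
  def authenticate(email)
    user = @users[email.strip.downcase]
    user unless user.nil? || user.suspended?
  end
end
```

Because the suspended record stays in the store, `sign_up` keeps rejecting its email address, which is the second advantage mentioned above.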

Email Domain Validation

Most SaaS businesses require an email to sign up. In this case, you can run some validations on email domains to help prevent spam. Spam users or bots will often use disposable email addresses because they’re easy and cheap to create. It can be a good idea to block disposable email domains. There are public lists and services out there that will help you identify disposable email domains.

Another useful strategy is validating that the domain has a valid MX record. An MX record shows that someone has taken the time to configure email for the domain. You’re also improving your email deliverability, because mail sent to a domain that isn’t set up to receive email will usually bounce.
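Both domain checks can be sketched with Ruby’s standard library. The example assumes a tiny hard-coded blocklist (`DISPOSABLE_DOMAINS`, with two sample entries standing in for a real public list) and uses `Resolv::DNS` for the MX lookup. The helper names are hypothetical, and the resolver is passed in so that it can be swapped out in tests.

```ruby
require "resolv"
require "set"

# Sample entries only; in practice, load one of the public disposable-domain lists.
DISPOSABLE_DOMAINS = Set.new(%w[mailinator.com guerrillamail.com]).freeze

# Returns the lowercased domain part of an email address, or nil if there is none.
def email_domain(email)
  domain = email.to_s.strip.downcase.split("@", 2)[1]
  domain unless domain.nil? || domain.empty?
end

def disposable_domain?(domain)
  DISPOSABLE_DOMAINS.include?(domain)
end

# Returns true when the domain publishes at least one MX record.
def mx_record?(domain, resolver: Resolv::DNS.new)
  !resolver.getresources(domain, Resolv::DNS::Resource::IN::MX).empty?
rescue Resolv::ResolvError
  false
end

# Combines both checks: not disposable, and able to receive email.
def acceptable_signup_email?(email, resolver: Resolv::DNS.new)
  domain = email_domain(email)
  return false if domain.nil? || disposable_domain?(domain)

  mx_record?(domain, resolver: resolver)
end
```

DNS lookups add latency and can fail for reasons unrelated to spam, so you may want to cache results or treat a lookup failure differently from a confirmed missing record.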

Along the same lines, you can require users to confirm their account via an email before they’re able to access the site. The downside of this approach is that it adds some friction to the signup process for legitimate users and therefore needs to be considered carefully.

Honeypots

Honeypots can be an effective way to protect against spambots. Honeypots consist of a hidden field in your forms that won’t be apparent to normal users. Bots, however, have difficulty distinguishing between the hidden, honeypot fields and normal fields. If the hidden field is filled out, it’s likely a bot and you can reject the submission. When a submission is rejected, it’s best to return a successful status code to trick the bot into thinking the submission was successful. If you’re using Ruby on Rails, the invisible_captcha gem makes it easy to add honeypots to your forms. It also gives you the ability to time form submissions which can be another great way to identify bots (see next section).
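If you want to see the mechanics without a gem, here is a minimal hand-rolled sketch. It assumes a hidden form field named `nickname` (a hypothetical name) and a `store` array standing in for your database. It is not how invisible_captcha is implemented, just the same idea in a few lines.

```ruby
# Hypothetical honeypot field name. The field is rendered in the form
# but hidden from humans with CSS, so real users leave it blank.
HONEYPOT_FIELD = "nickname"

SUCCESS_RESPONSE = [200, "Thanks for signing up!"].freeze

# True when the hidden field contains anything other than whitespace.
def honeypot_tripped?(params)
  !params.fetch(HONEYPOT_FIELD, "").to_s.strip.empty?
end

# Returns [status, body]. Bots get the same successful response as real
# users, but nothing is written to `store`.
def handle_signup(params, store)
  return SUCCESS_RESPONSE if honeypot_tripped?(params)

  store << params["email"]
  SUCCESS_RESPONSE
end
```

Returning the same response in both branches means a bot cannot tell from the status code or the body that it was filtered out.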

Time-Sensitive Form Submissions

Bots will often submit a form faster than is humanly possible. Using this to your advantage, you can time form submissions and reject the submission if it’s faster than some time threshold (like 4 seconds). A simple implementation of this could involve storing the current timestamp in the user’s session on page load. When a form is submitted, you can compute the difference between the current time and the timestamp stored in the session to determine how long it took to submit the form. If the time difference is less than the time you’d expect a human to need to fill out the form, the submission can be rejected.
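The session-based approach described above can be sketched as two small helpers. The `session` here is any hash-like object (a plain Hash in this sketch, the Rails session in a real controller), and the helper names and the `:form_rendered_at` key are hypothetical.

```ruby
# Threshold from the text above; tune it per form.
MIN_FILL_SECONDS = 4

# Call when the form page is rendered.
def stamp_form_render(session, now: Time.now)
  session[:form_rendered_at] = now.to_f
end

# Call when the form is submitted. Returns true if the submission should be
# rejected. A missing timestamp means the form page was never loaded in this
# session, which is treated as suspicious.
def submitted_too_fast?(session, now: Time.now, min_seconds: MIN_FILL_SECONDS)
  rendered_at = session.delete(:form_rendered_at)
  return true if rendered_at.nil?

  (now.to_f - rendered_at) < min_seconds
end
```

The timestamp is deleted on read so that a single page load cannot be replayed for many submissions. Whether to treat a missing timestamp as a rejection is a judgment call, since expired sessions will hit that branch too.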

Throttling & Limiting Email Sending Actions

Invites, notifications, password resets, email confirmations: all of these things typically involve sending email. Actions that result in sending email are prime targets for spammers. A malicious actor can abuse these forms to send large volumes of email on your dime. If you’re including user-generated content in the emails, they can also use this to their advantage to spread spammy content via your service, tarnishing your domain reputation and deliverability. With this in mind, it’s often a good idea to add throttles or limits to some forms.

As an example, it’s common to provide a way for users to reset forgotten passwords. This often involves a publicly available form that accepts an email address. If the submitted email address exists in the system, an email is sent to the address with a reset password link. Without throttles or limits in place, this could be abused by malicious actors to send large amounts of email. You can see an example of throttling being done to prevent this scenario in the GitLab source code.
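To illustrate the idea (this is not GitLab’s implementation), here is a minimal fixed-window throttle kept in memory. The class name and the limits are assumptions. A real application would keep the counters in a shared store such as Redis, or use a gem like rack-attack, so that limits hold across processes.

```ruby
# Allows at most `limit` actions per `period` seconds for each key.
# Counters live in memory, so this only works within a single process.
class FixedWindowThrottle
  def initialize(limit:, period:)
    @limit = limit
    @period = period
    @counts = Hash.new(0)
  end

  # Records one attempt for `key` and returns true if it is within the limit.
  def allow?(key, now: Time.now)
    window = now.to_i / @period
    # Drop counters that belong to earlier windows.
    @counts.delete_if { |(_key, counted_window), _count| counted_window < window }
    @counts[[key, window]] += 1
    @counts[[key, window]] <= @limit
  end
end
```

For a password reset form you might key the throttle on the submitted email address, the requesting IP address, or both, and only send the reset email when `allow?` returns true.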

Most of the time, spammers are trying to use your service as a vehicle for distributing spam, often in the form of links to malicious or abusive content. Preventing links in user-generated content that shouldn’t contain links (username, first name, last name, etc.) will go a long way. This is especially true if you’re sending emails with this content or displaying it publicly. For example, it’s common to use first names in email salutations, making it easy to exploit.

Hi #{user.first_name},
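A check for links in name-like fields could look something like the sketch below. The pattern is a loose heuristic and an assumption on my part: it flags URL schemes, a leading `www.`, and anything shaped like a bare domain, so it can produce false positives on unusual names. `contains_link?` is a hypothetical helper.

```ruby
# Heuristic: a URL scheme, "www.", or something shaped like a bare domain
# (letters, digits, or hyphens, then a dot, then at least two letters).
LINK_PATTERN = %r{https?://|www\.|\b[a-z0-9-]+\.[a-z]{2,}\b}i

# True when the value looks like it contains a link.
def contains_link?(value)
  LINK_PATTERN.match?(value.to_s)
end
```

In a Rails model, the same pattern could be used with a format validation, for example `validates :first_name, format: { without: LINK_PATTERN }`.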

reCAPTCHA

I personally don’t have a ton of experience with this solution, but there are a lot of sites that use Google reCAPTCHA to identify and block bots. reCAPTCHA v3 provides a mechanism for scoring interactions on your website. The score represents how likely it is that the traffic originates from an actual human. If interactions have a low score (likely a bot), you can decide to take some action to thwart the bot’s efforts.
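As a rough sketch of the server side, the example below posts the token from the browser to Google’s `siteverify` endpoint and then applies a score threshold. The helper names, the 0.5 cutoff, and the check on the `action` field are assumptions to adapt, not a recommended configuration.

```ruby
require "json"
require "net/http"
require "uri"

SITEVERIFY_URL = URI("https://www.google.com/recaptcha/api/siteverify")

# Sends the client-side token to Google for verification and returns the
# parsed JSON response. `secret` is your reCAPTCHA secret key.
def fetch_recaptcha_result(token, secret:)
  response = Net::HTTP.post_form(SITEVERIFY_URL, "secret" => secret, "response" => token)
  JSON.parse(response.body)
end

# Decides whether a verification result looks human enough to proceed.
# `min_score` is an assumed starting point; tune it against your own traffic.
def likely_human?(result, expected_action:, min_score: 0.5)
  result["success"] == true &&
    result["action"] == expected_action &&
    result["score"].to_f >= min_score
end
```

A low score does not have to mean an outright rejection. You could instead require email confirmation or queue the signup for manual review.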

reCAPTCHA v2 provides the challenges that you’ll often see on websites, consisting of “I’m not a robot” checkboxes and “Select all images with…” puzzles. As I think we’ve all experienced, these puzzles can be quite frustrating for real users, so you’ll want to weigh this against the potential benefits a strategy like this may provide.

Web Application Firewall (WAF)

WAFs, such as Cloudflare WAF or AWS WAF, give you the ability to prevent traffic from reaching your site based on rules that you define. Rules can match on attributes of the request, such as IP address, user agent, or geolocation. WAF providers will often have managed rulesets that you can use to protect against common vulnerabilities, spambots, and other unwanted traffic. WAFs are an excellent protection mechanism and allow you to act quickly against attacks.

Wrapping Up

I hope some of these techniques are helpful in your fight against spam. If you’ve also dealt with spam, I’d love to hear from you and learn about the strategies you’ve used!