Easy way to block some bots

If you run a website, you will sooner or later notice that quite a few visits are made by bots. Some of them are "polite" enough to respect your robots.txt, but some are not. In the latter category you will most often see vulnerability scanners, odd indexers and SEO spam-bots. Blocking by IP or user-agent name will work for some of them but not for others. For example, SEO spammers vary their user-agent strings and connect via Tor or proxy services, so those you may need to block by known "bad" referrers.
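For reference, the polite ones only need to be asked: a couple of lines in robots.txt will turn away a well-behaved crawler (MJ12bot here is just one example of a common SEO crawler):

User-agent: MJ12bot
Disallow: /

The impolite ones ignore this file entirely, which is why server-side checks are needed.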

For static pages it often doesn't really matter whether it's a bot accessing them or not, but what if you are concerned about bots reaching something you actually render with your code, and you don't want to spend any CPU cycles processing those requests in your application? In your script or application you can make a check before processing starts and return immediately if you believe the client is a bot. Some frameworks have an easy and clean way to make a check like this. For example, in Mojolicious you could use a hook without changing anything in the actual code of your controllers:

use strict;
use warnings;

use Mojolicious::Lite;

# The hook wraps every action; when the last action in the
# dispatch chain is about to run, is_bot() decides whether
# the request should proceed at all.
hook around_action => sub {
	my ($next, $c, $action, $last) = @_;
	return if ($last and is_bot($c));
	return $next->();
};

...

sub is_bot {
	my $c = shift;

	# Implement the checks here or call a specific module
	# to determine whether it's a bot. If it is not,
	# return false so the action runs normally.
	...

	# If it is a bot, render a response and return true.
	$c->render(text => 'Bots are not welcome.', status => 401);
	return 1;
}

app->start;
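To verify that the hook really short-circuits bot requests before any controller code runs, a small Test::Mojo script works well. This is only a sketch: it assumes the application above is saved as myapp.pl, that the test lives in a t/ subdirectory next to it (the layout from the Mojolicious tutorial), and that is_bot() has been implemented to flag the fake user-agent used here.

use Test::More;
use Test::Mojo;
use Mojo::File qw(curfile);

# Load the Lite application one directory up from this test file.
require(curfile->dirname->sibling('myapp.pl'));

my $t = Test::Mojo->new;

# A request that is_bot() flags should get the 401 response.
$t->get_ok('/' => {'User-Agent' => 'EvilBot/1.0'})->status_is(401);

done_testing();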

In your bot-checking code you could combine checks for specific user-agent strings, IPs or source domains (if you resolve the IP), spam-bot referrers, and so on. Additionally, you can check whether the request carries an Accept header at all: a lot of bots don't even bother setting one and rely only on a fake user-agent.
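As a minimal sketch of how the is_bot() placeholder above could combine such checks (the patterns below are illustrative only, not a curated blocklist; maintain your own lists):

sub is_bot {
	my $c = shift;
	my $headers = $c->req->headers;

	my $ua       = $headers->user_agent // '';
	my $referrer = $headers->referrer   // '';

	# Illustrative patterns; extend with what you see in your logs.
	my $bad_ua       = qr/nikto|sqlmap|python-requests/i;
	my $bad_referrer = qr/\b(?:semalt\.com|buttons-for-website\.com)\b/i;

	# The client IP is available as $c->tx->remote_address if you
	# also want to add IP or resolved-domain checks here.
	my $suspicious = !defined $headers->accept   # no Accept header at all
		|| $ua eq ''                             # empty user-agent
		|| $ua =~ $bad_ua
		|| $referrer =~ $bad_referrer;

	return 0 unless $suspicious;

	$c->render(text => 'Bots are not welcome.', status => 401);
	return 1;
}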
