A Comprehensive Analysis of Good Bots: How to Verify and Manage Them with an Automated Bot Traffic Policy
Managing Good Bots
Today, very few companies have an established policy for automated traffic; often they are not even aware of any issues surrounding bot traffic. Given that bots now account for around half of all internet traffic, it's hard to ignore the size of the problem and the potential implications of getting this wrong. Yet most people have no idea what the 'good' bots are doing on their site, or why they are targeting it.
What are Good Bots?
“Good bots” are best defined as the legitimate bots you actually want crawling your site. Major search engines are top of the list, and many webmasters and SEO specialists spend time ensuring the search bots crawl as often as possible to maximise search indexing. Many major platforms also rely on automated bots: for example, SEO tools for site audits, backlink analysis and anchor-link checks all work by crawling your site. Over the past few years, crawlers have also been harvesting web content to feed generative AI models. All of these bots are managed by a simple robots.txt file - what could possibly go wrong?
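To ground this, robots.txt is just a plain-text file of directives served from your site root, and its simplicity is exactly why it's so easy to over-trust. A minimal example (the paths and user agents here are purely illustrative):

```
# robots.txt, served from https://example.com/robots.txt
User-agent: *            # applies to every crawler
Disallow: /checkout/     # please stay out of these paths
Disallow: /search

User-agent: GPTBot       # OpenAI's training crawler
Disallow: /              # opt out of AI training crawls entirely

Sitemap: https://example.com/sitemap.xml
```

Note the word "please": every line in this file is a request, not a rule.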
When Good Bots go Bad
As it turns out, there are plenty of reasons why simply leaving the so-called good bots to crawl can be very damaging for your business. Here's a quick summary of the dark side of good bots:
- Fake Good Bots: Malicious bots impersonate the good bots to get whitelisted. Once whitelisted, they can run amok with impunity. It's a good strategy for the bots: no-one wants to block a legitimate search engine, and search engines are expected to crawl extensively. Once whitelisted, the entry is rarely looked at again (see the verification sketch after this list).
- Bad Bots Don't Respect Robots.txt: Robots.txt relies on the good behaviour of bots; it doesn't enforce anything. Bad bots simply ignore its instructions.
- Disallow Rules Act as a Signpost to Sensitive Data: Managing critical paths in robots.txt is like leaving your house keys in the flowerpot next to the door: sooner or later you will get unwelcome visits. Protecting sensitive paths is critical to maintaining your site security. Just as every house has a weak access point, every site has weak endpoints too. Adding Disallow rules for those paths to robots.txt effectively signposts to the bots exactly where they should be crawling. Don't do it.
- Crawler Bots Feeding Generative AI Models: At VerifiedVisitors we've been warning of the dangers of leaving your site exposed to the crawler bots. Crawlers have been feeding AI models with personal data and website content for years, so the news that OpenAI is facing a class-action lawsuit comes as no surprise. The entire area is fraught with privacy and legal issues around IP and content, and it's already producing some massive lawsuits.
- Infrastructure Bots: On the illegitimate side, hackers use infrastructure bots to target known vulnerabilities. These bots appear as ordinary crawlers but are programmed to quickly and easily find compromised software versions and weak tech stacks across the web, which can then be exploited.
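The fake-bot problem is worth dwelling on, because the major search engines publish a way to solve it: a reverse DNS lookup on the visiting IP, followed by a forward lookup to confirm the hostname resolves back to the same address. Here is a minimal Python sketch of that check; the domain suffixes follow Google's published guidance for verifying Googlebot, and the sample IP is illustrative:

```python
import socket

def verify_search_bot(ip: str, suffixes=(".googlebot.com", ".google.com")) -> bool:
    """Reverse-then-forward DNS check for a claimed search engine crawler."""
    try:
        # 1. Reverse DNS: what hostname does this IP claim to be?
        host, _, _ = socket.gethostbyaddr(ip)
        # 2. The hostname must sit under the engine's real domain.
        if not host.endswith(suffixes):
            return False
        # 3. Forward DNS: the hostname must resolve back to the same IP,
        #    otherwise the PTR record could simply be spoofed.
        return ip in socket.gethostbyname_ex(host)[2]
    except (socket.herror, socket.gaierror):
        return False  # no PTR record or lookup failure: treat as unverified

# A request with "Googlebot" in its User-Agent that fails this check is a
# fake good bot and safe to block. (Example IP is illustrative.)
print(verify_search_bot("66.249.66.1"))
```

Doing these lookups on every request is expensive, which is one reason most sites never bother; a verification service can cache the results across its whole network.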
Taking Control over the Good Bots
VerifiedVisitors allows you to take control over all bot traffic and decide exactly who gets access to your valuable data. VerifiedVisitors provides the discovery to show you exactly which bots are hitting your site and, more importantly, why. We examine your existing robots.txt file and match it against our bot database to automate the entire process, so you don't have to dig into web logs. Once a business understands how its own data is being used, it is often absolutely not OK with it and actively wants to block access.
Once you know the nature of the automated traffic, we provide a recommendation engine that guides you through the best policy per class of bot. You decide on your security policies once.
VerifiedVisitors allows you to define a security policy that gets applied according to the actual threats and risks on each of the sites the policy is tied to. As the risk changes, the policy adapts to cover it. This is all automated for you: set the policy once in the command and control console, and VerifiedVisitors does the rest.
VerifiedVisitors has 42 categories of bots in our recommendation engine, so you can manage your good bots just as you manage your human visitors today. We verify and authenticate the bots, match them against our database, and then recommend whether to allow them.
Now that you have one clear set of policies, it's much easier to manage your security at the policy layer. You have one simple set of security standards that you can update centrally as the risks change over time.
VerifiedVisitors dynamically applies the actual rules automatically at each endpoint. Policy applied. Problem solved.
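To make the policy-layer idea concrete, here is a hypothetical sketch of what a per-category policy could look like once bots are verified. The category names and actions are illustrative only, not VerifiedVisitors' actual configuration format:

```python
# Hypothetical per-category bot policy: set once, enforced everywhere.
BOT_POLICY = {
    "major_search_engine": "allow",      # verified Googlebot, Bingbot, etc.
    "seo_tool":            "allow",
    "ai_training_crawler": "block",      # opt out of generative-AI crawls
    "scraping_service":    "block",
    "unverified":          "challenge",  # claims to be good, fails DNS checks
}

def enforce(category: str) -> str:
    # Unknown categories fall back to the most defensive action.
    return BOT_POLICY.get(category, "challenge")

assert enforce("major_search_engine") == "allow"
assert enforce("brand_new_bot_type") == "challenge"
```

The point of the pattern is that the decision table lives in one place; the per-endpoint enforcement is derived from it automatically.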
Bot Discovery
VerifiedVisitors gives you detailed information on each category of bot, as shown in the search engine panel. The filters at the top allow you to sync with your existing robots.txt file, so you can pick up all the bots you want to allow or disallow. Instead of relying on robots.txt, VerifiedVisitors will now enforce your instructions and validate the bots to eliminate fakes and imposters. Additional filters let you match the entire database against the bots that have actually crawled, so you can see at a glance which bots you need to take care of. The bot panel also shows the latest crawling activity: last-crawled dates, requests made and crawl volume, which is helpful for seeing how often your site is indexed.
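If you want a quick look at what your current robots.txt actually permits before syncing it, Python's standard library can parse the file for you. A small sketch, with an illustrative URL and a few well-known user agents:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (URL is illustrative).
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

# Ask what each crawler is allowed to fetch.
for agent in ("Googlebot", "GPTBot", "AhrefsBot"):
    for path in ("/", "/account/", "/products/"):
        verdict = "allowed" if rp.can_fetch(agent, path) else "disallowed"
        print(f"{agent:10s} {path:12s} {verdict}")
```

Remember, this only reports what you have asked bots to do; verifying and enforcing it is the separate problem described above.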
Bot Recommendation Engine
While customers can fine-tune their bot policy to be as granular as they like, most choose simply to apply the recommended settings. This allows us to apply the entire verified bot allow list instantly. Customers can then fine-tune the allow list, or add their own custom bots for services they know are crawling, or that they have developed themselves for internal use.
Bot Categories Included in the VerifiedVisitors Bot Database
- Major Search Engine Bots
- Minor Search Engine Bots
- Image & Video Search Engines Bots
- International Search Engines Bots
- Ad Page Checking / Ad Quality Bots
- Fraud / Historical Web Indexing Bots
- Accessibility Bots
- Social Media Platform Bots
- Finance Bots
- Web Site Speed Testing Web Bots
- Screenshot Creator / Grabber Content Bots
- Internet Metrics & Reach Bot Agents
- Rich Media Embeds / Framing Bot Agents
- Team Collaboration Tool Bot Agents
- Vendor Security & AI Research bots
- CDN / Caching Bots
- Site Monitoring Bots
- SEO Tools / Ad Content Marketing Bots
- RSS / Feed Bots
- News Aggregation Bots
- Webmaster Dev Tool Bots
- Social Listening & Brand Reputation Bots
- Affiliate Marketing Bots
- Media & Social Content Analytics / Aggregators
- IP / Brand Protection Bots
- Email Services Bots
- Academic / Lexicography / Plagiarism Bot Agents
- Penetration Tester & Vulnerability Agent Bots
- Jobs & Career Bots
- Influencer & Sales Lead Contact Bots
- Price Scrapers / Scraping Tools / Scraping-as-a-Service Bots
- Web Site Data Collection Bots