With all the hype around useless AI technologies, ChatGPT and other inefficient AI models included, a lot of malicious companies, convinced they have found El Dorado, try to profit from all the data they can harvest from the web, without respecting Netiquette conventions.
Suchir Balaji, a whistleblower who described OpenAI's illegal reuse of other creators' content, was found dead in his US apartment; see "Police rules out foul play in death of OpenAI whistleblower Suchir Balaji".
As a result, more and more web services see their traffic grow insanely, the bulk of it coming from these bots, which often saturate services like a DDoS attack. Search engine crawlers generally limit themselves to about one request every few seconds to avoid overloading web servers, but recent AI crawlers scan as fast as possible. ClaudeAI is one of these famous bots. Until recently they at least set the User-Agent header, which identifies the client software making the request, allowing web services to choose which user agents they serve. But now that most services block these bots for ignoring crawling conventions, some newcomers to the Internet think that hiding or forging their User-Agent to look like a regular browser is a fine solution to keep stealing data to train their models.
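To make the User-Agent mechanism concrete, here is a minimal sketch in Python (standard library only) of filtering requests by that header. This is not the method described later in this article, just an illustration; the AI_BOT_MARKERS list is a hypothetical example of crawler signatures, not an exhaustive block list.

```python
# Minimal sketch: refuse requests whose User-Agent matches known AI crawlers.
from wsgiref.simple_server import make_server

# Hypothetical, non-exhaustive list of substrings seen in AI crawler
# User-Agent strings.
AI_BOT_MARKERS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

def app(environ, start_response):
    # WSGI exposes the User-Agent header as HTTP_USER_AGENT.
    ua = environ.get("HTTP_USER_AGENT", "")
    if any(marker in ua for marker in AI_BOT_MARKERS):
        # Reject the request before doing any real work.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"AI crawlers are not welcome here.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, regular visitor!\n"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```

Of course, as explained above, this only stops bots that still announce themselves honestly: a crawler that forges a browser User-Agent sails right through, which is why the methods below have to go further.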
As a result, services start blocking too much, and web users complain that they can no longer visit regular websites: misdetected as AI bots, they get blocked. And a fun side effect is that the bots train on worse and worse results with each new generation, degrading the quality of the AI training data, which degrades the generated content, and so on...
Here are the methods I used to efficiently block a large part of these badly managed AI bots without blocking end users.