general cloudflare is offering a tool to block AI scrapers

joined dec 4, 2022

and it's not using the robots.txt file.

Cloudflare is a tool, mostly used as a CDN for your website, but it also has a lot of other cool shit like email forwarding, firewalls, IP blocking, etc. They have a free tier (which is what basement community uses) and they just released an AI blocker for sites. @starbreaker might be interested in this because I know you're using robots.txt to block unwanted traffic.

Does anyone here use Cloudflare? I know they've had some ethical concerns over the past few years because of sites they allowed on their platform, like Kiwi Farms, but it's still tough to find anyone who beats Cloudflare at what they do.

We see website operators completely block access to these AI crawlers using robots.txt. However, these blocks are reliant on the bot operator respecting robots.txt and adhering to RFC9309 (ensuring variations on user agent all match the product token) to honestly identify who they are when they visit an Internet property, but user agents are trivial for bot operators to change.
We leverage Cloudflare global signals to calculate our Bot Score, which for AI bots like the one above, reflects that we correctly identify and score them as a “likely bot.”

https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click
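To make the quoted point concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. It assumes a made-up crawler token, `ExampleAIBot`, and a placeholder `example.com` URL; neither comes from the Cloudflare post. The only thing it is meant to show is that a robots.txt rule applies only when the visitor announces the token the rule names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleAIBot" is an invented product token,
# used here purely for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/page") -> bool:
    """Return what a well-behaved crawler would conclude from ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that identifies itself honestly is told to stay out.
honest = is_allowed("ExampleAIBot/1.0")

# The same crawler sending a browser-like user agent falls through to the
# catch-all rule. Nothing in robots.txt can detect or prevent this.
disguised = is_allowed("Mozilla/5.0")
```

Even the honest case is only advice: the file tells the crawler what the site wants, and the crawler decides whether to listen.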

posted 7/7/2024, 4:45 pm

joined sep 22, 2023

lives in a pineapple under the sea

I'm not crazy about Cloudflare, since sites that use Cloudflare (the captcha?) aren't supported on my phone, and I check out basement community on my phone to see if there are new threads.

posted 7/8/2024, 9:36 am

joined aug 16, 2023

non serviam

quoting orchids:

Cloudflare is a tool, mostly used as a CDN for your website, but it also has a lot of other cool shit like email forwarding, firewalls, IP blocking, etc. They have a free tier (which is what basement community uses) and they just released an AI blocker for sites. @starbreaker might be interested in this because I know you're using robots.txt to block unwanted traffic.

Thanks for thinking of me, but I've learned how to block bad user agents and referrers in .htaccess. Whether a search crawler or AI scraper honors robots.txt is no longer of interest to me. If they don't, and their operators are foolish enough to identify themselves, they get redirected to https://nocommercialuse.org.

robots.txt is a polite suggestion, but I am not obligated to be polite or settle for mere suggestion on my own website. As a webmaster I am sovereign, and it is my prerogative as such to command.
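For anyone curious what server-side blocking looks like in principle, here is a rough Python sketch of the same idea. It is not the actual .htaccess setup described above: the blocklist patterns (`ExampleAIBot`, `ExampleScraper`, `spam-referrer.example`) and the 302 status are assumptions made up for illustration. Only the redirect target comes from the post.

```python
import re

# Hypothetical blocklists; every token here is a placeholder.
BLOCKED_AGENTS = re.compile(r"ExampleAIBot|ExampleScraper", re.IGNORECASE)
BLOCKED_REFERRERS = re.compile(r"spam-referrer\.example", re.IGNORECASE)

# Redirect target mentioned in the post.
REDIRECT_TARGET = "https://nocommercialuse.org"


def decide(user_agent, referrer=None):
    """Return (status, headers) for a request, based on its own headers.

    Unlike robots.txt, the server makes this decision itself, so the
    visitor's cooperation is not needed. It still only catches visitors
    that identify themselves.
    """
    if BLOCKED_AGENTS.search(user_agent or ""):
        return 302, {"Location": REDIRECT_TARGET}
    if BLOCKED_REFERRERS.search(referrer or ""):
        return 302, {"Location": REDIRECT_TARGET}
    return 200, {}
```

Calling `decide("ExampleAIBot/1.0")` gives a redirect, while `decide("Mozilla/5.0")` passes through. As with the original approach, a scraper that lies about its user agent gets past this check too.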

posted 7/8/2024, 7:42 pm
