basement
community
joined dec 4, 2022
ohhhh this is the gunch!
and it's not using the robots.txt file.
Cloudflare is a tool, mostly used as a CDN for your website, but it also has a lot of other cool shit like email forwarding, firewalls, IP blocking, etc. They have a free tier (which is what basement community uses) and they just released an AI blocker for sites. @starbreaker might be interested in this because I know you're using robots.txt to block unwanted traffic.
Does anyone here use Cloudflare? I know they've faced ethical concerns over the past few years because of sites they allowed on their platform, like Kiwi Farms, but it's still tough to find anyone who beats Cloudflare at what they do.
We see website operators completely block access to these AI crawlers using robots.txt. However, these blocks are reliant on the bot operator respecting robots.txt and adhering to RFC9309 (ensuring variations on user agent all match the product token) to honestly identify who they are when they visit an Internet property, but user agents are trivial for bot operators to change.
We leverage Cloudflare global signals to calculate our Bot Score, which for AI bots like the one above, reflects that we correctly identify and score them as a “likely bot.”
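To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the example URL, and the `allowed` helper are assumptions made up for illustration. The check runs on the crawler's side, so a bot that sends a different user agent string gets a different answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that bans one AI crawler by its product token.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str = "https://example.com/page") -> bool:
    """Return what a well-behaved crawler would conclude for this user agent."""
    return parser.can_fetch(user_agent, url)


# The same crawler, identifying itself honestly and then as a browser.
honest = allowed("GPTBot/1.0")
disguised = allowed("Mozilla/5.0 (X11; Linux x86_64)")
```

Nothing here is enforced by the server. The rule only works if the bot runs a check like this and obeys the result, which is the gap the quoted passage is describing.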
posted 7/7/2024, 4:45 pm
joined sep 22, 2023
lives in a pineapple under the sea
I'm not crazy about Cloudflare, since sites with Cloudflare (captcha?) are not supported on my phone, and I check the basement community out on my phone to see if there are new threads.
posted 7/8/2024, 9:36 am
joined aug 16, 2023
non serviam
quoting orchids:
Cloudflare is a tool, mostly used as a CDN for your website, but it also has a lot of other cool shit like email forwarding, firewalls, IP blocking, etc. They have a free tier (which is what basement community uses) and they just released an AI blocker for sites. @starbreaker might be interested in this because I know you're using robots.txt to block unwanted traffic.
Thanks for thinking of me, but I've learned how to block bad user agents and referrers in .htaccess. Whether a search crawler or AI scraper honors robots.txt is no longer of interest to me. If they don't, and their operators are foolish enough to identify themselves, they get redirected to https://nocommercialuse.org.
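For anyone who wants to see the idea outside of .htaccess, here is a rough Python sketch of the same decision. It is not the actual rule set described above: the pattern lists and the `redirect_for` helper are hypothetical, and only the redirect target comes from the post.

```python
import re
from typing import Optional

# Redirect target taken from the post above.
REDIRECT_TARGET = "https://nocommercialuse.org"

# Hypothetical block lists, for illustration only.
BAD_USER_AGENTS = re.compile(r"GPTBot|CCBot|Bytespider", re.IGNORECASE)
BAD_REFERRERS = re.compile(r"spam-referrer\.example", re.IGNORECASE)


def redirect_for(user_agent: str, referrer: str = "") -> Optional[str]:
    """Return the redirect URL for an unwanted request, or None to serve it."""
    if BAD_USER_AGENTS.search(user_agent) or BAD_REFERRERS.search(referrer):
        return REDIRECT_TARGET
    return None


# A self-identified crawler is redirected; an ordinary browser is not.
crawler = redirect_for("Mozilla/5.0 (compatible; GPTBot/1.0)")
browser = redirect_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/127.0")
```

Unlike robots.txt, this check runs on the server, so it does not depend on the bot's cooperation. It still only catches operators who identify themselves, as the post says.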
robots.txt is a polite suggestion, but I am not obligated to be polite or settle for mere suggestion on my own website. As a webmaster I am sovereign, and it is my prerogative as such to command.
posted 7/8/2024, 7:42 pm