#dev 2025-07-01

2025-07-01 UTC
Octetus, [tw2113], troojg, bugliker0, Pixi`, paotsaq, jak2k, [Jo], [lazcorp], GuestZero, ttybitnik, Xe, axxuy and okroshka joined the channel
#
[social]
Four days of faffing with Docker Desktop on Mac just to get it to open (as well as properly update), and I finally gave in and restarted my Mac. After 10 to 12 years of Docker, it is still one of the most fickle and fragile tools I know. Yet it is also quite helpful when it works.
[Murray], barnaby, grufwub and [mattl] joined the channel
#
[mattl]
I see Cloudflare is stepping up their blocking efforts for AI crawlers. This is a good thing. I just wish I could host it myself.
[schmarty] joined the channel
#
[schmarty]
what is anubis?
#
Loqi
It looks like we don't have a page for "anubis" yet. Would you like to create it? (Or just say "anubis is ____", a sentence describing the term)
#
[mattl]
Yeah, I don’t love anubis.
#
[schmarty]
i'm not a user of it, but it's probably relevant for indieweb folks struggling with bot traffic and willing to do the dev work of standing it up in front of their site.
#
[mattl]
Proof of work stuff is too close to cryptocurrency stuff
#
[schmarty]
sure, not every tool is to everyone's tastes. folks are still finding it useful.
[manton] joined the channel
#
[manton]
I’m actually a little concerned about the Cloudflare change. They have a lot of power. Blogged more thoughts here: https://www.manton.org/2025/07/01/cloudflare-is-on-the-offensive.html
#
Loqi
[preview] [Manton Reece] Cloudflare is on the offensive against AI bots: manton.org
#
[mattl]
Yeah, I saw that.
#
[manton]
As you said, if people could self-host this kind of control, that’s better.
#
[mattl]
I want to block every crawler at this point.
#
[mattl]
Search engines are doing AI bullshit too.
#
[manton]
I think most legitimate AI crawlers do respect robots.txt. Of course, there will be rogue bots that don’t.
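
For context, here is a minimal sketch of what "respecting robots.txt" looks like from the crawler's side, using Python's standard `urllib.robotparser`. The rules and the bot names (`ExampleAIBot`, `OtherBot`) are made up for illustration and are not taken from this discussion. A rogue bot simply skips this check, which is why robots.txt alone does not stop it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleAIBot", "https://example.com/post"))
    print(is_allowed("OtherBot", "https://example.com/post"))
```

A well-behaved crawler would run a check like `is_allowed` before every request and skip any URL for which it returns `False`.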
#
[mattl]
I don’t think there’s such a thing as a legitimate AI crawler
#
[mattl]
or indeed, legitimate AI.
#
[mattl]
I’m blocking lots and lots of IP addresses and lots and lots of user agents, that’s helping.
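
As a rough illustration of the general idea of blocking by IP address and user agent (not anyone's actual setup), a sketch along these lines could sit in front of a request handler. The blocklist entries are hypothetical: `203.0.113.0/24` is a reserved documentation range, and the user-agent substrings are invented.

```python
import ipaddress

# Hypothetical blocklists, for illustration only.
BLOCKED_UA_SUBSTRINGS = ("exampleaibot", "examplescraper")
BLOCKED_NETWORKS = (ipaddress.ip_network("203.0.113.0/24"),)


def should_block(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request matches either blocklist."""
    ua = user_agent.lower()
    # Case-insensitive substring match against the user-agent header.
    if any(fragment in ua for fragment in BLOCKED_UA_SUBSTRINGS):
        return True
    # Check whether the client address falls inside a blocked network.
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in network for network in BLOCKED_NETWORKS)
```

In practice this kind of check often lives in the web server or CDN configuration rather than in application code. User-agent strings are easy to spoof, so the IP list tends to carry more of the weight.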
#
capjamesg
I am too.
#
[tantek]
do we have a page where we are gathering such blocking techniques?
#
[mattl]
No and I won’t contribute to one.
#
[tantek]
like something separate from the generic /LLM page
#
[mattl]
because I don’t want them to figure it out.
#
[tantek]
that's totally fine [mattl] 🙏
#
capjamesg
Yeah that's one of the risks 😦
#
[tantek]
I mean even listing options like "here's what you get if you use Cloudflare"
#
[mattl]
I’m also looking at legal options
#
[tantek]
plus we can document the "standards" efforts that are already openly discussed to block crawlers like at IETF, the CC proposal etc.
#
capjamesg
Wow Cloudflare has a lot of AI blog posts today: https://blog.cloudflare.com/
#
[tantek]
we can also document that concern about "don’t want them to figure it out"
#
[manton]
Would be nice to have a page with bot names. I have this page, but I know it’s not complete: https://help.micro.blog/t/blocking-ai-bots/2905
#
[mattl]
I want to sue some of these companies
#
[manton]
[capjamesg] Yeah! I read all of their posts today and it took forever. 🙂
#
[tantek]
right, this is my point. people already have public pages here and there, we might as well make it easier for our community
#
[manton]
Agreed.
#
[mattl]
[manton] typo on the end of that post.. says micro.bog
#
[manton]
Haha, thanks! Fixing.
#
[manton]
There should be a .bog TLD. 🤪
#
[mattl]
where to publish source code in a way that it cannot be crawled too… no more GitHub, etc.
#
[mattl]
https://libre.fm/robots.txt shows you what the cloudflare stuff today adds.
#
[mattl]
they’re doing more than just that, but it’s not enough (or maybe I need to pay them, but I’m not doing that)
gRegor joined the channel
#
gRegor
Can't remember the site, but I recall one being shared in an HWC that lists bots for LLMs
#
gRegor
thank you!
#
gRegor
I kept thinking "dark" was in it, but got stuck on "dark patterns" :)
#
gRegor
LLM << [https://darkvisitors.com/agents List of LLM bot user agents]
#
gRegor
robots_txt << [https://darkvisitors.com/agents List of LLM bot user agents]
#
Loqi
ok, I added "[https://darkvisitors.com/agents List of LLM bot user agents]" to the "See Also" section of /robots_txt https://indieweb.org/wiki/index.php?diff=102589&oldid=94144
[KevinMarks] joined the channel