rubenwardy [tantek]: re publisher vs. plumbing, how would you have responded differently? As a publisher, I check my implementation with a validator to make sure tools will interpret it correctly. It's pointless to have an implementation that matches in theory but not in practice.
[snarfed]interesting. for a while now, I'd accepted the gospel that the fediverse blocks web crawlers in robots.txt. out of curiosity, I looked at a few robots.txt files today: some servers do block by default, but many don't. e.g. Mastodon and Lemmy evidently allow web crawlers by default; see https://mastodon.social/robots.txt and https://lemmy.ml/robots.txt
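A minimal sketch of checking a robots.txt file the way snarfed describes, using Python's standard-library urllib.robotparser. The sample rules, the /media_proxy/ path and the ExampleBot user agent are made up for illustration; they are not copied from mastodon.social or lemmy.ml.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in `robots_txt` let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without any network request.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: block one path for every crawler, allow everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /media_proxy/
"""

if __name__ == "__main__":
    print(crawler_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.social/@alice"))      # True
    print(crawler_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.social/media_proxy/1"))  # False
```

To test a live server instead, RobotFileParser also provides set_url() and read(), which fetch and parse the file in one step.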
[manton]Related side rant on robots.txt: I’ve been trying to form a coherent strategy around AI, and I wish there were more work on re-introducing Creative Commons to blog and social content. The web should be open by default, and blocking all crawlers isn’t a great solution.
[snarfed]personally I question the desire to block AI at all. I mean, fine, if you want to, for whatever reason, but I haven't yet grokked the concrete harm to a specific person from training on their specific content
[snarfed](there are maybe narrow cases, eg notable artists, where the model then generates art very similar to theirs...but that doesn't seem to be what's happening when the average person says "I hate AI! don't let it use my stuff!")
[Murray]I think there's an inherent feeling of violation of social contracts that some people are pushing back on, and to others the whole AI model is just icky. Both the method of extraction and the resultant "product" can be seen as unethical, so I can understand why people want to abstain from that
[Murray]It doesn't help that a lot of the companies getting the most attention are also the most problematic. Whilst I will personally be blocking AI crawlers at some point (I have other back-end issues making it harder than anticipated right now), I actually think there are applications I would be happy with. They just aren't the ones OpenAI, Google et al. are engaged in
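Blocking AI training crawlers specifically, rather than all crawlers, can be done by naming individual user-agent tokens in robots.txt. A sketch, assuming GPTBot (OpenAI), CCBot (Common Crawl) and Google-Extended (Google's AI-training control token) are the tokens you care about; check each vendor's current documentation, since these lists change, and remember robots.txt is only honored by crawlers that choose to respect it. The function names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for AI-training crawlers; verify against each
# vendor's current documentation, since names and lists change over time.
AI_CRAWLER_TOKENS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Disallow the whole site for each named agent; leave other crawlers unrestricted."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    # An empty Disallow value means nothing is disallowed for everyone else.
    groups.append("User-agent: *\nDisallow:\n")
    # Joining with "\n" leaves a blank line between groups.
    return "\n".join(groups)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check the generated rules with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_robots_txt(AI_CRAWLER_TOKENS)
    print(rules)
    print(is_allowed(rules, "GPTBot", "https://example.com/notes/1"))     # False
    print(is_allowed(rules, "Googlebot", "https://example.com/notes/1"))  # True
```

The design choice here is a default-allow file with a short deny list, which keeps search engines working while opting out of the named training crawlers.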
[Murray]I'll just add to this that the way AI crawlers work has made me reconsider some aspects of bookmarking/digital note archives on my own site. The only place my site comes up when I search ML sources is for image-generation "AI"s like Stable Diffusion, where the images I have saved on my site have been scraped.
Loqi[preview] [Tantek Çelik] Blog as if there’s an #AI being trained¹ to be you based on your blog posts.
And what if it was trained on your Universal Outbox²?
#IndieWeb #OpenAI #ChatGPT
This is day 34 of #100DaysOfIndieWeb #100Days
← Day 33: https://tantek.com/2023/0...
[Murray]Most of these come from bookmark posts where I've reuploaded some images for my own archive. I'm aware that this may always have been a breach of copyright, but I never saw direct harm. Now I do: if the original authors block AI crawlers on their sites, my reuploads still expose their work, which isn't great. It's another reason I'm considering taking large swathes of my site offline or behind a login wall.
[manton]Personally I do want AI to crawl my blog because I want to be able to ask it things like “has Manton ever blogged about topic XYZ?” And every once in a while maybe I blog about something interesting that could help the model and help other people.
[schmarty]I miss the days when the future was Intelligent Agents, "AI" models that run on your own hardware. Bring back the personal in personal computing, etcetera.
[manton]It does seem likely that we’ll end up with lots of models for specific purposes. Stack Overflow should have their own bot. The help site for my product should have its own bot. Etc.
[KevinMarks]There is a difference between being crawled by a search engine to help people find your writing, and being crawled by a text synthesis engine to average your writing with everyone else's and regurgitate an amalgam of it without crediting or linking to any of you.
[schmarty]the "reinforcement learning" squishes what the AI already knows about how bad it is. GPT-3 (i think) was great at roleplaying dystopian AI horror until un/underpaid humans labeled enough data to curb that behavior.
[timothy_chambe], [KevinMarks], [catgirlinspace], [pfefferle], saptaks, gRegor, rocto, AramZS and ajr joined the channel
GWG capjamesg: First, we need to do a bit of gardening on the wiki page for the spec. [schmarty] and angelo both gave feedback suggesting we need more clarity on a few things. I want to give that a go, but would like a partner.
GWG Basically, [schmarty] pointed out the need for better user stories, and from what I've observed, we need to enhance the "Why Ticket Auth" explanation on the page.
capjamesg The two user stories I see right now are: Alice has something she wants to share privately with Bob, and Alice has something she wants to share privately with a group of people.
Loqi A story is a singular (one per profile) time stream collection post that consists of ephemeral photo and video posts that are shown in sequence one at a time and disappear from the collection some time after being added, usually 24 hours https://indieweb.org/story
[snarfed]hey [manton], following up from yesterday, any chance you have an example of a http://micro.blog post that successfully mentions someone in the fediverse?
[tantek]just links [snarfed], in this case links to profiles. the "is it a mention of a person?" question is answered by "is the destination of the link a person?"
[snarfed]yes, that's the indieweb-native definition. I'm hoping for a heuristic to identify *fediverse* mentions that's a bit lighter than trying to fetch every single link in a post as AS2
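For contrast, a rough sketch of the heavier approach snarfed wants to avoid: request each linked URL as ActivityStreams 2.0 and check whether the returned object is an actor. The Accept value and the actor-type names follow the ActivityPub and AS2 vocabulary specs; the function names are hypothetical, and the network fetch itself is left out, so this only builds the request and inspects a response body you already have. The cost is one HTTP round trip per link.

```python
import json
from urllib.request import Request

# Actor types from the ActivityStreams 2.0 vocabulary.
AS2_ACTOR_TYPES = {"Application", "Group", "Organization", "Person", "Service"}

# Media types an ActivityPub server should answer with its AS2 representation.
AS2_ACCEPT = (
    "application/activity+json, "
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'
)


def as2_request(url: str) -> Request:
    """Build (but do not send) a GET request for the AS2 version of `url`."""
    return Request(url, headers={"Accept": AS2_ACCEPT})


def is_as2_actor(body: str) -> bool:
    """Return True if a fetched JSON body describes an AS2 actor, e.g. a Person."""
    try:
        obj = json.loads(body)
    except json.JSONDecodeError:
        return False  # HTML or other non-JSON response: not an AS2 actor
    if not isinstance(obj, dict):
        return False
    types = obj.get("type", [])
    # "type" may be a single string or a list of strings.
    if isinstance(types, str):
        types = [types]
    if not isinstance(types, list):
        return False
    return any(isinstance(t, str) and t in AS2_ACTOR_TYPES for t in types)
```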
[tantek]that's the IndieWeb perspective, and I need to figure out how to scope / define the broader "@-mention" across IndieWeb and silos or little Chads garages
Loqi[preview] [Aumetra Ⓐ :nonbinary:] The best proof that ActivityPub doesn't need JSON-LD: I have written three or so implementations and I still got no real clue how the fuck JSON-LD works
[tantek][snarfed] I'm not doing anything special with the auto-link markup (other than class="autolink") for @-mentions, whether @-name (Twitter), @-domain (https link to that domain), or @-@ (https link to domain/@user), though I could 🤷‍♂️
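A hypothetical sketch of auto-linking the three @-mention forms tantek lists, with nothing beyond class="autolink" on the link. Assumptions: @-name links to a Twitter profile URL, @-domain links to https://domain, and @-@ (@user@domain) links to https://domain/@user. This illustrates the forms; it is not tantek's actual code.

```python
from html import escape
from typing import Optional


def mention_href(mention: str) -> Optional[str]:
    """Map @-mention text to a profile URL (assumed rules, see comments)."""
    if not mention.startswith("@") or len(mention) < 2:
        return None
    body = mention[1:]
    if "@" in body:
        # @-@ form: @user@domain -> https://domain/@user
        user, _, domain = body.partition("@")
        if not user or not domain:
            return None
        return f"https://{domain}/@{user}"
    if "." in body:
        # @-domain form: @example.com -> https://example.com
        return f"https://{body}"
    # @-name form: assumed to be a Twitter handle
    return f"https://twitter.com/{body}"


def autolink_mention(mention: str) -> str:
    """Wrap a mention in a plain link whose only extra markup is class="autolink"."""
    href = mention_href(mention)
    if href is None:
        return escape(mention)
    return f'<a class="autolink" href="{escape(href)}">{escape(mention)}</a>'
```

Keeping the markup this plain means a consumer cannot tell the three forms apart from attributes alone, only from the link text and destination.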
[tantek]I'm hesitant to, though, because I feel "extra markup to make it a mention" would be one more thing publishers could get wrong, and thus more fragile
[tantek][snarfed] here's a really simple suggested heuristic: if the linktext starts with '@' then assume it's an @-mention of some kind (and you have no idea what kind the destination accepts without doing discovery on that @-link first).
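A minimal sketch of that heuristic using Python's html.parser: collect every link whose text starts with '@' and leave discovery (Webmention, AS2, or anything else) for a later step. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AtMentionFinder(HTMLParser):
    """Collect (href, text) pairs for <a href> links whose text starts with '@'."""

    def __init__(self) -> None:
        super().__init__()
        self.mentions: list[tuple[str, str]] = []
        self._href = None  # href of the <a> currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # also captures text inside nested tags like <span>

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text.startswith("@"):
                self.mentions.append((self._href, text))
            self._href = None


def find_at_mentions(html: str) -> list[tuple[str, str]]:
    """Return candidate @-mentions; what each destination accepts still needs discovery."""
    finder = AtMentionFinder()
    finder.feed(html)
    finder.close()
    return finder.mentions
```

This only narrows the set of links worth running discovery on; it says nothing about whether the destination is a person or which protocol it speaks.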
Loqi[preview] [Tantek Çelik] The answer to “How should you @-mention someone you are replying to?”¹ can depend on many factors or a few. Context matters somewhat, sometimes.
We can simplify it to 2 questions, based on 1 directive:
* Elevate #IndieWeb domains, above any si...