gerben, gRegorLove_, bterry, geoffo, [0x3b0b], [schmarty], hertavein, jonnybarnes, Xe and monoob5 joined the channel
#rubenwardy[tantek]: re publisher vs. plumbing, how would you have responded differently? As a publisher, I check my implementation with a validator to make sure tools will interpret it correctly. It's pointless to have an implementation that matches in theory but not in practice.
#[tantek]Best practices for publishers should center the publisher's desires for design & presentation, adapt accordingly.
#[tantek]Best practices for plumbing or consumers involve defensively handling a variety of practices, including many edge cases
jeremycherfas and [snarfed] joined the channel
#[snarfed]interesting. for a while now, I'd accepted the gospel that the fediverse blocks web crawlers in robots.txt. out of curiosity, I looked at a few robots.txt files today, and evidently some servers do block by default, but many don't. eg evidently Mastodon and Lemmy allow web crawlers by default, eg https://mastodon.social/robots.txt , https://lemmy.ml/robots.txt
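A minimal sketch of how a well-behaved crawler evaluates a robots.txt file, using Python's standard `urllib.robotparser`. The rules below are hypothetical and are not copied from mastodon.social or lemmy.ml; they only illustrate "allow crawlers by default, but block one named bot".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (not any real server's file):
# block one named crawler entirely, let everyone else in except one path.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /media_proxy/
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "GPTBot"):
        allowed = crawler_allowed(EXAMPLE_ROBOTS_TXT, agent, "https://example.social/@alice")
        print(agent, allowed)
```

robots.txt is advisory: it only keeps out crawlers that choose to read and honor it.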
Xe, AramZS and [manton] joined the channel
#[manton]Related side rant on robots.txt, I’ve been trying to form a coherent strategy around AI, and I wish there was more work on re-introducing Creative Commons to blog and social content. The web should be open by default and blocking all crawlers isn’t a great solution.
#[manton]Curious to hear anyone’s thoughts on that.
#[snarfed]personally I question the desire to block AI at all. I mean, fine, if you want to, for whatever reason, but I haven't yet grokked the concrete harm to a specific person from training on their specific content
#[snarfed](there are maybe narrow cases, eg notable artists, where the model then generates art very similar to theirs...but that doesn't seem to be what's happening when the average person says "I hate AI! don't let it use my stuff!")
#[snarfed](also I already regret jumping into this debate 😁)
#[Murray]I think there's an inherent feeling of violation of social contracts that some people are pushing back on, and to others the whole AI model is just icky. Both the method of extraction and the resultant "product" can be seen as unethical, so I can understand why people want to abstain from that
#[Murray]It doesn't help that a lot of the companies getting the most attention are also the most problematic, because whilst I will personally be blocking AI crawlers at some point (I have other back end issues making it harder than anticipated right now), I actually think there are applications I would be happy with. They just aren't the ones OpenAI, Google et al are engaged in
#[Murray]I'll just add onto this that the way AI crawlers are working has made me reconsider some aspects of bookmarking/digital note archives on my own site. The only place my site comes up when I search ML sources is actually for image generation "AI"s like Stable Diffusion, where the images I have saved on my site have been scraped.
#Loqi[preview] [Tantek Çelik] Blog as if there’s an #AI being trained¹ to be you based on your blog posts.
And what if it was trained on your Universal Outbox²?
#IndieWeb #OpenAI #ChatGPT
This is day 34 of #100DaysOfIndieWeb #100Days
← Day 33: https://tantek.com/2023/0...
#[Murray]Most of these come from bookmark posts where I've reuploaded some images for my own archive. I'm aware that this may have always been a breach of copyright, but I never saw direct harm. Now there is. If the original authors block AI crawlers on their sites, I'm now exposing their work on my reuploads, which isn't great. Another reason I'm considering taking large swathes of my site offline/behind a login wall
#aaronpkit's fun to be collectively mad at large companies!
#[manton]Personally I do want AI to crawl my blog because I want to be able to ask it things like “has Manton ever blogged about topic XYZ?” And every once in a while maybe I blog about something interesting that could help the model and help other people.
#[schmarty]I'm more interested in training and querying my own little model on my site, outside of the surveillance models of the big "AI" providers.
#[schmarty]I miss the days when the future was Intelligent Agents, "AI" models that run on your own hardware. Bring back the personal in personal computing, etcetera.
#[manton]It does seem likely that we’ll end up with lots of models for specific purposes. Stack Overflow should have their own bot. The help site for my product should have its own bot. Etc.
#Loqipersonalcomputing has 1 karma over the last year
#[schmarty]Haha thanks tantek (and Loqi). I'm curious to see the translation feature experience!
[tw2113] joined the channel
#[KevinMarks]There is a difference between being crawled by a search engine to help people find your writing, and being crawled by a text synthesis engine to average your writing with everyone else's and regurgitate an amalgam of it without crediting or linking to any of you.
#[schmarty]the "reinforcement learning" squishes what the AI already knows about how bad it is. GPT-3 (i think) was great at roleplaying dystopian AI horror until un/underpaid humans labeled enough data to curb that behavior.
[timothy_chambe], [KevinMarks], [catgirlinspace], [pfefferle], saptaks, gRegor, rocto, AramZS and ajr joined the channel
#capjamesgGWG What way forward do you have in mind?
#GWGcapjamesg: First, we need to do a bit of gardening on the wiki page for the spec. [schmarty] and angelo both gave feedback suggesting we need more clarity on a few things. I want to give that a go, but would like a partner.
#capjamesgI'm happy to read over anything you change.
#GWGBasically, [schmarty] pointed out the need for better user stories, and observationally, we need to enhance the... Why Ticket Auth explanation on the page.
#GWGIf we don't... I think it may hinder people being interested in building implementations
#capjamesgOne thing that comes to mind is how Instagram works.
#capjamesgYou can have BFFs who are able to see some Stories that nobody else can.
#capjamesgWe don't have a way to do that on personal websites.
#GWGcapjamesg: That is a user story that is an example of, Alice wants to see Bob's private stuff
#capjamesgThat opens up a lot of opportunities for creative expression.
#capjamesgI think "Alice wants to see Bob's private stuff" is too vague?
#GWGAlthough every time I write private stuff I feel like there should be a less suggestive generic term
#capjamesgAlice had a weird experience that she wants to share with her friends, but doesn't want to be publicly available.
#capjamesgThen Alice had a weird experience she wants to share with Bob.
#capjamesgI think 1:1 is likely to be less interesting than group distribution.
#capjamesgAlice may write a post for Bob, but then we get into how this compares to DMs.
#GWGWell, the spec doesn't care what you are sharing
#capjamesgThe spec doesn't care, but we should have auxiliary info that shares some detailed use cases.
#GWGBut I think we leaned into that a bit too much
#aaronpkThe Instagram use case is creating a "circle" of friends and sharing with that subset of people instead of publicly
#capjamesgThe two user stories I see right now are Alice has something she wants to share with Bob, privately, and Alice has something she wants to share with a group of people, privately.
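A rough sketch of the two user stories as a data model: a post with an optional audience of identity URLs, where a "circle" is just a named set of those URLs. Everything here (`Post`, `CIRCLES`, `can_view`, the example URLs) is hypothetical and not part of any spec. It also assumes the site already knows which identity URL the viewer has; establishing that is the part an auth protocol such as Ticket Auth would have to handle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Post:
    content: str
    # Hypothetical model: an empty audience means public; otherwise only
    # viewers whose identity URL is in the set may see the post.
    audience: frozenset = frozenset()


# Hypothetical "circle" of friends, keyed by name.
CIRCLES = {
    "close-friends": frozenset({"https://bob.example/", "https://carol.example/"}),
}


def can_view(post: Post, viewer_url: Optional[str]) -> bool:
    """Decide visibility; viewer_url is None for anonymous visitors.

    Assumes viewer_url was already verified by some auth step, and uses
    exact string matching (a real site would normalize URLs first).
    """
    if not post.audience:
        return True
    return viewer_url is not None and viewer_url in post.audience


# Story 1: Alice shares something privately with Bob.
for_bob = Post("weird experience", frozenset({"https://bob.example/"}))
# Story 2: Alice shares something privately with a group.
for_circle = Post("weird experience", CIRCLES["close-friends"])
```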
#GWGaaronpk: I may ask your opinion as an experienced explainer of auth stuff after I take a crack at it
#aaronpkI am also interested in solving this because I am about this close to making most of the content on my site non-public
#LoqiA story is a singular (one per profile) time stream collection post, that consists of ephemeral photo and video posts that are shown in sequence one at a time and disappear from the collection some time after being added, usually 24 hours https://indieweb.org/story
#[snarfed]hey [manton], following up from yesterday, any chance you have an example of a http://micro.blog post that successfully mentions someone in the fediverse?
#[tantek]just links [snarfed], in this case links to profiles. the "is it a mention of a person" question is answered by, "is the destination of the link a person"
#[snarfed]yes, that's the indieweb-native definition. I'm hoping for a heuristic to identify *fediverse* mentions that's a bit lighter than, try to fetch every single link in a post as AS2
#[tantek]that's the IndieWeb perspective, and I need to figure out how to scope / define the broader "@-mention" across IndieWeb and silos or little Chad's garages
#Loqi[preview] [Aumetra Ⓐ :nonbinary:] The best proof that ActivityPub doesn't need JSON-LD: I have written three or so implementations and I still got no real clue how the fuck JSON-LD works
#[tantek][snarfed] I'm not doing anything special with the auto-link markup (other than class="autolink") for @-mentions, whether @-name (Twitter) or @-domain (https link to that domain) or @-@ (https link to domain/@user ), though I could 🤷‍♂️
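A sketch of the three @-mention forms described above and the plain links they could auto-link to, with only `class="autolink"` as extra markup. The helper names are hypothetical, and so are two details: that an @-name links to `https://twitter.com/<name>`, and that text containing a dot is treated as a domain (Twitter handles are limited to letters, digits, and underscores, up to 15 characters).

```python
import re
from html import escape
from typing import Optional, Tuple


def classify_mention(text: str) -> Optional[Tuple[str, str]]:
    """Return (form, href) for an @-mention token, or None if it isn't one."""
    if not text.startswith("@"):
        return None
    body = text[1:]
    if "@" in body:
        # @-@ form: @user@domain -> https://domain/@user
        user, _, domain = body.partition("@")
        if user and domain and "@" not in domain:
            return ("@-@", f"https://{domain}/@{user}")
        return None
    if "." in body:
        # @-domain form: @example.com -> https://example.com
        return ("@-domain", f"https://{body}")
    if re.fullmatch(r"[A-Za-z0-9_]{1,15}", body):
        # @-name form (assumed Twitter handle)
        return ("@-name", f"https://twitter.com/{body}")
    return None


def autolink(text: str) -> str:
    """Wrap a recognized @-mention in a plain link with class="autolink"."""
    result = classify_mention(text)
    if result is None:
        return escape(text)
    _, href = result
    return f'<a class="autolink" href="{escape(href)}">{escape(text)}</a>'
```

Note that nothing in the output marks the link as "a mention"; it is just a link whose text starts with '@'.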
#[tantek]I'm hesitant to though because I feel "extra markup to make it a mention" will be one more thing publishers could get wrong and thus more fragile
#[tantek][snarfed] here's a really simple suggested heuristic: if the linktext starts with '@' then assume it's an @-mention of some kind (and you have no idea what kind the destination accepts without doing discovery on that @-link first).
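A minimal sketch of that heuristic with Python's standard `html.parser`: collect only the links whose text starts with '@', so that the heavier discovery step (for example fetching the link as ActivityStreams 2.0 with `Accept: application/activity+json`) runs on those candidates rather than on every link in the post. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MentionCandidateParser(HTMLParser):
    """Collect (href, link text) for <a> links whose text starts with '@'."""

    def __init__(self) -> None:
        super().__init__()
        self._href: Optional[str] = None
        self._text: List[str] = []
        self.candidates: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Accumulate text inside a link, including text in nested elements.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text.startswith("@"):
                self.candidates.append((self._href, text))
            self._href = None


def mention_candidates(html: str) -> List[Tuple[str, str]]:
    """Return links that *might* be @-mentions; their type is still unknown."""
    parser = MentionCandidateParser()
    parser.feed(html)
    parser.close()
    return parser.candidates
```

As the heuristic says, a candidate tells you nothing about what the destination accepts; discovery on each candidate link is still required.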
#Loqi[preview] [Tantek Çelik] The answer to “How should you @-mention someone you are replying to?”¹ can depend on many factors or a few. Context matters somewhat, sometimes.
We can simplify it to 2 questions, based on 1 directive:
* Elevate #IndieWeb domains, above any si...