#dev 2023-10-12

2023-10-12 UTC
gerben, gRegorLove_, bterry, geoffo, [0x3b0b], [schmarty], hertavein, jonnybarnes, Xe and monoob5 joined the channel
# 10:18 
rubenwardy [tantek]: re publisher vs. plumbing, how would you have responded differently? As a publisher, I check my implementation with a validator to make sure tools will interpret it correctly. It's pointless to have an implementation that matches in theory but not in practice.
# 10:35 
[tantek] Best practices for publishers should center the publisher's desires for design & presentation, adapt accordingly.
# 10:36 
[tantek] Best practices for plumbing or consumers involve defensively handling a variety of practices, including many edge cases
jeremycherfas and [snarfed] joined the channel
# 13:42 
[snarfed] interesting. for a while now, I'd accepted the gospel that the fediverse blocks web crawlers in robots.txt. out of curiosity, I looked at a few robots.txt files today, and evidently some servers do block by default, but many don't. eg evidently Mastodon and Lemmy allow web crawlers by default, eg https://mastodon.social/robots.txt , https://lemmy.ml/robots.txt
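A rough way to repeat that kind of robots.txt check in code: the sketch below feeds already-downloaded robots.txt text to Python's standard `urllib.robotparser` and asks whether a given crawler may fetch a path. The helper name `allows_crawler`, the `ExampleBot` user agent, and the sample rule sets are hypothetical; the rules are illustrative and are not copies of the Mastodon or Lemmy files linked above.

```python
from urllib.robotparser import RobotFileParser


def allows_crawler(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical rule sets for illustration only.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_MOST = "User-agent: *\nDisallow: /media_proxy/\n"
BLOCK_ONE_BOT = "User-agent: ExampleBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"

if __name__ == "__main__":
    for name, rules in [("block all", BLOCK_ALL),
                        ("allow most", ALLOW_MOST),
                        ("block one bot", BLOCK_ONE_BOT)]:
        print(name, allows_crawler(rules, "ExampleBot"))
```

To check a live server, the same parser could be pointed at a URL with `set_url()` and `read()` instead of `parse()`.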
Xe, AramZS and [manton] joined the channel
# 14:02 
[manton] Related side rant on robots.txt, I’ve been trying to form a coherent strategy around AI, and I wish there was more work on re-introducing Creative Commons to blog and social content. The web should be open by default and blocking all crawlers isn’t a great solution.
# 14:02 
[manton] Curious to hear anyone’s thoughts on that.
# 14:07 
[snarfed] personally I question the desire to block AI at all. I mean, fine, if you want to, for whatever reason, but I haven't yet grokked the concrete harm to a specific person from training on their specific content
# 14:08 
[snarfed] (there are maybe narrow cases, eg notable artists, where the model then generates art very similar to theirs...but that doesn't seem to be what's happening when the average person says "I hate AI! don't let it use my stuff!")
# 14:09 
[snarfed] (also I already regret jumping into this debate 😁)
# 14:13 
[Murray] I think there's an inherent feeling of violation of social contracts that some people are pushing back on, and to others the whole AI model is just icky. Both the method of extraction and the resultant "product" can be seen as unethical, so I can understand why people want to abstain from that
# 14:15 
[Murray] It doesn't help that a lot of the companies getting the most attention are also the most problematic, because whilst I will personally be blocking AI crawlers at some point (I have other back end issues making it harder than anticipated right now), I actually think there are applications I would be happy with. They just aren't the ones OpenAI, Google et al are engaged in
# 14:18 
[Murray] I'll just add onto this that the way AI crawlers are working has made me reconsider some aspects of bookmarking/digital note archives on my own site. The only place my site comes up when I search ML sources is actually for image generation "AI"s like Stable Diffusion, where the images I have saved on my site have been scraped.
# 14:19 
[tantek] [manton] my thoughts on that: https://tantek.com/2023/072/t1/blog-as-if-ai-trained-posts
# 14:19 
Loqi [preview] [Tantek Çelik] Blog as if there’s an #AI being trained¹ to be you based on your blog posts.
And what if it was trained on your Universal Outbox²?
#IndieWeb #OpenAI #ChatGPT
This is day 34 of #100DaysOfIndieWeb #100Days
← Day 33: https://tantek.com/2023/0...
# 14:19 
[Murray] Most of these come from bookmark posts where I've reuploaded some images for my own archive. I'm aware that this may have always been a breach of copyright, but I never saw direct harm. Now there is. If the original authors block AI crawlers on their sites, I'm now exposing their work on my reuploads, which isn't great. Another reason I'm considering taking large swathes of my site offline/behind a login wall
# 14:19 
aaronpk it's fun to be collectively mad at large companies!
# 14:19 
[manton] [tantek] Thanks, I had missed that post.
# 14:20 
[tantek] [aaronpk] lol see also https://www.youtube.com/watch?v=q118B_QdP2k
# 14:21 
[manton] Personally I do want AI to crawl my blog because I want to be able to ask it things like “has Manton ever blogged about topic XYZ?” And every once in a while maybe I blog about something interesting that could help the model and help other people.
# 14:22 
[manton] I get the distrust people have, though.
[schmarty] joined the channel
# 14:24 
[schmarty] I'm more interested in training and querying my own little model on my site, outside of the surveillance models of the big "AI" providers.
# 14:26 
[schmarty] I miss the days when the future was Intelligent Agents, "AI" models that run on your own hardware. Bring back the personal in personal computing, etcetera.
# 14:26 
[manton] It does seem likely that we’ll end up with lots of models for specific purposes. Stack Overflow should have their own bot. The help site for my product should have its own bot. Etc.
# 14:26 
[snarfed] aaronpk don't you love tech politics? 😁
# 14:26 
Loqi misses the days when the future was Intelligent Agents too
# 14:28 
[tantek] [schmarty]++ of course I agree with all those points.
# 14:28 
Loqi [schmarty] has 15 karma in this channel over the last year (49 in all channels)
# 14:29 
[tantek] Speaking of "models" that run on your own hardware: BSP: Latest Firefox has *local* (yes all in your browser) translation of web pages.
# 14:29 
[tantek] personalcomputing++
# 14:29 
Loqi personalcomputing has 1 karma over the last year
# 14:31 
[schmarty] Haha thanks tantek (and Loqi). I'm curious to see the translation feature experience!
[tw2113] joined the channel
# 14:54 
[KevinMarks] There is a difference between being crawled by a search engine to help people find your writing, and being crawled by a text synthesis engine to average your writing with everyone else's and regurgitate an amalgam of it without crediting or linking to any of you.
# 14:56 
capjamesg I want that on a t-shirt.
# 15:12 
[tantek] [KevinMarks] I'm specifically talking about deliberately influencing the latter use-case.
# 15:13 
[tantek] capjamesg, too much text for a t-shirt
[aciccarello] joined the channel
# 15:31 
[aciccarello] I look forward to our AIs explaining how bad AI is.
# 15:33 
[tantek] the more you blog about how bad AI is, the more you train AIs to better explain how bad it is
# 15:34 
[aciccarello] ^this
# 15:38 
[schmarty] the "reinforcement learning" squishes what the AI already knows about how bad it is. GPT-3 (i think) was great at roleplaying dystopian AI horror until un/underpaid humans labeled enough data to curb that behavior.
[timothy_chambe], [KevinMarks], [catgirlinspace], [pfefferle], saptaks, gRegor, rocto, AramZS and ajr joined the channel
# 18:56 
capjamesg GWG What way forward do you have in mind?
# 18:57 
capjamesg (for the logs: this is re: TicketAuth)
# 18:58 
GWG capjamesg: First, we need to do a bit of gardening on the wiki page for the spec. [schmarty] and angelo both gave feedback suggesting we need more clarity on a few things. I want to give that a go, but would like a partner.
# 18:59 
capjamesg I'm happy to read over anything you change.
# 19:00 
GWG Basically, [schmarty] pointed out the need for better user stories, and observationally, we need to enhance the "Why Ticket Auth" explanation on the page.
# 19:00 
GWG If we don't... I think it may hinder people being interested in building implementations
# 19:00 
capjamesg One thing that comes to mind is how Instagram works.
# 19:00 
GWG Go on
# 19:00 
capjamesg You can have BFFs who are able to see some Stories that nobody else can.
# 19:01 
capjamesg We don't have a way to do that on personal websites.
# 19:01 
GWG capjamesg: That is a user story that is an example of "Alice wants to see Bob's private stuff"
# 19:01 
capjamesg That opens up a lot of opportunities for creative expression.
# 19:02 
capjamesg I think "Alice wants to see Bob's private stuff" is too vague?
# 19:02 
GWG Although every time I write private stuff I feel like there should be a less suggestive generic term
# 19:02 
capjamesg Alice had a weird experience that she wants to share with her friends, but doesn't want to be publicly available.
# 19:02 
capjamesg Then Alice had a weird experience she wants to share with Bob.
# 19:02 
capjamesg I think 1:1 is likely to be less interesting than group distribution.
# 19:03 
capjamesg Alice may write a post for Bob but then we get into the talk about how does this compare to DMs.
# 19:03 
GWG Well, the spec doesn't care what you are sharing
# 19:04 
capjamesg The spec doesn't care, but we should have auxiliary info that shares some detailed use cases.
# 19:04 
GWG But I think we leaned into that a bit too much
# 19:04 
aaronpk The Instagram use case is creating a "circle" of friends and sharing with that subset of people instead of publicly
# 19:04 
capjamesg The two user stories I see right now are Alice has something she wants to share with Bob, privately, and Alice has something she wants to share with a group of people, privately.
# 19:05 
GWG aaronpk: I may ask your opinion as an experienced explainer of auth stuff after I take a crack at it
# 19:05 
aaronpk I am also interested in solving this because I am about this close to making most of the content on my site non-public
# 19:05 
GWG I know you are out and about this week.
# 19:05 
GWG So that gives me time
# 19:06 
capjamesg aaronpk If you are willing to share more I'd love to know your thought process (no pressure).
# 19:06 
capjamesg GWG I think one thing that needs to be documented is ideal consumers too.
# 19:07 
capjamesg Feed readers are a big one.
# 19:10 
aaronpk Does Instagram show the user viewing a post that a post is audience limited?
# 19:10 
aaronpk that's going to be important too
# 19:12 
capjamesg In the BFF Story case, yes.
# 19:12 
gRegor I know it does with stories. Green circle on the profile if there are new friends-only stories
# 19:12 
capjamesg ^^
# 19:12 
gRegor and the individual story has a green... checkmark I think?
# 19:12 
capjamesg Yep.
# 19:12 
gRegor some green icon
# 19:12 
gRegor what is story
# 19:12 
Loqi A story is a singular (one per profile) time stream collection post, that consists of ephemeral photo and video posts that are shown in sequence one at a time and disappear from the collection some time after being added, usually 24 hours https://indieweb.org/story
# 19:18 
[snarfed] hey [manton], following up from yesterday, any chance you have an example of a micro.blog post that successfully mentions someone in the fediverse?
# 19:25 
[tantek] 👀
# 19:37 
[snarfed] so indieweb mentions are just links, right? maybe with class=h-card but not required...? https://indieweb.org/mention
# 19:37 
[snarfed] did we ever figure out a heuristic for identifying fediverse mentions in an indieweb post?
# 19:39 
[snarfed] I guess maybe any linked @-@?
# 19:41 
[tantek] just links [snarfed], in this case links to profiles. the "is it a mention of a person" question is answered by, "is the destination of the link a person"
# 19:41 
[tantek] what is a homepage mention
# 19:41 
Loqi person mention is a homepage webmention sent to a person's homepage https://indieweb.org/homepage_mention
# 19:42 
[snarfed] yes, that's the indieweb-native definition. I'm hoping for a heuristic to identify *fediverse* mentions that's a bit lighter than, try to fetch every single link in a post as AS2
# 19:43 
[tantek] that's the IndieWeb perspective, and I need to figure out how to scope / define the broader "@-mention" across IndieWeb and silos or little Chads garages
# 19:43 
[tantek] oh interesting
# 19:43 
[tantek] hmm what is my auto-linker doing today
# 19:43 
[tantek] might as well check an existing example 🙂
# 19:47 
[KevinMarks] SocialWG job done: https://corteximplant.com/@0x0/111222855349470100
# 19:48 
Loqi [preview] [Aumetra Ⓐ :nonbinary:] The best proof that ActivityPub doesn't need JSON-LD: I have written three or so implementations and I still got no real clue how the fuck JSON-LD works
# 19:50 
[snarfed] omg the comments are hilarious
# 19:50 
[snarfed] "JSONld is really only useful with extremely large amounts of data or web sockets or something" 😂😂
# 19:50 
[schmarty] 🍿
# 19:51 
[tantek] 🍿
# 19:51 
capjamesg Haha
# 19:57 
[tantek] [snarfed] I'm not doing anything special with the auto-link markup (other than class="autolink") for @-mentions, whether @-name (Twitter) or @-domain (https link to that domain) or @-@ (https link to domain/@user ), though I could 🤷‍♂️
# 19:57 
[tantek] I'm hesitant to though because I feel "extra markup to make it a mention" will be one more thing publishers could get wrong and thus more fragile
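The three @-mention shapes named above (@-name, @-domain, @-@) can be told apart from the mention text alone. The sketch below is one possible reading, not Tantek's actual auto-linker: the function name `classify_at_mention`, the regular expressions, and the label strings are all assumptions. It builds link targets only for the two cases where the message states them (an https link to the domain, and an https link to domain/@user); the bare @-name case returns no URL because the target depends on the silo.

```python
import re

# Hypothetical patterns; real usernames and domains may need looser rules.
DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
AT_AT = re.compile(rf"^@([A-Za-z0-9_.-]+)@({DOMAIN})$")
AT_DOMAIN = re.compile(rf"^@({DOMAIN})$")
AT_NAME = re.compile(r"^@([A-Za-z0-9_]+)$")


def classify_at_mention(text):
    """Return (kind, url) for an @-mention string, or None if it is not one."""
    match = AT_AT.match(text)
    if match:
        user, domain = match.groups()
        return ("at-at", f"https://{domain}/@{user}")
    match = AT_DOMAIN.match(text)
    if match:
        return ("at-domain", f"https://{match.group(1)}/")
    if AT_NAME.match(text):
        # Silo-style name: the link target is left to the caller.
        return ("at-name", None)
    return None


if __name__ == "__main__":
    for sample in ["@alice@example.social", "@example.com", "@alice", "hello"]:
        print(sample, classify_at_mention(sample))
```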
# 19:59 
[snarfed] yup
# 19:59 
[tantek] [snarfed] here's a really simple suggested heuristic: if the linktext starts with '@' then assume it's an @-mention of some kind (and you have no idea what kind the destination accepts without doing discovery on that @-link first).
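A minimal sketch of that heuristic, using only Python's standard `html.parser`: collect every link whose text starts with '@' and treat it as a candidate @-mention. The class name `AtMentionFinder`, the helper `find_at_mentions`, and the sample URLs are hypothetical. As noted in the message, the sketch cannot tell what kind of mention the destination accepts; that would still need discovery on each link.

```python
from html.parser import HTMLParser


class AtMentionFinder(HTMLParser):
    """Collect (link text, href) pairs for links whose text starts with '@'."""

    def __init__(self):
        super().__init__()
        self.mentions = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text.startswith("@"):
                self.mentions.append((text, self._href))
            self._href = None


def find_at_mentions(html):
    finder = AtMentionFinder()
    finder.feed(html)
    finder.close()
    return finder.mentions


if __name__ == "__main__":
    post = ('Hi <a href="https://example.social/@alice">@alice@example.social</a>, '
            'see <a href="https://example.com/">a plain link</a>.')
    print(find_at_mentions(post))
```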
# 19:59 
[snarfed] yup
# 20:00 
[tantek] here is a list of @-mention patterns I came up with for publishers, in case that helps your thinking/coding on this: https://tantek.com/2023/018/t1/elevate-indieweb-above-silo
# 20:00 
Loqi [preview] [Tantek Çelik] The answer to “How should you @-mention someone you are replying to?”¹ can depend on many factors or a few. Context matters somewhat, sometimes.
We can simplify it to 2 questions, based on 1 directive:
* Elevate #IndieWeb domains, above any si...
gRegorLove_ and kleb joined the channel