#dev 2023-06-02

2023-06-02 UTC
gRegorLove_ joined the channel
#
[tantek]
Direct server-local link: https://mozilla.social/@jamesid/110470858327528256 — and I strongly encourage anyone that this appeals to apply! (feel free to DM me if you have any questions)
#
superkuh
All that wasted tech and attack surface just to replicate a fraction of what .html+cached images can do.
#
epoch
dunno if anyone here saw yet, but Web Share API became w3c recommendation. https://www.w3.org/blog/news/archives/9931
#
Loqi
[preview] The Web Applications Working Group has published Web Share API as a W3C Recommendation. This specification defines an API for sharing text, links and other content to an arbitrary destination of the user’s choice. The available share targets are no...
eitilt1 joined the channel
#
epoch
updates the wiki page about it
gerben, IWSlackGateway, [tantek], bterry, Xe, gRegor, srushe, vikanezrimaya, ancarda, eb, capjamesg, alecjonathon, wagle, holiday_medley and [KevinMarks] joined the channel
#
capjamesg
I need some security expertise.
#
capjamesg
I have built a tool (spa.js) that lets you turn a website into a SPA.
[Murray] joined the channel
#
capjamesg
I was going to run an eval() on all JS that the script finds on your site so that JS on different pages can load when you click a link.
#
vladimyr
Not an expert but I can try :)
#
capjamesg
I can assume on my personal website that the JS is trusted since I put it there. But if a site has subpaths run by different services, spa.js may present problems?
#
capjamesg
Suppose I run jamesg.blog and I have jamesg.blog/wp/ that runs WordPress. If my WP is compromised and I have a SPA that runs scripts on the site, the parent jamesg.blog could also be compromised since it could load JS from /wp/.
#
[KevinMarks]
there's a lot of careful sandboxing and execution contexts involved
#
vladimyr
What are you trying to achieve? Pre-execute scripts from other pages?
#
vladimyr
May I ask why?
#
vladimyr
I mean what's the motivation behind it
#
capjamesg
I want my website to be a SPA. But some pages have scripts that aren't rendered on every page. For example, my maps pages have scripts that only render on maps pages.
#
capjamesg
If the site is a SPA, I need a way to execute those scripts when I go on a map page, otherwise the page will not render.
#
capjamesg
[KevinMarks] say more?
#
capjamesg
Whitelisted route handling?
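A minimal sketch of what allowlisted route handling might look like, assuming the SPA script decides per link whether to adopt a page's scripts. The function name `isTrustedRoute` and the prefix list are hypothetical and not taken from spa.js. This is only a policy about what the script chooses to execute: `/wp/` still shares the origin with the rest of the site, so this does not sandbox it.

```typescript
// Hypothetical helper: adopt scripts only from same-origin pages whose
// path starts with a prefix the site owner has explicitly trusted.
function isTrustedRoute(
  target: string,
  base: string,
  allowedPrefixes: string[],
): boolean {
  let url: URL;
  try {
    // Resolving against the base also normalises "../" segments.
    url = new URL(target, base);
  } catch {
    return false; // unparseable link: fall back to a normal navigation
  }
  if (url.origin !== new URL(base).origin) return false;
  return allowedPrefixes.some((prefix) => url.pathname.startsWith(prefix));
}
```

A link that fails the check would simply be left to the browser as a full page load.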
#
vladimyr
capjamesg: security-wise there is no issue in executing scripts from other pages you host, because the attack surface is exactly the same as if you had navigated to that page
#
vladimyr
Also you just reinvented pjax
#
vladimyr
^ don't eval scripts, find all script tags and clone/recreate them inside host document
#
capjamesg
What about scripts without an src? <script>...</script>
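A rough sketch of the "recreate the script tags" approach, covering both external and inline scripts. Scripts that arrive via `innerHTML` or `DOMParser` are not executed, while a freshly created script element runs once it is attached to the document. The names `ScriptLike` and `adoptScripts` are made up for illustration, and the element factory and append step are passed in so the logic can be exercised outside a browser.

```typescript
// Minimal shape of a script element; a real HTMLScriptElement has these fields.
interface ScriptLike {
  src: string;
  type: string;
  textContent: string | null;
}

// Recreate each script found in a fetched page inside the host document.
// Returns how many scripts were added.
function adoptScripts(
  found: ScriptLike[],
  loadedSrcs: Set<string>,
  create: () => ScriptLike,
  append: (script: ScriptLike) => void,
): number {
  let added = 0;
  for (const old of found) {
    if (old.src) {
      if (loadedSrcs.has(old.src)) continue; // external script already loaded
      loadedSrcs.add(old.src);
    }
    const fresh = create();
    if (old.type) fresh.type = old.type;
    if (old.src) {
      fresh.src = old.src;
    } else {
      fresh.textContent = old.textContent; // inline <script>...</script>
    }
    append(fresh);
    added++;
  }
  return added;
}

// In a browser this might be wired up roughly as:
//   adoptScripts(Array.from(fetchedDoc.querySelectorAll("script")), loaded,
//     () => document.createElement("script"),
//     (s) => document.head.appendChild(s as HTMLScriptElement));
```

Inline scripts are re-run on every visit in this sketch, since there is no URL to de-duplicate them by.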
#
Loqi
SPA has -1 karma over the last year
[jacky] joined the channel
#
[jacky]
I have the wild idea of moving my filesystem for my site's backing into WebDav instead of some object storage setup
#
[jacky]
mainly to use something based on community standards
#
[jacky]
but there's an added benefit where I can slide in RSVPs and events automatically into a calendar
gnoo and Seirdy joined the channel
#
IWDiscordRelay
<c​apjamesg#4492> Why [tantek]?
#
vladimyr
Doesn't work with disabled js, transfers way more bytes over the wire than is usually necessary, often breaks the back button, and at the end of the day not everything needs to be an app
#
vladimyr
Nothing wrong with plain old sites
#
[tantek]
vladimyr++
#
Loqi
vladimyr has 1 karma over the last year
#
[tantek]
what is a SPA
#
Loqi
A single-page application (SPA) is a web application or web site that fits on a single web page with the goal of providing a user experience similar to that of a desktop application https://indieweb.org/SPA
#
aaronpk
SPAs are best only if you actually need the features an SPA provides, which, it turns out, is not very often
[schmarty] joined the channel
#
[schmarty]
capjamesg: why do you want to make your site function as an SPA?
#
[schmarty]
several features of SPAs are available to regular websites!
#
aaronpk
Now I'm curious what you're referring to!
gRegor joined the channel
#
[tantek]
with decent service workers / offline support, you don't need to do anything "SPA" for like 99% of use-cases
#
[jacky]
htmx supremacy (haha)
#
[tantek]
proof by example: here's a "SPA" that needs zero SPA-tech: https://asin.cc/
#
Loqi
[preview] Tantek Çelik
#
[tantek]
yes it's very simple yet I think demonstrates the point
#
[tantek]
lol maybe I need an h-app there so Loqi has something better to say
#
[tantek]
I suppose I should make it "installable" now that iOS supports that
#
[schmarty]
yeah service workers cover many big features like offline and caching. adactio has done some good posts on this, including how one apparent pro-SPA argument, "i need smooth transitions between pages", is getting browser support that should also work across pages. https://adactio.com/tags/spas
#
Loqi
[preview] [Jeremy Keith] Rich Harris: Hot takes on the web 🌶️ - YouTube I don’t agree with all of these takes-of-varying-spiciness, but Rich Harris is always worth paying attention to.
#
vladimyr
[tantek]: this thing still has code on the backend too? I don't see where you process the asin query param clientside 🤔
#
[tantek]
vladimyr, yes it has literally the same computational code on the backend as the frontend. On the clientside (with JS) there is no need to generate a query param so there is no need to "process" it. The frontend short circuits that and just handles the form interaction directly and produces a result.
moose333 joined the channel
#
vladimyr
Same way that you've shortcircuited form submission you could read query param clientside and always do computation locally? 💡
#
vladimyr
Makes it truly offline first 🙃
#
[tantek]
inspired by petermolnar asking in #indieweb-chat do we need a "notrain" in addition to "noindex" for meta robots / robots.txt to communicate that site / content MUST not be used for training AI models?
#
[tantek]
though "notrain" may need disambiguation from "not rain"
#
[tantek]
or a better suggestion
#
[schmarty]
we have an RFC for websites that are teapots now let's figure out which websites are "rain"
#
[tantek]
vladimyr, read query param clientside could be interesting for "completeness" just to show it could be done, however it's not something you'd likely run into as the only UI path to generate a query param URL is to use the site w/o JS, and then load that query param URL sometime later *with* JS enabled
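Handling the query param on the client could look roughly like the sketch below. Only the `asin` parameter name comes from the conversation. The function name `resultFromQuery` is hypothetical, and the site's actual computation is not described here, so it is passed in as a placeholder `compute` callback.

```typescript
// Read the "asin" query parameter and run the same computation the form
// handler would, so a no-JS result URL also works when loaded with JS.
function resultFromQuery(
  search: string,
  compute: (input: string) => string, // placeholder for the site's own logic
): string | null {
  const value = new URLSearchParams(search).get("asin");
  if (value === null || value.trim() === "") return null;
  return compute(value.trim());
}

// In a browser the input would typically be window.location.search.
```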
#
[schmarty]
i wonder how many ai training sets blatantly contain material that is marked as denied by robots.txt or noindex. a "new" spec without some kind of enforcement weight doesn't feel particularly useful given we already have tools.
#
gRegor
My first thought was "but I like 🚆" until I got to the AI bit
#
[tantek]
lol yes needs new name
#
vladimyr
[tantek]: what about bookmarklet?
#
[tantek]
vladimyr a bookmarklet can construct the url with "?asin="
#
vladimyr
That sounds like DNT happening all over again
#
[tantek]
without* "?asin="
#
gRegor
What is DNT
#
Loqi
Do Not Track (also known as DNT) is a technology and policy proposal that enables users to opt out of tracking by websites they do not visit, including analytics services, advertising networks, and social platforms https://indieweb.org/DNT
#
[tantek]
vladimyr, except that many companies that are training AIs already support robots.txt / noindex in general (in their crawlers, apps etc.) e.g. Google supports it both for search and things like Google Calendar crawling/subscribing
#
[tantek]
not all sure, but you'd cover the "big ones" at least
#
[tantek]
it would be a good incremental start
#
[schmarty]
what is Global Privacy Control?
#
Loqi
It looks like we don't have a page for "Global Privacy Control" yet. Would you like to create it? (Or just say "Global Privacy Control is ____", a sentence describing the term)
#
[tantek]
vladimyr, the suggestion is still interesting to me (to handle the query param client side explicitly as a "just in case"). I created a stub repo feel free to file an issue requesting that! https://github.com/tantek/asin.cc/issues
#
[schmarty]
DNT << [https://globalprivacycontrol.org/ Global Privacy Control] (GPC) is a remarkably similar more recent spec.
#
Loqi
ok, I added "[https://globalprivacycontrol.org/ Global Privacy Control] (GPC) is a remarkably similar more recent spec." to the "See Also" section of /Do_Not_Track https://indieweb.org/wiki/index.php?diff=88131&oldid=68148
#
gRegor
I see on there that Reddit dropped DNT support. Did it get dropped more widely or am I misremembering? Need to update the dfn?
#
gRegor
According to enwp, it doesn't have wide adoption. Apple discontinued it and Firefox still supports it in private browsing mode
#
IWDiscordRelay
<c​apjamesg#4492> [schmarty] I want to allow someone to take a video call while navigating between pages.
#
[tantek]
gRegor, yeah worth updating the DNT page and probably worth creating a GPC page to link to instead / see also
#
[tantek]
DNT is quite dead
#
[schmarty]
i did some reading on this when we implemented GPC at work for CCPA compliance. it seems like most sites just ignore DNT now for several reasons but mostly that it doesn't have teeth and the surveillance ad industry was like "what are you gonna do about it?"
#
[schmarty]
technically speaking GPC is nearly identical to DNT
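To illustrate how similar the two signals are on the wire: DNT is sent as a `DNT: 1` request header and GPC as `Sec-GPC: 1`. A server-side check for both might be sketched as follows. The names `HeaderMap`, `OptOutSignals` and `readOptOutSignals` are illustrative, and what a site then does with the signal is a policy question outside this sketch.

```typescript
type HeaderMap = Record<string, string | undefined>;

interface OptOutSignals {
  gpc: boolean; // Sec-GPC: 1
  dnt: boolean; // DNT: 1
}

// Header names are case-insensitive, so normalise before looking them up.
function readOptOutSignals(headers: HeaderMap): OptOutSignals {
  const lower: HeaderMap = {};
  for (const [name, value] of Object.entries(headers)) {
    lower[name.toLowerCase()] = value;
  }
  return {
    gpc: lower["sec-gpc"]?.trim() === "1",
    dnt: lower["dnt"]?.trim() === "1",
  };
}
```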
#
gRegor
I'll let someone else more familiar give it a go. Linking to enwp could be useful too for more details.
#
[schmarty]
capjamesg: good luck; have fun! :zany_face:
#
IWDiscordRelay
<c​apjamesg#4492> angelo has this working!
#
[tantek]
has what working?
#
IWDiscordRelay
<c​apjamesg#4492> Video conferencing across pages via SPA.
#
[tantek]
the first time I saw that working was in another discord server. it was kinda surreal tbh
#
[tantek]
so that's a good 1% use-case then 🙂
#
[tantek]
SPA << Use-case: continuous uninterrupted seamless video conferencing even when navigating across "pages" on the same site
#
Loqi
ok, I added "Use-case: continuous uninterrupted seamless video conferencing even when navigating across "pages" on the same site" to the "See Also" section of /single-page_application https://indieweb.org/wiki/index.php?diff=88133&oldid=88127
#
vladimyr
To broaden [schmarty]'s point here is what ublock origin's author says about GPC and possibility of ublock sending GPC signal by default https://reddit.com/comments/qlj2kl/comment/hj5j1zn?context=3
#
[tantek]
well well well there's apparently an ai.txt proposal for this
#
[tantek]
nope nope nope. P3P--
#
Loqi
P3P has -1 karma over the last year
[snarfed] joined the channel
#
[snarfed]
never heard of GPC, but it sounds like P3P from ages ago?
#
[tantek]
GPC has zero RDF
#
Loqi
[schmarty]: lol
#
[schmarty]
sorry, i worked on a P3P parsing project in academia years ago back when search engine API terms of service didn't forbid you from reranking search results.
#
[schmarty]
i don't recall even once encountering "RDF". at best sites just published a single `P3P` HTTP header with short tokens that we used as flags to indicate things they collected/shared.
#
[tantek]
lol ai.txt cook for your self fail
#
[schmarty]
(mostly sites just published an invalid P3P header so that IE would choke on it and allow third-party cookies)
#
[tantek]
well-known << ai.txt proposer ^ fails to implement it themselves, https://site.spawning.ai/ai.txt returns a 404
#
Loqi
ok, I added "ai.txt proposer ^ fails to implement it themselves, https://site.spawning.ai/ai.txt returns a 404" to the "See Also" section of /well-known https://indieweb.org/wiki/index.php?diff=88135&oldid=88134
#
gRegor
Are we sure an AI didn't make that site?
#
[tantek]
well-known << another citation for launch of ai.txt: https://twitter.com/spawning_/status/1663635132761219073
#
[schmarty]
the arguments around copyright and licensing to restrict use in AI training feel very muddy to me right now. i am not aware of anything like what petermolnar asked for, but i'd love a copyleft "if you train an AI on my data then you also have to release your training set and trained models freely" or other "poison pill" licenses. 😈
#
[snarfed]
in practice this sounds very vague and underdefined. spam filters are models. search indexing and ranking are models. link previews that guess which parts of your page are title and content are models. (meta tags notwithstanding.) locking them all out sounds like...overkill
#
[tantek]
right snarfed, there's "noindex" specifically for indexing, and thus it seems reasonable to pick something similarly discrete (specific) for "no LLM training" or perhaps "no neural net training" to generalize for media
#
[tantek]
that may impact some spam filters, though unlikely to impact all
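As a sketch of how a crawler could honor such a directive next to the existing ones: the code below parses a meta robots `content` value into tokens. `noindex` and `none` are existing directives, while `notrain` is only the placeholder name floated in this conversation and is not part of any spec. All function names are hypothetical.

```typescript
// Split a meta robots content value into lowercase directive tokens.
function robotsDirectives(metaContent: string): Set<string> {
  return new Set(
    metaContent
      .split(",")
      .map((token) => token.trim().toLowerCase())
      .filter((token) => token.length > 0),
  );
}

// Existing behaviour: "noindex" (or "none") means do not index the page.
function mayIndex(metaContent: string): boolean {
  const directives = robotsDirectives(metaContent);
  return !directives.has("noindex") && !directives.has("none");
}

// Hypothetical: "notrain" would mean do not use the page as training data.
function mayTrainOn(metaContent: string): boolean {
  return !robotsDirectives(metaContent).has("notrain");
}
```

Keeping the two checks separate mirrors the point that indexing and model training are distinct uses that a site might allow or deny independently.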
#
petermolnar
I just want to lock out generative ais
#
petermolnar
Poisoning does sound fun, but locking out is better.
#
[tantek]
presumably for text, images, all media/content types?
#
[tantek]
anyone know if there is or has been a proposed "follow your nose" alternative to robots.txt? bonus points if there are any implementations
holiday_medley joined the channel
#
superkuh
I had an "anthropic ai" useragent spend 3-4 days last week mirroring my entire site. I threw in some documents specially for them.
#
superkuh
Came from amazon ec2 ip space.
#
[tantek]
superkuh you've seen this right ... ?
#
[tantek]
what is chicken
#
Loqi
🐔 chicken is a type of post supported by idno https://indieweb.org/chicken
#
[jacky]
ooh that's a good idea
#
[jacky]
tbh it's mean but I'd 100% make my site respond to known bad user agents
#
[jacky]
okay nvm about adding WebDav support
#
gRegor
❌ WebDav ✔️ chicken
holiday_medley joined the channel
#
[tantek]
^ needs Drake meme
#
gRegor
Considered it, but was lazy :D
mouse[d], IWDiscordRelay, petermolnar, holiday_medley and rubenwardy joined the channel
#
[tantek]
hence see #indieweb-chat 🙂
geoffo, mouse[d], capjamesg[d], Dr_DinoMight[d], rattroupe[d], kongaloosh[d], Silicon[d], shaunix[d], tracydurnell[d], darylsun[d], Favicon[d], IWDiscord, aaronpk[d], ms_boba[d], eddev[d], Grayson[d], IndieWebCamp_IRC, jacky[d], Ramon[d], Murray[d], Nezteb[d], angelo_, xyzzy[d], DJ_[dj_je][d], steinaech[d] and isellsoap[d] joined the channel
#
aaronpk
capjamesg: one of my thoughts that's been in the back of my mind for a while is to automatically generate a slug for my posts. Right now, I've been finding myself manually choosing slugs that are 1-2 words, based on the most significant words in the post
#
aaronpk
I noticed your tagline on linguist.link is "Find the most surprising words and most common n-grams on a web page" which is similar to how i've been thinking about doing it
#
aaronpk
i think my rough heuristic is: use a hashtag for the post if there's a hashtag in the text. then use the most unique proper noun.
#
aaronpk
if no proper noun, then use the most unique term
#
aaronpk
this searches only within the page right now tho right?
#
aaronpk
right, so my thought is to use the entire set of my posts as the search space
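The heuristic described above (hashtag first, then the most unique proper noun, then the most unique term, measured against all posts) could be sketched like this. "Most unique" is interpreted here as appearing in the fewest posts, and the capitalization check is a crude stand-in for real proper noun tagging. All names are hypothetical.

```typescript
function words(text: string): string[] {
  return text.match(/[A-Za-z][A-Za-z'-]*/g) ?? [];
}

// Count, for each lowercased word, how many posts contain it.
function documentFrequency(posts: string[]): Map<string, number> {
  const df = new Map<string, number>();
  for (const post of posts) {
    const seen = new Set(words(post).map((w) => w.toLowerCase()));
    seen.forEach((w) => df.set(w, (df.get(w) ?? 0) + 1));
  }
  return df;
}

// Crude proper noun guess: a capitalized word that does not start a sentence.
function properNouns(text: string): string[] {
  const found: string[] = [];
  const re = /\b[A-Z][a-z]+\b/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const before = text.slice(0, m.index).replace(/\s+$/, "");
    const sentenceStart = before === "" || /[.!?]$/.test(before);
    if (!sentenceStart) found.push(m[0]);
  }
  return found;
}

// Pick the candidate found in the fewest posts; ties go to the earliest.
function rarest(candidates: string[], df: Map<string, number>): string | null {
  let best: string | null = null;
  let bestCount = Infinity;
  for (const c of candidates) {
    const count = df.get(c.toLowerCase()) ?? 0;
    if (count < bestCount) {
      best = c;
      bestCount = count;
    }
  }
  return best;
}

function suggestSlug(post: string, allPosts: string[]): string | null {
  const hashtag = /#([A-Za-z0-9_]+)/.exec(post);
  if (hashtag) return hashtag[1].toLowerCase();
  const df = documentFrequency(allPosts);
  const pick =
    rarest(properNouns(post), df) ??
    rarest(words(post).filter((w) => w.length > 3), df);
  return pick ? pick.toLowerCase() : null;
}
```

The fallback to the rarest ordinary term can still surface a word that is rare without being representative of the post, which is the main weakness of this approach.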
#
capjamesg
It didn't work too well.
#
capjamesg
"younger" for example was seen as a surprising word because I rarely used it.
#
capjamesg
It was a meaningful word in context, but not enough to be considered representative of the post content.
#
aaronpk
what if you prioritized searching only proper nouns first?
#
capjamesg
That could work.
#
capjamesg
I haven't tried that.
#
capjamesg
nltk does proper noun identification.
#
capjamesg
Process text -> find proper nouns -> chunk them (so "taylor" and "swift" become one, rather than two separate values), then use the most common proper nouns?
#
capjamesg
(brainstorming)
#
aaronpk
i've been thinking about this for a while, because basically every time i post something, i choose a slug for it, so i've been paying attention to my own rules I use for finding a slug that I want for the post
#
aaronpk
which actually sounds like an interesting ML training data set
geoffo joined the channel
#
vladimyr
A few days ago there was some chat about enwp.org. Not sure how well known it is, but there is a similar service for MDN docs: mdn.io (source code: https://github.com/lazd/mdn.io)
#
Loqi
[preview] [lazd] mdn.io: The "I'm feeling lucky" URL shortener
#
capjamesg
Oh that's nice!
#
vladimyr
Lifesaver for quick api doc refreshers
#
[Murray]
[capjamesg] on the off chance they're useful, RE: SPA-like abilities without a full SPA framework, I remember being intrigued by Turbo and Hotwire: https://turbo.hotwired.dev/ Not sure they work for your use-case, but may have some interesting ideas
#
[Murray]
Oh wow I had not heard about this, definitely interesting!
ahappydeath, [benatwork] and gxt__ joined the channel
#
[tantek]
what is Mavo
#
Loqi
It looks like we don't have a page for "Mavo" yet. Would you like to create it? (Or just say "Mavo is ____", a sentence describing the term)
#
[tantek]
huh I thought we had a page on it
#
[tantek]
Mavo is definitely very interesting & clever. I don't know if it got much adoption / deployment though. Also the test suite link in the footer returns a 404
IWSlackGateway joined the channel