#dev 2024-06-12

2024-06-12 UTC
Salt_, AramZS, geoffo, timmarinin, eitilt, thepaperpilot, sp1ff, [jeremycherfas], [qubyte], chimo, sadome, bret, nertzy, [schmarty], barnaby, srijan, sebbu and thepaperpilot_ joined the channel
#
[snarfed]
thepaperpilot I'd be curious what you think of the fediverse and other similar decentralized social networks, since they're often based on "rehosting" architecturally
#
[snarfed]
when you post on Mastodon, your instance sends your post to all of your followers' instances, which rehost it for those followers to see
mahboubine joined the channel
#
[snarfed]
verifiability is important, agreed! in the indieweb, anyone can verify any original content, eg a post that's getting reposted, by fetching it from its source over SSL
mahboubine joined the channel
#
[snarfed]
that makes "rehosted" content verifiable with just HTTPS, which is obviously widely implemented, so we don't need a new signature or other authentication scheme that would be a tough adoption challenge
#
thepaperpilot
Yeah, the fediverse is actually core to my motivations behind wanting to write this spec. I'll clarify in a longer form doc once I'm at home
mahboubine joined the channel
#
[snarfed]
oh and https://docs.joinmastodon.org/spec/security/#ld-sign , which is actually in semi-broad usage in the fediverse already, unlike those two FEPs which afaik are mostly hypothetical
Salt joined the channel
#
thepaperpilot
Thanks. To get into it a bit now, my motivations are actually coming from a place of a client based (rather than server based) re-envisioning of the fediverse, where people locally sign things and send them off to as many servers as they can, rather than sending something to one specific server that they've made an account with and effectively attached their identity to. Under this new system, personal websites could also b
#
aaronpk
sounds like you want scuttlebutt https://scuttlebutt.nz/docs/protocol/
#
thepaperpilot
Actually talking about something in development called "weird"
#
Salt
I thought scuttlebutt was mostly shuttered
#
Loqi
Salt: tantek left you a message on 2017-06-06 at 1:34am UTC: added a bunch more proposed Leaders Summit sessions for your consideration - please take a look and add interested (or not) notes, or other suggestions! https://indieweb.org/2017/Leaders#Sessions
#
Salt
tantek++
#
Loqi
tantek has 23 karma in this channel over the last year (104 in all channels)
#
capjamesg[d]
Wow, a message from 2017!
#
Salt
right?!
#
aaronpk
Loqi never forgets
#
Salt
(my irc bouncer went down around then and only just got my matrix bridge connected to these channels)
#
[snarfed]
thepaperpilot cool! https://codeberg.org/fediverse/fep/src/branch/main/fep/ae97/fep-ae97.md discusses client signing in the fediverse, but activitypub's use of URLs for actor ids does tie fediverse users pretty inextricably to specific instances
#
[snarfed]
aaronpk is right that other protocols are more like what you describe. beyond SSB, which is overly p2p (to its detriment), Nostr and Farcaster in particular have the architecture you describe. key-based identity, client signing, data is sent to multiple servers ("relays," "hubs")
#
thepaperpilot
Yeah, which I see as a problem. Most people can't or won't self host, so they'll end up needing to pick a server, which adds massive friction that centralized social media doesn't have, and reintroduces a lot of the problems centralization has. The re-envisioning would let you download an app, pick a display name, and you're good to go. No instance to pick, no password to set.
#
thepaperpilot
Nostr is very similar for sure. Haven't heard of farcaster though
#
thepaperpilot
Hmm, farcaster looks like it still has the pick a server problem
#
[snarfed]
does it? I thought you broadcast to multiple hubs in Farcaster, not just one
#
thepaperpilot
Perhaps I need to continue looking into it, but it sounded like you're still making an account with a single server, attaching your identity to it
#
[snarfed]
I thought not, but I'm not sure either, will look
#
[tantek]
thepaperpilot it sounds like you're exploring some pretty cool (user-friendlier than status quo) ideas! appreciate that. thepaperpilot++
#
Loqi
thepaperpilot has 1 karma over the last year
#
[tantek]
[snarfed] it may be worth distinguishing "local rehosting" (which IMO implies a degree of /longevity) vs "local caching", e.g. in the context of your statement that "instance sends your post to all of your followers' instances, which rehost it" — do Masto instances really rehost remote content "forever"? Or do they merely "cache" remote content for some period of time, and expire/abandon it eventually?
#
[tantek]
Like I don't believe every Masto instance caches images from every remote host. That wouldn't be sustainable. Or maybe that explains various Masto instance shutdowns, when an instance exceeds sustainable local hosting setups (without costing the instance admin a bunch more money than they're willing to put out regularly for a hobby)
#
thepaperpilot
My understanding is the posts themselves truly are fully replicated and never expire, but images typically just remain links to the original source, or are cached on each instance
#
thepaperpilot
Although I suspect photo and video centric platforms, like pixelfed, may also replicate the media as well
#
[snarfed]
definitely! there are clearly degrees of "rehosting." web and browser caches too, etc
#
[snarfed]
some Mastodon instances do rehost/cache remote media locally. maybe some other projects' too, not sure
#
[tantek]
to me (re)hosting implies a "forever" permalink, that is, it's there until the user deletes it. whereas "cache" implies it will be deleted whenever the code running decides to for whatever reason, and is dependent on being able to re-retrieve from an external source the thing that was "cached" if it's requested in the future
#
[tantek]
so there's a pretty firm boundary between the two. computer commits to keeping it until user removes it vs computer does not commit to keeping it around, and will delete it as necessary for other uses
gRegor joined the channel
#
thepaperpilot
For writing up the proposal, I was thinking about how to handle declaring what content is actually being signed. I think it would be best to have it sign the raw html, meaning if any replicator transforms the content, they'll need to somehow also provide the original html to clients so they can independently verify the signature. The signature block naturally could not be contained within the content being signed, so I was
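A minimal sketch of that signing idea, assuming Ed25519 via the `cryptography` package (the proposal doesn't name an algorithm): sign the raw HTML bytes exactly as published, and carry the signature as a detached value outside the signed content, so any replicator that transforms the markup must also provide the original HTML for verification.
```python
# Sketch: detached signature over the raw HTML, so the signature block itself
# never sits inside the content being signed. Algorithm choice is an assumption.
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey,
)
from cryptography.exceptions import InvalidSignature

def sign_post(private_key: Ed25519PrivateKey, raw_html: str) -> bytes:
    # Sign the raw HTML bytes exactly as published; any later transformation
    # (sanitising, whitespace changes) will invalidate this signature.
    return private_key.sign(raw_html.encode("utf-8"))

def verify_post(public_key: Ed25519PublicKey, raw_html: str, signature: bytes) -> bool:
    # A client verifies against the *original* HTML, even if a replicator
    # displays a transformed version alongside it.
    try:
        public_key.verify(signature, raw_html.encode("utf-8"))
        return True
    except InvalidSignature:
        return False
```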
timmarinin and geoffo joined the channel
#
[tantek]
part of the challenge here is that if the goal is "this is what the author wrote / intended" then a strict signature approach is too fragile
#
[tantek]
e.g. it fails trivially when an author fixes typos etc.
#
jimwins
Even though this sort of re-hosting is key to the architecture of ActivityPub/Mastodon, it also seems to me to be a constant source of confusion for users when suddenly that re-hosted content is used in a way they want to object to, like being bridged to another network, or whatever it is that Maven is doing.
#
[tantek]
or adjusts whitespace or other decorative markup, which should have no effect on the "is this what the author wrote / intended" question
#
[tantek]
yes, this is a problem with any protocol that rehosts without explicit user consent in the UI
#
[tantek]
users understand caching to some extent, as in stuff that is temporary but goes away pretty quickly. automatic "rehosting" is more of a surprise to users because it violates their expectations about their ownership and control over their published data
#
jimwins
And the temperature goes up quickly when you have a VC-backed company in there, or something that has a whiff of "AI".
#
[tantek]
yes the "will this be used to train an LLM" concern has greatly shifted various debates about rehosting, caching, and even POSSEing
#
thepaperpilot
[tantek]: I think it'd be reasonable to have "rehosters" occasionally re-query the original source to look for updates. Naturally the edited version would have a new signature. You could even include an element within the signed content saying it previously had <old sig> as the signature. That whole thing being signed with the same private key would verify it was an edit by the original author
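A hypothetical illustration of that edit chain, continuing the earlier Ed25519 assumption: the edited content embeds the previous signature inside the newly signed content, so the same public key verifies both the edit and its link to the prior version. The `p-previous-signature` element name is invented for this sketch.
```python
# Sketch: chain an edit to its predecessor by signing content that includes
# the old signature. Element name and encoding are illustrative assumptions.
import base64
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_edit(private_key: Ed25519PrivateKey, new_html: str,
              old_signature: bytes) -> tuple[str, bytes]:
    # Embed the previous signature inside the content, then sign the whole thing
    # with the same private key, proving the edit came from the original author.
    chained_html = (
        f'<data class="p-previous-signature" '
        f'value="{base64.b64encode(old_signature).decode()}"></data>\n{new_html}'
    )
    return chained_html, private_key.sign(chained_html.encode("utf-8"))
```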
#
thepaperpilot
[tantek]: Fully agree. That said, no one would be _required_ to implement the signature block. And you could elsewhere describe what you're okay with people doing with your content, e.g. free for all non-commercial use, except training ai models
#
[tantek]
thepaperpilot yes that sounds like a better approach. also signing + versioning might provide a better (more accurate) UX
#
[tantek]
as in, quotation as of a certain date-time with signature to "prove" it, AND link to the original where any reader can verify the "current" version for themselves
#
[tantek]
that would work well with Webmentions and updates, because you could replace a quotation with the updated version
#
thepaperpilot
For sure, that all sounds like a good idea
#
jimwins
"occasionally re-query" is kind of hiding a lot of problems, I think. Mastodon already has a thundering-herd problem with generating link previews, now every post that someone somehow re-hosts will get bombarded with update checks?
#
thepaperpilot
It could be push based then, similar to webmentions
#
jimwins
That is how ActivityPub works now, to my understanding, when you edit a post, although I'm not sure how much signing goes on. What I think is missing is a peer-to-peer way for servers to pass along updates (signed) so that the source doesn't have to track and notify everyone who might care.
#
thepaperpilot
Right, that's fair. Not sure I have an adequate response to that
#
[tantek]
and that's getting into scuttlebutt territory as [KevinMarks] noted
#
jimwins
Many years ago, there was a rough sketch of an idea that ended up being called "feedmesh". https://trainedmonkey.com/2004/9/9/decentralized_web_site_log__update_notifications_and_content_distribution
#
[tantek]
I wonder if those are worth capturing somewhere on the wiki
#
thepaperpilot
Even polling the original site is something a static site can't really do on its own, so I'm not a huge fan. Perhaps the rehosted content should just say "I'm specifically responding to or liking the version shown here, made at this timestamp. Check the original link for any updated content". It means less redundancy in the most recent revision of the content, though
#
[tantek]
yes this is good advice for all places where you might quote/cite/display content from another site
#
jimwins
Sometimes what you want to be commenting on may be tied to the particular revision. Like if you're commenting on Pew's recent report about "racial conspiracy theories" you probably want to make sure someone can tell your commentary is tied to the initial publication where they used that language instead of whatever they may end up revising it to be.
#
thepaperpilot
Right. In that sense, automatically updating the rehosted copy would be an anti-feature. That simplifies things greatly, then
jonnybarnes joined the channel
#
[KevinMarks]
Though there is the zombie site problem, where an expired domain is replaced by a scam entity which serves some version of the original site but with injected ads, phishing, or crypto mining.
#
[KevinMarks]
C2PA has provisions for editing by first and other parties to indicate what they changed.
#
thepaperpilot
The new site couldn't sign the edited article, unless they somehow got access to the private key
#
thepaperpilot
So the content could still be verified, even if everything around it has been tampered with
#
[tantek]
[KevinMarks] C2PA is largely hypothetical right now so I wouldn't use it for anything other than ideas. It's also horribly complex and based on JSON-LD
#
[snarfed]
hmm, afaik multiple hardware and software vendors have shipped C2PA support, right?
#
[snarfed]
(complexity/JSON-LD notwithstanding)
#
capjamesg[d]
I don't think C2PA is hypothetical. Instagram can detect if a photo has been generated with AI and displays a tag with it.
#
capjamesg[d]
I assume they are using C2PA, since one of the tools that causes it is generative fill in Adobe Photoshop.
#
aaronpk
it gets that surprisingly wrong 😂
#
aaronpk
i assume it is using AI to detect that
#
capjamesg[d]
Oooh, fascinating.
#
capjamesg[d]
I'm not sure.
#
capjamesg[d]
> We’re building industry-leading tools that can identify invisible markers at scale – specifically, the “AI generated” information in the C2PA and IPTC technical standards – so we can label images from Google, OpenAI, Microsoft, Adobe, Midjourney, and Shutterstock as they implement their plans for adding metadata to images created by their tools.
#
aaronpk
here's an artist whose photo was flagged as AI incorrectly https://www.instagram.com/p/C7jjJLPo9PU/?img_index=1
#
[snarfed]
technical design aside, I do appreciate that instead of trying to detect AI, which is a difficult non-deterministic arms race, the C2PA people took the opposite (tractable, deterministic) tack and identify "original" content and transformations on it
#
[KevinMarks]
And do domain based signing
#
[snarfed]
oh interesting!
#
[snarfed]
ok arguably C2PA is non-deterministic in a different sense, in that they'll never get 100% adoption from all hardware and software vendors, so C2PA will only ever identify a subset of all content, original or otherwise. still though, seems like a useful building block, I am glad it exists
#
[KevinMarks]
Yes, it's designed to be a trust in authority framework - which does fall down if someone working for the organisation is not trustworthy
#
[KevinMarks]
“Can we erase our history
#
[KevinMarks]
Is it as easy as this
#
[KevinMarks]
Plausible deniability
#
[tantek]
C2PA is another centralizing force though, because it is overly complicated (uses JSON-LD etc.), only large companies can implement it / publish it effectively. It's very much anti-indieweb in that regard
#
[tantek]
like all complex standards, I would advocate for categorically rejecting it as an indieweb developer
#
[snarfed]
it feels pretty orthogonal to me, more like the self hosting purity test fallacy. very few of us implement our own web servers from scratch, or rack our own servers in our basements - or in this case, build our own cameras or photo editing software - but we're still indieweb in both spirit and practice, even if we use hardware and software from other people, even big companies. we can use C2PA devices and software (or not!) and not necessarily "sacrifice" our indieweb-ness
#
[tantek]
I see the other extreme [snarfed] — C2PA feels like sneaking DRM into everything, just like "trusted computing" tried to do ages ago
#
[tantek]
like I have zero interest in C2PA tools for my HTML.
#
[tantek]
or in any way validating any use of C2PA for HTML or plain text or any of the other "simple" formats we publish
#
[tantek]
keep simple publishing simple
#
[tantek]
no C2PA tax
#
[snarfed]
point taken. sneaking DRM into everything is a fair argument. even if it's optional and small right now, and media-focused, don't want it to be a slippery slope toward expected/required
H4kor and [marksuth] joined the channel
#
[schmarty]
hey folks! i'm looking to improve the indieweb webring by making it self-gardening. (automatically marking sites as active/inactive when the webring links are detected on their page or not)
#
aaronpk
i assumed it already did that haha
#
[schmarty]
aaronpk: lol nah. the gardening method is someone complains aloud that there are a lot of dead sites and i go run the gardener on every site. 😅
#
[schmarty]
i think my tiers will be something like 1 day, 3 days, 7 days, 14 days, 30 days, plus some jitter to spread out when things run during the day
#
aaronpk
the tiered thing works pretty well but most sites end up falling into the first or last tier https://media.aaronpk.com/2024/06/12141332-9319.png
#
[schmarty]
aaronpk++ that's good to know!
#
Loqi
aaronpk has 51 karma in this channel over the last year (138 in all channels)
#
[schmarty]
since the webring links check is just binary (i found links or no), my goal is to actually have everything trend to the lowest (checked least often) tier. if it ever finds a different result than last time, i'll have it bump to the highest (checked soonest) tier.
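A rough sketch of that scheduling scheme (tier lengths from earlier in the chat; the function shape and jitter window are assumptions): a stable result drifts one tier toward the least-frequent check, and any change in whether the links were found bumps the site back to the soonest tier.
```python
# Sketch of the self-gardening tiers described above; values and field names
# are assumptions based on the chat, not the webring's actual code.
import random
from datetime import datetime, timedelta

TIERS_DAYS = [1, 3, 7, 14, 30]  # checked soonest ... checked least often

def next_check(tier: int, links_found: bool,
               last_links_found: bool) -> tuple[int, datetime]:
    if links_found != last_links_found:
        tier = 0  # result changed: bump to the soonest tier
    else:
        tier = min(tier + 1, len(TIERS_DAYS) - 1)  # stable: drift toward slowest tier
    delay = timedelta(days=TIERS_DAYS[tier])
    jitter = timedelta(minutes=random.uniform(0, 240))  # spread runs across the day
    return tier, datetime.utcnow() + delay + jitter
```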
#
capjamesg[d]
aaronpk What dashboard is that?
barnaby joined the channel
#
capjamesg[d]
Do you have a personal dashboard like that for all your projects?
#
aaronpk
i have a single munin install and i sometimes write plugins for specific projects
timmarinin joined the channel
#
[snarfed]
interesting bimodal distribution, I wouldn't have guessed it would look like that
gRegor joined the channel
#
aaronpk
there's definitely movement between the middle tiers
gRegorLove_, amyiscoolz and wagle joined the channel