#indieweb 2023-06-21

2023-06-21 UTC
moose333, gRegorLove_, btrem, Zegnat, rvalue, ren-, [fluffy], tiim, ren, bret, jan6, oodani, [pfefferle], gaussianblue and [Ana_R] joined the channel
#
capjamesg
This evening we have a themed "half year in review" HWC London / Europe (starting 7pm UK time) in which we will be reflecting on the last six months on our websites and looking ahead to the rest of the year.
#
IWDiscord
<capjamesg#0>
nertzy joined the channel
#
[tantek]
interesting
#
[tantek]
what is a headless CMS
#
Loqi
headless CMS is a content management system that only handles the backend parts of a traditional CMS (storage, editing interface, APIs, ...), with the visitor-facing side handled by a different system https://indieweb.org/headless_CMS
Xe, BigShip and [jacky] joined the channel
#
[jacky]
has anyone been successful in getting their site's content _redacted_ or pulled from public corpora used for LLMs?
#
[jacky]
I have a license on my site about the content use (share-alike with no attributions) but large organizations rarely seem to care about that with independent and smaller sites
#
[jacky]
perhaps a coalition of smaller sites could make them pay more attention and keep the web as human as possible?
tei_, btrem and rvalue joined the channel
#
omz13
[jacky] your site is licensed by-sa 4.0, so share and adapt open to all, including LLMs... unless they have broken the terms such as i.a. attribution, but that could be hard to prove (IANAL)
#
[jacky]
I'd believe the attribution bit is the part that'd be broken (as there's no way to find out _explicitly_ what parts and pages are included in an LLM)
#
[jacky]
I was considering changing it to prevent commercial use and potentially look into adding an addendum about AI use to that page
#
[jacky]
frankly, this goes back to a larger conversation around consent on the Internet (which also leans into who's given the right to it - that plays into privacy as well)
#
omz13
I suspect the problem is that LLMs are like a hash function: given the output it is hard/impossible to prove what went into it (unless the original data is lurking in a database somewhere)
#
omz13
and the cc licence includes the word "reasonable" which means it could be argued (IANAL) that LLMs can't reasonably list all that went into their model as the list would be too long
#
omz13
did you see Black Mirror s6e1 ("Joan Is Awful")? it nicely poked fun at terms and conditions... same could be said for licensing (somebody will always abuse it to their advantage to do things you hadn't anticipated)
#
[jacky]
I have to watch it! On my docket for tonight
#
[jacky]
Yeah but that def feels like an escape hatch for those implementing LLMs to _avoid_ accountability (because of that hashing behavior)
seekr, tei_, t0nic, mikeputnam, romangeeko, [jamietanna] and [aciccarello] joined the channel
#
[aciccarello]
I'm afraid LLMs are going to cause a lot of people to turn away from CC licenses.
seekr joined the channel
#
sknebel
not like "AI" companies aren't hoovering up material that's not under a clear license, many of them seem to operate under a "licenses don't apply to us" model
bterry and gaussianblue joined the channel
#
[aciccarello]
Yeah, I think the companies believe legally they can use whatever under fair use.
#
[snarfed]
to be fair, short of licenses that explicitly say "you can't use this to train models," that may be an open legal question in most/all jurisdictions
#
[snarfed]
(deeper legal q's like that may be less on topic here 😁)
btrem joined the channel; res0 left the channel
#
[tantek]
also depends if they are making money off their LLMs, the cc-*-nc licenses may be worth considering
#
[tantek]
I specifically use a -nc license on my Flickr photos for similar reasons, though I suspect that didn't stop all the use for training face recognizers 😕
#
omz13
fair use is easy in the US... less so in other jurisdictions... then again, many US companies think US law applies everywhere
gaussianblue and benji joined the channel
#
[tantek]
omz13, or they (e.g. startups) don't care because it won't prevent them from being acquired by deeper pockets who can then go deal with any international legal matters after the fact
tei_ joined the channel
#
omz13
[tantek] I think it's more a cultural (US) thing: section 230, fair use; a libertarian attitude; and ignorance of how the rest of the world operates. Of course, it does not help that any damages are peppercorn in nature.
#
omz13
notwithstanding, it would probably be a good idea if CC got around to updating their licences...
#
[tantek]
agreed, we need CC updates to explicitly deal with use as input to LLMs
#
omz13
it's been almost 10 years since CC 4.0 and the world has changed (and not for the better)
jonnybarnes joined the channel
#
[tantek]
yeah it's been a mixed bag since CC4
#
capjamesg
HWC London / Europe online starts in 6 mins.
[marksuth] joined the channel
#
[tantek]
oh wow time again!
trwnh and tei_ joined the channel
#
[tantek]
15 days until 20 years since the NYT article about too much data being bad was published: https://www.nytimes.com/2003/07/06/business/the-lure-of-data-is-it-addictive.html?pagewanted=all
#
Loqi
I added a countdown scheduled for 2023-07-06 1:18pm PDT (#7038)
rvalue, tei_, btrem, eitilt1, angelo, [manton], nertzy, IWSlackGateway and [jacky] joined the channel
#
[jacky]
https://chat.indieweb.org/2023-06-21#t1687367522896900 heh this is true (esp with the crafting of NATO)
#
Loqi
[preview] [omz13] fair use is easy in the US... less so in other jurisdictions... then again, many US companies think US law applies everywhere
#
[jacky]
the only other option I can think of for my site is making _everything_ unlisted and require authentication
[tantek] and [snarfed] joined the channel
#
[snarfed]
oof. responding to AI by taking your whole site private is a hell of a tradeoff 😐
petermolnar and gRegor joined the channel
#
gRegor
Re: https://chat.indieweb.org/2023-06-21#t1687352194346900, is there a way to find out if our sites are in any currently?
#
Loqi
[preview] [[jacky]] has anyone been successful in getting their site's content _redacted_ or pulled from public corpora used for LLMs?
#
[jacky]
it seems to be the only means of ensuring consent of content use can be handled
#
[jacky]
gRegor: there was that _one_ article that you could put your domain in to see its ranking (if it existed)
#
gRegor
Have the link handy? I'm not familiar with that