capjamesgThis evening we have a themed "half year in review" HWC London / Europe (starting 7pm UK time) in which we will be reflecting on the last six months on our websites and looking ahead to the rest of the year.
Loqiheadless CMS is a content management system that only handles the backend parts of a traditional CMS (storage, editing interface, APIs, ...), with the visitor-facing side handled by a different system https://indieweb.org/headless_CMS
[jacky]I have a license on my site about the content use (share-alike with no attributions) but large organizations rarely seem to care about that with independent and smaller sites
omz13[jacky] your site is licensed by-sa 4.0, so share and adapt open to all, including LLMs... unless they have broken the terms such as i.a. attribution, but that could be hard to prove (IANAL)
[jacky]I'd believe the attribution bit is the part that'd be broken (as there's no way to find out _explicitly_ what parts and pages are included in an LLM)
[jacky]frankly, this goes back to a larger conversation around consent on the Internet (which also leans into who's given the right to it - that plays into privacy as well)
omz13I suspect the problem is that LLMs are like a hash function: given the output it is hard/impossible to prove what went into it (unless the original data is lurking in a database somewhere)
omz13and the cc licence includes the word "reasonable" which means it could be argued (IANAL) that LLMs can't reasonably list all that went into their model as the list would be too long
omz13did you see Black Mirror s6e1 ("Joan Is Awful")? it nicely poked fun at terms and conditions... same could be said for licensing (somebody will always abuse it to their advantage to do things you hadn't anticipated)
sknebelnot like "AI" companies aren't hoovering up material that's not under a clear license, many of them seem to operate under a "licenses don't apply to us" model
[snarfed]to be fair, short of licenses that explicitly say "you can't use this to train models," that may be an open legal question in most/all jurisdictions
[tantek]I specifically use a -nc license on my Flickr photos for similar reasons, though I suspect that didn't stop all the use for training face recognizers 😕
[tantek]omz13, or they (e.g. startups) don't care because it won't prevent them from being acquired by deeper pockets who can then go deal with any international legal matters after the fact
omz13[tantek] I think it's more a cultural (US) thing: section 230, fair use; a libertarian attitude; and ignorance of how the rest of the world operates. Of course, it does not help that any damages are peppercorn in nature.