moose333, gRegorLove_, btrem, Zegnat, rvalue, ren-, [fluffy], tiim, ren, bret, jan6, oodani, [pfefferle], gaussianblue and [Ana_R] joined the channel
#capjamesgThis evening we have a themed "half year in review" HWC London / Europe (starting 7pm UK time) in which we will be reflecting on the last six months on our websites and looking ahead to the rest of the year.
#Loqiheadless CMS is a content management system that only handles the backend parts of a traditional CMS (storage, editing interface, APIs, ...), with the visitor-facing side handled by a different system https://indieweb.org/headless_CMS
Xe, BigShip and [jacky] joined the channel
#[jacky]has anyone been successful in getting their site's content _redacted_ or pulled from public corpora used for LLMs?
#[jacky]I have a license on my site about the content use (share-alike with no attributions) but large organizations rarely seem to care about that with independent and smaller sites
#[jacky]perhaps a coalition of smaller sites could make them pay more attention and keep the web as human as possible?
tei_, btrem and rvalue joined the channel
#omz13[jacky] your site is licensed by-sa 4.0, so share and adapt open to all, including LLMs... unless they have broken the terms such as i.a. attribution, but that could be hard to prove (IANAL)
#[jacky]I'd believe the attribution bit is the part that'd be broken (as there's no way to find out _explicitly_ what parts and pages are included in an LLM)
#[jacky]I was considering changing it to prevent commercial use and potentially look into adding an addendum about AI use to that page
#[jacky]frankly, this goes back to a larger conversation around consent on the Internet (which also leans into who's given the right to it - that plays into privacy as well)
#omz13I suspect the problem is that LLMs are like a hash function: given the output it is hard/impossible to prove what went into it (unless the original data is lurking in a database somewhere)
#omz13and the cc licence includes the word "reasonable" which means it could be argued (IANAL) that LLMs can't reasonably list all that went into their model as the list would be too long
#omz13did you see Black Mirror s6e1 ("Joan Is Awful")? it nicely poked fun at terms and conditions... same could be said for licensing (somebody will always abuse it to their advantage to do things you hadn't anticipated)
#[jacky]I have to watch it! On my docket for tonight
#[jacky]Yeah but that def feels like an escape hatch for those implementing LLMs to _avoid_ accountability (because of that hashing behavior)
seekr, tei_, t0nic, mikeputnam, romangeeko, [jamietanna] and [aciccarello] joined the channel
#[aciccarello]I'm afraid LLMs are going to cause a lot of people to turn away from CC licenses.
seekr joined the channel
#sknebelnot like "AI" companies aren't hoovering up material that's not under a clear license, many of them seem to operate under a "licenses don't apply to us" model
bterry and gaussianblue joined the channel
#[aciccarello]Yeah, I think the companies believe legally they can use whatever under fair use.
#[snarfed]to be fair, short of licenses that explicitly say "you can't use this to train models," that may be an open legal question in most/all jurisdictions
#[snarfed](deeper legal q's like that may be less on topic here 😁)
btrem joined the channel; res0 left the channel
#[tantek]also depends if they are making money off their LLMs, the cc-*-nc licenses may be worth considering
#[tantek]I specifically use a -nc license on my Flickr photos for similar reasons, though I suspect that didn't stop all the use for training face recognizers 😕
#omz13fair use is easy in the US... less so in other jurisdictions... then again, many US companies think US law applies everywhere
gaussianblue and benji joined the channel
#[tantek]omz13, or they (e.g. startups) don't care because it won't prevent them from being acquired by deeper pockets who can then go deal with any international legal matters after the fact
tei_ joined the channel
#omz13[tantek] I think it's more a cultural (US) thing: section 230, fair use; a libertarian attitude; and ignorance of how the rest of the world operates. Of course, it does not help that any damages are peppercorn in nature.
#omz13notwithstanding, it would probably be a good idea if CC got around to updating their licences...
#[tantek]agreed, we need CC updates to explicitly deal with use as input to LLMs
#omz13it's been almost 10 years since CC 4.0 and the world has changed (and not for the better)