#dev 2023-08-09

2023-08-09 UTC
jacknemitz, bterry, tei_, [Jo], gerben, wagle, [snarfed]1, [jamietanna]1, IWSlackGateway1, [aciccarello]1, joshproehl, omz13, mooff, aaronpk, t0nic, Kaja, ludovicchabant and lanodan joined the channel
#
c​apjamesg
This is a sad story. I really appreciate the analyses made available on this tool: https://blog.shaxpir.com/taking-down-prosecraft-io-37e189797121
tei_ joined the channel
#
aaronpk
Cute story, but if you look at the feedback from other writers, it gives a very different picture
gerben, tei_, geoffo, [dave] and jacknemitz joined the channel
#
c​apjamesg
What isn't clear is how he got the data.
tei_ and tei_1 joined the channel
#
[snarfed]
sure it is. "I only ever incorporated books that were published publicly, and whose text could easily be found by crawling the internet."
#
[snarfed]
[aaronpk] got a link on that feedback you saw? he was very vague about them. "...in the meantime, “AI” became a thing. And the arrival of AI on the scene has been tainted by early use-cases that allow anyone to create zero-effort impersonations of artists, cutting those creators out of their own creative process."
btrem joined the channel
#
sebbu
"whose text could easily be found by crawling the internet" means nothing
#
sebbu
the internet includes libgen, zlibrary and archive.org, and all 3 contain commercial stuff that isn't free
eitilt joined the channel
#
c​apjamesg
Herein lies an interesting question of the definition of AI.
#
c​apjamesg
Sentiment analysis with say Transformers is definitely AI in my book.
#
c​apjamesg
But quantitative linguistics depending on counting different types of words? That is more data analysis.
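To make the distinction concrete, the word-counting side of quantitative linguistics needs no model at all. Below is a minimal Python sketch that computes a few summary statistics with a naive regex tokenizer. It is not prosecraft's actual method, which isn't described here, and the function name `text_stats` and the choice of statistics are illustrative assumptions.

```python
import re
from collections import Counter


def text_stats(text):
    """Hypothetical helper: simple counting statistics for a piece of text."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {
        "word_count": total,
        "avg_word_length": sum(len(w) for w in words) / total if total else 0.0,
        # Type-token ratio: distinct words divided by total words
        "type_token_ratio": len(counts) / total if total else 0.0,
        "most_common": counts.most_common(3),
    }
```

For example, `text_stats("The cat sat on the mat. The mat was flat.")` reports 10 words, an average word length of 3.0, and a type-token ratio of 0.7. Everything here is counting; no model is trained or queried.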
#
c​apjamesg
On a different note, I am sort of disappointed by how few intuitive quantitative linguistics learning resources there are.
#
c​apjamesg
Perhaps GPT has replaced the need for a lot of the techniques, but I am still fascinated by statistical approaches to language analysis.
#
aaronpk
i saw the link in here the other day...now i can't find it
#
[snarfed]
capjamesg sounds like NLP! (natural language processing, not neuro-linguistic programming)
#
[snarfed]
sebbu he addressed licensing/legality too. "I researched copyright laws, mindful of not wanting to hurt or offend the community of authors that I cared so much about. Since I was only publishing summary statistics, and small snippets from the text of those books, I believed I was honoring the spirit of the Fair Use doctrine, which doesn’t require the consent of the original author."
#
c​apjamesg
I wouldn’t refer to all NLP as AI though.
#
c​apjamesg
Bayesian statistics on language are NLP but not really AI.
#
c​apjamesg
Although AI is the nebulous word in the room 😅
#
[snarfed]
sure, of course. at this point the term AI is so broad that there probably isn't a single universal definition. which is fine
#
c​apjamesg
Which I think is a problem?
#
[snarfed]
aaronpk++ thx
#
Loqi
aaronpk has 28 karma in this channel over the last year (91 in all channels)
#
c​apjamesg
> You see, Benji took 23,000 novels — copyrighted novels — and pumped them through (allegedly) some kind of LLM, in order to "analyze" things like word length, emotional points, and tone.
#
c​apjamesg
Is that true?
#
[snarfed]
yeah that seems like a deep misunderstanding of what prosecraft did. it sounded clearly like analysis, not AI and definitely not LLM related. like capjamesg mentioned.
#
[snarfed]
so evidently a vocal minority of writers both willfully misinterpreted it as AI and had a knee-jerk "tech evil" response, which...ok... 🤷 but long term, it seems hard to believe they'll be on the right side of history on that
#
c​apjamesg
I’ve spent a fair bit of time reading about quantitative linguistics, and I haven’t really seen “AI” used as an approach.
#
aaronpk
there's also the part about the passive voice detector being wrong
#
c​apjamesg
You can detect that statistically IIRC.
#
[snarfed]
ok sure. if the worst complaint is that it was imperfect...ok. everything is
#
[snarfed]
but that was a small tangent
#
c​apjamesg
All contingent on heuristics.
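A purely heuristic passive-voice check might look like the Python sketch below. It is an assumption about how such a detector could work, not how prosecraft's did: flag a form of "to be" followed, optionally after one -ly adverb, by a word ending in -ed or -en or by a known irregular participle. The names `passive_candidates`, `BE_FORMS`, and `IRREGULAR` are hypothetical, and the participle list is deliberately tiny.

```python
import re

# Forms of "to be" that typically introduce a passive construction
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
# A few irregular past participles; a real list would be much longer (assumption)
IRREGULAR = {"made", "done", "seen", "known", "taken", "given",
             "written", "found", "built", "sent"}


def passive_candidates(sentence):
    """Return (be_form, participle) pairs that look like passive voice."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = []
    # Stop one short of the end so there is always a following word
    for i, w in enumerate(words[:-1]):
        if w not in BE_FORMS:
            continue
        j = i + 1
        # Skip a single intervening -ly adverb, e.g. "was quickly written"
        if words[j].endswith("ly") and j + 1 < len(words):
            j += 1
        nxt = words[j]
        if nxt.endswith("ed") or nxt.endswith("en") or nxt in IRREGULAR:
            hits.append((w, nxt))
    return hits
```

`passive_candidates("The novel was quickly analyzed.")` returns `[("was", "analyzed")]`, but `passive_candidates("The chicken is ten.")` also returns a hit because "ten" ends in -en. That is exactly the kind of imperfection that comes with heuristics.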
#
sebbu
capjamesg#0, Bayesian spam filters from 20 years ago are already NLP
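As an illustration of that older statistical approach, here is a minimal multinomial naive Bayes classifier in Python with add-one (Laplace) smoothing. It is a textbook sketch, not any particular spam filter's implementation. The class name `NaiveBayes`, the tokenizer, and the toy training data are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase runs of letters and apostrophes
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Work in log space so long messages don't underflow to zero
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in tokenize(text):
                # Add-one smoothing: unseen words get a small nonzero probability
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("win cash now", "spam")
nb.train("cheap pills win big", "spam")
nb.train("lunch meeting tomorrow", "ham")
nb.train("notes from the meeting", "ham")
print(nb.classify("win cheap cash"))
print(nb.classify("meeting tomorrow"))
```

With this toy data, the first message is classified as "spam" and the second as "ham". The whole "model" is word frequency tables plus Bayes' rule.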
#
[snarfed]
(they also made some vivid legal/ownership claims - stealing, plagiarism, etc - which seem pretty clearly wrong. I expect he's right that this fell squarely into fair use. but that's also probably a side note, not the root of their unhappiness)
#
sebbu
and markov chains
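Markov chains belong in the same bucket. Below is a minimal word-level (bigram) Markov chain text generator in Python, assuming plain whitespace tokenization. `build_chain` and `generate` are hypothetical names, and the `seed` parameter exists only to make runs repeatable.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words observed immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        # Duplicates are kept, so frequent successors are proportionally more likely
        chain[a].append(b)
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain from `start` for up to `length` words."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break  # dead end: this word never had a successor in the training text
        out.append(rng.choice(followers))
    return out


chain = build_chain("the cat sat on the mat and the cat ran")
print(" ".join(generate(chain, "the", 8)))
```

Because each next word is drawn only from words actually observed after the current one, every adjacent pair in the output also appears in the training text. Generation stops early at a word with no recorded successor.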
tei_ joined the channel
#
c​apjamesg
sebbu I agree 😄
#
c​apjamesg
But I wouldn't call it AI.
#
c​apjamesg
> They hate being stolen from, and make no mistake: having their works copied and fed through some tool, available on some web site, with excerpts and word clouds, is stealing.
#
c​apjamesg
Excerpts and word clouds sound like fair use.
#
[snarfed]
people tend to blur the line between legal and moral arguments, but legally, in the US and countries with similar doctrines, definitely
btrem joined the channel
#
sebbu
well, fair use never had a strict, precise, objective definition
#
sebbu
some people consider a synopsis fair use, some don't
#
sebbu
some people consider a full translation fair use, some don't
#
sebbu
(for the synopsis, it's because it might spoil some parts of the book)
#
[snarfed]
again it's worth distinguishing legal vs other. fair use is a specific, concrete legal term that's defined pretty clearly by a ton of case law and precedent
#
[snarfed]
when people say "fair use," that's often what they're referring to
#
[tantek]
didn't Google Books win those lawsuits
#
[snarfed]
and tons of parody, criticism, and similar works that had nothing to do with tech. afaik those were the earlier origins of fair use
gerben joined the channel
#
sebbu
[snarfed], specific maybe, but not with explicitly specified limits (the size of the excerpts, the similarity between the original and the parody, etc.)
#
[snarfed]
sebbu maybe so. I'm not a copyright/fair use lawyer, but the conversation here was much broader and shallower than any of those nuances. if you're right, and if those were the only points under debate, I doubt anyone would have been nearly as worked up
#
[snarfed]
(having said that, my experience with fair use is that it's been broadly upheld. I haven't seen a ton of fair use cases fail because an excerpt was a few words too long, or a parody was too "similar" to the original work. but again IANAL etc)
tei_, tei_1, h4kor, IWSlackGateway, btrem and [timothy_chambe] joined the channel
#
[timothy_chambe]
Threads gets rel=me support today….
#
capjamesg
[timothy_chambe]++
#
Loqi
[timothy_chambe] has 4 karma in this channel over the last year (28 in all channels)
[dave], [benatwork] and [KevinMarks] joined the channel
#
[KevinMarks]
Nicely done Tim. Do they do the other half yet? (look for inbound rel=me to label the outbound links)
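The "other half" Kevin describes is the bidirectional check: a profile's outbound rel=me link to a site is only labeled as verified if that site links back to the profile with rel=me. Below is a minimal sketch using only Python's standard `html.parser` on already-fetched HTML. It leaves out fetching, redirects, and real URL normalization (it only strips a trailing slash). The function names and example URLs are hypothetical, and this is not how Threads implements it.

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collect href values of <a> and <link> elements whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        # rel is a space-separated token list, e.g. rel="me nofollow"
        rels = (attrs.get("rel") or "").lower().split()
        if "me" in rels and attrs.get("href"):
            # Crude normalization (assumption): ignore a trailing slash
            self.links.add(attrs["href"].rstrip("/"))


def rel_me_links(html):
    parser = RelMeParser()
    parser.feed(html)
    return parser.links


def verified_links(profile_url, profile_html, fetched_pages):
    """fetched_pages maps each outbound URL to its HTML; fetching is out of scope."""
    profile_url = profile_url.rstrip("/")
    verified = set()
    for url in rel_me_links(profile_html):
        page = fetched_pages.get(url)
        # Verified only if the linked page points back at the profile with rel=me
        if page is not None and profile_url in rel_me_links(page):
            verified.add(url)
    return verified
```

For a profile at `https://threads.example/@alice` that links to `https://example.com/` and `https://other.example/`, only the site whose HTML contains `<link rel="me" href="https://threads.example/@alice">` ends up in the verified set.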
tei_ and [Jo] joined the channel
#
[timothy_chambe]
Thanks Kevin. Will look.
[tantek], geoffo, tei_ and [snarfed] joined the channel
#
capjamesg
[snarfed] How do you find BSD?
#
capjamesg
I have only ever used Ubuntu and a few OSes on the Raspberry Pi (Debian, Ubuntu MATE).
#
[snarfed]
capjamesg ??
#
[snarfed]
oh you mean on http://snarfed.org. it's a shared account, I don't admin the OS
#
capjamesg
I want to visit all those places now!
#
Loqi
[preview] [Ryan Barrett] Like all of you, I got a little stir crazy over the last year. I made it outside plenty, but I still felt cooped up now and then, and I dearly missed going new places and seeing new things. On another note, I really, really like breakfast. Like, a lo...
#
[snarfed]
hah thanks!
#
[snarfed]
the google maps list has expanded a lot since then, https://goo.gl/maps/59hsryC8wLh8zbo1A . sadly I still don't have a comparable indie workflow for capturing new places. one of the few things I don't have native on my site
bterry joined the channel