#dev 2022-10-10

2022-10-10 UTC
gRegorLove__, tbbrown, jeremycherfas, gRegorLove_, geoffo, angelo, gxt and mro joined the channel
# 08:20 
capjamesg v0.3.1 of IndieWeb Utils is out. This release has been worked on since February: https://pypi.org/project/indieweb-utils/
# 08:46 
capjamesg [James_Van_Dyne]++
# 08:46 
Loqi [James_Van_Dyne] has 5 karma in this channel over the last year (7 in all channels)
# 08:46 
capjamesg angelo++
# 08:46 
Loqi angelo has 13 karma in this channel over the last year (18 in all channels)
# 08:47 
capjamesg tantek++
# 08:47 
Loqi tantek has 23 karma in this channel over the last year (72 in all channels)
tetov-irc, [campegg], brennan, barnaby, gRegorLove_, geoffo, mro, gRegorLove__, gRegor, AramZ-S[m] and [tantek] joined the channel
# 17:17 
capjamesg [KevinMarks] Can I submit a PR that prepares your cassis autolink function as a Python library w/ docs?
# 17:22 
[KevinMarks] Sounds good
barnaby and nertzy joined the channel
# 19:13 
angelo capjamesg wouldn't it be best integrated into indieweb-utils?
# 19:14 
capjamesg angelo I was thinking about that.
# 19:15 
capjamesg But there might be a licensing issue.
# 19:15 
capjamesg cassis-autolink-py is licensed under CC BY-SA 3.0
# 19:20 
h4kor[m] I'm currently adding author information to my h-entries. Is there a rule when to use u-photo or u-logo?
AramZS joined the channel
# 19:20 
capjamesg u-logo is not for an h-entry, I don't think.
# 19:21 
capjamesg u-photo represents the "main" image in a post. This is a draft property.
# 19:22 
h4kor[m] sorry, I'm talking about the p-author h-card.
# 19:22 
capjamesg Ah! I made a mistake.
# 19:22 
capjamesg u-featured is for the main image in a post.
# 19:23 
capjamesg I'm not sure re: u-logo vs u-photo.
# 19:23 
capjamesg The spec says u-logo is "a logo representing the person or organization (e.g. a face icon)" and u-photo is "a photo of the person or organization".
# 19:24 
capjamesg I add u-photo and u-logo to my image in my h-card.
# 19:24 
h4kor[m] So, no harm in adding both tags if you don't have a logo and a photo for an author?
# 19:26 
capjamesg If you have a photo of yourself (or some icon that applies), I think that can be a logo and a photo.
# 19:27 
capjamesg [tantek] GWG?
# 19:27 
barnaby I don’t know of any implementations which do anything with u-logo other than perhaps using it as a fallback if there’s no u-photo, but I don’t see any reason not to use both for the same image if you consider it both a photo and a logo
# 19:29 
h4kor[m] barnaby: Got the same impression, I'll just add both. Thanks everybody :)
# 19:29 
capjamesg Happy to help!
# 19:29 
capjamesg h4kor[m]++ for working on your h-card!
# 19:29 
capjamesg h4kor[m]++
# 19:29 
capjamesg Loqi?
# 19:30 
barnaby u-logo is used in h-(x-)app, where it’s definitely more appropriate than photo
# 19:30 
capjamesg Loqi, are you lost in the matrix?
# 19:30 
barnaby so there are implementations which consume that property, just not on h-card afaik
# 19:31 
capjamesg I suppose it could also be used in a p-org with a nested h-card representing an organisation?
mro_ joined the channel
# 19:35 
barnaby it could be sensibly used in a variety of places, but that’s the only use-case for which I’m aware of actively-maintained consuming implementations
mro joined the channel
# 20:16 
angelo capjamesg that CC license being a hindrance for adopting a small utility function is precisely the reason why i'd like indieweb-utils to be public-domain-equivalent
# 20:21 
capjamesg I'm in support too.
# 20:21 
[tantek] capjamesg, as an h-card publisher, you can use either u-photo or u-logo if either seems to fit what you're publishing. using both shouldn't be necessary, because any consuming code that requires one should fall back to using the other
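A minimal sketch of the fallback behaviour described above, written from the consumer's side. It assumes the h-card has already been parsed into the usual microformats2 JSON shape (a dict with a `properties` key holding lists of values); the helper name `hcard_image` is hypothetical and is not part of indieweb-utils or any parser.

```python
def hcard_image(hcard, prefer="photo"):
    """Return one image URL for a parsed h-card, falling back between photo and logo."""
    props = hcard.get("properties", {})
    fallback = "logo" if prefer == "photo" else "photo"
    for name in (prefer, fallback):
        values = props.get(name) or []
        if values:
            first = values[0]
            # Images with alt text may be represented as {"value": ..., "alt": ...}.
            if isinstance(first, dict):
                return first.get("value")
            return first
    return None


# Example h-card that only publishes u-logo (hypothetical data).
card = {
    "type": ["h-card"],
    "properties": {
        "name": ["Example Author"],
        "logo": ["https://example.com/icon.png"],
    },
}
print(hcard_image(card))
```

With a consumer written this way, a publisher who marks up only one of the two properties still gets an image displayed.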
# 20:21 
angelo but practically speaking a cassis.py port to indieweb-utils would likely be sufficiently different in structure that i believe the license would no longer apply. you could still attribute the original authors. curious what they think.
# 20:22 
[tantek] angelo, I put the CC license on CASSIS precisely to slow down adoption while it was still "in development" to reduce the chance that bad/buggy versions spread & propagated
# 20:23 
[tantek] something like a rewrite in a different language is less likely to propagate bugs so that seems less of a concern
# 20:24 
[tantek] are there parts of cassis.js that you want to re-use whole like the regex for auto-linking? I could wrap those in a more re-use friendly license like BSD0
# 20:25 
capjamesg [tantek] Did you contribute to indieweb-utils?
# 20:25 
capjamesg I remember you chiming in on some PRs / issues.
# 20:26 
capjamesg We're trying to decide on a new license. BSD-0 is the frontrunner.
# 20:26 
barnaby afaik it’s fine to reuse code under a different license with written permission without the original author having to relicense their project. I do exactly that in taproot/indieauth, using indieweb chat logs as the record of permission https://github.com/Taproot/indieauth/blob/main/src/functions.php#L155
# 20:30 
angelo capjamesg i interpreted our back and forth as MIT0 being the frontrunner https://github.com/capjamesg/indieweb-utils/issues/73 but i couldn't find any arguments in favor of one over the other..
# 20:30 
Loqi [capjamesg] #73 Transition license to CC0
# 20:31 
capjamesg Yep. Sorry.
# 20:33 
[tantek] BSD0++ 🙂
# 20:33 
Loqi BSD0 has 1 karma over the last year
# 20:46 
angelo i don't believe we'd need the regex but i would like to carry over the embeds, especially fragmentions
# 20:46 
[tantek] or should I say 0BSD++
# 20:48 
[KevinMarks] That regex does feel like it needs its own repo in some ways, so you can have a multiline version that explains it and a compact one that you can include.
# 20:49 
angelo https://gist.github.com/angelogladding/97ea7d469c5bd62701db10e6d8e362a8
# 20:49 
angelo i can go for 0BSD++
# 20:49 
angelo KevinMarks that's the same regex ^
# 20:50 
[KevinMarks] I thought Tantek was adding TLDs when indieweb people used them
# 20:51 
[tantek] pretty much yes
# 20:51 
capjamesg Ah so it's not all TLDs?
# 20:52 
[tantek] nah, because there's a signal to noise issue
# 20:52 
[tantek] if the noise of accidental auto-linking text that wasn't intended as a link is greater than the actual real world use of links to such domains, makes no sense to include it in the regex
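The signal-to-noise argument above can be illustrated with a curated TLD allowlist. This is a rough sketch only, not the CASSIS regex: the `ALLOWED_TLDS` tuple and the `auto_link` function here are hypothetical, and the input is assumed to be plain text rather than HTML.

```python
import re

# Hypothetical allowlist for illustration; not the list used in CASSIS.
ALLOWED_TLDS = ("com", "org", "net", "rocks", "wtf")

# Longest alternatives first so a short TLD cannot shadow a longer one.
_TLD_ALT = "|".join(sorted(map(re.escape, ALLOWED_TLDS), key=len, reverse=True))

DOMAIN_RE = re.compile(
    r"(?<![\w@./-])"                        # not mid-word, mid-email, or mid-path
    r"((?:[a-z0-9-]+\.)+(?:" + _TLD_ALT + r"))"
    r"(?![\w-]|\.\w)",                      # the allowlisted TLD must end the name
    re.IGNORECASE,
)


def auto_link(text):
    """Wrap bare domains in <a> tags, but only when the TLD is allowlisted."""
    return DOMAIN_RE.sub(
        lambda m: '<a href="https://{0}">{0}</a>'.format(m.group(1)), text
    )


print(auto_link("see omg.wtf and main.py"))
print(auto_link("being.able.to.write.like.this"))
```

Because `py` and `this` are not in the allowlist, the file name and the dotted phrase are left alone, which is the trade-off being argued for: fewer links overall, but fewer accidental ones.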
# 20:52 
[tantek] i.e. completionism--
# 20:52 
Loqi completionism has -1 karma over the last year
[snarfed] joined the channel
# 20:53 
[snarfed] yup, witness Twitter auto-linking *.app in tweets
# 20:53 
[tantek] user-centered-design++
# 20:53 
Loqi user-centered-design has 1 karma over the last year
# 20:53 
[snarfed] and *.py, and others
# 20:54 
[tantek] "all TLDs" makes very little sense as a design constraint for anything except deep plumbing code and UI for registering domains IMO
# 20:54 
angelo i was going to test the regex against all TLDs, see which it didn't have and go from there..
# 20:56 
capjamesg [James_Van_Dyne] [tantek] I tagged you both in a comment about relicensing indieweb-utils.
# 20:58 
[tantek] IMO it's not only not worth it to test "against all the TLDs" but would result in a worse quality function
# 20:59 
[tantek] capjamesg, a new comment? didn't see anything in https://github.com/capjamesg/indieweb-utils/issues/73
# 20:59 
Loqi [capjamesg] #73 Transition license to CC0
# 21:03 
[tantek] angelo, I believe for each new TLD I've added to the regex, for many years in the change comment, I put the use-case of the specific domain being linked to, or post linking to it, either way from an IndieWeb person as justification for adding the TLD
# 21:04 
[tantek] I think testing all ccTLDs would be reasonable as that's an i18n issue
# 21:04 
[tantek] but testing all rando new domains? nah, I'd expect them to be mostly garbage and have no desire to support auto-linking to mostly garbage
# 21:05 
angelo by "test against all TLDs" i meant i'll just run the test for myself so i can see a list of the unsupported links to figure out which ones a) are being supported by your regex and b) which are not but need to be (eg. gdn)
# 21:06 
[tantek] why "but need to be"? for what use-case?
# 21:08 
angelo eg. any new ccTLDs? do some research on any new three-letters (eg. lol, gdn)
# 21:11 
angelo i guess i don't fully appreciate the uniqueness of that list. yes, we'll probably want to leverage the work contained in that regex.
# 21:13 
[tantek] re: new ccTLDs yes that's what I said above as an exception for i18n
mro joined the channel
# 21:45 
angelo .ngo/.ong .organic .technology .house .camp .rocks .wtf; i intend to use this auto_link functionality in an editor with automatic preview so i can see solving the issue of false positives with some kind of "cancel" button that escapes the dots in the source
# 21:46 
angelo i just learned that you can search google for eg "site:wtf" and get results for only that suffix
# 21:48 
sknebel most false positives I see are typos, so preview/review/ability to edit also helps already :D
# 21:50 
angelo i'm all for a default mode and then a conservative/exhaustive flag for the other approach. agree that the problem is mostly typos which should be avoided in other ways. but there's also being.able.to.write.like.this sometimes?
# 22:05 
capjamesg chat.sr.ht autolinked being.able.to.write.like
# 22:05 
capjamesg But left off the .this
# 22:05 
angelo omg.lol
# 22:08 
barnaby I have a .wien domain if that’s worth anything :P
# 22:08 
angelo developer of microblog.pub hexa.ninja links to .fyi, .website and .social just above the fold
# 22:10 
angelo there isn't a .this so that false positive is the naive case.
# 22:16 
[tantek] preview doesn't solve the problem of noise / false positives
# 22:17 
[tantek] quite often folks apply auto_link to *past* content, in which case retro-linking text of the past is very likely to introduce noise especially with things like .dev .app etc.
# 22:17 
[tantek] e.g. any (all?) gTLDs that are three letters that happen to match a file extension
# 22:20 
[tantek] angelo, is ".ngo/.ong .organic .technology .house .camp .rocks .wtf" a list you personally care about or ... ? because I know auto_link has supported .rocks and .wtf for quite some time.
# 22:21 
angelo i can see conservative mode when batch processing and exhaustive mode during real-time processing
gxt joined the channel
# 22:31 
[tantek] "conservative" could mean anything is the problem with that kind of label
# 22:32 
[tantek] if you want to really go that deep, you could offer something like the TZ database and a mode where you pass in a date and only TLDs as of that date are linked
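A sketch of the date-aware mode suggested above: the caller passes the date the content was written, and only TLDs that existed by then are linked. Everything here is an assumption for illustration. The function names are hypothetical, and the dates in `TLD_AVAILABLE_FROM` are rough placeholders, not authoritative registry data.

```python
import re
from datetime import date

# Placeholder availability dates, for illustration only. A real table would
# have to be built from authoritative registry data and kept up to date.
TLD_AVAILABLE_FROM = {
    "com": date(1985, 1, 1),
    "org": date(1985, 1, 1),
    "app": date(2018, 1, 1),
    "dev": date(2019, 1, 1),
}

# Loose candidate pattern; the TLD check happens in the replacement callback.
CANDIDATE_RE = re.compile(
    r"(?<![\w@./-])(?:[a-z0-9-]+\.)+[a-z]{2,}(?![\w-])", re.IGNORECASE
)


def auto_link_as_of(text, when):
    """Link bare domains only if their TLD was available on the given date."""
    allowed = {tld for tld, start in TLD_AVAILABLE_FROM.items() if start <= when}

    def replace(match):
        domain = match.group(0)
        tld = domain.rsplit(".", 1)[1].lower()
        if tld not in allowed:
            return domain  # leave the text untouched
        return '<a href="https://{0}">{0}</a>'.format(domain)

    return CANDIDATE_RE.sub(replace, text)


print(auto_link_as_of("open notes.app", date(2010, 1, 1)))
print(auto_link_as_of("open notes.app", date(2022, 10, 10)))
```

Under these placeholder dates, a post from 2010 mentioning `notes.app` is left as plain text, while the same text in a 2022 post is linked.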
# 22:36 
angelo i'm going off the regex in cassis.py which has clearly become outdated. first thing i'd do is redefine that regex more explicitly so we can see which are ccTLD. then, of the remaining domains, the list can be short (based on some criteria ie. popularity) or long (based on the "full" TLD list).
tetov-irc joined the channel
# 22:37 
[snarfed] but still needs human judgment even after TLD launch dates, because the ones [tantek] is thinking of (app dev py etc) seem to have way more false positives than true positives
# 22:37 
[snarfed] (anecdotal, based on watching Bridgy Twitter wms for many many years)
# 22:37 
[tantek] true [snarfed]
# 22:38 
[tantek] I'm against any framing of "full" or "complete" as more "correct" because the *result* is arguably *not* more correct again because of more noise than signal
# 22:38 
angelo you could also go in the opposite direction; support all tlds but skip a short list of known false positives
# 22:38 
[tantek] a lot of what I tried to encode into the design of auto_link is a sense of safety, that you won't "screw up" content by using it, e.g. the whole thing where it won't auto_link "twice" if you happen to run it on the same thing twice
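The "won't link twice" property can be sketched as follows. This is not how CASSIS implements it; it is one simple way to get idempotence, assuming hypothetical names throughout: existing `<a>` elements and other tags are split out first, and only the text between them is scanned for URLs. Escaping and trailing punctuation are deliberately not handled.

```python
import re

# Existing anchors (with their contents) and any other tags are treated as
# opaque delimiters that must never be rewritten.
ANCHOR_OR_TAG = re.compile(r"(<a\b[^>]*>.*?</a>|<[^>]+>)", re.IGNORECASE | re.DOTALL)
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def auto_link_once(text):
    """Link bare URLs, skipping anything already inside markup."""
    parts = ANCHOR_OR_TAG.split(text)
    # With one capturing group, re.split puts plain text at even indexes
    # and the captured delimiters (existing markup) at odd indexes.
    for i in range(0, len(parts), 2):
        parts[i] = URL_RE.sub(
            lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), parts[i]
        )
    return "".join(parts)


once = auto_link_once("see https://example.com/ ok")
twice = auto_link_once(once)
print(once)
print(once == twice)
```

Running the function on its own output returns the same string, so applying it to already-processed content does no harm.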
# 22:39 
[tantek] false positives was only one noise source as an example.
# 22:39 
[tantek] the default is the new domains themselves will be used for noise, spam, grift, phishing etc.
# 22:39 
[tantek] new non-cc gTLDs should be considered harmful until proven otherwise
# 22:45 
[tantek] payment << Beware of the "acceptable use policy" of payment silos / handlers / processors that may or may not impede use of them, e.g. [[PayPal]] briefly had [https://web.archive.org/web/20221008011421/https://www.paypalobjects.com/marketing/ua/pdf/US/en/acceptableuse-full-110322.pdf a policy in 2022] that would fine users for spreading misinformation which resulted in a
# 22:45 
[tantek] [https://www.washingtonpost.com/politics/2022/10/10/paypal-faces-backlash-after-floating-fines-sharing-misinformation/ backlash] (Washington Post article)
# 22:45 
Loqi I couldn't get an edit token for the wiki
# 22:45 
[tantek] ^ oops, new wiki problem?
# 22:45 
gRegor gives Loqi an edit token
# 22:45 
Loqi Thanks, gRegor!
# 22:46 
[snarfed] ^ oh man I gotta try this now, this changes everything
# 22:46 
[snarfed] high fives Loqi
# 22:47 
gRegor [Oprah meme] you get a token! and you get a token!
# 22:48 
aaronpk uhoh, did i break something in the upgrade?
chenghiz_ joined the channel
# 22:52 
[tantek] does Loqi automatically get an edit token upon server restart?
tbbrown joined the channel
# 23:07 
aaronpk i don't even remember how it works tbh
# 23:09 
aaronpk ah right edit tokens are a per-edit thing
jeremy, [schmarty] and tbbrown joined the channel