#dev 2021-09-22

2021-09-22 UTC
[schmarty], nertzy and Ruxton joined the channel
#
@sprucekhalifa
New project: NodeJs implementation of IndieAuth A NodeJs version of IndieAuth service used for authentication, for JavaScript Developers(me) who find it difficult to run the PHP instance Live preview: https://indie.iamspruce.dev #OpenSource #indieWeb
(twitter.com/_/status/1440562075290374162)
#
capjamesg[d]
GWG I have my own Micropub client.
#
capjamesg[d]
It is still in development. I am not quite at the micropub extension stage just yet.
#
capjamesg[d]
[tantek] I like this example from the brainstorm: "likes Aaron Parecki's article "A Little Twitter Developer History"."
#
capjamesg[d]
This is what I do now on my posts (when they work :D).
#
capjamesg[d]
I pull the title of a post on the server using bs4 in Python.
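
A minimal sketch of that kind of server-side title lookup. It is not capjamesg's actual code: it swaps bs4 for the standard library's `html.parser` so it runs without extra packages, and the names `TitleParser` and `extract_title` are made up for illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace so multi-line titles read cleanly.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html):
    """Return the page title, or None when the page has no usable <title>."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title
```

The returned string could then be dropped into a sentence such as the "likes ... article" phrasing quoted above.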
#
capjamesg[d]
If you have time I'd love some feedback on my like page (layout, readability).
#
capjamesg[d]
I would love to add the heart emoji to those pages somewhere.
#
capjamesg[d]
I already do this on the feed at /likes/.
KartikPrabhu1 joined the channel
#
@bnijenhuis
↩️ I wrote it myself (with some references of course). I've explained how in another article I wrote :P ... https://bnijenhuis.nl/notes/2021-05-03-implementing-clientside-webmentions/
(twitter.com/_/status/1440575601979527177)
hendursa1 joined the channel
#
capjamesg[d]
"Itty bitty sites are contained entirely within their own link. "
#
capjamesg[d]
What a strange idea.
#
sknebel
Using url shorteners as storage ;)
tetov-irc joined the channel
#
petermolnar
Ackchyually... the GET parameters can contain a LOT of data. https://technomanor.wordpress.com/2012/04/03/maximum-url-size/
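
To make the "data in the URL" idea concrete, here is a rough sketch that packs a small page into a GET parameter and reads it back. It is not how Itty Bitty itself is implemented; the `data` parameter name, the `example.com` base URL, and the zlib plus URL-safe base64 encoding are all assumptions chosen for illustration.

```python
import base64
import zlib
from urllib.parse import parse_qs, urlencode, urlparse


def pack_page_into_url(html, base_url="https://example.com/view"):
    """Compress the page and carry it in a (hypothetical) `data` parameter."""
    compressed = zlib.compress(html.encode("utf-8"), 9)
    payload = base64.urlsafe_b64encode(compressed).decode("ascii")
    return base_url + "?" + urlencode({"data": payload})


def unpack_page_from_url(url):
    """Reverse pack_page_into_url: read the parameter and decompress it."""
    payload = parse_qs(urlparse(url).query)["data"][0]
    return zlib.decompress(base64.urlsafe_b64decode(payload)).decode("utf-8")
```

How long such a URL may get in practice depends on the browser and server involved, which is what the linked post discusses.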
#
capjamesg[d]
Good link.
#
[KevinMarks]
At one of the Yahoo hack days, a friend of mine implemented a file system that used delicious URLs as the basic blocks. Joshua wasn't amused.
#
Loqi
[pluginkollektiv] Description Say Goodbye to comment spam on your WordPress blog or website. Antispam Bee blocks spam comments and trackbacks effectively, without captchas and without sending personal information to third party services. It is free of charge, ad-free...
#
capjamesg[d]
The context is that the search index is rebuilding right now and has been for the last day or so.
#
capjamesg[d]
A lot of *questionable* links were found and could be traced back to one site.
#
capjamesg[d]
I will remove that one site from the index, but I wondered if there are any easy wins one can use in spam prevention (without being really naive).
#
capjamesg[d]
thanks petermolnar++
#
Loqi
petermolnar has 3 karma in this channel over the last year (31 in all channels)
#
[KevinMarks]
Blacklisting sites is good. One wrinkle we found at technorati was checking the referrer id in links that generate referral payments - that helped find networks of spammers feeding money to the same person.
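
A small sketch of that referral-id check, assuming the crawler already has (source site, outbound link) pairs. The query parameter name `ref` is hypothetical; real referral programmes each use their own parameter, and this is not Technorati's actual method.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse


def group_sites_by_referral_id(links, param="ref"):
    """Map each referral id to the set of sites linking with it.

    `links` is an iterable of (source_site, outbound_url) pairs. Only ids
    shared by more than one site are returned, since those hint at a network
    feeding payments to the same account.
    """
    groups = defaultdict(set)
    for site, url in links:
        for value in parse_qs(urlparse(url).query).get(param, []):
            groups[value].add(site)
    return {rid: sites for rid, sites in groups.items() if len(sites) > 1}
```

Sites that end up in the same group could then be reviewed together before being added to a blocklist.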
#
[schmarty]
What is a blocklist?
#
Loqi
A block list is a list of accounts that a user has blocked on a service or site https://indieweb.org/blocklist
#
capjamesg[d]
Thanks everyone!
akevinhuang joined the channel
#
[tantek]
capjamesg[d], in looking at that like page, I was confused by the "article title" heading, which, without punctuation or markup, reads ambiguously
#
[tantek]
and then further confused as to why I was seeing it twice (a second time below with different capitalization and markup)
#
[tantek]
"under the Like category" seems unnecessary in the context of the rest of the page 🙂
#
@voxpelli
↩️ As far as I know, there hasn’t been any documented webmention spam yet, so far all such spam stems from implementations offering interoperability with eg pingback: https://indieweb.org/spam#Webmention
(twitter.com/_/status/1440699687124668428)
#
@voxpelli
↩️ I have yet to get any reports from anyone who has received spam through my webmention service (and I now see that I need to do some work so I can feel safe in opening up more accounts for it) https://webmention.herokuapp.com/
(twitter.com/_/status/1440700948603568142)
#
[tantek]
there was that one escort spam that adactio kept getting on his Webmention test receiver post
#
voxpelli
[tantek]: yeah, the wiki text seemed quite uncertain whether it actually was spam
#
voxpelli
Has there been any progress in the last couple of years on any spam prevention mechanisms?
#
voxpelli
I remember talks about some trust circle solution
#
[tantek]
voxpelli, hmm, maybe worth clarifying in the wiki then. I think it was a mix, like there was a legitimate test webmention post from the escort site, and then subsequently they started spamming
#
[tantek]
voxpelli, good q re: spam prevention, I think it has come up at nearly every IWS and many IWCs
#
voxpelli
Right, I’m on my phone right now, sitting on a bench in the sun, I might have a look later if I find time
#
[tantek]
makes sense 🙂
#
[tantek]
for me personally, I started going down that path in figuring out how to filter Twitter responses / mentions, and then ended up getting stuck in how should I treat external content (e.g. from webmentions, reply-contexts) in terms of storage policy, expiration (if any), deletion (upon request) etc.
#
voxpelli
I’ve been there as well, like should I see the data from the mention as merely a cache and continuously recrawl it? Or should I keep it as is until I get a ping?
#
aaronpk
IIRC the first webmention he got could be a legitimate test from them, but then they sent a bunch more to other posts which feels less like testing at that point
#
voxpelli
As anyone can ping, if e.g. a site were to remove all of their mentions without pinging, because they didn't want to disrupt other sites, then anyone else could send pings instead if they wanted to trigger a removal/update
#
voxpelli
So maybe one could just as well recrawl them all slowly to ensure they are up to date, would probably help from a GDPR perspective as well
#
voxpelli
Especially for eg images and names
#
[tantek]
voxpelli, other issues, (e.g from Twitter), do I even want to show author photos of random webmentions which may be offensive?
#
[tantek]
or sometimes their "display name" (especially on Twitter) is modified to be something rude (not anything resembling an actual name or pseudonym)
#
voxpelli
Very true, and if you host an intermediate copy of it (to avoid loading directly from Twitter) then what if it is a copyright infringement even
#
voxpelli
Even more of an issue outside of Twitter where there is no moderation whatsoever
#
[tantek]
right, a random user uses an image they don't have rights for, cartoon characters etc.
#
[tantek]
a *profile image*
#
@tomayac
↩️ I just wonder why sites don’t ask for content take-down. From the Webmention side it just seems pingback indeed. I think live with it, or create a local blocklist.
(twitter.com/_/status/1440706065146941458)
#
voxpelli
Yeah, I remember the good old phpBB days and such, where on one forum someone had converted a large part of The Matrix to a tiny animated gif and uploaded that as his avatar, to the despair of everyone's internet connections, which had to download it
hendursaga joined the channel
#
@voxpelli
↩️ If they do ask them for a take-down, then my guess is that the site won’t politely ping you about the take-down. Not even sure Pingback supports any kind of removal / tombstoning. Webmention does however.
(twitter.com/_/status/1440706742392791046)
#
voxpelli
@capjamesg[d]: One hard thing in recrawl is to detect change frequency without false positives
#
voxpelli
Eg. relative timestamps can make a site look updated all the time
#
voxpelli
It’s a similar issue I have with Salmentions
#
aaronpk
same with me for fetching pages in aperture
#
aaronpk
i ended up using a %-of-page-changed metric
#
voxpelli
If I get a circular mention chain, then the Salmentions could end up pinging forever for that one, so at least one actor in the circle would need to decide that the ping isn’t significant enough to forward
#
voxpelli
@aaronpk: percentage of text or including the tags?
#
aaronpk
% of the whole html page
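
One way such a percentage-of-page-changed metric could look, as a sketch only. This is not Aperture's code: it leans on Python's `difflib`, and the 5% threshold and function names are assumptions.

```python
from difflib import SequenceMatcher


def percent_changed(old_html, new_html):
    """Return how much of the raw HTML differs, from 0.0 to 100.0."""
    ratio = SequenceMatcher(None, old_html, new_html, autojunk=False).ratio()
    return (1.0 - ratio) * 100.0


def has_meaningfully_changed(old_html, new_html, threshold=5.0):
    """Treat the page as updated only when enough of it differs.

    The threshold (in percent) is a guess; a relative timestamp flipping
    from "3 minutes ago" to "4 minutes ago" stays far below it.
    """
    return percent_changed(old_html, new_html) >= threshold
```

Comparing whole pages this way is slow for very large documents, so a real crawler might compare hashes first and only diff when they differ.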
#
voxpelli
Nice, I think my issue with Salmentions was that I wanted to do it on the parsed mf2 data, and especially if the content is implied, then it can contain something like a relative time stamp, but I guess I could do a conservative check and see if that key is the only changed one, then at least 20-40% of it has to be changed
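
A rough sketch of that conservative check. It assumes the parsed mf2 properties have been flattened into a plain dict of strings (real mf2 JSON holds lists of values), and it picks 20% as the cut-off from the 20-40% range mentioned; the function names are hypothetical.

```python
from difflib import SequenceMatcher


def fraction_changed(old_text, new_text):
    """Return the share of the text that differs, from 0.0 to 1.0."""
    return 1.0 - SequenceMatcher(None, old_text, new_text, autojunk=False).ratio()


def is_significant_update(old_props, new_props, fuzzy_key="content", min_fraction=0.2):
    """Decide whether a re-parsed mention is worth forwarding.

    If the only property that differs is `fuzzy_key` (for example implied
    content that embeds a relative timestamp), require that at least
    `min_fraction` of it changed. Any other changed property counts as
    significant straight away.
    """
    keys = old_props.keys() | new_props.keys()
    changed = {k for k in keys if old_props.get(k) != new_props.get(k)}
    if not changed:
        return False
    if changed == {fuzzy_key}:
        old_text = str(old_props.get(fuzzy_key, ""))
        new_text = str(new_props.get(fuzzy_key, ""))
        return fraction_changed(old_text, new_text) >= min_fraction
    return True
```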
[fluffy] joined the channel
#
capjamesg[d]
voxpelli interesting.
#
capjamesg[d]
My really simple idea was just to run a cron job every so often (maybe weekly?) to recrawl 100 URLs from each site.
#
capjamesg[d]
The crawler would start at the home page so I would likely pick up all / most new posts.
#
capjamesg[d]
100 might even be too little. 250 could be better.
#
capjamesg[d]
And then run a "full crawl" of all pages on a less frequent cadence.
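
The partial recrawl described here could be sketched as a breadth-first walk from the home page that stops at a fixed budget. The `fetch_links` callback is a stand-in for whatever the crawler uses to download a page and list its links, and the function name is made up.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def pick_urls_to_recrawl(home_url, fetch_links, limit=100):
    """Walk a site breadth-first from its home page, up to `limit` URLs.

    `fetch_links(url)` must return the hrefs found on that page. Pages
    closest to the home page come first, which is where new posts usually
    show up. Links to other hosts are skipped.
    """
    host = urlparse(home_url).netloc
    queue = deque([home_url])
    seen = {home_url}
    picked = []
    while queue and len(picked) < limit:
        url = queue.popleft()
        picked.append(url)
        for href in fetch_links(url):
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return picked
```

A weekly cron job could call this with `limit=100` (or 250), while the less frequent full crawl would use a much larger limit.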
#
capjamesg[d]
Back when the search engine was just for my site, I used the last-modified HTTP header to determine changes.
#
capjamesg[d]
(not exclusively)
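
A sketch of the Last-Modified comparison, assuming the crawler stored the header value from the previous fetch. The function name is hypothetical, and anything missing or unparseable is treated as "changed" so a page is never skipped by mistake.

```python
from email.utils import parsedate_to_datetime


def modified_since(last_seen_header, current_header):
    """Compare two Last-Modified header values.

    Returns True when the page looks newer than the stored copy, or when
    either header is missing or cannot be parsed.
    """
    if not last_seen_header or not current_header:
        return True
    try:
        previous = parsedate_to_datetime(last_seen_header)
        current = parsedate_to_datetime(current_header)
        return current > previous
    except (TypeError, ValueError):
        return True
```

Since servers do not always send an accurate Last-Modified value, this works best as one signal among several, in line with the "not exclusively" caveat.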
#
capjamesg[d]
But I never wrote a frequency algorithm.
#
capjamesg[d]
Not sure if that helps you 😄
dotslashroot, shoesNsocks, ShinyCyril and pmn joined the channel
#
pmn
hello, in rss dialect, can <link> of <item> be a relative path? if yes, where can the client get the prefix/host of the URL?
#
aaronpk
you know i've never thought about it before, but it would be relative to the URL the RSS feed was fetched from, just like HTML
#
[tantek]
that makes sense aaronpk. I don't think RSS has anything like HTML's <base> element to alter that either
#
pmn
[specs] doesn't talk about it. does that mean that if the "base" is missing it would be an undefined behaviour and it would be up to the client to figure it out?
hendursa1 joined the channel
#
[tantek]
yes, re: undefined behavior 😕
#
Zegnat
From a quick Google, they did try to make base happen: https://www.rssboard.org/news/151/relative-links
#
[tantek]
pmn, from that spec link, it appears that relative URLs are disallowed by RSS2:
#
[tantek]
"RSS places restrictions on the first non-whitespace characters of the data in <link> and <url> elements. The data in these elements must begin with an IANA-registered URI scheme"
#
aaronpk
oh but rss is perfect so there's no need to change it :eyeroll:
#
[tantek]
lol aaronpk, we can do our best to interpret the existing spec as-is though
#
[tantek]
pmn, thus the short answer to "can <link> of <item> be a relative path?" is no, per the first sentence of https://validator.w3.org/feed/docs/rss2.html#comments
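
A feed generator could guard against relative links with a check along these lines. It is a sketch: the scheme allowlist is a tiny hand-picked subset standing in for the full IANA registry, and the function name is made up.

```python
import re

# Assumption: a small subset of IANA-registered schemes, not the full registry.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "news"}
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def starts_with_known_scheme(data, schemes=KNOWN_SCHEMES):
    """Check that the first non-whitespace characters form a known URI scheme."""
    match = _SCHEME_RE.match(data.lstrip())
    return bool(match) and match.group(1).lower() in schemes
```

For example, `starts_with_known_scheme("/posts/1")` is false, so a generator could refuse to write that value into `<link>` or resolve it to an absolute URL first.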
#
[tantek]
aaronpk, at least that spec admits it? https://validator.w3.org/feed/docs/rss2.html#roadmap "RSS is by no means a perfect format ..."
#
[tantek]
Atom OTOH does allow for relative URLs, supported by the xml:base attribute per: "Any element defined by this specification MAY have an xml:base attribute [W3C.REC-xmlbase-20010627]. When xml:base is used in an Atom Document, it serves the function described in section 5.1.1 of [RFC3986], establishing the base URI (or IRI) for resolving any relative references found within the effective scope of the xml:base attribute."
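
For comparison, here is a sketch of how a reader might resolve relative Atom links while honouring nested xml:base attributes, using only the standard library. The function name and the example URLs in the note below are illustrative, and only `atom:link` elements are handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolved_links(feed_xml, feed_url):
    """Return every atom:link href in the feed as an absolute URL.

    The base starts as the URL the feed was fetched from and is updated by
    each xml:base attribute found on the way down the tree.
    """
    root = ET.fromstring(feed_xml)
    found = []

    def walk(element, base):
        # An xml:base may itself be relative to the base in scope above it.
        base = urljoin(base, element.get(XML_BASE, ""))
        href = element.get("href")
        if element.tag == ATOM + "link" and href is not None:
            found.append(urljoin(base, href))
        for child in element:
            walk(child, base)

    walk(root, feed_url)
    return found
```

With a feed fetched from `https://example.com/feed` that sets `xml:base="https://example.com/blog/"` on the root, an entry link of `hello` inside `xml:base="2021/"` would resolve under `https://example.com/blog/2021/`.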
#
pmn
thanks
#
Zegnat
I wonder: is there any reason to do RSS feeds today? I have the impression most services that read feeds will accept Atom. And Atom does clear up a couple of RSS oddities IMO.
#
pmn
Zegnat: podcasts?
#
capjamesg[d]
How do RSS and Atom differ?
#
capjamesg[d]
Zegnat I see a lot of publications talk about their RSS feeds.
#
capjamesg[d]
I would speculate that Atom is known by far less than RSS, as terms.
#
pmn
Zegnat: oh, i was under the impression that Atom was an older attempt than RSS
#
Murray[d]
was about to say "brand recognition", pmn subtly made the same point 😄
#
Murray[d]
I see RSS being talked about, compared, etc. I never see anyone talking/mentioning Atom
#
Murray[d]
(I mean in the wider internet, not on Indieweb channels, but do include wider tech internet where it's still much rarer than RSS)
#
Murray[d]
*anecdotally
#
Zegnat
I have seen a lot of people use an "RSS" link or icon, yet still link to an Atom feed. But maybe RSS is still the big one.
#
[tantek]
"RSS" has more "brand" recognition because the term is (deliberately?) ambiguous
#
[tantek]
what is RSS
#
Loqi
RSS is a set of XML feed file formats of varying degrees of use for syndicating time-stamped content from web sites, and sometimes used to refer more broadly to feed file formats as a whole including Atom, or even more broadly in vernacular as a synonym for feed file or even feeds or syndication as a concept https://indieweb.org/RSS
#
Zegnat
gpodder talks about Atom for podcast feeds too, but from a quick glance, it looks like Apple only accepts RSS for feeds in iTunes? That would definitely swing podcasts away from Atom.
hendursaga joined the channel
#
[tantek]
pmn asked about "RSS2" which does refer to a specific feed format, so that was a much more answerable question 🙂
#
hendursaga
I've never heard anyone mention JSON feeds, but apparently they exist: https://www.jsonfeed.org/
#
[tantek]
Zegnat, yes, podcasts are "stuck" on RSS because consumption by iTunes is such a "required" use-case, and iTunes hasn't shown any inclination toward consuming "Atom podcasts" — tbh I wouldn't even know how to mark those up? I bet KevinMarks does though
#
[tantek]
hendursaga yes! JSONfeed has been discussed here quite a bit too
#
[tantek]
what is JSONfeed
#
Loqi
JSON Feed is a feed file in JSON format https://indieweb.org/jsonfeed
#
hendursaga
I don't know which I hate more, JSON or XML
#
[tantek]
does adding JSON-LD to that mix change anything? 😉
#
pmn
i'm writing something for taking ini files from the user with item tags defined in them and creating a podcast feed out of it. out of Atom/json/rss2 which one is more broadly supported, so the burden of support is less?
#
capjamesg[d]
I actually encountered JSON feeds today too.
#
capjamesg[d]
I saw it on the wiki a few weeks ago.
#
hendursaga
[tantek]: not particularly, plus the many JSON DSLs and whatnot don't help..
#
capjamesg[d]
I would go Atom, JSON, then RSS2.
#
capjamesg[d]
But that is anecdotal.
#
pmn
going from rss2 (current implementation) to atom should be easy (i'm using libxml2) and json should be ok but for the first release i think json can wait, no?
#
pmn
and it's really a matter of the reader i guess. apple's podcast app (if it still exists) is what matters to me at this point, of all their eco-system. this is more for indie curators rather than big content producers.
#
hendursaga
Now, if only ROME was updated to parse and generate JSON feeds...
#
pmn
Zegnat: relative-link was spot on, https://www.w3.org/TR/xmlbase/ thanks
#
Zegnat
pmn: from above, if you want Apple/iTunes podcast to work well, you are stuck with RSS
#
pmn
Zegnat: and no opus format...
#
pmn
namespaces are used for extending rss2.0; i wonder if that can help
#
pmn
but then i'm back to the question of adoption by platforms, never mind
hendursa1 joined the channel
#
[KevinMarks]
You can mix namespaces in to add atom and iTunes extensions to rss. I'm pretty sure that universal feedparser has test cases for this
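
A minimal sketch of an RSS 2.0 podcast feed that mixes in the Atom and iTunes namespaces, built with Python's standard library. The episode dictionary keys and the function name are assumptions for illustration, and only a couple of extension elements are shown.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

# Make the serializer emit readable atom: and itunes: prefixes.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("itunes", ITUNES_NS)


def build_podcast_feed(title, site_url, feed_url, episodes):
    """Return an RSS 2.0 document (as a string) with Atom and iTunes extensions.

    `episodes` is a list of dicts with the (hypothetical) keys
    title, url, audio_url, bytes, mime and duration.
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = title
    # atom:link rel="self" tells readers where the feed itself lives.
    ET.SubElement(
        channel,
        f"{{{ATOM_NS}}}link",
        {"href": feed_url, "rel": "self", "type": "application/rss+xml"},
    )
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "link").text = ep["url"]
        ET.SubElement(
            item,
            "enclosure",
            {"url": ep["audio_url"], "length": str(ep["bytes"]), "type": ep["mime"]},
        )
        ET.SubElement(item, f"{{{ITUNES_NS}}}duration").text = ep["duration"]
    return ET.tostring(rss, encoding="unicode")
```

All links are written as absolute URLs here, in keeping with the RSS2 restriction discussed earlier.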
#
[KevinMarks]
Looks like there are tests for relative urls in the atom examples but not the rss ones
#
[tantek]
makes sense since the RSS2 spec doesn't allow for relative URLs
#
[tantek]
though I suppose there could be "negative" tests, making sure that they're properly ignored
maxwelljoslyn[d], hendursaga, [snarfed], tetov-irc and angelo joined the channel
#
[KevinMarks]
The feedparser philosophy is to converge formats, but test where they are defined. So undefined behaviour converges with the defined behaviour in different formats
#
[KevinMarks]
(that's markp's version, it may have drifted)
ShinyCyril joined the channel
#
[snarfed]
^ fascinating nuance
Seirdy and [cleverdevil] joined the channel