#[KevinMarks]At one of the Yahoo hack days, a friend of mine implemented a file system that used delicious URLs as the basic blocks. Joshua wasn't amused.
#Loqi[pluginkollektiv] Description
Say Goodbye to comment spam on your WordPress blog or website. Antispam Bee blocks spam comments and trackbacks effectively, without captchas and without sending personal information to third party services. It is free of charge, ad-free...
#capjamesg[d]The context is that the search index is rebuilding right now and has been for the last day or so.
#capjamesg[d]A lot of *questionable* links were found and could be traced back to one site.
#capjamesg[d]I will remove that one site from the index, but I wondered if there are any easy wins one can use in spam prevention (without being really naive).
#Loqipetermolnar has 3 karma in this channel over the last year (31 in all channels)
#[KevinMarks]Blacklisting sites is good. One wrinkle we found at technorati was checking the referrer id in links that generate referral payments - that helped find networks of spammers feeding money to the same person.
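A minimal sketch of the two ideas above, assuming outbound links have already been extracted per site: skip anything on a local domain blocklist, then group the remaining sites by the referral id found in their links, so that many sites paying out to the same id stand out. The query parameter names (`tag`, `ref`, `aff`) and the `spam.example` blocklist entry are hypothetical; real referral programs use their own parameters.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Hypothetical local blocklist of known spam domains.
BLOCKLIST = {"spam.example"}

# Hypothetical query parameters that carry a referral/affiliate id.
REFERRAL_PARAMS = ("tag", "ref", "aff")


def referral_ids(url):
    """Return the set of referral ids found in a URL's query string."""
    query = parse_qs(urlsplit(url).query)
    return {value for name in REFERRAL_PARAMS for value in query.get(name, [])}


def group_by_referrer(site_links):
    """Map each referral id to the set of non-blocklisted sites linking with it.

    site_links: dict of site hostname -> list of outbound link URLs.
    """
    groups = defaultdict(set)
    for site, links in site_links.items():
        if site in BLOCKLIST:
            continue  # already known spam: skip it entirely
        for link in links:
            for rid in referral_ids(link):
                groups[rid].add(site)
    return groups


def suspicious_networks(site_links, min_sites=2):
    """Referral ids shared by at least `min_sites` distinct sites."""
    return {rid: sites for rid, sites in group_by_referrer(site_links).items()
            if len(sites) >= min_sites}
```

A shared id is only a signal for manual review, not proof of spam: legitimate bloggers using the same affiliate program each get their own id, which is why grouping by id rather than by program is the useful part.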
#[tantek]capjamesg[d], in looking at that like page, I was confused by the "article title" heading, which, without punctuation or markup, reads ambiguously
#[tantek]and then further confused as to why I was seeing it twice (a second time below with different capitalization and markup)
#[tantek]"under the Like category" seems unnecessary in the context of the rest of the page 🙂
#[tantek]there was that one escort spam that adactio kept getting on his Webmention test receiver post
#voxpelli[tantek]: yeah, the wiki text seemed quite uncertain whether it actually was spam
#voxpelliHas there been any progress in the last couple of years on any spam prevention mechanisms?
#voxpelliI remember talks about some trust circle solution
#[tantek]voxpelli, hmm, maybe worth clarifying in the wiki then. I think it was a mix, like there was a legitimate test webmention post from the escort site, and then subsequently they started spamming
#[tantek]voxpelli, good q re: spam prevention, I think it has come up at nearly every IWS and many IWCs
#voxpelliRight, I’m on my phone right now, sitting on a bench in the sun, I might have a look later if I find time
#[tantek]for me personally, I started going down that path in figuring out how to filter Twitter responses / mentions, and then ended up getting stuck in how should I treat external content (e.g. from webmentions, reply-contexts) in terms of storage policy, expiration (if any), deletion (upon request) etc.
#voxpelliI’ve been there as well, like should I see the data from the mention as merely a cache and continuously recrawl it? Or should I keep it as is until I get a ping?
#aaronpkIIRC the first webmention he got could be a legitimate test from them, but then they sent a bunch more to other posts which feels less like testing at that point
#voxpelliAs anyone can ping: if e.g. a site were to remove all of its mentions but not send pings, because it didn’t want to disrupt other sites, then anyone else could send the pings instead to trigger a removal/update
#voxpelliSo maybe one could just as well recrawl them all slowly to ensure they are up to date, would probably help from a GDPR perspective as well
#[tantek]voxpelli, other issues (e.g. from Twitter): do I even want to show author photos of random webmentions, which may be offensive?
#[tantek]or sometimes their "display name" (especially on Twitter) is modified to be something rude (not anything resembling an actual name or pseudonym)
#voxpelliVery true, and if you host an intermediate copy of it (to avoid loading directly from Twitter), then what if it’s even a copyright infringement?
#voxpelliEven more of an issue outside of Twitter where there is no moderation whatsoever
#[tantek]right, a random user uses an image they don't have rights to, cartoon characters etc.
#@tomayac↩️ I just wonder why sites don’t ask for content take-down. From the Webmention side it just seems pingback indeed. I think live with it, or create a local blocklist. (twitter.com/_/status/1440706065146941458)
#voxpelliYeah, I remember the good old phpBB days and such, where on one forum someone had converted a large part of The Matrix into a tiny animated gif and uploaded that as his avatar, to the despair of everyone whose internet connection had to download it
hendursaga joined the channel
#@voxpelli↩️ If they do ask them for a take-down, then my guess is that the site won’t politely ping you about the take-down.
Not even sure Pingback supports any kind of removal / tombstoning.
Webmention does however. (twitter.com/_/status/1440706742392791046)
#voxpelli@capjamesg[d]: One hard thing about recrawling is detecting change frequency without false positives
#voxpelliE.g. relative timestamps can make a site look updated all the time
#voxpelliIt’s a similar issue I have with Salmentions
#aaronpksame with me for fetching pages in aperture
#aaronpki ended up using a %-of-page-changed metric
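One way to compute a %-of-page-changed metric is sketched below. This is not necessarily how aperture does it; it assumes the page has already been reduced to plain text, compares word tokens with Python's `difflib.SequenceMatcher`, and uses a hypothetical 10% threshold.

```python
import difflib


def percent_changed(old_text, new_text):
    """Percentage (0-100) of words that differ between two fetches of a page."""
    old_words = old_text.split()
    new_words = new_text.split()
    if not old_words and not new_words:
        return 0.0
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    # ratio() is 2*M/T, where M = matched words and T = total words in both versions.
    return (1.0 - matcher.ratio()) * 100.0


def has_meaningfully_changed(old_text, new_text, threshold=10.0):
    """Treat the page as updated only if more than `threshold` percent changed."""
    return percent_changed(old_text, new_text) > threshold
```

With this approach a relative timestamp like "5 minutes ago" turning into "6 minutes ago" on an otherwise unchanged 50-word page changes only about 2% of the words, so it stays under the threshold instead of registering as an update.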
#voxpelliIf I get a circular mention chain, then the Salmentions could end up pinging forever for that one, so at least one actor in the circle would need to decide that the ping isn’t significant enough to forward
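A small simulation of that stopping rule, under a simplifying assumption that is not from any spec: each site remembers a hash of the last version it received for a given source and only forwards a Salmention when that hash changes. The `Site` class and the three-site ring are hypothetical.

```python
import hashlib
from collections import deque


class Site:
    """A hypothetical Salmention relay that only forwards changed content."""

    def __init__(self, name):
        self.name = name
        self.downstream = []   # sites this site forwards pings to
        self.last_seen = {}    # source URL -> hash of the last content received

    def receive(self, source, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self.last_seen.get(source) == digest:
            return []  # nothing new: don't forward, which breaks the cycle
        self.last_seen[source] = digest
        return [(site, source, content) for site in self.downstream]


def propagate(start, source, content, max_steps=1000):
    """Deliver pings breadth-first and return how many deliveries happened."""
    queue = deque([(start, source, content)])
    steps = 0
    while queue and steps < max_steps:
        site, src, body = queue.popleft()
        queue.extend(site.receive(src, body))
        steps += 1
    return steps
```

In a ring a → b → c → a, an update travels around once (four deliveries) and stops when it reaches a again with content a has already seen; in practice the "seen" comparison would need a significance check like the ones discussed next, since an exact hash is defeated by relative timestamps.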
#voxpelli@aaronpk: percentage of text or including the tags?
#voxpelliNice, I think my issue with Salmentions was that I wanted to do it on the parsed mf2 data, and especially if the content is implied, it can contain something like a relative timestamp. But I guess I could do a conservative check: if that key is the only one that changed, then at least 20-40% of it has to have changed
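A sketch of that conservative check, assuming the parsed mf2 item is available as a dict of property name to list of values (the usual mf2 JSON shape). Treating `content` as the volatile key and using 20% (the low end of the 20-40% range mentioned) are assumptions.

```python
import difflib

VOLATILE_KEY = "content"  # assumed key that may hold implied text with relative timestamps
MIN_CHANGED = 0.20        # assumed threshold: low end of the 20-40% range


def _as_text(values):
    """Flatten an mf2 property value list into comparable plain text."""
    parts = []
    for value in values:
        if isinstance(value, dict):
            # e-* values carry both "html" and "value"; compare the text form
            parts.append(str(value.get("value", "")))
        else:
            parts.append(str(value))
    return " ".join(parts)


def is_significant_change(old_props, new_props):
    """Return True if the mf2 properties changed enough to be worth forwarding."""
    keys = set(old_props) | set(new_props)
    changed = {k for k in keys if old_props.get(k) != new_props.get(k)}
    if not changed:
        return False
    if changed != {VOLATILE_KEY}:
        return True  # some other property changed: always significant
    old_words = _as_text(old_props.get(VOLATILE_KEY, [])).split()
    new_words = _as_text(new_props.get(VOLATILE_KEY, [])).split()
    ratio = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False).ratio()
    return (1.0 - ratio) >= MIN_CHANGED
```

Any change outside the volatile key is forwarded unconditionally, so the check only ever suppresses pings where the content alone moved by a small amount.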
dotslashroot, shoesNsocks, ShinyCyril and pmn joined the channel
#pmnhello, in the rss dialect, can the <link> of an <item> be a relative path? if yes, where does the client get the prefix/host of the URL?
#aaronpkyou know i've never thought about it before, but it would be relative to the URL the RSS feed was fetched from, just like HTML
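In Python, resolving an item link against the URL the feed was fetched from (the HTML-like behaviour described above) is a one-liner with `urllib.parse.urljoin`; absolute links pass through unchanged. The feed URL in the usage note is a made-up example.

```python
from urllib.parse import urljoin


def resolve_item_link(feed_url, link):
    """Resolve an RSS <item><link> against the URL the feed was fetched from."""
    # urljoin leaves absolute URLs untouched and resolves relative ones per RFC 3986.
    return urljoin(feed_url, link.strip())
```

For a feed fetched from `https://example.com/blog/feed.xml`, `posts/1` resolves to `https://example.com/blog/posts/1` and `/posts/1` to `https://example.com/posts/1`. If the fetch followed redirects, the final URL is presumably the better base.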
#[tantek]that makes sense aaronpk. I don't think RSS has anything like HTML's <base> element to alter that either
#pmn[specs] doesn't talk about it. does that mean that if the "base" is missing it would be undefined behaviour, and it would be up to the client to figure it out?
#[tantek]pmn, from that spec link, it appears that relative URLs are disallowed by RSS2:
#[tantek]"RSS places restrictions on the first non-whitespace characters of the data in <link> and <url> elements. The data in these elements must begin with an IANA-registered URI scheme"
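Going by that RSS2 text, a validator could flag <link> data that does not start with a URI scheme. The sketch below checks the scheme syntax from RFC 3986 (a letter followed by letters, digits, "+", "-" or ".", then ":"); actually confirming the scheme is IANA-registered would need a copy of the IANA registry, so the small `KNOWN_SCHEMES` set is a hypothetical stand-in.

```python
import re

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":".
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

# Hypothetical stand-in for the IANA URI scheme registry.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto"}


def is_valid_rss2_link(data):
    """True if the element data, after leading whitespace, begins with a known scheme."""
    match = SCHEME_RE.match(data.lstrip())
    return bool(match) and match.group(1).lower() in KNOWN_SCHEMES
```

A check like this is also a natural basis for the "negative" tests mentioned later: relative values such as `/post` or `posts/1` fail it.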
#aaronpkoh but rss is perfect so there's no need to change it :eyeroll:
#[tantek]lol aaronpk, we can do our best to interpret the existing spec as-is though
#[tantek]Atom OTOH does allow for relative URLs, supported by the xml:base attribute per: "Any element defined by this specification MAY have an xml:base attribute [W3C.REC-xmlbase-20010627]. When xml:base is used in an Atom Document, it serves the function described in section 5.1.1 of [RFC3986], establishing the base URI (or IRI) for resolving any relative references found within the effective scope of the xml:base attribute."
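A sketch of how a consumer might apply xml:base when reading Atom, using only the standard library: walk the tree, resolve each element's `xml:base` against its parent's effective base, then resolve `link/@href` against that. ElementTree exposes `xml:base` under the `{http://www.w3.org/XML/1998/namespace}base` key. The sketch only handles `<link href>`, not other places relative references can appear (such as inside content).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM = "{http://www.w3.org/2005/Atom}"


def resolved_links(atom_xml, document_url):
    """Return every Atom <link href>, resolved against its effective xml:base."""
    root = ET.fromstring(atom_xml)
    results = []

    def walk(element, base):
        # An xml:base on this element is itself resolved against the inherited base.
        base = urljoin(base, element.get(XML_BASE, ""))
        if element.tag == ATOM + "link" and "href" in element.attrib:
            results.append(urljoin(base, element.get("href")))
        for child in element:
            walk(child, base)

    walk(root, document_url)
    return results
```

The document URL is the starting base, which is why a feed without any xml:base still resolves relative hrefs the same way HTML does.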
#ZegnatI wonder: is there any reason to do RSS feeds today? I have the impression most services that read feeds will accept Atom. And Atom does clear up a couple of RSS oddities IMO.
#LoqiRSS is a set of XML feed file formats of varying degrees of use for syndicating time-stamped content from web sites, and sometimes used to refer more broadly to feed file formats as a whole including Atom, or even more broadly in vernacular as a synonym for feed file or even feeds or syndication as a concept https://indieweb.org/RSS
#Zegnatgpodder talks about Atom for podcast feeds too, but from a quick glance, it looks like Apple only accepts RSS for feeds in iTunes? That would definitely swing podcasts away from Atom.
hendursaga joined the channel
#[tantek]pmn asked about "RSS2" which does refer to a specific feed format, so that was a much more answerable question 🙂
#hendursagaI've never heard anyone mention JSON feeds, but apparently they exist: https://www.jsonfeed.org/
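For reference, a minimal JSON Feed can be produced with the standard `json` module. Per the JSON Feed 1.1 spec, a feed needs `version`, `title` and `items`, and each item needs an `id` plus at least one of `content_html` or `content_text`; the helper name and its (id, url, text) input shape are illustrative assumptions.

```python
import json


def make_json_feed(title, home_page_url, items):
    """Build a minimal JSON Feed 1.1 document from (id, url, text) tuples."""
    feed = {
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "home_page_url": home_page_url,
        "items": [
            {"id": item_id, "url": url, "content_text": text}
            for item_id, url, text in items
        ],
    }
    return json.dumps(feed, indent=2)
```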
#[tantek]Zegnat, yes, podcasts are "stuck" on RSS because consumption by iTunes is such a "required" use-case, and iTunes hasn't shown any inclination toward consuming "Atom podcasts" — tbh I wouldn't even know how to mark those up? I bet KevinMarks does though
#[tantek]hendursaga yes! JSONfeed has been discussed here quite a bit too
#hendursagaI don't know which I hate more, JSON or XML
#[tantek]does adding JSON-LD to that mix change anything? 😉
#pmni'm writing something that takes ini files from the user, with item tags defined in them, and creates a podcast feed out of them. out of Atom/JSON/RSS2, which one has the broadest support, so the burden of support is lower?
#pmngoing from rss2 (current implementation) to atom should be easy (i'm using libxml2) and json should be ok, but for the first release i think json can wait, no?
#pmnand it's really the matter of reader i guess. apple's podcast app (if it still exist) is what matters to me at this point, of all their eco-system. this is more for indie curators rather than big content producers.
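A rough sketch of the ini-to-podcast idea in Python; pmn is using libxml2, so this only illustrates the data flow and is not their implementation. The ini layout (a `[podcast]` section with `title`, `link` and `description`, plus one `[item:NAME]` section per episode with `title`, `url`, `length`, `type` and optional `guid`/`pubdate` keys) is entirely hypothetical. RSS2 requires `title`, `link` and `description` on the channel, and `<enclosure>` requires `url`, `length` and `type`; Apple Podcasts additionally expects its own `itunes:` namespace tags, which this sketch omits.

```python
import configparser
import xml.etree.ElementTree as ET


def ini_to_rss(ini_text):
    """Convert a (hypothetical) ini description of a podcast into an RSS 2.0 string."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    podcast = config["podcast"]
    for key in ("title", "link", "description"):  # required RSS2 channel elements
        ET.SubElement(channel, key).text = podcast[key]

    for section in config.sections():
        if not section.startswith("item:"):
            continue
        entry = config[section]
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        # Fall back to the media URL as the guid if none is given.
        ET.SubElement(item, "guid").text = entry.get("guid", entry["url"])
        if "pubdate" in entry:
            ET.SubElement(item, "pubDate").text = entry["pubdate"]
        # RSS2 <enclosure> needs an absolute url, length in bytes and MIME type.
        ET.SubElement(item, "enclosure", url=entry["url"],
                      length=entry["length"], type=entry["type"])

    return ET.tostring(rss, encoding="unicode")
```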
#hendursagaNow, if only ROME was updated to parse and generate JSON feeds...
#[KevinMarks]Looks like there are tests for relative urls in the atom examples but not the rss ones
#[tantek]makes sense since the RSS2 spec doesn't allow for relative URLs
#[tantek]though I suppose there could be "negative" tests, making sure that they're properly ignored
maxwelljoslyn[d], hendursaga, [snarfed], tetov-irc and angelo joined the channel
#[KevinMarks]The feedparser philosophy is to converge formats, but test where they are defined. So undefined behaviour converges with the defined behaviour in different formats
#[KevinMarks](that's markp's version, it may have drifted)