#dev 2021-11-02

2021-11-02 UTC
akevinhuang joined the channel
#
[schmarty]1
capjamesg: here's another deep dive into (particularly formative) browser development https://webdevelopmenthistory.com/1990-programming-the-world-wide-web/ 😅
akevinhuang2 joined the channel
#
GWG
aaronpk: Thanks for the recommendations, committed
#
GWG
aaronpk: Thanks for the recommendations, committed again
#
aaronpk
i just realized a problem
#
aaronpk
this may need more discussion. requiring that the issuer URL is a prefix of the two endpoints means indieauth.com and tokens.indieauth.com are not valid anymore. i don't mind too much, since i'm not planning on continuing those services, but it does mean you can't split the two endpoints into different domains
#
aaronpk
so it might be okay in practice, but...
#
aaronpk
if that is a requirement, then we should probably also mention that the client should verify that the issuer URL is a prefix of the two endpoints too, otherwise the requirement isn't really doing anything
#
GWG
aaronpk: Only the two endpoints, or all endpoints, since the introspection and revocation endpoints come next?
#
aaronpk
also a good question
#
aaronpk
maybe this limitation isn't needed
#
GWG
aaronpk: The requirement in RFC8414 is that the identifier is related to the location of the authorization server metadata, not the authorization endpoint
#
GWG
The as-yet-unnumbered Authorization Server Issuer Identification draft ties the issuer identifier returned by the authorization endpoint to the one given in the authorization server metadata
#
aaronpk
then i'm thinking we should match that. seems like adding the requirement of the endpoints might not be adding any value
#
GWG
So the requirement would be that it must be a prefix of the authorization server metadata location?
#
GWG
I think earlier it's called the indieauth-metadata endpoint.
#
aaronpk
right, since RFC8414 requires that the metadata URL is the issuer+.well-known, this requirement would be that the metadata URL is the issuer+something
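A minimal sketch of the check described here, assuming "issuer + something" means same scheme and host with the issuer path as a path prefix of the metadata URL. The function name and the example URLs are hypothetical, and a plain string `startswith` is avoided because it would accept look-alike hosts such as `example.com.evil.org`.

```python
from urllib.parse import urlsplit


def issuer_is_prefix_of(issuer: str, metadata_url: str) -> bool:
    """Return True if metadata_url is the issuer followed by some path suffix.

    Hypothetical helper: compares scheme and host exactly, then checks that
    the issuer path is a path prefix of the metadata URL path.
    """
    iss = urlsplit(issuer)
    meta = urlsplit(metadata_url)
    # Scheme and host must match; a different (sub)domain is never a prefix.
    if (iss.scheme, iss.netloc.lower()) != (meta.scheme, meta.netloc.lower()):
        return False
    # Normalise the issuer path so "/auth" only matches "/auth/...",
    # not "/authority".
    issuer_path = iss.path.rstrip("/") + "/"
    metadata_path = meta.path or "/"
    return metadata_path.startswith(issuer_path)
```

With this reading, an issuer of `https://example.com/` accepts a metadata URL under `https://example.com/.well-known/`, while an endpoint on a different subdomain is rejected.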
#
GWG
It is so committed
#
GWG
Makes sense though, and avoids a breaking change we don't need.
hendursaga and gRegor joined the channel
#
capjamesg[d]
hendursaga TrustRank is actually a separate algorithm (despite having such a close name to PageRank). TrustRank is used for identifying spam.
#
capjamesg[d]
I am still trying to understand it.
#
capjamesg[d]
Thanks for sharing [schmarty]1!
hendursa1 joined the channel
#
nekr0z
<capjamesg[d]> "Maybe looking at what sites have..." <- Not just any links, likes and bookmarks! Wouldn't taking into account what IndieWeb thinks about a page be a proper thing for IndieWeb Search to base its ranking on? ;-)
#
capjamesg[d]
nekr0z There is already code to distinguish those link types so I can do an aggregate count to see which ones are most popular in the index.
#
capjamesg[d]
i.e. u-quotation-of, u-like-of.
#
capjamesg[d]
But I don't use that for ranking.
#
capjamesg[d]
What did you have in mind?
#
nekr0z
Exactly that. Google uses (somehow, at least in principle) the relationships defined by links to rank the pages and assess relevance and stuff, which makes sense since the Web is all about hyperlinks.
#
nekr0z
IndieWeb is all about IndieWeb-style interactions, u-like-of, u-bookmark-of, etc, etc. So those are what should count in IndieWeb Search ranking, right?
#
capjamesg[d]
Indeed they do!
#
capjamesg[d]
I could add a boost for certain link types.
#
capjamesg[d]
Since a u-like-of is more of a vouch for a source than a regular <a> link.
#
nekr0z
That way, the spam pages end up at the very bottom of the list automatically ^)
#
nekr0z
capjamesg[d]: Yep, that's exactly what I was trying to say here.
#
capjamesg[d]
Indeed. I'll see what I can do. Maybe a .5 extra weight per link for a special mf2 link?
#
capjamesg[d]
(examples: like, bookmark)
#
capjamesg[d]
Maybe excluding replies though because those don't necessarily indicate a vouch for a resource.
#
capjamesg[d]
One could write a reply complaining about the accuracy of a post.
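The weighting idea above could be sketched roughly as follows. This is not the IndieWeb Search code: the base weight of 1.0, the function names, and the exact set of boosted properties are assumptions; only the 0.5 boost for likes and bookmarks and the exclusion of replies come from the discussion.

```python
# Properties treated as a "vouch" for the target (assumed set).
BOOSTED_PROPERTIES = {"u-like-of", "u-bookmark-of"}
BASE_WEIGHT = 1.0  # assumed weight of a plain <a> link
BOOST = 0.5        # extra weight suggested in the discussion


def link_weight(mf2_property):
    """Weight of one incoming link, given its mf2 property (or None)."""
    if mf2_property in BOOSTED_PROPERTIES:
        return BASE_WEIGHT + BOOST
    # Replies and plain links get no boost: a reply may be a complaint.
    return BASE_WEIGHT


def incoming_link_score(link_properties):
    """Sum the weights of all incoming links to a page."""
    return sum(link_weight(prop) for prop in link_properties)
```

For example, a page with one like, one plain link, and one reply would score 1.5 + 1.0 + 1.0.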
#
nekr0z
Also, if we treat u-like-of as some kind of an upvote, does this mean we need downvotes, too? u-diss-of? u-hate-of? u-meh-of? :-)
#
capjamesg[d]
I think that would open up too many opportunities for abuse 😄
#
capjamesg[d]
I have seen two instances of spam so far.
#
capjamesg[d]
The most recent was this site: learningwithmoocs.com
#
capjamesg[d]
It came in through the IndieMap domains list (I'll file an issue to report it, [snarfed], so that you don't include it in any crawls you do in the future).
#
nekr0z
I think I'll implement u-meh-of just for the sake of it. I sometimes like to devalue things ;)
#
capjamesg[d]
Will you have a /meh page on your site?
#
nekr0z
capjamesg[d]: That's an idea, too!
#
nekr0z
capjamesg++
#
Loqi
capjamesg has 16 karma in this channel over the last year (36 in all channels)
#
capjamesg[d]
Spam has been hard to catch so far. I am almost certain there is at least one spam site in the index right now.
#
capjamesg[d]
Even though all sites have been manually curated from the wiki / IndieMap / important community resources, old domains are sometimes bought and turned into spam sites.
#
capjamesg[d]
So I need... a plan for spam.
#
capjamesg[d]
(and not the cooked meat)
#
Ruxton
just neg score it
#
Ruxton
give it a spam score and multiply that score by some deduction to your ranking
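One possible reading of this suggestion, as a sketch: keep the page in the index but scale its ranking down in proportion to a spam score. The 0-to-1 range for the spam score, the default deduction of 0.9, and the function name are all assumptions.

```python
def apply_spam_penalty(rank_score, spam_score, deduction=0.9):
    """Scale a ranking score down according to a spam score.

    spam_score is assumed to lie in [0, 1] and is clamped to that range;
    deduction is the largest fraction of the score that can be removed.
    """
    clamped = min(max(spam_score, 0.0), 1.0)
    return rank_score * (1.0 - clamped * deduction)
```

A page with a spam score of 0 keeps its full ranking, while a page scored as certain spam keeps only a small fraction of it, so it still appears when nothing else matches the query.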
gRegor joined the channel
#
capjamesg[d]
The issue right now is more identification.
#
capjamesg[d]
I have excluded some sites from the index that I have found are spam but they have still, in some cases, gotten to page one (mostly on search queries for which there was no other information available, but it's still page one).
#
capjamesg[d]
Because there are only about 400k documents in the index, not every page + site has reputable incoming links.
#
capjamesg[d]
Or, in some cases, any incoming links.
#
nekr0z
capjamesg so we need to make IndieWeb bigger? Ok, I'm on it. May take some time, tho. ;)
#
capjamesg[d]
Yep, that's what we need 🙂
#
capjamesg[d]
We have a lovely surface area of queries covered though.
#
capjamesg[d]
We just can't do everything.
#
nekr0z
capjamesg[d]: Doesn't hurt to try, tho, does it? ;)
#
capjamesg[d]
Oh no! There is no limit to how big the IndieWeb should become 🙂
#
capjamesg[d]
Not one link to a Google resource.
#
capjamesg[d]
TIL Google experimented with a "related" query that apparently still works.
#
capjamesg[d]
I found it from a blog post written in 2003.
#
Ruxton
"mostly on search queries for which there was no other information available" <-- and this should be how it works
#
Ruxton
Google and Bing don't stop showing it, they just kill its score so it only turns up at the bottom of the search or when the search lacks other results
#
Ruxton
re: related - they used to have a link under pages to show related pages to that page inside your search term
#
capjamesg[d]
That's cool. Now they seem to be going more down the direction of "here are some other queries that you might like to do next" or however they say it.
#
capjamesg[d]
I have seen this a lot on desktop and on mobile (especially if you tap on a search result and then go back to the results page).
#
capjamesg[d]
Good point re: spam. Ruxton++
#
Loqi
Ruxton has 1 karma in this channel over the last year (2 in all channels)
#
capjamesg[d]
nekr0z I have considered crawling any site that has a second-order link to the IndieWeb.
#
capjamesg[d]
*any page, not site
#
capjamesg[d]
So instead of getting a link to Aaron's like of Cloudflare's BGP article, you would actually get the article: https://indieweb-search.jamesg.blog/results?query=bgp
#
Ruxton
capjamesg[d]: my site shows twice in your results O_o
#
capjamesg[d]
Fixing that is on my to-do list.
#
capjamesg[d]
My deduplication was not as sophisticated at the beginning.
#
@pulodev
Random developer content of the day: What is Webmention - https://jurnal.dev/webmention/
(twitter.com/_/status/1455460890745077767)
#
[KevinMarks]
Sounds like you're reinventing vote links
#
capjamesg[d]
How did they go in the first iteration [KevinMarks], whatever that may have been?
kogepan joined the channel
#
capjamesg[d]
How does this sound for spam prevention: look for domains with no incoming links from IndieWeb sites. If there are none, flag the domain for review.
#
capjamesg[d]
Of course, this will not catch everything, but it's a start.
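A rough sketch of that review check, assuming the crawler can produce a list of `(source_url, target_url)` link pairs and that "IndieWeb sites" means the other domains already in the index. The function name and data shapes are hypothetical.

```python
from urllib.parse import urlsplit


def domains_to_review(indexed_domains, links):
    """Return indexed domains with no incoming link from another indexed domain.

    links is an iterable of (source_url, target_url) pairs (assumed shape).
    """
    indexed = {domain.lower() for domain in indexed_domains}
    linked_to = set()
    for source_url, target_url in links:
        source = urlsplit(source_url).netloc.lower()
        target = urlsplit(target_url).netloc.lower()
        # Self-links and links from outside the index do not count.
        if source in indexed and target in indexed and source != target:
            linked_to.add(target)
    return sorted(indexed - linked_to)
```

The result is only a queue for manual review, since a domain with no incoming links is not necessarily spam.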
#
[KevinMarks]
We were using them explicitly for politics as well as ranking. The endpoint was treating rel="nofollow" as vote-abstain
#
[KevinMarks]
Also, I was told by Google search that PageRank doesn't stabilise with negative links. They treat all links as positive except nofollow ones which are neutral.
schmudde, marksuth[d], tetov-irc and jamietanna joined the channel
#
jamietanna
Should rel=alternate discovery for e.g. RSS only be for `<link>` or should it also be from `<a>`? I tried it earlier with an `<a>` and it didn't work in Aperture 🤔
kogepan_, P1000[d], [tantek], hs0ucy and hendursaga joined the channel
#
[KevinMarks]
weird twitter bug - when I click through on this tweet it hides it and only shows a reply https://twitter.com/kevinmarks/status/4033539664
#
@kevinmarks
@dickc I thought it was a good point - a lot of the 'virtual currencies' are non-fungible by design. Roach Motel Money.
(twitter.com/_/status/4033539664)
schmudde joined the channel
#
GWG
jamietanna: Did you see the change to the metadata PR that was discussed here last night?
#
aaronpk
hmm foursquare is changing their api and are going to start charging for it, but also give a $200 free credit
kogepan joined the channel
#
[KevinMarks]
$200 how often?
kogepan joined the channel
#
aaronpk
a month
#
jamietanna
GWG I'll have a look later, I'd seen some convos but not read through changes yet
#
hs0ucy
a month? 0_o
#
capjamesg[d]
How many API calls will that get?
#
aaronpk
unclear. they don't seem to have that info available yet
#
jamietanna
Re the rel=alternate question, looks like it has to be a `<link>` https://www.rssboard.org/rss-autodiscovery
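For illustration, `<link>`-only feed autodiscovery can be sketched with the Python standard library as below. This is not how Aperture is implemented; the class and function names are hypothetical, and accepting Atom alongside RSS is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Media types treated as feeds (Atom is included here as an assumption).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect feed URLs from <link rel="alternate"> elements only."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":  # <a rel="alternate"> is deliberately ignored
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.found.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return absolute feed URLs advertised by <link> elements in html."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.found
```

A page that only advertises its feed with an `<a rel="alternate">` would yield an empty list under this approach.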
#
jamietanna
aaronpk any idea why Aperture doesn't like `https://www.api.gov.uk/resources/links/` for `rel=alternate`?
#
aaronpk
not sure what you mean
akevinhuang, [Sam_Butler] and gRegor joined the channel
#
GWG
aaronpk: Hate to ask you again...but with jamietanna's approval, and your feedback, can you approve that PR, or do you want to have a look again when you have time?
[fluffy], akevinhuang2, barryf[d] and kogepan joined the channel
#
capjamesg[d]
nekr0z I took into account your suggestion. Links now have special weight if they are marked up with certain mf2 properties.
#
Loqi
nekr0z has 8 karma in this channel over the last year (9 in all channels)
#
capjamesg[d]
Ruxton I fixed the duplication issues. You should only see your site once now.
#
Loqi
Ruxton has 2 karma in this channel over the last year (3 in all channels)
#
capjamesg[d]
Thanks [KevinMarks] for the link to the microformats wiki post about vote links.
akevinhuang joined the channel
#
[KevinMarks]
The microformats approach is to prefer `<a>` over `<link>` (visible over invisible); Webmention, I think, checks `<link>` first and only `<a>` if `<link>` is not there
#
jamietanna
Sorry aaronpk I mean that I'm trying to get Aperture to discover feeds from that URL, but it doesn't work, even though I've got a `<link rel=alternate>`
angelo, [calumryan], akevinhuang, schmudde, tetov-irc, [chrisaldrich] and sebbu joined the channel