#dev 2021-11-02
2021-11-02 UTC
akevinhuang joined the channel
# [schmarty]1 capjamesg: here's another deep dive into (particularly formative) browser development https://webdevelopmenthistory.com/1990-programming-the-world-wide-web/ 😅
akevinhuang2 joined the channel
# aaronpk this may need more discussion. requiring that the issuer URL is a prefix of the two endpoints means indieauth.com and tokens.indieauth.com are not valid anymore. i don't mind too much, since i'm not planning on continuing those services, but it does mean you can't split the two endpoints into different domains
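A minimal sketch of why the split breaks. It assumes the rule is a plain string-prefix comparison between the issuer URL and each endpoint URL. The helper name and the example.com URLs are hypothetical:

```python
def issuer_is_prefix(issuer: str, *endpoints: str) -> bool:
    """Return True if every endpoint URL starts with the issuer URL.

    Assumes the requirement is a simple string-prefix check.
    """
    return all(endpoint.startswith(issuer) for endpoint in endpoints)


# Both endpoints on the issuer's host: passes.
same_host = issuer_is_prefix("https://example.com/",
                             "https://example.com/auth",
                             "https://example.com/token")

# indieauth.com + tokens.indieauth.com: fails, because the token
# endpoint lives on a different host than the issuer.
split_hosts = issuer_is_prefix("https://indieauth.com/",
                               "https://indieauth.com/auth",
                               "https://tokens.indieauth.com/token")
```

Under this assumption `same_host` is True and `split_hosts` is False, which is exactly the indieauth.com / tokens.indieauth.com case.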
hendursaga and gRegor joined the channel
# capjamesg[d] hendursaga TrustRank is actually a separate algorithm (despite its name being so close to PageRank's). TrustRank is used for identifying spam.
# capjamesg[d] I am still trying to understand it.
# capjamesg[d] Thanks for sharing [schmarty]1!
hendursa1 joined the channel
# capjamesg[d] nekr0z There is already code to distinguish those link types so I can do an aggregate count to see which ones are most popular in the index.
# capjamesg[d] i.e. u-quotation-of, u-like-of.
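A sketch of such an aggregate count over parsed mf2 JSON, where the `u-` prefix is dropped from property names. The property list, the function name, and the sample data are assumptions for illustration, not the search engine's actual code:

```python
from collections import Counter

# Link properties to count (hypothetical selection); parsed mf2 JSON
# stores u-like-of as "like-of", u-quotation-of as "quotation-of", etc.
LINK_PROPERTIES = ("like-of", "bookmark-of", "repost-of", "in-reply-to", "quotation-of")


def count_link_types(items):
    """Count how many values each link property contributes across mf2 items."""
    counts = Counter()
    for item in items:
        props = item.get("properties", {})
        for name in LINK_PROPERTIES:
            if name in props:
                # Values may be plain URLs or embedded h-cite objects; each counts once.
                counts[name] += len(props[name])
        # Recurse into nested children, e.g. h-entry items inside an h-feed.
        counts.update(count_link_types(item.get("children", [])))
    return counts


items = [
    {"type": ["h-entry"], "properties": {"like-of": ["https://a.example/post"]}},
    {"type": ["h-feed"], "properties": {}, "children": [
        {"type": ["h-entry"], "properties": {
            "like-of": ["https://b.example/"],
            "quotation-of": ["https://c.example/"],
        }},
    ]},
]
result = count_link_types(items)
```

Here `result` counts two likes and one quotation.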
# capjamesg[d] But I don't use that for ranking.
# capjamesg[d] What did you have in mind?
# capjamesg[d] Indeed they do!
# capjamesg[d] I could add a boost for certain link types.
# capjamesg[d] Since a u-like-of is more of a vouch for a source than a regular <a> link.
# capjamesg[d] Indeed. I'll see what I can do. Maybe a .5 extra weight per link for a special mf2 link?
# capjamesg[d] (examples: like, bookmark)
# capjamesg[d] Maybe excluding replies though because those don't necessarily indicate a vouch for a resource.
# capjamesg[d] One could write a reply complaining about the accuracy of a post.
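The weighting idea above, sketched with the numbers floated in the discussion: 1.0 per link, 0.5 extra for like and bookmark links, and no boost for replies. The names and the exact property set are hypothetical:

```python
# Hypothetical weights taken from the discussion above.
BASE_WEIGHT = 1.0
VOUCH_BONUS = 0.5
VOUCH_PROPERTIES = {"like-of", "bookmark-of"}


def link_weight(mf2_property=None):
    """Weight of one incoming link, given the mf2 property it appears under (None for a plain <a>)."""
    if mf2_property in VOUCH_PROPERTIES:
        return BASE_WEIGHT + VOUCH_BONUS
    # Plain links and in-reply-to links both get only the base weight,
    # since a reply may be critical rather than a vouch.
    return BASE_WEIGHT


def incoming_score(links):
    """Sum the weights of a page's incoming links (a list of property names or None)."""
    return sum(link_weight(prop) for prop in links)


score = incoming_score([None, "like-of", "in-reply-to", "bookmark-of"])
```

With those inputs `score` is 1.0 + 1.5 + 1.0 + 1.5 = 5.0.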
# capjamesg[d] I think that would open up too many opportunities for abuse 😄
# capjamesg[d] I have seen two instances of spam so far.
# capjamesg[d] The most recent was this site: learningwithmoocs.com
# capjamesg[d] It came in through the IndieMap domains list (I'll file an issue to let [snarfed] know, so that you don't include it in any crawls you do in the future).
# capjamesg[d] Will you have a /meh page on your site?
# capjamesg[d] Spam has been hard to catch so far. I am almost certain there is at least one spam site in the index right now.
# capjamesg[d] Even though all sites have been manually curated from the wiki / IndieMap / important community resources, old domains are sometimes bought and turned into spam sites.
# capjamesg[d] So I need... a plan for spam.
# capjamesg[d] (and not the cooked meat)
gRegor joined the channel
# capjamesg[d] The issue right now is more identification.
# capjamesg[d] I have excluded some sites from the index that I have found are spam but they have still, in some cases, gotten to page one (mostly on search queries for which there was no other information available, but it's still page one).
# capjamesg[d] Because there are only about 400k documents in the index, not every page + site has reputable incoming links.
# capjamesg[d] Or, in some cases, any incoming links.
# capjamesg[d] Yep, that's what we need 🙂
# capjamesg[d] We have a lovely surface area of queries covered though.
# capjamesg[d] We just can't do everything.
# capjamesg[d] Oh no! There is no limit to how big the IndieWeb should become 🙂
# capjamesg[d] I love this search page: https://indieweb-search.jamesg.blog/results?query=google+search
# capjamesg[d] Not one link to a Google resource.
# capjamesg[d] TIL Google experimented with a "related" query that apparently still works.
# capjamesg[d] I found it from a blog post written in 2003.
# capjamesg[d] That's cool. Now they seem to be going more in the direction of "here are some other queries that you might like to do next" or however they say it.
# capjamesg[d] I have seen this a lot on desktop and on mobile (especially if you tap on a search result and then go back to the results page).
# capjamesg[d] Good point re: spam. Ruxton++
# capjamesg[d] nekr0z I have considered crawling any site that has a second-order link to the IndieWeb.
# capjamesg[d] *any page, not site
# capjamesg[d] So instead of getting a link to Aaron's like of Cloudflare's BGP article, you would actually get the article: https://indieweb-search.jamesg.blog/results?query=bgp
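The "second-order" idea could look roughly like this: index pages that an IndieWeb page links to directly, but don't follow links any further from those external pages. This is a sketch assuming an in-memory `outlinks` map in place of real fetching and parsing; the function name and example domains are hypothetical:

```python
from collections import deque
from urllib.parse import urlparse


def crawl_frontier(seed_urls, outlinks, indieweb_domains):
    """Return URLs to index: IndieWeb pages plus external pages one link away.

    `outlinks` maps a URL to the URLs it links to (stand-in for fetching a page).
    """
    to_index = set()
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url in to_index:
            continue
        to_index.add(url)
        if urlparse(url).hostname not in indieweb_domains:
            continue  # external page: index it, but don't follow its links
        for target in outlinks.get(url, []):
            if target not in to_index:
                queue.append(target)
    return to_index


outlinks = {
    "https://indie.example/like/1": ["https://news.example/bgp-article",
                                     "https://indie.example/"],
    "https://news.example/bgp-article": ["https://news.example/further"],
}
result = crawl_frontier(["https://indie.example/like/1"], outlinks, {"indie.example"})
```

The liked article gets indexed alongside the like, while `https://news.example/further` (two hops out) does not.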
# capjamesg[d] Fixing that is on my to-do list.
# capjamesg[d] My deduplication was not as sophisticated at the beginning.
# @pulodev Random developer content for today:
What is Webmention - https://jurnal.dev/webmention/ (twitter.com/_/status/1455460890745077767)
# [KevinMarks] Sounds like you're reinventing vote links
# capjamesg[d] How did they go in the first iteration [KevinMarks], whatever that may have been?
kogepan joined the channel
# capjamesg[d] How does this sound for spam prevention: look for domains with no incoming links from IndieWeb sites. If there are none, flag the domain for review.
# capjamesg[d] Of course, this will not catch everything, but it's a start.
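A sketch of that heuristic, assuming the link graph is available as (source_domain, target_domain) pairs. Self-links are ignored so a spam domain can't vouch for itself. The function name and domains are hypothetical:

```python
def flag_for_review(domains, links, indieweb_domains):
    """Return domains that receive no links from any *other* IndieWeb domain.

    `links` is an iterable of (source_domain, target_domain) pairs.
    """
    vouched = {
        target
        for source, target in links
        if source in indieweb_domains and source != target
    }
    return sorted(domain for domain in domains if domain not in vouched)


domains = ["a.example", "b.example", "spam.example"]
links = [
    ("a.example", "b.example"),
    ("b.example", "a.example"),
    ("spam.example", "spam.example"),  # self-link: ignored
    ("spam.example", "a.example"),     # outgoing links don't help spam.example
]
# spam.example is in the domain list because it came in via a curated source.
flagged = flag_for_review(domains, links, set(domains))
```

Only `spam.example` ends up flagged. As noted above, this won't catch a spam domain that still has inbound links from before it changed hands.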
# [KevinMarks] We were using them explicitly for politics as well as ranking. The endpoint was treating rel="nofollow" as vote-abstain
# [KevinMarks] Also, I was told by Google search that PageRank doesn't stabilise with negative links. They treat all links as positive except nofollow ones which are neutral.
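A power-iteration PageRank sketch where every followed link is positive and nofollow links are neutral. Here "neutral" is modelled by dropping nofollow edges entirely, so they neither pass rank nor count toward the source's out-degree; another possible choice would be to keep them in the out-degree and let their share evaporate. Function and parameter names are assumptions:

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """PageRank over (source, target, nofollow) edges, ignoring nofollow edges.

    Dangling nodes (no followed outlinks) spread their rank evenly over all nodes,
    so the ranks keep summing to 1.
    """
    n = len(nodes)
    out = {node: [] for node in nodes}
    for source, target, nofollow in edges:
        if not nofollow:
            out[source].append(target)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for source in nodes:
            targets = out[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


ranks = pagerank(
    ["a", "b", "c"],
    [("a", "b", False), ("b", "a", False), ("a", "c", True), ("c", "a", False)],
)
```

Because `c`'s only inbound link is nofollow, it ends up with just the baseline share, (1 - 0.85) / 3.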
schmudde, marksuth[d], tetov-irc and jamietanna joined the channel
# jamietanna Should rel=alternate discovery for e.g. RSS only be for `<link>` or should it also be from `<a>`? I tried it earlier with an `<a>` and it didn't work in Aperture :thinking:
kogepan_, P1000[d], [tantek], hs0ucy and hendursaga joined the channel
# [KevinMarks] weird twitter bug - when I click through on this tweet it hides it and only shows a reply https://twitter.com/kevinmarks/status/4033539664
# @kevinmarks @dickc I thought it was a good point - a lot of the 'virtual currencies' are non-fungible by design. Roach Motel Money. (twitter.com/_/status/4033539664)
schmudde joined the channel
kogepan joined the channel
# [KevinMarks] $200 how often?
kogepan joined the channel
# jamietanna GWG I'll have a look later, I'd seen some convos but not read through changes yet
# hs0ucy a month? 0_o
# capjamesg[d] How many API calls will that get?
# jamietanna Re the rel=alternate question, looks like it has to be a `<link>` https://www.rssboard.org/rss-autodiscovery
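A sketch of `<link>`-only feed autodiscovery in the spirit of the RSS Advisory Board page linked above. This is not Aperture's code (its discovery logic may well differ, e.g. by also looking at h-feed), and for brevity it doesn't check that the element sits inside `<head>`. Class and function names are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect feed URLs from <link rel="alternate" type="..."> elements only."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return  # <a rel="alternate"> is deliberately ignored
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        feed_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rels and feed_type in FEED_TYPES and href:
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


html = '''<html><head>
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
</head><body>
<a rel="alternate" type="application/atom+xml" href="/atom.xml">Atom</a>
</body></html>'''
feeds = discover_feeds(html, "https://example.com/blog/")
```

Only the `<link>` feed is returned; the `<a>` pointing at `/atom.xml` is skipped, matching the behaviour jamietanna saw.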
# jamietanna aaronpk any idea why Aperture doesn't like `https://www.api.gov.uk/resources/links/` for `rel=alternate`?
akevinhuang, [Sam_Butler] and gRegor joined the channel
[fluffy], akevinhuang2, barryf[d] and kogepan joined the channel
# capjamesg[d] nekr0z I've taken your suggestion into account. Links now have special weight if they are marked up with certain mf2 properties.
# capjamesg[d] nekr0z++
# capjamesg[d] Ruxton I fixed the duplication issues. You should only see your site once now.
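The actual fix isn't described here, but one way to make a site appear only once is to collapse results per hostname and keep each site's best-scoring page. A hypothetical sketch (names and the `www.` handling are assumptions; `str.removeprefix` needs Python 3.9+):

```python
from urllib.parse import urlparse


def one_result_per_site(results):
    """Keep the highest-scoring result per hostname, sorted by score (descending).

    `results` is a list of (url, score) tuples.
    """
    best = {}
    for url, score in results:
        # Treat www.example and example as the same site.
        host = (urlparse(url).hostname or "").removeprefix("www.")
        if host not in best or score > best[host][1]:
            best[host] = (url, score)
    return sorted(best.values(), key=lambda item: item[1], reverse=True)


deduped = one_result_per_site([
    ("https://a.example/1", 0.9),
    ("https://www.a.example/2", 0.95),
    ("https://b.example/", 0.5),
])
```

Here `deduped` keeps `https://www.a.example/2` for the first site and `https://b.example/` for the second.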
# capjamesg[d] Ruxton++
# capjamesg[d] Thanks [KevinMarks] for the link to the microformats wiki post about vote links.
akevinhuang joined the channel
# [KevinMarks] The microformats approach is to prefer `<a>` over `<link>` (visible over invisible); Webmention, I think, checks `<link>` first and only `<a>` if `<link>` is not there
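For comparison on the Webmention side: my reading of the W3C Webmention Recommendation is that a sender uses the first HTTP `Link` header with rel="webmention" if there is one, and otherwise the first `<link>` *or* `<a>` with rel="webmention" in document order, rather than preferring one element type. A sketch of the HTML step; Link-header parsing is left out and passed in as a hypothetical parameter:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WebmentionEndpointParser(HTMLParser):
    """Record the href of the first <link> or <a> with rel="webmention"."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if self.href is not None or tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "webmention" in rels and attributes.get("href") is not None:
            self.href = attributes["href"]


def discover_endpoint(html, page_url, link_header_endpoint=None):
    """Return the endpoint; a Link-header result (parsed elsewhere) takes precedence."""
    if link_header_endpoint:
        return urljoin(page_url, link_header_endpoint)
    parser = WebmentionEndpointParser()
    parser.feed(html)
    if parser.href is None:
        return None
    # An empty href resolves to the page URL itself.
    return urljoin(page_url, parser.href)


html = '<p><a rel="webmention" href="/wm-a">endpoint</a></p><link rel="webmention" href="/wm-link">'
endpoint = discover_endpoint(html, "https://example.com/post")
```

Because the `<a>` comes first in the document, `endpoint` is `https://example.com/wm-a`.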
# jamietanna Sorry aaronpk I mean that I'm trying to get Aperture to discover feeds from that URL, but it doesn't work, even though I've got a `<link rel=alternate>`
angelo, [calumryan], akevinhuang, schmudde, tetov-irc, [chrisaldrich] and sebbu joined the channel