#jackybut like since my site's consistently dynamic, doing that is going to be hard
#jackyI'd have to like build an index and update it every time it changes
[tantek] joined the channel
#[tantek]could be a lot more efficient though, if you're able to do incremental index rebuilds rather than full recrawls
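The incremental-rebuild idea tantek mentions can be sketched roughly: store a content hash per URL and re-index only pages whose hash changed. This is a hypothetical illustration; `seen_hashes` and `index` stand in for whatever storage the site actually uses, and `html.lower()` is a placeholder for real tokenizing.

```python
import hashlib

def update_index(pages, seen_hashes, index):
    """Re-index only pages whose content changed since the last crawl.

    pages: {url: html}, seen_hashes: {url: sha256 hex}, index: {url: indexed text}
    Returns the list of URLs that were re-indexed.
    """
    reindexed = []
    for url, html in pages.items():
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if seen_hashes.get(url) == digest:
            continue  # unchanged: keep the existing index entry
        seen_hashes[url] = digest
        index[url] = html.lower()  # placeholder for real indexing
        reindexed.append(url)
    return reindexed
```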
sullenbode and [jgmac1106] joined the channel
#[jgmac1106]jacky are you going to end up with a static copy of your site just for crawl and search purposes? Store just the JSON from an mf2 parser for searching? ... I just use Google, Bridgy, and Twitter

#[jgmac1106]only way I can find anything on my website
plut4rch joined the channel
#[jgmac1106]...come to think about it I do also have a gSheet of every tweet, which also comes from Twitter
#[jgmac1106]maybe what we need is a Bridgy-like index service, not that I even know what that would be
#GWGI am doing it for sites that don't support mf2
#bekoGWG: yeah. I link to a lot of sites and not many support microformats at all but most have json-ld cuz google.
#bekoSo I usually check the source and if I find json-ld I understand it as the wish of the site owner to use this data.
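The "check the source for JSON-LD" step beko describes can be sketched with the standard library: find `<script type="application/ld+json">` blocks and parse their contents. This is a simplified illustration, not Parse This's actual code.

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect parsed application/ld+json script blocks from an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        if self._in_ld:
            try:
                self.blocks.append(json.loads(data))
            except ValueError:
                pass  # skip malformed blocks

def extract_jsonld(html):
    parser = JSONLDExtractor()
    parser.feed(html)
    return parser.blocks
```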
#bekoott Medium totally lost it by now. I get Service Unavailable on trying to pre-fetch and checking the website in a browser I get a JS _app_. It totally broke the web by now o0
#GWGIf you have any URLs, add to the issue on the Parse This repo
deltab, [KevinMarks], Guest9645225 and Scarecr0w joined the channel
#bekoGWG: In this specific case I tried to bookmark https://medium.com/@mdiluz/linux-gaming-at-the-dawn-of-the-2020s-a941dd602f61 - and that's not a webpage but an app for me. Knowing Medium I may have run into some sort of A/B testing here. Anyway Parse This gets "Service Unavailable" from this and I suspect it's a message from their webserver and not the HTTP error code 503.
#[schmarty]I found a limitation with Hugo on the small EC2 server I was running it on. It was running out of memory due to 120mb of YouTube JSON metadata in the data folder.
#[schmarty]GWG: it's the JSON metadata output from youtube-dl
#GWG[schmarty]: I am trying to get more metadata from YouTube. Will have to read more closely
#[schmarty]I was being lazy and had my templates pulling directly from those files even though they are mostly very long lists of URLs to chunked files in various formats
#GWGMy problem is that I have, assuming a single URL...the ability to parse OGP, MF2, and now JSON-LD. I need to write a protocol to figure out which has the most useful data on a site without wasting resources
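One plausible shape for the selection rule GWG wants: parse all available formats once, then keep whichever result fills the most of the fields you care about. The field names and the scoring rule here are assumptions for illustration, not what Parse This actually does.

```python
# Hypothetical "most useful data wins" rule across MF2, JSON-LD, and OGP.
USEFUL_FIELDS = ("name", "author", "published", "summary", "photo")

def pick_richest(parsed_by_format):
    """parsed_by_format: e.g. {"mf2": {...}, "jsonld": {...}, "ogp": {...}}.

    Returns (format_name, data) for the result with the most useful fields.
    """
    def score(item):
        _, data = item
        return sum(1 for field in USEFUL_FIELDS if data.get(field))

    return max(parsed_by_format.items(), key=score)
```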
#[schmarty]So I changed my download script to move those bits of data into the markdown files instead, making it possible to delete the raw info files once an episode has been grabbed
#Loqi[[snarfed]] so jacky evidently when you use bridgy publish, elixir sends four identical webmentions to bridgy for it, each just 100ms or so after the last, and bridgy publish's transactionality is a bit brittle. it doesn't do anything wrong, it only POSSEs once,...
#[snarfed]totally harmless, and fixed now on bridgy's side anyway. just funny
#jacky^ that's been causing a bug on my end though where I can't grab the resulting URL (at times, it'll have to wait until the last webmention completes)
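Receiver-side de-duplication for the case snarfed describes (identical webmentions for the same source/target arriving ~100ms apart) could look something like this sketch; the class name and window length are made up for illustration.

```python
import time

class WebmentionDeduper:
    """Drop repeat (source, target) webmentions arriving within a short window."""

    def __init__(self, window_seconds=2.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock  # injectable for testing
        self._last_seen = {}

    def should_process(self, source, target):
        key = (source, target)
        now = self.clock()
        last = self._last_seen.get(key)
        self._last_seen[key] = now
        return last is None or (now - last) > self.window
```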
#GWGLast week it was a different aspect of my link preview and citation code
#GWGCiting other forms of markup seems like a difficult move
[Jeff_Hawkins] and cweiske joined the channel; nickodd left the channel
#cweiskeaaronpk, I just tried to subscribe the chat search engine to https://chat.indieweb.org/dev/ and it failed on my side with "No hub URL found for topic". You send the hub link header on the /dev/ URL, but also redirect it to the current date. Since I configured my client to follow redirects, I do not see the link headers. Using the link header on the redirect URL is a nice hack, but it's problematic IMO. do hubs
#cweiskeso your chat.indieweb.org hack does violate the spec
#aaronpkhm, chat.indieweb.org isn't trying to do a subscription migration
#cweiskebut the topic I want to subscribe to sends a redirect
#aaronpkthe intent is that "https://chat.indieweb.org/dev/" is the URL being subscribed to, both one that you should bookmark in your browser, and the topic used in the websub notification
#aaronpkit just so happens that when viewing it in a browser, it sends a redirect to the current day
#aaronpkbut it would be wrong to subscribe to the URL of the current day, since that page stops updating at the end of the day
#cweiskeso how does my subscriber know when to follow topic redirects?
#aaronpkThe protocol currently supports the following discovery mechanisms. Publishers MUST implement at least one of them:
#aaronpkthe publisher SHOULD include at least one Link Header [RFC5988] with rel=hub (a hub link header) as well as exactly one Link Header [RFC5988] with rel=self (the self link header)
#aaronpkthat implies that they must be specified together
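The header-based discovery aaronpk quotes from the spec means the subscriber reads `rel="hub"` and `rel="self"` from the topic response itself (before following any redirect). A minimal sketch of parsing a Link header into those two values, assuming a simplified single-rel-per-link format:

```python
import re

def parse_link_header(value):
    """Parse a Link header value into {rel: url}. Simplified: one rel per link."""
    rels = {}
    for part in value.split(","):
        match = re.search(r'<([^>]+)>\s*;\s*rel="?([^";]+)"?', part)
        if match:
            rels[match.group(2)] = match.group(1)
    return rels

def discover(headers):
    """Return (hub_url, topic_url) from a response's Link header, if present."""
    rels = parse_link_header(headers.get("Link", ""))
    return rels.get("hub"), rels.get("self")
```

Subscribing to the `rel="self"` topic rather than the final redirect target is what keeps the subscription pointed at the canonical, continually updating URL.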
#LoqiWebSub (previously known as PubSubHubbub or PuSH, and briefly PubSub) is a notification-based protocol for web publishing and subscribing to streams and legacy feed files in real time https://indieweb.org/WebSub
[LewisCowles] and [Michael_Beckwit joined the channel
#aaronpkbtw confirmed websub.rocks is running the latest code from github
#aaronpkthis code would have returned an error that rel=hub was missing first, so it did get that one
#aaronpkso either on your end it's sending only the last one, or on my end it's only seeing the last one
#aaronpki see a comment in here that indicates it should be supporting multiple Link headers rather than a combined one, and i do remember trying that out a long time ago
#Loqi[aaronpk] Okay I tracked it down and it's only seeing the last `link` header that you're sending. I suspect this is because `$_SERVER` only shows the last header, and the framework it's using loads the headers from there.
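The underlying issue: HTTP allows the same header field to appear multiple times, and repeated field values may be combined into one comma-separated value (per RFC 7230), but a store like PHP's `$_SERVER` that keeps only the last occurrence silently drops the earlier `rel="hub"` link. A sketch of the folding behavior a parser needs instead:

```python
def fold_headers(raw_headers):
    """Combine repeated header fields into comma-separated values.

    raw_headers: list of (name, value) pairs, possibly with repeated names.
    Returns a dict keyed by lowercased header name.
    """
    folded = {}
    for name, value in raw_headers:
        key = name.lower()
        folded[key] = folded[key] + ", " + value if key in folded else value
    return folded
```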
You mentioned this worked under PH...