#dev 2020-03-07

2020-03-07 UTC
pablinos and gRegorLove joined the channel
#
jacky
I'm very interested in search tooling
#
jacky
but like since my site's consistently dynamic, doing that is going to be hard
#
jacky
I'd have to like build an index and update it every time it changes
[tantek] joined the channel
#
[tantek]
could be a lot more efficient though, if you're able to do incremental index rebuilds rather than full recrawls
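A minimal sketch of such an incremental rebuild, assuming a hypothetical iterable of (url, updated, text) tuples for each post; only posts whose timestamp changed since the last pass get reindexed:

```python
from collections import defaultdict

index = defaultdict(set)   # term -> set of post URLs containing it
seen = {}                  # post URL -> "updated" stamp at last indexing

def reindex(posts):
    """posts is a hypothetical iterable of (url, updated, text) tuples."""
    for url, updated, text in posts:
        if seen.get(url) == updated:
            continue  # unchanged since last pass: skip the expensive part
        for urls in index.values():   # drop this post's stale entries
            urls.discard(url)
        for term in text.lower().split():
            index[term].add(url)
        seen[url] = updated

def search(term):
    return index.get(term.lower(), set())
```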
sullenbode and [jgmac1106] joined the channel
#
[jgmac1106]
jacky are you going to end up with a static copy of your site just for crawl and search purposes? Store just the JSON from an mf2 parser for searching? ...I just use Google, Bridgy, and Twitter
#
[jgmac1106]
only way I can find anything on my website
plut4rch joined the channel
#
[jgmac1106]
...come to think about it I do also have a gSheet of every tweet, which also comes from Twitter
#
[jgmac1106]
maybe what we need is a Bridgy-like index service, not that I even know what that would be
#
jacky
no I won't
#
jacky
tbh I might end up toying with this
#
jacky
but I'd want something that consumed my public 'master' feed and made that searchable
#
jacky
I think Indie Map kinda did that with its corpora
#
jacky
but I can also ping sites like DuckDuckGo and Superfeedr (which I think hits other public indices) to index posts
#
jacky
[tantek]: it could!
#
[tantek]
Wait how do you ping DuckDuckGo? I want to do that!
shrysr and [chrisaldrich] joined the channel
#
jacky
hmm I thought I did; it looks like DuckDuckGo does it 'automatically'
#
jacky
kinda silly that I can't let them know
#
jacky
but now it makes me curious as to when they do re-indexing (or how)
[KevinMarks], pablinos and [Michael_Beckwit joined the channel
#
@mrkrndvs
↩️ Interested in your q @lucas_gonze about whether webmentions are worth time & effort? I think it is important to be mindful about expectation. Webmentions are useful, but still have limitations. Personally though, they've revolutionised my web experience https://collect.readwriterespond.com/lucas-gonze-on-webmentions/
(twitter.com/_/status/1236139564026269696)
SpencerDub, pablinos, nickodd, plut4rch, [Michael_Beckwit, leg, [jeremycherfas], [chrisaldrich] and cweiske joined the channel
#
cweiske
so aaronpk, is websub.rocks running the latest code?
[KevinMarks], jenelizabeth, cweiske, pablinos, [jeremycherfas], [jgmac1106], plut4rch, jjuran, Guest96452 and Guest964522 joined the channel
#
GWG
Playing with JSON-LD to MF2 conversion for link previews
leg joined the channel
#
beko
GWG++
#
Loqi
GWG has 25 karma in this channel over the last year (139 in all channels)
#
beko
I'd love this feature :D
#
GWG
Why?
#
GWG
I am doing it for sites that don't support mf2
#
beko
GWG: yeah. I link to a lot of sites and not many support microformats at all, but most have JSON-LD cuz Google.
#
beko
So I usually check the source, and if I find JSON-LD I take it as the site owner's wish that this data be used.
#
beko
OT: Medium totally lost it by now. I get Service Unavailable on trying to pre-fetch, and checking the website in a browser I get a JS _app_. It has totally broken the web by now o0
#
GWG
If you have any URLs, add to the issue on the Parse This repo
deltab, [KevinMarks], Guest9645225 and Scarecr0w joined the channel
#
aaronpk
Oh no lol
#
aaronpk
cweiske: I can take a look this morning.
nickodd, [tantek] and cweiske_ joined the channel
#
GWG
beko: No links?
#
beko
meh. broken pipe ._.
#
beko
GWG: In this specific case I tried to bookmark https://medium.com/@mdiluz/linux-gaming-at-the-dawn-of-the-2020s-a941dd602f61 - and that's not a webpage but an app for me. Knowing Medium, I may have run into some sort of A/B testing here. Anyway, Parse This gets "Service Unavailable" from this, and I suspect it's a message from their webserver rather than an actual HTTP 503 status code.
#
GWG
I am getting 404 on my phone
#
beko
wild.
#
GWG
Otherwise I would try to parse it
[schmarty] joined the channel
#
[schmarty]
I found a limitation with Hugo on the small EC2 server I was running it on. It was running out of memory due to 120 MB of YouTube JSON metadata in the data folder.
#
sknebel
oops. non-streaming parser?
#
[schmarty]
(We have a little microsite/service that backs up and mirrors our YouTube channel and playlists)
#
[schmarty]
sknebel: it loads everything in the data folder that it can parse into memory. 😅
#
GWG
[schmarty]: How does it parse YouTube?
#
[schmarty]
GWG: it's the JSON metadata output from youtube-dl
#
GWG
[schmarty]: I am trying to get more metadata from YouTube. Will have to read more closely
#
[schmarty]
I was being lazy and had my templates pulling directly from those files even though they are mostly very long lists of URLs to chunked files in various formats
#
GWG
My problem is that, for a single URL, I have the ability to parse OGP, MF2, and now JSON-LD. I need to write a protocol to figure out which has the most useful data on a site without wasting resources
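One rough way to pick between them without extra fetches: parse the single response once with each parser and keep whichever result fills in the most wanted fields. A sketch, where parse_mf2, parse_jsonld, and parse_ogp are hypothetical helpers that each return a dict of extracted fields:

```python
WANTED = ("name", "summary", "author", "published", "photo")

def best_preview(html, url):
    candidates = [parse_mf2(html, url), parse_jsonld(html), parse_ogp(html)]
    # score each parser's output by how many wanted fields it filled in
    scored = [(sum(1 for f in WANTED if c.get(f)), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]
```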
#
[schmarty]
So I changed my download script to move those bits of data into the markdown files instead, making it possible to delete the raw info files once an episode has been grabbed
#
GWG
[schmarty]: What fields does youtube-dl find?
#
[schmarty]
Easiest way to find out is to use it!
#
[schmarty]
The fields my templates were using were title, description, categories, and duration.
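Those four keys appear in the .info.json files youtube-dl writes alongside each download (with --write-info-json). A small sketch of extracting them, for example to feed into markdown front matter:

```python
import json
from pathlib import Path

def episode_metadata(info_path):
    info = json.loads(Path(info_path).read_text())
    return {
        "title": info.get("title"),
        "description": info.get("description"),
        "categories": info.get("categories", []),  # list of strings
        "duration": info.get("duration"),          # length in seconds
    }
```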
#
GWG
[schmarty]: Will have a look
#
GWG
I am not getting duration
#
jacky
spending some time today rewriting some of my tests for koype around webmentions and gollllllllly I'm sorry lol
#
jacky
I can now tell how many times I hit remote endpoints, and I hammer sites like 6 times just for one webmention
#
[tantek]
GWG maybe you can start documenting your /multiauthor experience in the wild?
#
jacky
once to make sure it's up, another to do a representative h-card check, and once more to actually get the mf2 from it
#
[tantek]
Moving here from #microformats
#
jacky
but there's a bug where it'll miss the mf2 from the third pull and fetch it _again_
#
jacky
I think I know how to fix this though :)
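One plausible fix is to fetch the source exactly once and reuse both the body and the parsed mf2 tree for all three checks. A sketch using requests and mf2py (the target-mention check here is a naive substring test, just for illustration):

```python
import requests
import mf2py

def verify_source(source_url, target_url):
    resp = requests.get(source_url, timeout=10)
    resp.raise_for_status()                     # the "is it up" check
    parsed = mf2py.parse(doc=resp.text, url=source_url)
    hcards = [i for i in parsed["items"] if "h-card" in i["type"]]
    entries = [i for i in parsed["items"] if "h-entry" in i["type"]]
    mentions_target = target_url in resp.text   # naive link check
    return mentions_target, hcards, entries     # one GET, three answers
```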
[snarfed] joined the channel
#
[snarfed]
[yo] yuuup. 😆 you're forgiven
#
aaronpk
GET requests are cheap, it's fine :)
#
Loqi
[[snarfed]] so jacky evidently when you use bridgy publish, elixir sends four identical webmentions to bridgy for it, each just 100ms or so after the last, and bridgy publish's transactionality is a bit brittle. it doesn't do anything wrong, it only POSSEs once,...
#
[snarfed]
totally harmless, and fixed now on bridgy's side anyway. just funny
#
jacky
^ that's been causing a bug on my end though where I can't grab the resulting URL (at times, it'll have to wait until the last webmention completes)
[fluffy] joined the channel
#
GWG
[tantek]: That is my plan.
#
GWG
Last week it was a different aspect of my link preview and citation code
#
GWG
Citing other forms of markup seems like a difficult move
[Jeff_Hawkins] and cweiske joined the channel; nickodd left the channel
#
cweiske
aaronpk, I just tried to subscribe my chat search engine to https://chat.indieweb.org/dev/ and it failed on my side with "No hub URL found for topic". You send the hub link header on the /dev/ URL, but also redirect it to the current date. Since I configured my client to follow redirects, I do not see the link headers. Using the link header on the redirect URL is a nice hack, but it's problematic IMO. do hubs
#
cweiske
actually support that?
#
cweiske
websub.rocks even has a test for this: https://websub.rocks/subscriber/201
#
cweiske
so I guess the subscriber should look at the new URL. The spec does not mention this case AFAIK
#
aaronpk
i thought that was in the original pubsubhubbub spec too
#
aaronpk
at the very least if there's a test for it in websub.rocks it's definitely in the websub spec
#
cweiske
so your chat.indieweb.org hack does violate the spec
#
aaronpk
hm, chat.indieweb.org isn't trying to do a subscription migration
#
cweiske
but the topic I want to subscribe to sends a redirect
#
aaronpk
the intent is that "https://chat.indieweb.org/dev/" is the URL being subscribed to, both one that you should bookmark in your browser, and the topic used in the websub notification
#
aaronpk
it just so happens that when viewing it in a browser, it sends a redirect to the current day
#
aaronpk
but it would be wrong to subscribe to the URL of the current day, since that page stops updating at the end of the day
#
cweiske
so how does my subscriber know when to follow topic redirects?
#
aaronpk
i guess based on whether it sees a hub?
#
aaronpk
as soon as it sees a hub URL it can stop following the redirects
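That behavior might look like this for header-based discovery: follow redirects by hand and stop at the first response that carries a hub link (requests exposes parsed Link headers as resp.links):

```python
import requests

def discover(topic_url, max_hops=5):
    url = topic_url
    for _ in range(max_hops):
        resp = requests.get(url, allow_redirects=False, timeout=10)
        links = resp.links               # rel -> {"url": ..., ...}
        if "hub" in links:               # hub found: stop following redirects
            self_url = links.get("self", {}).get("url", url)
            return links["hub"]["url"], self_url
        if resp.is_redirect:             # no hub yet: follow one more hop
            url = requests.compat.urljoin(url, resp.headers["Location"])
        else:
            return None, None
    return None, None
```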
#
cweiske
I bet 0% of subscribers implement that behavior
#
aaronpk
there's a similar thing described for IndieAuth
#
cweiske
is it allowed to have the hub in the header, but the self in HTML or vice versa?
#
aaronpk
ooh good question, that would be messy
#
cweiske
especially in combination with redirects :)
#
aaronpk
indeed
#
aaronpk
i'm inclined to say no, they have to be specified in pairs, but i can't find any text that actually confirms or disagrees
#
aaronpk
oh wait
#
aaronpk
The protocol currently supports the following discovery mechanisms. Publishers MUST implement at least one of them:
#
aaronpk
the publisher SHOULD include at least one Link Header [RFC5988] with rel=hub (a hub link header) as well as exactly one Link Header [RFC5988] with rel=self (the self link header)
#
aaronpk
that implies that they must be specified together
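So a conforming topic response would carry both rels together, e.g. (hub URL illustrative):

```http
HTTP/1.1 200 OK
Link: <https://hub.example.com/>; rel="hub"
Link: <https://chat.indieweb.org/dev/>; rel="self"
```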
#
cweiske
thanks.
#
[tantek]
Might be worth a /WebSub#FAQ?
#
[tantek]
seemed like quite the subtle implementation detail
[fluffy] joined the channel
#
cweiske
what is websub
#
Loqi
WebSub (previously known as PubSubHubbub or PuSH, and briefly PubSub) is a notification-based protocol for web publishing and subscribing to streams and legacy feed files in real time https://indieweb.org/WebSub
[LewisCowles] and [Michael_Beckwit joined the channel
#
aaronpk
btw confirmed websub.rocks is running the latest code from github
#
cweiske
then i have no idea what could be wrong
#
aaronpk
is this issue 19?
#
Loqi
[dunglas] #154 There are only 2 discovery mechanisms, not three
#
aaronpk
that's interesting, you got the error about "rel=self" but not "rel=hub"?
#
aaronpk
i bet somehow the link header is being overridden to only send the last one in your request
#
cweiske
interesting idea
#
aaronpk
this code would have returned an error that rel=hub was missing first, so it did get that one
#
aaronpk
so either on your end it's sending only the last one, or on my end it's only seeing the last one
#
aaronpk
i see a comment in here that indicates it should be supporting multiple Link headers rather than a combined one, and i do remember trying that out a long time ago
#
cweiske
btw, I successfully subscribed to https://chat.indieweb.org/dev/ now
#
cweiske
but I do not get any pushes
#
aaronpk
ok that's progress! Let me check on this end
#
aaronpk
looks like the subscription is active
#
cweiske
should every chat message here trigger a push?
#
aaronpk
it throttles it to send a push no more than once every 2 minutes
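A minimal sketch of that kind of throttle: remember the last push time per topic and only notify when the window has elapsed (notify is a hypothetical callable that pings the hub):

```python
import time

THROTTLE = 120      # seconds between pushes per topic
last_push = {}      # topic URL -> time of last push

def maybe_push(topic, notify):
    now = time.time()
    if now - last_push.get(topic, 0) >= THROTTLE:
        last_push[topic] = now
        notify(topic)
        return True
    return False    # too soon; a later message will trigger the push
```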
#
cweiske
that's great
#
cweiske
just got a push 2 minutes ago
#
aaronpk
that was my manual one
#
aaronpk
i'm trying to see if the automatic triggering is working
#
aaronpk
it looks like it should be working
#
cweiske
just got a ping
#
aaronpk
that was manual again :/
[Jeff_Hawkins] joined the channel
#
aaronpk
this is weird
joepobryant joined the channel
#
aaronpk
i think i got it
#
aaronpk
that should have sent you one
#
cweiske
now I need to find out why the crawler thinks that it got https://chat.indieweb.org/dev/2019-01-08
#
aaronpk
the redirect?
#
aaronpk
oh wait 01-08?
#
aaronpk
caching an old redirect?
#
cweiske
hm. could be
#
cweiske
//TODO: what if location redirects change?
#
cweiske
that's the comment in phinde :/
#
cweiske
so another FIXME task on my side
#
cweiske
thanks for the help!
#
aaronpk
hehe nice
#
cweiske
and for issue 19, it seems like it's a problem on your side
#
aaronpk
yeah i'm looking at that now
#
aaronpk
it's gonna take me a bit to set up a working test for this
#
aaronpk
could you try something real quick? try using "Link" instead of "link"
#
aaronpk
wait no it's not that because the rel=hub is making it through
#
cweiske
sorry, but the code strtolowers the headers somewhere
#
aaronpk
so it's definitely the fact that they're coming in on separate link headers
#
cweiske
that's hard to get out
#
cweiske
I'm going to bed now. bb
#
aaronpk
ok definitely confirmed it's only seeing the last link header
#
aaronpk
weird, $_SERVER drops them totally
Pikseladam joined the channel
#
aaronpk
i don't understand why i can't find any documentation on this
#
aaronpk
ok how do other projects handle this? i can't be the only one to have this problem
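One common approach: RFC 7230 allows repeated headers to be folded into a single comma-joined value, so stacks that expose only one Link slot fold first and split afterwards. A sketch of splitting such a folded value (naive comma-splitting would break on commas inside a link's parameters, so split on the boundary before the next <):

```python
import re

def split_link_header(value):
    # each link-value is <url> plus params, up to the comma before the next <
    return re.findall(r'<[^>]*>[^<]*?(?=,\s*<|$)', value)

combined = '<https://hub.example.com/>; rel="hub", <https://example.com/feed>; rel="self"'
assert split_link_header(combined) == [
    '<https://hub.example.com/>; rel="hub"',
    '<https://example.com/feed>; rel="self"',
]
```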
[Ana_Rodrigues] and geoffo joined the channel
#
Loqi
[aaronpk] Okay I tracked it down and it's only seeing the last `link` header that you're sending. I suspect this is because `$_SERVER` only shows the last header, and the framework it's using loads the headers from there. You mentioned this worked under PH...