#jackybut like since my site's consistently dynamic, doing that is going to be hard
#jackyI'd have to like build an index and update it every time it changes
[tantek] joined the channel
#[tantek]could be a lot more efficient though, if you're able to do incremental index rebuilds rather than full recrawls
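The incremental-rebuild idea tantek mentions can be sketched roughly: store a content hash per URL and re-index only pages whose hash changed. This is a hypothetical illustration; `seen_hashes` and `index` stand in for whatever storage the site actually uses, and `html.lower()` is a placeholder for real tokenizing.

```python
import hashlib

def update_index(pages, seen_hashes, index):
    """Re-index only pages whose content changed since the last crawl.

    pages: {url: html}, seen_hashes: {url: sha256 hex}, index: {url: indexed text}
    Returns the list of URLs that were re-indexed.
    """
    reindexed = []
    for url, html in pages.items():
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if seen_hashes.get(url) == digest:
            continue  # unchanged: keep the existing index entry
        seen_hashes[url] = digest
        index[url] = html.lower()  # placeholder for real indexing
        reindexed.append(url)
    return reindexed
```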
sullenbode and [jgmac1106] joined the channel
#[jgmac1106]jacky are you going to end up with a static copy of your site just for crawl and search purposes? Store just the JSON from an mf2 parser for searching? ... I just use Google, Bridgy, and Twitter

#[jgmac1106]only way I can find anything on my website
plut4rch joined the channel
#[jgmac1106]...come to think about it I do also have a gSheet of every tweet, which also comes from Twitter
#[jgmac1106]maybe what we need is a Bridgy-like index service, not that I even know what that would be
#GWGI am doing it for sites that don't support mf2
#bekoGWG: yeah. I link to a lot of sites and not many support microformats at all but most have json-ld cuz google.
#bekoSo I usually check the source and if I find json-ld I understand it as the wish of the site owner to use this data.
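The "check the source for JSON-LD" step beko describes can be sketched with the standard library: find `<script type="application/ld+json">` blocks and parse their contents. This is a simplified illustration, not Parse This's actual code.

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect parsed application/ld+json script blocks from an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        if self._in_ld:
            try:
                self.blocks.append(json.loads(data))
            except ValueError:
                pass  # skip malformed blocks

def extract_jsonld(html):
    parser = JSONLDExtractor()
    parser.feed(html)
    return parser.blocks
```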
#bekoott Medium totally lost it by now. I get Service Unavailable on trying to pre-fetch and checking the website in a browser I get a JS _app_. It totally broke the web by now o0
#GWGIf you have any URLs, add to the issue on the Parse This repo
deltab, [KevinMarks], Guest9645225 and Scarecr0w joined the channel
#bekoGWG: In this specific case I tried to bookmark https://medium.com/@mdiluz/linux-gaming-at-the-dawn-of-the-2020s-a941dd602f61 - and that's not a webpage but an app for me. Knowing Medium I may have run into some sort of A/B testing here. Anyway Parse This gets "Service Unavailable" from this and I suspect it's a message from their webserver and not the HTTP error code 503.
#[schmarty]I found a limitation with Hugo on the small EC2 server I was running it on. It was running out of memory due to 120mb of YouTube JSON metadata in the data folder.
#[schmarty]GWG: it's the JSON metadata output from youtube-dl
#GWG[schmarty]: I am trying to get more metadata from YouTube. Will have to read more closely
#[schmarty]I was being lazy and had my templates pulling directly from those files even though they are mostly very long lists of URLs to chunked files in various formats
#GWGMy problem is that I have, assuming a single URL...the ability to parse OGP, MF2, and now JSON-LD. I need to write a protocol to figure out which has the most useful data on a site without wasting resources
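One plausible shape for the selection rule GWG wants: parse all available formats once, then keep whichever result fills the most of the fields you care about. The field names and the scoring rule here are assumptions for illustration, not what Parse This actually does.

```python
# Hypothetical "most useful data wins" rule across MF2, JSON-LD, and OGP.
USEFUL_FIELDS = ("name", "author", "published", "summary", "photo")

def pick_richest(parsed_by_format):
    """parsed_by_format: e.g. {"mf2": {...}, "jsonld": {...}, "ogp": {...}}.

    Returns (format_name, data) for the result with the most useful fields.
    """
    def score(item):
        _, data = item
        return sum(1 for field in USEFUL_FIELDS if data.get(field))

    return max(parsed_by_format.items(), key=score)
```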
#[schmarty]So I changed my download script to move those bits of data into the markdown files instead, making it possible to delete the raw info files once an episode has been grabbed
#Loqi[[snarfed]] so jacky evidently when you use bridgy publish, elixir sends four identical webmentions to bridgy for it, each just 100ms or so after the last, and bridgy publish's transactionality is a bit brittle. it doesn't do anything wrong, it only POSSEs once,...
#[snarfed]totally harmless, and fixed now on bridgy's side anyway. just funny
#jacky^ that's been causing a bug on my end though where I can't grab the resulting URL (at times, it'll have to wait until the last webmention completes)
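Receiver-side de-duplication for the case snarfed describes (identical webmentions for the same source/target arriving ~100ms apart) could look something like this sketch; the class name and window length are made up for illustration.

```python
import time

class WebmentionDeduper:
    """Drop repeat (source, target) webmentions arriving within a short window."""

    def __init__(self, window_seconds=2.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock  # injectable for testing
        self._last_seen = {}

    def should_process(self, source, target):
        key = (source, target)
        now = self.clock()
        last = self._last_seen.get(key)
        self._last_seen[key] = now
        return last is None or (now - last) > self.window
```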
#GWGLast week it was a different aspect of my link preview and citation code
#GWGCiting other forms of markup seems like a difficult move
[Jeff_Hawkins] and cweiske joined the channel; nickodd left the channel
#cweiskeaaronpk, I just tried to subscribe the chat search engine to https://chat.indieweb.org/dev/ and it failed on my side with "No hub URL found for topic". You send the hub link header on the /dev/ URL, but also redirect it to the current date. Since I configured my client to follow redirects, I do not see the link headers. Using the link header on the redirect URL is a nice hack, but it's problematic IMO. do hubs
#cweiskeso your chat.indieweb.org hack does violate the spec
#aaronpkhm, chat.indieweb.org isn't trying to do a subscription migration
#cweiskebut the topic I want to subscribe to sends a redirect
#aaronpkthe intent is that "https://chat.indieweb.org/dev/" is the URL being subscribed to, both one that you should bookmark in your browser, and the topic used in the websub notification
#aaronpkit just so happens that when viewing it in a browser, it sends a redirect to the current day
#aaronpkbut it would be wrong to subscribe to the URL of the current day, since that page stops updating at the end of the day
#cweiskeso how does my subscriber know when to follow topic redirects?
#aaronpkThe protocol currently supports the following discovery mechanisms. Publishers MUST implement at least one of them:
#aaronpkthe publisher SHOULD include at least one Link Header [RFC5988] with rel=hub (a hub link header) as well as exactly one Link Header [RFC5988] with rel=self (the self link header)
#aaronpkthat implies that they must be specified together
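The header-based discovery aaronpk quotes from the spec means the subscriber reads `rel="hub"` and `rel="self"` from the topic response itself (before following any redirect). A minimal sketch of parsing a Link header into those two values, assuming a simplified single-rel-per-link format:

```python
import re

def parse_link_header(value):
    """Parse a Link header value into {rel: url}. Simplified: one rel per link."""
    rels = {}
    for part in value.split(","):
        match = re.search(r'<([^>]+)>\s*;\s*rel="?([^";]+)"?', part)
        if match:
            rels[match.group(2)] = match.group(1)
    return rels

def discover(headers):
    """Return (hub_url, topic_url) from a response's Link header, if present."""
    rels = parse_link_header(headers.get("Link", ""))
    return rels.get("hub"), rels.get("self")
```

Subscribing to the `rel="self"` topic rather than the final redirect target is what keeps the subscription pointed at the canonical, continually updating URL.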
#LoqiWebSub (previously known as PubSubHubbub or PuSH, and briefly PubSub) is a notification-based protocol for web publishing and subscribing to streams and legacy feed files in real time https://indieweb.org/WebSub
[LewisCowles] and [Michael_Beckwit joined the channel
#aaronpkbtw confirmed websub.rocks is running the latest code from github
#aaronpkthis code would have returned an error that rel=hub was missing first, so it did get that one
#aaronpkso either on your end it's sending only the last one, or on my end it's only seeing the last one
#aaronpki see a comment in here that indicates it should be supporting multiple Link headers rather than a combined one, and i do remember trying that out a long time ago
#Loqi[aaronpk] Okay I tracked it down and it's only seeing the last `link` header that you're sending. I suspect this is because `$_SERVER` only shows the last header, and the framework it's using loads the headers from there.
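The underlying issue: HTTP allows the same header field to appear multiple times, and repeated field values may be combined into one comma-separated value (per RFC 7230), but a store like PHP's `$_SERVER` that keeps only the last occurrence silently drops the earlier `rel="hub"` link. A sketch of the folding behavior a parser needs instead:

```python
def fold_headers(raw_headers):
    """Combine repeated header fields into comma-separated values.

    raw_headers: list of (name, value) pairs, possibly with repeated names.
    Returns a dict keyed by lowercased header name.
    """
    folded = {}
    for name, value in raw_headers:
        key = name.lower()
        folded[key] = folded[key] + ", " + value if key in folded else value
    return folded
```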
You mentioned this worked under PH...