#dev 2021-08-28

2021-08-28 UTC
jeremycherfas, nertzy_ and gerben joined the channel
# 05:55 
@dev_nikema Webmentions – let’s go! https://nikemaprophet.com/webmentions-lets-go/?utm_source=ReviveOldPost&utm_medium=social&utm_campaign=ReviveOldPost (twitter.com/_/status/1431494427487444996)
hendursa1 joined the channel
# 08:50 
@RobbiNespu Implementing #webmention #indieweb (client side) on #hugo SSG

https://robbinespu.gitlab.io/posts/webmentions-hugo/ (twitter.com/_/status/1431538537875316736)
# 09:15 
capjamesg[d] Odd question: does anyone know anything about ideal cloud computing for compute-heavy operations (preferably that will not break the bank!).
# 09:15 
capjamesg[d] DO seems a bit pricey.
tetov-irc, rockorager and jamietanna joined the channel
# 12:02 
jamietanna capjamesg[d] give hetzner.cloud or Scaleway a go for potentially cheaper - but often to get compute heavy, it's gonna be fairly expensive :(
rockorager and christin joined the channel
# 12:39 
capjamesg[d] https://cdn.discordapp.com/attachments/866577430886350869/881156095656214609/Screenshot_from_2021-08-28_13-39-33.png
# 12:39 
capjamesg[d] Hoping the Discord-IRC bridge sends a link ^
# 12:40 
capjamesg[d] !tell snarfed ^
# 12:40 
Loqi Ok, I'll tell them that when I see them next
hendursaga joined the channel
# 13:12 
sknebel capjamesg[d]: can you be more specific? do you need something to run 24/7, what limit are you hitting right now, ...
# 13:15 
capjamesg[d] sknebel I am just thinking about scale.
# 13:15 
capjamesg[d] The way things are going I could feasibly index a few thousand pages per hour with the PythonAnywhere server I am using.
# 13:15 
capjamesg[d] (But that server is limited by CPU usage so I'm only using it to relieve my computer of some work)
# 13:16 
capjamesg[d] This is indexing from IndieMap WARC files, not the web itself.
# 13:16 
capjamesg[d] Say I wanted to index 100 IndieWeb sites. That would take at least 10 days 😄 And that's from a file, not the web 🙂
# 13:18 
sknebel ok, but python anywhere probably is fairly limited, so I'd just start with a $5-$10 small-ish VPS somewhere and see how fast that goes
# 13:19 
sknebel (when I hear "compute-heavy" my mind went a few levels larger ;))
# 13:20 
sknebel ingesting WARCs is potentially something you could do more optimized when going deep into the options of some cloud offering, but tbh that sounds like more hassle to me
# 13:21 
sknebel so hetzner cloud or something sounds like a good starting point
# 13:22 
sknebel (if someone knows how they'd do that kind of thing with a more cloud-focussed setup I'd be curious though, I have no real sense how those compare at small scale)
# 13:22 
capjamesg[d] So would I.
# 13:23 
capjamesg[d] Ingesting from WARC is fast(ish) compared to actually crawling documents.
# 13:23 
capjamesg[d] Because then request time / processing needs to be taken into account.
# 13:23 
capjamesg[d] With a bit of Python wizardry I think I can get it down to about 15 mins for 5,000 documents.
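
The target quoted above (5,000 documents in about 15 minutes) can be turned into an hourly rate with simple arithmetic. A minimal sketch: the function names and the 100,000-document corpus size are hypothetical, chosen only for illustration.

```python
def docs_per_hour(documents: int, minutes: float) -> float:
    """Convert 'N documents in M minutes' into documents per hour."""
    return documents / (minutes / 60)


def hours_to_index(total_documents: int, rate_per_hour: float) -> float:
    """Estimate how many hours a corpus takes at a steady indexing rate."""
    return total_documents / rate_per_hour


# Target from the conversation: 5,000 documents in about 15 minutes.
target_rate = docs_per_hour(5000, 15)  # 20000.0 documents per hour

# Hypothetical corpus size, not a figure from the conversation.
estimate = hours_to_index(100_000, target_rate)  # 5.0 hours
```

This ignores request time and per-document processing spikes, which is why crawling the live web would be slower than reading WARC files.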
rockorager joined the channel
# 13:24 
capjamesg[d] And my image search engine would take everything to another level.
# 13:24 
capjamesg[d] Because then images need to be downloaded, checked, compared, and optimized before being indexed. But I'm not even going to attempt indexing images.
# 13:24 
rockorager capjamesg[d]: Linode has $100 credits so you could test that out for free for a few months (I think you have 60 days to use the $100)
# 13:25 
capjamesg[d] Good call rockorager. Thanks!
# 13:25 
aaronpk If you're working with warc files do you even need this to be in the cloud? Why not use a desktop computer where it's cheaper to get a fast processor?
# 13:26 
capjamesg[d] Good question. It's more preference and about not wanting to leave my desktop running almost maxed out while these operations go on.
# 13:34 
sknebel (I'm also planning a small crawler project, but will just run that on one of my VPSes for a bit and see how that goes)
# 13:58 
[jeremycherfas] I’ve got myself in a bit of a muddle since GitHub deprecated passwords. I have a PAT working fine on my desktop, but not on my laptop. And when I use the PAT on the laptop I cannot authenticate. Is there a simple option? Am I best off to create a new PAT for the laptop only?
chenghiz_ joined the channel
# 14:52 
sknebel token per device sounds good
# 14:52 
sknebel (assuming they are setup to have that, but I think they do)
# 14:55 
[jeremycherfas] I guess I had better try that next. Tomorrow. I’ve had enough excitement for one day.
[chrisaldrich] and maxwelljoslyn[d] joined the channel
# 17:57 
capjamesg[d] Good example sknebel. I have tried not to use too many "luxurious" resources.
# 17:57 
capjamesg[d] (i.e. using smaller language models vs. larger ones with only slightly more accuracy)
jamietan1a joined the channel
# 18:11 
capjamesg[d] jeena.net is now in the index.
# 18:11 
capjamesg[d] There are at least 7,000 records collectively now.
# 18:13 
doosboox capjamesg[d]: how big is your db vs the amount of data you've parsed?
# 18:15 
capjamesg[d] Let's see.
# 18:15 
capjamesg[d] I need to index HTML for featured snippet support.
# 18:16 
capjamesg[d] That doesn't work for multiple sites quite yet because the logic was oriented around my site and its structure. But I will index HTML anyway for when that support comes (i.e. for "who is" queries to show h-cards).
# 18:16 
capjamesg[d] So I expect to have quite a big DB.
# 18:17 
capjamesg[d] 58MB.
# 18:17 
capjamesg[d] With 8329 records.
# 18:20 
@JamieTanna Right now I'm attending the #IndieWeb #IndieAuth pop-up to look at further improving the specifications, and making it easier for folks to implement and integrate (https://www.jvt.me/mf2/2021/08/1jpqv/) (twitter.com/_/status/1431682508140302337)
# 19:57 
jamietanna aaronpk rack-oauth2 (a common Ruby OAuth2 client) checks errors based on status code https://github.com/nov/rack-oauth2/blob/a8f1d2acd698cc13b6e102ba6fbf5a202d3b9d57/lib/rack/oauth2/client.rb#L150-L158
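
Deciding success or failure from the HTTP status code, as described above, can be sketched in a few lines. This is an illustration of the general approach, not a port of rack-oauth2: `TokenError` and `parse_token_response` are hypothetical names, and the function takes an already-received status code and body so that no HTTP library is assumed.

```python
import json


class TokenError(Exception):
    """Raised when a token endpoint answers with a non-2xx status."""

    def __init__(self, status, error, description=None):
        super().__init__(f"{status}: {error}")
        self.status = status
        self.error = error
        self.description = description


def parse_token_response(status: int, body: str) -> dict:
    """Return the parsed JSON on 2xx; raise TokenError on anything else."""
    if 200 <= status < 300:
        return json.loads(body)

    # OAuth 2.0 error bodies are JSON objects with an "error" member,
    # but a misbehaving server may send something else entirely.
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    raise TokenError(
        status,
        payload.get("error", "unknown_error"),
        payload.get("error_description"),
    )
```

A client written this way never inspects the body to decide whether a request failed, so a server that returns an error object with a 200 status would be treated as a success.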
[dmitshur] joined the channel
# 20:44 
GWG The spec says 'expires_in' where did we get 'expires_at'? I'd prefer it. I convert to it now. https://github.com/indieweb/indieauth/issues/81
# 20:44 
Loqi [dshanske] #81 Adopt Expiration and Refresh Tokens into the Spec
# 20:48 
aaronpk I typo'd my comment, should have been expires_in
# 20:48 
aaronpk expires_at isn't a thing
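
`expires_in` is a lifetime in seconds, relative to when the response was issued. A client that prefers to store an absolute timestamp, as GWG describes, can derive one locally. A minimal sketch, assuming the token response has already been parsed into a dict: `expiry_from_response`, `is_expired`, and the 30-second leeway are illustrative names and values, not part of IndieAuth or any library.

```python
from datetime import datetime, timedelta, timezone


def expiry_from_response(token_response, received_at=None):
    """Turn a relative expires_in into an absolute UTC expiry time.

    Returns None when the response carries no expires_in.
    """
    expires_in = token_response.get("expires_in")
    if expires_in is None:
        return None
    if received_at is None:
        received_at = datetime.now(timezone.utc)
    return received_at + timedelta(seconds=int(expires_in))


def is_expired(expires_at, now=None, leeway_seconds=30):
    """Treat the token as expired slightly early to allow for clock skew."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at - timedelta(seconds=leeway_seconds)


# Example: a token received at 20:44 UTC with a one-hour lifetime.
received = datetime(2021, 8, 28, 20, 44, tzinfo=timezone.utc)
expires_at = expiry_from_response({"expires_in": 3600}, received_at=received)
# expires_at is 2021-08-28 21:44:00+00:00
```

The absolute value is only meaningful to the client that computed it, since it depends on when that client received the response.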
# 21:11 
GWG aaronpk Do you just manually edit the html for the spec? Or is there a trick to it?
# 21:14 
Zegnat I actually mess up spec edits constantly. You change public/source/index.php (and maybe some other file in /source), then you have to get the output HTML of that file and put it in public/spec/index.html
# 21:15 
Zegnat So for every single change you have to edit at least 2 files
# 21:16 
Zegnat (For an example of me messing up, see https://github.com/indieweb/indieauth/pull/80 where I had to open a secondary PR because I only changed the public file)
# 21:16 
Loqi [Zegnat] #80 Add & to Example 6
KartikPrabhu joined the channel
# 21:19 
aaronpk yes, edit the one in the "source" folder
# 21:20 
aaronpk don't worry about updating the other files until we're ready to publish a version
# 21:21 
GWG Thanks, that probably saved me some confusion
tetov-irc and Rattroupe joined the channel
# 23:50 
[fluffy] Are there any codified standards for what an IndieAuth profile should include? I see that @rattroupe has added support for outgoing profiles but I don’t see any authoritative/canonical list of what fields are expected to be presented, just a vague-ish example in the IndieAuth spec itself.
# 23:50 
[fluffy] Authl has been requesting the profile scope for quite some time now but doesn’t actually make use of any fields returned because it just gets it from the h-card on the profile page.