#dev 2022-08-24

2022-08-24 UTC
[Anton_Podviazn], [jgarber], kloenk, chenghiz_, angelo, downtoearth1, cjw6k_, geoffo, jacky, jjuran_, gxt, jjuran, [marksuth], [aciccarello] and omz13 joined the channel
# 09:19 
capjamesg [Jamie_Tanna] The coffee emoji got turned into a ? in my DNS records.
# 09:21 
sknebel capjamesg: did you just try pasting it in or did you do the escaping by hand?
# 09:21 
sknebel (assuming you wanted to do an emoji-domain)
# 09:22 
capjamesg I pasted it in.
# 09:22 
capjamesg sknebel This was for a TXT record.
# 09:23 
capjamesg The RFC says only ASCII characters are allowed but we wanted to see what would happen if you didn't obey that rule.
# 09:23 
capjamesg Presumably Namecheap transformed the emoji for me.
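A minimal sketch of how an emoji can end up as `?`, assuming the record value is forced through an ASCII encoder with replacement somewhere along the way. The conversation does not confirm that Namecheap does exactly this; the helper name `to_ascii_txt` is hypothetical.

```python
def to_ascii_txt(value: str) -> str:
    """Force a TXT value into ASCII (hypothetical helper, not a Namecheap API)."""
    # errors="replace" substitutes "?" for every code point outside ASCII.
    return value.encode("ascii", errors="replace").decode("ascii")


# U+2615 is the coffee emoji (HOT BEVERAGE), a single code point.
print(to_ascii_txt("coffee: \u2615"))  # prints "coffee: ?"
```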
# 09:24 
[Jamie_Tanna] Oh yes I saw your message and forgot to reply. That's a shame, but probably for the best!
omz13 joined the channel
# 09:25 
sknebel ah ok
geoffo and tetov-irc joined the channel
# 10:58 
capjamesg Anyone able to review this PR from a while ago: https://github.com/indieweb/chat.indieweb.org/pull/56
# 10:59 
Loqi [capjamesg] #56 Add link to our Discord chat
jacky and geoffo joined the channel
# 16:03 
[Jamie_Tanna] Anyone know why https://bitfieldconsulting.com/blog?format=rss may be working in https://monocle.p3k.io/preview?url=https%3A%2F%2Fbitfieldconsulting.com%2Fblog%3Fformat%3Drss but doesn't work when I put it into Aperture? 🤔
# 16:09 
[tantek] seems valid https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fbitfieldconsulting.com%2Fblog%3Fformat%3Drss but the second warning may provide a clue: "line 3, column 1010: description should not contain HTML: a [help]"
# 18:00 
[Jamie_Tanna] Ah interesting - I'd assumed Aperture and Monocle would both parse the same, could be library versions out of sync maybe?
# 18:00 
[Jamie_Tanna] Is anyone doing any reddit webmention backfill or something? I've seen a few of my old submissions (that I manually webmentioned back to my site) getting updates this afternoon
# 18:32 
sknebel snarfed said he'd work on that at some point, so possible
[snarfed] joined the channel
# 18:54 
[snarfed] yup, that's running now
# 18:54 
[snarfed] one wm per reddit submission, source is the reddit submission page
# 18:54 
[snarfed] (separate from Bridgy, which sends wms for comments)
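For context, a Webmention is a form-encoded POST carrying a `source` and a `target` URL. The sketch below only builds such a request with the standard library; the endpoint and both URLs are placeholder assumptions, and this is not the code that runs the backfill.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention(endpoint: str, source: str, target: str) -> Request:
    """Build (but do not send) a Webmention POST request."""
    # The Webmention spec requires application/x-www-form-urlencoded
    # with exactly these two parameter names.
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder URLs, assumed for illustration only.
req = build_webmention(
    "https://example.com/webmention",
    source="https://reddit.example/r/sub/comments/abc123/",
    target="https://example.com/posts/hello",
)
print(req.get_method(), req.data)
```

Passing `req` to `urllib.request.urlopen` would actually send it.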
# 18:55 
[snarfed] also excited about the wm discovery dataset, it's run discovery on over 1M domains. I plan to publish that, and the per-wm results, on Indie Map
[campegg], jacky, gRegor and jamietanna joined the channel
# 20:55 
jamietanna snarfed++ thanks that explains it :D
# 20:55 
Loqi snarfed has 24 karma in this channel over the last year (55 in all channels)
sebbu and geoffo joined the channel
# 21:22 
capjamesg angelo Does your crawler download everything on a web page?
# 21:24 
angelo no, just the representative h-card's u-photo and/or site favicon
# 21:24 
angelo yes, it does receive and parse the entire HTTP response
jacky joined the channel
# 21:25 
capjamesg Got it.
# 21:25 
capjamesg Have you thought about how to speed up your crawler?
# 21:28 
angelo i forgot to mention earlier at HWC, it also opens the website in a headless browser for a screenshot. i'm going to try to break the different parts of the process into small background jobs so that i can better profile what's taking the most time.
# 21:30 
capjamesg Ah! That explains it.
# 21:30 
capjamesg I didn't realise you did that.
# 21:30 
capjamesg I found cProfile really helpful in understanding performance bottlenecks in my Python code for IndieWeb Search.
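A minimal sketch of profiling one unit of work with `cProfile` and `pstats` from the standard library. `crawl_once` and `slow_step` are hypothetical stand-ins, not functions from IndieWeb Search or indieweb.rocks.

```python
import cProfile
import io
import pstats


def slow_step() -> int:
    # Stand-in for an expensive stage (parsing, fetching, screenshots).
    return sum(i * i for i in range(200_000))


def crawl_once() -> int:
    # Hypothetical unit of crawler work to be profiled.
    return slow_step()


profiler = cProfile.Profile()
profiler.enable()
crawl_once()
profiler.disable()

# Sort by cumulative time so the slowest call chains appear first.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

For a whole script, `python -m cProfile -s cumulative script.py` gives the same kind of report without code changes.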
# 21:32 
angelo i'll rerun the crawler now and have it skip downloading photos/favicons/screenshots but still no caching/conditional-get; otherwise, i can open /multiple/ browsers and i currently have the "worker count" set to 5. i think i can ratchet that up much higher..
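The conditional GET mentioned above can be sketched as follows, assuming the crawler stores the `ETag` and `Last-Modified` values from its previous fetch of each URL. The function name and the stored values are hypothetical; the crawler's real code is not shown in this log.

```python
from typing import Optional
from urllib.request import Request


def conditional_request(
    url: str,
    etag: Optional[str] = None,
    last_modified: Optional[str] = None,
) -> Request:
    """Build a GET that lets the server answer 304 if nothing changed."""
    headers = {}
    if etag:
        # Validator taken from the previous response's ETag header.
        headers["If-None-Match"] = etag
    if last_modified:
        # Validator taken from the previous response's Last-Modified header.
        headers["If-Modified-Since"] = last_modified
    return Request(url, headers=headers)


# Placeholder URL and validators, assumed for illustration only.
req = conditional_request(
    "https://example.com/",
    etag='"abc123"',
    last_modified="Wed, 24 Aug 2022 09:19:00 GMT",
)
print(req.header_items())
```

When such a request is sent with `urllib.request.urlopen`, a `304 Not Modified` reply surfaces as an `HTTPError` with `code == 304`, which the caller can treat as "reuse the cached copy".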
# 21:35 
capjamesg What are you using to take the screenshot / open the headless browser?
# 21:35 
angelo you can peek at https://indieweb.rocks/jobs to see two functions being repeatedly called in the background: get_media() and get_page(); if i decompose them further there'll be a finer grained lens on seeing the crawler's hot spots
# 21:36 
angelo selenium facilitated by https://pypi.org/project/webdriver-manager
# 21:37 
angelo i actually just got the code in question uploaded to my site right now! https://ragt.ag/code/indieweb.rocks/files/indieweb_rocks/__init__.py
# 21:37 
Loqi Angelo Gladding
# 21:37 
angelo get_page() and get_media() are in there; excuse the poor formatting
# 21:39 
capjamesg This is cool!
# 21:39 
capjamesg angelo++
# 21:39 
Loqi angelo has 11 karma in this channel over the last year (14 in all channels)
geoffo and jacky joined the channel
# 22:05 
angelo oh it's also calling out to pa11y which may or may not be spawning its own "browser" of unknown gargantuan
jacky, petermolnar, tetov-irc, tbbrown and AramZS_ joined the channel