#dev 2022-08-24

2022-08-24 UTC
[Anton_Podviazn], [jgarber], kloenk, chenghiz_, angelo, downtoearth1, cjw6k_, geoffo, jacky, jjuran_, gxt, jjuran, [marksuth], [aciccarello] and omz13 joined the channel
#
capjamesg
[Jamie_Tanna] The coffee emoji got turned into a ? in my DNS records.
#
sknebel
capjamesg: did you just try pasting it in or did you do the escaping by hand?
#
sknebel
(assuming you wanted to do an emoji-domain)
#
capjamesg
I pasted it in.
#
capjamesg
sknebel This was for a TXT record.
#
capjamesg
The RFC says only ASCII characters are allowed but we wanted to see what would happen if you didn't obey that rule.
#
capjamesg
Presumably Namecheap transformed the emoji for me.
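One plausible explanation for the `?`, sketched in Python: forcing a non-ASCII character into ASCII with replacement yields a question mark. This is an assumption about what a provider's form handling might do, not a confirmed description of Namecheap's behaviour. For the emoji-domain case sknebel asked about, labels are escaped with Punycode instead; the second half only round-trips a label through Python's built-in `punycode` codec.

```python
# Assumption: the provider coerces the TXT value to ASCII and replaces
# anything it cannot represent. Illustrative only, not Namecheap's code.
txt_value = "I love \u2615"  # U+2615 HOT BEVERAGE (the coffee emoji)

ascii_value = txt_value.encode("ascii", errors="replace").decode("ascii")
print(ascii_value)  # I love ?

# Emoji domain labels are escaped with Punycode plus an "xn--" prefix.
label = "\u2615"
escaped = "xn--" + label.encode("punycode").decode("ascii")
print(escaped)

# Decoding the part after the prefix recovers the original label.
restored = escaped[len("xn--"):].encode("ascii").decode("punycode")
print(restored == label)
```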
#
[Jamie_Tanna]
Oh yes I saw your message and forgot to reply. That's a shame, but probably for the best!
omz13 joined the channel
geoffo and tetov-irc joined the channel
#
capjamesg
Anyone able to review this PR from a while ago: https://github.com/indieweb/chat.indieweb.org/pull/56
#
Loqi
[capjamesg] #56 Add link to our Discord chat
jacky and geoffo joined the channel
#
[tantek]
seems valid https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fbitfieldconsulting.com%2Fblog%3Fformat%3Drss but the second warning may provide a clue: "line 3, column 1010: description should not contain HTML: a [help]"
#
[Jamie_Tanna]
Ah interesting - I'd assumed Aperture and Monocle would both parse the same, could be library versions out of sync maybe?
#
[Jamie_Tanna]
Is anyone doing any reddit webmention backfill or something? I've seen a few of my old submissions (that I manually webmentioned back to my site) getting updates this afternoon
#
sknebel
snarfed said he'd work on that at some point, so possible
[snarfed] joined the channel
#
[snarfed]
yup, that's running now
#
[snarfed]
one wm per reddit submission, source is the reddit submission page
#
[snarfed]
(separate from Bridgy, which sends wms for comments)
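For reference, a minimal sketch of what sending one such webmention looks like on the wire: a form-encoded POST carrying `source` and `target`. The endpoint and both URLs below are hypothetical placeholders, and this is not snarfed's backfill code; the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint, source, target):
    """Build (but do not send) a Webmention notification request."""
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical values: source is the reddit submission page,
# target is the post on your own site that was submitted.
req = build_webmention_request(
    "https://example.com/webmention",
    "https://www.reddit.com/r/example/comments/abc123/",
    "https://example.com/posts/hello",
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing `req` to `urllib.request.urlopen` would actually deliver it.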
#
[snarfed]
also excited about the wm discovery dataset, it's run discovery on over 1M domains. I plan to publish that, and the per-wm results, on Indie Map
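A rough sketch of the HTML half of webmention endpoint discovery, assuming the page body is already in hand. It takes the first `<link>` or `<a>` whose `rel` includes `webmention` and resolves it against the page URL. A full implementation would check the HTTP `Link` header first; the class and function names here are made up for illustration and are not from snarfed's crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WebmentionLinkFinder(HTMLParser):
    """Remember the href of the first <link>/<a> with rel=webmention."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "webmention" in rels and attr.get("href") is not None:
            self.endpoint = attr["href"]


def discover_endpoint(html, page_url):
    """Return the absolute endpoint URL, or None if the page has none."""
    finder = WebmentionLinkFinder()
    finder.feed(html)
    if finder.endpoint is None:
        return None
    return urljoin(page_url, finder.endpoint)


sample = '<html><head><link rel="webmention" href="/wm"></head></html>'
print(discover_endpoint(sample, "https://example.com/post"))
```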
[campegg], jacky, gRegor and jamietanna joined the channel
#
jamietanna
snarfed++ thanks that explains it :D
#
Loqi
snarfed has 24 karma in this channel over the last year (55 in all channels)
sebbu and geoffo joined the channel
#
capjamesg
angelo Does your crawler download everything on a web page?
#
angelo
no, just the representative h-card's u-photo and/or site favicon
#
angelo
yes, it does receive and parse the entire HTTP response
jacky joined the channel
#
capjamesg
Got it.
#
capjamesg
Have you thought about how to speed up your crawler?
#
angelo
i forgot to mention earlier at HWC, it also opens the website in a headless browser for a screenshot. i'm going to try to break the different parts of the process into small background jobs so that i can better profile what's taking the most time.
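A minimal sketch of that idea: wrap each stage in a timer so the per-stage totals show where the time goes. The stage functions are hypothetical stand-ins that just sleep (they are not the real crawler code), and the worker count of 5 simply mirrors the number mentioned in this conversation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

timings = {}                    # stage name -> list of durations in seconds
timings_lock = threading.Lock()


def timed(stage):
    """Decorator that records how long each call of a stage takes."""
    def wrap(func):
        def inner(*args):
            start = time.perf_counter()
            try:
                return func(*args)
            finally:
                elapsed = time.perf_counter() - start
                with timings_lock:
                    timings.setdefault(stage, []).append(elapsed)
        return inner
    return wrap


@timed("fetch")
def fetch_stub(url):
    time.sleep(0.01)            # stand-in for the HTTP request
    return url


@timed("screenshot")
def screenshot_stub(url):
    time.sleep(0.03)            # stand-in for the headless browser
    return url


urls = [f"https://example.com/{i}" for i in range(4)]
with ThreadPoolExecutor(max_workers=5) as pool:
    list(pool.map(fetch_stub, urls))
    list(pool.map(screenshot_stub, urls))

totals = {stage: sum(values) for stage, values in timings.items()}
for stage, total in sorted(totals.items()):
    print(f"{stage}: {len(timings[stage])} calls, {total:.3f}s total")
```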
#
capjamesg
Ah! That explains it.
#
capjamesg
I didn't realise you did that.
#
capjamesg
I found cProfile really helpful in understanding performance bottlenecks in my Python code for IndieWeb Search.
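For anyone who has not used it, a small self-contained cProfile sketch. The `parse_page` and `crawl` functions are made-up stand-ins for real work, not code from IndieWeb Search.

```python
import cProfile
import io
import pstats


def parse_page(n):
    # Stand-in for CPU-bound parsing work.
    return sum(i * i for i in range(n))


def crawl(pages):
    return [parse_page(50_000) for _ in range(pages)]


profiler = cProfile.Profile()
profiler.enable()
results = crawl(5)
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

For a whole script, `python -m cProfile -s cumulative script.py` gives a similar report without touching the code.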
#
angelo
i'll rerun the crawler now and have it skip downloading photos/favicons/screenshots but still no caching/conditional-get; otherwise, i can open /multiple/ browsers and i currently have the "worker count" set to 5. i think i can ratchet that up much higher..
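On the conditional-GET part, a hedged sketch of what it could look like once caching is added: store the `ETag` and `Last-Modified` validators from the previous response and send them back on the next request, so an unchanged page can be answered with `304 Not Modified` and no body. The `cached` dictionary shape is an assumption made for this example.

```python
from urllib.request import Request


def conditional_request(url, cached):
    """Build a GET request that reuses validators from an earlier response.

    `cached` is a hypothetical dict with optional "etag" and
    "last_modified" keys saved from the previous crawl of `url`.
    """
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return Request(url, headers=headers)


req = conditional_request(
    "https://example.com/",
    {"etag": '"abc123"', "last_modified": "Wed, 24 Aug 2022 00:00:00 GMT"},
)
print(req.header_items())
```

Note that `urllib.request.urlopen` raises `HTTPError` with code 304 for a not-modified response, so the caller would catch that and reuse the cached copy.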
#
capjamesg
What are you using to take the screenshot / open the headless browser?
#
angelo
you can peek at https://indieweb.rocks/jobs to see two functions being repeatedly called in the background: get_media() and get_page(); if i decompose them further there'll be a finer grained lens on seeing the crawler's hot spots
#
angelo
i actually just got the code in question uploaded to my site right now! https://ragt.ag/code/indieweb.rocks/files/indieweb_rocks/__init__.py
#
Loqi
Angelo Gladding
#
angelo
get_page() and get_media() are in there; excuse the poor formatting
#
capjamesg
This is cool!
#
capjamesg
angelo++
#
Loqi
angelo has 11 karma in this channel over the last year (14 in all channels)
geoffo and jacky joined the channel
#
angelo
oh it's also calling out to pa11y which may or may not be spawning its own "browser" of unknown gargantuan
jacky, petermolnar, tetov-irc, tbbrown and AramZS_ joined the channel