[Jamie_Tanna] Is anyone doing any Reddit webmention backfill or something? I've seen a few of my old submissions (that I manually webmentioned back to my site) getting updates this afternoon
[snarfed] also excited about the wm discovery dataset, it's run discovery on over 1M domains. I plan to publish that, and the per-wm results, on Indie Map
[campegg], jacky, gRegor and jamietanna joined the channel
angelo i forgot to mention earlier at HWC, it also opens the website in a headless browser for a screenshot. i'm going to try to break the different parts of the process into small background jobs so that i can better profile what's taking the most time.
angelo i'll rerun the crawler now and have it skip downloading photos/favicons/screenshots but still no caching/conditional-get; otherwise, i can open /multiple/ browsers and i currently have the "worker count" set to 5. i think i can ratchet that up much higher.
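For reference, the conditional GET mentioned above could look roughly like the sketch below. It is not the crawler's code: the in-memory `cache` dict and the `conditional_headers()` and `fetch()` names are hypothetical, and it assumes the standard library's `urllib` is acceptable for the fetch. The idea is to remember a page's `ETag` and `Last-Modified` validators, send them back on the next request, and reuse the stored body when the server answers 304 Not Modified.

```python
import urllib.error
import urllib.request


def conditional_headers(url, cache):
    """Build validator headers from a hypothetical cache: url -> (etag, last_modified, body)."""
    headers = {}
    entry = cache.get(url)
    if entry:
        etag, last_modified, _body = entry
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified
    return headers


def fetch(url, cache, opener=urllib.request.urlopen):
    """Fetch url, reusing the cached body when the server replies 304 Not Modified."""
    request = urllib.request.Request(url, headers=conditional_headers(url, cache))
    try:
        with opener(request) as response:
            body = response.read()
            # Store the validators so the next request can be conditional.
            cache[url] = (
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
                body,
            )
            return body
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 304, so the cache hit is handled here.
        if err.code == 304 and url in cache:
            return cache[url][2]
        raise
```

Calling `fetch(url, cache)` twice with the same `cache` dict makes the second request conditional; a persistent store would be needed for this to survive between crawler runs.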
angelo you can peek at https://indieweb.rocks/jobs to see two functions being repeatedly called in the background: get_media() and get_page(); if i decompose them further there'll be a finer-grained lens on the crawler's hot spots
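One way to get that finer-grained view is to time each decomposed step separately. The sketch below is only an illustration under assumptions: `timed`, `timings`, `hot_spots()`, and the two stand-in steps `fetch_html()` and `take_screenshot()` are hypothetical names, not functions from the crawler, and the `time.sleep()` calls merely simulate work.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical registry: step name -> list of durations in seconds.
timings = defaultdict(list)


def timed(func):
    """Record how long each call to func takes, even if it raises."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__].append(time.perf_counter() - start)
    return wrapper


@timed
def fetch_html(url):
    time.sleep(0.01)  # stand-in for the real page download
    return "<html></html>"


@timed
def take_screenshot(url):
    time.sleep(0.05)  # stand-in for the headless-browser screenshot
    return b""


def hot_spots():
    """Return (step name, total seconds) pairs, slowest step first."""
    totals = ((name, sum(durations)) for name, durations in timings.items())
    return sorted(totals, key=lambda pair: pair[1], reverse=True)
```

After running the steps for a batch of URLs, `hot_spots()` lists which step has consumed the most wall-clock time overall.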