#angelo 70 minutes to crawl 586 websites, crash-free. i'm going to put it on a schedule. capjamesg how often are you reindexing homepages? (i know you're indexing way more than that)
#[tantek] I'm still hesitant to implement "read-of" because I don't want to separately implement "watch-of" and "listen-of". and whatever is the verb for "looking" at a photo or other static image.
#Loqi Indie Map is a public IndieWeb social graph and dataset. 2300 sites, 5.7M pages, 380GB HTML with microformats2. Social graph API and interac...
mlncn joined the channel
#vikanezrimaya tried to implement asynchronous streaming templates in Rust. Started from a stream which throws byte chunks at its consumer. The code is several times larger than its output, which is concerning.
#vikanezrimaya I might need a macro system or something similar to reduce verbosity
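(A minimal conceptual sketch of the streaming-template idea described above — a template that yields byte chunks to its consumer instead of building the whole page in memory. This is illustrative Python, not vikanezrimaya's Rust code, and it is shown synchronously for brevity; all names here are made up.)

```python
def stream_template(title, items):
    """Yield an HTML page as a sequence of byte chunks."""
    yield b"<!doctype html><html><head><title>"
    yield title.encode("utf-8")
    yield b"</title></head><body><ul>"
    for item in items:
        # Each list item is emitted as soon as it is produced, so the
        # consumer can start sending bytes before the page is complete.
        yield b"<li>" + item.encode("utf-8") + b"</li>"
    yield b"</ul></body></html>"

page = b"".join(stream_template("Feed", ["one", "two"]))
```

In Rust the equivalent hand-written `Stream` implementation is far more verbose, which is where a macro that expands a template into yield-style chunk emission could help.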
#IWDiscordGateway <capjamesg> A few people could take on a portion of the revised domains list.
#[snarfed] capjamesg it's not really a lot of manual work per domain, the crawler runs itself. every now and then a site is unusual and needs debugging, but most of the work will be before and after the crawl
#angelo i seeded indieweb.rocks with 870 sites i got from omz13 in the form of a list of, correct me if i'm wrong, all domains with h-cards from the indiemap. after my own representative h-card parsing i found ~600 domains with representative h-cards. does that sound about right snarfed?
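(For context, a rough sketch of "representative h-card" discovery, loosely following the microformats2 representative h-card algorithm, run over already-parsed mf2 JSON such as the output of a parser like mf2py. It is simplified: real implementations also normalize URLs before comparing. The function name and data shapes are illustrative.)

```python
def representative_hcard(mf2, page_url):
    """Return the representative h-card for page_url, or None."""
    hcards = [it for it in mf2.get("items", [])
              if "h-card" in it.get("type", [])]
    # 1. An h-card whose url AND uid both match the page URL wins.
    for card in hcards:
        props = card.get("properties", {})
        if page_url in props.get("url", []) and page_url in props.get("uid", []):
            return card
    # 2. Otherwise, an h-card whose url matches a rel=me link on the page.
    rel_mes = mf2.get("rels", {}).get("me", [])
    for card in hcards:
        if any(u in rel_mes for u in card.get("properties", {}).get("url", [])):
            return card
    # 3. Otherwise, a single h-card whose url matches the page URL.
    matching = [c for c in hcards
                if page_url in c.get("properties", {}).get("url", [])]
    if len(matching) == 1:
        return matching[0]
    return None
```

A domain can have an h-card on its root page yet fail all three checks, which would explain the drop from ~870 seeded domains to ~600 with representative h-cards.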
tetov-irc, gxt___ and sp1ff joined the channel
#omz13 angelo my list was initially seeded with domains listed in indie-map, chat-names, indieweb-ring; it was then reduced to sites with an h-card on their root page
#omz13 at some stage I will run iwstats again but enable representative h-card retrieval which is a bit more network heavy; I was waiting until my fetch library had some more battle-hardening
#omz13 and so far it's looking quite good (it has some caching atop conditional GET which really makes it performant)
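(A hedged sketch of the "caching atop conditional GET" pattern omz13 mentions: remember each URL's `ETag`/`Last-Modified` and body, send `If-None-Match`/`If-Modified-Since` on the next fetch, and reuse the cached body on a 304 Not Modified. The request function is injected here so the sketch stays library-agnostic; it is not omz13's actual fetch library.)

```python
cache = {}  # url -> {"etag": ..., "last_modified": ..., "body": ...}

def conditional_fetch(url, do_request):
    """Fetch url via do_request(url, headers) -> (status, headers, body)."""
    headers = {}
    entry = cache.get(url)
    if entry:
        # Include validators from the cached response, if we have them.
        if entry.get("etag"):
            headers["If-None-Match"] = entry["etag"]
        if entry.get("last_modified"):
            headers["If-Modified-Since"] = entry["last_modified"]
    status, resp_headers, body = do_request(url, headers)
    if status == 304 and entry:
        return entry["body"]  # not modified: serve the cached copy
    cache[url] = {
        "etag": resp_headers.get("ETag"),
        "last_modified": resp_headers.get("Last-Modified"),
        "body": body,
    }
    return body
```

For a recrawl of hundreds of mostly unchanged homepages, most responses come back as tiny 304s instead of full pages, which is where the performance win comes from.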