#dev 2022-09-10

2022-09-10 UTC
jacky, superkuh, geoffo, angelo, nertzy, bterry, mro and tetov-irc joined the channel
#
IWDiscordGateway
<capjamesg> angelo Do you have a list of the sites you are crawling?
#
IWDiscordGateway
<capjamesg> I need a new seed list for IndieWeb search.
mro, geoffo, vilhalmer and walkah joined the channel
#
capjamesg
angelo++
#
Loqi
angelo has 12 karma in this channel over the last year (14 in all channels)
#
capjamesg
I have just started a new IndieWeb Search crawl.
#
capjamesg
Would anyone be interested in a gzip of the HTML?
#
capjamesg
(as in, if this is something people want, I'll publish it on GitHub)
#
angelo
i'd be interested in any domains with a representative hcard
#
[snarfed]
capjamesg headers are useful too, consider WARC! http://bibnum.bnf.fr/WARC/
#
capjamesg
I think I could write a translation script.
#
capjamesg
I save header information.
#
capjamesg
I'm at 7,000 documents and I am still crawling domains whose first character is between A and C.
#
angelo
whoa you can pack an entire mirror of a site into a single .warc
[jgarber] joined the channel
#
[jgarber]
capjamesg I may have missed this earlier, but what's the IndieWeb search you're working on?
#
capjamesg
It's a search engine for the IndieWeb.
#
[jgarber]
Cool! Thanks for the link.
jjuran, geoffo and [aciccarello] joined the channel
#
capjamesg
angelo Is indieweb.rocks open to contributions? I'd love to help out however I can.
#
angelo
i've wanted to calculate total page weight of indexed sites for a while.. just came up with a wget command to warc a page with its assets and am reporting its file size
#
angelo
aaronpk your homepage is clocking in at 75MB; is that right?
#
aaronpk
i just checked in chrome and a full load is 6mb for me
#
aaronpk
how are you measuring 75mb?
#
angelo
wget -EHkp aaronparecki.com --warc-file="apk.warc" --no-warc-compression
#
angelo
running that will give you a full tree of assets it downloaded
#
angelo
i had an embedded youtube iframe /being inserted at page load/ and wget found it and added it to the warc anyway
#
angelo
so it's probably some kind of unrealistic upper bound
#
aaronpk
oh it's the podcast
#
angelo
i guess that makes sense.. the feature is for archival purposes, not page weight determination
tetov-irc joined the channel
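Editor's sketch of the measurement discussed above: one rough way to see which assets dominate an uncompressed WARC is to total the `Content-Length` of each `response` record per target URI. This is a minimal reader using only the Python standard library, not the method angelo used. The function name `warc_response_sizes` is hypothetical, and the sizes include the HTTP response headers stored in each record, so they slightly overstate the transferred payload.

```python
import io
from collections import defaultdict


def warc_response_sizes(stream):
    """Return {target URI: total block bytes} for 'response' records.

    `stream` must be a seekable binary file object over an
    uncompressed WARC file (e.g. written with --no-warc-compression).
    """
    sizes = defaultdict(int)
    while True:
        line = stream.readline()
        if not line:
            break  # end of file
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)

        # Read the named header fields up to the empty line.
        headers = {}
        while True:
            h = stream.readline()
            if h in (b"\r\n", b"\n", b""):
                break
            name, _, value = h.partition(b":")
            headers[name.strip().lower()] = value.strip()

        length = int(headers[b"content-length"])
        stream.seek(length, io.SEEK_CUR)  # skip the record block itself

        if headers.get(b"warc-type") == b"response":
            uri = headers.get(b"warc-target-uri", b"").decode("utf-8", "replace")
            sizes[uri] += length
    return dict(sizes)
```

To use it, open whatever file wget actually wrote in binary mode (`open(path, "rb")`), pass it to `warc_response_sizes`, and sort the result by size; a large media file such as a podcast episode would show up at the top.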
#
angelo
capjamesg that'd be awesome.. let me write some build instructions and you can start by getting it built on your machine
#
angelo
in the meantime get poetry installed
#
capjamesg
Sounds good! I'm super excited by indieweb.rocks!
#
angelo
where there's a will there's a node lib: https://www.npmjs.com/package/page-weight
geoffo joined the channel
#
IWDiscordGateway
<capjamesg> Has anyone used Farcaster?
#
IWDiscordGateway
<capjamesg> It does require a cryptocurrency wallet, which I created for the specific purpose of exploring this.
#
IWDiscordGateway
<capjamesg> They have interoperable clients built by different people for events, news, and other things.
#
IWDiscordGateway
<capjamesg> The main reader is owned by a business but it seems as though anyone can build one.
#
IWDiscordGateway
<capjamesg> Cryptocurrency requirements aside, I am intrigued to see different ways of thinking about data ownership for social media.
#
angelo
that page-weight lib is exactly what i wanted and is working well for all sites so far.. it gets a permalink on your site just fine but it's hanging on your homepage aaronpk
#
angelo
now that's weird
#
aaronpk
my home page uses http long polling to bring in posts in real time
#
aaronpk
so it's probably confused about that
#
aaronpk
it's probably unnecessary but it's kind of cool :) if you're on my home page and I post something it'll just show up right at the top
#
angelo
it did add an "XHR" section to the data returned for one of your permalinks.. do you use the same tech on individual posts?
#
aaronpk
i don't think so?
#
angelo
comments don't slide in?
#
aaronpk
i think i've had a todo for comments
#
angelo
k, then that's probably it.. thanks!
#
IWDiscordGateway
<capjamesg> Long polling?
#
aaronpk
actually i think it's the EventSource API
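Editor's sketch, for readers unfamiliar with EventSource: the browser keeps one HTTP response open and the server writes `text/event-stream` frames into it as things happen, which would explain why a tool that waits for the network to go idle never finishes. The parser below is a simplified illustration of that wire format (it ignores the `id` and `retry` fields), not aaronpk's implementation; `parse_sse` is a hypothetical name.

```python
def parse_sse(lines):
    """Yield (event_type, data) pairs from an iterable of text lines
    in the text/event-stream format."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

Because the server never closes the stream, a real consumer would iterate over the response indefinitely; a measuring tool needs its own timeout.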
#
angelo
capjamesg farcaster looks like so many similar projects; what appealed to you?
#
angelo
capjamesg i was going to warn you about crawling my site.. i've got a ridiculous number of generated pages in the form of software releases.. someone's crawling them right now.. there's no bottom to that pit
#
angelo
guess it's time for a robots.txt
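Editor's sketch of how a polite crawler would honor such a file, using Python's standard `urllib.robotparser`. The `/releases/` path, the `example.com` host, and the `ExampleCrawler` user agent are all assumptions for illustration; the log does not say where angelo's generated release pages live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that fences off generated release pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /releases/
"""


def build_parser(text):
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, url, agent="ExampleCrawler"):
    """Return True if `agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(agent, url)


rules = build_parser(ROBOTS_TXT)
```

A crawler would call `allowed(rules, url)` before queueing each URL, so the generated pages are never requested while the rest of the site stays crawlable.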
#
[snarfed]
reminds me of all the exceptions I found during the indie map crawl. https://indiemap.org/docs.html#exceptions
#
angelo
ah, https://github.com/GoogleChrome/lighthouse is what i've been looking for.. it still hangs on aaronparecki.com but times itself out.. and ultimately returns 6MB :)
#
Loqi
[GoogleChrome] lighthouse: Automated auditing, performance metrics, and best practices for the web.