#dev 2022-08-07

2022-08-07 UTC
#
gRegorLove_
angelo++ very cool, was able to sign in now
#
Loqi
angelo has 8 karma in this channel over the last year (11 in all channels)
#
angelo
tantek i just started using one of those; i know nothing about ARIA roles
geoffo joined the channel
#
angelo
gRegor good to know!
#
angelo
https://indieweb.rocks/jobs if you want to peek at the crawl; t-z domains remaining
#
GWG
gRegor: Reviewed your PR. There is a feature missing, but I saw nothing wrong with the features you added.
#
gRegor
Thanks! Yeah missed that, I'll work on it
#
angelo
70 minutes to crawl 586 websites, crash-free. i'm going to put it on a schedule. capjamesg how often are you reindexing homepages? (i know you're indexing way more than that)
geoffo joined the channel
#
GWG
gRegor: Typo
neceve joined the channel
#
[tantek]
lots discussed at the pop-up, and yet lots more we could have discussed!
#
GWG
[tantek]: Isn't that always the way?
#
[tantek]
I'm still hesitant to implement "read-of" because I don't want to separately implement "watch-of" and "listen-of". and whatever is the verb for "looking" at a photo or other static image.
mlncn joined the channel
#
[tantek]
privacy-policy << INFORMATIONAL 2013 RFC: https://www.rfc-editor.org/rfc/rfc6903.html
petermolnar and alex11 joined the channel
#
[Jamie_Tanna]
Thanks angelo++
#
Loqi
angelo has 9 karma in this channel over the last year (12 in all channels)
IWSlackGateway joined the channel
#
IWDiscordGateway
<capjamesg> angelo++
#
Loqi
angelo has 10 karma in this channel over the last year (13 in all channels)
#
IWDiscordGateway
<capjamesg> What is the goal for IndieWeb rocks?
#
IWDiscordGateway
<capjamesg> Recrawling with Search is kind of in flux right now. I moved to a new architecture and I’m not using it fully yet.
neceve, tetov-irc, [marksuth] and gRegorLove_ joined the channel
#
@acali6
I learnt about a special operator that you can use to google for specific sites today. Enter `site:http://notiz.blog` to see the entries for that domain. You can specify your search and do something like `site:http://notiz.blog webmention`.
(twitter.com/_/status/1556256460572704768)
#
jeremycherfas
My favourite operator remains AROUND(n) to find two words or phrases within (n) words of one another in either direction.
mlncn, mlncn_, gRegorLove__, [schmarty], geoffo and AramZS joined the channel
#
omz13
Angelo, when you crawl, are you using conditional get?
[Jamie_Tanna], nertzy, mlncn, mlncn_ and veracioux joined the channel
#
angelo
capjamesg gamify indiemark to rise the tide and real-time index to make waves
#
angelo
consumption and validation tools under the covers
#
angelo
omz13 i am not but i will for the next crawl
geoffo and [snarfed] joined the channel
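For context on the conditional GET that omz13 asks about: a crawler saves the `ETag` and `Last-Modified` validators from one crawl and sends them back as `If-None-Match` and `If-Modified-Since` on the next, so an unchanged page costs only a `304 Not Modified` response. A minimal sketch using Python's standard library follows. The function names are hypothetical, and nothing here is taken from angelo's crawler.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved on a previous crawl."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified); body is None on a 304 response."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Not Modified: keep the stored copy and its validators.
            return None, etag, last_modified
        raise
```

On the first crawl both validators are `None`, so the request is an ordinary GET; later crawls pass back whatever the previous call returned.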
#
[snarfed]
on the crawling note, https://indiemap.org/ turned 5 years old recently! I don't plan to do a recrawl, but it'd be nice to get the data updated, I'd happily support anyone who wants to help! https://github.com/snarfed/indie-map
#
Loqi
Indie Map is a public IndieWeb social graph and dataset. 2300 sites, 5.7M pages, 380GB HTML with microformats2. Social graph API and interac...
mlncn joined the channel
#
vikanezrimaya
tried to implement asynchronous streaming templates in Rust. Started from a stream which throws byte chunks at its consumer. The code is several times larger than its output, which is concerning.
#
vikanezrimaya
I might need a macro system or something similar to reduce verbosity
#
IWDiscordGateway
<capjamesg> [snarfed] divide and conquer maybe?
#
IWDiscordGateway
<capjamesg> A few people could take on a portion of the revised domains list.
#
[snarfed]
capjamesg it's not really a lot of manual work per domain, the crawler runs itself. every now and then a site is unusual and needs debugging, but most of the work will be before and after the crawl
#
IWDiscordGateway
<capjamesg> I’m happy to help how I can!
#
IWDiscordGateway
<capjamesg> Just let me know what needs done!
geoffo joined the channel
#
angelo
i seeded indieweb.rocks with 870 sites i got from omz13 in the form of a list of, correct me if i'm wrong, all domains with h-cards from the indiemap. after my own representative h-card parsing i found ~600 domains with representative h-cards. does that sound about right snarfed?
tetov-irc, gxt___ and sp1ff joined the channel
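The drop from 870 domains with any h-card to roughly 600 with a representative one comes from the stricter matching rules. The sketch below follows the representative h-card parsing algorithm from the microformats wiki as best I understand it, applied to already-parsed mf2 JSON. It is not angelo's parser: the helper names are hypothetical, and URL comparison is simplified to ignoring a trailing slash.

```python
def _norm(url):
    # Simplification (assumption): only a trailing slash is ignored.
    # Real URL normalization is more involved.
    return url.rstrip("/")


def _h_cards(items):
    """Yield every h-card in the parsed items, including nested ones."""
    for item in items:
        if not isinstance(item, dict):
            continue
        if "h-card" in item.get("type", []):
            yield item
        yield from _h_cards(item.get("children", []))
        for values in item.get("properties", {}).values():
            yield from _h_cards(values)


def _urls(card, prop):
    values = card.get("properties", {}).get(prop, [])
    return {_norm(v) for v in values if isinstance(v, str)}


def representative_h_card(parsed, page_url):
    """Return the representative h-card for page_url, or None."""
    page = _norm(page_url)
    cards = list(_h_cards(parsed.get("items", [])))
    # Rule 1: an h-card whose uid and url both match the page URL.
    for card in cards:
        if page in _urls(card, "uid") and page in _urls(card, "url"):
            return card
    # Rule 2: an h-card whose url is also a rel=me URL on the page.
    rel_me = {_norm(u) for u in parsed.get("rels", {}).get("me", [])}
    for card in cards:
        if _urls(card, "url") & rel_me:
            return card
    # Rule 3: the page has exactly one h-card and its url matches the page.
    if len(cards) == 1 and page in _urls(cards[0], "url"):
        return cards[0]
    return None
```

A page that lists several h-cards (a blogroll, say) with none pointing back at the page itself returns `None`, which is one way a domain can have h-cards yet no representative one.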
#
omz13
angelo my list was initially seeded with domains listed in indie-map, chat-names, indieweb-ring; it was then reduced to sites with an h-card on their root page
#
omz13
at some stage I will run iwstats again but enable representative h-card retrieval which is a bit more network heavy; I was waiting until my fetch library had some more battle-hardening
#
omz13
and so far it's looking quite good (it has some caching atop conditional get which really makes it performant)
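omz13 does not describe how the fetch library layers caching on top of conditional GET, so the following is only one plausible shape, with every name hypothetical. Entries younger than a freshness window are served with no network traffic at all; older entries are revalidated with their saved `ETag`, and a `304` reuses the stored body.

```python
import time


class CachingFetcher:
    """Serve fresh entries locally; revalidate stale ones conditionally."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        # fetch is an assumed callable: (url, etag) -> (status, body, etag)
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (body, etag, fetched_at)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[2] < self._ttl:
            return entry[0]  # still fresh: skip the network entirely
        etag = entry[1] if entry is not None else None
        status, body, new_etag = self._fetch(url, etag)
        if status == 304 and entry is not None:
            # Unchanged on the server: keep the stored body and validator.
            body, new_etag = entry[0], etag
        self._entries[url] = (body, new_etag, now)
        return body
```

Injecting `fetch` and `clock` keeps the cache logic testable without a network or real waiting; a real crawler would pass in a function that performs the conditional request.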