#microformats 2021-04-23

2021-04-23 UTC
[KevinMarks], [kimberlyhirsh], jeremycherfas, [fluffy], Seirdy, [chrisaldrich] and [tantek] joined the channel
#
[tantek]
yeah, classic linkrot 😕
#
@ChrisAldrich
↩️ Do share a link to your digital garden if it’s public. I love to see what others are doing with respect to design & use. We need to get around to holding a Gardens & Streams II camp session(s) to keep iterating on the idea. Do add yourself to [more...] https://boffosocko.com/2021/04/23/55790385/
(twitter.com/_/status/1385490246586953731)
hendursaga, tomlarkworthy, hendursa1, KartikPrabhu, Seirdy, [KevinMarks], voxpelli, [kimberlyhirsh], TallTed, [jgmac1106], [tantek], [schmarty] and barnabywalters joined the channel
#
barnabywalters
jacky: it looks like someone made a start on a rust mf2 parser, although this log is all I can find https://gist.github.com/flaki/63c3cf04b3d627b54a06404873700f1f
[aciccarello] joined the channel
#
barnabywalters
it looks like https://github.com/servo/html5ever is the best bet for html parsing, but it’s not particularly clearly documented
#
Loqi
[servo] html5ever: High-performance browser-grade HTML5 parser
#
barnabywalters
and I can’t find a good rust DOM implementation anywhere, which is weird seeing as servo is written in it
#
barnabywalters
so it looks like writing a rust mf2 parser without relying on only questionably maintained html scraping packages will be a bit of a challenge
#
[tantek]
the servo parser should be solid, fairly well tested on real world content
#
jacky
I use it indirectly
#
jacky
this is what I use to pull rel= info on a page https://lib.rs/crates/scraper
#
barnabywalters
the parser looks fine, but the DOM implementation which comes with it is written “only for testing and security related issues will be ignored”
#
jacky
via CSS selectors (bedrock)
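For illustration, here is a minimal std-only Rust sketch of what collecting rel= info comes down to once an HTML parser (e.g. scraper with an attribute selector) has found the linking elements. The input pairs and the `collect_rels` name are hypothetical. The sketch skips resolving hrefs against the document base. It only splits each rel attribute on whitespace and keeps each URL once per rel value, which roughly mirrors the shape of the mf2 `rels` object.

```rust
use std::collections::BTreeMap;

/// Build a rel-value -> URLs map from (rel attribute, href) pairs.
/// Assumption: the pairs were already extracted from `a`, `area` and `link`
/// elements by an HTML parser, and the hrefs are already absolute.
fn collect_rels(links: &[(&str, &str)]) -> BTreeMap<String, Vec<String>> {
    let mut rels: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for &(rel_attr, href) in links {
        // One rel attribute can carry several space-separated values.
        for rel in rel_attr.split_whitespace() {
            let urls = rels.entry(rel.to_string()).or_default();
            // Record each URL once per rel value, in document order.
            if !urls.iter().any(|u| u.as_str() == href) {
                urls.push(href.to_string());
            }
        }
    }
    rels
}

fn main() {
    let links = [
        ("me", "https://example.com/"),
        ("me authn", "https://github.com/example"),
        ("me", "https://example.com/"),
    ];
    for (rel, urls) in &collect_rels(&links) {
        println!("{rel}: {urls:?}");
    }
}
```

A BTreeMap is used only so the output order is deterministic. A real parser would also need the `rel-urls` side of the output and proper URL resolution.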
#
barnabywalters
yeah I saw scraper, it doesn’t seem to be sustainably maintained at the moment, so I’m not sure I’d want to build a parser on top of it
#
barnabywalters
might be good for reference and inspiration though
#
jacky
oh TIL
#
jacky
heh it's right at the top too
#
barnabywalters
there’s an issue in the repo about finding a new maintainer, and lots of people responded but it doesn’t look like anyone’s taken it over yet
#
jacky
hm well merges still seem to go through
#
barnabywalters
so I’m cautiously optimistic
#
jacky
it even has an example to implement readability, wow
#
jacky
switches his tool to use this, lol
#
barnabywalters
so many DOM implementations to choose from
#
jacky
heh yeah
#
barnabywalters
I’m not so familiar with the current mf2 parsing model, is it possible to make a streaming parser which doesn’t require a full DOM in memory the whole time?
#
barnabywalters
IIRC the python parser took a recursive approach, although I only worked on it a little bit right at the beginning
#
barnabywalters
but mf2 has so many special cases that it might not be possible to completely parse a document with a single tree traversal
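As a rough sketch of the recursive approach mentioned here, the following walks a toy in-memory tree depth-first, assuming a hypothetical `Element` type rather than a real DOM (`Element`, `Item`, `find_items` and `node` are illustrative names, not taken from any mf2 parser). Any element with an h-* class becomes an item that owns the items found beneath it. Properties, implied values and the stricter root class-name rules are deliberately ignored.

```rust
/// Hypothetical toy element: only its class list and child elements.
struct Element {
    classes: Vec<&'static str>,
    children: Vec<Element>,
}

/// Simplified parsed item: its h-* types plus the items nested inside it.
#[derive(Debug, PartialEq)]
struct Item {
    types: Vec<String>,
    children: Vec<Item>,
}

/// Simplified root-class check: "h-" followed by at least one character.
fn root_classes(el: &Element) -> Vec<String> {
    el.classes
        .iter()
        .filter(|c| c.starts_with("h-") && c.len() > 2)
        .map(|c| c.to_string())
        .collect()
}

/// Depth-first walk: items found below an element with root classes become
/// its children; otherwise they are passed up unchanged to the caller.
fn find_items(el: &Element) -> Vec<Item> {
    let nested: Vec<Item> = el.children.iter().flat_map(find_items).collect();
    let types = root_classes(el);
    if types.is_empty() {
        nested
    } else {
        vec![Item { types, children: nested }]
    }
}

/// Small helper for building the toy tree.
fn node(classes: Vec<&'static str>, children: Vec<Element>) -> Element {
    Element { classes, children }
}

fn main() {
    // Roughly: <body><div class="h-entry"><a class="h-card"></a></div>
    //          <p><span class="h-card p-author"></span></p></body>
    let doc = node(
        vec![],
        vec![
            node(vec!["h-entry"], vec![node(vec!["h-card"], vec![])]),
            node(vec![], vec![node(vec!["h-card", "p-author"], vec![])]),
        ],
    );
    println!("{:#?}", find_items(&doc));
}
```

Because an item is only complete once its whole subtree has been visited, a walk like this fits naturally when the tree is in memory; a streaming version would presumably have to keep equivalent state on a stack of open elements.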
#
barnabywalters
and who knows what additional special cases have been added in the last few years, when I wasn’t paying attention ;)
#
aaronpk
there haven't been too many parsing level changes
#
aaronpk
most of the big changes have been in the interpretation of the parsed data, like post type discovery or authorship discovery
#
barnabywalters
okay, which is out of scope for the parser anyway. good to know
#
aaronpk
but now that i think about it, we're probably due for a blog post describing the actual parsing changes in the last few years
#
barnabywalters
provided the prose algorithm is up to date I can work from that, but it’d definitely be interesting to have a prose summary of what’s changed
#
aaronpk
yeah this is hard to read :)
#
barnabywalters
ooh we include the value of the id attribute now? that’s cool
KartikPrabhu joined the channel
#
aaronpk
looks like the last change was jan 2019 (alt text stuff), and dec 2018 and jul 2018 before that
timculverhouse joined the channel
#
[tantek]
probably need to do another mf2 pop-up to see where we are with current open issue/resolved and applying changes
timculverhouse and [aciccarello] joined the channel