#microformats 2021-04-23

2021-04-23 UTC
[KevinMarks], [kimberlyhirsh], jeremycherfas, [fluffy], Seirdy, [chrisaldrich] and [tantek] joined the channel
# 04:29 
[tantek] yeah, classic linkrot 😕
# 07:10 
@ChrisAldrich ↩️ Do share a link to your digital garden if it’s public. I love to see what others are doing with respect to design & use. We need to get around to holding a Gardens & Streams II camp session(s) to keep iterating on the idea. Do add yourself to [more...] https://boffosocko.com/2021/04/23/55790385/ (twitter.com/_/status/1385490246586953731)
hendursaga, tomlarkworthy, hendursa1, KartikPrabhu, Seirdy, [KevinMarks], voxpelli, [kimberlyhirsh], TallTed, [jgmac1106], [tantek], [schmarty] and barnabywalters joined the channel
# 21:10 
barnabywalters jacky: it looks like someone made a start on a rust mf2 parser, although this log is all I can find https://gist.github.com/flaki/63c3cf04b3d627b54a06404873700f1f
[aciccarello] joined the channel
# 21:14 
barnabywalters it looks like https://github.com/servo/html5ever is the best bet for html parsing, but it’s not particularly clearly documented
# 21:15 
Loqi [servo] html5ever: High-performance browser-grade HTML5 parser
# 21:15 
barnabywalters and I can’t find a good rust DOM implementation anywhere, which is weird seeing as servo is written in it
# 21:16 
barnabywalters so it looks like writing a rust mf2 parser without relying only on questionably maintained html scraping packages will be a bit of a challenge
# 21:24 
[tantek] the servo parser should be solid, fairly well tested on real world content
# 21:24 
jacky I use it indirectly
# 21:25 
jacky this is what I use to pull rel= info on a page https://lib.rs/crates/scraper
# 21:25 
barnabywalters the parser looks fine, but the DOM implementation which comes with it is written “only for testing and security related issues will be ignored”
# 21:25 
jacky via CSS selectors (bedrock)
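For illustration, the kind of rel= extraction jacky describes might look roughly like the sketch below. It is not jacky's tool: it swaps the scraper crate and CSS selectors for Python's standard-library html.parser, and the names RelCollector and collect_rels are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RelCollector(HTMLParser):
    """Collects rel values and the URLs they point to (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rels = {}  # rel value -> list of absolute URLs, in document order

    def handle_starttag(self, tag, attrs):
        # Only hyperlink elements carry rel= values of interest here.
        if tag not in ("a", "area", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = attributes.get("rel")
        if href is None or not rel:
            return
        url = urljoin(self.base_url, href)
        # rel is a space-separated list, so one link can land under several keys.
        for value in rel.split():
            urls = self.rels.setdefault(value, [])
            if url not in urls:
                urls.append(url)


def collect_rels(html, base_url):
    """Feed the markup through the collector and return the rel -> URLs map."""
    collector = RelCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.rels
```

Calling collect_rels(markup, page_url) returns a dict keyed by rel value, with relative href values resolved against page_url.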
# 21:26 
barnabywalters yeah I saw scraper, it doesn’t seem to be sustainably maintained at the moment, so I’m not sure I’d want to build a parser on top of it
# 21:26 
barnabywalters might be good for reference and inspiration though
# 21:26 
jacky oh TIL
# 21:27 
jacky heh it's right at the top too
# 21:27 
barnabywalters there’s an issue in the repo about finding a new maintainer, and lots of people responded but it doesn’t look like anyone’s taken it over yet
# 21:27 
jacky hm well merges still seem to go through
# 21:28 
barnabywalters so I’m cautiously optimistic
# 21:28 
jacky wait nvm https://lib.rs/crates/nipper
# 21:28 
jacky it even has an example to implement readability, wow
# 21:28 
jacky switches his tool to use this, lol
# 21:31 
barnabywalters so many DOM implementations to choose from
# 21:32 
jacky heh yeah
# 21:32 
barnabywalters I’m not so familiar with the current mf2 parsing model, is it possible to make a streaming parser which doesn’t require a full DOM in memory the whole time?
# 21:33 
barnabywalters IIRC the python parser took a recursive approach, although I only worked on it a little bit right at the beginning
# 21:34 
barnabywalters but mf2 has so many special cases that it might not be possible to completely parse a document with a single tree traversal
# 21:35 
barnabywalters and who knows what additional special cases have been added in the last few years, when I wasn’t paying attention ;)
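To make the streaming question concrete, here is a rough sketch of a single-pass, event-driven approach that keeps only a stack of open elements rather than a DOM. It is written in Python with the standard-library html.parser rather than Rust, and it is deliberately incomplete: it handles only h-* roots and plain-text p-* properties, assumes well-formed markup, and ignores u-, dt- and e- properties, implied properties, the value class pattern and backcompat. Whether those special cases fit a single pass is left open. The names StreamingSketch and parse_streaming are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class StreamingSketch(HTMLParser):
    """Single-pass sketch: h-* roots and plain-text p-* properties only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []  # finished top-level items
        self.stack = []  # one frame per open element; no tree is retained

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        roots = sorted(c for c in classes if c.startswith("h-"))
        item = {"type": roots, "properties": {}, "children": []} if roots else None
        self.stack.append({
            "item": item,
            "props": [c[2:] for c in classes if c.startswith("p-")],
            "text": [],
        })

    def handle_data(self, data):
        # Every open p-* element accumulates the text it contains.
        for frame in self.stack:
            if frame["props"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        frame = self.stack.pop()
        # The nearest enclosing h-* element, if any, owns this frame's output.
        parent = next((f["item"] for f in reversed(self.stack) if f["item"]), None)
        if frame["props"] and parent is not None:
            value = frame["item"] or "".join(frame["text"]).strip()
            for name in frame["props"]:
                parent["properties"].setdefault(name, []).append(value)
        elif frame["item"]:
            target = parent["children"] if parent is not None else self.items
            target.append(frame["item"])


def parse_streaming(html):
    """Run the sketch over a markup string and return the top-level items."""
    parser = StreamingSketch()
    parser.feed(html)
    parser.close()
    return parser.items
```

Each property is resolved when its element closes, so memory use tracks nesting depth rather than document size. Finished items are still kept in full, since they are the output.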
# 21:36 
aaronpk there haven't been too many parsing level changes
# 21:36 
aaronpk most of the big changes have been in the interpretation of the parsed data, like post type discovery or authorship discovery
# 21:37 
barnabywalters okay, which is out of scope for the parser anyway. good to know
# 21:37 
aaronpk but now that i think about it, we're probably due for a blog post describing the actual parsing changes in the last few years
# 21:38 
barnabywalters provided the prose algorithm is up to date I can work from that, but it’d definitely be interesting to have a prose summary of what’s changed
# 21:38 
aaronpk http://microformats.org/wiki/index.php?title=microformats2-parsing&action=history
# 21:38 
aaronpk or https://github.com/microformats/microformats2-parsing/issues?q=is%3Aissue+is%3Aclosed
# 21:39 
aaronpk yeah this is hard to read :)
# 21:41 
barnabywalters ooh we include the value of the id attribute now? that’s cool
KartikPrabhu joined the channel
# 21:42 
aaronpk looks like the last change was jan 2019 (alt text stuff), and dec 2018 and jul 2018 before that
timculverhouse joined the channel
# 23:03 
[tantek] probably need to do another mf2 pop-up to see where we are with current open/resolved issues and applying changes
# 23:03 
aaronpk 👍
timculverhouse and [aciccarello] joined the channel