#microformats 2023-06-29

2023-06-29 UTC
#
gRegor
GWG, you'll be happy to know I'm finally watching some of that 2020 mf2 session. Just added some thoughts to https://github.com/microformats/microformats2-parsing/issues/16
#
Loqi
[preview] [aaronpk] #16 consider not including img alt text as part of surrounding text properties
#
GWG
gRegor: Maybe I should rewatch it. I was there, but... who remembers?
#
Loqi
capjamesg has 5 karma in this channel over the last year (114 in all channels)
#
gRegor
capjamesg++ angelo++ for all the mf2py work!
#
gRegor
angelo++
#
Loqi
angelo has 1 karma in this channel over the last year (14 in all channels)
#
capjamesg
snarfed sknebel do you want to be added to the Contributors file for mf2py?
#
sknebel
certainly
#
[tantek]
capjamesg++ sknebel++
#
Loqi
capjamesg has 6 karma in this channel over the last year (115 in all channels)
#
Loqi
sknebel has 1 karma in this channel over the last year (48 in all channels)
#
[tantek]
angelo++ gRegor++
#
Loqi
gRegor has 3 karma in this channel over the last year (92 in all channels)
#
Loqi
angelo has 2 karma in this channel over the last year (15 in all channels)
#
capjamesg
Perfect.
#
Loqi
[preview] [sknebel] #136 update setup.py and fix #95: don't add slashes to void elements
#
sknebel
capjamesg: i'll close it for now
#
capjamesg
Thanks.
#
capjamesg
sknebel How do you want to be listed in CONTRIBUTORS.md?
#
capjamesg
I have created a wiki pop up record for a Microformats roundtable with a blank interest list. If y'all are interested, can you add yourselves w/ time availabilities? https://indieweb.org/2023/Pop-ups/Sessions
#
gRegor
I'm about halfway through watching that 2020 microformats session, then reviewing the pre-requisites there.
#
gRegor
Hoping to iterate on some of those in here / on github issues before we have the popup
#
[tantek]
capjamesg, I merged your new proposal for a pop up record for a Microformats roundtable with the prior version, and included the interest section from before. please take a look: https://indieweb.org/2023/Pop-ups/Sessions#Microformats_Roundtable
#
[snarfed]
exciting progress here! capjamesg my vote for prioritizing issues is for https://github.com/microformats/mf2py/issues/181
#
Loqi
[preview] [snarfed] #181 Resolve relative URLs in e-* HTML content
#
[snarfed]
I'd also be interested to see performance benchmarks, especially ones that could be maintained and rerun over time (ideally in CI!) to track optimizations/regressions
#
[snarfed]
and compare between underlying HTML parsers
#
[snarfed]
sknebel did a deep dive on this ~5y ago, https://github.com/microformats/mf2py/issues/122 , and mf2py itself hasn't changed much since, but I wonder if there's more room for optimization, and I wonder how the different HTML parsers compare
#
Loqi
[preview] [snarfed] #122 significant performance regression since 1.0.4
#
sknebel
the built-in one is crap, lxml is fast, html5lib actually speaks HTML5 ;)
#
sknebel
(and python is slow)
#
[snarfed]
lol yes that all sounds right
#
sknebel
so really the choice is between the latter two. I guess one could run some experiments on that to quantify how much "fast" means
#
[snarfed]
the parser benchmarks (ie outside of mf2py) are probably already out there
#
sknebel
I guess 2 more paths: make the processing more customizable (e.g. if you're not going to use html values no point in serializing them), and binding a different HTML parser (although at that point maybe binding a different mf2 parser is the better path)
#
[snarfed]
I'd be curious how much overhead mf2py adds though
#
[snarfed]
probably not a lot, relatively, but still
#
sknebel
I feel like I remember html parsing with html5lib taking about as long as mf2py? but that really could be me misremembering
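
The comparison discussed above could be measured with a small timing harness. This is only a sketch under stated assumptions, not mf2py's own benchmark: `TagCounter`, `count_tags_stdlib`, `time_parsers` and `SAMPLE` are hypothetical names, and only the standard library's `html.parser` is wired in. Callables backed by lxml or html5lib would have to be added by whoever runs it.

```python
import timeit
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Minimal stdlib parser that only counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def count_tags_stdlib(html):
    """Parse `html` with the built-in parser and return the number of start tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def time_parsers(parsers, html, repeat=3, number=20):
    """Return {name: best seconds per parse} for each callable in `parsers`."""
    results = {}
    for name, parse in parsers.items():
        timings = timeit.repeat(lambda: parse(html), repeat=repeat, number=number)
        # The minimum is the least noisy estimate of the achievable time.
        results[name] = min(timings) / number
    return results


# Hypothetical sample document: a small h-entry repeated to get a measurable size.
SAMPLE = '<div class="h-entry"><p class="e-content">hi <a href="/x">x</a></p></div>' * 200

if __name__ == "__main__":
    # Other parsers can be compared by adding more name -> callable entries here.
    for name, seconds in time_parsers({"html.parser": count_tags_stdlib}, SAMPLE).items():
        print(f"{name}: {seconds * 1000:.3f} ms per parse")
```

Timing the same document through mf2py with each underlying parser, and subtracting the parser-only numbers, would give a rough idea of the overhead mf2py itself adds.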
#
capjamesg
What does dict_class do in mf2py?
#
capjamesg
We map it to dict in the constructor.
#
capjamesg
It seems redundant.
#
[snarfed]
it lets users substitute in eg OrderedDict or others if they want. https://github.com/microformats/mf2py/blob/main/CHANGELOG.md#105---2016-05-09
#
capjamesg
Is it necessary?
#
capjamesg
Are people doing that?
#
capjamesg
Python dictionaries are ordered in > 3.6
#
[jacky]
I can see a use case for that (if order matters, if you need a faster implementation of a dict, etc)
#
[jacky]
An unordered dict could be used for one-off parsing, like for webmention
#
[snarfed]
also afaik that ordering isn't guaranteed or true in all Python runtimes
#
[snarfed]
capjamesg you could do a github-wide search to see if any mf2py users are using it. if not many, and if it's causing a problem, we could remove it in the next major version. but sounds like a low priority
#
angelo
see the italics at the bottom of this section: https://docs.python.org/3.7/library/stdtypes.html#dict --- "Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6."
#
angelo
OrderedDict has a few extra methods but i believe the only motivation for using it with mf2py is to retain document order
#
capjamesg
The implementation of dict_class is not ideal: it is passed through function parameters and gets confusing.
#
angelo
the work on dict during that time also optimized it for speed and memory at the C level
#
[snarfed]
ahh ok!
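
The ordering points above can be checked with a short script. This is a minimal sketch assuming Python 3.7 or later. `build_properties` is a hypothetical stand-in for a parser that accepts a `dict_class` argument and is not mf2py's actual code.

```python
from collections import OrderedDict


def build_properties(pairs, dict_class=dict):
    """Hypothetical helper: collect (name, value) pairs into a dict_class instance."""
    props = dict_class()
    for name, value in pairs:
        props.setdefault(name, []).append(value)
    return props


pairs = [("name", "Example"), ("url", "https://example.com/"), ("name", "Alt")]

plain = build_properties(pairs)
ordered = build_properties(pairs, dict_class=OrderedDict)

# Since Python 3.7 a plain dict keeps insertion order, so both agree here.
assert list(plain) == list(ordered) == ["name", "url"]

# OrderedDict still differs: it has extra methods such as move_to_end ...
ordered.move_to_end("name")
assert list(ordered) == ["url", "name"]

# ... and comparing two OrderedDicts is order-sensitive, unlike plain dicts.
assert OrderedDict(a=1, b=2) != OrderedDict(b=2, a=1)
assert dict(a=1, b=2) == dict(b=2, a=1)
```

For simply retaining document order, the default `dict` behaves the same as `OrderedDict` on these Python versions. The differences are limited to the extra methods and the order-sensitive equality.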