#microformats 2022-02-18

2022-02-18 UTC
sarahd[d], capjamesg[d], antrdnv[d], Myst[d], indieweb-irc-bri, [Zeina], [aciccarello], [tw2113_Slack_], ur5us, [davidmead], KartikPrabhu, Loqi__, cygnoir[d], zack[m], diegov, mambang[m], KartikPrabhu1, [jgmac1106], angelo, IntriguedWow[d], [tantek] and jacky joined the channel
↩️ So what I did was save the audio files to my Internet Archive first, made an HTML page on @glitch, and then used http://Granary.io to convert my microformats into an XML file for podcast feeds
barnaby joined the channel
andysylvester, ur5us and angelo joined the channel
to be clear, that's just my fork. The canonical version is https://dissolve.github.io/mf2-tester/
thought we could/should eventually move it to a more "official" hostname, either under microformats.org or microformats.io. (I also own microformats.dev)
ah yes :)
all the REAL work was done by ben_thatmustbeme :)
very cool tool! it’d be interesting to see how much better php-mf2 0.5.0 is than 0.4.6, which is being tested there
that's a static site; no? I wonder if that could be added at like https://microformats.io/matrix.html or something
Wait so there is more work to be done on the Python parser?
the img/alt parsing fixes a lot of those
I personally stopped at v2 for now until I had a consuming case for v1 (re: mf2)
and the rest seem to be whitespace issues, which afaik there are also improvements to in 0.5.0
jacky: technically, sure. Right now it's all built with GitHub Actions, so publishing to GitHub Pages is pretty straightforward. Getting it to publish to whatever serves microformats.org would be a lot more work, auth issues, etc
ah gotcha
wow lol `microformats-v2/h-card/extendeddescription` is kicking a lot of the parsers' butts
hmm actually now I wonder
jacky: pretty sure that’s just the comparatively new img/alt parsing not being available everywhere
ah gotcha
which is the case for a lot of those tests
Is there Python parser work to be done?
barnaby: trying now with updated php dependencies. Unfortunately that will overwrite the existing results that include rust, but /shrug
capjamesg[d]: if 1.1.2 (the tested version according to that matrix) is the latest, then the python parser needs img/alt parsing
and some changes to whitespace handling
willnorris: cool, thanks!
and then there are some other failures which point to larger problems, e.g. an entire nested mf being missing here https://dissolve.github.io/mf2-tester/python/microformats-v1/hreview/item.json.diff.txt
Good to know! I don’t know how hard this will be to tackle but I’ll take a look.
img/alt should be easy, unsure about the others
it’s been maybe 8 years since I helped tom get started on the python parser, so I’m not entirely familiar with how it works these days
It looks like that function is actually already implemented under a feature flag.
okay, https://willnorris.github.io/mf2-tester/ now reflects v0.5.0 of the php library. Now passes 6 additional test cases.
I'll send Ben a PR for that as well
It’s not on by default for backwards compatibility according to the README.
great, looks like it’s mostly whitespace stuff remaining now
capjamesg[d]: yeah img/alt being a breaking change for badly-written consuming code was one of the reasons to bump php-mf2 from 0.4 to 0.5
(and for me to write an article warning people not to make the mistake which leads to it being a breaking change)
well, perhaps “badly” is too harsh. “naively” would be more accurate
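To illustrate the breaking change discussed here: with img/alt parsing, a `u-*` property parsed from an `<img>` that has an `alt` attribute comes back as an object with `value` and `alt` keys rather than a plain string. A minimal sketch of consuming code that tolerates both shapes follows; the helper name `first_url` and the sample data are invented for illustration and are not part of any parser's API.

```python
def first_url(properties, name):
    """Return the first URL of a property, whether the parser emitted a
    plain string or a {"value": ..., "alt": ...} object."""
    values = properties.get(name, [])
    if not values:
        return None
    first = values[0]
    if isinstance(first, dict):
        # img/alt-style result: the URL lives under "value"
        return first.get("value")
    return first


# Hypothetical parsed "properties" dicts, before and after img/alt parsing.
old_style = {"photo": ["https://example.com/me.jpg"]}
new_style = {"photo": [{"value": "https://example.com/me.jpg", "alt": "Me"}]}

print(first_url(old_style, "photo"))
print(first_url(new_style, "photo"))
```

Naive code that assumes `properties["photo"][0]` is always a string would instead receive a dict in the second case.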
willnorris: are the tests in the matrix from https://github.com/microformats/tests? because afaik php-mf2 passes those, unless they’re not included in the default test suite
yes, this just vendors in microformats/tests
hmm looks like I need to fix some more stuff in php-mf2 then
good to know
I mean, it's also possible there's a bug somewhere in how it's running the tests. Does php-mf2 run the shared test suite as part of its normal testing?
I thought it did, but it’s been a long time since I worked on the tests, and I didn’t look through them in detail when setting up the new CI recently
That's what I do for the go client. I have the shared test suite set up as a submodule in the repo, and https://github.com/willnorris/microformats/blob/main/testsuite_test.go runs it as part of the normal testing process
and differences between how whitespace is handled in e- and p- properties, which is weird because I’d assume that the value key of an e-parsed block would be exactly the same as a p-parsed property
barnaby: I had some somewhat similar escaping issues in the go client that were causing tests to fail. I suggested fixing it in the tester (https://github.com/dissolve/mf2-tester/pull/4) but folks didn't like that. I ended up doing it in the library (https://github.com/willnorris/microformats/commit/820225570fa984be709885c95180dd1bff0d7dd6)
The issue with json having multiple possible representations of unicode is a bit tricky
I think in this particular case it’s curly quotes which are the problem, so just special-casing a couple of characters isn’t going to cut it
I do remember needing to do some weird htmlentities and encoding juggling a long time ago to get DOMDocument to handle unicode correctly
angelo joined the channel
maybe there’s a better way by now, if it wasn’t already improved
You can have utf-8 representations or \u representations iirc
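A small sketch of the point about JSON having more than one valid spelling of the same character: the two documents below differ as text but decode to the same data, so comparing decoded values (or re-serialising both the same way) avoids false test failures. The sample strings and the `canonical` helper are made up for illustration; this is not how mf2-tester actually compares results.

```python
import json

# The same curly-quoted string, once as literal UTF-8 and once as \u escapes.
literal = '{"name": "“quoted”"}'
escaped = '{"name": "\\u201cquoted\\u201d"}'

# A byte-for-byte or textual diff sees two different documents.
print(literal == escaped)

# Decoding first shows that they carry identical data.
print(json.loads(literal) == json.loads(escaped))


def canonical(text):
    """Re-serialise JSON text in one fixed style so textual diffs are meaningful."""
    return json.dumps(json.loads(text), ensure_ascii=False, sort_keys=True)
```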
I vaguely remember discussing this a while ago, but it may not be in the parsing issues
hmm okay it looks like the php-mf2 tests do pass the mf/tests suite internally, but only by subclassing Parser and adding a custom textContent implementation??
which would explain why the tests pass internally but not in the matrix
I need to look at this in more detail another time, I think
I’m more concerned by the apparent official difference between whitespace handling in e-*.value and p-* properties
It may be worth another pass through that with different parser implementations, and making some tests with emoji and other non-basic UTF-8 characters in them.
definitely +1 on having more tests using non-english characters
and emoji too — any app which can’t handle emoji in 2022 is on the way to irrelevance ;)
Anders joined the channel
Anders: did you have a specific use-case in mind for the directory-listing idea?
Fairly specific. As an example, if you run `python3 -m http.server` it will give you a simple file browser over HTML
It wouldn't be too hard to add h-dir, h-file, p-mod-time, p-size, etc to that listing and make it machine readable
Once you have that it would be easy to wrap it in a FUSE mount
Or rclone
So any webserver that implements this could be treated as a read-only filesystem
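A rough sketch of what such a machine-readable listing could look like. The class names `h-dir`, `h-file`, `p-mod-time` and `p-size` are the ones floated in this discussion, not an existing microformats vocabulary, and the exact markup (including the use of `p-name` and `u-url` on the link) is an assumption made for illustration.

```python
import html
import os
import tempfile
from datetime import datetime, timezone
from urllib.parse import quote


def render_listing(path):
    """Render one directory as an HTML list marked up with the proposed classes."""
    items = []
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        info = entry.stat()
        is_dir = entry.is_dir()
        kind = "h-dir" if is_dir else "h-file"
        href = quote(entry.name) + ("/" if is_dir else "")
        mtime = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat()
        items.append(
            f'<li class="{kind}">'
            f'<a class="p-name u-url" href="{href}">{html.escape(entry.name)}</a> '
            f'<data class="p-size" value="{info.st_size}">{info.st_size} bytes</data> '
            f'<time class="p-mod-time" datetime="{mtime}">{mtime}</time>'
            f"</li>"
        )
    return '<ul class="h-dir">\n' + "\n".join(items) + "\n</ul>"


# Demo on a throwaway directory containing one file and one subdirectory.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "notes.txt"), "w") as f:
        f.write("hello")
    os.mkdir(os.path.join(demo_dir, "photos"))
    print(render_listing(demo_dir))
```

A client such as a FUSE or rclone backend would then only need an mf2 parser to read names, sizes and modification times from a single response per directory.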
Anders: sounds interesting, but what do you want to do with that?
I have xoxo to json and back in unmung, and python and php xoxo implementations, but it is a bit separate from other microformats
especially when webDAV exists already (albeit much more complex than what you’re suggesting)
[KevinMarks]: I meant more applications which actually parse xoxo data and use it for something useful, not just showing that it’s there :D
Simpler WebDAV is basically the idea
https://microformats.org/wiki/xoxo#Implementations mentions exactly one practical implementation on odeo, which no longer exists
One concrete example that would be useful for me. If you have a large directory tree that changes regularly but only a few files at a time, zipping the whole thing is impractical. Something like this would let you do incremental updates by crawling the tree and checking for files with changed modTimes or sizes
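The incremental-update idea can be sketched as a comparison of two crawls. Here each crawl is assumed to have already been reduced to a dict mapping path to a `(mod_time, size)` tuple; that snapshot format and the function name `changed_paths` are hypothetical.

```python
def changed_paths(previous, current):
    """Compare two snapshots mapping path -> (mod_time, size).

    Returns (to_fetch, to_delete): paths that are new or whose mod time or
    size differs, and paths that no longer exist on the server.
    """
    to_fetch = sorted(p for p, meta in current.items() if previous.get(p) != meta)
    to_delete = sorted(p for p in previous if p not in current)
    return to_fetch, to_delete


# Hypothetical snapshots from two crawls of the same tree.
last_crawl = {
    "a.txt": ("2022-02-17T10:00:00+00:00", 120),
    "b.txt": ("2022-02-17T10:00:00+00:00", 300),
    "old.log": ("2022-02-01T08:00:00+00:00", 50),
}
this_crawl = {
    "a.txt": ("2022-02-17T10:00:00+00:00", 120),
    "b.txt": ("2022-02-18T09:30:00+00:00", 310),
    "new.txt": ("2022-02-18T09:31:00+00:00", 10),
}

print(changed_paths(last_crawl, this_crawl))
```

Only the paths in the first list need to be downloaded again, which is the saving over re-zipping the whole tree.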
You could also make the spec a bit more complicated, and add reference links to thumbnails for images. That would let you implement a whole bunch of applications.
the size and modtime stuff sound like things which are already accessible from a HEAD request and caching headers. I’m not sure I see the value in duplicating that information elsewhere unless it’s important that it’s human-readable as well as machine readable
Another example would be opening files natively in VLC for streaming (assuming the server/FUSE support byte range requests).
The difference is you have to HEAD each file individually, which quickly becomes way too many requests. It's a moot point anyway, because webserver developers are already duplicating that information for the listings I mentioned above. I'm simply advocating doing that in a standard way.
Imagine if you had to ls -l each file on your filesystem one by one if you wanted to see the size of everything in a directory