#microformats 2022-09-23

2022-09-23 UTC
ur5us, jacky and [jgarber] joined the channel
#
[jgarber]
Has anyone else taken a swing at https://github.com/microformats/microformats2-parsing/issues/7 ? I’ve got `img[srcset]` parsing working in a yet-to-be-released version of MicroMicro implementing what I laid out back in 2020 in this comment: https://github.com/microformats/microformats2-parsing/issues/7#issuecomment-619023503
#
Loqi
[tantek] #7 Should u-* parsing special case img srcset?
#
GWG
[jgarber]: I contemplated it, but I have not written anything
#
[tantek]4
[jgarber] looks like there are sufficient use-cases to try solving this problem. I do like the Bridgy Publish use-case myself as I have definitely struggled with marking up my photo posts so that one resolution is shown on my website, and another much higher resolution (linked) is what is sent to Bridgy to POSSE to other sites
#
[tantek]4
currently I have that "working" without srcset, however I have a feeling that I may hit a limit with some photos where I want the max resolution to go to Flickr, but then have a downlevel but still high resolution to go to Twitter, and then a medium resolution for display on the page itself
#
[tantek]4
looks like we didn't settle on *how* srcset should be parsed: just kept as a string, or subparsed into a nested structure and, if so, as an indexed array or with keys
#
[tantek]4
I'm curious which approach would be easier for Bridgy to parse and look for what it wants
#
[tantek]4
none of the examples provided the bytes size of the image, only dimensions, so is the assumption that bridgy would try the largest (dimensional) image and see if that fits for POSSEing, and if not, go to the next smaller one, try that, and on down the line?
#
[tantek]4
also did anyone look into how the img element DOM was modified for srcset? is it "just" an attribute that returns a string value, or is there a srcset DOM with the individual pieces and ways to manipulate it?
#
[jgarber]
I’ve implemented the “nested structure” approach, adding a `srcset` key to the same structure holding `value` and `alt`. The value of the `srcset` key is a hash whose keys are width descriptors (e.g. `480w`, `2x`) and values are the URLs associated with those descriptors.
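A minimal Python sketch of the nested structure described above, for readers following along. The function names `parse_srcset` and `photo_value` are hypothetical and not taken from MicroMicro. It assumes a naive split on commas (the HTML spec's parsing algorithm is more involved, for example for URLs that contain commas), treats a candidate with no descriptor as `1x`, and does no URL resolution.

```python
def parse_srcset(srcset: str) -> dict:
    """Map each condition descriptor (e.g. "480w", "2x") to its URL.

    Hypothetical helper: a naive comma split, not the full HTML algorithm.
    """
    candidates = {}
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue  # skip empty pieces, e.g. from a trailing comma
        # Assumption: a candidate without a descriptor is keyed as "1x".
        descriptor = parts[1] if len(parts) > 1 else "1x"
        # Keep the first URL seen for a descriptor.
        candidates.setdefault(descriptor, parts[0])
    return candidates


def photo_value(src: str, alt: str, srcset: str) -> dict:
    """Build the value/alt/srcset structure for one u-photo image."""
    return {"value": src, "alt": alt, "srcset": parse_srcset(srcset)}
```

For example, `photo_value("a-640w.jpg", "A photo", "a-480w.jpg 480w, a-800w.jpg 800w")` would carry a `srcset` hash with the keys `480w` and `800w`.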
#
[jgarber]
That’s a long-winded way of saying “What I wrote here way back when”: https://github.com/microformats/microformats2-parsing/issues/7#issuecomment-619023503
#
Loqi
[jgarber623] Building on @dshanske's comment with some use cases pulled from [MDN's Responsive Images tutorial](https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images). URLs are resolved based on a hypothetical source page...
#
[jgarber]
…but to be fair, I’m approaching it from the “how would a parser expose this data?” and not a “how might a consumer use this data?” so it could be my implementation isn’t the most desirable.
#
[jgarber]
I’m not terribly familiar with Bridgy Publish, but a consuming application of this experimental nested `srcset` structure could expose a UI to a user with the width descriptors (e.g. `1x`, `2x`, `3x`) when choosing which image version/size to publish to a third-party.
#
[jgarber]
Tantek, I think that is the workflow you were referencing?
#
[jgarber]
_…or_ if the parser returns `srcset` and its value is the literal attribute value, then the consuming application (Bridgy, for example) would need to do the heavy lifting of parsing the attribute value itself.
#
[jgarber]
Apologies, “width descriptor” above is more correctly “condition descriptor” which could be _either_ a width descriptor (e.g. `480w`) or a pixel density descriptor (e.g. `2x`). Reference: https://developer.mozilla.org/en-US/docs/Web/API/HTMLImageElement/srcset
#
[tantek]4
I think all the use-cases are zero UI
#
[jgarber]
So a consuming application would have some logic around choosing the most appropriate image given the `srcset` attribute’s value (whether that be exposed as a single string or as a nested structure)…?
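One way such zero-UI logic might look on the consuming side, sketched in Python. It assumes the parser exposes `srcset` as a mapping of condition descriptor to URL; `widest_first`, `first_that_fits`, and the `fits` callback are hypothetical names, and nothing here reflects how Bridgy actually behaves.

```python
import re


def widest_first(srcset: dict) -> list:
    """Return URLs that have a width descriptor (e.g. "1080w"), widest first."""
    widths = []
    for descriptor, url in srcset.items():
        match = re.fullmatch(r"(\d+)w", descriptor)
        if match:  # pixel density descriptors such as "2x" are ignored here
            widths.append((int(match.group(1)), url))
    return [url for _, url in sorted(widths, reverse=True)]


def first_that_fits(urls, fits):
    """Return the first URL accepted by the caller-supplied `fits` check.

    `fits` stands in for whatever limit the destination imposes
    (for example a byte-size check after fetching the image).
    """
    return next((url for url in urls if fits(url)), None)


parsed = {
    "480w": "http://example.com/elva-fairy-480w.jpg",
    "1080w": "http://example.com/elva-fairy-1080w.jpg",
    "2x": "http://example.com/elva-fairy-640w.jpg",
}
ordered = widest_first(parsed)
# ordered == ["http://example.com/elva-fairy-1080w.jpg",
#             "http://example.com/elva-fairy-480w.jpg"]
```

The consumer would then walk `ordered` from largest to smallest until one candidate is acceptable to the destination.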
#
[tantek]4
looks like the DOM treats srcset as one string
#
[tantek]4
it would be consistent to provide a string in the mf2 parsed json. it would also preserve the orders of the condition/URL pairs
#
[tantek]4
the use of a JSON structure with keys does not guarantee order of the keys
#
[tantek]4
which would then not work for the HTML->mf2JSON->HTML use-case mentioned
#
[jgarber]
`srcset` image candidates aren’t source order dependent, as far as I understand.
#
[tantek]4
I wonder if there are tests for it
#
[jgarber]
Tests in…?
#
[tantek]4
likely in WPT
#
[jgarber]
What is WPT?
#
[jgarber]
loqi where you at
#
[jgarber]
Oh, interesting. That’s new to me.
#
[tantek]4
hmm, I suppose we have to parse out the URLs inside a srcset value in order to resolve relative URLs
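A rough Python sketch of that step under the "keep it a string" option: split the value into candidates, resolve each URL against the page URL, and re-join in the original order. `resolve_srcset` is a hypothetical name, and the comma split is a simplification of the HTML spec's parsing algorithm.

```python
from urllib.parse import urljoin


def resolve_srcset(srcset: str, base_url: str) -> str:
    """Resolve every candidate URL in a srcset value, preserving order."""
    resolved = []
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue  # skip empty pieces, e.g. from a trailing comma
        # The first token is the URL; any remaining token is the descriptor.
        parts[0] = urljoin(base_url, parts[0])
        resolved.append(" ".join(parts))
    return ", ".join(resolved)
```

Because the output is still a single string, the order of the condition/URL pairs survives an HTML to mf2 JSON to HTML round trip.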
#
[jgarber]
Yep! That’s a gnarly bit of code I wrote a while back. 😅
Seirdy and ur5us joined the channel
#
Loqi
[capjamesg] #6 Update user agent used by the parser
gRegorLove_ and petermolnar joined the channel
#
[KevinMarks]
If you want to preserve order you can parse to a list rather than a dict, but knowing if order matters is useful, and resolving relative urls is consistent with the rest of URL handling.
#
[jgarber]
It seems order within a srcset attribute doesn’t matter:
#
[jgarber]
Section 4.8.4.3.7 “Selecting an image source” notes:
#
[jgarber]
> In an implementation-defined manner, choose one image source from sourceSet. Let this be selectedSource.
#
[jgarber]
The spec does define how to handle duplicate descriptors, but my read of the spec (and confirmed by Mozilla’s article above) is that duplicate descriptors are erroneous and, in that case, the first takes precedence.
ur5us, gRegorLove_ and jacky joined the channel
#
[KevinMarks]
So that does imply that order matters for duplicates, so if you are making it a dict you want to not override an existing key.
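The difference is easy to show in Python, assuming the candidates have already been parsed into an ordered list of (descriptor, URL) pairs. The `first_wins` name and the sample pairs are illustrative only.

```python
pairs = [
    ("480w", "http://example.com/elva-fairy-480w.jpg"),
    ("480w", "http://example.com/elva-fairy-800w.jpg"),  # duplicate descriptor
]


def first_wins(pairs):
    """Build a dict in which the first URL seen for a descriptor is kept."""
    result = {}
    for descriptor, url in pairs:
        result.setdefault(descriptor, url)  # never override an existing key
    return result


# Plain dict() keeps the LAST value for a repeated key ...
last = dict(pairs)["480w"]         # .../elva-fairy-800w.jpg
# ... whereas first_wins() keeps the FIRST one.
first = first_wins(pairs)["480w"]  # .../elva-fairy-480w.jpg
```

Keeping the list of pairs around as well would preserve source order for any consumer that needs it.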
#
[jgarber]
Right. That’s the one thing I’d need to verify in my work-in-progress implementation. It’d be straightforward to add some tests for that, too.
jacky, gRegorLove__ and gRegor joined the channel
#
[tantek]4
good catch [KevinMarks], about not overriding
jacky and gRegorLove_ joined the channel; petermolnar left the channel
#
[jgarber]
Okay, that was fairly straightforward. An image with a duplicative condition descriptor:
#
[jgarber]
```<img srcset="elva-fairy-480w.jpg 480w,
elva-fairy-800w.jpg 480w"
src="elva-fairy-640w.jpg"
alt="Elva dressed as a fairy"
class="u-photo">```
#
[jgarber]
Would be parsed as follows (abbreviated, but this would be within `"photos": []`):
#
[jgarber]
{
  "alt": "Elva dressed as a fairy",
  "srcset": {
    "480w": "http://example.com/elva-fairy-480w.jpg"
  },
  "value": "http://example.com/elva-fairy-640w.jpg"
}
#
[jgarber]
Aligns with the spec noting that source order within `srcset` matters when encountering duplicative condition descriptors (in this case, the `480w`).
#
[tantek]4
jgarber can you show an example that demonstrates multiple valid condition descriptors, perhaps even mixing the 2x vs 800w style, and the 480w dupe?
#
[tantek]4
hoping to get a complex enough bit of JSON output for [snarfed] to evaluate as to how Bridgy might try parsing it prioritizing the keys for figuring out which image resolution to post where when POSSEing
#
[snarfed]1
👍
jacky, gRegorLove__ and jeremycherfas joined the channel
#
[jgarber]
[tantek] Sure thing. I’ll add a few examples here:
#
[jgarber]
First, pixel density descriptors with a “fallback” (no descriptor specified) which a parser (according to the spec) would infer as `1x`:
#
[jgarber]
```<img srcset="elva-fairy-320w.jpg,
elva-fairy-480w.jpg 1.5x,
elva-fairy-640w.jpg 2x"
src="elva-fairy-640w.jpg"
alt="Elva dressed as a fairy"
class="u-photo">```
#
[jgarber]
Parsed:
#
[jgarber]
{
  "alt": "Elva dressed as a fairy",
  "srcset": {
    "1x": "http://example.com/elva-fairy-320w.jpg",
    "1.5x": "http://example.com/elva-fairy-480w.jpg",
    "2x": "http://example.com/elva-fairy-640w.jpg"
  },
  "value": "http://example.com/elva-fairy-640w.jpg"
}
#
[jgarber]
Different sources have differing guidance on whether it’s permissible to mix width descriptors (`480w`) and pixel density descriptors (`1x`) but… it’s HTML and basically anything goes, so (to your point), we’d want to account for that.
#
[jgarber]
This is a fairly chaotic example:
#
[jgarber]
```<img srcset="elva-fairy-320w.jpg,
elva-fairy-480w.jpg 1.5x,
elva-fairy-480w.jpg 480w,
elva-fairy-1080w.jpg 1080w,
elva-fairy-640w.jpg 2x,
elva-fairy-nope.jpg 480w,
elva-fairy-nope.jpg"
src="elva-fairy-640w.jpg"
alt="Elva dressed as a fairy"
class="u-photo">```
#
[jgarber]
…and parses out to:
#
[jgarber]
{
  "value": "http://example.com/elva-fairy-640w.jpg",
  "srcset": {
    "1x": "http://example.com/elva-fairy-320w.jpg",
    "480w": "http://example.com/elva-fairy-480w.jpg",
    "1.5x": "http://example.com/elva-fairy-480w.jpg",
    "1080w": "http://example.com/elva-fairy-1080w.jpg",
    "2x": "http://example.com/elva-fairy-640w.jpg"
  },
  "alt": "Elva dressed as a fairy"
}
ur5us joined the channel
#
[jgarber]
Does that give enough to go on or would you like an entire MF2 JSON structure?
#
[tantek]4
That looks good to me. [snarfed]?
#
[jgarber]
👍🏻 Happy to post more if there’s anything you’d like me to try.