#microformats 2020-04-22

2020-04-22 UTC
[schmarty], KartikPrabhu, Bill_Bennett_NZ and [jgarber] joined the channel
#
[jgarber]
gRegorLove: I sorted out the conflicts on microformats/tests#112 and also found a bug and opened up an issue (microformats/tests#113) and a small PR to fix it (microformats/tests#114):
#
[jgarber]
☝ Thanks for your help today cranking through those existing PRs!
#
Loqi
[jgarber623] #114 Fix unresolved relative URL in output JSON
#
Loqi
[jgarber623] #112 Normalize example.com URLs in input and output
mauz555, [tantek], [jgmac1106], kino, [chrisaldrich] and GWG joined the channel
#
gRegorLove
[jgarber] I don't think the only-domain URLs are supposed to be normalized adding the trailing slash like that. Otherwise tests/112 looks good!
strugee joined the channel
#
Zegnat
It is hard to say what normalisation is or is not supposed to happen. I think we still do not define what a “normalized absolute URL” is, even though the spec uses that phrase throughout. It is not weird for libraries to require some value for the path, which at minimum is /. (I.e. it is not a “trailing slash”, it is the “root path”.)
#
Zegnat
I think the IndieAuth specification even explicitly mentions adding the final /, for this reason: a URL cannot have an empty path component.
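The “root path” point can be seen with Ruby's standard `uri` library, whose `URI::Generic#normalize` replaces an empty path with `/`. A minimal sketch; it only illustrates one library's behaviour and is not a definition of what the mf2 spec means by “normalized absolute URL”:

```ruby
require "uri"

# Parse a domain-only URL: the path component is empty, not "/".
bare = URI.parse("http://example.com")
puts bare.path.inspect        # prints ""

# URI::Generic#normalize returns a copy whose empty path is set to the root path "/".
normalized = bare.normalize
puts normalized.path.inspect  # prints "/"
puts normalized.to_s          # prints http://example.com/
```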
#
Zegnat
Might be time to pick up this discussion about normalisation again: https://github.com/microformats/microformats2-parsing/issues/9#issuecomment-383051432
#
Loqi
[Zegnat] After reminding myself about what this was about again, there actually seem to be two things mentioned in this one issue. These may need to be addressed separately: 1. **What does the mf2 spec consider to be “normalized”?** [RFC 3986 section 6...
KartikPrabhu, [LewisCowles], mauz555, [jgmac1106], gRegorLove_, [tw2113] and [jgarber] joined the channel
#
[jgarber]
gRegorLove and Zegnat: Thanks for pointing out the inconsistencies around normalization. Should we move discussion to the above-linked issue on GitHub?
#
[jgarber]
I’m in favor of following the IndieAuth spec. Did some quick tests in Ruby-land that support this approach:
#
[jgarber]
The above snippet is using Ruby’s built-in URI parser. The below snippet is another example using the popular Addressable gem:
#
[jgarber]
```irb(main)> Addressable::URI.parse('http://example.com').normalize```
#
[jgarber]
```=> #<Addressable::URI URI:http://example.com/>```
#
[jgarber]
Looking at other languages, the JavaScript `URL` interface behaves similarly:
#
[jgarber]
```// returns "/"```
#
[jgarber]
☝ I’ll post the above on the GitHub issue.
#
aaronpk
The reason IndieAuth works that way is because it expects URLs to be fetchable, and a url with no path can't be fetched even though it is still a "valid" url by some definitions
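The fetchability point comes from HTTP itself: the request line needs a path, and RFC 7230 (section 5.3.1) says that if the target URI's path is empty, the client must send `/`. Ruby's `URI::HTTP#request_uri` applies that rule. A small sketch; the `request_line` helper is hypothetical and only builds the string, it does not perform a request:

```ruby
require "uri"

# Hypothetical helper: build the first line of an HTTP GET request for a URL.
def request_line(url)
  uri = URI.parse(url)
  # request_uri returns the path (plus query), using "/" when the path is empty.
  "GET #{uri.request_uri} HTTP/1.1"
end

puts request_line("https://example.com")       # prints GET / HTTP/1.1
puts request_line("https://example.com/a?b=c") # prints GET /a?b=c HTTP/1.1
```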
#
Loqi
[jgarber623] This issue [came up in chat](https://chat.indieweb.org/microformats/2020-04-22#t1587535487332700) (via @gRegorLove): > I don't think the only-domain URLs are supposed to be normalized adding the trailing slash like that. @Zegnat noted [in a rep...
[tantek], JC1, Loqi_, aaronpk_, StacyBloom, KartikPrabhu, IWSlackGateway, [LewisCowles], [Ana_Rodrigues], gRegorLove_ and [snarfed] joined the channel
#
[tantek]
Who asked about non-HTML? Zegnat? Two things.
#
[tantek]
1. Never make assumptions about standards you don't have to (e.g. assuming HTML), because such assumptions make your own standard more fragile and less reusable.
#
[tantek]
2. As an example, SVG has a "class" attribute, so it's technically possible to use microformats in SVG (though I don't know of any examples in the wild).
#
Zegnat
Not me this time :D
#
Zegnat
feels that any structure that can be expressed in DOM can be parsed by an mf2 parser
[calumryan] and Sajesajama joined the channel
#
jacky
Zegnat: technically! lol
#
[tantek]
3. Historically, microformats emerged when XML was at the height of its popularity and new "XML languages" (ABCML, FooML, etc.) appeared weekly. So it always made sense to abstract the "document language" to avoid unnecessary, time-wasting fights with XML heads.
#
Zegnat
Not just technically. The spec can be applied to anything that has the concept of children, attributes, and classes. Generally. So anything that can be expressed by DOM can be parsed. I think the PHP parser takes raw DOMDocuments as input, so those do not depend on HTML at all.
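To illustrate that the parsing only needs elements, attributes, and children, here is a sketch that finds microformats root elements in a generic tree with no HTML involved. The `Node` struct, the SVG-like sample tree, and the simplified root class pattern are hypothetical; this is not php-mf2's code and it does not implement the full parsing algorithm:

```ruby
# Hypothetical markup-agnostic element: a name, attributes, and children.
Node = Struct.new(:name, :attributes, :children)

# Simplified check for mf2 root class names such as "h-card" (not the spec's exact rule).
ROOT_CLASS = /\Ah-[a-z0-9]+(-[a-z0-9]+)*\z/

# Depth-first walk collecting [element name, root classes] for each microformat root.
def find_roots(node, found = [])
  classes = node.attributes.fetch("class", "").split
  roots = classes.grep(ROOT_CLASS)
  found << [node.name, roots] unless roots.empty?
  node.children.each { |child| find_roots(child, found) }
  found
end

# An SVG-like tree: nothing here depends on HTML element names.
svg = Node.new("svg", {}, [
  Node.new("g", { "class" => "h-card" }, [
    Node.new("text", { "class" => "p-name" }, [])
  ])
])

p find_roots(svg) # prints [["g", ["h-card"]]]
```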
#
sknebel
we had the case of someone trying to get it to parse atom, right?
#
sknebel
and it almost worked?
#
sknebel
(with the php parser?)
#
Zegnat
I’d expect that to just work, actually
#
Zegnat
Although the PHP parser may have a couple of hardcoded assumptions that the DOM always originated from HTML. E.g. when skipping template elements and using the base element to resolve relative URLs
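The base-element assumption follows HTML's rules: relative URLs are resolved against the `<base href>` (itself resolved against the document URL) when one exists, otherwise against the document URL. A sketch with Ruby's `URI.join`; the `resolve` helper and the URLs are hypothetical, and extracting `base_href` from the DOM is left out:

```ruby
require "uri"

# Hypothetical helper: resolve a relative URL found in a document.
# base_href is the href of the document's <base> element, or nil if there is none.
def resolve(document_url, base_href, relative)
  base = base_href ? URI.join(document_url, base_href) : URI.parse(document_url)
  URI.join(base.to_s, relative).to_s
end

doc = "http://example.com/dir/page.html"
puts resolve(doc, nil, "photo.jpg")       # prints http://example.com/dir/photo.jpg
puts resolve(doc, "/other/", "photo.jpg") # prints http://example.com/other/photo.jpg
```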
#
sknebel
afaik we had a content-type check in the fetch method or something. I forget
#
Zegnat
That base element is probably also the reason we should not remove language like the sentence quoted by [jgarber] about “the containing document's language's rules”.
[tw2113] and [jgarber] joined the channel
#
[tantek]
yes that's exactly why.
KartikPrabhu joined the channel
#
gRegorLove_
yeah, there was an indiewebify.me issue trying to parse an XML document with php-mf2 https://github.com/indieweb/indiewebify-me/issues/78
#
Loqi
[mro] #78 false negative testing https://try.gogs.io/issue-5008
#
gRegorLove
I lean a bit in favor of adding / to normalize (for consistency), though I also wonder what benefits it has for consumers
#
GWG
gRegorLove: I still owe you a fixed PR
#
GWG
I have that odd testing error
#
gRegorLove
which error?
jamietanna joined the channel
#
Zegnat
gRegorLove: the benefit for consumers is as aaronpk pointed out wrt IndieAuth: a URL with no path is not fetchable. Although I would not be surprised if several fetching libraries have been tweaked to *also* normalise an empty path to /.
#
gRegorLove
I'm missing the connection to IndieAuth
#
gRegorLove
I get it in general, though. If the parsed, normalized URL makes it easier for a fetching lib, sounds good
#
Zegnat
No connection to IndieAuth, other than the use case being the same :) You may want to fetch the u-url of a p-comment, for instance
#
Zegnat
Or maybe the use case where you use the mf2 parser's rel parser to find an endpoint to do IndieAuth with? I think some IndieAuth implementations rely on php-mf2
#
Zegnat
But in general, I would like us to figure out exactly what normalisation we expect, more than just “set empty paths to /”
KartikPrabhu joined the channel
#
gRegorLove
Are there examples of that use-case failing, though? Seems like most any mature fetching lib is handling that
#
gRegorLove
`curl https://example.com` works for example
exigirl joined the channel
#
aaronpk
the benefit would be making it easier to compare
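The comparison point, sketched with Ruby's standard `URI`: comparing raw strings treats `http://example.com` and `http://example.com/` as different, while comparing normalized forms does not. The `same_url?` helper is hypothetical, and Ruby's `normalize` only handles the empty path plus scheme and host case, which is narrower than the RFC 3986 section 6 normalization raised in the GitHub issue:

```ruby
require "uri"

# Hypothetical helper: compare two URLs after Ruby's built-in normalization
# (empty path -> "/", lowercase scheme and host).
def same_url?(a, b)
  URI.parse(a).normalize.to_s == URI.parse(b).normalize.to_s
end

puts "http://example.com" == "http://example.com/"          # prints false
puts same_url?("http://example.com", "http://example.com/") # prints true
puts same_url?("http://Example.COM", "http://example.com/") # prints true
```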
#
GWG
gRegorLove: It says that stdClass isn't the same as array
#
GWG
Not sure what is going on there
#
Zegnat
“It says that stdClass isn't the same as array” can’t argue with that :P
#
Zegnat
Any code you can link to, GWG?
#
GWG
Zegnat: The PR for adding alt.. it's in the repo
#
GWG
I need to try to figure out what I did
#
gRegorLove
iirc, the img+alt processing is creating a stdClass which then gets converted into the JSON object
#
Zegnat
Ah, right, yes. So the question is whether we expect objects there or associative arrays. I would expect the latter. I thought we only used stdClass when we needed an empty {} in the JSON?
#
Zegnat
Oh, huh, is it ever using stdClass? parseImg seems to create an array.
#
Zegnat
goes to run the tests
Kaja_ joined the channel
#
Zegnat
GWG: I do not know why the test runner gives the result it does. Probably something to do with assertJsonStringEqualsJsonString. It just means that the parser's output had an object ({value:…,alt:…}) while the expected JSON in the test has just a single string ("").
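The mismatch comes from the two shapes that the mf2 parsing spec's “parse an img element for src and alt” step can produce: a plain URL string when there is no alt attribute, and a {value, alt} object when there is one. A Ruby sketch of that step and the JSON each shape serializes to; `parse_img` only loosely mirrors php-mf2's `parseImg`, and the attributes are passed as a hash rather than a DOM element:

```ruby
require "json"
require "uri"

# Sketch of "parse an img element for src and alt".
# attrs: the img element's attributes; base_url: URL used to absolutize src.
def parse_img(attrs, base_url)
  src = URI.join(base_url, attrs.fetch("src")).to_s
  # An alt attribute that is present (even if empty) turns the result into an object.
  return { "value" => src, "alt" => attrs["alt"] } if attrs.key?("alt")
  src
end

base = "http://example.com/"
puts JSON.generate(parse_img({ "src" => "me.jpg" }, base))
# prints "http://example.com/me.jpg"
puts JSON.generate(parse_img({ "src" => "me.jpg", "alt" => "Me" }, base))
# prints {"value":"http://example.com/me.jpg","alt":"Me"}
```

A test fixture written for the string shape will therefore fail as soon as the parser starts emitting the object shape for the same img.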
#
GWG
So, not anything I can fix?
#
Zegnat
I mean that you can fix the tests, it is just that the error message the test gives is a bit odd
#
Zegnat
Although looking at that last test there, ParseImpliedTest::testMultipleImpliedHCards, I wonder if the photo property on the Sally Ride h-card also needs to get an alt
#
Zegnat
It doesn’t seem to be getting an alt with the current implementation
#
Zegnat
Not sure what we expect the output of `<img class="h-card" alt="Sally Ride" src="http://upload.wikimedia.org/wikipedia/commons/a/a4/Ride-s.jpg"/>` to be.
#
Zegnat
GWG: you should be able to download that raw .patch file and just `git apply` it locally to your branch to test.
#
GWG
Zegnat: Then maybe I can get that merged
#
Zegnat
I think I may have found a bug there. But I am too tired to confirm with the spec. I think we expect `<img class="h-card" alt="Sally Ride" src="http://upload.wikimedia.org/wikipedia/commons/a/a4/Ride-s.jpg"/>` to also get an alt. But the test in ParseImpliedTest::testMultipleImpliedHCards suggests it does not get an alt
#
Zegnat
GWG, in Parser.php, the parseImpliedPhoto method, the first img case (img.h-x[src]) immediately returns the URL. But it should use parseImg() there too, I think
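The suggested fix, sketched in Ruby: in the implied photo step, the `img.h-x[src]` branch hands the element to the same src+alt routine instead of returning the bare URL, so the Sally Ride `<img class="h-card" alt="Sally Ride" ...>` example would get an alt. Only that first branch is shown; `implied_photo` and the hash-based element are hypothetical stand-ins, not php-mf2's `parseImpliedPhoto`:

```ruby
require "uri"

# Same src+alt routine as the spec's "parse an img element for src and alt".
def parse_img(attrs, base_url)
  src = URI.join(base_url, attrs.fetch("src")).to_s
  attrs.key?("alt") ? { "value" => src, "alt" => attrs["alt"] } : src
end

# Hypothetical first branch of implied photo parsing: the h-* root is itself an img with src.
# The remaining branches (object[data], child img, ...) are omitted here.
def implied_photo(element, base_url)
  if element[:name] == "img" && element[:attrs].key?("src")
    return parse_img(element[:attrs], base_url) # not just the src URL
  end
  nil
end

sally = {
  name: "img",
  attrs: {
    "class" => "h-card",
    "alt" => "Sally Ride",
    "src" => "http://upload.wikimedia.org/wikipedia/commons/a/a4/Ride-s.jpg"
  }
}
p implied_photo(sally, "http://example.com/")
```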
#
Zegnat
goes to sleep because he really cannot focus anymore
#
gRegorLove
I think that h-card example should have an alt. mf2py parses that way too.
#
gRegorLove
oh, maybe not. I was using KartikPrabhu's but just saw it's an experimental fork of mf2py
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
gRegorLove
Should make sure we have a test case covering each time "parse an img element for src and alt" appears on http://microformats.org/wiki/mf2p
#
Loqi
[Tantek Çelik] microformats2 parsing specification
gRegorLove_ joined the channel
#
gRegorLove_
Ah, it's behind an experimental flag in mf2py, which python.microformats.io probably does not have turned on.
jeremycherfas joined the channel
#
gRegorLove_
Raises the question of whether we should do that in php-mf2 as well
Bill_Bennett_NZ joined the channel
#
KartikPrabhu
gRegorLove: the alt was put behind a flag since it is a breaking change for consumers. The idea was to remove the flag and make it default in a "major version"
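Why the change is breaking: consumer code written for the old output assumes `photo` values are strings. A hedged sketch of consumer-side code that accepts both the old string shape and the new {value, alt} shape (as they appear after `JSON.parse`, with string keys); `photo_url` and `photo_alt` are hypothetical helpers, not part of mf2py or php-mf2:

```ruby
# Hypothetical consumer-side helpers that tolerate both shapes of a u-photo value.
def photo_url(value)
  value.is_a?(Hash) ? value["value"] : value
end

def photo_alt(value)
  value.is_a?(Hash) ? value["alt"] : nil
end

old_style = "http://example.com/me.jpg"
new_style = { "value" => "http://example.com/me.jpg", "alt" => "Me" }

puts photo_url(old_style)         # prints http://example.com/me.jpg
puts photo_url(new_style)         # prints http://example.com/me.jpg
puts photo_alt(new_style).inspect # prints "Me"
```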
[tw2113] joined the channel