#microformats 2018-03-19

2018-03-19 UTC
[cleverdevil] and KartikPrabhu joined the channel
#
gregorlove
edited /hreview-examples-in-wild (+497) "move historical examples"
#
gregorlove
edited /hreview-examples-in-wild (+1134) "processing New Examples"
#
KartikPrabhu
gRegorLove++ for going through hreview in the wild
#
Loqi
gregorlove has 23 karma in this channel (223 overall)
#
KartikPrabhu
is working on hreview backcompat in mf2py
#
KartikPrabhu
hreview is weird enough that it breaks all rules about simple class substitution!
nitot, tantek, [jeremycherfas], barpthewire, [pfefferle] and [kevinmarks] joined the channel
#
[kevinmarks]
Did gregorlove find examples that were that weird?
Adrian1, [kevinmarks], echarlie, [eddie], petermolnar_web_, jeremycherfas1, nitot, velope, Garbee, [cleverdevil] and [gerwitz] joined the channel
#
jribbens
edited /existing-rel-values (+241) "Undo revision 66678 by [[Special:Contributions/Tantek|Tantek]] ([[User talk:Tantek|Talk]]) - archived is in use and is not the same as 'archives'"
#
Loqi
[masteragen] edited /rel-author (-49)
[kevinmarks], tantek and [jeremycherfas] joined the channel
#
KartikPrabhu
gRegorLove: does php-mf2 not distinguish between " item" and "item vevent" rules in hreview backcompat?
#
KartikPrabhu
how does it do that?
#
KartikPrabhu
mf2py at the moment cannot because it applies all the class-substitution rules in order
#
KartikPrabhu
it does not seem to have "item" to "p-item h-item" though
#
gRegorLove
s/Replaces/Adds/
#
KartikPrabhu
hmmm so these others are some special-cased ones?
#
gRegorLove
The `item x` variants are treated as special cases
#
gRegorLove
They get processed first, then the base rules.
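
A rough sketch of that two-pass idea, for illustration only. It is not code from php-mf2 or mf2py; the rule tables and the name `backcompat_classes` are made up for the example, and the mappings shown are assumptions rather than a complete hreview rule set.

```python
# Hypothetical rule tables for illustration; not the actual parser data.
# Special cases match a *combination* of classes, so they are kept in a
# list (ordered) and tried before the single-class base rules.
SPECIAL_RULES = [
    ({"item", "vcard"}, ["p-item", "h-card"]),
    ({"item", "vevent"}, ["p-item", "h-event"]),
    ({"item", "hproduct"}, ["p-item", "h-product"]),
]
# Base rules map one old class name to the mf2 classes to add.
BASE_RULES = {
    "item": ["p-item", "h-item"],
    "rating": ["p-rating"],
}


def backcompat_classes(classes):
    """Return the mf2 class names to add for a list of old-style class names."""
    remaining = set(classes)
    added = []
    # Pass 1: special cases, in list order. Matched classes are consumed so
    # the base rule for "item" does not fire a second time.
    for required, new_classes in SPECIAL_RULES:
        if required <= remaining:
            added.extend(new_classes)
            remaining -= required
    # Pass 2: base rules for whatever is left, iterated in sorted order so
    # the result does not depend on set iteration order.
    for name in sorted(remaining):
        added.extend(BASE_RULES.get(name, []))
    return added
```

With these assumed tables, `["item", "vevent"]` yields `p-item h-event`, while a bare `["item"]` falls through to the base rule.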
#
KartikPrabhu
hmm ok I was hoping to avoid something like that in the JSON rules
#
gRegorLove
Ah, right. I'm seeing the issue now.
#
KartikPrabhu
JSON does not have any "ordering"
#
KartikPrabhu
or at least not the format I have now
#
gRegorLove
I thought I recently saw that JSON arrays are ordered (and was surprised by that), but maybe I misread.
#
KartikPrabhu
arrays are. but currently all rules are JSON objects aka dictionaries just like mf2 parsed results
#
KartikPrabhu
I was thinking of keeping it simple and close to mf2 results format
#
Zegnat
wonders if JSON dictionaries are ordered or not
#
Zegnat
I never had reason to go and find out
#
KartikPrabhu
ok at least python dictionaries are not, which is what JSON dictionaries map to
#
[kevinmarks]
different languages treat them differently
#
[kevinmarks]
In Python they're in hash order (so consistent but unpredictable), in PHP they're in creation order, and in Go they're randomized
#
KartikPrabhu
right so language agnostic approach would be treat them as unordered
#
Zegnat
So, officially, an “object is an unordered collection”, and an “array is an ordered sequence”, per RFC 8259 (a.k.a. The Last JSON Spec)
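
A small illustration of that distinction using Python's standard `json` module (the sample documents are made up): two objects that differ only in member order compare equal after parsing, while two arrays that differ in element order do not.

```python
import json

a = json.loads('{"type": ["h-entry"], "properties": {}}')
b = json.loads('{"properties": {}, "type": ["h-entry"]}')
# Objects: member order is not part of the value, so these compare equal.
objects_equal = (a == b)

x = json.loads('["url1", "url2"]')
y = json.loads('["url2", "url1"]')
# Arrays: element order is part of the value, so these differ.
arrays_equal = (x == y)

print(objects_equal, arrays_equal)  # True False
```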
#
KartikPrabhu
yeah I thought so
tantek joined the channel
#
sknebel
Fun fact: from python 3.7 they are insertion ordered
#
Zegnat
RFC 7493 (I-JSON) specifically notes that “order of object members in an I-JSON message does not change the meaning of an I-JSON message.” Just to make it clear that the order shouldn’t make a difference.
#
Zegnat
I-JSON is the spec that forbids duplicate object names, fixing one of the biggest interoperability problems. (I also opened an issue to specify that microformats parsers should stick to I-JSON for output.)
#
KartikPrabhu
Zegnat: I think mf2 output currently does not have duplicate object names due to parsing algorithm
#
Zegnat
It shouldn’t have, no. But I didn’t see any harm in specifying what JSON spec we follow. And I-JSON seems to be the one recommended for max interoperability. (In fact, that’s why it exists.)
#
KartikPrabhu
I guess I should check for that in my JSON rules tests
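
One way such a test could look, as a sketch only. Python's `json.loads` silently keeps the last value when an object repeats a name, so a check needs the `object_pairs_hook` parameter; the helper names `reject_duplicate_names` and `loads_strict` are invented for the example.

```python
import json


def reject_duplicate_names(pairs):
    """object_pairs_hook that raises if a JSON object repeats a member name."""
    seen = {}
    for name, value in pairs:
        if name in seen:
            raise ValueError(f"duplicate object name: {name!r}")
        seen[name] = value
    return seen


def loads_strict(text):
    """Parse JSON text, rejecting objects with duplicate member names."""
    return json.loads(text, object_pairs_hook=reject_duplicate_names)
```

`loads_strict('{"a": 1, "a": 2}')` raises `ValueError`, whereas plain `json.loads` would return `{"a": 2}` without complaint.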
#
tantek
Is I-JSON basically a stricter subset of JSON?
#
Zegnat
tantek, yes
#
tantek
definitely up for tightening things up as long as we don't lose functionality
#
tantek
or introduce artificial precision (like ordering where there isn't any)
#
KartikPrabhu
yeah I think mf2 parsing algo at the moment treats objects as unordered (this seems right) and does not have duplicate keys. So I am good with adding these somewhere in the spec in a language agnostic way
#
Zegnat
Tim Bray worked together with the ECMA guys to get RFC 8259/STD 90 published, the “final” version of JSON. But in his release announcement he still says he recommends I-JSON because it “explicitly rules out some legal-bug-dumbass things”: https://www.tbray.org/ongoing/When/201x/2017/12/14/RFC-8259-STD-90
#
Zegnat
s/legal-bug-dumbass/legal-but-dumbass/
#
KartikPrabhu
makes mental note to use "legal-bug-dumbass" somewhere
#
tantek
hmm - there may be some loss of information then - with the unordered objects
#
tantek
because I believe currently parsers do maintain document order
#
tantek
which is quite handy
#
tantek
e.g. when determining "first" of something
#
tantek
the items and children arrays in particular, are meant to be ordered
#
KartikPrabhu
right those are arrays which are ordered
#
tantek
ok good
#
tantek
the other one to investigate is rels
#
KartikPrabhu
but objects/dictionaries i.e. "{}" things are not
#
tantek
yes the order of properties inside a microformat should not matter
#
gRegorLove
That might have changed recently in php-mf2. I haven't been paying attention to ordering and latest version had some big changes to parsing recursively.
#
Zegnat
Not every array will keep source order though, e.g. PHP sorts the contents of "type" as seen here: https://github.com/microformats/microformats2-parsing/issues/22
#
Loqi
[Zegnat] #22 Is there a definitive version of a type array?
#
Zegnat
If we say arrays are strictly ordered, that means PHP has a different output than Go
#
Zegnat
goes to make a note in that issue
#
KartikPrabhu
Zegnat: mf2py does the same thing as Go so you can add that too
#
Zegnat
Can do
#
tantek
good catch. type array is an exception. since it comes from the class attribute, order cannot matter
#
tantek
so perhaps for interop/compat we need to define an arbitrary order :/
#
tantek
like alphabetical
#
tantek
just to make it clear that the order is not representing any information in the source
#
Zegnat
Or source order. per DOM (as noted in that issue) the list is ordered in source order with no duplicates
#
tantek
ugh no. because that would imply that source order of class names could or should mean anything
#
tantek
when it cannot
#
Zegnat
Per DOM classList, I should say
#
tantek
anyone paying attention to that order is introducing bugs in their code
#
tantek
because web authors have full freedom to re-order class names without any impact in meaning
#
tantek
the only use-case I can imagine for that order preservation is for an HTML editing tool that uses the DOM
#
Zegnat
Yep. I don’t really care. I think DOM only defines order because they want the same HTML document to always result in the same parsed output
#
tantek
no you should care because it's a potential source of bugs
#
Zegnat
I mean I do not care whether we follow WHATWG (ordered set for list of classes) or not. As long as the same HTML input always results in the same parser output.
#
tantek
what do you mean by "follow WHATWG"?
#
Zegnat
WHATWG’s classList is an ordered set of items
#
Zegnat
It's all in the issue I linked
#
tantek
in the DOM spec?
#
Loqi
[Zegnat] > I always have thought of it as “an unordered list of classes starting in h-” Maybe that would be a better description then? The way it specifies “`"h-*" type(s)`” rather than e.g. classes made me think it meant unique values, but I may b...
#
Zegnat
Yes, in the DOM spec.
#
tantek
it's not a matter of following or not - it's a matter of making it clear that what we define does not depend on order
#
tantek
depend on *source order of class names
#
Zegnat
Tools consuming the mf2 JSON definitely shouldn’t depend on order. But I think mf2 parsers should have a consistent order. I expect that’s why WHATWG defined an order for HTML-parsers, so every parser would have a consistent output for classList
#
tantek
no they defined an order there for the use case of HTML editing tools
#
tantek
not some abstract notion of consistent output
#
tantek
"Tools consuming the mf2 JSON definitely shouldn’t depend on order" <-- saying it won't make it so
#
tantek
that's not how tool developers work
#
tantek
if they get something back in what seems like a meaningful order, there is a non-zero chance they will depend on that in some code
#
tantek
and they're not likely to have read every do/don't in every spec (has anyone?)
#
Zegnat
Actually, WHATWG does it just because and not for some imaginary HTML editing tool :P I quoted them in the issue too:
#
Zegnat
“In those cases where order is not important, we still use ordered sets; implementations can optimize based on the fact that the order is not observable.”
#
Zegnat
Either way. I have to head off for dinner. Issue 22 is there for the array inconsistency and contains all my thoughts on the matter already. No need to rehash here
#
Loqi
[Zegnat] #22 Is there a definitive version of a type array?
#
tantek
we should make sure we don't make the same mistake with rel values in the same rel attribute as well
#
tantek
so yes, type inside an item, and individual rel values from a single rel attribute, both should not be ordered
#
tantek
or rather, if the syntax implies/enforces an order (e.g. arrays), then for those cases we should pick an arbitrary order that does not reflect source order to avoid having consuming code accidentally depend on meaningless (and thus fragile) order
#
tantek
an arbitrary order like alphabetical
nitot, nitot_ and tantek joined the channel
#
KartikPrabhu
so the only actual ordered thing in mf2 is the "items" array?
#
KartikPrabhu
sorry "items" and "children" arrays
#
Zegnat
It sounds like we need to define an (arbitrary) order for every thing that is an array in the mf2 output, if we take JSON arrays as ordered and want parser output to be consistent
#
KartikPrabhu
right. so "items" and "children" are document order all other arrays are random ("alphabetical") ordered ?
#
KartikPrabhu
objects are unordered
#
KartikPrabhu
not sure this is a problem in practical consumption, but some mf2py tests do treat property arrays as ordered in source order
#
Zegnat
by property arrays you mean the array in `"photo": [ "url1", "url2" ]` within e.g. a parsed h-entry that happens to have 2 u-photo?
#
KartikPrabhu
in python ["url1", "url2"] != ["url2", "url1"]
#
KartikPrabhu
but maybe that is also doc order (?)
#
Zegnat
I think that must be in source order. Else there is no way to find the first of something.
#
KartikPrabhu
hmm yeah so only "type" is unordered then
#
Zegnat
type and rels within the "rel-urls", I’d say
#
Zegnat
And since “unordered” isn’t a thing in JSON, we should define what order we expect parsers to set. Something arbitrary like alphabetical might be the most optimal, per tantek above
#
Zegnat
Although I haven’t done enough rel parsing to know if there are any other edge-cases there
#
KartikPrabhu
yeah not sure. I can fix the python tests. this seems to be an issue only for tests afaik
#
tantek
yes property value arrays are also ordered, in document source order
#
tantek
which matters for any consuming code / applications that want the "first" value of a property
#
KartikPrabhu
right that was my bad
#
tantek
all the properties for example where certain consuming applications want to assume that there can only be one
#
KartikPrabhu
yup. like using first "photo" as a fallback for "featured" or maybe also for authorship
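
A sketch of what that kind of consuming code tends to look like, which is why property value arrays need to stay in document order. The helper name `first_value` and the sample entry are made up for illustration.

```python
def first_value(item, name, default=None):
    """Return the first value of property `name` in a parsed mf2 item, if any."""
    values = item.get("properties", {}).get(name, [])
    return values[0] if values else default


# Made-up parsed h-entry with two u-photo values and no u-featured.
entry = {"type": ["h-entry"], "properties": {"photo": ["url1", "url2"]}}

# Fall back to the first photo when there is no explicit featured image.
featured = first_value(entry, "featured") or first_value(entry, "photo")
```

Here `featured` ends up as `"url1"`; if a parser shuffled the `photo` array, the result would change even though the source document did not.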
#
tantek
so rather than trying to itemize everything that is ordered (you're likely to miss one and make a mistake), you should itemize the things that are *NOT* ordered, or rather, where the order DOES NOT have a semantic (and thus should not accidentally imply one)
#
KartikPrabhu
I don't see why all arrays can't be doc ordered. Just the tests need to account for those
#
tantek
KartikPrabhu: already covered. search above for "fragile"
#
tantek
why = because it causes more bugs probabilistically over time
#
KartikPrabhu
aah maybe I have not been doing this long enough :P
#
KartikPrabhu
tantek: currently the only unordered ones seem to be "type" and "rel-urls > rels "
#
tantek
correct, those are the only two I know about where we take an *unordered* set from the markup, and return it as a syntactically ordered array
#
KartikPrabhu
yup so update spec to specify "alphabetical" ?
#
tantek
introducing (thus implying) order in the parsing step is a potential source of artificial precision, and thus adds error
#
tantek
if we have consensus on that and at least one implementation
#
Zegnat
Would this be a fair sum-up, tantek?
#
Zegnat
If we want to make a clear distinction between explicitly ordered arrays (where there is a meaning to being first) and arrays that just happen to have an order because of the JSON format (but should be treated as order not having any meaning), we shouldn’t order both in the same way. So if the first group of arrays uses source order, the second is better off with a different ordering. Alphabetical is as good as any.
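
A minimal sketch of that split, assuming the usual mf2 JSON layout (`items`, `children`, `properties`, `rel-urls`). It is not how any existing parser does it; the function name `normalise_unordered_arrays` is invented. Only `type` and the `rels` arrays inside `rel-urls` are sorted; everything else keeps document order.

```python
def normalise_unordered_arrays(parsed):
    """Sort arrays derived from 'class' and 'rel'; leave the rest in document order."""

    def fix_item(item):
        fixed = dict(item)
        # "type" comes from the class attribute: sort it alphabetically.
        fixed["type"] = sorted(item.get("type", []))
        # Property value arrays stay in source order; only recurse into
        # nested microformat items (dicts that carry a "type").
        fixed["properties"] = {
            name: [
                fix_item(v) if isinstance(v, dict) and "type" in v else v
                for v in values
            ]
            for name, values in item.get("properties", {}).items()
        }
        if "children" in item:
            fixed["children"] = [fix_item(c) for c in item["children"]]
        return fixed

    result = dict(parsed)
    result["items"] = [fix_item(i) for i in parsed.get("items", [])]
    # "rels" inside each rel-urls entry comes from a single rel attribute.
    result["rel-urls"] = {
        url: {**info, "rels": sorted(info.get("rels", []))}
        for url, info in parsed.get("rel-urls", {}).items()
    }
    return result
```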
#
tantek
is this captured in issue 22
#
KartikPrabhu
tantek: I can add this to experimental mf2py
#
tantek
Zegnat: that's a good general summary yes, with the specifics that the sets of values in 'class' and 'rel' are the two we know about
#
tantek
and more than "should be treated as order not having any meaning" (which is true), but rather, "are specified in the source document as being unordered sets, where implying an order could be harmful"
#
Zegnat
issue 22 is just about what the definitive output for the types array should be, should either rewrite that issue or maybe close it in favour of a more global one addressing all arrays in this way
#
tantek
keep it open
#
KartikPrabhu
hmm in python sets also means unique elements so maybe avoid "sets"
#
tantek
open a separate issue for: parsed output that is derived from unordered sets in the source HTML MUST NOT imply any source order
#
tantek
and 'class' and 'rel' are the two we know about
#
Zegnat
I’m about to head for bed, so someone might want to spin this chat off into an issue. Or wait about 8 hours for me to be active again.
#
tantek
and yes, sorting alphabetically is one way to remove implied order noise
#
tantek
KartikPrabhu: from a physics/signal processing perspective, we want to avoid introducing noise as a result of the parsing "processing"
#
Zegnat
KartikPrabhu, set means unique in WHATWG-spec-speech too. Which is why the DOM classList property removes duplicate class names.
#
gRegorLove
I'm looking forward to a summary of this because I'm thoroughly confused :)
#
tantek
gRegorLove: what's something you're confused about? can you ask a specific question?
#
KartikPrabhu
oh in mf2 we don't remove duplicates
#
tantek
we could/should
#
tantek
the source semantic of class and rel is a set
#
KartikPrabhu
so "h-entry h-entry" would give type = ['h-entry', 'h-entry']
#
Zegnat
KartikPrabhu, no we do not, that’s another thing I asked in issue 22. Because issue 22 was all about the specific list of classes
#
gRegorLove
I'm just having a hard time following the whole discussion, what should/shouldn't be ordered. Could just be tiredness on my end. I'm not 100% today.
#
Zegnat
This whole JSON-arrays-are-ordered issue wasn’t on my radar when I created issue 22 at all
#
tantek
KartikPrabhu: that's likely harmless but potentially ambiguous
#
KartikPrabhu
tantek: yes
#
KartikPrabhu
gRegorLove: no worries. Zegnat will make an issue ;)
#
gRegorLove
I'll follow up with any specific questions if I can formulate them.
#
tantek
gRegorLove: no problem. that's confusing because we're talking about ordered/unordered semantics / aspects at three phases: 1. source document, 2. parser processing, 3. JSON syntax
#
Zegnat
Hmm, so either I brew some very strong tea and bash this out now. Or I will go to sleep and write the issue when freshly awake. Decisions decisions.
#
Loqi
sleep has 1 karma in this channel (4 overall)
#
KartikPrabhu
Zegnat: sleep++
#
tantek
Zegnat: no worries, get proper sleep and it will flow more quickly / accurately in the morning
#
Loqi
hehe
#
Zegnat
This reminds me of anomalily sending me to bed at IWS. Alright. See you all in about 8 hours ;)
#
KartikPrabhu
Zegnat: night! might update mf2py anyway :P
#
Zegnat
No problem, leave a note in here if you update the version on https://kartikprabhu.com/connection/mfparser so I know I can use that as a bleeding edge example tomorrow morning
#
KartikPrabhu
:thumbsup:
#
KartikPrabhu
actually python's sorted(set(...)) does both de-duplication and ordering so this should be easy
#
KartikPrabhu
aah another case for class = "u-url u-url" should the url property have repeated entries?
#
tantek
no because class="u-url u-url" is equivalent to class="u-url"
#
tantek
class and rel are sets
#
tantek
not sure how many more times I have to say that
#
KartikPrabhu
oh! so this change is in parsing the HTML not in mf2 per se
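
A sketch of what that could mean at the attribute-parsing step, before any mf2 rules run. The function name `class_tokens` is made up, and `str.split()` splits on any whitespace, which is a simplification of HTML's ASCII-whitespace rule. A Python `set` removes repeats but has no defined order of its own, so `sorted()` supplies the alphabetical order.

```python
def class_tokens(attr_value):
    """Split a class (or rel) attribute value into unique tokens, alphabetically."""
    # set() drops repeated tokens; sorted() gives a stable order that does
    # not reflect the order the author happened to write them in.
    return sorted(set(attr_value.split()))
```

Under this sketch `class_tokens("u-url u-url")` gives `["u-url"]`, so the later property parsing only ever sees one `u-url`.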
tantek and chrisaldrich joined the channel