#microformats 2018-05-28

2018-05-28 UTC
#
@super10extra
A question for the web developers out there: I'd like to know whether microformats is a technology that is still in use today
(twitter.com/_/status/1000898355902332928)
#
@super10extra
microformats: there is far too little information about it
(twitter.com/_/status/1000899219224281088)
KartikPrabhu and gRegorLove joined the channel
#
gRegorLove
tantek: nothing outstanding on https://github.com/microformats/microformats2-parsing/issues/6 afaict. It can be closed.
#
Loqi
[tantek] #6 reduce instances when p-name is implied
KartikPrabhu joined the channel
#
KartikPrabhu
gRegorLove: do you know why the h-entry>properties>photo has the first thing as the URL here http://pin13.net/mf2-dev/?id=20180528013306025
#
Loqi
[Tantek Çelik] microformats2 parsing specification
[quinnvinlove], [cleverdevil], [Natris1979], [tantek], [jeremycherfas], tantek_ and barpthewire joined the channel
#
Zegnat
That's an interesting test there, KartikPrabhu. I'll be checking that out.
#
Zegnat
!tell tantek Issue #6 is still open because I resolved it in the spec but I can't close the issue. Not enough rights on the repo. You'll have to close it yourself, tantek, as the person who opened it.
#
Loqi
Ok, I'll tell them that when I see them next
KartikPrabhu, ivc_, wakest, ben2, [jgmac1106] and [miklb] joined the channel
#
Zegnat
Clearly I have 0 clue where the PHP parser decides in what order stuff gets parsed. Have been trying to find the point that would trigger the weird double parse of the photo property without luck
#
sknebel
the "value" is wrong too
#
Zegnat
Yes, but I think I might know why that is happening
#
Zegnat
My textContent implementation in the PHP parser is always replacing images with their alt or URL. But it shouldn’t be doing that in the case of u- fallback
[jgmac1106], [miklb], [snarfed], tantek, [pfefferle], [tantek] and KartikPrabhu joined the channel
#
KartikPrabhu
Zegnat: also that property should be in a "value" property not by itself I think
#
KartikPrabhu
also should that also use the textContent algo with whitespace stuff?
#
Zegnat
I think it would make sense to limit that to p-* parsing?
#
Zegnat
Because the 'value' key of the h-cite object should be parsed following u-* parsing, it should actually be an empty string, right?
#
KartikPrabhu
depends on the whitespaces :P
#
KartikPrabhu
because it triggers the get textContent part of the algo
#
KartikPrabhu
in mf2py I get u'value': u'\n \n '
#
KartikPrabhu
err forget the "u"s
#
Zegnat
... I can’t reach microformats.org right now
#
KartikPrabhu
hmm down for me too
#
Zegnat
But from memory: I think the final fallback for u-* parsing is textContent with whitespace trimmed. If we only apply the fancy new algorithm to p-* parsing, that means the final value should be an empty string.
#
Zegnat
If we do apply the new algo, because it replaces IMG elements, the value should be the alt of the image.
#
Zegnat
I think?
#
KartikPrabhu
yeah I am not sure that the spec says that anywhere for u-*
#
KartikPrabhu
I think it only got updated in p-* things
#
Zegnat
“else get the textContent of the element after removing all leading/trailing whitespace and nested <script> & <style> elements” - https://web.archive.org/web/20180528014534/http://microformats.org/wiki/microformats2-parsing#parsing_a_u-_property
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
Zegnat
So we do not swap the IMG for its alt. Which leaves only whitespace as textContent in your example. Then removing all leading/trailing whitespace returns an empty string.
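
A minimal sketch of the two behaviours being compared, using only Python's standard `html.parser` rather than either parser's real code. The names `TextCollector` and `text_fallback` and the sample markup are hypothetical; the made-up test page is not reproduced here. With `replace_img=False` it follows the quoted u-* fallback (drop `<script>`/`<style>`, trim whitespace), and with `replace_img=True` it mimics swapping an IMG for its alt or src.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text roughly like DOM textContent, skipping <script>/<style>."""

    def __init__(self, replace_img=False):
        super().__init__(convert_charrefs=True)
        self.replace_img = replace_img  # True mimics the image replacement
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img" and self.replace_img and self.skip_depth == 0:
            attrs = dict(attrs)
            # Use alt when the attribute is present, otherwise fall back to src.
            text = attrs["alt"] if "alt" in attrs else attrs.get("src")
            self.parts.append(text or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def text_fallback(html, replace_img=False):
    """Return the collected text with leading/trailing whitespace removed."""
    collector = TextCollector(replace_img)
    collector.feed(html)
    collector.close()
    return "".join(collector.parts).strip()


# Hypothetical markup: an element whose only content is whitespace and an IMG.
example = '<div class="u-photo h-cite">\n <img src="photo.jpg" alt="a photo">\n </div>'

print(repr(text_fallback(example)))                    # ''
print(repr(text_fallback(example, replace_img=True)))  # 'a photo'
```

Without the final `.strip()`, the first call would return `'\n \n '`, which matches the raw value reported from mf2py above.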
#
KartikPrabhu
yeah no image replacement there
[cleverdevil] joined the channel
#
KartikPrabhu
yeah, maybe mf2py is not removing whitespace there. not sure what is "expected"
#
KartikPrabhu
it is a made up example
#
Zegnat
So both PHP and Python are getting value wrong per-spec. PHP applies the plain-text textContent algo and gets a string, and Python seems to return the raw textContent with trailing/leading whitespace.
#
Zegnat
However the question is, what is the expected value? And that’s hard to say in a made-up example.
#
KartikPrabhu
I think replacing the first u-photo with u-like-of should also do similar things?
#
KartikPrabhu
that is a more "real" example
#
KartikPrabhu
but now it is adding a value property
#
KartikPrabhu
but not some string
#
KartikPrabhu
not sure why "u-photo" behaves differently than "u-like-of" in php-mf2. that seems like a bug
#
KartikPrabhu
aaronpk: microformats.org is down. Is that under your control?
#
Zegnat
u-like-of should not be behaving differently. I still have no clue where that extra value was coming from so I have no idea where this bug is :/
#
Loqi
[kartikprabhu] #176 funny parsing of 'u-photo h-cite'
#
Zegnat
Thanks KartikPrabhu!
#
Zegnat
I am once again having a good look at innerText, but the more I look at the W3C/WHATWG description of the thing, the more confused I get :P
#
KartikPrabhu
haha yes me too
#
Zegnat
Is that based on either W3C or WHATWG innerText? Or just based on our examples and discussions?
#
KartikPrabhu
roughly both :P
#
KartikPrabhu
I am doing the text collection stuff but then going more by my intuition than what WHATWG says (because I got confused)
#
Zegnat
Currently breaking my head on step 4 of the inner text collection steps (https://html.spec.whatwg.org/multipage/dom.html#inner-text-collection-steps). If I am currently inspecting a Text node, how do I determine that it is “the last line of the block”? Also, what block does that even refer to? And I have to act differently if “it ends with a br element”? But a Text node is never ending with any sort of element!
#
KartikPrabhu
yeah it is very non-local that way. You have to know the context of the surrounding stuffs
#
KartikPrabhu
which is why I sort of ditched following it to the letter
#
KartikPrabhu
my algorithm currently passes all the whitespace tests that aaronpk has, except #11 which is incorrect
#
KartikPrabhu
incorrect in the tests
#
KartikPrabhu
judas priest that is going to be hard
#
KartikPrabhu
two pre things get a \n
#
Zegnat
Apparently
#
Zegnat
Probably because the PRE element is a block element, thus always gets a \n between it and the next node?
#
KartikPrabhu
no, i don't think so
#
Zegnat
“If node's used value of 'display' is block-level or 'table-caption', then append 1 (a required line break count) at the beginning and end of items.”
#
KartikPrabhu
what happens for <pre>stuffs</pre><span>
#
Zegnat
“<pre>stuffs</pre><span>more stuffs</span>” => “stuffs\nmore stuffs”. I’d say
#
Zegnat
You need some sort of element display-value look-up to know when to add the extra linebreak before and after contents.
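
A rough sketch of such a look-up, assuming a hard-coded (and deliberately incomplete) set of tag names stands in for the CSS 'display' value. `BLOCK_LEVEL`, `BREAK`, `BlockAwareText` and `block_aware_text` are hypothetical names, and whitespace collapsing is left out entirely. It only illustrates the quoted rule: a required line break is recorded at the start and end of each block-level element, runs of breaks collapse to one "\n", and breaks at the very start or end are dropped.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a real display-value look-up; far from complete.
BLOCK_LEVEL = {
    "address", "article", "aside", "blockquote", "div", "footer", "h1", "h2",
    "h3", "h4", "h5", "h6", "header", "li", "ol", "p", "pre", "section", "ul",
}

BREAK = object()  # marker for "a required line break count of 1"


class BlockAwareText(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []  # text strings interleaved with BREAK markers

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_LEVEL:
            self.items.append(BREAK)

    def handle_endtag(self, tag):
        if tag in BLOCK_LEVEL:
            self.items.append(BREAK)

    def handle_data(self, data):
        self.items.append(data)


def block_aware_text(html):
    parser = BlockAwareText()
    parser.feed(html)
    parser.close()
    out = []
    pending_break = False
    for item in parser.items:
        if item is BREAK:
            pending_break = True  # consecutive markers collapse into one
        else:
            if pending_break and out:  # never emit a break before any text
                out.append("\n")
            pending_break = False
            out.append(item)
    return "".join(out)


print(repr(block_aware_text("<pre>stuffs</pre><span>more stuffs</span>")))
# 'stuffs\nmore stuffs'
```

Two adjacent PRE elements also end up separated by a single "\n" under this sketch, since the end marker of the first and the start marker of the second collapse together.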
#
Zegnat
It is just a very hard problem. textContent is what DOM gives us, but people find the results unexpected. But that leaves us with either emulating the browser’s innerText (no clear algo for non-CSS implementers) or having our own living algo that changes when more user examples come in :(
#
KartikPrabhu
right now I am inclined to go with the second one :P
#
KartikPrabhu
examples-based
#
KartikPrabhu
much easier to do
#
Zegnat
Let me know when you have written a spec based on yours so I can start implementing it? ;)
#
KartikPrabhu
I have no idea how to write specs. I might write up the algo I am using
#
aaronpk
you can do it! specs are just very precise documentation
#
Zegnat
The one I wrote was mostly based on the way WHATWG writes their state machines, I find that to be a pretty clean style. https://wiki.zegnat.net/media/textparsing.html
#
Zegnat
microformats.org is back up btw!
#
KartikPrabhu
I might have more or less the same algo
#
Zegnat
This was the first one I implemented. So could be much the same, for sure.
#
Zegnat
I wanted to move it to the mf wiki so we could more easily iterate, but I think I still haven’t gotten enough edits to my name to be allowed to create a page
#
KartikPrabhu
oh the one thing I did add was dropping HTML comments
#
KartikPrabhu
mostly because the DOM thing that mf2py uses thinks of comments as strings with a subclass
#
Zegnat
Right. Those are implied to be dropped as I work specifically with “Element” instances. Which are never comments in the DOM spec
#
Zegnat
Everything that isn’t a Text or Element implementation is silently ignored in my algo
#
KartikPrabhu
do comments not count as "Text" node either?
#
KartikPrabhu
aah ok then yes I needed to special case that for mf2py. Maybe I should add similar things for CDATA and all that
#
KartikPrabhu
but XML is not a priority right now
#
Zegnat
https://dom.spec.whatwg.org/#interface-comment - Comment is CharacterData, just like Text, but isn’t Text
#
Zegnat
So maybe your `isinstance(el, NavigableString)` matches both Text and Comment?
#
KartikPrabhu
yes. but Comment is a subclass of NavigableString so I could special case that
#
Zegnat
CDATA should be a subclass of Text. So really checking for Text should be enough, if you are working with DOM-spec compatible objects: https://dom.spec.whatwg.org/#interface-cdatasection
#
KartikPrabhu
right, but we don't want CDATA to show up in the text either do we?
#
KartikPrabhu
so I will have to special case that like Comment
#
Zegnat
Couldn’t you theoretically have CDATA just as the content of a <p> element?
#
Zegnat
isn’t even 100% sure CDATA is still a thing in HTML5
#
KartikPrabhu
yeah CDATA is more of an XML thing
#
KartikPrabhu
also browsers don't display CDATA do they?
#
Zegnat
“CDATA sections can only be used in foreign content (MathML or SVG).” - https://html.spec.whatwg.org/multipage/syntax.html#cdata-sections
#
Zegnat
Browsers will probably render it in MathML. Not sure if mf2 is supposed to handle foreign content in HTML though.
#
KartikPrabhu
anyway not a priority right now ;)
#
Zegnat
But if you can, check for instance of Text. That should mean you get both pure Text nodes as well as CDATA nodes.
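
To illustrate the class relationships, here is a small sketch using Python's standard `xml.dom.minidom` as a stand-in for "DOM-spec compatible objects". This is an assumption for demonstration only: mf2py works with BeautifulSoup's `NavigableString`, whose hierarchy differs, and `collect_text` is a hypothetical helper. In minidom, `CDATASection` subclasses `Text` while `Comment` does not, so a single `isinstance(node, Text)` check keeps CDATA and drops comments.

```python
from xml.dom.minidom import CDATASection, Comment, Document, Element, Text


def collect_text(node):
    """Concatenate Text (and CDATASection) data, ignoring every other node type."""
    parts = []
    for child in node.childNodes:
        if isinstance(child, Text):  # matches Text and its subclass CDATASection
            parts.append(child.data)
        elif isinstance(child, Element):
            parts.append(collect_text(child))
        # Comments, processing instructions, etc. are silently skipped.
    return "".join(parts)


# Build <p>a<!-- hidden --><![CDATA[b]]>c</p> by hand to keep the example explicit.
doc = Document()
p = doc.createElement("p")
for child in (
    doc.createTextNode("a"),
    doc.createComment(" hidden "),
    doc.createCDATASection("b"),
    doc.createTextNode("c"),
):
    p.appendChild(child)

print(issubclass(CDATASection, Text))  # True
print(issubclass(Comment, Text))       # False
print(collect_text(p))                 # abc
```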
#
Zegnat
So people who want to add mf2 to their MathML get their symbols picked up, haha
#
KartikPrabhu
I wonder if something breaks with embedded SVG
#
Zegnat
Probably not. It will just treat those elements as HTML nodes, is my guess
#
Zegnat
With the userland HTML5 parser we recommend for php-mf2, we could theoretically ignore SVG nodes.
#
Zegnat
It really is only important if any of the elements we are special-casing (like data) are different elements within <svg> and shouldn’t be special-cased there.
#
Zegnat
But that is extremely theoretical, so let’s forget about that for now ;)
#
KartikPrabhu
ok will attempt now to resolve the img-alt stuff in the spec
#
Zegnat
Exciting!
#
Zegnat
I’m probably around for another half hour if you need a second pair of eyes
#
KartikPrabhu
will put up the proposal on the issue and link here
#
KartikPrabhu
at some point we would have to resolve all the text stuff floating around in the spec. where should images be replaced etc...
#
Zegnat
... yeah
#
KartikPrabhu
maybe it is resolved and I just need to put it in mf2py correctly
#
Zegnat
I think it is resolved? IIRC we only replace IMG elements for p-* (and therefore also e-*) parsing, and not in other cases? That feels right to me.
#
Zegnat
microformats.org is down again for me, so can’t check
#
Zegnat
Who controls microformats.org? Phae (ping)?
#
KartikPrabhu
that's why we need BLOCKCHAIN ;)
#
Zegnat
public domain wikis may be one of the very few examples that might actually work well on a blockchain :P
#
KartikPrabhu
no, BLOCKCHAIN should always be in all caps :P
vivus, KartikPrabhu, [jeremycherfas], [tantek] and [chrisaldrich] joined the channel; vivus left the channel
#
Loqi
[kartikprabhu] Here are the proposed changes to the spec to account for `alt` attribute. Add a new section 1.5 with title "parse an `img` element for `src` and `alt`" with the steps - if `img[alt]` - return a new `{}` structure with - `value`: the `src`...
#
KartikPrabhu
[tantek]: ^ please review
KartikPrabhu and [keithjgrant] joined the channel
webchat162 and KartikPrabhu joined the channel