#microformats 2018-05-28

2018-05-28 UTC
# 00:35 
@super10extra A question for web developers: I'd like to know whether microformats is a technology that's still in use today (twitter.com/_/status/1000898355902332928)
# 00:39 
@super10extra There's way too little information out there about microformats (twitter.com/_/status/1000899219224281088)
KartikPrabhu and gRegorLove joined the channel
# 01:26 
gRegorLove tantek: nothing outstanding on https://github.com/microformats/microformats2-parsing/issues/6 afaict. It can be closed.
# 01:26 
Loqi [tantek] #6 reduce instances when p-name is implied
KartikPrabhu joined the channel
# 01:45 
KartikPrabhu gRegorLove: do you know why the h-entry>properties>photo has the first thing as the URL here http://pin13.net/mf2-dev/?id=20180528013306025
# 01:45 
KartikPrabhu how is that matching http://microformats.org/wiki/microformats2-parsing#parse_an_element_for_class_microformats ?
# 01:45 
Loqi [Tantek Çelik] microformats2 parsing specification
[quinnvinlove], [cleverdevil], [Natris1979], [tantek], [jeremycherfas], tantek_ and barpthewire joined the channel
# 08:11 
Zegnat That's an interesting test there, KartikPrabhu. I'll be checking that out.
# 08:21 
Zegnat !tell tantek Issue #6 is still open because I resolved it in the spec but I can't close the issue. Not enough rights on the repo. You'll have to close it yourself, tantek, as the person who opened it.
# 08:21 
Loqi Ok, I'll tell them that when I see them next
KartikPrabhu, ivc_, wakest, ben2, [jgmac1106] and [miklb] joined the channel
# 14:31 
Zegnat Clearly I have 0 clue where the PHP parser decides in what order stuff gets parsed. Have been trying to find the point that would trigger the weird double parse of the photo property without luck
# 14:32 
sknebel the "value" is wrong too
# 14:32 
Zegnat Yes, but I think I might know why that is happening
# 14:35 
Zegnat My textContent implementation in the PHP parser is always replacing images with their alt or URL. But it shouldn’t be doing that in the case of u- fallback
[jgmac1106], [miklb], [snarfed], tantek, [pfefferle], [tantek] and KartikPrabhu joined the channel
# 18:17 
KartikPrabhu Zegnat: also that property should be in a "value" property not by itself I think
# 18:19 
KartikPrabhu also, should that use the textContent algo with whitespace stuff too?
# 18:19 
Zegnat I think it would make sense to limit that to p-* parsing?
# 18:20 
Zegnat Because the 'value' key of the h-cite object should be parsed following u-* parsing, it should actually be an empty string, right?
# 18:20 
KartikPrabhu depends on the whitespaces :P
# 18:20 
KartikPrabhu because it triggers the get textContent part of the algo
# 18:21 
KartikPrabhu in mfp2y I get u'value': u'\n    \n    '
# 18:21 
KartikPrabhu err forget the "u"s
# 18:21 
Zegnat ... I can’t reach microformats.org right now
# 18:21 
KartikPrabhu hmm down for me too
# 18:22 
Zegnat But from memory: I think the final fallback for u-* parsing is textContent with whitespace trimmed. If we only apply the fancy new algorithm to p-* parsing, that means the final value should be an empty string.
# 18:22 
Zegnat If we do apply the new algo, because it replaces IMG elements, the value should be the alt of the image.
# 18:22 
Zegnat I think?
# 18:23 
KartikPrabhu hmm
# 18:24 
KartikPrabhu yeah I am not sure that the spec says that anywhere for u-*
# 18:24 
KartikPrabhu I think it only got updated in p-* things
# 18:25 
Zegnat “else get the textContent of the element after removing all leading/trailing whitespace and nested <script> & <style> elements” - https://web.archive.org/web/20180528014534/http://microformats.org/wiki/microformats2-parsing#parsing_a_u-_property
# 18:25 
Loqi [Tantek Çelik] microformats2 parsing specification
# 18:26 
Zegnat So we do not swap the IMG for its alt. Which leaves only whitespace as textContent in your example. Then removing all leading/trailing whitespace returns an empty string.
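The u-* fallback quoted above can be sketched with Python's standard-library `html.parser`. This is a minimal, hypothetical sketch: `TextCollector` and `fallback_text` are invented names, not mf2py or php-mf2 API. With `replace_img=False` it follows the quoted u-* rule (textContent minus `<script>`/`<style>`, then trim), so the whitespace-only example yields an empty string. With `replace_img=True` it approximates p-* image replacement by substituting only the `alt` text; the real p-* rule also falls back to `src` and normalizes whitespace, which this ignores, and `str.strip()` only approximates trimming HTML whitespace.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text roughly like DOM textContent, skipping <script>/<style>.

    replace_img=True swaps each <img> for its alt text (a simplified stand-in
    for p-* parsing; assumption, not the full spec behaviour).
    """

    def __init__(self, replace_img=False):
        super().__init__()
        self.replace_img = replace_img
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img" and self.replace_img and not self.skip_depth:
            self.parts.append(dict(attrs).get("alt") or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def fallback_text(html, replace_img=False):
    collector = TextCollector(replace_img)
    collector.feed(html)
    collector.close()
    # Trim leading/trailing whitespace, as the quoted u-* fallback says.
    return "".join(collector.parts).strip()


markup = '<a class="u-photo h-cite" href="/photo.jpg">\n    <img src="/photo.jpg" alt="A photo">\n    </a>'
print(repr(fallback_text(markup)))                    # ''
print(repr(fallback_text(markup, replace_img=True)))  # 'A photo'
```

Under this reading, mf2py's `'\n    \n    '` is the untrimmed textContent and php-mf2's string comes from applying image replacement where u-* parsing should not.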
# 18:26 
KartikPrabhu yeah no image replacement there
[cleverdevil] joined the channel
# 18:26 
KartikPrabhu yeah, maybe mf2py is not removing whitespace there. not sure what is "expected"
# 18:26 
KartikPrabhu it is a made up example
# 18:27 
Zegnat So both PHP and Python are getting value wrong per-spec. PHP applies the plain-text textContent algo and gets a string, and Python seems to return the raw textContent with trailing/leading whitespace.
# 18:27 
KartikPrabhu right
# 18:27 
Zegnat However the question is, what is the expected value? And that’s hard to say in a made-up example.
# 18:28 
KartikPrabhu I think replacing the first u-photo with u-like-of should also do similar things?
# 18:29 
KartikPrabhu http://pin13.net/mf2-dev/?id=20180528182845980
# 18:30 
KartikPrabhu that is a more "real" example
# 18:30 
KartikPrabhu but now it is adding a value property
# 18:31 
KartikPrabhu but not some string
# 18:33 
KartikPrabhu not sure why "u-photo" behaves differently than "u-like-of" in php-mf2. that seems like a bug
# 18:35 
KartikPrabhu aaronpk: microformats.org is down. Is that under your control?
# 18:45 
aaronpk nope
# 19:48 
Zegnat u-like-of should not be behaving differently. I still have no clue where that extra value was coming from, so I have no idea where this bug is :/
# 19:48 
KartikPrabhu ok filed issue https://github.com/indieweb/php-mf2/issues/176
# 19:48 
Loqi [kartikprabhu] #176 funny parsing of 'u-photo h-cite'
# 19:48 
Zegnat Thanks KartikPrabhu!
# 19:50 
Zegnat I am once again having a good look at innerText, but the more I look at the W3C/WHATWG description of the thing, the more confused I get :P
# 19:50 
KartikPrabhu haha yes me too
# 19:51 
KartikPrabhu Zegnat: if it helps I have this https://github.com/kartikprabhu/mf2py/blob/30f29d72e0e1f88ddc7360ddde05d12d7cc4da0a/mf2py/dom_helpers.py#L53 in mf2py experimental
# 19:52 
Zegnat Is that based on either W3C or WHATWG innerText? Or just based on our examples and discussions?
# 19:52 
KartikPrabhu roughly both :P
# 19:52 
KartikPrabhu I am doing the text collection stuff but then going more by my intuition than what WHATWG says (because I got confused)
# 19:55 
Zegnat Currently breaking my head on step 4 of the inner text collection steps (https://html.spec.whatwg.org/multipage/dom.html#inner-text-collection-steps). If I am currently inspecting a Text node, how do I determine that it is “the last line of the block”? Also, what block does that even refer to? And I have to act differently if “it ends with a br element”? But a Text node is never ending with any sort of element!
# 19:56 
KartikPrabhu yeah it is very non-local that way. You have to know the context of the surrounding stuffs
# 19:56 
KartikPrabhu which is why I sort of ditched following it to the letter
# 19:57 
KartikPrabhu my algorithm currently passes all the whitespace tests that aaronpk has, except #11 which is incorrect
# 19:57 
KartikPrabhu incorrect in the tests
# 20:02 
Zegnat Hmm. I wonder if I could base it on the browsers tests https://github.com/web-platform-tests/wpt/blob/master/html/dom/elements/the-innertext-idl-attribute/getter-tests.js
# 20:03 
KartikPrabhu judas priest that is going to be hard
# 20:03 
KartikPrabhu https://github.com/web-platform-tests/wpt/blob/master/html/dom/elements/the-innertext-idl-attribute/getter-tests.js#L21
# 20:03 
KartikPrabhu two pre things get a \n
# 20:03 
Zegnat Apparently
# 20:04 
Zegnat Probably because the PRE element is a block element, thus always gets a \n between it and the next node?
# 20:04 
KartikPrabhu no, i don't think so
# 20:05 
Zegnat “If node's used value of 'display' is block-level or 'table-caption', then append 1 (a required line break count) at the beginning and end of items.”
# 20:05 
KartikPrabhu what happens for <pre>stuffs</pre><span>
# 20:06 
Zegnat And PRE is a block level element: https://html.spec.whatwg.org/multipage/rendering.html#flow-content-3
# 20:07 
Zegnat “<pre>stuffs</pre><span>more stuffs</span>” => “stuffs\nmore stuffs”. I’d say
# 20:08 
Zegnat You need some sort of element display-value lookup to know when to add the extra line break before and after contents.
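The "required line break count" idea from the WHATWG inner text steps can be sketched as follows. Block-level elements add a count of 1 before and after their contents, `<p>` adds 2, `<br>` adds a literal newline, and at the end, counts at the start or end are dropped and each remaining run of counts becomes as many newlines as its largest value. This is a minimal sketch that assumes a small hard-coded set of block-level elements in place of real CSS `display` values. It skips whitespace collapsing, `<script>`/`<style>`, hidden elements and everything else that needs layout. `BLOCK_ELEMENTS`, `ItemCollector` and `inner_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a CSS 'display' lookup (assumption, far from complete).
BLOCK_ELEMENTS = {"address", "blockquote", "div", "h1", "h2", "h3",
                  "h4", "h5", "h6", "li", "ol", "pre", "ul"}


class ItemCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []  # strings, or ints meaning "required line break count"

    def _append_break(self, tag):
        if tag == "p":
            self.items.append(2)
        elif tag in BLOCK_ELEMENTS:
            self.items.append(1)

    def handle_starttag(self, tag, attrs):
        self._append_break(tag)
        if tag == "br":
            self.items.append("\n")

    def handle_endtag(self, tag):
        self._append_break(tag)

    def handle_data(self, data):
        self.items.append(data)


def inner_text(html):
    collector = ItemCollector()
    collector.feed(html)
    collector.close()
    items = [item for item in collector.items if item != ""]
    # Drop runs of required line break counts at the start and the end.
    while items and isinstance(items[0], int):
        items.pop(0)
    while items and isinstance(items[-1], int):
        items.pop()
    # Replace each remaining run of counts with max(run) newlines.
    out, run = [], 0
    for item in items:
        if isinstance(item, int):
            run = max(run, item)
        else:
            if run:
                out.append("\n" * run)
                run = 0
            out.append(item)
    return "".join(out)


print(repr(inner_text("<pre>stuffs</pre><span>more stuffs</span>")))  # 'stuffs\nmore stuffs'
```

Because runs collapse to their maximum, two adjacent `<pre>` elements are separated by a single `\n`, matching the web-platform-tests case mentioned above.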
# 20:08 
KartikPrabhu <sigh>
# 20:10 
Zegnat It is just a very hard problem. textContent is what DOM gives us, but people find the results unexpected. But that leaves us with either emulating the browser’s innerText (no clear algo for non-CSS implementers) or having our own living algo that changes when more user examples come in :(
# 20:11 
KartikPrabhu right now I am inclined to go with the second one :P
# 20:11 
KartikPrabhu examples-based
# 20:12 
KartikPrabhu much easier to do
# 20:12 
Zegnat Let me know when you have written a spec based on yours so I can start implementing it? ;)
# 20:12 
KartikPrabhu lol!
# 20:13 
KartikPrabhu I have no idea how to write specs. I might write up the algo I am using
# 20:13 
aaronpk you can do it! specs are just very precise documentation
# 20:14 
Zegnat The one I wrote was mostly based on the way WHATWG writes their state machines, I find that to be a pretty clean style. https://wiki.zegnat.net/media/textparsing.html
# 20:14 
Zegnat microformats.org is back up btw!
# 20:18 
KartikPrabhu I might have more or less the same algo
# 20:19 
Zegnat This was the first one I implemented. So could be much the same, for sure.
# 20:19 
Zegnat I wanted to move it to the mf wiki so we could more easily iterate, but I think I still haven’t gotten enough edits to my name to be allowed to create a page
# 20:19 
KartikPrabhu oh the one thing I did add was dropping HTML comments
# 20:20 
KartikPrabhu mostly because the DOM thing that mf2py uses thinks of comments as strings with a subclass
# 20:21 
Zegnat Right. Those are implied to be dropped as I work specifically with “Element” instances. Which are never comments in the DOM spec
# 20:21 
Zegnat Everything that isn’t a Text or Element implementation is silently ignored in my algo
# 20:22 
KartikPrabhu do comments not count as "Text" node either?
# 20:23 
Zegnat No
# 20:23 
KartikPrabhu aah ok then yes I needed to special case that for mf2py. Maybe I should add similar things for CDATA and all that
# 20:24 
KartikPrabhu but XML is not a priority right now
# 20:24 
Zegnat https://dom.spec.whatwg.org/#interface-comment - Comment is CharacterData, just like Text, but isn’t Text
# 20:24 
KartikPrabhu aah
# 20:25 
Zegnat So maybe your `isinstance(el, NavigableString)` matches both Text and Comment?
# 20:26 
KartikPrabhu yes. but Comment is a subclass of NavigableString so I could special case that
# 20:26 
Zegnat CDATA should be a subclass of Text. So really checking for Text should be enough, if you are working with DOM-spec compatible objects: https://dom.spec.whatwg.org/#interface-cdatasection
# 20:26 
KartikPrabhu right, but we don't want CDATA to show up in the text either do we?
# 20:27 
KartikPrabhu so I will have to special case that like Comment
# 20:27 
Zegnat Couldn’t you theoretically have CDATA just as the content of a <p> element?
# 20:27 
KartikPrabhu hmmm
# 20:27 
Zegnat isn’t even 100% sure CDATA is still a thing in HTML5
# 20:27 
KartikPrabhu yeah CDATA is more of an XML thing
# 20:27 
KartikPrabhu also browsers don't display CDATA do they?
# 20:28 
Zegnat “CDATA sections can only be used in foreign content (MathML or SVG).” - https://html.spec.whatwg.org/multipage/syntax.html#cdata-sections
# 20:28 
Zegnat Browsers will probably render it in MathML. Not sure if mf2 is supposed to handle foreign content in HTML though.
# 20:28 
KartikPrabhu anyway not a priority right now ;)
# 20:29 
Zegnat But if you can, check for instance of Text. That should mean you get both pure Text nodes as well as CDATA nodes.
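Python's standard-library `xml.dom.minidom` mirrors the DOM class hierarchy discussed here: `CDATASection` subclasses `Text`, while `Comment` is a separate `CharacterData` subclass. So a single `isinstance(node, Text)` check keeps CDATA and drops comments. mf2py's BeautifulSoup tree is different (its `Comment` subclasses `NavigableString`), which is why it needed special-casing. The sketch below builds the nodes by hand so it does not depend on parser options; `collect_text` is a hypothetical helper, not part of any parser.

```python
from xml.dom import minidom


def collect_text(node):
    """Concatenate descendant text, keeping only Text nodes (incl. CDATA)."""
    parts = []
    for child in node.childNodes:
        if isinstance(child, minidom.Text):  # matches Text and CDATASection
            parts.append(child.data)
        elif isinstance(child, minidom.Element):
            parts.append(collect_text(child))
        # Comments, processing instructions, etc. are silently ignored.
    return "".join(parts)


# Equivalent of <p>Hello <!-- hidden --><![CDATA[from CDATA]]><b>!</b></p>
doc = minidom.Document()
p = doc.createElement("p")
p.appendChild(doc.createTextNode("Hello "))
p.appendChild(doc.createComment(" hidden "))
p.appendChild(doc.createCDATASection("from CDATA"))
b = doc.createElement("b")
b.appendChild(doc.createTextNode("!"))
p.appendChild(b)

print(collect_text(p))  # Hello from CDATA!
```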
# 20:29 
Zegnat So people who want to add mf2 to their MathML get their symbols picked up, haha
# 20:29 
KartikPrabhu eeek
# 20:29 
KartikPrabhu I wonder if something breaks with embedded SVG
# 20:30 
Zegnat Probably not. It will just treat those elements as HTML nodes, is my guess
# 20:30 
KartikPrabhu yeah
# 20:31 
Zegnat With the userland HTML5 parser we recommend for php-mf2, we could theoretically ignore SVG nodes.
# 20:32 
Zegnat It really only matters if any element we special-case (like data) is a different element within <svg> and shouldn’t be special-cased there.
# 20:32 
Zegnat But that is extremely theoretical, so let’s forget about that for now ;)
# 20:33 
KartikPrabhu yeah
# 20:33 
KartikPrabhu ok will attempt now to resolve the img-alt stuff in the spec
# 20:33 
Zegnat Exciting!
# 20:34 
Zegnat I’m probably around for another half hour if you need a second pair of eyes
# 20:34 
KartikPrabhu will put up the proposal on the issue and link here
# 20:35 
KartikPrabhu at some point we would have to resolve all the text stuff floating around in the spec. where should images be replaced etc...
# 20:35 
Zegnat ... yeah
# 20:36 
KartikPrabhu maybe it is resolved and I just need to put it in mf2py correctly
# 20:37 
Zegnat I think it is resolved? IIRC we only replace IMG elements for p-* (and therefore also e-*) parsing, and not in other cases? That feels right to me.
# 20:37 
Zegnat microformats.org is down again for me, so can’t check
# 20:37 
Zegnat Who controls microformats.org? Phae (ping)?
# 20:38 
KartikPrabhu that's why we need BLOCKCHAIN ;)
# 20:40 
Zegnat public domain wikis may be one of the very few examples that might actually work well on a blockchain :P
# 20:44 
KartikPrabhu no, BLOCKCHAIN should always be in all caps :P
vivus, KartikPrabhu, [jeremycherfas], [tantek] and [chrisaldrich] joined the channel; vivus left the channel
# 21:51 
KartikPrabhu ok here is a proposal for changes to img parsing https://github.com/microformats/microformats2-parsing/issues/2#issuecomment-392608361
# 21:51 
Loqi [kartikprabhu] Here are the proposed changes to the spec to account for `alt` attribute.
Add a new section 1.5 with title "parse an `img` element for `src` and `alt`" with the steps
- if `img[alt]`
- return a new `{}` structure with
- `value`: the `src`...
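The proposed "parse an `img` element for `src` and `alt`" step might look roughly like this. Only the start of the proposal is quoted above (the rest is cut off), so this is a sketch under stated assumptions. When `alt` is present, even if empty, it returns `{"value": src, "alt": alt}`; otherwise it returns the bare URL. It resolves `src` against a base URL with `urllib.parse.urljoin`, the way u-* parsing resolves relative URLs, and returns `None` when there is no `src`. `parse_img` and its dict-of-attributes input are hypothetical.

```python
from urllib.parse import urljoin


def parse_img(attrs, base_url):
    """Hypothetical sketch of the proposed img src/alt parsing step.

    attrs: dict of the <img> element's attributes (assumed input shape).
    """
    if "src" not in attrs:
        return None  # assumption: nothing to parse without src
    src = urljoin(base_url, attrs["src"])  # resolve relative URLs (assumption)
    if "alt" in attrs:  # like img[alt]: attribute present, even if empty
        return {"value": src, "alt": attrs["alt"]}
    return src


print(parse_img({"src": "/a.jpg", "alt": "A photo"}, "https://example.com/post"))
```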
# 21:53 
KartikPrabhu [tantek]: ^ please review
KartikPrabhu and [keithjgrant] joined the channel
# 23:05 
@Noith_CA How to deal with the "hentry (markup: http://microformats.org)" structured data error http://column.noith.com/hentry-error/?utm_source=ReviveOldPost&utm_medium=social&utm_campaign=ReviveOldPost (twitter.com/_/status/1001238067666075648)
webchat162 and KartikPrabhu joined the channel