#ZegnatAlso, there might be a case to be made for not implying any properties when other properties are explicitly marked. I think there was an implied-name discussion about that.
#Loqi[tantek] From the end of the wiki discussion, one straw proposal was:
"any explicit p-* property on an element stops implied p-name"
(this sounds a bit ambiguous and could be reworded, but I think the general intent / principle is workable)
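The straw proposal quoted above can be sketched as a small check. This is a minimal Python sketch, assuming the caller has already collected the class attributes of the elements that belong to one microformat and has excluded properties inside nested h-* microformats. The names `property_classes` and `implies_name` are hypothetical and do not come from any parser. Under the literal wording of the proposal, only p-* blocks the implied name; u-*, dt-* and e-* do not.

```python
from typing import Iterable

# mf2 property class prefixes
PROPERTY_PREFIXES = ("p-", "u-", "dt-", "e-")


def property_classes(class_attrs: Iterable[str]) -> list[str]:
    """Collect mf2 property class names from the class attributes of the
    elements belonging to one microformat. Properties inside nested h-*
    microformats are assumed to have been excluded by the caller."""
    found: list[str] = []
    for attr in class_attrs:
        for cls in attr.split():
            if cls.startswith(PROPERTY_PREFIXES):
                found.append(cls)
    return found


def implies_name(class_attrs: Iterable[str]) -> bool:
    """Straw proposal: any explicit p-* property stops the implied p-name.
    u-*, dt-* and e-* properties do not block it under this wording."""
    return not any(cls.startswith("p-") for cls in property_classes(class_attrs))
```

With this rule a photo-only post such as `implies_name(["u-photo", "u-url"])` still gets an implied name, while `implies_name(["h-card p-author"])` does not, because the nested card is attached through a p-* property.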
#aaronpkbut yes, the parser does not incorrectly imply that photo property
#ZegnatHmm, that post is interesting. That image definitely should not be removed from the e-content; otherwise there is no content left inside the a element in the content.
#ZegnatOr well. “definitely” as in my gut feeling, haha
#aaronpkthat's correct tho, because the post is a photo
#aaronpkso consumers will render the photo from the `photo` property
#ZegnatActually, I would argue the markup there is just plain wrong and the u-photo class should be on the a element. Then consumers get the full photo URL.
#aaronpkwith my new XRay parsing, the HTML of that post turns into an empty <a> tag, heh
#ZegnatThe problem I see here with deduping is that the content will be an <a> element linking to the actual full photo, but with nothing in it, so it is probably not shown in a feed reader.
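The deduping problem can be made concrete with a short sketch. This is a minimal Python example, not XRay's implementation: it assumes the e-content HTML is well-formed enough for `xml.etree` (real-world HTML would need an HTML parser), and the function name `dedupe_photo` and the example URLs are hypothetical. It removes every img whose src equals the u-photo URL, which is exactly what leaves the empty a element described above.

```python
import xml.etree.ElementTree as ET


def dedupe_photo(content_html: str, photo_url: str) -> ET.Element:
    """Hypothetical dedupe step: remove every <img> inside e-content whose
    src equals the u-photo URL. Assumes XHTML-style, well-formed markup.
    Returns the content wrapped in a <div> element."""
    root = ET.fromstring(f"<div>{content_html}</div>")
    # Collect matches first so the tree is not mutated while iterating.
    matches = [(parent, child)
               for parent in root.iter()
               for child in parent
               if child.tag == "img" and child.get("src") == photo_url]
    for parent, child in matches:
        siblings = list(parent)
        i = siblings.index(child)
        tail = child.tail or ""
        # Keep any text that followed the removed <img>.
        if i > 0:
            siblings[i - 1].tail = (siblings[i - 1].tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
        parent.remove(child)
    return root


root = dedupe_photo(
    '<a href="full.jpg"><img class="u-photo" src="small.jpg" alt="" /></a>',
    "small.jpg",
)
print(ET.tostring(root, encoding="unicode", short_empty_elements=False))
# prints: <div><a href="full.jpg"></a></div>
```

The a element survives with its href but has no content, so a feed reader rendering this e-content would show nothing clickable.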
#Zegnatyes, exactly. That’s why I said my gut feeling was that it should not be removed from e-content ;)
#ZegnatActually the content is DIV>A>IMG, since it uses e-*, saying that the HTML is important here.
#aaronpkif there were not a u-photo class on the img tag then I would agree
#ZegnatEven then the content is still DIV>A, right? The A might have important rel and href values.
#ZegnatThat’s why I think deduping *in this specific case* is hard. I don’t think leaving the A element empty is more correct than the parser giving back 2 images (one in content and one in photo property)
#aaronpkI do think it's more correct because it looks super broken to show two images, and doesn't look super broken to show one image that happens to not link to the full-size image
#aaronpkpossibly yes, especially if it means consumers of the mf2 data can't use the plaintext value that the parser returns
#ZegnatAlso after reading the discussion in indieweb-dev, I now feel I need to compare PHP’s DOMDocument textContent to the DOM spec.
#ZegnatI wonder if PHP’s textContent is broken, or if we collectively just need different output than the DOM textContent property gives us. The latter case would require the spec to be updated to define what textContent is.
#ZegnatOr well, it doesn’t mean that it has any special textContent, is what I should say
#ZegnatIt represents a line break for HTML rendering engines (those ignore \n), but it does not add a line break to the actual DOM in any way that I can find.
#ZegnatGo and Python parsers both rely on DOM textContent. PHP adds magic \n and spaces. Ruby doesn’t parse my test at all (on ruby.microformats.io) and the node one is still down.
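The textContent difference discussed here can be illustrated with Python's standard `html.parser`. This is a minimal sketch: `TextContent` only approximates the DOM getter (all text nodes concatenated in document order, so `<br>` contributes nothing and script text is kept), and `BrAsNewline` is a hypothetical variant that turns each `<br>` into a newline. Neither is php-mf2's `innerText()` nor the microformats-shiv algorithm; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class TextContent(HTMLParser):
    """Approximates the DOM textContent getter: the data of every text
    node, concatenated in document order. Elements like <br> add nothing."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


class BrAsNewline(TextContent):
    """Hypothetical variant closer to what users expect from plain text:
    each <br> (or <br/>) becomes a newline character."""

    def handle_starttag(self, tag: str, attrs) -> None:
        if tag == "br":
            self.parts.append("\n")


def text_of(parser_cls: type[TextContent], html: str) -> str:
    parser = parser_cls()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

For `"Line one<br>Line two"`, `TextContent` yields `"Line oneLine two"` while `BrAsNewline` yields `"Line one\nLine two"`, which is the gap between the DOM definition and the plain text users on the issue expected.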
#aaronpkI think his compact list is actually pretty nice as is
#tantekit doesn't really show much information, and abstract linked names are less compelling than icons of people
#Zegnattantek, if we are using DOM spec’s textContent (as I assumed, and as I write in the issue) that is fine but should be called out.
#ZegnatAnd if that is decided, a bug should be filed on the PHP parser which isn’t doing so at present. Again because of reasons captured in the issue.
#aaronpkadactio's lack of facepiles is really not the problem here
#tantekZegnat: last time I checked, I thought I referenced the HTML spec in particular, for parsing textContent
#ZegnatHTML spec builds on DOM spec, but yes. So the PHP parser is wrong per spec, but the PHP parser does what at least 2 users (the one who opened the issue, and aaronpk) want. (And probably what more people want, since other people like glenn and gRegorLove went and implemented it.)
#Zegnats/want/expect/ ... maybe more accurate. I don’t want to state what people “want”, but they did state an expectation.
#ZegnatHTML defines parsing into a node tree, and DOM defines how to handle said node tree and its default attributes. That is how I saw it. Let’s just say they are both needed, haha.
#ZegnatEither way, DOM spec defines the textContent getter, and we have users on record saying DOM textContent is not a useful plain-text version of HTML. (Because it isn’t meant to be.) This user feedback has triggered the PHP parser to change its behaviour away from the mf2 spec.
#tantekok that sounds non-trivial and needing some work to resolve
#tantekhave parsers converged on their own "textContent"?
#ZegnatNo. PHP uses their own, which is based on (maybe the same as?) the one included in the JS microformats-shiv.
#ZegnatPython and Go seem to use DOM’s textContent.
#ZegnatTest case and parser results all in the GitHub issue
#ZegnatThanks for adding the node parser output gRegorLove. I guess glennjones’ innerText implementation sits behind a flag while in the PHP parser it is the default?
#ZegnatThat might be worth noting too, if that is the case.
#sknebelIt has a checkbox on the web interface, but that doesn't seem to change the output here
#gRegorLoveI didn't think so, but it's been quite a while since I looked at Node's innerText
#ZegnatHmm, I would have thought my test case should have triggered it. But I didn’t spend too much time looking at the node parser.
#gRegorLoveOops, wrong method name. innerText() in php-mf2, links to mf-shiv github
#Zegnatp- says you replace img and drop style/script. e- only says you replace img.
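The two rules as summarised here can be compared side by side. This is a minimal Python sketch under stated assumptions: it follows the literal reading above (p-* replaces img with its alt text and drops script/style; the e-* value only replaces img), only the alt attribute is used, and leading/trailing whitespace is stripped. The names `PlainText`, `p_value` and `e_value` are hypothetical. As noted below in the discussion, most parsers also drop SCRIPT for e-*, so this shows the literal spec text rather than common parser behaviour.

```python
from html.parser import HTMLParser


class PlainText(HTMLParser):
    """Plain-text value per the rules summarised in the chat:
    <img> is replaced by its alt text; <script>/<style> contents
    are dropped only when drop_script_style is True."""

    def __init__(self, drop_script_style: bool) -> None:
        super().__init__(convert_charrefs=True)
        self.drop = drop_script_style
        self.depth = 0  # how many <script>/<style> elements we are inside
        self.parts: list[str] = []

    def _skipping(self) -> bool:
        return self.drop and self.depth > 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.depth += 1
        elif tag == "img" and not self._skipping():
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if not self._skipping():
            self.parts.append(data)


def value(html: str, drop_script_style: bool) -> str:
    parser = PlainText(drop_script_style)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def p_value(html: str) -> str:
    return value(html, drop_script_style=True)


def e_value(html: str) -> str:
    return value(html, drop_script_style=False)
```

Fed the same markup as Zegnat's test, `Hello <script>beautiful </script>person`, the p-* rule gives `"Hello person"` and the e-* value gives `"Hello beautiful person"`, which is the handling difference the test case is designed to expose.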
#gRegorLoveHm, the nested object in name is really weird there too
#ZegnatWhy? That’s what you get when you do e-name. Every parser supports that.
#gRegorLoveIt's weird in combination with the p-name
#ZegnatI could have used p-contentone and e-contenttwo to separate the properties, but this gave a more fun parser output.
#ZegnatBesides, all parsers I just tested handled that without problem. It is just that most drop the SCRIPT element on e-, which is likely what the spec intended.
#ZegnatOnly Python seems broken: it doesn’t drop SCRIPT at all, not even in p-.
#gRegorLoveI get the spec bug, yeah. I don't think I'd seen or contrived the scenario before where a property gets a string value and an object (where the object isn't nested with 'children')
#sknebelthat was just to show the difference in handling I guess
#ZegnatTheoretically the test case is just <div class="h-x">Hello <script>beautiful </script>person</div>, which should return an implied name property "Hello beautiful person", since SCRIPT should not be removed on implied name either.
#ZegnatBut mine is just more fun, and clearly shows the specific handling difference between e- and p- when they have exactly the same content.
#Zegnat(So much the same that they are on the same element.)
#ZegnatAh, good to know. I have a post-it on my desk saying to reread that issues page and finish it once and for all, but I have exams and had to postpone :(