#microformats 2018-05-31

2018-05-31 UTC
tantek, TimC, j12t and gRegorLove joined the channel
#
Loqi
[gRegorLove] #177 Fix XPaths used in implied photo parsing
[chrisaldrich], [Natris1979], KartikPrabhu, [kevinmarks], tantek_, adactio, [jgmac1106], [mrkrndvs] and snarfed joined the channel
#
snarfed
morning all! i have an mf2 implied p-name question (apologies in advance :P)
#
Loqi
snarfed: KartikPrabhu left you a message on 2018-03-10 at 5:34am UTC: see: https://snarfed.org/2018-03-09_backcompat-mf2py-parallel-transport#comment-2618809 ;)
#
snarfed
given this html: <p class="h-card"> My Name <img src="http://xyz" /> </p>
#
snarfed
according to the current mf2 parsing spec, http://xyz should be included in the implied p-name...right?
#
snarfed
so weird and bad
#
snarfed
remind me where in the spec that is? i haven't found it in http://microformats.org/wiki/microformats2-implied-properties
#
KartikPrabhu
in there "else use the textContent of the .h-x for name after:"
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
KartikPrabhu
aah yes not the previous URL
#
snarfed
sooo ugly. honestly i can't realistically upgrade to the new mf2py with this behavior. i generally expect p-name to be human readable, but this often results in long ugly URLs included in names, which is not ok for bridgy, granary, or other downstream users
#
KartikPrabhu
maybe open a parsing bug since you have a use-case
#
KartikPrabhu
possibly we should not use the img replacement for implied name?
#
KartikPrabhu
what about the "alt" replacement?
#
aaronpk
wait why would the src attribute end up in the implied name?
#
KartikPrabhu
because textContent rules
#
aaronpk
that makes no sense tho lol
#
sknebel
because the textContent language got updated everywhere
#
sknebel
and nobody apparently noticed that problem
#
aaronpk
like snarfed said, URLs (especially img URLs) are usually not human readable
#
KartikPrabhu
sure. that is how the spec is. please file parsing issues. I think having the same textContent everywhere is a problem
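For readers following along, here is a minimal Python sketch of the textContent behaviour discussed above: nested `<img>` elements are replaced by their alt, or by their src when there is no alt. It is not taken from mf2py or any other parser. The `TextContent` class, the `text_content` helper, and the `replace_img` switch are hypothetical names, and the whitespace collapsing and the lack of relative-URL resolution are simplifications.

```python
from html.parser import HTMLParser


class TextContent(HTMLParser):
    """Collect text, optionally replacing <img> with its alt or src."""

    def __init__(self, replace_img=True):
        super().__init__()
        self.replace_img = replace_img  # hypothetical switch, not a spec term
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img" and self.replace_img and not self.skip_depth:
            attrs = dict(attrs)
            # alt wins if present; otherwise fall back to src
            value = attrs.get("alt")
            if value is None:
                value = attrs.get("src")
            if value is not None:
                self.parts.append(" " + value + " ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def text_content(html, replace_img=True):
    parser = TextContent(replace_img)
    parser.feed(html)
    parser.close()
    # Collapsing whitespace is a simplification made for readability.
    return " ".join("".join(parser.parts).split())


html = '<p class="h-card"> My Name <img src="http://xyz" /> </p>'
with_img = text_content(html)                          # img src ends up in the name
without_img = text_content(html, replace_img=False)    # plain text only
```

With the replacement on, snarfed's example yields a name that includes the image URL; with it off, only "My Name" remains.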
#
snarfed
does anyone here feel strongly that img srcs (when there aren't alts) *should* go into implied p-name?
#
snarfed
(btw KartikPrabhu i definitely didn't mean that you did anything wrong! hence asking about the spec here, instead of on mf2py itself)
#
snarfed
hell, aaronpk even thinks img *alts* shouldn't go into implied p-name either. https://github.com/microformats/microformats2-parsing/issues/16
#
snarfed
(i agree)
#
Loqi
[aaronpk] #16 consider not including img alt text as part of surrounding text properties
#
aaronpk
oh yeah i'm still stuck on that one huh
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
Zegnat
Basically it made implied name the way it would be had there been a .p-name wrapping the same content
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
sknebel
so seems like my mistake mentioning it that nobody caught
#
snarfed
sounds like no one (here now at least) particularly wants to keep img srcs or alts in implied p-name
#
snarfed
shall we file an issue? or expand aaronpk's to include src?
#
Zegnat
aaronpk, writes “It would only be included as part of implied values” ... so does that issue really want to even remove alt from the implied parsing?
#
sknebel
make a new one
[cleverdevil] and [tantek] joined the channel
#
Loqi
[snarfed] #35 proposal: drop img src (and alt) from implied p-name
#
snarfed
please 👍!
mandy, [Mandy_Honeyman] and KartikPrabhu joined the channel
#
KartikPrabhu
this can be fixed relatively easily in mf2py since my textContent basically has a flag for this which I can turn off
#
Zegnat
I am not sure how I feel about completely dropping alt. I kinda feel like the plain-text version of a thing including something that would also be read out loud by ATs makes sense.
#
Zegnat
But maybe not for implied name specifically ...
#
[jgmac1106]
Already forgot what I thought I knew yesterday. In this example I do not need the u-photo since the photo is not main purpose of the note? https://jgregorymcverry.com/3017-2/
#
Loqi
Hello #doo #digped #edtechchat #ds106 #clmooc #engchat #literacies @scsu #ctu friends I am going to be hosting weekly #IndieWeb Blogging 101 session every Wednesday night starting at 8:30 (or closest to bedtime for kids what ever comes last) for the... https://media2.giphy.com/media/3otO6xQxvlzQAAyhLG/giphy.gif
#
Zegnat
Exactly
#
sknebel
so where do we encounter implied name? are there actual cases outside h-cards? even very minimal h-entry note markup likely doesn't fall under implied name anymore, or is likely to look wrong
#
Zegnat
That photo just happens to be part of the post content
#
[jgmac1106]
well if u-photo make loqi display it then, yes, yes I do. Just don't know if its correct
#
[jgmac1106]
would Loqi have pulled the image if it wasn't u-photo?
#
Zegnat
Probably not. Should it have? It isn’t supposed to be a photo post, right?
#
sknebel
so lets think about the h-card case, are there cases where we need the alt?
#
Zegnat
sknebel, I don’t have the data so I don’t know. For all I know, someone has done <a class="h-card" href="/"><img src="photo.jpg" alt="Martijn"></a>.
#
sknebel
that's covered by the rules
#
sknebel
> else if .h-x>img:only-child[alt]:not([alt=""]):not[.h-*] then use that img’s alt for name
#
Zegnat
Aah. Hmm.
#
sknebel
(was wondering about that case too)
#
Zegnat
Haha, I thought you were surprisingly fast on the weird selector to refute me :P
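A rough Python sketch of the one rule sknebel quoted above, applied to Zegnat's example. It assumes the markup is well-formed XML (note the self-closed `<img/>`) so the standard library's ElementTree can parse it, it does not check that the root is itself an h-* element, and `implied_name_from_only_img` is a hypothetical helper name rather than any parser's API.

```python
import xml.etree.ElementTree as ET


def implied_name_from_only_img(markup):
    """Return the alt of a lone <img> child, or None if the rule does not apply."""
    root = ET.fromstring(markup)
    children = list(root)  # element children only
    if len(children) != 1 or children[0].tag != "img":
        return None
    img = children[0]
    # Skip nested microformats: any class name starting with "h-".
    if any(c.startswith("h-") for c in img.get("class", "").split()):
        return None
    alt = img.get("alt")
    return alt if alt else None  # mirrors [alt]:not([alt=""])


name = implied_name_from_only_img(
    '<a class="h-card" href="/"><img src="photo.jpg" alt="Martijn"/></a>'
)
```

Here `name` is the alt text, so this case never reaches the textContent fallback.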
#
sknebel
so would someone replace *part of* their name with an image?
#
Zegnat
Yes. That would be the only reason. And I would tend to say “no”.
#
Zegnat
So dropping images completely makes sense then
#
sknebel
(e.g. Twitter) usernames with emoji in them, with the emoji replaced with images. but there's a good chance the alt-text for the emoji is a textual description
#
[jgmac1106]
i think though in my note example I am missing h-entry. Not sure where gwg puts in on the note
#
[jgmac1106]
will ask in Wordpress
#
Zegnat
I have also seen alt text of emoji be :github_slack_emoji_name:, which I would argue is even worse to keep around
#
sknebel
and before the last change, parsers didn't include the alt either
#
sknebel
(at least the python one didn't)
#
sknebel
so there's no old examples being broken
#
Zegnat
I think PHP also only did raw textContent
#
sknebel
so no, *i* can't come up with a reason to keep alt there
#
Zegnat
Every change we will ever do that touches on what we mean with textContent will break old examples, I think.
#
Zegnat
But yeah, you have talked me ’round. Especially with only an IMG alt being covered by the algo already. (Which I hadn’t looked up yet. My mistake.)
#
Zegnat
I’ll comment and link this conversation.
snarfed1 joined the channel
#
Zegnat
Thought process logged on the issue. 👍 added to snarfed :)
#
snarfed1
thanks! and responded
chrisaldrich joined the channel
#
Zegnat
Aa. Hmm. Do you have any HTML usecases where it fails for explicit p-name, snarfed1?
#
snarfed1
Zegnat: not sure what you mean by fails...?
#
Zegnat
As in: the parser gives a result you did not expect
#
snarfed1
i don't expect img srcs to end up in parsed name values, whether or not p-name was explicit
#
sknebel
but that wasn't changed recently
#
Zegnat
Yes, but mf2 parsing isn’t vocab aware. It doesn’t know the difference between p-name and p-content.
#
snarfed
the spec wasn't changed recently, but my parser was :P
#
sknebel
oh, mf2py didn't do that before?
#
Zegnat
And I would be opposed to introducing vocab aware parsing for explicit p-* to make “name” different
#
Zegnat
So dropping alt/src from p-name === dropping alt/src from all plain text values.
#
Zegnat
And I feel like that is a pretty big change. And possibly not warranted.
#
snarfed
Zegnat: ok. sorry, not sure what to tell you. i don't really deeply follow the details of the parsing spec. i'm just describing the behavior that's desirable and undesirable as a parser user.
#
Zegnat
Yes, I understand. That is why I am asking for examples where explicit p-name is parsed against your expectation.
#
sknebel
oh, mf2py did indeed not do that before. awkward
#
Zegnat
Because if that is a very rare occurrence, I would keep the change to implied name only.
#
snarfed
i'd actually love to see counterexamples, ie where including img src in either name or content is useful or meaningful to human readers
#
snarfed
i expect they're very rare, if even any at all
#
sknebel
ask aaronpk or tantek, that's older than my involvement with the community
#
snarfed
Zegnat: your example HTML in https://github.com/microformats/microformats2-parsing/issues/35#issuecomment-393615508 is parsed against my expectation :P
#
Loqi
[Zegnat] To summarise the previous change: we normalised text parsing to use the exact same wording for `p-*`, `e-*`’s `value`, and implied name. Originally to make sure `<script>` and `<style>` tags would get correctly dropped everywhere, but in the end al...
#
Zegnat
As someone who browses the web with images turned off, I have gotten used to seeing alt text. So I personally like that. src is indeed questionable.
#
sknebel
depends what you do with it. if you autolink, you get to see the image in jgmacs post above at least as link, otherwise its gone completely
#
Zegnat
Yep, which is why I am not opposed to making implicit and explicit name parsing different again, snarfed. But I don’t expect anyone to have ever written that HTML in the wild, is what I am saying
#
sknebel
so it kinda makes sense for content
#
snarfed
ahhhh ok. so if the q is, find HTML in the wild where someone includes an img inside a p-name, then no, i don't have that offhand. i still don't get why we'd specify something we all pretty much agree is wrong, just because we think it's rare
#
sknebel
do we agree its wrong? and unlike only reverting it for implied names, changing p- parsing globally would be a big change
#
Zegnat
No, I think it is 100% right. I want to read alt text, just as I do in my day-to-day browsing.
#
Zegnat
I think there is a case for skipping it in implied name values, where the user hasn’t explicitly marked any HTML as expressing the name. I do not agree with dropping all mention of an image when it is part of, say, p-content
#
snarfed
heh. your day to day browsing is very unusual. may not be the best argument for how parsers should behave
#
snarfed
but again, we're primarily talking about src here, not alt
#
sknebel
note with an image. do you want to eliminate all traces of the image in it from the parser output?
#
snarfed
no clue. i'm not sure that adding the URL to content or name is net positive though.
#
sknebel
anyways, I'd say keep the current issue about implied p-name. we can probably quickly revert that everywhere, confident not to break anything
#
snarfed
agreed
#
sknebel
the bigger discussion, check how that relates to aaronpks issue and/or make a new one
#
sknebel
because that changes a lot
#
sknebel
(I just tested an older version of the php-parser, it did do it as spec-ed, so it's less likely anyone relied on mf2py's behavior of not following the spec there)
#
Zegnat
It is interesting to me that mf2py never followed the spec on that point though.
KartikPrabhu and snarfed1 joined the channel
#
Zegnat
Yes? That is implied name from before the spec change, right?
#
Zegnat
So uses pure textContent
#
sknebel
no, it uses the alt
#
sknebel
due to img only child rule I guess
#
Zegnat
Huh. Right. That’s a bug, haha
#
Zegnat
Might be related to how the XPath for implied photo turned out to be wrong
#
Zegnat
I *really* want a proper test suite now :(
#
Zegnat
rereading what I wrote before so I could add to the github issue, I apologise if it seemed I was coming on a bit strong snarfed!
#
Zegnat
sknebel, mind posting that alt-used-as-name thing to the php-mf2 issue tracker?
KartikPrabhu joined the channel
#
snarfed1
Zegnat: not at all! i was too :P no worries, i don't take any of this personally
barpthewire joined the channel
#
sknebel
Zegnat: is it wrong? What does "only-child" mean exactly?
#
sknebel
(had to switch to phone, so can't look stuff up well right now)
#
Zegnat
will look for the official definition of :only-child to find out
#
Zegnat
Looks like it is right, sknebel! Text nodes aren’t counted as children for CSS selectors, it looks like
#
Zegnat
That feels a little unexpected.
#
Zegnat
So if you have a line of text, if it happens to contain an <abbr>, its title attribute will be the implied name.
#
Zegnat
(Also for things-I-did-not-know, :only-child is defined as `:first-child:last-child`)
#
sknebel
That seems not to be the intention?
#
sknebel
I really wish we had a list of examples that showed what case the rules are intended to cover
#
Zegnat
My gut feeling is that it is unintended, yeah.
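To illustrate the `:only-child` point from this exchange, a small Python sketch using ElementTree, which (like CSS structural selectors) treats only elements as children and keeps text separately. The helper names are hypothetical and the markup is an invented example assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET


def is_first_child(parent, el):
    kids = list(parent)  # element children only; text nodes are not included
    return bool(kids) and kids[0] is el


def is_last_child(parent, el):
    kids = list(parent)
    return bool(kids) and kids[-1] is el


def is_only_child(parent, el):
    # Same shape as the CSS definition: :first-child:last-child
    return is_first_child(parent, el) and is_last_child(parent, el)


root = ET.fromstring(
    '<p class="h-card">Some <abbr title="Long Title">text</abbr> here</p>'
)
abbr = root.find("abbr")
text_ignored = is_only_child(root, abbr)
```

The `<abbr>` counts as an only child even though it sits between two runs of text, which is the surprising case Zegnat describes.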
KartikPrabhu, snarfed, chrisaldrich, vivus, [cleverdevil], [chrisaldrich] and jalcine joined the channel; snarfed, KartikPrabhu and vivus left the channel