#microformats 2017-04-27

2017-04-27 UTC
[cleverdevil] and [eddie] joined the channel
#
ben_thatmustbeme
Woo, making great progress on my rewrite of the parser
#
aaronpk
wow awesome
#
ben_thatmustbeme
Super basic parsing is already working.
#
gRegorLove
ben_thatmustbeme++
#
Loqi
ben_thatmustbeme has 2 karma in this channel (203 overall)
tantek joined the channel
#
ben_thatmustbeme
It's actually pretty interesting as I'm learning little edge cases of microformats I didn't know about
#
gRegorLove
I think I need some clarification on the implied URL parsing related to: https://github.com/indieweb/php-mf2/issues/110
#
Loqi
[gRegorLove] #110 Fix implied u-url when multiple links
#
gRegorLove
"else if .h-x>a[href]:only-of-type:not[.h-*], then use that [href] for url" from http://microformats.org/wiki/microformats2-parsing##if+no+explicit+%22url%22+property
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
gRegorLove
".h-x > a[href]:only-of-type" means .h-x has only one direct child <a>, correct?
#
tantek
has only one direct child that is an <a> tag with an 'href' attribute
#
gRegorLove
Meaning, :only-of-type doesn't restrict sibling elements from having <a> as children
#
tantek
correct - haven't run into that case though
#
gRegorLove
See the github issue. The second link is inside a sibling <b>
#
tantek
interesting
#
gRegorLove
Maybe a product of weird MediaWiki formatting
#
gRegorLove
(Speaking of edge cases, ben_thatmustbeme. Heh)
#
gRegorLove
tantek: So is the parser technically correct in this example?
#
ben_thatmustbeme
I haven't gotten to much of the implied properties part yet. May get messy, not sure yet
#
gRegorLove
mf2py also returns the implied URL for that HTML
#
gRegorLove
And microformat-shiv
#
ben_thatmustbeme
I suppose it would, assuming the > means direct descendant in the html
#
ben_thatmustbeme
I suppose it would be correct
#
tantek
is distracted by some in person things - may have to look into later tonight
#
ben_thatmustbeme
And it doesn't mean descendants that are not inside sub [h,p,e,dt,u]-*
#
gRegorLove
Yeah, it's direct descendant afaik.
#
gRegorLove
Reasoning probably being to prevent really unexpected implied values
#
gRegorLove
Yeah, moving the </b> to the end gives no implied URL
#
ben_thatmustbeme
So the conclusion is, stop using <b> tags already
#
ben_thatmustbeme
Also if it weren't direct descendants the parsing would get way more messy
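A minimal sketch of the rule quoted above, `.h-x>a[href]:only-of-type:not[.h-*]`, assuming well-formed markup that Python's standard `xml.etree.ElementTree` can read. The function names and the sample markup are hypothetical illustrations, not code or HTML taken from php-mf2, mf2py, or the GitHub issue, and only this one rule is covered.

```python
import xml.etree.ElementTree as ET


def has_h_class(el):
    """True if the element carries any class name starting with 'h-'."""
    return any(c.startswith("h-") for c in el.get("class", "").split())


def implied_url(h_x):
    """Apply only the quoted rule: .h-x>a[href]:only-of-type:not[.h-*]."""
    # Iterating an Element yields its direct children only,
    # which mirrors the '>' (child) combinator.
    anchors = [child for child in h_x if child.tag == "a"]
    if len(anchors) != 1:
        return None  # zero or several direct <a> children: not :only-of-type
    a = anchors[0]
    if a.get("href") is None or has_h_class(a):
        return None
    return a.get("href")


# Hypothetical markup: the second link sits inside a sibling <b>.
nested_in_sibling = ET.fromstring(
    '<div class="h-event">'
    '<a href="/event">Event</a> <b><a href="/other">Other</a></b>'
    '</div>'
)
# Hypothetical markup: both links are wrapped by the <b>.
both_in_b = ET.fromstring(
    '<div class="h-event">'
    '<b><a href="/event">Event</a> <a href="/other">Other</a></b>'
    '</div>'
)

print(implied_url(nested_in_sibling))
print(implied_url(both_in_b))
```

In the first sample the link inside the sibling <b> is not a direct child, so the remaining <a> still counts as the only one of its type. In the second sample the h-* element has no direct <a> child at all, so this rule yields nothing.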
#
KartikPrabhu
gRegorLove: is mf2py giving the correct implied URL, not the one in the <b>?
#
KartikPrabhu
and so is pin13
#
KartikPrabhu
so they seem to be playing by the parsing rules
#
KartikPrabhu
if you put a u-url on the /2017/Bellingham link then they both return that link as expected
[chrisaldrich], nitot, [tamaracks], [eddie], tantek and [jeremycherfas] joined the channel
#
gRegorLove
KartikPrabhu: The HTML's already been fixed to get the desired u-url explicitly. the issue appeared to be php-mf2 not following the implied u-url algorithm correctly.
#
KartikPrabhu
aah ok. I was wondering if mf2py is doing it right, and I think it is
#
gRegorLove
But after review, it appears it is parsing correctly, just the weird HTML didn't give the desired u-url as a result
#
gRegorLove
All of the parsers are doing it, and it appears all it takes is moving the </b> to the end, then no implied u-url
#
KartikPrabhu
yeah, that is what the parsing-algo says atm
#
gRegorLove
So pretty sure there's no parsing bug. Will await tantek's confirmation to be sure.
#
KartikPrabhu
also, traversing down children of h-* is going to be very annoying
#
gRegorLove
Yeah, the more I looked at it, the reasoning for the very strict implied algo makes sense
#
gRegorLove
short version: if you really want the property, add it explicitly :)
#
KartikPrabhu
yeah I think that is true for more complex markup
#
KartikPrabhu
but implied-properties are cool too :P
#
gRegorLove
!tell tantek summarized the conversation on github: https://github.com/indieweb/php-mf2/issues/110
#
Loqi
Ok, I'll tell them that when I see them next
#
Loqi
[gRegorLove] #110 Fix implied u-url when multiple links
[johnholdun], [kevinmarks], nitot, [colinwalker], [jeremycherfas], [pfefferle] and tantek joined the channel
#
@rashidnoorani
http://schema.org for all types of researched predefined #schemas. #gids17 #microformats.
(twitter.com/_/status/857514688665579520)
#
tantek
lol "researched"
#
Loqi
tantek: gRegorLove left you a message 3 hours, 34 minutes ago: summarized the conversation on github: https://github.com/indieweb/php-mf2/issues/110
#
tantek
!tell gRegorLove thanks!
#
Loqi
Ok, I'll tell them that when I see them next
nitot, adactio, rodolfojcj, barpthewire, KartikPrabhu and tantek joined the channel
#
ben_thatmustbeme
hmm, noticed a difference between pin13 and unmung as far as stripping whitespace
#
ben_thatmustbeme
specifically the html:
#
KartikPrabhu
before the <p> tag?
#
KartikPrabhu
that might be due to the HTML parsers used and not the mf2 part
#
KartikPrabhu
in fact pin13 removes the next line \n in the value and unmung does not
#
KartikPrabhu
ben_thatmustbeme: what is your HTML so I can try it on my mf2py
#
Loqi
some such thing
#
Loqi
you're welcome, ben_thatmustbeme
#
ben_thatmustbeme
hands loqi the dictionary entry on sarcasm
#
KartikPrabhu
interesting, my mf2py installation preserves the space before <p> in html property and keeps the \n in the value property
#
ben_thatmustbeme
likely some of this is due to what is considered whitespace by the language
#
ben_thatmustbeme
though some don't try to strip at all, others do
#
ben_thatmustbeme
or rather what the stripping function considers whitespace
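A small illustration of how the stripping function's idea of whitespace changes the result, sketched in Python. The sample string and the `ASCII_WHITESPACE` set are assumptions made for the example; this does not reproduce what pin13 or unmung actually do.

```python
# Assumed set for illustration: space, tab, LF, CR, FF.
ASCII_WHITESPACE = " \t\n\r\f"

# Hypothetical value with a leading no-break space (U+00A0) and newlines.
raw = "\u00a0\n  Hello world \n"

# str.strip() with no argument removes anything str.isspace() accepts,
# which includes U+00A0.
unicode_stripped = raw.strip()

# Stripping only the listed ASCII characters stops at the no-break space
# on the left, so the leading whitespace run survives.
ascii_stripped = raw.strip(ASCII_WHITESPACE)

print(repr(unicode_stripped))
print(repr(ascii_stripped))
```

The same input comes out differently depending on the character set used, which is one plausible way two parsers could disagree on a property value.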
#
ben_thatmustbeme
trying to understand the .e-*.h-* interaction in my parser, making me rethink a few things
#
ben_thatmustbeme
would that be the only time you can have anything other than type, properties, children and value?
#
ben_thatmustbeme
is having an html as well
[chrisaldrich], [kevinmarks], [jeremycherfas], rodolfojcj and nitot joined the channel
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
ben_thatmustbeme
i'm confused what the difference is between the name and photo sections for example
#
ben_thatmustbeme
.h-x>img:only-child[alt]:not([alt=""]):not[.h-*]
#
ben_thatmustbeme
vs .h-x>img[src]:only-of-type:not[.h-*]
#
ben_thatmustbeme
just getting lost in them a bit
gRegorLove, rodolfojcj and [kevinmarks] joined the channel
#
gRegorLove
ben_thatmustbeme: First one means: .h-x with an img[src] as its only child where the alt is not empty and the img does not have an .h-x
#
Loqi
gRegorLove: tantek left you a message 7 hours, 40 minutes ago: thanks!
#
gRegorLove
Second is: .h-x with only one img as a child and the img does not have .h-x
#
ben_thatmustbeme
"with an img[src]" means with an image with a src attribute
#
ben_thatmustbeme
dang, i just wrote this as only-of-type instead of only-child
#
ben_thatmustbeme
i think it was the difference in ordering that was confusing me
#
ben_thatmustbeme
img:only-child[alt] vs img[src]:only-of-type
KartikPrabhu joined the channel
#
ben_thatmustbeme
last questions gRegorLove to make sure i have this right,
#
ben_thatmustbeme
.h-x>img:only-child[alt]:not([alt=""]):not[.h-*]
#
ben_thatmustbeme
if it has more than one img tag, say 4, one has h-*, one has no alt, one has an empty alt, one has a non-empty alt and no h-*....
#
ben_thatmustbeme
oh wait, only, ONLY CHILD, basically cuts that all
#
ben_thatmustbeme
i guess thats a question for only-of-type
#
ben_thatmustbeme
but i'm just going to assume it's actually only of that type, not only of that type with that attribute ...
#
gRegorLove
Correct, I'm pretty sure only-of-type applies only to the selector it comes after, not the following attributes
#
KartikPrabhu
yes, that's how it works in CSS too
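The difference between :only-child and :only-of-type can be sketched as two small checks, here in Python with the standard `xml.etree.ElementTree`. The helper names and sample markup are hypothetical, and the sketch assumes well-formed input.

```python
import xml.etree.ElementTree as ET


def is_only_child(parent, el):
    """:only-child -- el is the sole element child of parent (text is ignored)."""
    children = list(parent)
    return len(children) == 1 and children[0] is el


def is_only_of_type(parent, el):
    """:only-of-type -- no sibling shares el's tag name.

    Attributes play no part here; [src] or [alt] filters are applied separately.
    """
    same_tag = [child for child in parent if child.tag == el.tag]
    return len(same_tag) == 1 and same_tag[0] is el


# One <img> next to a <span>: only of its type, but not the only child.
mixed = ET.fromstring(
    '<div class="h-card"><span>Name</span><img src="a.png" alt="A"/></div>'
)
img = mixed.find("img")
print(is_only_child(mixed, img), is_only_of_type(mixed, img))

# Two <img> elements, the second without src: neither is only-of-type,
# because the count ignores which attributes each sibling has.
two_imgs = ET.fromstring(
    '<div class="h-card"><img src="a.png" alt="A"/><img alt=""/></div>'
)
first = two_imgs.find("img")
print(is_only_child(two_imgs, first), is_only_of_type(two_imgs, first))
```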
#
gRegorLove
Are you using xpath in the parser?
#
ben_thatmustbeme
its using nokogiri and i'm descending the tree myself
#
ben_thatmustbeme
though i suppose that might make more sense huh
#
gRegorLove
Maybe, not sure. Was just going to suggest php-mf2 has several of them, like in parseImpliedPhoto()
#
ben_thatmustbeme
i sort of don't want to look directly at other parsers, lest it confuse me more
#
gRegorLove
Haha, fair enough.
#
Loqi
gRegorLove: lol
#
KartikPrabhu
ben_thatmustbeme: that is actually a good idea. independently written parser might find inconsistencies in the already existing ones
#
ben_thatmustbeme
*write a big pile of code to handle implied properties* *rerun tests* *number changes from 56 failures to 55 failures* *SIGH*
#
ben_thatmustbeme
yeah, that was the other reason
#
KartikPrabhu
ben_thatmustbeme: also please document the "space collapsing" difference you found.
#
ben_thatmustbeme
sure, where?
#
KartikPrabhu
err good point :P
#
gRegorLove
May be related to https://github.com/indieweb/php-mf2/issues/69? Haven't checked the HTML you're referring to
#
Loqi
microformats2-parsing-issues
#
Loqi
[ghost] #69 `<br>` between `<span>` tags are not interpreted as whitespace
#
Loqi
some such thing
[colinwalker], rodolfojcj, [chrisaldrich], [eddie], [cleverdevil], tantek and [manton] joined the channel
#
ben_thatmustbeme
wipes brow, failing on 43 of the 92 tests now, but i'm only testing the v2 folder so far
#
ben_thatmustbeme
pretty good progress though
#
ben_thatmustbeme
https://raw.githubusercontent.com/microformats/tests/master/tests/microformats-v2/h-card/nested.html curious on this one, I don't see why the child h-card h-org has a value attribute
#
Loqi
Mitchell Baker
#
KartikPrabhu
ben_thatmustbeme: all h-* get at least a value
#
KartikPrabhu
so people can use value as fallback text representation for any h-* in case they don't understand the particular vocabulary
#
ben_thatmustbeme
except for those in items[] ?
#
KartikPrabhu
I think all h-* get a value
#
KartikPrabhu
do you have an example?
#
ben_thatmustbeme
the parsing for that one
#
ben_thatmustbeme
also, not finding the part in the parsing spec of where it gets that value from
#
ben_thatmustbeme
i see it for if .p-*.h-* etc
#
KartikPrabhu
oops maybe I misspoke
#
KartikPrabhu
mf2py does not give value for that markup in any h-*
#
KartikPrabhu
value is for e-* things I think, so you have html property and a value property for plaintext representation
#
KartikPrabhu
strange pin13 i.e. php-mf2 does give a value just like the tests!
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
ben_thatmustbeme
so value is used in: if p-*.h-*, e-*, u-*.h-*
#
ben_thatmustbeme
that section under value: is not terribly clear
#
KartikPrabhu
but that is only if the child microformat is also a property
#
KartikPrabhu
in this example markup it isn't
#
ben_thatmustbeme
i don't see anywhere that value: should be set for children
#
KartikPrabhu
might be a bug in the tests, maybe leave a !tell to tantek to confirm
nitot joined the channel
#
KartikPrabhu
but then either php-mf2 is wrong or mf2py is
#
KartikPrabhu
ben_thatmustbeme++ for thorough checking of mf2 tests
#
Loqi
ben_thatmustbeme has 3 karma in this channel (204 overall)
#
ben_thatmustbeme
not sure what unmung uses
#
KartikPrabhu
mf2py i am guessing
#
KartikPrabhu
so it does not have the "value"
#
ben_thatmustbeme
i'm basing all of this parser on the tests, so if it doesn't pass things, i'll know
#
KartikPrabhu
yes, that is good. you are simultaneously checking the tests, the spec and other parsers :P
#
KartikPrabhu
I think I did something like this while writing code for mf2py :P
#
KartikPrabhu
but now have forgotten everything
#
ben_thatmustbeme
!tell tantek hitting what is either an error in the mf2 tests and a bug in php-mf2 or something missing in the spec and a bug in mf2py. children elements seem to be getting a value: set, but not sure why. https://github.com/microformats/tests/blob/master/tests/microformats-v2/h-card/nested.html
#
Loqi
Ok, I'll tell them that when I see them next
#
ben_thatmustbeme
!tell tantek h-card/nested.html parses without value for child h-org h-card in mf2py and with one via php-mf2
#
Loqi
Ok, I'll tell them that when I see them next
#
ben_thatmustbeme
this might actually answer a LOT of my non-passing tests
#
ben_thatmustbeme
just looking through
#
ben_thatmustbeme
my only real points left to add are proper date parsing, and backcompat... i think
#
ben_thatmustbeme
this one is wrong in the other direction, p-affiliation h-card should have a value
#
ben_thatmustbeme
at least the parsers seem to all agree on that one, pretty clear thats a bug in the test
tantek joined the channel
#
KartikPrabhu
ben_thatmustbeme: yup that one does seem like a bug
#
KartikPrabhu
and now I recall the logic
#
KartikPrabhu
if some h-* that you understand has a property which is a h-*2 that you don't understand, then you can use the "value" directly
#
KartikPrabhu
which is also why the "value" is generated depending on the property type
tantek, [chrisaldrich], [ianmjones] and nitot joined the channel
#
gRegorLove
Yeah, looks like php-mf2 is incorrectly always setting the 'value' for a nested h-*: https://github.com/indieweb/php-mf2/blob/master/Mf2/Parser.php#L870
#
KartikPrabhu
gRegorLove: so do you agree this is a problem in the tests and php-mf2 and the mf2py seems to be following the spec?
[mko] joined the channel
#
gRegorLove
Need to wrap it in a conditional check for mf property classes
#
gRegorLove
mf2py (unmung) seems to be following the algo, no 'value' in the child.
#
KartikPrabhu
ok, could you file bug on both php-mf2 and spec?
#
gRegorLove
php-mf2 appears to have a bug, always adding the 'value' regardless if it's a property
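The conditional described here can be sketched roughly as follows, in Python. This is an illustration of the idea only: the function names and the sample data are hypothetical, only the p-* and u-* cases are handled, and none of it is code from php-mf2 or mf2py.

```python
def value_for(prefix, properties):
    """Pick 'value' by property type: p-* uses name, u-* uses url (sketch only)."""
    source = {"p-": "name", "u-": "url"}.get(prefix)
    values = properties.get(source, []) if source else []
    return values[0] if values else None


def nested_item(class_names, item):
    """Return a copy of the nested h-* item, adding 'value' only when the
    element is also a property of its parent."""
    result = dict(item)
    for prefix in ("p-", "u-"):
        if any(name.startswith(prefix) for name in class_names):
            value = value_for(prefix, item["properties"])
            if value is not None:
                result["value"] = value
            break
    return result


# Hypothetical parsed item for a nested organization card.
org = {
    "type": ["h-card", "h-org"],
    "properties": {"name": ["Example Org"], "url": ["http://example.org/"]},
}

# Nested without a property class: goes into children, no 'value'.
print(nested_item(["h-card", "h-org"], org))
# Nested as p-org: 'value' is taken from the name property.
print(nested_item(["p-org", "h-card"], org))
```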
#
gRegorLove
Don't think there's a spec issue
#
KartikPrabhu
sorry tests not spec
#
KartikPrabhu
4/5-letter wrdos are hrad
#
gRegorLove
Aha, already an issue. I thought this sounded familiar. https://github.com/indieweb/php-mf2/issues/98
#
Loqi
[gRegorLove] #98 Parsing "value" for nested microformat when it's not a property
#
KartikPrabhu
aah is there an issue on the tests?
sknebel_ and [ianmjones] joined the channel
#
KartikPrabhu
cool, I thumbs-upped it
#
gRegorLove
Looks like there was a similar fix for another test: https://github.com/microformats/tests/pull/53
#
Loqi
[willnorris] #53 impliedname: remove 'value' in nested microformat
#
gRegorLove
Woo, down to 8 open issues in php-mf2
edsu joined the channel
#
gRegorLove
Once I add rel-urls I can start using the test suite more seriously
rodolfojcj and tantek joined the channel