#microformats 2015-06-08

2015-06-08 UTC
tantek, KevinMarks__, KevinMarks_, fuzzyhorns, KartikPrabhu, danielfilho, KevinMarks and krijnhoetmer joined the channel
#
@XJINE
Wrote it up. I thought about writing it as a comparison with each format, but it got too long. > "About microformats: advantages and disadvantages" http://neareal.com/1619/ #HTML #SEO
(twitter.com/_/status/607757390851678208)
Soopaman, fuzzyhorns, gRegorLove, Left_Turn, KevinMarks__, ChiefRA, kez, KevinMarks_, eschnou, glennjones, KartikPrabhu, chiui, adactio and csarven joined the channel
#
@ProjectPeachUK
We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #biznoticeUK #SMO
(twitter.com/_/status/607852962426327040)
Left_Turn and KevinMarks_ joined the channel
#
@Bobby6740
RT @ProjectPeachUK: We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #bi…
(twitter.com/_/status/607863742806802432)
KartikPrabhu and benborges joined the channel
#
@ProjectPeachUK
We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #JS #fpsbs #think
(twitter.com/_/status/607902036139614209)
#
@InfusedReTweets
RT @ProjectPeachUK: We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #JS…
(twitter.com/_/status/607902357326823425)
fuzzy_horns and KartikPrabhu joined the channel
#
@GraphicInfusion
RT @ProjectPeachUK: We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #JS…
(twitter.com/_/status/607910910448037888)
TallTed, KevinMarks__, KartikPrabhu, KevinMarks_ and KevinMarks___ joined the channel
#
@JRs_partsonline
RT @ProjectPeachUK: We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #JS…
(twitter.com/_/status/607930743084490752)
glennjones and gRegorLove joined the channel
#
Loqi
tommorris: tantek left you a message on 6/1 at 5:16pm: your comments would be welcome here too: http://microformats.org/wiki/microformats2-parsing-issues#drop_alternates_collection_and_include_them_in_rels
#
@amazingmap
Amazingly comprehensive map of every country in the world that uses the MMDDYYYY format https://twitter.com/amazingmap/status/599931666803597312/photo/1
(twitter.com/_/status/599931666803597312)
#
aaronpk
hahaha
KevinMarks_, KevinMarks__, tantek, KevinMarks, KartikPrabhu, eschnou and GWG joined the channel
#
kylewm
do y'all know why h-cite recommends p-content instead of e-content on http://microformats.org/wiki/h-cite#Properties? I would like to parse out reposts with markup and images from acegiak.net
Zegnat joined the channel
#
gRegorLove
Perhaps because the simplest case is to cite just the text, sans markup. The example given there is "short text notes"
#
tantek
I suppose the question there is whether a repost should be using h-cite at all
#
tantek
since a repost is intended as *post* of the original content
#
tantek
kylewm: that is, asking for e-content is solving the wrong problem IMO - it may be expedient, but it's not actually helping the markup be correct
#
kylewm
I've gone through a couple iterations of my own markup with reposts actually
#
kylewm
right now i have an u-repost-of h-cite INSIDE the e-content of the post itself
#
kylewm
that way you can include the permalink of the original post as well as the url of your repost
#
tantek
that actually makes more sense
#
tantek
than e-content on h-cite
#
@ronaldwidha
So how many format does a publisher need to support now, Apple News, HTML, RSS, Newsstand EPUB, Facebook Instant Articles, microformats
(twitter.com/_/status/607994487579754496)
#
kylewm
(I have both right now... so like h-entry > e-content > u-repost-of h-cite > e-content
chiui joined the channel
#
tantek
that's e-content *inside* h-cite, not e-content *on* h-cite
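A minimal sketch of the nesting kylewm describes, parsed with mf2py; the URLs, author, and text are illustrative assumptions, not the actual markup from acegiak.net or kylewm's site.

```python
# A sketch only: URLs, author, and text are made up for illustration.
import json
import mf2py

html = """
<article class="h-entry">
  <div class="e-content">
    Reposting this:
    <blockquote class="u-repost-of h-cite">
      <a class="p-author h-card" href="https://example.com/alice">Alice</a>
      <div class="e-content">The original note, with <em>markup</em>.</div>
      <a class="u-url" href="https://example.com/alice/note/1">permalink</a>
    </blockquote>
  </div>
  <a class="u-url" href="https://example.org/reposts/1">repost permalink</a>
</article>
"""

# The h-cite ends up as the value of the outer entry's repost-of property,
# carrying its own author, url, and content (e-content inside the h-cite).
parsed = mf2py.parse(doc=html)
print(json.dumps(parsed["items"][0]["properties"]["repost-of"], indent=2))
```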
KevinMarks joined the channel
#
glennjones
KevinMarks: Could you have a quick look at the 4 issues against your test changes https://github.com/microformats/tests/issues - they are all to do with when to trim whitespace. I am happy to make corrections, just want to know if you think I am right or not
#
Loqi
glennjones: KevinMarks left you a message 6 days ago: said text was OK as a single string, but kyle's issue of pointing to the same URL twice may suggest we need an array. Are there in-the-wild examples of linking to the same URL with multiple types? I think that is contrived, but linking to the same URL with multiple texts is quite plausible
#
KevinMarks
reading now - I thought trimming whitespace was expected in general, but I may have over-interpreted
#
glennjones
As far as I can see it's only mentioned once on the parsing page, in the “implied name rules”, i.e. “drop leading & trailing white-space from…”
#
KevinMarks
IMO, stripping leading/trailing universally is always good, and collapsing interior whitespace in implied names makes sense
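A minimal sketch of the two behaviours under discussion; the function names are illustrative, not taken from any parser or from the spec.

```python
import re

def trim_property_value(text):
    # "trim leading/trailing": strip whitespace only at the ends of a
    # parsed property value, leaving interior whitespace alone.
    return text.strip()

def collapse_implied_name(text):
    # For implied names only: additionally collapse interior runs of
    # whitespace (spaces, tabs, newlines) down to a single space.
    return re.sub(r"\s+", " ", text).strip()

print(repr(trim_property_value("  Kyle \n")))          # 'Kyle'
print(repr(collapse_implied_name(" John\n   Doe ")))   # 'John Doe'
```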
#
KevinMarks
but need more votes
#
tantek
hmm - KevinMarks that issue doesn't make much sense
#
tantek
examples are both incorrect and theoretical
#
tantek
so it is "premature to say we are going with the tenor of"
#
KevinMarks
"trim leading/trailing" is consensus so far
#
tantek
sorry - example is not broken, just confused where the p-name was
#
tantek
KevinMarks: and white-space collapsing in general for implied names
#
KevinMarks
I can switch to these examples if you prefer
#
tantek
would be preferable since I don't even know of any h-review-aggregate posts yet
#
glennjones
I have added it as an option in my parser just in case we move to all property text at some stage, but at the moment I am trying to get the tests to match current rules
#
glennjones
The tests are a real mixture of old real-world examples, a few taken from wiki examples and a handful made up to help parser authors
#
glennjones
Happy to see them replaced over time with indieweb patterns of use, but I need to get the current set in working order first
KartikPrabhu joined the channel
#
tantek
glennjones - get the current set in working order first makes sense
#
glennjones
KevinMarks: you OK with me updating your changes? I know you must have put a lot of time into the pull request.
#
KevinMarks
Is there any pushback from anyone on leading/trailing stripping for p- properties?
#
KevinMarks
'cos I'd rather make that 1 line change to the parsing spec at least
#
tantek
did you implement that?
#
tantek
and what about implied-name whitespace normalizing?
#
KevinMarks
yes, implemented in mf2py
#
KevinMarks
glenn has implied whitespace behind a flag I think?
#
KevinMarks
mf2py doesn't have implied whitespace normalising yet
#
tantek
glenn what is your opinion of trim leading/trailing whitespace on p-* properties?
#
tantek
KevinMarks: right now we don't really have consensus - we have lack of opinions
#
glennjones
Yes, I have it for not just p-* but all properties - but yes, behind a flag, switched off by default
#
tantek
the only people agreeing explicitly are you (proposer of issue/solution), and me (spec editor)
#
KevinMarks
glenn didn't vote, but agreed in prose
#
tantek
would prefer to have at least one more parser dev opinion
#
tantek
interesting re: for all properties
#
tantek
glennjones, could you explain why "all properties" is better than just for p-* ? (honestly curious)
#
KevinMarks
I think I wanted it for dt too
#
tantek
or is it easier to implement? or?
#
tantek
KevinMarks: the proposal in the wiki that you and I +1'd is "keep as is but have mf2 parser trim leading/trailing whitespace", which does not limit it to p-* properties
#
kylewm
is there ever any such thing like "first name: <span class="p-name">Kyle </span> last name: <span class="p-name">Mahan</span>"
#
tantek
kylewm: hopefully not - for many reasons
#
tantek
e.g. should be more like "given name: <span class="p-given-name">Kyle </span> family name: <span class="p-family-name">Mahan</span>"
#
kylewm
I can +1 stripping leading/trailing whitespace for all properties, but I agree with Kevin's comment
#
kylewm
that it doesn't help with our test case failures
#
kylewm
all properties except maybe e-content
#
tantek
well if we can at least get that resolved we can move forward with spec edit and implementation updates
#
tantek
why not e-* content as well?
#
glennjones
I think I did it across all properties just to safeguard against me adding leading/trailing single spaces by mistake. Not sure I worked out all the possible impacts, but it's how my parser has worked for the last two years without any noticeable problem
#
tantek
all the same arguments apply - in terms of having to put markup on separate lines from the thing being marked up
#
KevinMarks
right, the <pre> case in e-content would be inside it
#
kylewm
<pre> -- that's what i meant rather than e-content
#
kylewm
thanks KevinMarks
#
KevinMarks
if it's in <pre> it shouldn't be parsed as html
KartikPrabhu joined the channel
#
glennjones
Would you trim the value part of e-* output?
#
KevinMarks
I meant if you had <div class="e-content">\n<pre>\n\nhello\n</pre></div> you are ok losing the \n before the <pre>
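For the <pre> case, trimming only the ends of the e-* value drops the stray newline outside the <pre> while leaving its contents alone; a quick sketch using the fragment above.

```python
# The string is the e-content inner HTML from the example above.
html_value = "\n<pre>\n\nhello\n</pre>"
print(repr(html_value.strip()))  # '<pre>\n\nhello\n</pre>' -- inner newlines kept
```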
#
tantek
KevinMarks: more likely that that \n is extra from the publishing system that you have no control over
#
KevinMarks
yes exactly
elux joined the channel
#
tantek
thus you want to get rid of it
#
tantek
since it's not from the author
#
tantek
if you really must include an extra return like that, you can
#
tantek
so I'm leaning towards glennjones position of all properties
#
kylewm
edited /microformats2-parsing-issues (+205) "/* whitespace collapsing revisited */ +1 leading/trailing"
(view diff)
#
KevinMarks
I think so
#
tantek
bbiab
#
KevinMarks
maybe I was accidentally getting the crappy built-in parser
#
KevinMarks
are you running on 2.x or 3.x?
#
KevinMarks
oh you said 2.7
#
kylewm
i tried both
#
kylewm
i got fewer failures on 3.4, but the number of failures was consistent with/without unmung
#
kylewm
(i'm intrigued/concerned why 3.4 is getting ~10 fewer failing tests than 2.7, but haven't looked into it yet)
#
KevinMarks
I'll add a check for which parser is being used, as I think we want html.parser to DIAF
KartikPrabhu and glennjones_ joined the channel
#
KevinMarks
there isn't a "UNICODE" feature in bs4
#
KevinMarks
can I tell which is being used?
#
kylewm
which parser?
#
kylewm
um, not that I know of
#
kylewm
they go to great pains to hide that from you
#
kylewm
KevinMarks: i don't think html.parser should have a different result unicode-wise though, does it??
#
KevinMarks
well, empirically i was seeing a different result
#
kylewm
with python 2.7.6 and html.parser I get 64 failures with and without unmung :/
#
kylewm
and parsed strings do appear to be unicode
#
kylewm
what is the different result you were getting?
#
KevinMarks
if I change unmung(s) to return s I get 193 failures
#
KevinMarks
AssertionError: value='h-card'; type=<type 'str'> etc
#
kylewm
weeeeird
#
kylewm
what python version?
tantek joined the channel
#
KevinMarks
how do I ask BeautifulSoup what it is doing parser wise?
#
KevinMarks
oh weird
netweb joined the channel
#
KevinMarks
if I invoke it with lxml I get those errors
#
kylewm
oh huh, yeah it's possible i was using lxml ... really thought i was not though
#
kylewm
well... we can pick a parser to go with for the test cases without limiting the parsers that you can use in general use, maybe
#
KevinMarks
no, it's only if I use lxml that I get the errors
#
KevinMarks
with html.parser and html5lib I don't
glennjones joined the channel
#
kylewm
that's what i mean, i thought i was getting them with html5lib, but it's possible i had lxml on the path and didn't realize it
#
kylewm
and now i can't reproduce with html5lib
#
kylewm
bbiab
#
KevinMarks
I mean I get missing unicode with lxml and not with html.parser or html5lib
#
KevinMarks
so i wonder if I have some bad lxml version
#
KevinMarks
no, lxml is doing it on purpose: "In Python 2, lxml's API returns byte strings for plain ASCII text values, be it for tag names or text in Element content."
#
tantek
in general if there are problems with using html5lib we really should work on resolving them rather than avoiding using html5lib
#
KevinMarks
no the problem is with lxml
#
kylewm
sorry i was being dense, KevinMarks
#
KevinMarks
if I replace unmung(s) with unicode(s) I get happy results
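A Python 2-only sketch of the coercion being described; coerce_to_unicode is an illustrative name, not mf2py's actual helper.

```python
# Python 2 only: under lxml, plain-ASCII text values can come back as byte
# strings (str), so they are coerced to unicode before reaching the output.
def coerce_to_unicode(s):
    if isinstance(s, str):        # Python 2 byte string
        return s.decode("utf-8")
    return s                      # already unicode

value = "h-card"                  # what lxml may hand back for ASCII text
assert isinstance(coerce_to_unicode(value), unicode)
```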
#
KevinMarks
I do get different numbers of fails with the 3 parsers though
#
kylewm
ok same results here with lxml
#
KevinMarks
58 vs 61 vs 64
#
KevinMarks
which says to me we should test with all 3 if possible
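A sketch of running the same fragment through all three tree builders (lxml and html5lib must be installed alongside the stdlib html.parser); the markup is illustrative.

```python
from bs4 import BeautifulSoup

html = '<div class="h-card"><span class="p-name">John\n   Doe</span></div>'

# Compare what each tree builder hands back for the same text node.
for builder in ("html.parser", "lxml", "html5lib"):
    soup = BeautifulSoup(html, builder)
    print(builder, repr(soup.find(class_="p-name").get_text()))
```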
#
kylewm
sounds right, sadly
KartikPrabhu joined the channel
glennjones joined the channel
#
tantek
KevinMarks: interesting that they're looking at such structured information - that's perhaps a good chance to get them to look at adding microformats2 support
#
tantek
do we know anyone there that would be good to ask about that?
#
KevinMarks
should we ask hober?
#
KevinMarks
not sure who is on the OS side
#
tantek
hober would be a good start
#
tantek
also interesting that their example used recipe
#
tantek
and omits instructions
kez joined the channel
#
kylewm
yikes, article from 2010
csarven joined the channel
#
tantek
Twitter often resurfaces old articles
tantek joined the channel
#
KevinMarks
so html.parser fails 5 built-in tests
#
KevinMarks
and is doing a lot of its own bad whitespace run compression
#
KevinMarks
it seems to compress whitespace to a single char, but preserve an \n if there is one
#
KevinMarks
so we should prefer the other 2
#
KevinMarks
hm, lxml does something similar
#
KevinMarks
u'name': [u'John\nDoe']
#
KevinMarks
as if they are indent stripping
#
KevinMarks
which makes some sense I suppose
#
KevinMarks
that gives the variation between html5lib and lxml
#
KevinMarks
html.parser also fails test_parser.test_multiple_root_classnames