#microformats 2015-06-08

2015-06-08 UTC
tantek, KevinMarks__, KevinMarks_, fuzzyhorns, KartikPrabhu, danielfilho, KevinMarks and krijnhoetmer joined the channel
# 03:53 
@XJINE Wrote it. I meant to write it as a comparison with each format, but it got too long. > "About microformats: advantages and disadvantages" http://neareal.com/1619/ #HTML #SEO (twitter.com/_/status/607757390851678208)
Soopaman, fuzzyhorns, gRegorLove, Left_Turn, KevinMarks__, ChiefRA, kez, KevinMarks_, eschnou, glennjones, KartikPrabhu, chiui, adactio and csarven joined the channel
# 09:39 
@SplashCopy Are microformats the best #SEO tool that you aren't using? https://www.splashcopywriters.co.uk/blog/microformats-are-they-the-most-underrated-seo-weapon-of-all-time.html (twitter.com/_/status/607844340384321536)
# 10:13 
@ProjectPeachUK We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #biznoticeUK #SMO (twitter.com/_/status/607852962426327040)
Left_Turn and KevinMarks_ joined the channel
# 10:56 
@Bobby6740 RT @ProjectPeachUK: We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #bi… (twitter.com/_/status/607863742806802432)
KartikPrabhu and benborges joined the channel
# 13:28 
@ProjectPeachUK We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #JS #fpsbs #think (twitter.com/_/status/607902036139614209)
# 13:29 
@InfusedReTweets RT @ProjectPeachUK: We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #JS… (twitter.com/_/status/607902357326823425)
fuzzy_horns and KartikPrabhu joined the channel
# 14:03 
@GraphicInfusion RT @ProjectPeachUK: We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #JS… (twitter.com/_/status/607910910448037888)
TallTed, KevinMarks__, KartikPrabhu, KevinMarks_ and KevinMarks___ joined the channel
# 15:22 
@JRs_partsonline RT @ProjectPeachUK: We've #played with #microformats. Love the #idea of #marking up our #business data to #machines as well as #humans! #JS… (twitter.com/_/status/607930743084490752)
glennjones and gRegorLove joined the channel
# 16:35 
tommorris funny stuff. https://twitter.com/amazingmap/status/599931666803597312
# 16:35 
Loqi tommorris: tantek left you a message on 6/1 at 5:16pm: your comments would be welcome here too: http://microformats.org/wiki/microformats2-parsing-issues#drop_alternates_collection_and_include_them_in_rels
# 16:35 
@amazingmap Amazingly comprehensive  map of every  country in the world that uses the MMDDYYYY format https://twitter.com/amazingmap/status/599931666803597312/photo/1 (twitter.com/_/status/599931666803597312)
# 16:36 
aaronpk hahaha
KevinMarks_, KevinMarks__, tantek, KevinMarks, KartikPrabhu, eschnou and GWG joined the channel
# 19:19 
kylewm do y'all know why h-cite recommends p-content instead of e-content on http://microformats.org/wiki/h-cite#Properties ? I would like to parse out reposts with markup and images from acegiak.net
Zegnat joined the channel
# 19:24 
gRegorLove Perhaps because the simplest case is to cite just the text, sans markup. The example given there is "short text notes"
# 19:24 
tantek I suppose the question there is whether a repost should be using h-cite at all
# 19:24 
tantek since a repost is intended as a *post* of the original content
# 19:30 
tantek kylewm: that is, asking for e-content is solving the wrong problem IMO - it may be expedient, but it's not actually helping the markup be correct
# 19:31 
kylewm I've gone through a couple iterations of my own markup with reposts actually
# 19:32 
kylewm right now i have a u-repost-of h-cite INSIDE the e-content of the post itself
# 19:32 
kylewm that way you can include the permalink of the original post as well as the url of your repost
# 19:32 
kylewm e.g. https://kylewm.com/2015/06/repost-of-russell-keith-magee-cooking-gaming-humor
# 19:34 
tantek that actually makes more sense
# 19:34 
tantek than e-content on h-cite
# 19:35 
@ronaldwidha So how many formats does a publisher need to support now, Apple News, HTML, RSS, Newsstand EPUB, Facebook Instant Articles, microformats (twitter.com/_/status/607994487579754496)
# 19:37 
kylewm (I have both right now... so like h-entry > e-content > u-repost-of h-cite > e-content
# 19:37 
kylewm )
chiui joined the channel
# 19:41 
tantek that's e-content *inside* h-cite, not e-content *on* h-cite
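For a consumer reading reposts marked up as h-entry > e-content > u-repost-of h-cite, a minimal Python sketch of pulling out the original permalink might look like this. It assumes the parser attaches the nested u-repost-of to the outer h-entry (e-content does not start a new item) and emits the usual mf2 JSON shape, where a nested microformat used as a u-* property is a dict with "type", "properties" and "value" keys. The helper name and sample URLs are hypothetical.

```python
from typing import Any, Dict, List

def repost_of_urls(entry: Dict[str, Any]) -> List[str]:
    """Collect original-post URLs from an h-entry's repost-of property (hypothetical helper)."""
    urls: List[str] = []
    for value in entry.get("properties", {}).get("repost-of", []):
        if isinstance(value, dict):
            # Nested h-cite: prefer its explicit u-url, else fall back to the item's "value".
            cite_urls = value.get("properties", {}).get("url", [])
            url = cite_urls[0] if cite_urls else value.get("value")
            if isinstance(url, str) and url:
                urls.append(url)
        elif isinstance(value, str):
            # Plain u-repost-of link with no nested h-cite.
            urls.append(value)
    return urls

# Hypothetical parsed output for h-entry > e-content > u-repost-of h-cite.
entry = {
    "type": ["h-entry"],
    "properties": {
        "url": ["https://example.com/my-repost"],
        "repost-of": [{
            "type": ["h-cite"],
            "properties": {"url": ["https://example.org/original"]},
            "value": "https://example.org/original",
        }],
    },
}
print(repost_of_urls(entry))  # ['https://example.org/original']
```

The h-entry's own u-url stays separate from the repost-of values, which is the point of kylewm's markup: both the repost URL and the original permalink survive parsing.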
KevinMarks joined the channel
# 20:02 
glennjones KevinMarks: Could you have a quick look at the 4 issues against your test changes https://github.com/microformats/tests/issues - they are all to do with when to trim whitespace. I am happy to make the corrections, I just want to know if you think I am right or not
# 20:02 
Loqi glennjones: KevinMarks left you a message 6 days ago: said text was OK as a single string, but kyle's issue of pointing to the same URL twice may suggest we need an array. Are there in-the-wild examples of linking to the same URL with multiple types? I think that is contrived, but linking to the same URL with multiple texts is quite plausible
# 20:06 
KevinMarks reading now - I thought trimming whitespace was expected in general, but I may have over-interpreted
# 20:09 
glennjones As far as I can see it's only mentioned once on the parsing page, in the “implied name rules”, i.e. “drop leading & trailing white-space from…”
# 20:09 
KevinMarks I think we were going with the tenor of http://microformats.org/wiki/microformats2-parsing-issues#whitespace_collapsing_revisited rather than a spec update
# 20:10 
KevinMarks IMO, stripping leading/trailing whitespace universally is always good, and collapsing interior whitespace in the implied name makes sense
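A minimal Python sketch of the two behaviours proposed here: trimming leading/trailing whitespace on every property, and additionally collapsing interior runs for the implied name. It assumes "whitespace" means the HTML ASCII whitespace set (space, tab, LF, FF, CR); the function names are hypothetical and not taken from mf2py or the parsing spec.

```python
import re

# Assumption: "whitespace" means the HTML ASCII whitespace set (space, tab, LF, FF, CR).
HTML_WS = " \t\n\f\r"
WS_RUN = re.compile(r"[ \t\n\f\r]+")

def trim_property(text: str) -> str:
    """Drop leading/trailing whitespace only (the proposed rule for all properties)."""
    return text.strip(HTML_WS)

def normalize_implied_name(text: str) -> str:
    """Trim, then collapse each interior whitespace run to a single space (implied name only)."""
    return WS_RUN.sub(" ", trim_property(text))

print(repr(trim_property("Kyle ")))                          # 'Kyle'
print(repr(trim_property("\n  John\n    Doe\n")))            # 'John\n    Doe'
print(repr(normalize_implied_name("\n  John\n    Doe\n")))   # 'John Doe'
```

Keeping the two rules separate means interior whitespace in explicit p-* values is left alone, while only implied names get collapsed.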
# 20:10 
KevinMarks but need more votes
# 20:10 
tantek hmm - KevinMarks that issue doesn't make much sense
# 20:10 
tantek examples are both incorrect and theoretical
# 20:11 
tantek so it is "premature to say we are going with the tenor of"
# 20:12 
KevinMarks "trim leading/trailing" is consensus so far
# 20:12 
tantek sorry - example is not broken, just confused where the p-name was
# 20:13 
tantek KevinMarks: and white-space collapsing in general for implied names
# 20:14 
KevinMarks I can switch to these examples if you prefer
# 20:14 
KevinMarks http://microformats.org/wiki/hReview-aggregate
# 20:15 
tantek would be preferable since I don't even know of any h-review-aggregate posts yet
# 20:16 
glennjones I have added it as an option to my parser just in case we move to trimming all property text at some stage, but at the moment I am trying to get the tests to match the current rules
# 20:18 
glennjones The tests are a real mixture of old real-world examples, a few taken from wiki examples and a handful made up to help parser authors
# 20:20 
glennjones Happy to see them replaced over time with indieweb patterns of use, but I need to get the current set in working order first
KartikPrabhu joined the channel
# 20:26 
tantek glennjones - get the current set in working order first makes sense
# 20:30 
glennjones KevinMarks: are you OK with me updating your changes? I know you must have put a lot of time into the pull request.
# 20:31 
KevinMarks Is there any pushback from anyone on leading/trailing stripping for p- properties?
# 20:31 
KevinMarks 'cos I'd rather make that 1 line change to the parsing spec at least
# 20:32 
tantek did you implement that?
# 20:32 
tantek and what about implied-name whitespace normalizing?
# 20:32 
KevinMarks yes, implemented in mf2py
# 20:33 
KevinMarks glenn has implied whitespace behind a flag I think?
# 20:33 
KevinMarks mf2py doesn't have implied whitespace normalising yet
# 20:33 
tantek glenn what is your opinion of trim leading/trailing whitespace on p-* properties?
# 20:34 
tantek KevinMarks: right now we don't really have consensus - we have lack of opinions
# 20:34 
glennjones Yes, I have it for not just p-* but all properties - but yes, behind a flag, switched off by default
# 20:34 
tantek the only people agreeing explicitly are you (proposer of issue/solution), and me (spec editor)
# 20:34 
KevinMarks glenn didn't vote, but agreed in prose
# 20:35 
tantek would prefer to have at least one more parser dev opinion
# 20:35 
tantek interesting re: for all properties
# 20:35 
KevinMarks hm
# 20:35 
tantek glennjones, could you explain why "all properties" is better than just for p-* ? (honestly curious)
# 20:35 
KevinMarks I think I wanted it for dt too
# 20:35 
tantek or is it easier to implement? or?
# 20:36 
tantek KevinMarks: proposal in the wiki that you and I +1 is " keep as is but have mf2 parser trim leading/trailing whitespace" which does not limit to p-* properties
# 20:38 
kylewm is there ever any such thing like "first name: <span class="p-name">Kyle </span> last name: <span class="p-name">Mahan</span>"
# 20:38 
tantek kylewm: hopefully not - for many reasons
# 20:38 
kylewm OK
# 20:39 
tantek e.g. should be more like "given name: <span class="p-given-name">Kyle </span> family name: <span class="p-family-name">Mahan</span>"
# 20:39 
kylewm I can +1 stripping leading/trailing whitespace for all properties, but I agree with Kevin's comment that ot dpesm
# 20:39 
kylewm bah
# 20:39 
kylewm that it doesn't help with our test case failures
# 20:39 
KevinMarks http://www.unmung.com/?html=+%3Cspan+class%3D%22h-test%22%3E%3Cspan+class%3D%22p-name%22%3EKyle+%3C%2Fspan%3E+last+name%3A+%3Cspan+class%3D%22p-name%22%3EMahan%3C%2Fspan%3E%3C%2Fspan%3E&pretty=on
# 20:39 
kylewm all properties except maybe e-content
# 20:39 
kylewm e-*
# 20:39 
tantek well if we can at least get that resolved we can move forward with spec edit and implementation updates
# 20:39 
tantek why not e-* content as well?
# 20:40 
glennjones I think I did it across all properties just to safeguard against adding leading/trailing single spaces by mistake. I'm not sure I have worked out all the possible impacts, but it is how my parser has worked for the last two years without any noticeable problem
# 20:40 
tantek all the same arguments apply - in terms of having to put markup on separate lines from the thing being marked up
# 20:40 
KevinMarks right, the <pre> case in e-content would be inside it
# 20:40 
kylewm <pre> -- that's what i meant rather than e-content
# 20:40 
kylewm thanks KevinMarks
# 20:41 
KevinMarks if it's in <pre> it shouldn't be parsed as html
KartikPrabhu joined the channel
# 20:42 
KevinMarks right?
# 20:42 
glennjones Would you trim the value part of e-* output?
# 20:42 
KevinMarks I meant if you had <div class="e-content">\n<pre>\n\nhello\n</pre></div> you are ok losing the \n before the <pre>
# 20:44 
tantek KevinMarks: more likely that that \n is extra from the publishing system that you have no control over
# 20:44 
KevinMarks yes exactly
elux joined the channel
# 20:44 
tantek thus you want to get rid of it
# 20:44 
tantek since it's not from the author
# 20:44 
tantek if you really must include an extra return like that, you can
# 20:45 
tantek so I'm leaning towards glennjones position of all properties
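Applied to e-* properties, trimming "all properties" would touch only the outer ends of both the html and value parts, which is why KevinMarks's <pre> case stays intact. A minimal Python sketch, assuming the same HTML whitespace set as above; the helper name is hypothetical, and the text value passed in is an assumption rather than real parser output.

```python
from typing import Dict

HTML_WS = " \t\n\f\r"  # assumption: HTML ASCII whitespace

def trim_e_property(inner_html: str, text_value: str) -> Dict[str, str]:
    """Build an mf2-style e-* value with only the outer whitespace trimmed (hypothetical helper)."""
    # Whitespace inside the markup, e.g. within <pre>, is left untouched.
    return {"html": inner_html.strip(HTML_WS), "value": text_value.strip(HTML_WS)}

# Inner HTML of KevinMarks's example <div class="e-content">; the text value is assumed.
result = trim_e_property("\n<pre>\n\nhello\n</pre>", "\n\nhello\n")
print(result)  # {'html': '<pre>\n\nhello\n</pre>', 'value': 'hello'}
```

Only the \n that the publishing system put before <pre> is lost; the author's blank line inside the <pre> survives.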
# 20:45 
kylewm edited /microformats2-parsing-issues (+205) "/* whitespace collapsing revisited */ +1 leading/trailing"
# 20:45 
KevinMarks I think so
# 20:45 
tantek bbiab
# 20:46 
kylewm KevinMarks: and did you see my last comment on https://github.com/tommorris/mf2py/pull/46#issuecomment-109860488
# 20:47 
KevinMarks maybe I was accidentally getting the crappy built-in parser
# 20:48 
KevinMarks are you running on 2.x or 3.x?
# 20:48 
KevinMarks oh you said 2.7
# 20:48 
kylewm i tried both
# 20:48 
kylewm i got fewer failures on 3.4, but the number of failures was consistent with/without unmung
# 20:49 
kylewm (i'm intrigued/concerned why 3.4 is getting ~10 fewer failing tests than 2.7, but haven't looked into it yet)
# 20:49 
KevinMarks I'll add a check for which parser is being used, as I think we want html.parser to DIAF
KartikPrabhu and glennjones_ joined the channel
# 20:54 
KevinMarks there isn't a "UNICODE" feature in bs4
# 20:54 
KevinMarks can I tell which is being used?
# 20:59 
kylewm which parser?
# 20:59 
kylewm um, not that I know of
# 20:59 
kylewm they go to great pains to hide that from you
# 21:04 
kylewm KevinMarks: i don't think html.parser should have a different result unicode-wise though, does it??
# 21:07 
KevinMarks well, empirically i was seeing a different result
# 21:11 
kylewm with python 2.7.6 and html.parser I get 64 failures with and without unmung :/
# 21:11 
kylewm and parsed strings do appear to be unicode
# 21:12 
kylewm what is the different result you were getting?
# 21:12 
KevinMarks if I change unmung(s) to return s I get 193 failures
# 21:13 
KevinMarks AssertionError: value='h-card'; type=<type 'str'> etc
# 21:13 
kylewm weeeeird
# 21:14 
kylewm what python version?
tantek joined the channel
# 21:16 
KevinMarks 2.7.6
# 21:16 
KevinMarks how do I ask BeautifulSoup what it is doing parser wise?
# 21:20 
KevinMarks oh weird
netweb joined the channel
# 21:20 
KevinMarks if I invoke it with lxml I get those errors
# 21:22 
kylewm oh huh, yeah it's possible i was using lxml ... really thought i was not though
# 21:22 
kylewm well... we can pick a parser to go with for the test cases without limiting the parsers that you can use in general use, maybe
# 21:22 
KevinMarks no, it's only if I use lxml that I get the errors
# 21:23 
KevinMarks with html.parser and html5lib I don't
glennjones joined the channel
# 21:23 
kylewm that's what i mean, i thought i was getting them with html5lib, but it's possible i had lxml on the path and didn't realize it
# 21:23 
kylewm and now i can't reproduce with html5lib
# 21:24 
kylewm bbiab
# 21:25 
KevinMarks I mean I get missing unicode with lxml and not with html.parser or html5lib
# 21:25 
KevinMarks so i wonder if I have some bad lxml version
# 21:29 
KevinMarks no, lxml is doing it on purpose: "In Python 2, lxml's API returns byte strings for plain ASCII text values, be it for tag names or text in Element content."
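The fix KevinMarks settles on (replacing unmung(s) with unicode(s)) amounts to coercing every string in the parsed output to text before comparing in the tests. A Python 3 analogue, where bytes plays the role of Python 2's str and str the role of unicode, might look like this; the helper name is hypothetical and UTF-8 decoding is an assumption (the byte strings lxml returns in this case are plain ASCII).

```python
from typing import Any

def to_text(obj: Any) -> Any:
    """Recursively decode bytes to str in parsed output (hypothetical test helper)."""
    if isinstance(obj, bytes):
        # Assumption: UTF-8; the byte strings lxml returns here are plain ASCII.
        return obj.decode("utf-8")
    if isinstance(obj, dict):
        return {to_text(k): to_text(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_text(v) for v in obj]
    return obj

parsed = {"type": [b"h-card"], "properties": {"name": [b"John Doe"]}}
print(to_text(parsed))  # {'type': ['h-card'], 'properties': {'name': ['John Doe']}}
```

Normalizing in the test harness rather than in the parser keeps lxml usable while making its output compare equal to html5lib's and html.parser's.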
# 21:32 
tantek in general if there are problems with using html5lib we really should work on resolving them rather than avoiding using html5lib
# 21:32 
KevinMarks no the problem is with lxml
# 21:33 
kylewm sorry i was being dense, KevinMarks
# 21:33 
KevinMarks if I replace unmung(s) with unicode(s) I get happy results
# 21:34 
KevinMarks I do get different numbers of fails with the 3 parsers though
# 21:34 
kylewm ok same results here with lxml
# 21:34 
KevinMarks 58 vs 61 vs 64
# 21:35 
KevinMarks which says to me we should test with all 3 if possible
# 21:42 
kylewm sounds right, sadly
KartikPrabhu joined the channel
# 21:44 
KevinMarks apple adopts schema and ogp https://developer.apple.com/library/prerelease/ios/releasenotes/General/WhatsNewIniOS/Articles/iOS9.html#//apple_ref/doc/uid/TP40016198-DontLinkElementID_2
glennjones joined the channel
# 21:48 
tantek KevinMarks: interesting that they're looking at such structured information - that's perhaps a good chance to get them to look at adding microformats2 support
# 21:48 
tantek do we know anyone there that would be good to ask about that?
# 21:48 
KevinMarks should we ask hober?
# 21:49 
KevinMarks not sure who is on the OS side
# 21:49 
tantek hober would be a good start
# 21:50 
tantek also interesting that their example used recipe
# 21:54 
tantek and omits instructions
kez joined the channel
# 22:23 
@technotipz Ultimate Guide to Microformats: Reference and Examples http://www.technotipz.com/tips-and-tricks/ultimate-guide-to-microformats-reference-and-examples/?utm_source=ReviveOldPost&utm_medium=social&utm_campaign=ReviveOldPost  #tips (twitter.com/_/status/608036625063550976)
# 22:25 
KevinMarks modifies to http://www.technotipz.com/tips-and-tricks/ultimate-guide-to-microformats-reference-and-examples/?utm_source=indieweb&utm_medium=irc&utm_campaign=ReviveOldPost
# 22:40 
kylewm yikes, article from 2010
csarven joined the channel
# 23:08 
tantek Twitter often resurfaces old articles
tantek joined the channel
# 23:50 
KevinMarks so html.parser fails 5 built-in tests
# 23:50 
KevinMarks and is doing a lot of its own bad whitespace-run compression
# 23:52 
KevinMarks it seems to compress whitespace to a single char, but preserve an \n if there is one
# 23:54 
KevinMarks so we should prefer the other 2
# 23:54 
KevinMarks hm, lxml does something similar
# 23:57 
KevinMarks both of them turn https://github.com/microformats/tests/blob/master/tests/microformats-v1/hcard/format.html into
# 23:57 
KevinMarks u'name': [u'John\nDoe']
# 23:57 
KevinMarks as if they are indent stripping
# 23:58 
KevinMarks which makes some sense I suppose
# 23:58 
KevinMarks that gives the variation between html5lib and lxml
# 23:59 
KevinMarks html.parser also fails test_parser.test_multiple_root_classnames