#microformats 2016-03-13

2016-03-13 UTC
emmak joined the channel
# 02:00 
@SanRafaelTech #wordpress #seo #sitemap #tags #hierarchy Markup: http://microformats.org https://wordpress.org/support/topic/sitemap-still-pending-in-google (twitter.com/_/status/708835141787869184)
KartikPrabhu joined the channel
# 06:45 
@KitaitiMakoto So user_url uses a URI Template, huh. The sample is defunkt, so I wonder if it's his work. He was into Microformats and the like too, so he seems like he'd enjoy this kind of thing. Feeds | GitHub Developer Guide https://developer.github.com/v3/activity/feeds/ (twitter.com/_/status/708906686463913985)
# 10:30 
@reese_transeo Reviews: Author HReview Plugin #bestseotools http://graywolfseo.com/reviews/author-hreview-plugin/ (twitter.com/_/status/708963336453541888)
barnabywalters and warehouse13 joined the channel
# 18:25 
barnabywalters I’m not a big fan of http://microformats.org/wiki/microformats2-parsing-issues#exclude_style_elements_before_parsing
# 18:25 
Loqi barnabywalters: tantek left you a message on 8/18 at 9:19am: one more (hopefully minor) mf2 parser issue with proposal, implied date for dt properties: http://microformats.org/wiki/microformats2-parsing-issues#implied_date_for_dt_properties_both_mf2_and_backcompat please comment!
# 18:25 
Loqi barnabywalters: tantek left you a message on 9/18 at 4:26pm: all resolved issues with implementation(s) incorporated into the microformats2 parsing spec - take a look, see if you have any qs: http://microformats.org/wiki/index.php?title=microformats2-parsing&diff=65229&oldid=65090&rcid=101616
# 18:25 
Loqi barnabywalters: tantek left you a message on 11/24 at 5:39pm: please see new microformats2 parsing issue and propose resolution - we have a decent consensus, lacking objections I'm going to edit the spec accordingly in the next few days http://microformats.org/wiki/microformats2-parsing-issues#uf2_children_on_backcompat_properties
# 18:25 
barnabywalters excluding the content of these elements from the plaintext value makes perfect sense
# 18:26 
barnabywalters but for me, removing it from the HTML value shouldn’t be something the generic parser does
# 18:27 
barnabywalters that’s a job for a sanitization stage, which is going to be different depending on the use case of the consumer
# 18:28 
barnabywalters e.g. the <style> element could be used to provide per-post custom styling (maybe using the scoped attribute)
# 18:29 
barnabywalters and <script> could be used for publishing interactive HTML documents
# 18:29 
barnabywalters (I think I actually did this once, will try to find the example)
# 18:30 
barnabywalters as aaronpk says, people who don’t want these effects, or the potential attacks which could result from including <style> and <script> elements, need to remove them anyway
# 18:30 
barnabywalters so why make the parser do that, and prevent people who do want to get the original HTML content from being able to do so?
# 18:32 
aaronpk hm, maybe it does make sense to keep them for e-* properties, but remove them from all plaintext values
# 18:34 
barnabywalters that would be my suggestion
# 18:34 
barnabywalters keep plaintext values as slim and as user-focused as possible, as the big reason they’re there is as an easy alternative to processing the HTML content
# 18:35 
barnabywalters and then leave the HTML content as it is
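A minimal Python sketch of the split suggested above (and later written up as Proposal 2 on the wiki): plaintext values drop the contents of <script> and <style>, while an e-* property keeps its HTML untouched. The names `plaintext` and `property_value` are hypothetical, and only the text-content case is modelled here; real u-* and dt-* parsing usually reads attributes such as href or datetime first.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Text inside <script>/<style> arrives here too; ignore it.
        if not self.skip_depth:
            self.parts.append(data)


def plaintext(inner_html: str) -> str:
    parser = TextExtractor()
    parser.feed(inner_html)
    parser.close()
    return "".join(parser.parts).strip()


def property_value(prefix: str, inner_html: str):
    """e-* keeps the HTML as-is; every plaintext value drops script/style text."""
    if prefix == "e":
        return {"html": inner_html, "value": plaintext(inner_html)}
    return plaintext(inner_html)
```

For `<p>Hi <style>p{color:red}</style>there</p>`, the e-* result keeps the full markup in `html` while `value` is just "Hi there"; consumers who want to strip the <style> from `html` too can do so in their own sanitization stage.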
# 18:35 
barnabywalters of course there are always going to be limits
# 18:35 
barnabywalters e.g. do we parse all CSS in <style> elements and resolve relative URLs, as we now do in srcset attributes? (thanks to voxpelli)
# 18:35 
aaronpk heh
# 18:35 
aaronpk seems reasonable
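For reference, resolving relative URLs in a srcset attribute (the php-mf2 behaviour credited to voxpelli above) amounts to resolving each image candidate against the document's base URL. This is a simplified Python sketch, not php-mf2's code: it splits candidates on commas, so URLs that themselves contain commas are not handled the way the full HTML srcset parsing algorithm would handle them.

```python
from urllib.parse import urljoin


def resolve_srcset(srcset: str, base_url: str) -> str:
    """Resolve every candidate URL in a srcset value against base_url.

    Simplification: candidates are split on commas, and each candidate is
    "URL [descriptor]"; the descriptor (e.g. "2x", "640w") is kept verbatim.
    """
    resolved = []
    for candidate in srcset.split(","):
        parts = candidate.strip().split(None, 1)
        if not parts:
            continue  # skip empty candidates such as a trailing comma
        url = urljoin(base_url, parts[0])
        resolved.append(" ".join([url] + parts[1:]))
    return ", ".join(resolved)
```

Doing the same for CSS inside <style> would mean parsing stylesheets for url(...) references, which is the extra cost barnabywalters is pointing at.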
# 18:36 
barnabywalters but generally I’d rather leave the option open. It’s easy enough to remove these elements if they’re unwanted
# 18:36 
barnabywalters and as you pointed out, for reposting, sanitisation is obligatory anyway
# 18:36 
barnabywalters so we’re not even saving anyone any work by removing style and script at the parsing stage
# 18:37 
barnabywalters there are hundreds of other attack vectors people need sanitisation stages to protect against
# 18:37 
aaronpk not sure if you saw, but I did a bunch of work on this which ended up at https://xray.p3k.io http://github.com/aaronpk/XRay
# 18:37 
aaronpk it does the sanitization of the HTML values and allows a limited subset of tags
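An allowlist sanitizer of the kind aaronpk describes can be sketched with Python's standard `html.parser`. The tag and attribute lists below are hypothetical and are not XRay's actual configuration. This is an illustration rather than a hardened sanitizer, so a maintained sanitization library is the better choice in production.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; XRay's real list may differ.
ALLOWED_TAGS = {"a", "abbr", "b", "blockquote", "br", "code", "em", "i",
                "img", "li", "ol", "p", "pre", "strong", "ul"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
URL_ATTRS = {"href", "src"}
SAFE_SCHEMES = {"", "http", "https"}  # "" allows relative URLs
VOID_TAGS = {"br", "img"}
DROP_CONTENT = {"script", "style"}    # drop these tags *and* their contents


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.drop_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.drop_depth += 1
            return
        if self.drop_depth or tag not in ALLOWED_TAGS:
            return  # disallowed tag: drop the tag itself but keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name in URL_ATTRS and urlsplit(value.strip()).scheme not in SAFE_SCHEMES:
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.drop_depth = max(0, self.drop_depth - 1)
        elif not self.drop_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.drop_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    s = AllowlistSanitizer()
    s.feed(html)
    s.close()
    return "".join(s.out)
```

Because it runs on the parser's output rather than inside the parser, each consumer can pick its own allowlist, which is the separation barnabywalters argues for.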
# 18:37 
barnabywalters aha, is this jf2? need to familiarise myself with that
# 18:38 
aaronpk it isn't officially, but is my experiment with that
# 18:38 
aaronpk essentially a subset of mf2 which is vocabulary-aware, designed to be easier to consume when you know what it is you're consuming
# 18:39 
barnabywalters cool, a declarative approach to what php-mf2-cleaner was trying to solve
# 18:39 
aaronpk yeah you could say that
# 18:40 
barnabywalters and in much the same way is presumably a sanitisation stage, applied to the results of parsing using a generic parser?
# 18:40 
aaronpk yeah after the mf2 parsing, xray goes into any HTML values and does sanitization
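The post-parsing step aaronpk describes needs to find every HTML value in the parser output. Below is a minimal Python sketch of that traversal over mf2 JSON. The `sanitize` callable is a stand-in for whatever sanitizer a consumer chooses. The sketch assumes HTML values appear as {"html": ..., "value": ...} objects, which is how e-* properties are represented.

```python
from typing import Any, Callable


def sanitize_html_values(node: Any, sanitize: Callable[[str], str]) -> Any:
    """Return a copy of an mf2 structure with every "html" string sanitized.

    Walks lists and dicts recursively (items, properties, nested
    microformats, children); all other values are returned unchanged.
    """
    if isinstance(node, list):
        return [sanitize_html_values(v, sanitize) for v in node]
    if isinstance(node, dict):
        return {
            # A property literally named "html" holds a list, not a str,
            # so only e-* style {"html": "..."} values are sanitized.
            k: sanitize(v) if k == "html" and isinstance(v, str)
            else sanitize_html_values(v, sanitize)
            for k, v in node.items()
        }
    return node
```

Returning a copy leaves the raw parser output available, so a consumer can still get at the original HTML if it needs to.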
# 18:41 
barnabywalters I’d like to get tantek’s input on my objection, as well as anyone actively working on mf2 parsers. I’ll try to express it more concisely and add it to the wiki page
# 18:42 
aaronpk yeah, worth noting that on the wiki vote section
# 18:43 
aaronpk oh and XRay does the whole "is the name a prefix of the content" thing to un-imply the p-name property
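The "is the name a prefix of the content" check can be sketched as follows. It assumes whitespace-normalised string comparison against the first content value. This is a hypothetical reconstruction of the idea, not XRay's code, and XRay's actual comparison rules may differ.

```python
def _normalise(text: str) -> str:
    """Collapse runs of whitespace so formatting differences don't matter."""
    return " ".join(text.split())


def drop_redundant_name(properties: dict) -> dict:
    """Drop "name" when it is only a prefix of the content text.

    An implied p-name on a note usually duplicates the start of the
    content, so it carries no extra information.
    """
    names = properties.get("name") or []
    contents = properties.get("content") or []
    if not names or not contents or not isinstance(names[0], str):
        return properties
    content = contents[0]
    content_text = content.get("value", "") if isinstance(content, dict) else str(content)
    name = _normalise(names[0])
    if name and _normalise(content_text).startswith(name):
        return {k: v for k, v in properties.items() if k != "name"}
    return properties
```

An article with a real title ("My Article" over a body starting "Body") keeps its name, while a note whose implied name is just its opening text loses it.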
# 18:45 
aaronpk one particularly clever thing XRay does when sanitizing HTML is it removes all class names from HTML attributes except mf2 classes :D
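Keeping only microformats2 class names in a sanitized class attribute can be sketched with a regular expression. The pattern below only roughly approximates the mf2 class-name syntax: a prefix (h, p, u, dt, e) followed by lowercase, hyphen-separated words. It deliberately rejects camelCase names like "u-camelCase", in line with the parsing issue about ignoring them. It is an assumption, not XRay's implementation.

```python
import re

# Rough approximation of mf2 class names: h-*, p-*, u-*, dt-*, e-*
# followed by lowercase/digit words separated by hyphens.
MF2_CLASS = re.compile(r"(h|p|u|dt|e)(-[a-z0-9]+)+")


def filter_classes(class_attr: str) -> str:
    """Return the class attribute value with all non-mf2 classes removed."""
    return " ".join(c for c in class_attr.split() if MF2_CLASS.fullmatch(c))
```

Presentational classes such as "post" or "featured" are dropped, so republished HTML can't pick up the consumer site's own styles, while the mf2 structure survives for later re-parsing.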
# 18:47 
barnabywalters edited /microformats2-parsing-issues (+590) "/* exclude style elements before parsing */ added agreement to removing style and script contents from plaintext properties, objection to removing them from HTML properties" (view diff)
# 18:48 
barnabywalters sounds like xray is doing a very similar job to the mf2 post-processing code I wrote for shrewdness
# 18:49 
aaronpk probably
# 18:49 
aaronpk i'm currently using it for webmention.io, and will soon be using it for my reader
# 18:49 
barnabywalters that produced a more or less flat dictionary of reliable properties based on extensive normalisation and post-processing of raw mf2 structures
# 18:49 
barnabywalters https://github.com/barnabywalters/shrewdness/blob/master/src/app.php#L90
# 18:50 
barnabywalters using your comments library as a basis :)
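As a rough illustration of flattening raw mf2 into a "more or less flat dictionary", here is a hypothetical Python normalisation. It is not shrewdness's PHP code (linked above) or XRay's output format. It keeps only the first value of each property, turns {"html", "value"} objects into their HTML, and flattens nested microformats recursively. This is deliberately lossy; multi-valued properties like category would need their own rule in practice.

```python
def flatten(item: dict) -> dict:
    """Collapse an mf2 item into a flat dict (hypothetical, lossy normalisation).

    - "type" becomes the bare vocabulary name, e.g. "entry" for "h-entry"
    - each property keeps only its first value
    - {"html", "value"} objects become their html string
    - nested microformats are flattened recursively
    """
    kind = item["type"][0]
    flat = {"type": kind[2:] if kind.startswith("h-") else kind}
    for name, values in item.get("properties", {}).items():
        if not values:
            continue
        first = values[0]
        if isinstance(first, dict) and "type" in first:
            flat[name] = flatten(first)
        elif isinstance(first, dict) and "html" in first:
            flat[name] = first["html"]
        else:
            flat[name] = first
    return flat
```

An h-entry with a nested h-card author comes out as {"type": "entry", ..., "author": {"type": "card", "name": ..., "url": ...}}, which is the kind of vocabulary-aware shape aaronpk describes for jf2-style output.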
# 18:50 
aaronpk yeah. I wanted to do that in a self-contained environment so that I could write thorough unit tests for it
# 18:50 
aaronpk also wanted to be able to use it with webmention.io which is Ruby, so I needed it as an API
# 18:51 
aaronpk cause I really didn't want to re-implement it in Ruby too :)
# 18:51 
barnabywalters shrewdness does a bunch of things like fetching other URLs and seeing what’s there, so I started out writing it as part of shrewdness
# 18:51 
barnabywalters the intention was always to abstract it out as a separate library for mf2 postprocessing when it was “finished”
# 18:51 
barnabywalters rather than work on two things at once
# 18:51 
aaronpk always a tough call :)
# 18:51 
barnabywalters yeah
# 18:52 
barnabywalters with taproot I found I was bogged down by having so many bits of functionality abstracted out as libraries
# 18:52 
barnabywalters especially as nobody else even uses the libraries ;)
# 18:52 
barnabywalters so with shrewdness I wanted to avoid that from the beginning as much as possible
# 18:52 
aaronpk interesting
# 18:53 
barnabywalters now I realise I could have just written the extra libraries hard-coded into taproot for quick editing, and copied them into standalone libraries separately
# 18:54 
barnabywalters (most of the things I’m talking about are pretty self contained, just one file and one test file)
# 18:54 
barnabywalters so I might do that at some point
# 18:54 
barnabywalters but even with this ease-of-development approach for shrewdness I ended up losing motivation to work on it a lot… until now ;)
# 18:55 
aaronpk oh? what's new?
# 18:56 
barnabywalters nothing “new” as such, I just finally got some motivation to work on personal programming projects again
# 18:56 
aaronpk awesome
# 19:01 
aaronpk whoa, I forgot I made that PR to remove the script and style tags
# 19:03 
barnabywalters heh yeah it was probably some time ago
# 19:03 
barnabywalters thanks for helping look after php-mf2 during my random absence, btw
# 19:03 
aaronpk I got way more involved with it than I intended to :)
# 19:04 
aaronpk but it was out of necessity, since I wanted things for XRay
tantek, barnabywalters and KartikPrabhu joined the channel
# 20:20 
barnabywalters tantek: did you see my comments on http://microformats.org/wiki/microformats2-parsing-issues#exclude_style_elements_before_parsing? would be interested to hear your input
# 20:20 
tantek looks
# 20:21 
tantek barnabywalters: "publishing posts with embedded per-post styling" is done via a property, not by an embedded <style>
# 20:21 
barnabywalters tantek: cool, I missed out on that in my absence, how’s it done?
# 20:22 
tantek giving you some discoverable links/phrases in #Indiewebcamp now :)
# 20:22 
barnabywalters looking
# 20:22 
barnabywalters nice
# 20:23 
tantek "publishing interactive HTML documents with embedded javascript" is interesting, yet "document" is a bit of a misnomer there
# 20:23 
tantek as soon as it's non-trivially interactive it's more of an application than a document
# 20:23 
tantek and *that* is a different enough use-case from a post to be worth documenting
# 20:23 
tantek (separately)
# 20:23 
tantek once people start publishing them
# 20:23 
tantek although...
# 20:24 
barnabywalters I think I did this at least once, I can’t find an example quickly
# 20:24 
aaronpk i published an article which had an inline interactive form
# 20:24 
aaronpk used javascript for some things
# 20:25 
barnabywalters but without evidence that leaving style and script elements in the raw HTML content is a major burden for microformats consumers, I’m against making changes which arbitrarily remove information
# 20:25 
tantek it's not arbitrary
# 20:25 
barnabywalters in the plaintext properties it’s not arbitrary, because this content is causing legitimate problems
# 20:25 
tantek the burden of evidence is always on use-cases to keep things, not on reasons "not to remove" them
# 20:26 
tantek https://indiewebcamp.com/custom_post_script solves the "publishing interactive HTML documents with embedded javascript" use-cases we've found so far
# 20:26 
barnabywalters in html content it’s removed anyway by sanitization stages, and I see no reason for the parser to do sanitization work
# 20:26 
barnabywalters it will inevitably do an incomplete job of it
# 20:27 
barnabywalters I see the parser as a generic HTML consumption tool rather than something focused on specific use cases
# 20:27 
barnabywalters and the exact degree of sanitization depends on the use case
# 20:27 
tantek "sanitization" isn't the purpose here - thus it's a strawman to say "do an incomplete job of it"
# 20:28 
tantek the purpose is to remove documented cases of noise
# 20:28 
tantek which we are doing, one at a time, as we document them
# 20:28 
barnabywalters tantek: then what, specifically, is the purpose of removing this content from the raw HTML values?
# 20:28 
barnabywalters the existing cases point to noise in the plaintext properties
# 20:28 
barnabywalters and I completely agree that contents of script and style elements should be removed from those
# 20:29 
tantek ok that is a smaller change that still satisfies the documented use-cases
# 20:29 
barnabywalters where are the documented examples of <script> and <style> contents causing problems in raw HTML content structures?
# 20:29 
barnabywalters yes, hence I am more in favour of it. Satisfy the documented use cases whilst removing the minimum amount of information
# 20:31 
tantek seems reasonable. if we get any problems later with those elements in e-* properties, we can consider that separately
# 20:33 
barnabywalters yep, agreed
# 20:33 
tantek however, I think we should still remove them from *all* the other property prefixes (u-* dt-*)
# 20:33 
barnabywalters yep, absolutely, everything which is plaintext
# 20:33 
barnabywalters including the value property in html/value structs (do we have a good name for those?)
# 20:34 
barnabywalters everything plaintext is safe, the html property is where people go if they want a hard time ;)
# 20:36 
barnabywalters aaronpk: shall I go ahead and adapt your php-mf2 PR to remove <script> and <style> from everything apart from html properties, and merge? Then I can publish a new release this evening
# 20:37 
tantek edited /microformats2-parsing-issues (+444) "/* exclude style elements before parsing */ Proposal 2, e-* properties HTML values preserve all markup, others drop style and script elements and their content" (view diff)
# 20:38 
tantek barnabywalters, aaronpk, kylewm added Proposal 2 with barnabywalters suggested refinement. Please check and +1/0/-1 accordingly: http://microformats.org/wiki/microformats2-parsing-issues#exclude_style_elements_before_parsing
# 20:39 
barnabywalters cool, I’ll respond there and wait for consensus from other implementers before implementing in php-mf2
# 20:39 
barnabywalters edited /microformats2-parsing-issues (+44) "/* exclude style elements before parsing */" (view diff)
# 20:39 
tantek I think if aaronpk is good with it you can proceed. I don't expect kylewm to object to a smaller change as a first step
# 20:40 
tantek barnabywalters: and thank you for going through the issues! we have a bunch more that need implementer feedback
# 20:42 
barnabywalters edited /microformats2-parsing-issues (+170) "/* ignore u-camelCase properties */" (view diff)
# 20:42 
barnabywalters tantek: np, looking through and responding accordingly now
# 20:44 
barnabywalters edited /microformats2-parsing-issues (+100) "/* use poster if no src on video for u props */" (view diff)
barnabywalters and tantek joined the channel
# 20:56 
barnabywalters edited /microformats2-parsing-issues (+104) "/* uf2 children on backcompat properties */" (view diff)
# 20:58 
barnabywalters edited /microformats2-parsing-issues (+262) "/* default generated HTML */" (view diff)
barnabywalters joined the channel
# 21:16 
aaronpk edited /microformats2-parsing-issues (+125) "/* exclude style elements before parsing */" (view diff)
barnabywalters joined the channel
# 21:24 
aaronpk barnabywalters: sorry, was afk. thanks and yes feel free to update my PR
# 21:25 
barnabywalters me too, cool, will do when kylewm chimes in on the issue
# 21:29 
tantek also, thanks to a lot of hard work by glennjones, his microformatshiv parser has landed in Firefox and is now in the Firefox Developer Release if you want to try it out
# 21:30 
aaronpk whoa!
# 21:30 
tantek https://www.mozilla.org/en-US/firefox/developer/
# 21:30 
tantek (as of the release made this Tuesday)
# 21:30 
tantek IndieWebCamp participant Shane Caraveo (@mixedpuppy) helped work the parser into the code and handle all the landing / source control / test case details
# 21:31 
aaronpk that's amazing news!
# 21:31 
tantek and Operator add-on creator Mike Kaply is submitting patches to improve microformatshiv as well
# 21:31 
tantek aaronpk: indeed!
# 21:31 
tantek I'm working on ways to demonstrate this functionality at a user level
# 21:31 
tantek so I'm not making any big announcements until we figure that out
# 21:32 
tantek e.g. - so how I can tell that there's an mf2 parser in FF?
# 21:32 
tantek s/I can/can I
# 21:32 
Loqi tantek meant to say: e.g. - so how can I tell that there's an mf2 parser in FF?
# 21:33 
tantek aaronpk: anyway, there's also tons more awesome DevTools improvements in this version of FF Dev Edition, so you might find it interesting to check out for webdev in general
barnabywalters joined the channel
# 22:36 
barnabywalters edited /microformats2-parsing-issues (+1) "/* exclude style elements before parsing */" (view diff)
tantek joined the channel
# 23:40 
kylewm edited /microformats2-parsing-issues (+301) "/* exclude style elements before parsing */ +1 barnabywalters narrowed proposal for <script> and <style> tags" (view diff)
# 23:40 
kylewm edited /microformats2-parsing-issues (-4) "/* exclude style elements before parsing */ grammar" (view diff)