2016-03-13 UTC
emmak joined the channel
KartikPrabhu joined the channel
barnabywalters and warehouse13 joined the channel
# 18:25 barnabywalters excluding the content of these elements from the plaintext value makes perfect sense
# 18:26 barnabywalters but for me, removing it from the HTML value shouldn’t be something the generic parser does
# 18:27 barnabywalters that’s a job for a sanitization stage, which is going to be different depending on the use case of the consumer
# 18:28 barnabywalters e.g. the <style> element could be used to provide per-post custom styling (maybe using the scoped attribute)
# 18:30 barnabywalters as aaronpk says, people who don’t want these effects, or the potential attacks which could result from including <style> and <script> elements, need to remove them anyway
# 18:30 barnabywalters so why make the parser do that, and prevent people who do want to get the original HTML content from being able to do so?
# 18:32 aaronpk hm, maybe it does make sense to keep them for e-* properties, but remove them from all plaintext values
# 18:34 barnabywalters keep plaintext values as slim and as user-focused as possible, as the big reason they’re there is as an easy alternative to processing the HTML content
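A minimal sketch of the split being discussed: the plaintext `value` drops the contents of `<script>` and `<style>`, while the `html` of an e-* property keeps the original markup. This is an illustrative assumption built on Python's standard `html.parser`, not code from php-mf2 or any other mf2 parser, and it ignores other plaintext rules (such as `img` alt text and whitespace normalisation). `plaintext_value` and `e_property` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements whose text content is treated as noise in plaintext values.
SKIPPED = {"script", "style"}


class PlaintextExtractor(HTMLParser):
    """Collects text content, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def plaintext_value(inner_html):
    """Hypothetical helper: text content minus <script>/<style> contents."""
    parser = PlaintextExtractor()
    parser.feed(inner_html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


def e_property(inner_html):
    """Hypothetical e-* value: html is left untouched, only value is cleaned."""
    return {"html": inner_html, "value": plaintext_value(inner_html)}
```

Leaving `html` untouched matches the position that removing these elements from raw HTML belongs to a consumer's sanitization stage rather than to the generic parser.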
# 18:35 barnabywalters e.g. do we parse all CSS in <style> elements and resolve relative URLs, as we now do in srcset attributes (thanks to voxpelli)?
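For context on the srcset comparison, here is a rough sketch of resolving relative URLs in a `srcset` attribute against the page URL. It is an assumption for illustration only, not voxpelli's implementation. It splits candidates on commas, so URLs that themselves contain commas are not handled, which the full HTML `srcset` parsing algorithm does cover. `resolve_srcset` is a hypothetical name.

```python
from urllib.parse import urljoin


def resolve_srcset(srcset, base_url):
    """Resolve each candidate URL in a srcset value against base_url.

    Simplified: candidates are split on commas, so URLs containing
    commas are not supported.
    """
    resolved = []
    for candidate in srcset.split(","):
        candidate = candidate.strip()
        if not candidate:
            continue
        # A candidate is a URL optionally followed by a descriptor like "2x" or "480w".
        url, _, descriptor = candidate.partition(" ")
        absolute = urljoin(base_url, url)
        resolved.append(f"{absolute} {descriptor.strip()}".rstrip())
    return ", ".join(resolved)
```

Doing the same for CSS inside `<style>` would mean parsing every `url(...)` reference, which illustrates the extra work the parser would take on if it tried to keep such content fully consistent.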
# 18:36 barnabywalters but generally I’d rather leave the option open. It’s easy enough to remove these elements if they’re unwanted
# 18:36 barnabywalters so we’re not even saving anyone any work by removing style and script at the parsing stage
# 18:37 barnabywalters there are hundreds of other attack vectors people need sanitisation stages to protect against
# 18:37 aaronpk it does the sanitization of the HTML values and allows a limited subset of tags
# 18:38 aaronpk it isn't officially, but is my experiment with that
# 18:38 aaronpk essentially a subset of mf2 which is vocabulary-aware, designed to be easier to consume when you know what it is you're consuming
# 18:40 barnabywalters and in much the same way is presumably a sanitisation stage, applied to the results of parsing using a generic parser?
# 18:40 aaronpk yeah after the mf2 parsing, xray goes into any HTML values and does sanitization
# 18:41 barnabywalters I’d like to get tantek’s input on my objection, as well as anyone actively working on mf2 parsers. I’ll try to express it more concisely and add it to the wiki page
# 18:43 aaronpk oh and XRay does the whole "is the name a prefix of the content" thing to un-imply the p-name property
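A minimal sketch of the "is the name a prefix of the content" check. This is a guess at the general idea, not XRay's actual code. It assumes a hypothetical flattened post dict with a plaintext `name` and a `content` dict holding `html` and `value`, and it collapses whitespace before comparing.

```python
import re


def _normalize(text):
    # Collapse runs of whitespace so line breaks don't defeat the comparison.
    return re.sub(r"\s+", " ", text).strip()


def drop_implied_name(props):
    """Remove 'name' when it is only a prefix of the plaintext content.

    props is a hypothetical flattened dict such as
    {"name": "...", "content": {"html": "...", "value": "..."}}.
    """
    name = props.get("name")
    content = props.get("content")
    if not isinstance(name, str) or not isinstance(content, dict):
        return props
    norm_name = _normalize(name)
    if norm_name and _normalize(content.get("value", "")).startswith(norm_name):
        # The name adds nothing beyond the content, so treat the post as untitled.
        return {k: v for k, v in props.items() if k != "name"}
    return props
```

A post whose name differs from the start of its content (a titled article) keeps its name, while a note whose implied name merely repeats the content loses it.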
# 18:45 aaronpk one particularly clever thing XRay does when sanitizing HTML is it removes all class names from HTML attributes except mf2 classes :D
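The class-name filtering idea can be sketched as a function over a single `class` attribute value. This is an illustrative assumption, not XRay's implementation, and the regular expression only approximates the mf2 naming rules: a prefix of `h-`, `p-`, `u-`, `dt-` or `e-` followed by lowercase hyphen-separated segments. `keep_mf2_classes` is a hypothetical name.

```python
import re

# Approximate mf2 class name: h-, p-, u-, dt- or e- prefix, then lowercase segments.
MF2_CLASS = re.compile(r"^(h|p|u|dt|e)-[a-z0-9]+(-[a-z0-9]+)*$")


def keep_mf2_classes(class_attr):
    """Return the class attribute with only microformats2 class names left."""
    kept = [c for c in class_attr.split() if MF2_CLASS.match(c)]
    return " ".join(kept)
```

Keeping only mf2 classes strips site-specific styling hooks while preserving any nested microformats markup inside the sanitized HTML.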
# 18:48 barnabywalters sounds like xray is doing a very similar job to the mf2 post-processing code I wrote for shrewdness
# 18:49 aaronpk i'm currently using it for webmention.io, and will soon be using it for my reader
# 18:49 barnabywalters that produced a more or less flat dictionary of reliable properties based on extensive normalisation and post-processing of raw mf2 structures
# 18:50 aaronpk yeah. I wanted to do that in a self-contained environment so that I could write thorough unit tests for it
# 18:50 aaronpk also wanted to be able to use it with webmention.io which is Ruby, so I needed it as an API
# 18:51 aaronpk cause I really didn't want to re-implement it in Ruby too :)
# 18:51 barnabywalters shrewdness does a bunch of things like fetching other URLs and seeing what’s there, so I started out writing it as part of shrewdness
# 18:51 barnabywalters the intention was always to abstract it out as a separate library for mf2 postprocessing when it was “finished”
# 18:52 barnabywalters with taproot I found I was bogged down by having so many bits of functionality abstracted out as libraries
# 18:52 barnabywalters so with shrewdness I wanted to avoid that from the beginning as much as possible
# 18:53 barnabywalters now I realise I could have just written the extra libraries hard-coded into taproot for quick editing, and copied them into standalone libraries separately
# 18:54 barnabywalters (most of the things I’m talking about are pretty self contained, just one file and one test file)
# 18:54 barnabywalters but even with this ease-of-development approach for shrewdness I ended up losing motivation to work on it a lot… until now ;)
# 18:56 barnabywalters nothing “new” as such, I just finally got some motivation to work on personal programming projects again
# 19:01 aaronpk whoa, I forgot I made that PR to remove the script and style tags
# 19:03 aaronpk I got way more involved with it than I intended to :)
# 19:04 aaronpk but it was out of necessity, since I wanted things for XRay
tantek, barnabywalters and KartikPrabhu joined the channel
# 20:21 tantek barnabywalters: "publishing posts with embedded per-post styling" is done via a property, not by an embedded <style>
# 20:22 tantek giving you some discoverable links/phrases in #Indiewebcamp now :)
# 20:23 tantek "publishing interactive HTML documents with embedded javascript" is interesting, yet "document" is a bit of a misnomer there
# 20:23 tantek as soon as it's non-trivially interactive it's more of an application than a document
# 20:23 tantek and *that* is a different enough use-case from a post to be worth documenting
# 20:24 aaronpk i published an article which had an inline interactive form
# 20:25 barnabywalters but without evidence that for the generic parser to not remove style and script elements from the raw HTML content is a major burden for microformats consumers, I’m against making changes which arbitrarily remove information
# 20:25 barnabywalters in the plaintext properties it’s not arbitrary, because this content is causing legitimate problems
# 20:25 tantek the burden of evidence is always on use-cases, to keep things, not "not to remove"
# 20:26 barnabywalters in html content it’s removed anyway by sanitization stages, and I see no reason for the parser to do sanitization work
# 20:27 barnabywalters I see the parser as a generic HTML consumption tool rather than something focused on specific use cases
# 20:27 tantek "sanitization" isn't the purpose here - thus it's a strawman to say "do an incomplete job of it"
# 20:28 tantek the purpose is to remove documented cases of noise
# 20:28 tantek which we are doing, one at a time, as we document them
# 20:28 barnabywalters tantek: then what is the purpose, specifically of removing this content from the raw HTML values
# 20:28 barnabywalters and I completely agree that contents of script and style elements should be removed from those
# 20:29 tantek ok that is a smaller change that still satisfies the documented use-cases
# 20:29 barnabywalters where are the documented examples of <script> and <style> contents causing problems in raw HTML content structures
# 20:29 barnabywalters yes, hence I am more in favour of it. Satisfy the documented use cases whilst removing the minimum amount of information
# 20:31 tantek seems reasonable. if we get any problems later with those elements in e-* properties, we can consider that separately
# 20:33 tantek however, I think we should still remove them from *all* the other property prefixes (u-* dt-*)
# 20:33 barnabywalters including the value property in html/value structs (do we have a good name for those?)
# 20:34 barnabywalters everything plaintext is safe, the html property is where people go if they want a hard time ;)
# 20:36 barnabywalters aaronpk: shall I go ahead and adapt your php-mf2 PR to remove <script> and <style> from everything apart from html properties, and merge? Then I can publish a new release this evening
# 20:39 barnabywalters cool, I’ll respond there and wait for consensus from other implementers before implementing in php-mf2
# 20:39 tantek I think if aaronpk is good with it you can proceed. I don't expect kylewm to object to a smaller change as a first step
# 20:40 tantek barnabywalters: and thank you for going through the issues! we have a bunch more that need implementer feedback
barnabywalters and tantek joined the channel
barnabywalters joined the channel
barnabywalters joined the channel
# 21:24 aaronpk barnabywalters: sorry, was afk. thanks and yes feel free to update my PR
# 21:29 tantek also, thanks to a lot of hard work by glennjones, his microformatshiv parser has landed in Firefox and is now in the Firefox Developer Release if you want to try it out
# 21:30 tantek IndieWebCamp participant Shane Caraveo (@mixedpuppy) helped work the parser into the code and handle all the landing / source control / test case details
# 21:31 tantek and Operator add-on creator Mike Kaply is submitting patches to improve microformatshiv as well
# 21:31 tantek I'm working on ways to demonstrate this functionality at a user level
# 21:31 tantek so I'm not making any big announcements until we figure that out
# 21:32 tantek e.g. - so how I can tell that there's an mf2 parser in FF?
# 21:32 Loqi tantek meant to say: e.g. - so how can I tell that there's an mf2 parser in FF?
# 21:33 tantek aaronpk: anyway, there's also tons more awesome DevTools improvements in this version of FF Dev Edition, so you might find it interesting to check out for webdev in general
barnabywalters joined the channel
tantek joined the channel