#microformats 2016-03-13

2016-03-13 UTC
emmak joined the channel
KartikPrabhu joined the channel
#
@KitaitiMakoto
So user_url uses a URI Template, huh. The example uses defunkt, so I wonder if it's his work. He was into Microformats and such too, so he'd probably like this kind of thing. Feeds | GitHub Developer Guide https://developer.github.com/v3/activity/feeds/
(twitter.com/_/status/708906686463913985)
barnabywalters and warehouse13 joined the channel
#
Loqi
barnabywalters: tantek left you a message on 8/18 at 9:19am: one more (hopefully minor) mf2 parser issue with proposal, implied date for dt properties: http://microformats.org/wiki/microformats2-parsing-issues#implied_date_for_dt_properties_both_mf2_and_backcompat please comment!
#
Loqi
barnabywalters: tantek left you a message on 9/18 at 4:26pm: all resolved issues with implementation(s) incorporated into the microformats2 parsing spec - take a look, see if you have any qs: http://microformats.org/wiki/index.php?title=microformats2-parsing&diff=65229&oldid=65090&rcid=101616
#
Loqi
barnabywalters: tantek left you a message on 11/24 at 5:39pm: please see new microformats2 parsing issue and propose resolution - we have a decent consensus, lacking objections I'm going to edit the spec accordingly in the next few days http://microformats.org/wiki/microformats2-parsing-issues#uf2_children_on_backcompat_properties
#
barnabywalters
excluding the content of these elements from the plaintext value makes perfect sense
#
barnabywalters
but for me, removing it from the HTML value shouldn’t be something the generic parser does
#
barnabywalters
that’s a job for a sanitization stage, which is going to be different depending on the use case of the consumer
#
barnabywalters
e.g. the <style> element could be used to provide per-post custom styling (maybe using the scoped attribute)
#
barnabywalters
and <script> could be used for publishing interactive HTML documents
#
barnabywalters
(I think I actually did this once, will try to find the example)
#
barnabywalters
as aaronpk says, people who don’t want these effects, or the potential attacks which could result from including <style> and <script> elements, need to remove them anyway
#
barnabywalters
so why make the parser do that, and prevent people who do want to get the original HTML content from being able to do so?
#
aaronpk
hm, maybe it does make sense to keep them for e-* properties, but remove them from all plaintext values
#
barnabywalters
that would be my suggestion
#
barnabywalters
keep plaintext values as slim and as user-focused as possible, as the big reason they’re there is as an easy alternative to processing the HTML content
#
barnabywalters
and then leave the HTML content as it is
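A minimal Python sketch of the split being suggested here, assuming the standard-library `html.parser` and the mf2 `{"html": ..., "value": ...}` structure for e-* properties: the `html` member keeps the original markup, `<script>` and `<style>` included, while the plaintext `value` skips their contents. `parse_e_property` is a hypothetical helper for illustration, not php-mf2's API.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def parse_e_property(inner_html: str) -> dict:
    """Hypothetical helper: build the {html, value} structure for an e-* property.

    The html member is returned untouched (script/style preserved);
    only the plaintext value drops script/style contents.
    """
    extractor = _TextExtractor()
    extractor.feed(inner_html)
    extractor.close()  # flush any buffered text
    return {"html": inner_html, "value": "".join(extractor.parts).strip()}
```

Consumers who don't want the `<script>`/`<style>` markup would still strip it from `html` in their own sanitization stage, which is the division of labour argued for above.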
#
barnabywalters
of course there are always going to be limits
#
barnabywalters
e.g. do we parse all CSS in <style> elements and resolve relative URLs, as we now do in srcset attributes (thanks to voxpelli)
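For reference, resolving relative URLs in a `srcset` attribute could look roughly like this in Python. This is a simplified sketch, not the php-mf2 implementation: `resolve_srcset` is a hypothetical name, and splitting on commas assumes no candidate URL itself contains a comma (the full HTML parsing algorithm handles that case).

```python
from urllib.parse import urljoin


def resolve_srcset(srcset: str, base_url: str) -> str:
    """Resolve each candidate URL in a srcset attribute against base_url.

    Simplified: candidates are split on commas, and any width/density
    descriptors ("480w", "2x") are carried over unchanged.
    """
    resolved = []
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue  # skip empty candidates, e.g. from a trailing comma
        url, descriptors = parts[0], parts[1:]
        resolved.append(" ".join([urljoin(base_url, url)] + descriptors))
    return ", ".join(resolved)
```

Doing the same for CSS inside `<style>` would mean finding every `url(...)` in arbitrary stylesheets, which is a much bigger job than this.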
#
aaronpk
seems reasonable
#
barnabywalters
but generally I’d rather leave the option open. It’s easy enough to remove these elements if they’re unwanted
#
barnabywalters
and as you pointed out, for reposting, sanitisation is obligatory anyway
#
barnabywalters
so we’re not even saving anyone any work by removing style and script at the parsing stage
#
barnabywalters
there are hundreds of other attack vectors people need sanitisation stages to protect against
#
aaronpk
not sure if you saw, but I did a bunch of work on this which ended up at https://xray.p3k.io http://github.com/aaronpk/XRay
#
aaronpk
it does the sanitization of the HTML values and allows a limited subset of tags
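A rough Python sketch of allowlist-based sanitization of an HTML value, in the spirit of what aaronpk describes. The tag and attribute allowlists and `sanitize_html` are illustrative assumptions, not XRay's actual rules; as barnabywalters notes above, a production sanitizer has to cover many more attack vectors than this.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative allowlists (assumptions, not XRay's real configuration).
ALLOWED_TAGS = {"a", "p", "br", "em", "strong", "blockquote", "ul", "ol", "li", "code", "pre"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_WITH_CONTENT = {"script", "style"}


class _Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._drop_depth = 0  # > 0 while inside <script> or <style>

    def _keep_attr(self, tag, name, value):
        if name not in ALLOWED_ATTRS.get(tag, set()):
            return False
        if name == "href":
            # Only allow relative, http and https links (no javascript: etc.).
            return urlparse(value or "").scheme in ("", "http", "https")
        return True

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._drop_depth += 1
            return
        if self._drop_depth or tag not in ALLOWED_TAGS:
            return  # disallowed tag: drop the tag itself, keep its text
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if self._keep_attr(tag, name, value)
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._drop_depth = max(0, self._drop_depth - 1)
            return
        if not self._drop_depth and tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._drop_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html(fragment: str) -> str:
    """Keep only allowlisted tags/attributes; drop script/style with their content."""
    sanitizer = _Sanitizer()
    sanitizer.feed(fragment)
    sanitizer.close()
    return "".join(sanitizer.out)
```

Running this as a stage after generic mf2 parsing, rather than inside the parser, lets each consumer pick its own allowlist.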
#
barnabywalters
aha, is this jf2? need to familiarise myself with that
#
aaronpk
it isn't officially, but is my experiment with that
#
aaronpk
essentially a subset of mf2 which is vocabulary-aware, designed to be easier to consume when you know what it is you're consuming
#
barnabywalters
cool, a declarative approach to what php-mf2-cleaner was trying to solve
#
aaronpk
yeah you could say that
#
barnabywalters
and in much the same way is presumably a sanitisation stage, applied to the results of parsing using a generic parser?
#
aaronpk
yeah after the mf2 parsing, xray goes into any HTML values and does sanitization
#
barnabywalters
I’d like to get tantek’s input on my objection, as well as anyone actively working on mf2 parsers. I’ll try to express it more concisely and add it to the wiki page
#
aaronpk
yeah, worth noting that on the wiki vote section
#
aaronpk
oh and XRay does the whole "is the name a prefix of the content" thing to un-imply the p-name property
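A minimal Python sketch of that check, assuming mf2 JSON properties where `name` is a list of strings and `content` is a list of `{"html", "value"}` dicts (or plain strings). The whitespace normalization and the helper names are assumptions; XRay's actual logic may differ.

```python
import re


def _collapse(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def drop_implied_name(props: dict) -> dict:
    """Return a copy of mf2 properties without "name" when the name is
    just a prefix of the content text (i.e. probably implied, not authored)."""
    names = props.get("name") or []
    contents = props.get("content") or []
    if not names or not contents:
        return dict(props)
    first = contents[0]
    content_text = first["value"] if isinstance(first, dict) else first
    if _collapse(content_text).startswith(_collapse(names[0])):
        return {key: value for key, value in props.items() if key != "name"}
    return dict(props)
```

A note whose implied name merely repeats its content loses the `name`, while an article with a distinct title keeps it.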
#
aaronpk
one particularly clever thing XRay does when sanitizing HTML is it removes all class names from HTML attributes except mf2 classes :D
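A small Python sketch of that class filtering, assuming a simplified regular expression for mf2 class names (h-* roots plus p-, u-, dt-, e- properties, with lowercase letters and digits between hyphens). `keep_mf2_classes` is a hypothetical helper; a real implementation would likely also drop a class attribute that ends up empty.

```python
import re

# Simplified approximation of mf2 class-name syntax (assumption).
MF2_CLASS = re.compile(r"^(?:h|p|u|dt|e)(?:-[a-z0-9]+)+$")


def keep_mf2_classes(class_attr: str) -> str:
    """Return the class attribute value with only microformats2 class names kept."""
    return " ".join(c for c in class_attr.split() if MF2_CLASS.match(c))
```

Keeping the mf2 classes means the sanitized HTML can still be re-parsed for nested microformats, while site-specific styling hooks are discarded.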
#
barnabywalters
edited /microformats2-parsing-issues (+590) "/* exclude style elements before parsing */ added agreement to removing style and script contents from plaintext properties, objection to removing them from HTML properties"
#
barnabywalters
sounds like xray is doing a very similar job to the mf2 post-processing code I wrote for shrewdness
#
aaronpk
probably
#
aaronpk
i'm currently using it for webmention.io, and will soon be using it for my reader
#
barnabywalters
that produced a more or less flat dictionary of reliable properties based on extensive normalisation and post-processing of raw mf2 structures
#
barnabywalters
using your comments library as a basis :)
#
aaronpk
yeah. I wanted to do that in a self-contained environment so that I could write thorough unit tests for it
#
aaronpk
also wanted to be able to use it with webmention.io which is Ruby, so I needed it as an API
#
aaronpk
cause I really didn't want to re-implement it in Ruby too :)
#
barnabywalters
shrewdness does a bunch of things like fetching other URLs and seeing what’s there, so I started out writing it as part of shrewdness
#
barnabywalters
the intention was always to abstract it out as a separate library for mf2 postprocessing when it was “finished”
#
barnabywalters
rather than work on two things at once
#
aaronpk
always a tough call :)
#
barnabywalters
with taproot I found I was bogged down by having so many bits of functionality abstracted out as libraries
#
barnabywalters
especially as nobody else even uses the libraries ;)
#
barnabywalters
so with shrewdness I wanted to avoid that from the beginning as much as possible
#
aaronpk
interesting
#
barnabywalters
now I realise I could have just written the extra libraries hard-coded into taproot for quick editing, and copied them into standalone libraries separately
#
barnabywalters
(most of the things I’m talking about are pretty self contained, just one file and one test file)
#
barnabywalters
so I might do that at some point
#
barnabywalters
but even with this ease-of-development approach for shrewdness I ended up losing motivation to work on it a lot… until now ;)
#
aaronpk
oh? what's new?
#
barnabywalters
nothing “new” as such, I just finally got some motivation to work on personal programming projects again
#
aaronpk
awesome
#
aaronpk
whoa, I forgot I made that PR to remove the script and style tags
#
barnabywalters
heh yeah it was probably some time ago
#
barnabywalters
thanks for helping look after php-mf2 during my random absence, btw
#
aaronpk
I got way more involved with it than I intended to :)
#
aaronpk
but it was out of necessity, since I wanted things for XRay
tantek, barnabywalters and KartikPrabhu joined the channel
#
tantek
looks
#
tantek
barnabywalters: "publishing posts with embedded per-post styling" is done via a property, not by an embedded <style>
#
barnabywalters
tantek: cool, I missed out on that in my absence, how’s it done?
#
tantek
giving you some discoverable links/phrases in #Indiewebcamp now :)
#
tantek
"publishing interactive HTML documents with embedded javascript" is interesting, yet "document" is a bit of a misnomer there
#
tantek
as soon as it's non-trivially interactive it's more of an application than a document
#
tantek
and *that* is a different enough use-case from a post to be worth documenting
#
tantek
(separately)
#
tantek
once people start publishing them
#
tantek
although...
#
barnabywalters
I think I did this at least once, I can’t find an example quickly
#
aaronpk
i published an article which had an inline interactive form
#
aaronpk
used javascript for some things
#
barnabywalters
but without evidence that leaving style and script elements in the raw HTML content is a major burden for microformats consumers, I’m against making changes which arbitrarily remove information
#
tantek
it's not arbitrary
#
barnabywalters
in the plaintext properties it’s not arbitrary, because this content is causing legitimate problems
#
tantek
the burden of evidence is always on use-cases for keeping things, not on reasons "not to remove"
#
tantek
https://indiewebcamp.com/custom_post_script solves the "publishing interactive HTML documents with embedded javascript" use-cases we've found so far
#
barnabywalters
in html content it’s removed anyway by sanitization stages, and I see no reason for the parser to do sanitization work
#
barnabywalters
it will inevitably do an incomplete job of it
#
barnabywalters
I see the parser as a generic HTML consumption tool rather than something focused on specific use cases
#
barnabywalters
and the exact degree of sanitization depends on the use case
#
tantek
"sanitization" isn't the purpose here - thus it's a strawman to say "do an incomplete job of it"
#
tantek
the purpose is to remove documented cases of noise
#
tantek
which we are doing, one at a time, as we document them
#
barnabywalters
tantek: then what, specifically, is the purpose of removing this content from the raw HTML values?
#
barnabywalters
the existing cases point to noise in the plaintext properties
#
barnabywalters
and I completely agree that contents of script and style elements should be removed from those
#
tantek
ok that is a smaller change that still satisfies the documented use-cases
#
barnabywalters
where are the documented examples of <script> and <style> contents causing problems in raw HTML content structures?
#
barnabywalters
yes, hence I am more in favour of it. Satisfy the documented use cases whilst removing the minimum amount of information
#
tantek
seems reasonable. if we get any problems later with those elements in e-* properties, we can consider that separately
#
barnabywalters
yep, agreed
#
tantek
however, I think we should still remove them from *all* the other property prefixes (u-* dt-*)
#
barnabywalters
yep, absolutely, everything which is plaintext
#
barnabywalters
including the value property in html/value structs (do we have a good name for those?)
#
barnabywalters
everything plaintext is safe, the html property is where people go if they want a hard time ;)
#
barnabywalters
aaronpk: shall I go ahead and adapt your php-mf2 PR to remove <script> and <style> from everything apart from html properties, and merge? Then I can publish a new release this evening
#
tantek
edited /microformats2-parsing-issues (+444) "/* exclude style elements before parsing */ Proposal 2, e-* properties HTML values preserve all markup, others drop style and script elements and their content"
#
tantek
barnabywalters, aaronpk, kylewm: added Proposal 2 with barnabywalters' suggested refinement. Please check and +1/0/-1 accordingly: http://microformats.org/wiki/microformats2-parsing-issues#exclude_style_elements_before_parsing
#
barnabywalters
cool, I’ll respond there and wait for consensus from other implementers before implementing in php-mf2
#
barnabywalters
edited /microformats2-parsing-issues (+44) "/* exclude style elements before parsing */"
#
tantek
I think if aaronpk is good with it you can proceed. I don't expect kylewm to object to a smaller change as a first step
#
tantek
barnabywalters: and thank you for going through the issues! we have a bunch more that need implementer feedback
#
barnabywalters
edited /microformats2-parsing-issues (+170) "/* ignore u-camelCase properties */"
#
barnabywalters
tantek: np, looking through and responding accordingly now
#
barnabywalters
edited /microformats2-parsing-issues (+100) "/* use poster if no src on video for u props */"
barnabywalters and tantek joined the channel
#
barnabywalters
edited /microformats2-parsing-issues (+104) "/* uf2 children on backcompat properties */"
#
barnabywalters
edited /microformats2-parsing-issues (+262) "/* default generated HTML */"
barnabywalters joined the channel
#
aaronpk
edited /microformats2-parsing-issues (+125) "/* exclude style elements before parsing */"
barnabywalters joined the channel
#
aaronpk
barnabywalters: sorry, was afk. thanks and yes feel free to update my PR
#
barnabywalters
me too, cool, will do when kylewm chimes in on the issue
#
tantek
also, thanks to a lot of hard work by glennjones, his microformatshiv parser has landed in Firefox and is now in the Firefox Developer Release if you want to try it out
#
tantek
(as of the release made this Tuesday)
#
tantek
IndieWebCamp participant Shane Caraveo (@mixedpuppy) helped work the parser into the code and handle all the landing / source control / test case details
#
aaronpk
that's amazing news!
#
tantek
and Operator add-on creator Mike Kaply is submitting patches to improve microformatshiv as well
#
tantek
aaronpk: indeed!
#
tantek
I'm working on ways to demonstrate this functionality at a user level
#
tantek
so I'm not making any big announcements until we figure that out
#
tantek
e.g. - so how I can tell that there's an mf2 parser in FF?
#
tantek
s/I can/can I
#
Loqi
tantek meant to say: e.g. - so how can I tell that there's an mf2 parser in FF?
#
tantek
aaronpk: anyway, there's also tons more awesome DevTools improvements in this version of FF Dev Edition, so you might find it interesting to check out for webdev in general
barnabywalters joined the channel
#
barnabywalters
edited /microformats2-parsing-issues (+1) "/* exclude style elements before parsing */"
tantek joined the channel
#
kylewm
edited /microformats2-parsing-issues (+301) "/* exclude style elements before parsing */ +1 barnabywalters narrowed proposal for <script> and <style> tags"
#
kylewm
edited /microformats2-parsing-issues (-4) "/* exclude style elements before parsing */ grammar"