#microformats 2018-11-26

2018-11-26 UTC
pniedzielski[m], wakest[m], [jgmac1106], [Sim], jgmac1106, milkii_, [eddie], [tantek], kisik21, barpthewire, bigbluehat_, ivc_, [Vincent], [chrisburnell], Kaja___, sknebel, Facebook, [kevinmarks], KartikPrabhu, [dave], microgram, [Csongor], blundin, [cleverdevil], [pfefferle] and [schmarty] joined the channel
#
tantek
edited /h-event (+107) "/* Examples in the wild */ add oauth.net/events"
(view diff)
eduardm, [jgmac1106], [eddie], blundin and tantek joined the channel
#
gRegorLove
Re: https://github.com/microformats/php-mf2/issues/209 should we whitelist a few content types with a config option so devs can customize/disable entirely?
#
Loqi
[Zegnat] #209 Try to parse any file with the HTML parser
eduardm_ joined the channel
#
gRegorLove
text/html, application/xhtml+xml, text/xml, application/xml... any others?
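As a sketch of the whitelist idea being discussed (hypothetical helper name and defaults, not php-mf2's actual API; shown in Python for brevity):

```python
# Sketch of a configurable content-type whitelist for a fetch helper.
# The default list mirrors the types mentioned in the discussion; the
# helper name and normalization are illustrative.

DEFAULT_ALLOWED_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}

def is_parseable(content_type, allowed=DEFAULT_ALLOWED_TYPES):
    """Return True if the Content-Type header names a whitelisted type.

    Strips parameters such as "; charset=utf-8" and lowercases the
    media type, since header values compare case-insensitively.
    """
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in allowed
```

Passing a custom `allowed` set would be the "config option" devs could use to customize or disable the check entirely.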
#
Zegnat
image/svg+xml, maybe? Not sure anyone has shipped mf in those though
#
gRegorLove
fetch() does allow the first two, since it searches "html" case-insensitive
#
sknebel
oh right, that's a search
#
sknebel
didn't realize that made it work already
#
Zegnat
I am not sure that really changes much in the end though. It is not like we do not download the file if it has the wrong content type. We’ve already downloaded the file, does it then matter whether we make the DOM parser just try to consume it?
#
gRegorLove
We could also change the search to "html" or "xml" with option to add explicit types to whitelist
[tantek] joined the channel
#
gRegorLove
Ah, good point
#
Zegnat
I guess there could be unknown exploitable bugs in the parser. But those will then also exist when people fake the content-type (remember: we are not sniffing it ourselves, we trust the resource provider)
#
gRegorLove
Should we do a HEAD request before downloading?
#
Zegnat
HEAD requests make sense if we want to do pre-checks, yes.
#
sknebel
can curl be told to abort on content-types that don't match?
#
Zegnat
... maybe
#
Zegnat
you can define a callback function for HTTP headers. That may be able to force quit the download after seeing an unsupported content-type header
#
Zegnat
In PHP, that is. For raw curl, who knows
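The header-callback abort could look roughly like this (a Python sketch of the logic only; in PHP this would hang off curl's `CURLOPT_HEADERFUNCTION`, and all names here are hypothetical):

```python
class UnsupportedContentType(Exception):
    """Raised from the header callback to abort the transfer early."""

def make_header_callback(allowed=("text/html", "application/xhtml+xml")):
    """Build a callback that checks Content-Type as headers stream in.

    A curl-style client invokes the callback once per raw header line;
    raising inside it would abort the transfer before the body is
    downloaded.
    """
    def on_header(line):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            media_type = value.split(";", 1)[0].strip().lower()
            if media_type not in allowed:
                raise UnsupportedContentType(media_type)
    return on_header
```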
#
sknebel
sure, talking about parser context
#
Zegnat
I am heading for bed. But if people want to drop some brainstorms on a GitHub issue please do! We should define some sort of baseline what we think is up to us to protect users against, and what is up to users, though.
#
Zegnat
At some point users will always be better off grabbing Guzzle or another dedicated HTTP lib for fetching the external data, and then feeding it to the mf2 parser manually (perhaps even after validating and parsing the HTML themselves)
#
Zegnat
Gonna sleep on this, maybe I’ll wake up with all the answers :D Cheers all!
#
sknebel
that's kind of the problem... if you offer fetch, people will use it, and it makes some sense to offer a fetch with sane defaults so not everyone has to discover those themselves. On the other hand, you're now on the hook to have somewhat sane defaults
#
[tantek]
right sknebel.
#
[tantek]
also we used to have a more generic applicability of microformats, to "in HTML/HTML5, and Atom/RSS/XHTML or other XML. "
#
[tantek]
but all the mf2 vocabs say "in HTML" (though that should likely be loosened and/or reference the parsing spec instead)
#
sknebel
the one noticeable case we had recently was someone with atom (or atom-like? didn't check too closely) xml (and XSLT for browsers) who wondered why indiewebify.me didn't like their rel=me
#
[tantek]
I think we dropped them (like old properties) because of lack of real world examples
#
[tantek]
e.g. even SVG has a class attribute, and despite the presence of lots of SVG, there's no critical mass, culture or community for using SVG for anything "semantic" (though that was its design)
#
[tantek]
SVG seems to be just a way to provide more (vector) efficient images, icons, etc.
#
gRegorLove
The parsing spec includes "follow the HTML parsing rules" so maybe that should be clarified
#
[tantek]
HTML is a spec. It has parsing rules. Do you mean a specific link / reference?
#
gRegorLove
* The mf2 parsing spec includes
#
[tantek]
Atom lacks a class attribute. Any use of microformats in Atom must actually be in XHTML inside an entry content, possibly entry title
#
gRegorLove
Atom does have rels, though, which is what started this conversation
#
[tantek]
huh. is it useful to use an mf2 parser just to extract rels from an Atom file?
#
gRegorLove
Outside of this indiewebify.me use-case, I'm not aware of one. The person expected indiewebify.me to validate the rel-me, which do link to each other. It failed because php-mf2 doesn't parse text/xml content-type.
#
Loqi
[mro] #78 false negative testing https://try.gogs.io/issue-5008
#
[tantek]
I think that's reasonable (to only support HTML) until there's non-trivial real world examples / uses of XML, whether Atom or rando
#
gRegorLove
Another option I suggested in the comments could be to expand indieweb/rel-me lib to parse rels from headers
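A minimal sketch of what "parse rels from headers" could mean (hypothetical helper, not the indieweb/rel-me lib's actual API; deliberately not a complete RFC 8288 Link-header parser):

```python
import re

def parse_link_header(value):
    """Parse an HTTP Link header into a rel -> [urls] map.

    Handles comma-separated <url>; rel="..." entries and ignores other
    parameters. A space-separated rel value yields multiple rels.
    """
    rels = {}
    for match in re.finditer(r'<([^>]*)>\s*;([^,]*)', value):
        url, params = match.groups()
        rel_match = re.search(r'rel\s*=\s*"?([^";]+)"?', params)
        if rel_match:
            for rel in rel_match.group(1).split():
                rels.setdefault(rel, []).append(url)
    return rels
```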
#
gRegorLove
To clarify, is a single real world example considered trivial?
#
[tantek]
on the scale of the web? certainly
#
[tantek]
single real world examples pop into and out of existence all the time. like quantum particles in the ether.
#
[tantek]
(except the latter are far more frequent)
#
sknebel
it really only came up for consideration because the parsers more or less already do it
#
sknebel
mf2py did it, the go one did it, php-mf2 only rejected the file because it checked the content-type before parsing, the actual parsing logic doesn't really care
#
sknebel
an HTML5 parser apparently happily will turn XML into a DOM with some unusual elements
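The effect is visible even with Python's lenient stdlib parser (illustrative input; `html.parser` is not a full HTML5 parser, but it similarly consumes XML-ish markup as tag soup and keeps attributes like rel):

```python
from html.parser import HTMLParser

class RelCollector(HTMLParser):
    """Collect rel attributes from any tag the parser encounters,
    including elements HTML itself does not define (e.g. Atom's feed)."""

    def __init__(self):
        super().__init__()
        self.rels = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "rel" and value:
                self.rels.append((tag, value))

# Atom-flavored XML fed straight to an HTML parser:
atom = ('<feed xmlns="http://www.w3.org/2005/Atom">'
        '<link rel="me" href="https://example.com/"/></feed>')
collector = RelCollector()
collector.feed(atom)
```

After feeding, `collector.rels` contains the rel found on the Atom link element, even though the surrounding markup was never HTML.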
#
[tantek]
that's more interesting then. if there appears to be consensus among parsers for how to do some minimal XML parsing for mf2, we can try to make sense of it and document something minimal that appears to match use-cases
#
sknebel
which I guess makes sense given XHTML etc
#
[tantek]
ah in that way, interesting
#
gRegorLove
Yeah, I was concerned a bit about formalizing the parsing spec to cover XML vs. the current accidental parsing.
#
[tantek]
we can formalize what happens if you happen to just get random XML that an HTML parser makes some sense of
#
sknebel
realistically we are mostly bound to our HTML parsers for that I think. we're not writing or modifying those ourselves, and we don't want to. So I think treating the text about the parsing rules as "we work on the DOM we get from an HTML parser, if your content works for that we'll take it" sort of works, with the understanding that the weirder your "HTML" gets the more problems will pop up and it's best effort
#
sknebel
i.e. we had (and might still have) problems with people using highly minimized HTML5, because older parsers don't quite understand that right
#
sknebel
which python and php fixed by recommending HTML5 parsers (and I guess go has one by default, given Google and all), but e.g. bridgy uses a different HTML parser with mf2-py because the good one is slow and thus more expensive
#
[tantek]
is there an opportunity here like microformats on web components?
[schmarty] joined the channel
#
sknebel
not sure, don't know much about them. I think as long as everything is in the tree as it is parsed (so not in weird properties or filled by JS), the parsers will just treat them like any other element
[kevinmarks] joined the channel
#
tantek
web components are different in that regard (parsing, embedding, etc.)
#
tantek
so that's why it's potentially interesting