#microformats 2018-11-26

2018-11-26 UTC
pniedzielski[m], wakest[m], [jgmac1106], [Sim], jgmac1106, milkii_, [eddie], [tantek], kisik21, barpthewire, bigbluehat_, ivc_, [Vincent], [chrisburnell], Kaja___, sknebel, Facebook, [kevinmarks], KartikPrabhu, [dave], microgram, [Csongor], blundin, [cleverdevil], [pfefferle] and [schmarty] joined the channel
#
tantek
edited /h-event (+107) "/* Examples in the wild */ add oauth.net/events"
(view diff)
eduardm, [jgmac1106], [eddie], blundin and tantek joined the channel
#
gRegorLove
Re: https://github.com/microformats/php-mf2/issues/209 should we whitelist a few content types with a config option so devs can customize/disable entirely?
#
Loqi
[Zegnat] #209 Try to parse any file with the HTML parser
eduardm_ joined the channel
#
gRegorLove
text/html, application/xhtml+xml, text/xml, application/xml... any others?
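As a sketch of the whitelist idea being discussed (hypothetical helper name and defaults, not php-mf2's actual API; shown in Python for brevity):

```python
# Sketch of a configurable content-type whitelist for a fetch helper.
# The default list mirrors the types mentioned in the discussion; the
# helper name and normalization are illustrative.

DEFAULT_ALLOWED_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}

def is_parseable(content_type, allowed=DEFAULT_ALLOWED_TYPES):
    """Return True if the Content-Type header names a whitelisted type.

    Strips parameters such as "; charset=utf-8" and lowercases the
    media type, since header values compare case-insensitively.
    """
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in allowed
```

Passing a custom `allowed` set would be the "config option" devs could use to customize or disable the check entirely.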
#
Zegnat
image/svg+xml, maybe? Not sure anyone has shipped mf in those though
#
gRegorLove
fetch() does allow the first two, since it searches "html" case-insensitive
#
sknebel
oh right, that's a search
#
sknebel
didn't realize that made it work already
#
Zegnat
I am not sure that really changes much in the end though. It is not like we do not download the file if it has the wrong content type. We’ve already downloaded the file, does it then matter whether we make the DOM parser just try to consume it?
#
gRegorLove
We could also change the search to "html" or "xml" with option to add explicit types to whitelist
[tantek] joined the channel
#
gRegorLove
Ah, good point
#
Zegnat
I guess there could be unknown exploitable bugs in the parser. But those will then also exist when people fake the content-type (remember: we are not sniffing it ourselves, we trust the resource provider)
#
gRegorLove
Should we do a HEAD request before downloading?
#
Zegnat
HEAD requests make sense if we want to do pre-checks, yes.
#
sknebel
can curl be told to abort on content-types that don't match?
#
Zegnat
... maybe
#
Zegnat
you can define a callback function for HTTP headers. That may be able to force quit the download after seeing an unsupported content-type header
#
Zegnat
In PHP, that is. For raw curl, who knows
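The header-callback abort could look roughly like this (a Python sketch of the logic only; in PHP this would hang off curl's `CURLOPT_HEADERFUNCTION`, and all names here are hypothetical):

```python
class UnsupportedContentType(Exception):
    """Raised from the header callback to abort the transfer early."""

def make_header_callback(allowed=("text/html", "application/xhtml+xml")):
    """Build a callback that checks Content-Type as headers stream in.

    A curl-style client invokes the callback once per raw header line;
    raising inside it would abort the transfer before the body is
    downloaded.
    """
    def on_header(line):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            media_type = value.split(";", 1)[0].strip().lower()
            if media_type not in allowed:
                raise UnsupportedContentType(media_type)
    return on_header
```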
#
sknebel
sure, talking about parser context
#
Zegnat
I am heading for bed. But if people want to drop some brainstorms on a GitHub issue please do! We should define some sort of baseline what we think is up to us to protect users against, and what is up to users, though.
#
Zegnat
At some point users will always be better off grabbing Guzzle or another dedicated HTTP lib for fetching the external data, and then feeding it to the mf2 parser manually (perhaps even after validating and parsing the HTML themselves)
#
Zegnat
Gonna sleep on this, maybe I’ll wake up with all the answers :D Cheers all!
#
sknebel
that's kind of the problem... if you offer fetch, people will use it, and it makes some sense to offer a fetch with sane defaults so not everyone has to discover those themselves. On the other hand, you're now on the hook to have somewhat sane defaults
#
[tantek]
right sknebel.
#
[tantek]
also we used to have a more generic applicability of microformats, to "in HTML/HTML5, and Atom/RSS/XHTML or other XML. "
#
[tantek]
but all the mf2 vocabs say "in HTML" (though that should likely be loosened and/or reference the parsing spec instead)
#
sknebel
the one noticeable case we had recently was someone with atom (or atom-like? didn't check too closely) xml (and XSLT for browsers) who wondered why indiewebify.me didn't like their rel=me
#
[tantek]
I think we dropped them (like old properties) because of lack of real world examples
#
[tantek]
e.g. even SVG has a class attribute, and despite the presence of lots of SVG, there's no critical mass, culture or community for using SVG for anything "semantic" (though that was its design)
#
[tantek]
SVG seems to be just a way to provide more (vector) efficient images, icons, etc.
#
gRegorLove
The parsing spec includes "follow the HTML parsing rules" so maybe that should be clarified
#
[tantek]
HTML is a spec. It has parsing rules. Do you mean a specific link / reference?
#
gRegorLove
* The mf2 parsing spec includes
#
[tantek]
Atom lacks a class attribute. Any use of microformats in Atom must actually be in XHTML inside an entry content, possibly entry title
#
gRegorLove
Atom does have rels, though, which is what started this conversation
#
[tantek]
huh. is it useful to use an mf2 parser just to extract rels from an Atom file?
#
gRegorLove
Outside of this indiewebify.me use-case, I'm not aware of one. The person expected indiewebify.me to validate the rel-me, which do link to each other. It failed because php-mf2 doesn't parse text/xml content-type.
#
Loqi
[mro] #78 false negative testing https://try.gogs.io/issue-5008
#
[tantek]
I think that's reasonable (to only support HTML) until there's non-trivial real world examples / uses of XML, whether Atom or rando
#
gRegorLove
Another option I suggested in the comments could be to expand indieweb/rel-me lib to parse rels from headers
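A minimal sketch of what "parse rels from headers" could mean (hypothetical helper, not the indieweb/rel-me lib's actual API; deliberately not a complete RFC 8288 Link-header parser):

```python
import re

def parse_link_header(value):
    """Parse an HTTP Link header into a rel -> [urls] map.

    Handles comma-separated <url>; rel="..." entries and ignores other
    parameters. A space-separated rel value yields multiple rels.
    """
    rels = {}
    for match in re.finditer(r'<([^>]*)>\s*;([^,]*)', value):
        url, params = match.groups()
        rel_match = re.search(r'rel\s*=\s*"?([^";]+)"?', params)
        if rel_match:
            for rel in rel_match.group(1).split():
                rels.setdefault(rel, []).append(url)
    return rels
```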
#
gRegorLove
To clarify, is a single real world example considered trivial?
#
[tantek]
on the scale of the web? certainly
#
[tantek]
single real world examples pop into and out of existence all the time. like quantum particles in the ether.
#
[tantek]
(except the latter are far more frequent)
#
sknebel
it really only came up for consideration because the parsers more or less already do it
#
sknebel
mf2py did it, the go one did it, php-mf2 only rejected the file because it checked the content-type before parsing, the actual parsing logic doesn't really care
#
sknebel
an HTML5 parser apparently happily will turn XML into a DOM with some unusual elements
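The effect is visible even with Python's lenient stdlib parser (illustrative input; `html.parser` is not a full HTML5 parser, but it similarly consumes XML-ish markup as tag soup and keeps attributes like rel):

```python
from html.parser import HTMLParser

class RelCollector(HTMLParser):
    """Collect rel attributes from any tag the parser encounters,
    including elements HTML itself does not define (e.g. Atom's feed)."""

    def __init__(self):
        super().__init__()
        self.rels = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "rel" and value:
                self.rels.append((tag, value))

# Atom-flavored XML fed straight to an HTML parser:
atom = ('<feed xmlns="http://www.w3.org/2005/Atom">'
        '<link rel="me" href="https://example.com/"/></feed>')
collector = RelCollector()
collector.feed(atom)
```

After feeding, `collector.rels` contains the rel found on the Atom link element, even though the surrounding markup was never HTML.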
#
[tantek]
that's more interesting then. if there appears to be consensus among parsers for how to do some minimal XML parsing for mf2, we can try to make sense of it and document something minimal that appears to match use-cases
#
sknebel
which I guess makes sense given XHTML etc
#
[tantek]
ah in that way, interesting
#
gRegorLove
Yeah, I was concerned a bit about formalizing the parsing spec to cover XML vs. the current accidental parsing.
#
[tantek]
we can formalize what happens if you happen to just get random XML that an HTML parser makes some sense of
#
sknebel
realistically we are mostly bound to our HTML parsers for that I think. we're not writing or modifying those ourselves, and we don't want to. So I think treating the text about the parsing rules as "we work on the DOM we get from an HTML parser, if your content works for that we'll take it" sort of works, with the understanding that the weirder your "HTML" gets the more problems will pop up and it's best effort
#
sknebel
i.e. we had (and might still have) problems with people using highly minimized HTML5, because older parsers don't quite understand that right
#
sknebel
which python and php fixed by recommending HTML5 parsers (and I guess go has one by default, given Google and all), but e.g. bridgy uses a different HTML parser with mf2-py because the good one is slow and thus more expensive
#
[tantek]
is there an opportunity here like microformats on web components?
[schmarty] joined the channel
#
sknebel
not sure, don't know much about them. I think as long as everything is in the tree as it is parsed (so not in weird properties or filled by JS), the parsers will just treat them like any other element
[kevinmarks] joined the channel
#
tantek
web components are different in that regard (parsing, embedding, etc.)
#
tantek
so that's why it's potentially interesting