#ZegnatI am not sure that really changes much in the end though. It is not like we do not download the file if it has the wrong content type. We’ve already downloaded the file, does it then matter whether we make the DOM parser just try to consume it?
#gRegorLoveWe could also change the search to "html" or "xml" with option to add explicit types to whitelist
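A minimal sketch of that allowlist idea in Python. The function name and the default heuristic ("accept anything whose MIME type contains html or xml, plus explicitly whitelisted types") come straight from the chat; nothing here is php-mf2's actual API.

```python
# Hypothetical content-type allowlist check; illustrative only,
# not php-mf2's real implementation.
def is_allowed_content_type(header_value, extra_types=()):
    """Return True if the Content-Type looks like HTML or XML,
    or exactly matches an explicitly whitelisted MIME type."""
    # Strip parameters such as "; charset=utf-8" and lowercase.
    mime = header_value.split(";", 1)[0].strip().lower()
    if mime in extra_types:
        return True
    # The heuristic proposed above: search for "html" or "xml".
    return "html" in mime or "xml" in mime

# is_allowed_content_type("text/html; charset=utf-8")     -> True
# is_allowed_content_type("application/atom+xml")         -> True
# is_allowed_content_type("text/plain")                   -> False
# is_allowed_content_type("text/plain", ("text/plain",))  -> True
```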
#ZegnatI guess there could be unknown exploitable bugs in the parser. But those will then also exist when people fake the content-type (remember: we are not sniffing it ourselves, we trust the resource provider)
#gRegorLoveShould we do a HEAD request before downloading?
#Zegnatyou can define a callback function for HTTP headers. That may be able to force quit the download after seeing an unsupported content-type header
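A sketch of that "force quit after the headers" idea. In PHP this maps to cURL's `CURLOPT_HEADERFUNCTION` (returning a byte count that doesn't match the data passed aborts the transfer); the Python helper below is a stand-in that works on anything response-shaped, since headers are readable before the body is downloaded. The helper name is invented for illustration.

```python
# Illustrative early-abort helper, not part of any mf2 library.
def read_body_if_supported(resp, allowed=("html", "xml")):
    """Inspect Content-Type before touching the body.

    `resp` is anything with .getheader() and .read(), e.g. an
    http.client.HTTPResponse: the headers are available as soon as
    the status line arrives, so an unsupported content-type means
    we can return without ever reading (downloading) the body."""
    ctype = (resp.getheader("Content-Type") or "").lower()
    if not any(token in ctype for token in allowed):
        return None  # bail out early; body is never read
    return resp.read()
```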
#ZegnatI am heading for bed. But if people want to drop some brainstorms on a GitHub issue, please do! We should define some sort of baseline for what we think is up to us to protect users against, and what is up to users, though.
#ZegnatAt some point users will always be better off grabbing Guzzle or another dedicated HTTP lib for fetching the external data, and then feeding it to the mf2 parser manually (perhaps even after validating and parsing the HTML themselves)
#ZegnatGonna sleep on this, maybe I’ll wake up with all the answers :D Cheers all!
#sknebelthat's kind of the problem… if you offer fetch, people will use it, and it makes some sense to offer a fetch with sane defaults so not everyone has to discover those themselves. On the other hand, you're now on the hook to have somewhat sane defaults
#[tantek]also we used to have a more generic applicability of microformats, to "in HTML/HTML5, and Atom/RSS/XHTML or other XML."
#[tantek]but all the mf2 vocabs say "in HTML" (though that should likely be loosened and/or reference the parsing spec instead)
#sknebelthe one noticeable case we had recently was someone with atom (or atom-like? didn't check too closely) xml (and XSLT for browsers) who wondered why indiewebify.me didn't like their rel=me
#[tantek]I think we dropped them (like old properties) because of lack of real world examples
#[tantek]e.g. even SVG has a class attribute, and despite the presence of lots of SVG, there's no critical mass, culture or community for using SVG for anything "semantic" (though that was its design)
#[tantek]SVG seems to be just a way to provide more (vector) efficient images, icons, etc.
#gRegorLoveThe parsing spec includes "follow the HTML parsing rules" so maybe that should be clarified
#[tantek]HTML is a spec. It has parsing rules. Do you mean a specific link / reference?
#gRegorLoveOutside of this indiewebify.me use-case, I'm not aware of one. The person expected indiewebify.me to validate the rel-me, which do link to each other. It failed because php-mf2 doesn't parse text/xml content-type.
#sknebelit really only came up for consideration because the parsers more or less already do it
#sknebelmf2py did it, the go one did it, php-mf2 only rejected the file because it checked the content-type before parsing, the actual parsing logic doesn't really care
#sknebelan HTML5 parser apparently happily will turn XML into a DOM with some unusual elements
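A quick illustration of that claim, using Python's stdlib `html.parser` as a stand-in for a real HTML5 parser (the Atom-ish sample document is made up): fed XML, it doesn't reject anything, it just emits unfamiliar element names into the tree.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record every start tag the parser produces."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

# Atom-like XML, not HTML at all.
atom_like = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link rel="me" href="https://example.com/"/>
</feed>"""

collector = TagCollector()
collector.feed(atom_like)
# No error: the parser simply treats <feed>, <title>, <link> as
# unknown elements (self-closing <link .../> still yields a start tag).
print(collector.tags)  # ['feed', 'title', 'link']
```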
#[tantek]that's more interesting, then. if there appears to be consensus among parsers for how to do some minimal XML parsing for mf2, we can try to make sense of it and document something minimal that appears to match use-cases
#gRegorLoveYeah, I was concerned a bit about formalizing the parsing spec to cover XML vs. the current accidental parsing.
#[tantek]we can formalize what happens if you happen to just get random XML that an HTML parser makes some sense of
#sknebelrealistically we are mostly bound to our HTML parsers for that, I think. we're not writing or modifying those ourselves, and we don't want to. So I think treating the text about the parsing rules as "we work on the DOM we get from an HTML parser; if your content works for that, we'll take it" sort of works, with the understanding that the weirder your "HTML" gets, the more problems will pop up, and it's best effort
#sknebele.g. we had (and might still have) problems with people using highly minimized HTML5, because older parsers don't quite understand that right
#sknebelwhich python and php fixed by recommending HTML5 parsers (and I guess go has one by default, given Google and all), but e.g. bridgy uses a different HTML parser with mf2-py because the good one is slow and thus more expensive
#[tantek]is there an opportunity here like microformats on web components?
[schmarty] joined the channel
#sknebelnot sure, don't know much about them. I think as long as everything is in the tree as it is parsed (so not in weird properties or filled by JS), the parsers will just treat them like any other element
[kevinmarks] joined the channel
#tantekweb components are different in that regard (parsing, embedding, etc.)