ZegnatI am not sure that really changes much in the end though. It is not like we do not download the file if it has the wrong content type. We’ve already downloaded the file, does it then matter whether we make the DOM parser just try to consume it?
ZegnatI guess there could be unknown exploitable bugs in the parser. But those will then also exist when people fake the content-type (remember: we are not sniffing it ourselves, we trust the resource provider)
ZegnatI am heading for bed. But if people want to drop some brainstorms on a GitHub issue please do! We should define some sort of baseline of what we think is up to us to protect users against, and what is up to users, though.
ZegnatAt some point users will always be better off grabbing Guzzle or another dedicated HTTP lib for fetching the external data, and then feeding it to the mf2 parser manually (perhaps even after validating and parsing the HTML themselves)
sknebelthat's kind of the problem.. if you offer fetch, people will use it, and it makes some sense to offer a fetch with sane defaults so not everyone has to discover those themselves. on the other hand, you're now on the hook to have somewhat sane defaults
sknebelthe one noticeable case we had recently was someone with atom (or atom-like? didn't check too closely) xml (and XSLT for browsers) who wondered why indiewebify.me didn't like their rel=me
[tantek]e.g. even SVG has a class attribute, and despite the presence of lots of SVG, there's no critical mass, culture or community for using SVG for anything "semantic" (though that was its design)
gRegorLoveOutside of this indiewebify.me use-case, I'm not aware of one. The person expected indiewebify.me to validate their rel-me links, which do link to each other. It failed because php-mf2 doesn't parse responses served with a text/xml content-type.
sknebelmf2py did it, the go one did it, php-mf2 only rejected the file because it checked the content-type before parsing, the actual parsing logic doesn't really care
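The distinction described here, a content-type check that runs before parsing versus parsing logic that does not care, could be sketched roughly as follows. This is only an illustration: the allowlist and the function names are hypothetical and are not taken from php-mf2, mf2py, or the Go parser.

```python
# Hypothetical allowlist of media types to hand to the parser.
# Not copied from any real mf2 parser; adjust to taste.
PARSEABLE_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}


def media_type(content_type_header):
    """Return the bare, lowercased media type from a Content-Type value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    return content_type_header.split(";", 1)[0].strip().lower()


def should_parse(content_type_header, allowed=PARSEABLE_TYPES):
    """Decide whether an already-downloaded body goes to the parser."""
    return media_type(content_type_header) in allowed
```

With an allowlist like this, `should_parse("text/xml; charset=utf-8")` is true, so the file from the indiewebify.me case would reach the parser, while something like `image/png` would still be rejected. The check trusts the header the server sent and does no sniffing of its own.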
[tantek]that's more interesting then. if there appears to be consensus among parsers for how to do some minimal XML parsing for mf2, we can try to make sense of it and document something minimal that appears to match use-cases
sknebelrealistically we are mostly bound to our HTML parsers for that I think. we're not writing or modifying those ourselves, and we don't want to. So I think treating the text about the parsing rules as "we work on the DOM we get from an HTML parser, if your content works for that we'll take it" sort of works, with the understanding that the weirder your "HTML" gets the more problems will pop up and it's best effort
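As a rough illustration of the "if your content works for the HTML parser, we'll take it" idea, the sketch below feeds an Atom-like XML document to a tolerant HTML parser and collects rel=me links from whatever elements come out. It uses Python's standard-library `html.parser` purely as a stand-in (it is not an HTML5 parser and not what the mf2 libraries necessarily use), and the class name, helper name, and sample document are all hypothetical.

```python
from html.parser import HTMLParser


class RelMeCollector(HTMLParser):
    """Collect href values from <a> and <link> elements whose rel contains "me"."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <link ... />.
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").split()
        href = attr_map.get("href")
        if "me" in rels and href:
            self.urls.append(href)


def find_rel_me(markup):
    """Run the tolerant parser over markup and return the rel=me URLs found."""
    collector = RelMeCollector()
    collector.feed(markup)
    collector.close()
    return collector.urls


# Hypothetical Atom-like input, loosely modelled on the case discussed above.
ATOM_LIKE = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="me" href="https://example.com/profile"/>
  <link rel="alternate" href="https://example.com/"/>
</feed>"""

print(find_rel_me(ATOM_LIKE))
```

The parser never asks whether the input was "really" HTML. It only sees elements and attributes, which is why the weirder the markup gets, the more this stays best effort.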
sknebelwhich python and php fixed by recommending HTML5 parsers (and I guess go has one by default, given Google and all), but e.g. bridgy uses a different HTML parser with mf2-py because the good one is slow and thus more expensive
sknebelnot sure, don't know much about them. I think as long as everything is in the tree as it is parsed (so not in weird properties or filled by JS), the parsers will just treat them like any other element