barnabywaltersas I mentioned in #23, I think that the complexity of authoring UIs and consumers (as well as backwards compatibility for consumers) are strong arguments for permitting u-photo, u-audio, u-video, etc. within e-content
barnabywaltersif you allow them inside e-content, then any post authoring UI which allows HTML editing of e-content immediately supports image posts natively, with re-ordering, alt text, etc., all via text editing
gRegorLoveI guess I'm not clear what the outcome of #23 will be, other than a possible recommendation for publishers. It's not a parsing spec change, so u-photo will continue to be consumed regardless of where it appears.
barnabywalterswell, mostly it’s about the official definition of u-photo and the related u-[main content] properties, right? And how their presence should change what e-content is used for, if at all
barnabywaltersI think part of that problem is that consuming HTML is hard. The mf2 parser makes it easier by narrowing down what HTML you have to deal with, but as soon as e-* properties are involved, consumers have to worry about potentially non-trivial transformations if they want to work with it
barnabywalterse.g. if the consumer wants to use the plaintext e-content value, and sees that there’s a u-photo property, they could replace instances of the u-photo URL in the plaintext content with an empty string
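A minimal sketch of the plaintext approach described above, in Python. The `entry` dict is a hand-written, hypothetical item shaped like mf2 parser JSON output (an img without alt, so its src is assumed to end up in the plaintext value); `photo_urls` and `naive_plaintext` are illustrative names, not part of any parser's API.

```python
def photo_urls(item):
    """Return the u-photo URLs of an mf2 item as plain strings."""
    urls = []
    for photo in item.get("properties", {}).get("photo", []):
        # Some parsers emit {"value": url, "alt": ...} instead of a bare string.
        urls.append(photo["value"] if isinstance(photo, dict) else photo)
    return urls


def naive_plaintext(item):
    """Plaintext e-content with every u-photo URL replaced by an empty string."""
    content = item["properties"]["content"][0]
    text = content["value"] if isinstance(content, dict) else content
    for url in photo_urls(item):
        text = text.replace(url, "")
    return text


# Hypothetical example item, not real parser output.
entry = {
    "type": ["h-entry"],
    "properties": {
        "photo": ["https://example.com/cat.jpg"],
        "content": [{
            "html": '<img class="u-photo" src="https://example.com/cat.jpg"> My cat',
            "value": "https://example.com/cat.jpg My cat",
        }],
    },
}

print(repr(naive_plaintext(entry)))  # ' My cat'
```

Even in this easy case the result keeps a stray leading space, a small hint of the whitespace clean-up a consumer would still have to do.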
barnabywaltersor if they want to use the HTML e-content value, they could parse it and check for an img element with the u-photo URL. If they find one, they can either remove it if they want to display the image themselves, or leave it in and know not to display the image a second time
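The "leave it in and don't display it twice" option could look roughly like this, using only Python's standard-library `html.parser`. `ImgSrcCollector`, `photos_to_display` and the `base_url` parameter are hypothetical names for this sketch; resolving relative `src` values against the page URL is an assumption about what the consumer has available.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the (resolved) src attribute of every <img> in an HTML fragment."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img/> tags are also routed here by the default
        # handle_startendtag implementation.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative URLs so they compare with absolute u-photo values.
                self.srcs.append(urljoin(self.base_url, src))


def photos_to_display(content_html, photos, base_url=""):
    """Return the u-photo URLs that are NOT already embedded in e-content."""
    collector = ImgSrcCollector(base_url)
    collector.feed(content_html)
    collector.close()
    embedded = set(collector.srcs)
    return [url for url in photos if url not in embedded]
```

For example, with e-content `<p>My cat</p><img src="/cat.jpg">` on a page at `https://example.com/post/1` and photos `cat.jpg` and `dog.jpg` on the same host, only the dog photo would be returned for separate display.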
barnabywaltersIMO documenting cases like this, and coming up with algorithms, recommendations and software to help consumers handle them is the more productive approach
aaronpk"could replace instances of the u-photo URL in the plaintext content with an empty string" sounds simple, but it is not, and it is very error-prone
sknebelright, the "several ways" is part of the problem. now everyone gets to implement a long list of special cases, and not all software will implement them identically
barnabywaltersaaronpk: regarding the suggestion of handling plaintext content by replacing occurrences of the photo url with empty string: where would this not work?
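One failure case that is easy to demonstrate (not necessarily the one aaronpk had in mind) is a photo URL that is a prefix of another URL in the text: plain substring replacement mangles the longer URL. The sketch below contrasts that with a slightly safer, still hypothetical variant that only drops whitespace-separated tokens exactly equal to a photo URL; both function names are illustrative.

```python
def replace_naive(text, url):
    """Substring replacement, as in the simple approach."""
    return text.replace(url, "")


def remove_url_tokens(text, urls):
    """Drop whitespace-separated tokens that exactly match a photo URL.

    Trade-off: this collapses all whitespace (including newlines) to
    single spaces, and misses URLs with trailing punctuation attached.
    """
    targets = set(urls)
    return " ".join(tok for tok in text.split() if tok not in targets)


photo = "https://example.com/photos/1"
text = "https://example.com/photos/1 Compare with https://example.com/photos/12"

print(repr(replace_naive(text, photo)))        # ' Compare with 2'
print(repr(remove_url_tokens(text, [photo])))  # 'Compare with https://example.com/photos/12'
```

The token version fixes this particular case but introduces others (a URL followed by a comma is no longer matched, and formatting is flattened), which illustrates why each consumer's list of special cases tends to diverge.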
barnabywalterswell in that case, I don’t see why having photo URLs show up in plaintext content is any worse than having completely broken code samples show up there
barnabywaltersindentation and formatting are likely to be broken unless the entire plaintext content is presented respecting whitespace, which is likely to cause other whitespace problems when displaying HTML content
barnabywaltersIMO, for anything other than the most basic content, the plaintext version of e-* properties is a convenience mostly useful for debugging or very basic usage, and any more serious consumer is likely going to have to wrestle with the html and do whatever parsing, sanitizing and processing is necessary for their use-case
aaronpkthat is probably true, but there's also a huge difference between handing off the HTML to an HTML sanitizer vs going and pulling out individual HTML tags from the document
sknebeland you can make their job a lot easier, or at least reduce the amount of breakage through cases they haven't covered, by recommending not to put the u- in the e-content
aaronpkfor example i'm able to throw the e-content HTML at the main PHP HTML sanitizer and trust that the result will be usable without any further DOM fiddling
barnabywaltersat least in PHP, it’s not too difficult to parse the HTML into a DOMDocument and e.g. search it for an occurrence of <img> with an src matching the contents of a parsed u-photo property
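The PHP DOMDocument approach has a rough Python analogue in `xml.dom.minidom`. This sketch assumes the e-content HTML is well-formed XML (no `&nbsp;`, unclosed tags, etc.); real-world HTML would need an HTML5-compliant parser instead. `remove_photo_imgs` is a hypothetical helper name.

```python
from xml.dom import minidom


def remove_photo_imgs(content_html, photo_url):
    """Remove <img> elements whose src equals photo_url.

    Returns (serialized HTML, number of images removed).
    Assumes content_html is well-formed XML (e.g. XHTML).
    """
    # Wrap in a single root element so fragments with siblings can be parsed.
    doc = minidom.parseString("<div>" + content_html + "</div>")
    root = doc.documentElement
    removed = 0
    # Copy into a list before mutating the tree.
    for img in list(root.getElementsByTagName("img")):
        if img.getAttribute("src") == photo_url:
            img.parentNode.removeChild(img)
            removed += 1
    # Serialize only the children of the wrapper <div>.
    html = "".join(child.toxml() for child in root.childNodes)
    return html, removed


print(remove_photo_imgs('<p>My cat</p><img src="https://example.com/cat.jpg"/>',
                        "https://example.com/cat.jpg"))
# ('<p>My cat</p>', 1)
```

Note that this compares `src` values literally; as mentioned in the next message, a parser like php-mf2 has already resolved URLs, which is what makes such a comparison reliable.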
barnabywaltersif you’re using php-mf2, then you already have a DOMDocument available, which has done a lot of the hard work of resolving URLs, dealing with encodings, etc
barnabywaltersthis reminds me of a topic I was thinking about a little back when I was actively working on php-mf2, which was: how can we improve parsers to make consuming mf2 and HTML even easier?
barnabywaltersone of the things I was thinking about was to have a parsing mode where each property additionally contains a key which maps to the DOMElement it was parsed from, allowing consuming code to use the mf2 output to “reach into” the DOMDocument and get additional information, make changes, etc.
barnabywaltersso say you have e-content and a u-photo. You get references to the DOMElements they were parsed from, check to see if the photo was inside the content, and if so, call a method to remove the photo DOMElement from its parent
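A toy version of that proposed parsing mode, sketched with `xml.dom.minidom`. Everything here is hypothetical: `parse_with_elements` is not php-mf2's API, it only collects u-photo (from `src`) and e-content, the `"element"` key name is invented, and the markup must be well-formed XML.

```python
from xml.dom import minidom


def parse_with_elements(markup):
    """Hypothetical 'element reference' parsing mode, heavily simplified.

    Each collected value keeps an "element" key pointing at the DOM
    node it was parsed from.
    """
    doc = minidom.parseString(markup)
    props = {"photo": [], "content": []}
    for el in doc.getElementsByTagName("*"):
        classes = el.getAttribute("class").split()
        if "u-photo" in classes:
            props["photo"].append({"value": el.getAttribute("src"), "element": el})
        if "e-content" in classes:
            props["content"].append({"element": el})
    return doc, props


def is_inside(node, ancestor):
    """Return True if node is a descendant of ancestor."""
    parent = node.parentNode
    while parent is not None:
        if parent is ancestor:
            return True
        parent = parent.parentNode
    return False


markup = (
    '<article class="h-entry">'
    '<div class="e-content"><p>My cat</p>'
    '<img class="u-photo" src="https://example.com/cat.jpg"/></div>'
    "</article>"
)
doc, props = parse_with_elements(markup)
content_el = props["content"][0]["element"]
for photo in props["photo"]:
    if is_inside(photo["element"], content_el):
        # The consumer displays the photo itself, so drop it from the content DOM.
        photo["element"].parentNode.removeChild(photo["element"])

print(content_el.toxml())  # <div class="e-content"><p>My cat</p></div>
```

The appeal of this design is that the containment check and the removal are plain DOM operations, so no string matching on URLs is needed at all.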
barnabywaltersanother thing which could be useful is to make the function which converts HTML to plaintext more readily available, so it can be called on any DOMElement
barnabywaltersthat way, consumers can find content, make whatever changes they want to its DOM representation, then call the toPlaintext function on the result
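A minimal sketch of a standalone plaintext function that can be run on any DOM node after the consumer has edited it. `to_plaintext` is a hypothetical stand-in for the toPlaintext function mentioned above: replacing `<img>` with its alt text, else its src, mirrors what mf2 parsers commonly do for e-* plaintext values, but the whitespace handling here is deliberately naive (runs are simply collapsed).

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def to_plaintext(node):
    """Hypothetical toPlaintext: flatten a DOM node (and its subtree) to text."""
    parts = []

    def walk(n):
        if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(n.data)
        elif n.nodeType == Node.ELEMENT_NODE and n.tagName == "img":
            # Use alt text if present, otherwise fall back to the src URL.
            parts.append(" " + (n.getAttribute("alt") or n.getAttribute("src")) + " ")
        else:
            for child in n.childNodes:
                walk(child)

    walk(node)
    # Naive whitespace handling: collapse every run of whitespace to one space.
    return " ".join("".join(parts).split())


doc = minidom.parseString(
    '<div class="e-content"><p>My   cat</p>'
    '<img src="https://example.com/cat.jpg" alt="a cat"/>'
    '<img class="u-photo" src="https://example.com/dog.jpg"/></div>'
)
content = doc.documentElement
before = to_plaintext(content)  # 'My cat a cat https://example.com/dog.jpg'

# Consumer-side change: drop the u-photo img because it is displayed separately.
for img in list(content.getElementsByTagName("img")):
    if "u-photo" in img.getAttribute("class").split():
        img.parentNode.removeChild(img)

after = to_plaintext(content)  # 'My cat a cat'
```

Because the plaintext conversion runs after the DOM edit, the photo URL never reaches the text, which avoids the string-replacement problems discussed earlier.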
barnabywaltersand IMO the mf2 parsing spec doesn’t have to have all the answers, provided parser implementations give their users sufficient tools