#microformats 2021-05-25

2021-05-25 UTC
iwaim, jeremycherfas, ChanServ, easrng, MylesBraithwaite, [KevinMarks], hendursaga, twisted`, KartikPrabhu and [chee] joined the channel
#
@JamieTanna
I had some fun using the open standard #Microformats2 to update my CV to be machine-parseable - you can check it out at https://hire.jvt.me and there's a bit more info on https://www.jvt.me/posts/2021/05/25/microformats-resume/ (https://www.jvt.me/mf2/2021/05/e5drf/)
(twitter.com/_/status/1397127613534068737)
barnabywalters joined the channel
#
[KevinMarks]
suggestions
tweet[m] and jamietanna[m] joined the channel
#
barnabywalters
not sure if it’s worth sending a PR to a deprecated repo
#
barnabywalters
looks like this is the new one https://github.com/HTTPArchive/httparchive.org — it doesn’t have any mention of mf2, and I can’t find a similar custom metrics file
kir0ul, kiroul, [KevinMarks], strugee and justBull joined the channel
#
[KevinMarks]
I'm talking to them about it - they need h-cite at least and some mf1 markers
indy, ben_thatmustbeme, hey, GWG, hendursaga, ivc, omz13, Phae, Kaja, beko, timotimo and Saphire joined the channel
indy_, iwaim, KartikPrabhu, MylesBraithwaite, jeremycherfas, ChanServ, barnabywalters, jamietanna[m] and globbot joined the channel
#
[KevinMarks]
should h-cite be on the front page now?
JackyAlcin[m], astralbijection[, tweet[m] and KartikPrabhu joined the channel
#
barnabywalters
the /h-cite page doesn’t have a Status section, but mentions that it’s a draft specification
#
barnabywalters
h-cite isn’t listed on https://microformats.org/wiki/microformats2#v2_vocabularies, but is mentioned on that page in several of the “examples in the wild”
#
barnabywalters
so it probably needs a Status section a la the one on h-entry, and then be added to /Main_Page and /microformats2
KartikPrabhu joined the channel
#
barnabywalters
hmm, changing the redirect syntax didn’t fix it. looks like there’s something wrong with how the mf wiki serves redirects
[chee], [KevinMarks], TallTed, [snarfed], barnabywalters and gRegorLove joined the channel
#
gRegorLove
tantek, I hadn't seen that conversation. Interesting. I saw chee suggest "still image or set of still images"
#
gRegorLove
I'm realizing how that definition interacts with https://github.com/microformats/h-entry/issues/23 too
#
gRegorLove
If the u-photo dfn included "if the entry has a content property, that should be used as the description for the photo(s)
#
gRegorLove
" that implies the u-photo shouldn't be embedded in e-content.
#
gRegorLove
I haven't thought about #23 much since the IWC Austin 2020 conversation that touched on it, so don't have a strong opinion
#
barnabywalters
as I mentioned in #23, I think that complexity of authoring UIs and consumers (as well as back-compatibility for consumers) are strong arguments for permitting u-photo, -audio, -video etc within e-content
#
barnabywalters
if you allow them inside e-content, then any post authoring UI which allows HTML editing of e-content immediately natively supports image posts with re-ordering, alt text etc all via text editing
#
barnabywalters
and people can also build dedicated UIs for managing all of that programmatically if they want to, but it’s not required
#
gRegorLove
I guess I'm not clear what the outcome of #23 will be other than a possible recommendation for publishers. It's not a parsing spec change, so u-photo will continue to be consumed regardless where it appears.
#
gRegorLove
I also publish u-photo inside e-content
#
sknebel
see the linked xray issue for why that's trouble for consumers
#
barnabywalters
well mostly it’s about the official definition of the u-photo and related u-[main content] properties, right? and how their presence should alter what e-content is used for, if at all
#
barnabywalters
I think part of that problem is that consuming HTML is hard. The mf2 parser makes it easier by narrowing down what HTML you have to deal with, but as soon as e-* properties are involved, consumers have to worry about potentially non-trivial transformations if they want to work with it
#
barnabywalters
and I’m skeptical about how much this can be reduced by trying to force publishers to use properties in very specific ways
#
sknebel
forcing is obviously not happening
#
barnabywalters
e.g. in the examples aaronpk gives here, https://github.com/aaronpk/XRay/issues/52, there are several ways of dealing with either the plaintext value or the html value of e-content
#
Loqi
[aaronpk] #52 Remove images from posts containing a photo
#
barnabywalters
e.g. if the consumer wants to use the plaintext e-content value, and sees that there’s a u-photo property, they could replace instances of the u-photo URL in the plaintext content with an empty string
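
A minimal sketch of the plaintext approach described here, written in Python rather than the PHP discussed later in the log. The `strip_photo_urls` helper and the example entry are hypothetical, and the sketch assumes the u-photo values are already the exact strings that appear in the plaintext.

```python
def strip_photo_urls(plaintext, photo_urls):
    """Remove every exact occurrence of each u-photo URL from the plaintext."""
    for url in photo_urls:
        plaintext = plaintext.replace(url, "")
    return plaintext


# Hypothetical parsed entry: plaintext content plus its u-photo values.
content = "Sunset at the beach https://example.com/sunset.jpg"
photos = ["https://example.com/sunset.jpg"]

cleaned = strip_photo_urls(content, photos).strip()
print(cleaned)
```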
#
barnabywalters
or if they want to use the html e-content value, they could parse it and check for an img element with the u-photo url. If they find one, either remove it if they want to display the image themselves, or leave it in and know not to display the image a second time
#
barnabywalters
IMO documenting cases like this, and coming up with algorithms, recommendations and software to help consumers handle them is the more productive approach
#
aaronpk
"could replace instances of the u-photo URL in the plaintext content with an empty string" sounds simple but it is not and it is very error prone
#
aaronpk
same with HTML, it's a giant mess
#
barnabywalters
but regarding that issue: I do agree that it’d be worth reviewing how plaintext values are generated, where to imply u-photo
#
aaronpk
also see the examples i documented with alt text
#
sknebel
right, the "several ways" is part of the problem. now everyone gets to implement a long list of special cases, and not all software will implement them identically
#
barnabywalters
aaronpk: yeah, consuming HTML is a giant mess, and microformats can’t solve all of the problems
#
sknebel
and everyone who doesn't do it like xray gets complaints ;)
#
aaronpk
it is *so close* to solving all the problems tho
#
barnabywalters
aaronpk: regarding the suggestion of handling plaintext content by replacing occurrences of the photo url with empty string: where would this not work?
#
aaronpk
simple but contrived example is if the URL is also actually in the text for some reason
#
barnabywalters
in what context is an exact copy of the photo URL going to find its way into the plaintext content
#
aaronpk
like a blog post that contains code samples
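
A small illustration of that failure case, assuming a made-up post whose plaintext contains the photo URL twice: once inside a code sample and once where the image itself was flattened to its URL. A blanket string replacement cannot tell the two apart.

```python
# Hypothetical u-photo value and plaintext content for a post with a code sample.
photo = "https://example.com/sunset.jpg"
content = (
    "Download the photo with:\n"
    "    curl -O https://example.com/sunset.jpg\n"
    "https://example.com/sunset.jpg"
)

# Both the code sample and the flattened image match the same string.
occurrences = content.count(photo)

# Replacing every occurrence also strips the argument from the curl command.
cleaned = content.replace(photo, "")

print(occurrences)
print(cleaned)
```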
#
barnabywalters
code samples are going to look like shit in plaintext anyway, sadly
#
aaronpk
doesn't mean they should be broken tho
#
barnabywalters
and anyway, a blog post with code samples in isn’t likely to have a u-photo property, as it’s a blog post not a photo post
#
aaronpk
well that's the other part of this discussion...which is what exactly does it mean to have a u-photo property and when should a post use it
#
barnabywalters
well in that case, I don’t see why having photo URLs show up in plaintext content is any worse than having completely broken code samples show up there
#
sknebel
what is "completely broken" about a plaintext code sample in a plaintext post?
#
sknebel
it's ... plain text.
#
sknebel
(improved whitespace handling would help, but that's also somewhere on the todo pile ;))
#
barnabywalters
indentation and formatting are likely to be broken unless the entire plaintext content is presented respecting whitespace, which is likely to cause other whitespace problems when displaying HTML content
#
barnabywalters
IMO, for anything other than the most basic content, the plaintext version of e-* properties is a convenience mostly useful for debugging or very basic usage, and any more serious consumer is likely going to have to wrestle with the html and do whatever parsing, sanitizing and processing is necessary for their use-case
#
aaronpk
that is probably true, but there's also a huge difference between handing off the HTML to an HTML sanitizer vs going and pulling out individual HTML tags from the document
#
sknebel
and you can make their job a lot easier, or at least reduce the amount of breakage through cases they haven't covered, by recommending to not put the u- in the e-content
#
aaronpk
for example i'm able to throw the e-content HTML at the main PHP HTML sanitizer and trust that the result will be usable without any further DOM fiddling
TallTed joined the channel
#
barnabywalters
at least in PHP, it’s not too difficult to parse the HTML into a DOMDocument and e.g. search it for an occurrence of <img> with an src matching the contents of a parsed u-photo property
#
barnabywalters
that’s maybe three lines?
#
barnabywalters
if you’re using php-mf2, then you already have a DOMDocument available, which has done a lot of the hard work of resolving URLs, dealing with encodings, etc
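
The log talks about PHP's DOMDocument; as a rough stand-in, here is a sketch of the same idea using Python's standard-library `html.parser`. The `PhotoStripper` class is hypothetical. It only drops `<img>` elements whose `src` exactly equals a u-photo value, assumes URLs are already resolved, and makes no attempt at `<picture>` or `srcset`.

```python
from html.parser import HTMLParser


class PhotoStripper(HTMLParser):
    """Re-emit HTML, dropping <img> tags whose src is a known u-photo URL.

    Sketch only: comments and doctype declarations are not re-emitted.
    """

    def __init__(self, photo_urls):
        super().__init__(convert_charrefs=False)
        self.photo_urls = set(photo_urls)
        self.out = []
        self.removed = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img" and dict(attrs).get("src") in self.photo_urls:
            self.removed += 1  # skip the matching image
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: the raw start tag text already includes "/>".
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


# Hypothetical e-content HTML and u-photo value.
html = '<p>Sunset <img src="https://example.com/sunset.jpg" alt="Sunset"> at the beach &amp; pier</p>'
stripper = PhotoStripper(["https://example.com/sunset.jpg"])
stripper.feed(html)
stripper.close()
result = "".join(stripper.out)
print(result)
```

If `stripper.removed` is zero, the consumer knows the photo was not embedded in the content and can display it separately.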
KartikPrabhu joined the channel
#
sknebel
and another few lines to special case <picture> tags, and ...
#
barnabywalters
this reminds me of a topic I was thinking about a little back when I was actively working on php-mf2, which was how can we improve parsers to make consuming mf2 and HTML even easier
#
barnabywalters
one of the things I was thinking about was to have a parsing mode where each property additionally contains a key which maps to the DOMElement it was parsed from, allowing consuming code to use the mf2 output to “reach into” the DOMDocument and get additional information, make changes etc
#
barnabywalters
so say you have e-content and a u-photo. You get references to the DOMElements they were parsed from, check to see if the photo was inside the content, and if so, call a method to remove the photo DOMElement from its parent
#
barnabywalters
then you’d be able to get the inner content from the content DOMElement, knowing that it no longer contains the photo element
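
A sketch of how such a parsing mode might be consumed, using Python's `xml.etree.ElementTree` in place of PHP's DOMDocument. Everything here is an assumption for illustration: the `element` key in the parsed output, the `detach_if_inside` helper, and the well-formed markup. No existing parser is claimed to offer this.

```python
import xml.etree.ElementTree as ET


def detach_if_inside(root, container, node):
    """Remove node from the tree if container is one of its ancestors."""
    parents = {child: parent for parent in root.iter() for child in parent}
    current = node
    while current in parents:
        current = parents[current]
        if current is container:
            break
    else:
        return False  # container is not an ancestor of node

    parent = parents[node]
    siblings = list(parent)
    index = siblings.index(node)
    # ElementTree stores trailing text on the node itself, so keep it.
    if node.tail:
        if index > 0:
            previous = siblings[index - 1]
            previous.tail = (previous.tail or "") + node.tail
        else:
            parent.text = (parent.text or "") + node.tail
    parent.remove(node)
    return True


root = ET.fromstring(
    '<article class="h-entry">'
    '<div class="e-content"><p>At the beach</p>'
    '<img class="u-photo" src="https://example.com/sunset.jpg" alt="Sunset"/>'
    ' taken today</div>'
    '</article>'
)
content_el = root.find(".//div[@class='e-content']")
photo_el = root.find(".//img[@class='u-photo']")

# Hypothetical parser output: each property value carries its source element.
properties = {
    "content": [{"element": content_el}],
    "photo": [{"value": photo_el.get("src"), "element": photo_el}],
}

removed = detach_if_inside(
    root, properties["content"][0]["element"], properties["photo"][0]["element"]
)
inner = ET.tostring(content_el, encoding="unicode")
print(removed, inner)
```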
#
barnabywalters
(that’d at least handle the <picture> special casing you brought up…)
#
barnabywalters
aaronpk: something like this might help close the *so close* gap you mentioned
#
barnabywalters
another thing which could be useful is to make the function which converts to whitespace more readily available, so it can be called on any DOMElement
#
barnabywalters
s/whitespace/plaintext
#
barnabywalters
that way, consumers can find content, make whatever changes they want to its DOM representation, then call the toPlaintext function on the result
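
A rough sketch of that workflow in Python. The `to_plaintext` function is a hypothetical stand-in for the `toPlaintext` idea mentioned here: it joins text nodes and collapses whitespace, which is a simplification and not the plaintext algorithm from the mf2 parsing specification. It also assumes well-formed markup.

```python
import xml.etree.ElementTree as ET


def to_plaintext(element):
    """Join all text nodes under the element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


# Hypothetical e-content element.
content = ET.fromstring(
    '<div class="e-content"><p>At the beach</p>\n<p>Back   tomorrow</p></div>'
)
before = to_plaintext(content)

# The consumer edits the DOM first, then converts the result to plaintext.
content.remove(content[1])
after = to_plaintext(content)

print(before)
print(after)
```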
#
barnabywalters
there are always going to be fewer consumers than publishers, so IMO it makes sense to concentrate complexity at the consumers
#
barnabywalters
and IMO the mf2 parsing spec doesn’t have to have all the answers, provided parser implementations give their users sufficient tools
PooPSGTech, [tantek], ben_thatmustbeme, [tw2113_Slack_], [snarfed], [chee], [aciccarello], [jacky], TallTed, barnabywalters and justBull joined the channel; kiroul left the channel