#microformats 2021-05-25

2021-05-25 UTC
iwaim, jeremycherfas, ChanServ, easrng, MylesBraithwaite, [KevinMarks], hendursaga, twisted`, KartikPrabhu and [chee] joined the channel
# 09:50 
@JamieTanna I had some fun using the open standard #Microformats2 to update my CV to be machine-parseable - you can check it out at https://hire.jvt.me and there's a bit more info on https://www.jvt.me/posts/2021/05/25/microformats-resume/ (https://www.jvt.me/mf2/2021/05/e5drf/) (twitter.com/_/status/1397127613534068737)
barnabywalters joined the channel
# 10:09 
[KevinMarks] so, this needs updating https://github.com/HTTPArchive/legacy.httparchive.org/blob/0a07b6126fd7cde65410737b36f87740130117cc/custom_metrics/wpt_bodies.js#L997
# 10:09 
[KevinMarks] suggestions
tweet[m] and jamietanna[m] joined the channel
# 10:13 
barnabywalters not sure if it’s worth sending a PR to a deprecated repo
# 10:13 
barnabywalters looks like this is the new one https://github.com/HTTPArchive/httparchive.org — it doesn’t have any mention of mf2, and I can’t find a similar custom metrics file
kir0ul, kiroul, [KevinMarks], strugee and justBull joined the channel
# 10:17 
[KevinMarks] I'm talking to them about it - they need h-cite at least and some mf1 markers
indy, ben_thatmustbeme, hey, GWG, hendursaga, ivc, omz13, Phae, Kaja, beko, timotimo and Saphire joined the channel
# 10:20 
@lobsters Marking up my Curriculum Vitae with Microformats2
https://lobste.rs/s/ygr5sf #web
https://www.jvt.me/posts/2021/05/25/microformats-resume/ (twitter.com/_/status/1397134148414976001)
indy_, iwaim, KartikPrabhu, MylesBraithwaite, jeremycherfas, ChanServ, barnabywalters, jamietanna[m] and globbot joined the channel
# 10:40 
[KevinMarks] should h-cite be on the front page now?
# 10:40 
[KevinMarks] https://microformats.org/wiki/Main_Page#Specifications
JackyAlcin[m], astralbijection[, tweet[m] and KartikPrabhu joined the channel
# 12:35 
barnabywalters the /h-cite page doesn’t have a Status section, but mentions that it’s a draft specification
# 12:35 
barnabywalters h-cite isn’t listed on https://microformats.org/wiki/microformats2#v2_vocabularies, but is mentioned on that page in several of the “examples in the wild”
# 12:37 
barnabywalters so it probably needs a Status section a la the one on h-entry, and then be added to /Main_Page and /microformats2
# 12:38 
barnabywalters hmm https://microformats.org/wiki/FAQ seems to be broken
KartikPrabhu joined the channel
# 12:40 
barnabywalters edited /FAQ (+0)
# 12:40 
barnabywalters hmm, changing the redirect syntax didn’t fix it. looks like there’s something wrong with how the mf wiki serves redirects
[chee], [KevinMarks], TallTed, [snarfed], barnabywalters and gRegorLove joined the channel
# 16:37 
gRegorLove tantek, I hadn't seen that conversation. Interesting. I saw chee suggest "still image or set of still images"
# 16:38 
gRegorLove I'm realizing how that definition interacts with https://github.com/microformats/h-entry/issues/23 too
# 16:39 
gRegorLove If the u-photo dfn included "if the entry has a content property, that should be used as the description for the photo(s)" that implies the u-photo shouldn't be embedded in e-content.
# 16:40 
gRegorLove I haven't thought about #23 much since the IWC Austin 2020 conversation that touched on it, so don't have a strong opinion
# 16:59 
barnabywalters as I mentioned in #23, I think that complexity of authoring UIs and consumers (as well as back-compatibility for consumers) are strong arguments for permitting u-photo, -audio, -video etc within e-content
# 17:01 
barnabywalters if you allow them inside e-content, then any post authoring UI which allows HTML editing of e-content immediately natively supports image posts with re-ordering, alt text etc all via text editing
# 17:01 
barnabywalters and people can also build dedicated UIs for managing all of that programmatically if they want to, but it’s not required
# 17:09 
gRegorLove I guess I'm not clear what the outcome of #23 will be other than a possible recommendation for publishers. It's not a parsing spec change, so u-photo will continue to be consumed regardless where it appears.
# 17:09 
gRegorLove I also publish u-photo inside e-content
# 17:10 
sknebel see the linked xray issue for why thats trouble for consumers
# 17:10 
barnabywalters well mostly it’s about the official definition of the u-photo and related u-[main content] properties, right? and how their presence should alter what e-content is used for, if at all
# 17:14 
barnabywalters I think part of that problem is that consuming HTML is hard. The mf2 parser makes it easier by narrowing down what HTML you have to deal with, but as soon as e-* properties are involved, consumers have to worry about potentially non-trivial transformations if they want to work with it
# 17:14 
barnabywalters and I’m skeptical about how much this can be reduced by trying to force publishers to use properties in very specific ways
# 17:15 
sknebel forcing is obviously not happening
# 17:16 
barnabywalters e.g. in the examples aaronpk gives here, https://github.com/aaronpk/XRay/issues/52, there are several ways of dealing with either the plaintext value or the html value of e-content
# 17:16 
Loqi [aaronpk] #52 Remove images from posts containing a photo
# 17:17 
barnabywalters e.g. if the consumer wants to use the plaintext e-content value, and sees that there’s a u-photo property, they could replace instances of the u-photo URL in the plaintext content with an empty string
# 17:18 
barnabywalters or if they want to use the html e-content value, they could parse it and check for an img element with the u-photo url. If they find one, either remove it if they want to display the image themselves, or leave it in and know not to display the image a second time
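The HTML approach described above can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP code discussed in the channel and not how XRay or php-mf2 work. `PhotoStripper` and `remove_photos` are hypothetical names, and it assumes the `src` in the content is byte-for-byte identical to the parsed u-photo URL (no relative URLs, no `<picture>` or `srcset` handling).

```python
from html.parser import HTMLParser


class PhotoStripper(HTMLParser):
    """Re-serialises HTML, dropping <img> tags whose src is a known photo URL."""

    def __init__(self, photo_urls):
        # Keep character references as written so they can be re-emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.photo_urls = set(photo_urls)
        self.out = []
        self.removed = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img" and dict(attrs).get("src") in self.photo_urls:
            self.removed += 1  # skip the tag instead of re-emitting it
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: the raw start tag text already includes "/>".
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def remove_photos(content_html, photo_urls):
    """Return (html_without_matching_imgs, number_of_imgs_removed)."""
    parser = PhotoStripper(photo_urls)
    parser.feed(content_html)
    parser.close()
    return "".join(parser.out), parser.removed


html = '<p>My cat</p><img src="https://example.com/cat.jpg" alt="A cat"><p>Bye &amp; thanks</p>'
cleaned, removed = remove_photos(html, ["https://example.com/cat.jpg"])
print(cleaned)  # <p>My cat</p><p>Bye &amp; thanks</p>
print(removed)  # 1
```

A consumer that prefers to leave the image in place could use the `removed` count only as a signal not to display the photo a second time.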
# 17:18 
barnabywalters IMO documenting cases like this, and coming up with algorithms, recommendations and software to help consumers handle them is the more productive approach
# 17:19 
aaronpk "could replace instances of the u-photo URL in the plaintext content with an empty string" sounds simple but it is not and it is very error prone
# 17:19 
aaronpk same with HTML, it's a giant mess
# 17:20 
barnabywalters but regarding that issue: I do agree that it’d be worth reviewing how plaintext values are generated, where to imply u-photo
# 17:20 
aaronpk also see the examples i documented with alt text
# 17:20 
sknebel right, the "several ways" is part of the problem. now everyone gets to implement a long list of special cases, and not all software will implement them identically
# 17:20 
barnabywalters aaronpk: yeah, consuming HTML is a giant mess, and microformats can’t solve all of the problems
# 17:20 
sknebel and everyone who doesnt do it like xray gets complaints ;)
# 17:20 
aaronpk it is *so close* to solving all the problems tho
# 17:21 
barnabywalters aaronpk: regarding the suggestion of handling plaintext content by replacing occurrences of the photo url with empty string: where would this not work?
# 17:22 
aaronpk simple but contrived example is if the URL is also actually in the text for some reason
# 17:22 
barnabywalters in what context is an exact copy of the photo URL going to find its way into the plaintext content
# 17:22 
aaronpk like a blog post that contains code samples
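The plaintext replacement being debated, and the failure case aaronpk describes, can be sketched as below. This is illustrative Python only. `strip_photo_urls` is a hypothetical helper, and the sketch assumes the photo URL ended up in the plaintext value of e-content.

```python
def strip_photo_urls(plaintext, photo_urls):
    """Naively remove every occurrence of each u-photo URL from the plaintext."""
    for url in photo_urls:
        plaintext = plaintext.replace(url, "")
    return plaintext


photo = "https://example.com/cat.jpg"

# Photo post: the URL is only in the text because the image sat inside e-content.
photo_post = "Look at my cat https://example.com/cat.jpg"
print(repr(strip_photo_urls(photo_post, [photo])))
# 'Look at my cat '

# Post whose text mentions the same URL on purpose, e.g. in a code sample.
code_post = 'Embed it with <img src="https://example.com/cat.jpg">'
print(repr(strip_photo_urls(code_post, [photo])))
# 'Embed it with <img src="">'
```

The first case behaves as intended apart from leftover whitespace. The second silently damages the code sample, because the replacement cannot tell an intentional mention from an artifact of plaintext conversion.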
# 17:22 
barnabywalters code samples are going to look like shit in plaintext anyway, sadly
# 17:22 
aaronpk doesn't mean they should be broken tho
# 17:23 
barnabywalters and anyway, a blog post with code samples in isn’t likely to have a u-photo property, as it’s a blog post not a photo post
# 17:24 
aaronpk well that's the other part of this discussion...which is what exactly does it mean to have a u-photo property and when should a post use it
# 17:24 
barnabywalters well in that case, I don’t see why having photo URLs show up in plaintext content is any worse than having completely broken code samples show up there
# 17:24 
sknebel what is "completely broken" about a plaintext code sample in a plaintext post?
# 17:25 
sknebel its ... plain text.
# 17:25 
sknebel (improved whitespace handling would help, but that's also somewhere on the todo pile ;))
# 17:25 
barnabywalters indentation and formatting are likely to be broken unless the entire plaintext content is presented respecting whitespace, which is likely to cause other whitespace problems when displaying HTML content
# 17:26 
barnabywalters IMO, for anything other than the most basic content, the plaintext version of e-* properties is a convenience mostly useful for debugging or very basic usage, and any more serious consumer is likely going to have to wrestle with the html and do whatever parsing, sanitizing and processing is necessary for their use-case
# 17:27 
aaronpk that is probably true, but there's also a huge difference between handing off the HTML to an HTML sanitizer vs going and pulling out individual HTML tags from the document
# 17:28 
sknebel and you can make their job a lot easier, or at least reduce the amount of breakage through cases they haven't covered, by recommending to not put the u- in the e-content
# 17:28 
aaronpk for example i'm able to throw the e-content HTML at the main PHP HTML sanitizer and trust that the result will be usable without any further DOM fiddling
TallTed joined the channel
# 17:31 
barnabywalters at least in PHP, it’s not too difficult to parse the HTML into a DOMDocument and e.g. search it for an occurrence of <img> with an src matching the contents of a parsed u-photo property
# 17:31 
barnabywalters that’s maybe three lines?
# 17:32 
barnabywalters if you’re using php-mf2, then you already have a DOMDocument available, which has done a lot of the hard work of resolving URLs, dealing with encodings, etc
KartikPrabhu joined the channel
# 17:36 
sknebel and another few lines to special case <picture> tags, and ...
# 17:36 
barnabywalters this reminds me of a topic I was thinking about a little back when I was actively working on php-mf2, which was how can we improve parsers to make consuming mf2 and HTML even easier
# 17:37 
barnabywalters one of the things I was thinking about was to have a parsing mode where each property additionally contains a key which maps to the DOMElement it was parsed from, allowing consuming code to use the mf2 output to “reach into” the DOMDocument and get additional information, make changes etc
# 17:39 
barnabywalters so say you have e-content and a u-photo. You get references to the DOMElements they were parsed from, check to see if the photo was inside the content, and if so, call a method to remove the photo DOMElement from its parent
# 17:39 
barnabywalters then you’d be able to get the inner content from the content DOMElement, knowing that it no longer contains the photo element
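A rough sketch of that idea, in Python with `xml.etree.ElementTree` standing in for PHP's DOMDocument. No existing parser is known to offer this mode; the class-based lookup below merely simulates the element references a parser would hand back, and all helper names are hypothetical. It assumes well-formed markup so that ElementTree can parse it.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<article class="h-entry">'
    '<div class="e-content"><p>My cat</p>'
    '<img class="u-photo" src="https://example.com/cat.jpg" alt="A cat"/>'
    "</div></article>"
)

# ElementTree has no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def find_by_class(root, cls):
    """Stand-in for the element references a parser could return per property."""
    return [el for el in root.iter() if cls in el.get("class", "").split()]


def is_inside(el, ancestor):
    """True if `ancestor` appears anywhere above `el` in the tree."""
    while el in parent_of:
        el = parent_of[el]
        if el is ancestor:
            return True
    return False


def remove_if_nested(photo_el, content_el):
    """Detach the photo element if it sits inside the content element."""
    if is_inside(photo_el, content_el):
        # Note: Element.remove also discards the element's tail text.
        parent_of[photo_el].remove(photo_el)
        return True
    return False


def inner_html(el):
    """Serialise the children of `el`, including leading text."""
    return (el.text or "") + "".join(ET.tostring(c, encoding="unicode") for c in el)


content_el = find_by_class(doc, "e-content")[0]
photo_el = find_by_class(doc, "u-photo")[0]

removed = remove_if_nested(photo_el, content_el)
print(removed)                 # True
print(inner_html(content_el))  # <p>My cat</p>
```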
# 17:40 
barnabywalters (that’d at least handle the <picture> special casing you brought up…)
# 17:43 
barnabywalters aaronpk: something like this might help close the *so close* gap you mentioned
# 17:49 
barnabywalters another thing which could be useful is to make the function which converts to plaintext more readily available, so it can be called on any DOMElement
# 17:50 
barnabywalters that way, consumers can find content, make whatever changes they want to its DOM representation, then call the toPlaintext function on the result
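As an illustration of what such a standalone function might look like, here is a simplified Python sketch over `xml.etree.ElementTree` elements. `to_plaintext` is a hypothetical name. The img handling (alt text, falling back to src) is an assumption made for this sketch, and it does not attempt to reproduce the whitespace rules of any mf2 parser.

```python
import xml.etree.ElementTree as ET


def to_plaintext(el):
    """Collect the text of `el` and its descendants, using alt/src for images."""
    parts = []

    def walk(node):
        if node.tag == "img":
            # Assumption: represent an image by its alt text, else its src.
            parts.append(node.get("alt") or node.get("src") or "")
            return
        if node.text:
            parts.append(node.text)
        for child in node:
            walk(child)
            if child.tail:
                parts.append(child.tail)

    walk(el)
    return "".join(parts)


content = ET.fromstring(
    '<div><p>My <b>cat</b> </p><img src="x.jpg" alt="A cat"/> end</div>'
)
print(to_plaintext(content))  # My cat A cat end
```

Because the function takes any element, a consumer could first edit the tree (for example, detach a nested photo) and only then convert the result to plaintext.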
# 17:53 
barnabywalters there are always going to be fewer consumers than publishers, so IMO it makes sense to concentrate complexity at the consumers
# 17:53 
barnabywalters and IMO the mf2 parsing spec doesn’t have to have all the answers, provided parser implementations give their users sufficient tools
PooPSGTech, [tantek], ben_thatmustbeme, [tw2113_Slack_], [snarfed], [chee], [aciccarello], [jacky], TallTed, barnabywalters and justBull joined the channel; kiroul left the channel