#microformats 2018-02-24

2018-02-24 UTC
[cleverdevil] joined the channel
#
@Jim_Munro
VAULT | 4Jan14: Where is the best place to learn about microformats? Q&A: http://dumbseoquestions.com/v/988 #DSQLive #SEO
(twitter.com/_/status/967202401923944448)
tantek, [asuh], [jeremycherfas], nitot, [kevinmarks], [mifga], sebsel, [eddie], [mrkrndvs], Loqi_, [miklb], webchat211, barpthewire and [cleverdevil] joined the channel
#
KartikPrabhu
in the last step for parsing p-* and e-* (http://microformats.org/wiki/microformats2-parsing) one has to replace <img> by its alt, but not in dt-* and u-*. is this correct?
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
Zegnat
Correct KartikPrabhu
#
KartikPrabhu
Zegnat: do you know the reasoning behind those?
#
KartikPrabhu
as in is it that p-* and e-* should have text equivalent of the images but not u-* and dt-* ?
#
Zegnat
I do not. I actually thought there may have been discussion about that on the issue tracker, but can’t find it (and might have dreamt it)
#
KartikPrabhu
ok will have to see how to fix mf2py to do this
#
KartikPrabhu
will have to fiddle with different parsers again :|
KartikPrabhu joined the channel
#
Zegnat
I wonder if it is worth bringing the IMG parsing into the overall textContent discussion (https://github.com/microformats/microformats2-parsing/issues/15)
#
Loqi
[Zegnat] #15 What should mf2 textContent parsing result in? User expectation vs. DOM specification.
#
Zegnat
Rather than special casing SCRIPT/STYLE removal and IMG normalisation.
#
KartikPrabhu
there is also the weird "adding a space at the beginning and end"
#
Zegnat
That’s just there so `text<img alt="hi">` turns into `text hi` and not into `texthi`.
#
Zegnat
I am guessing because the first thing is what most people would expect their markup to be turned into.
#
Zegnat
The next step removes leading and trailing spaces, so it never accidentally leaves those hanging.
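
A minimal sketch of the behaviour described above, using only Python's standard `html.parser`. It assumes the reading given in this conversation: a nested `<img>` becomes its `alt` (or its `src` when there is no `alt`) padded with a space on each side, `<script>` and `<style>` content is dropped, and the result is trimmed. The names `TextWithImgAlt` and `text_value` are made up for illustration and are not taken from mf2py or any other parser. Resolving a relative `src` is left out.

```python
from html.parser import HTMLParser


class TextWithImgAlt(HTMLParser):
    """Collect text, swapping each <img> for its alt (or src) padded with spaces."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img" and self.skip_depth == 0:
            attr = dict(attrs)
            # Prefer alt when the attribute is present, otherwise fall back to src.
            value = attr.get("alt") if "alt" in attr else attr.get("src")
            if value is not None:
                self.parts.append(" " + value + " ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def text_value(html):
    """Return the trimmed text of an HTML fragment with <img> replaced."""
    parser = TextWithImgAlt()
    parser.feed(html)
    parser.close()
    # Trimming last means the padding never leaves a stray leading or trailing space.
    return "".join(parser.parts).strip()


print(text_value('text<img alt="hi">'))
```

Because the padding is added first and the trim happens last, the `<img>` at the end of the fragment does not leave a trailing space behind.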
#
KartikPrabhu
yeah, but it will be some weird code to do that :P
#
KartikPrabhu
also is "next line" "\r\n" or just "\n" :P
#
KartikPrabhu
PHP parser gives "\r\n"
#
KartikPrabhu
Zegnat: tracking here https://github.com/kartikprabhu/mf2py/issues/61 feel free to chime in
#
Loqi
[kartikprabhu] #61 include img alt and src in p-* and e-* parsing
#
Zegnat
My browser’s textContent gives me \n from your example code.
#
KartikPrabhu
yeah that's what I would expect too
#
Zegnat
In actuality, I think neither is “expected”. The mf2 output should match whatever was used by the HTML input.
#
KartikPrabhu
but again that might depend on the HTML parser used and possibly the OS
#
Zegnat
It shouldn’t. I don’t recall the HTML spec normalising between \r\n and \n. So HTML parsers should not touch it.
#
Zegnat
Your OS shouldn’t be changing the bytes of an HTML file either, without you triggering anything.
#
KartikPrabhu
yeah I won't worry about that part for now
#
Zegnat
It might be up to the HTML author’s OS. But that’s of no concern. The Text Node’s bytes should be copied verbatim at all times. So neither \r\n nor \n is “expected”.
#
Zegnat
If there is a normalisation step somewhere, I’d be happy if someone could point me at it :)
#
Zegnat
If you are feeding mf2php HTML with \n and it gives you \r\n back, that’s probably a bug
#
KartikPrabhu
yeah, I am using \n since I am on a Unix machine
[kevinmarks] joined the channel
#
[kevinmarks]
There was some whitespace stripping
#
[kevinmarks]
Leading/trailing whitespace is always stripped
#
[kevinmarks]
So I suppose if you have multiple of a property with whitespace between, you may lose whitespace if you join them again
#
[kevinmarks]
You'd see this in h-recipe perhaps, where there are ingredients lists and steps
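
A small illustration of the point about losing whitespace, assuming two hypothetical ingredient values as they might appear in the markup before a parser trims them. The strings and the `rejoin` helper are invented for this example.

```python
def rejoin(raw_values, sep=""):
    """Trim each value the way a parser would, then join them back together."""
    trimmed = [value.strip() for value in raw_values]
    return sep.join(trimmed)


# Hypothetical raw text of two separate properties, whitespace included.
raw = ["  2 cups flour\n", "\n  1 egg  "]

print(repr(rejoin(raw)))        # the whitespace between the values is gone
print(repr(rejoin(raw, "\n")))  # a consumer has to pick its own separator
```

Once each value has been trimmed, the original spacing between them cannot be recovered, so any consumer that joins values has to choose a separator itself.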
#
Zegnat
I am not sure what you mean with “join them”, [kevinmarks]. Why would I join separate mf values?
[snarfed] joined the channel
#
aaronpk
we haven't really done much with that in indieweb usage, but you could imagine multiple "content" properties, one for each paragraph
#
sknebel
hm, I wonder how many tools actually handle that
#
aaronpk
I suspect not many, since that behavior was never explicitly mentioned in the vocabularies like h-entry
#
sknebel
would also require some conversion e.g. for microsub, since jf2 does not allow it
#
aaronpk
i'm not sure it really makes sense for the "content" property either, I wouldn't want to encourage it
#
aaronpk
technically jf2 allows it since jf2 isn't vocab aware, but the tools I've been writing are using a vocab-aware representation in jf2
#
aaronpk
now that I think about it, seems like it would make sense for h-entry to define which properties are allowed to have multiple values http://microformats.org/wiki/h-entry#Core_Properties
#
Loqi
[Tantek Çelik] h-entry is a simple, open format for episodic or datestamped content on the web. h-entry is often used with content intended to be syndicated, e.g. blog posts. h-entry is one of several open microformat standards suitable for embedding data in HTML. ...
#
sknebel
true, I always assumed specific jf2 profiles when mentioning jf2
#
sknebel
oh, and no, jf2 reserves html and content as names for content, explicitly only allowing single-values
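
A rough sketch of the kind of conversion mentioned above, going from mf2 parsed JSON (every property is a list) to a flatter, jf2-like dictionary. It is not an implementation of the jf2 specification. The function name `flatten_item`, the `SINGLE_VALUED` set, and the choice to keep only the first "content" value are all assumptions made for illustration.

```python
# Assumption for this sketch: only "content" is forced to a single value.
SINGLE_VALUED = {"content"}


def flatten_item(item):
    """Flatten one mf2 item into a jf2-like dict (illustrative only)."""
    mf2_type = item["type"][0]
    out = {"type": mf2_type[2:] if mf2_type.startswith("h-") else mf2_type}
    for name, values in item.get("properties", {}).items():
        if not values:
            continue
        if name in SINGLE_VALUED:
            first = values[0]  # assumption: extra values are silently dropped
            if isinstance(first, dict):
                # e-* properties parse to an object with "html" and "value" keys
                out[name] = {"html": first.get("html", ""),
                             "text": first.get("value", "")}
            else:
                out[name] = first
        elif len(values) == 1:
            out[name] = values[0]
        else:
            out[name] = list(values)
    return out


entry = {
    "type": ["h-entry"],
    "properties": {
        "name": ["Hi"],
        "category": ["a", "b"],
        "content": ["first paragraph", "second paragraph"],
    },
}
print(flatten_item(entry))
```

Here the second "content" value is lost, which is the conversion cost raised in the discussion. A real tool might prefer to join the values or reject the input instead.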
#
aaronpk
oh! I missed that somehow
#
aaronpk
edited /h-entry (+105) "no h-entry or h-card found on the site or permalinks. Undo revision 66594 by [[Special:Contributions/Okinawan-lyrics|Okinawan-lyrics]] ([[User talk:Okinawan-lyrics|Talk]])"
#
sknebel
if I hadn't *just* read through the spec...
#
[kevinmarks]
With mf1 there was some implied joining iirc - for dates before we had dt
#
[kevinmarks]
And value class pattern makes it explicit http://microformats.org/wiki/value-class-pattern
#
Zegnat
Yeah, but that's not merging separate properties.
#
Zegnat
VCP is also very much mf2 still :-)
[mifga], [mrkrndvs] and KartikPrabhu joined the channel
#
KartikPrabhu
woot! mf2py now drops <script> and <style> from name parsing and also adds in alt and src of img
#
Loqi
😄
#
sknebel
KartikPrabhu++ nice!
#
Loqi
kartikprabhu has 8 karma in this channel (171 overall)
#
Zegnat
Those are some fast updates coming in KartikPrabhu, Great!
#
Zegnat
KartikPrabhu++
#
Loqi
kartikprabhu has 9 karma in this channel (172 overall)
vivus and j12t_ joined the channel
#
j12t_
Q: where is rel=self defined? It's mentioned here http://microformats.org/wiki/microformats2 but without explanation.
#
Loqi
microformats2
#
Zegnat
j12t, ^^^
#
j12t
Ah. Thank you.
#
Zegnat
Although that seems to say self was something that was already registered prior to Atom? So there might be an even older source?
#
j12t
I don't need it to be authoritative ... thanks!
#
Zegnat
In that case, there you have it :) The /existing-rel-values page is a good one to keep in mind, as it is even linked to by the HTML spec
tantek, [eddie], [miklb] and [kevinmarks] joined the channel