#microformats 2018-01-11

2018-01-11 UTC
#
[kevinmarks]
There is a cobbler's children aspect to the CSS there
#
tantek
[kevinmarks]: the CSS or the microformats themselves?
#
[kevinmarks]
That it has fixed width margins on my phone
#
[kevinmarks]
My phone has a 4k screen and I have a near point about 6" from my eyes with glasses off, so I can read it, but that might not be broadly true
#
tantek
sure but microformats is not about CSS advice
#
KartikPrabhu
microformats might not be about CSS advice, but it would be good if the site were easier to read on phones so that more people read it
#
KartikPrabhu
also, having an "old" look seems to deter people from taking content seriously
#
tantek
having a *dated* look yeah
#
tantek
at least for a while, then it becomes retro / appreciable eventually
#
KartikPrabhu
yeah, i didn't say "retro"
#
tantek
yeah too soon for it to be retro
#
KartikPrabhu
also, in a largely mobile world fixed width is not great even if it is "Retro"
#
KartikPrabhu
but, I don't want to go anywhere near styling wikis
tantek, KartikPrabhu, Kyle-K, nitot and [mlopatka] joined the channel; Kyle-K left the channel
#
sknebel
ouch, php-mf2 doesn't handle the microformats.org homepage well
#
Zegnat
Huh. How did it even get all those URL properties on the h-feed :/
#
Zegnat
Although I guess you should test in the latest dev version of the PHP parser? Since gRegorLove has been working on implementing the back-compat discussion.
#
tantek
maybe we need to improve the backcompat spec?
#
sknebel
Zegnat: I thought that was primarily about mixed mf1 and mf2, where mf.org is purely mf1 if I saw it right? but maybe
#
Zegnat
Possibly? The dangerous thing is if it means creating an entire mf1 parsing spec inside the mf2 parsing spec.
#
Zegnat
Maybe sknebel, really don’t know. I don’t even know why those URLs are being picked up, even in mf1 syntax? But I would need to revisit mf1 parsing specs
#
sknebel
the python parser chokes on a value-title pattern for the dates, but otherwise seems to get it right: posts are actually children of h-feed, titles seem right, ...
#
tantek
Zegnat 'Entire mf1 parsing spec' no
#
tantek
it's about carefully picking mf1 classes to translate into mf2
#
tantek
equivalents
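
A rough Python sketch of that "pick mf1 classes, translate to mf2 equivalents" idea, limited to a few hentry properties. The table names and the `upgrade_hentry_property` helper are hypothetical, the table is partial, and it should be checked against the backcompat section of the microformats2 parsing spec before being relied on:

```python
# Partial, illustrative mapping for properties found inside an mf1 "hentry" root.
# Not a complete backcompat table; names here are assumptions for this sketch.
HENTRY_CLASS_MAP = {
    "entry-title": "p-name",
    "entry-summary": "p-summary",
    "entry-content": "e-content",
    "published": "dt-published",
    "updated": "dt-updated",
}

# Some mf1 properties are expressed through rel values rather than class names.
HENTRY_REL_MAP = {
    "bookmark": "u-url",
    "tag": "p-category",
}


def upgrade_hentry_property(class_attr="", rel_attr=""):
    """Return the mf2 property names implied by an element's class and rel attributes."""
    props = [HENTRY_CLASS_MAP[c] for c in class_attr.split() if c in HENTRY_CLASS_MAP]
    props += [HENTRY_REL_MAP[r] for r in rel_attr.split() if r in HENTRY_REL_MAP]
    return props


title_props = upgrade_hentry_property(class_attr="entry-title")
permalink_props = upgrade_hentry_property(rel_attr="bookmark")
unrelated_props = upgrade_hentry_property(class_attr="post-meta")
```

Anything not in the tables is ignored rather than guessed at, which is the "carefully picking" part: unknown classes never become mf2 properties.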
#
Zegnat
http://microformats.org/wiki/hfeed doesn’t suggest anything about url properties on feeds to me. The HTML doesn’t seem to have any classes like it either. So I really have no idea where the URL properties are coming from :S
#
Loqi
hAtom 0.1
#
sknebel
(https://github.com/microformats/microformats2-parsing/issues/11 was the issue for mixed mf1/mf2 we had recently again)
#
Loqi
[sknebel] #11 backcompat child of mf2 root
#
Zegnat
I don’t think this is a spec issue, seems just like a bug in the PHP parser. Though what would cause it to find those URL properties and stuff, I have no idea.
[kevinmarks] joined the channel
#
Zegnat
[kevinmarks], which parser is that? Is it the same Python one that sknebel linked or a different one?
#
[kevinmarks]
Python one, yes
#
[kevinmarks]
I have extra backcompat for WordPress, but looks like they weren't in that theme
#
[kevinmarks]
It does have 2 identical urls for each post
#
Zegnat
It does that. Both title and timestamp are marked as permalinks (rel="bookmark")
#
[kevinmarks]
I wonder if we should change parsing to strip duplicates. The rels parsing does, but it is harder in the broader case.
#
Zegnat
I don’t think we need to bog down the parsing spec with de-duping steps. If the canonical HTML supplies the permalink twice, have it in the parsed output twice. I see no real harm.
#
Zegnat
although it may be an idea for something like jf2 to incorporate de-duping.
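
One way a consumer (rather than the parser) might strip duplicates from parsed output, sketched in Python. It assumes the usual mf2 JSON shape of `type`, `properties` and optional `children`; the `dedupe_properties` helper and the sample entry are hypothetical:

```python
def dedupe_properties(item):
    """Return a copy of an mf2 item with repeated property values removed, order kept."""
    cleaned = dict(item)  # shallow copy so keys such as "value" or "id" survive
    cleaned["properties"] = {}
    for name, values in item.get("properties", {}).items():
        unique = []
        for value in values:
            # Embedded microformats are dicts with their own "properties".
            if isinstance(value, dict) and "properties" in value:
                value = dedupe_properties(value)
            if value not in unique:
                unique.append(value)
        cleaned["properties"][name] = unique
    if "children" in item:
        cleaned["children"] = [dedupe_properties(child) for child in item["children"]]
    return cleaned


# Sample entry where title and timestamp both linked to the same permalink.
entry = {
    "type": ["h-entry"],
    "properties": {
        "name": ["Example post"],
        "url": ["http://example.com/post", "http://example.com/post"],
    },
}
deduped = dedupe_properties(entry)
```

The original parsed structure is left untouched, so the "have it in the parsed output twice" behaviour is still available to anyone who wants it.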
nitot joined the channel
#
ben_thatmustbeme
Sigh, and no one mentions the most recently (re)written parser... The Ruby one
#
sknebel
ben_thatmustbeme: generally, php is most used and python is the one I feel most comfortable fixing myself if I find issues
#
Zegnat
Interesting difference between the Python and Ruby one there: Ruby has no name property on the feed.
#
sknebel
but understands the "updated" properties
#
Zegnat
I think Ruby is right in having no name property on the feed, per spec.
#
Zegnat
Names should not be implied for backcompat parsing, and microformats.org does not specify a “site-title”
#
sknebel
Zegnat: the python one seems to grab the "home-title", [kevinmarks] is that the wp-specific rule you mentioned?
#
Zegnat
Interesting. http://microformats.org/wiki/h-feed#Backward_Compatibility says “site-title” is the WordPress thing. If that is wrong, hopefully [kevinmarks] can give it an update!
#
Loqi
h-feed
nitot, adactio, [mail], [kevinmarks], KartikPrabhu, [mlopatka], [pfefferle] and [miklb] joined the channel
#
gRegorLove
These are my latest php-mf2 improvements parsing microformats.org: http://gregorlove.com/php-mf2/test.php?id=39
#
gRegorLove
rel=bookmark gets upgraded to u-url
#
gRegorLove
I think the backcompat spec is fine for now, tantek, appears to be a bug in php-mf2 0.3.2 when parsing nested mf1, like hEntry inside hFeed.
#
sknebel
gRegorLove++ looks good. (can never remember if the timestamps with the "T" in between are fine or not?)
#
Loqi
gregorlove has 16 karma in this channel (209 overall)
#
gRegorLove
I think T only if authored. That's in a pending PR.
#
Zegnat
all strings are valid for time in mf2 ;)
#
Zegnat
But both T and space are allowed in HTML datetime values, so I would say mf2 follows that and definitely accept both: https://html.spec.whatwg.org/dev/text-level-semantics.html#datetime-value
#
sknebel
I misremembered the source, it is authored with the T
tantek joined the channel
#
gRegorLove
Different question, Zegnat. Parser was formerly normalizing to add the "T" even if it wasn't authored that way.
#
gRegorLove
But the spec says it shouldn't add it if it isn't authored that way
#
Zegnat
gRegorLove, caveat first: I really dislike dt- parsing as it currently stands. Unless the datetime value is coming from VCP, you should not normalise anything according to current spec.
#
Loqi
[Zegnat] My answer to the question in the title would be **Yes**. I feel like `dt-*` handling should describe how a string gets turned into a datetime stamp. No matter where the string is coming from (textContent, attribute, VCP, …). I also think this wo...
#
Zegnat
So you have my blessing in normalising timestamps. But per-spec, the parser is doing it wrong if it changes anything about the string value found in the HTML.
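
A sketch of what normalising the separator could look like if done after parsing, on the consumer side, so the parser output itself still carries the authored string. The regular expression and the `normalise_separator` name are assumptions for illustration; it only covers the plain date, time and optional offset forms and leaves anything else untouched:

```python
import re

# Date, then "T" or a space, then a time with optional seconds and optional zone.
DATETIME_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}(?::\d{2})?)(Z|[+-]\d{2}:?\d{2})?$"
)


def normalise_separator(value):
    """Return value with a 'T' separator if it looks like a date-time, else unchanged."""
    match = DATETIME_RE.match(value.strip())
    if not match:
        return value  # not a recognised date-time: keep the authored string as-is
    date, time, zone = match.groups()
    return f"{date}T{time}{zone or ''}"


with_space = normalise_separator("2018-01-11 09:30:00+0000")
with_t = normalise_separator("2018-01-11T09:30")
free_text = normalise_separator("last Thursday")
```

Returning unrecognised strings unchanged keeps it in line with "all strings are valid for time in mf2".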
#
gRegorLove
Understood :) In this example it's not changing the string. The "T" is authored.
#
gRegorLove
At least in the first example, didn't spot check the other h-entry on the homepage
KartikPrabhu joined the channel
#
Zegnat
Someday, when I have a string of free hours, maybe I’ll finally author my proposal for different dt- parsing :)
[colinwalker], chrisaldrich, tantek and [keithjgrant] joined the channel
#
@dissolve333
@Blogger any chance you will ever add microformats-2 to your templates, would love to be able to parse any blogger ever with it
(twitter.com/_/status/951528572174336000)
#
ben_thatmustbeme
pokes the bear
#
aaronpk
is anyone even still developing Blogger?
#
aaronpk
whoa, they added mf2 to common crawl
#
aaronpk
at least "h-adr"
#
aaronpk
oh look, snarfed requested mf2 support back in march https://groups.google.com/forum/#!topic/web-data-commons/WOeSOODtj3A
[kevinmarks] joined the channel
#
[kevinmarks]
maybe plindner could have a chat with the blogger team
[mail] joined the channel
#
ben_thatmustbeme
i think the mf2 processing on apache's any23 is broken
#
ben_thatmustbeme
it processes part way, but seems to FAIL with errors
[miklb] joined the channel
#
Zegnat
I didn’t even know we had a Java based mf2 parser. More you learn…
#
Zegnat
Interesting!
#
ben_thatmustbeme
it fails parsing tantek.com and my site certainly
[dariusdunlap] joined the channel
#
ben_thatmustbeme
or rather it does, but just throws out a random error
#
ben_thatmustbeme
[Fatal Error] :170:3: The element type "input" must be terminated by the matching end-tag "</input>"
#
sknebel
sounds like the parser is more XML than HTML?
#
Zegnat
That sounds like a really bad plan for a web crawler to be doing
#
sknebel
which is weird, since the parser docs say that it's HTML since 2011
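
To illustrate the XML-versus-HTML difference behind that error, here is a small sketch using only Python's standard library. It is not the any23 code, and the snippet and helper names are made up for the example: an unclosed `<input>` is normal HTML but makes a strict XML parser give up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Valid HTML: <input> is a void element and has no end tag.
SNIPPET = '<form class="h-card"><input name="q"><span class="p-name">Example</span></form>'


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ClassCollector(HTMLParser):
    """Collect class names with a tolerant HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.extend(value.split())


xml_ok = parses_as_xml(SNIPPET)
collector = ClassCollector()
collector.feed(SNIPPET)
collector.close()
```

Here the XML parser rejects the snippet while the HTML tokenizer still sees both the `h-card` and `p-name` classes, which is the behaviour a crawler needs for real-world pages.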
#
sknebel
ben_thatmustbeme: are you testing the online thing, or are you running it locally?
#
ben_thatmustbeme
running it locally
#
Zegnat
Maybe the Java mf2 parser is xmllib dependent?
#
ben_thatmustbeme
any specific page you want me to try?
#
sknebel
try to update jsoup to a current one?
#
sknebel
assuming it's still compatible and doesn't explode
#
ben_thatmustbeme
seems like it just blows up when it's malformed HTML
#
ben_thatmustbeme
very rigid parsing
#
ben_thatmustbeme
don't have time to hack on it, not a java person
#
sknebel
yeah, that's why I'd just try updating the dependency and check if that fixes it
KartikPrabhu joined the channel
#
ben_thatmustbeme
don't even know how to do that
[eddie] joined the channel
#
sknebel
fair point. at least we now know more about the Java parser: basic structure works, some things missing, something off with the HTML parser
#
ben_thatmustbeme
it seems to be using jsoup 1.7.2 by default, trying to build it with maven
#
ben_thatmustbeme
wow, that was from Jan 27, 2013
[colinwalker] and [kevinmarks] joined the channel
#
[kevinmarks]
The python one is a bit like that, the parsing depends which lib you are using
#
ben_thatmustbeme
looks like it's here https://tika.apache.org/, this depends on edu.ucar:grib:jar:4.5.5 which is up to v 8.0 (
#
ben_thatmustbeme
though .... Unidata's NetCDF Decoders package is no longer being actively maintained or supported.
#
sknebel
jsoup still seems maintained
#
Zegnat
I don’t think any of the mf2 parsers come with their own HTML parser? PHP’s also depends on DOMDocument functions being available, and their stability depends on what libxml was used for the PHP instance. I guess for Python it is just easier to select the parser lib.
#
ben_thatmustbeme
that was for grib which contains jsoup
#
ben_thatmustbeme
anyway, gotta go catch a train
#
ben_thatmustbeme
someone should figure out how to point the common crawl people at this
#
ben_thatmustbeme
i can't imagine they are too happy with how much this must error out
tantek, [mail], [eddie], sebsel, [kevinmarks] and chrisaldrich joined the channel