#microformats 2015-06-09

2015-06-09 UTC
#
tantek
catches up on logs here too
#
tantek
wonders if we should provide a nice short recipe example that's actually a good recipe, e.g. for a smoothie
#
tantek
has apples, oranges, and bananas at home and may have to try to make one (recipe and smoothie)
#
KevinMarks
I have a photo
#
KevinMarks
that closes a <p> it didn't open
#
KevinMarks
so html.parser puts the h-cite after the </html>
#
KevinMarks
should I fix the html to be more valid?
#
KevinMarks
or should we just reject html.parser outright ?
#
KevinMarks
or maybe the lenient flag is what we need
#
KevinMarks
kylewm: what was the lenient trick?
#
kylewm
'permissive' was the name of the feature that {lxml, html5lib} and !{html.parser}
#
KevinMarks
so that will stop the bad markup restructuring problem of html.parser
#
KevinMarks
but we still have whitespace variation between lxml and html5lib
#
KevinMarks
which is another argument for whitespace collapsing IMO
#
kylewm
i wonder if the python parser is more problematic than the others in this regard
#
KevinMarks
it may be an artefact of line-oriented parsing
#
kylewm
so... another option would be to collapse whitespace in the test harness, when doing string comparisons
#
kylewm
and basically punt on the right way for parsers to do it
#
tantek
edited /picoformats (-122) "removals: hashtags.org corp service site, microsyntax(.)org looks like expired & registered by nothing do with this topic, twitterdata(.)org similarly appears to have been parked after being abandoned"
#
kylewm
and I'd probably vote to go ahead and do that for now, so we can worry about more interesting parsing issues
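One way to picture the harness-side option kylewm describes is below. This is only a sketch under assumptions: the helper names `collapse_ws` and `same_ignoring_ws` are hypothetical, the sample data is invented, and the real test suite may compare results differently. It collapses every whitespace run in both the expected and the actual parsed result before comparing them, so lxml and html5lib whitespace differences stop failing tests.

```python
import re

_WS = re.compile(r"\s+")


def collapse_ws(value):
    """Recursively collapse whitespace runs in every string of a parsed result."""
    if isinstance(value, str):
        # Replace each run of whitespace with one space, then trim the ends.
        return _WS.sub(" ", value).strip()
    if isinstance(value, list):
        return [collapse_ws(v) for v in value]
    if isinstance(value, dict):
        return {k: collapse_ws(v) for k, v in value.items()}
    return value  # numbers, booleans, None are left alone


def same_ignoring_ws(expected, actual):
    """Compare two parsed results after whitespace normalization (hypothetical helper)."""
    return collapse_ws(expected) == collapse_ws(actual)


# Invented sample data: two parsers disagreeing only about whitespace.
a = {"items": [{"type": ["h-cite"], "properties": {"name": ["A  photo\n"]}}]}
b = {"items": [{"type": ["h-cite"], "properties": {"name": [" A photo"]}}]}
print(same_ignoring_ws(a, b))
```

Note that this also normalizes whitespace inside embedded HTML values, which a real harness might want to exclude; it punts on what parsers themselves should emit, which is the issue KevinMarks raises next.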
#
tantek
edited /picoformats (-4) "/* Generic */ civilities.net 404 - thus removing - assumed dead site, note @-mentions different from @-replies, add #-hashtags as generic example"
#
@AllTheTwits
Trying to find a way of inputting microformats that doesn't make you want to smash through walls.
(twitter.com/_/status/608083293544828929)
#
KevinMarks
I want to actually resolve the whitespace issue for parser users though
#
KevinMarks
as I don't think anyone actually wants ' \n\r \t ' in their json
KartikPrabhu joined the channel
#
KevinMarks
and the "we should preserve it" is a theoretical problem
fuzzyhorns joined the channel
#
KevinMarks
i wonder if mf2py should put a note in the output if they are using html.parser that warns them
#
kylewm
lol, do you mean stdout, or the JSON output?
#
tantek
heh
#
KevinMarks
I mean the json output
#
KevinMarks
thinking about how long it has taken us to work this out
#
KevinMarks
that is what a dev will be looking at
#
tantek
edited /microformats2-parsing (+172) "remove leading/trailing whitespace per part of issue whitespace collapsing revisited consensus, and implementation in mf2py"
#
tantek
btw one way to explicitly include leading/trailing whitespace is with the data element and value attribute
#
tantek
I'm leaving that in because the only reason you would have whitespace in a quoted attribute value is deliberate
#
tantek
edited /microformats2-parsing-issues (+78) "note whitespace collapsing revisited 2015-06-08 option 2 resolved by consensus/mf2py impl - drop leading/trailing whitespace"
#
tantek
ok I think I'm caught up to consensus + implementation issues
#
tantek
next I'm going to start incorporating the strong consensus issues that are TBI
#
tantek
in particular: " uf2 children inside a classic microformats root class name", " any h- root class name overrides and stops backcompat root", " backcompat classic microformats should only see backcompat properties", " microformats2 root class names should only see microformats2 properties", " implied properties on backcompat parsing unlikely to be intended", " implied properties when an explicit class is provided", " link elements and u- parsing"
#
tantek
the first 7 issues listed here: http://microformats.org/wiki/microformats2-parsing-issues please (re)read and speak up if you have had any change of opinion since consensus at the mf2 dev meetup where they were resolved
fuzzyhorns, eschnou, tantek, Phae, benward, tommorris, gRegorLove, Left_Turn and JonathanNeal joined the channel
#
@pypi_updates
mf2util 0.2.0: Python Microformats2 utilities, a companion to mf2py http://pypi.python.org/pypi/mf2util/0.2.0
(twitter.com/_/status/608156574889218048)
bret, csarven, twisted`, kez and glennjones joined the channel
#
csarven
hAtom2Atom e.g., http://tools.microformatic.com/help/xhtml/hatom/ , may be broken. Is the source up somewhere?
voxpelli joined the channel
#
@pypi_updates
mf2util 0.2.1: Python Microformats2 utilities, a companion to mf2py https://pypi.python.org/pypi/mf2util/0.2.1
(twitter.com/_/status/608166638513463297)
csarven, rknLA, eschnou and Garbee joined the channel
#
@artwisanggeni
#python mf2util 0.2.1: Python Microformats2 utilities, a companion to mf2py https://pypi.python.org/pypi/mf2util/0.2.1
(twitter.com/_/status/608175462947262465)
pfefferle, chiui, eschnou, KevinMarks_ and glennjones joined the channel
#
@csarven
#microformats 2 is #RDFa + #SchemaOrg "simplified".
(twitter.com/_/status/608194508711444480)
pfefferle_, KartikPrabhu and Zegnat joined the channel
elf-pavlik, KevinMarks_, adactio, csarven and pfefferle joined the channel
#
@fakebaldur
@bbirdiman @dauwhe @acutebit 'e-content' is a part of Microformats2 h-news which makes life simpler for Instapaper, Readability & scrapers +
(twitter.com/_/status/608237870072131584)
pfefferle joined the channel
#
@neogeografen
Going to #fm15 on Bornholm? There are also free offline digital maps for Garmin that are #osm based http://www.microformats.dk/2014/05/26/endnu-en-installeringsguide-til-gratis-openstreetmap-cykelkort-pa-garmin-gpsere/
(twitter.com/_/status/608248647923433473)
#
@neogeografen
Going to #fm15 on Bornholm? There are also free offline digital maps for Garmin that are #osm based http://www.microformats.dk/2014/05/26/endnu-en-installeringsguide-til-gratis-openstreetmap-cykelkort-pa-garmin-gpsere/ … 2/2
(twitter.com/_/status/608248749865992192)
kez_ joined the channel
#
@neogeografen
At #fm15? How about getting some "data exercise" during the breaks with the Mapillary app (crowdsourced photos) of the island's beautiful nature? http://www.microformats.dk/2015/02/16/opvarmning-til-open-data-day-med-en-omgang-high-impack-datamotion/
(twitter.com/_/status/608252264134942721)
fuzzyhorns and tantek joined the channel
KartikPrabhu, elux, kez, eschnou, TallTed, adactio, pfefferle, ben_thatmustbeme and dym_cx joined the channel
#
dym_cx
is ".p-language" an acceptable microformats class for spoken/written languages on h-card/h-resume? or is ".p-skill" generally preferred?
#
Zegnat
skill seems to be the generally assumed class, though separating languages has been discussed in brainstorming. There are also some good mark-up examples there: http://microformats.org/wiki/hresume-skill-brainstorm#Separating_language_from_skills
#
Zegnat
(I would have given that link in #indiewebcamp, but needed some time to dig it up again.)
tantek, gRegorLove and pfefferle joined the channel
#
tantek
good morning #microformats!
#
tantek
!tell csarven re: https://twitter.com/csarven/status/608194508711444480 what's been your personal experience in using microformats2 to markup your HTML, vs. RDFa with/without schemaorg etc.? difficulty / impact on your markup / # of changes / time to do etc.
#
Loqi
Ok, I'll tell them that when I see them next
#
csarven
tantek To be absolutely clear, *for me*, there is hardly any significant difference between writing mf2 and RDFa. In fact, mf2 is so close to RDFa (at least the way I see it), they are virtually interchangeable. Yes, there are plenty of differences if we look closely, but I don't think those are fundamental to picking one over the other. *As I see it*, if one can do either one of those, they can handle the other.
#
Loqi
csarven: tantek left you a message 4 minutes ago: re: https://twitter.com/csarven/status/608194508711444480 what's been your personal experience in using microformats2 to markup your HTML, vs. RDFa with/without schemaorg etc.? difficulty / impact on your markup / # of changes / time to do etc.
#
csarven
Difficulty of vocab use (e.g., looking up a term to use) is about equivalent.
#
csarven
class="p-name" vs. property="foaf:name" --- hardly calls for a debate.
#
csarven
class="u-url" vs. rel="foaf:homepage"
#
tantek
except mf2 does not use nor have fragile cnames
#
tantek
s/cnames/qnames
#
Loqi
tantek meant to say: except mf2 does not use nor have fragile qnames
#
tantek
which becomes more obvious once you start marking up multiple types of things
#
csarven
Can you elaborate on the "fragility"?
#
tantek
yeah it's long discussed and posted in lots of places
#
csarven
Let me guess.. in the mf/wiki written my people which have strong dislike to namespaces? :)
#
csarven
s/my/by
#
Loqi
csarven meant to say: Let me guess.. in the mf/wiki written by people which have strong dislike to namespaces? :)
#
tantek
I think only citations there
#
csarven
At the end of the day, whether you tell the machine "foo" or "foo:bar", it doesn't make a difference. Bunch of sufficiently unique-enough strings
#
tantek
since mf folks don't have much direct experience with using qnames, others have documented the problems
#
csarven
p-name is a "simple" QName.
#
tantek
csarven - sure, prefixes, whether -webkit- or -foaf- work
#
tantek
but that's not a qname - qname's are bound with URIs (somewhere else) which is what makes them fragile
#
tantek
s/qname's/qnames
#
Loqi
tantek meant to say: but that's not a qname - qnames are bound with URIs (somewhere else) which is what makes them fragile
#
tantek
saying p-name is a "simple" QName is as much of a lie / deception as saying that Facebook's OGP is "simple" RDFa.
#
csarven
Well, if you treat it as an URN, the problem goes away.
#
tantek
"treat as an URN" - something no one does in actual practice with anything
#
tantek
so that's like saying "if (this hypothetical thing), ..."
#
csarven
Your definition of "actual practice" is bound to what's on webpages. I would say that, that is not the best or complete (although I don't disagree that it is a good sample).
#
tantek
right, the web is my dataset, all else is handwaving
#
csarven
To be clear: Open/Public/Visible Web is your dataset.
#
tantek
I don't believe in angels on the head of a pin either. What's your point?
#
csarven
Just because you can't access it or don't want to work with it, it doesn't mean that others are not.
#
csarven
senses a meta-discussion at this point.
#
csarven
I'm happy to get back to the main discussion point.
#
tantek
nah - I like stuff that is citable. that typically means open/public/visible web. have pretty much given up on armchair architecture about uncitable things.
#
tantek
science prefers the citable, all else is good material for art
#
csarven
If the QName breaks or is fragile, information can be extracted just the same. Even if http://example.org/ behind that example:foo disappears you can still make sense of it. Just as you make sense of p-name in different context, i.e., offloading that interpretation to the scripts.
#
tantek
then just use prefixes in the first place, and forget about qnames. it makes the model simpler.
#
csarven
Unfortunately we live in a world where not every phenomenon is easily reproducible or even replicable.
fuzzyhorns joined the channel
#
tantek
who said anything about solving "every phenomenon"? <-- boiling the ocean strawman
#
tantek
in short: qnames break copy/paste
#
tantek
or vice versa
#
tantek
btw - lots more citations in that Hixie email
#
csarven
I think whether the model is simpler, or to what extent it is, is not that clear. The model is arguably "simpler" for your dataset, but I don't think that simplicity is obvious, nor is it the case that the fragility of QNames is something to worry about (in the foreseeable future) - if you think that's the case, show me what breaks and makes everything fall apart.
#
tantek
see citations to all the things that have broken over the years in https://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Aug/0035.html
#
csarven
The fact that mf2 exists is a good testament that mf was "too simple".
#
csarven
There is a balance somewhere.
#
tantek
qnames are in the basic category of "YAGNI"
#
tantek
some aspects of classic microformats were very simple, other aspects (needing to write special parsing code per new microformat) were not
#
tantek
we thought that was a reasonable trade-off, turned out with some more thinking / experience / innovation, it was a trade-off we didn't need to actually make
#
csarven
How is the parsing of mf2 fundamentally different than RDFa?
#
csarven
I don't want to discuss edge-cases
TallTed joined the channel
#
tantek
lol qnames are all about edgecases
#
csarven
mf2 is now closer to RDF(a) than it ever was.
#
tantek
so you already have by trying to justify them
#
tantek
you also brought up non-open non-public non-visible web things - also edge case
#
csarven
[Citation needed]
#
tantek
parsing of mf2 is much simpler
#
tantek
just compare size of specs for starters
#
tantek
I agree there is much potential compatibility between mf2 and RDF(a)
#
csarven
Do you think that ... say the World Bank or.... you know NSA and other teams which wear black all day publish their data on webpages? It is even commonly accepted that there is more "invisible" data than visible data.
#
tantek
"commonly accepted" - science is not a democracy - so "commonly accepted" is never a good argument
#
csarven
Right. Is that why /triples suggests "unnecessarily complicated"?
#
csarven
And that's a good argument right?
#
tantek
right, the proof is the property/value systems work fine
#
csarven
While you pick on my "commonly accepted". Will you use the same reasoning towards "unnecessarily complicated"?
#
tantek
AKA I suppose what you'd call "doubles"
#
tantek
so if doubles work fine, triples are unnecessary
#
csarven
You can arbitrarily come up with "working systems"
#
csarven
p-name will work just as well as xyz-name or xyx:name
#
tantek
csarven: hah - seems pretty difficult for most folks to actually ship working systems
#
tantek
e.g. on their own websites
#
tantek
except that makes it clear you don't understand what the p- is
#
Zegnat
I think csarven’s first statement is the real point here: it is just as easy for an implementer to look up the meaning behind “p-name” as it is to find it behind “foaf:name”. The only real pro for mf2 is in the parsing through prefixes, even if the parser doesn’t know what “name” is, it knows it will be text
#
tantek
it's just a parsing instruction, not present in the parsed JSON result
#
Zegnat
(I just spent 20 seconds writing what tantek just did in 2)
#
tantek
Zegnat - your explanation is good too
#
csarven
Again, I don't see fundamental differences neither for the authors or consumers.
#
csarven
s/authors/publishers
#
Loqi
csarven meant to say: Again, I don't see fundamental differences neither for the publishers or consumers.
#
csarven
Differences are minimal at this point.. and I think that itself is a good thing to take note of and learn from. There is some convergence.
#
tantek
for authors, mf2 is less markup
#
tantek
for consuming code, there are only 5 prefixes h- p- u- dt- e-
#
tantek
so it makes sense for both publishers and developers to use mf2 - less work, simpler
#
csarven
There is an EAV in place. That's the generalized form of mf2 and RDF(a).
#
aaronpk
for parsers, there are only 5 prefixes. for consuming code, there are none, because anything using the parsed result doesn't even see the prefixes
#
tantek
for anyone that has legacy systems / investments in "triples" in their backend, yes they can do conversions
#
tantek
aaronpk is right - once you've run a parser, you don't deal with any prefixes at all
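The point tantek and aaronpk make about prefixes can be sketched roughly as follows. This is an illustration under assumptions, not any parser's actual code: `split_class` is a hypothetical helper, and the lowercase-name check is a simplification of the real class name rules. The prefix tells the parser how to read the element; only the remainder becomes a key in the parsed result.

```python
# The five microformats2 prefixes named in the discussion.
PREFIXES = ("dt-", "h-", "p-", "u-", "e-")


def split_class(name):
    """Split a class name into (prefix, property name), or None if it is not mf2.

    Hypothetical helper for illustration only.
    """
    for prefix in PREFIXES:
        rest = name[len(prefix):]
        # Simplified check: a known prefix followed by a lowercase name.
        if name.startswith(prefix) and rest and rest[0].islower():
            return prefix[:-1], rest
    return None


# Consuming code would only ever see "url" and "name", never "u-" or "p-".
for cls in "u-url p-name external".split():
    print(cls, "->", split_class(cls))
```

Here `external` is ignored, while `u-url` and `p-name` tell the parser to read a URL and plain text respectively.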
#
csarven
If we are going to bring our calculators out... well, RDF has one way to parse ;) spo. Crawl the graph..
#
csarven
Not to forget that, it is practically the same way one would query that graph!
#
tantek
csarven more for your "qnames considered harmful" folder of citations: https://lists.w3.org/Archives/Public/www-tag/2002Jun/0126
#
csarven
So, not only do you write the data but get it out the same way.
#
tantek
csarven: this is pretty funny - what was the context? https://twitter.com/csarven/status/608214938373517312
#
@csarven
Argh.. Go away extra whitespace, you are drunk!
(twitter.com/_/status/608214938373517312)
#
csarven
Oh, don't remind me.. I still have to fix that. Not related to mf2/RDF.. Just some PHP echo or new line somewhere
#
@csarven
.@SemanticsConf ..where the conference can't tell the difference between print centric and Web based knowledge sharing #SemanticWeb #FAIL
(twitter.com/_/status/608174982523437056)
#
@csarven
.@SemanticsConf Self-explanatory. You run a conf on #SemanticWeb. You ask for LaTeX b/c you're not convinced/understand the Web(stack)
(twitter.com/_/status/608181029866729472)
#
tantek
I just don't even
#
@csarven
We'd understand & embrace the impact of lower-case semantic web efforts on the upper-case #SemanticWeb #Web eg @microformats @schemaorg_dev
(twitter.com/_/status/592973363800383489)
#
tantek
csarven: how do you cope with these people that talk * Web, and yet request LaTeX?!?
#
csarven
I'm allergic to both camps dissing the other side :) plenty to learn from different approaches.
#
csarven
tantek I try to breathe in and out very slowly.
#
tantek
that's good advice :)
#
csarven
If we look at microformats/Microdata/RDF(a) history, I think we can see how each evolved due to the other.
#
tantek
certainly agreed - been saying that for quite some time
#
tantek
csarven: and microformats2 is the current latest version that takes into account all the lessons from the previous efforts (microformats/Microdata/RDF(a) history)
#
csarven
Roughly: RDF in HTML -> RDFa -> microformats -> Microdata -> RDFa Lite -> mf2 .. ?
#
tantek
microformats predates RDFa
#
tantek
and emerged independently
benborges joined the channel
#
tantek
from web designer's practices of using semantic class names
#
csarven
I can't recall exact dates right now.. but are you specifically referring to XFN?
#
tantek
no, rel evolved independently
#
tantek
and predates all of that I think
#
csarven
One of my happy moments of realization.
#
tantek
semantic class names evolved out of modern web designer practices once they started splitting presentation into CSS, and out of HTML, and started using semantic HTML
#
aaronpk
wonders how much longer that plus.google.com URL will exist ;)
#
csarven
aaronpk It doesn't matter. Hopefully something archived it already.
#
tantek
csarven doubtful - G+ URLs seem to not be archived/indexed by anyone other than Google
#
csarven
bot Blocked?
#
aaronpk
curl https://plus.google.com/114223893421375686319/posts/aqq9WWHdZd1 --> a bunch of <script> tags... so... doubtful
#
tantek
csarven: people used to think of the "road" as XML as well
#
tantek
and now maybe JSON or JSONLD?
#
tantek
there's always the overengineering crowd that imagines themselves as the road
#
csarven
More like the super-highway
#
tantek
when they're actually more like papers on how a superconducting maglev could be built someday
#
csarven
Couldn't capture that in the same photo ;)
eschnou, csarven, KartikPrabhu and TallTed joined the channel
#
tantek
apparently MySpace circa 2012-06-15 supported hCard, according to the "Inspect Element" screenshot in this video, which shows a quite readable class="vcard" on a MySpace profile/friends page: https://youtu.be/iApvUMgk5Mo?t=2m47s
#
tantek
KevinMarks: do you have an example from the "wordpress corpus" that uses rel="bookmark" rel="tag" rel="author" ?
#
tantek
KevinMarks: also rel="author" was never part of hAtom, but if you find examples of hAtom in the wild that seem to depend on it - please provide their URL(s) so we can make a real-world-based back-compat decision
#
tantek
!tell kylewm,KevinMarks does mf2py do backcompat parsing of hAtom's class=hentry?
#
Loqi
Ok, I'll tell them that when I see them next
#
Loqi
KevinMarks: tantek left you a message 1 minute ago: does mf2py do backcompat parsing of hAtom's class=hentry?
#
Loqi
kylewm: tantek left you a message 3 minutes ago: does mf2py do backcompat parsing of hAtom's class=hentry?
#
tantek
edited /h-entry (+491) "/* Parser Compatibility */ move rel=tag to parser compat (from proposed), and add rel=author to proposed per input from KevinMarks, awaiting citations to real w"
#
tantek
KevinMarks: since you brought it up again, I upgraded rel=tag from proposed backcompat to part of the backcompat spec ^^^ - obviously it may require more parsing code due to the special treatment of "last segment of the URL"
#
tantek
s/segment/path segment
#
Loqi
tantek meant to say: KevinMarks: since you brought it up again, I upgraded rel=tag from proposed backcompat to part of the backcompat spec ^^^ - obviously it may require more parsing code due to the special treatment of "last path segment of the URL"
#
tantek
kylewm: they all use all the rels?
#
kylewm
yeah, bookmark, tag, and author. we added hEntry scoped rel=bookmark backcompat parsing to mf2py
#
kylewm
@pwcc requested it for bridgy to support better original-post-discovery on wordpress blogs
#
tantek
upon inspection, looks like rel=author is on the same link as class="url fn n" inside a span with class="author vcard" - thus no need to look at rel=author (existing author vcard backcompat handles it)
#
tantek
KevinMarks - what's your preferred way to handle (and thus specify) "/?tag=boat" ?
#
tantek
(as the last segment of a rel-tag URL?)
#
tantek
since obviously the intent is "boat", yet existing rel=tag defn produces "?tag=boat" as the tag
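The "/?tag=boat" problem can be reproduced with a short sketch. Both functions are hypothetical: `tag_literal` assumes the existing definition amounts to "everything after the final slash" (ignoring a trailing slash), and `tag_query_aware` is just one possible policy, not something agreed in this discussion.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def tag_literal(href):
    """Everything after the final "/" (assumed reading of the existing rule)."""
    return unquote(href.rstrip("/").rsplit("/", 1)[-1])


def tag_query_aware(href):
    """Hypothetical alternative: prefer a "tag" query parameter when present."""
    parts = urlsplit(href)
    values = parse_qs(parts.query).get("tag")
    if values:
        return values[0]
    # Otherwise fall back to the last non-empty path segment.
    segments = [s for s in parts.path.split("/") if s]
    return unquote(segments[-1]) if segments else None


url = "http://example.com/?tag=boat"
print(tag_literal(url))      # the literal rule keeps the query string
print(tag_query_aware(url))  # the alternative recovers the intended tag
```

Hard-coding the parameter name "tag" is itself an assumption about WordPress-style URLs; sites using other parameter names would still fall through to the path segment.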
#
tantek
KevinMarks - re: accommodate WordPress hentry markup, we may have to do some extra work to get the dt-published
#
tantek
as there is no "published" or "updated" classnames in e.g. twentyfourteen
#
tantek
but they do have <time class="entry-date" datetime="…">
#
tantek
I'm going to add that as a proposal to backcompat hentry parsing, and cite the example themes that kylewm provided
#
tantek
being conservative, and *only* looking for time[class|="entry-date"][datetime] in particular
#
tantek
time.entry-date[datetime] to be simpler
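A minimal sketch of matching time.entry-date[datetime] is below. It is an assumption-laden illustration, not mf2py's implementation: `EntryDateFinder` and `entry_dates` are hypothetical names, the sample markup is invented, and the sketch does not scope the match to an hentry or check that "published" is absent. It uses the standard library's html.parser only to stay self-contained.

```python
from html.parser import HTMLParser


class EntryDateFinder(HTMLParser):
    """Collect datetime values from <time class="entry-date" datetime="...">."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "time":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # Conservative match: the class and a non-empty datetime must both exist.
        if "entry-date" in classes and attr.get("datetime"):
            self.found.append(attr["datetime"])


def entry_dates(html):
    """Return all matching datetime values in document order (hypothetical helper)."""
    finder = EntryDateFinder()
    finder.feed(html)
    return finder.found


# Invented sample in the style of a WordPress default theme.
sample = (
    '<article class="hentry">'
    '<time class="entry-date" datetime="2015-06-09T12:00:00+00:00">June 9</time>'
    "</article>"
)
print(entry_dates(sample))
```

A backcompat parser would presumably treat the first such value inside an hentry as the published date, but that mapping is the proposal under discussion rather than settled behavior.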
elf-pavlik joined the channel
#
tantek
verified that pattern back to twentyten
warehouse13 joined the channel
#
tantek
edited /h-entry (+354) "/* Parser Compatibility */ add notes to rel=author proposal, note examples so far don't need it, class="author vcard" enough for WP default themes"
#
tantek
edited /h-entry (+642) "/* Parser Compatibility */ proposed: time.entry-date[datetime] in the absence of "published", evidence, WP default themes 2011-2014."
#
tantek
KevinMarks, following your musing thought, there's a concrete proposal with citation for "entry-date" ^^^ for backcompat with WP themes 2011-2014.
#
tantek
I'll leave it to you and kylewm to vote with your code, implementation or lack thereof :)
#
tantek
edited /h-entry (-63) "/* Proposed Additions */ move u-featured to a solid proposed addition, since there are 2+ indieweb sites with multiple posts using it in the wild"
ben_thatmustbeme, Left_Turn, KevinMarks_ and benborges joined the channel
benborges, voxpelli, KevinMarks_ and benward joined the channel
#
@bbirdiman
@fakebaldur @dauwhe @acutebit -- i woulda found it hard to believe you made such goofy code, but it makes sense the microformats people did.
(twitter.com/_/status/608375483366035456)
hober joined the channel
#
@bbirdiman
@kevinmarks @fakebaldur @dauwhe @acutebit -- i see microformats as the result of obsessive-compulsive need to explicitly label _everything._
(twitter.com/_/status/608379961179836419)
elf-pavlik, elux_, elux and ben_thatmustbeme joined the channel