#microformats 2019-08-25

2019-08-25 UTC
#
@tw2113
New post set sail: Adding my own pronouns as part of potential microformats2 spec https://trxtp.com/62
(twitter.com/_/status/1165417479738249216)
KartikPrabhu, [tantek], mauz555, [KevinMarks], [Rose], [Michael_Beckwit, [bdesham], [asuh] and [snarfed] joined the channel
#
[tantek]
[snarfed] just checked that jasonbriggs post and it has a u-url now to its permalink (as blog posts should). so the fix is to fix the h-entry in general. I think part of the answer here is to check any h-entry with the indiewebify h-entry validator and if it has errors, then blame/fix that first, before blaming anything else. Lots of things can fail when markup is bad
mauz555 joined the channel
#
[snarfed]
sure. all fine debugging advice. the real interesting part here is the original (failed) authoring experience, not the debugging
#
[tantek]
it's not a normal "authoring experience". it's a microblog post which should have been just fine, so somehow it was broken by alterations, that's what's so odd about it
#
[tantek]
point being if someone is a developer skilled enough to mess with something that's already working, the right answer is to set them up with regression testing so they don't introduce a regression. not an "authoring experience"
#
[snarfed]
wait, microblog meaning micro.blog? are you sure? if so then ok yes, i have no idea what happened there
#
[tantek]
yes, check out the styling. it's a self-hosted microblog
#
[tantek]
yes with the . 🙂
#
[snarfed]
ok. random. backing away slowly
#
aaronpk
🤔 micro.blog can't be self-hosted
#
[tantek]
well, curious. how did a microblog post get "broken"?
#
[tantek]
I mean, it can be domain mapped
#
[tantek]
or maybe in a directory
#
aaronpk
it can be domain mapped, i've never seen it done in a subfolder, but in either case, the owner does not have control over the HTML still afaik
#
aaronpk
ok that site is not hosted by micro.blog
#
[snarfed]
just to double check, you're assuming it's altered m.b based on the url path and some similar styles?
#
aaronpk
checked the IP address
#
aaronpk
I think it's just styled to look like micro.blog. it's not even a 100% accurate reproduction
#
[snarfed]
could be an html export. but yeah that seems more likely
#
KartikPrabhu
not sure if that is a POSSE
#
aaronpk
ok yeah the fact that on micro.blog they all have text like "Microblog posted at 12 Aug 2019 19:57:38" leads me to believe it's a homegrown site that micro.blog is importing from
#
[snarfed]
m.b easily syndicates feeds. mine does the same: https://micro.blog/snarfed
#
[snarfed]
yeah. so it may well have been an original failed mf2 authoring experience after all
#
[tantek]
not sure about "easily syndicates feeds" — we never did figure out the photo syndicating problem(s)
#
[snarfed]
it definitely easily syndicates feeds. that doesn't preclude occasional bugs 😎 happens to everything now and then
#
[snarfed]
witness bridgy Twitter replies regressing for the last week and Instagram broken entirely, sigh
[Michael_Beckwit, [jeremycherfas], IWSlackGateway1, gRegorLove, [dshanske], [Rose], [jgmac1106], GWG, [tantek], [bdesham], KartikPrabhu, [grantcodes] and [vendan] joined the channel; SebDiscord[m] left the channel
#
[vendan]
spruced up my microformats parser comparison tool some: https://mf2.andyleap.dev
#
[vendan]
tell it to parse a chunk of html, and it'll render them in pretty printed json in an accordion
#
[vendan]
php and python parsers for now, but should be relatively simple to add more 🙂
#
KartikPrabhu
[vendan] if the goal is to compare the parsers then the accordion is a bit annoying since you can't have multiple ones open to compare
#
KartikPrabhu
but otherwise looks neat
#
vendan
yeah, still working on that some, but part of the goal is to make it so you don't need to eyeball both to compare them
#
KartikPrabhu
so something like you pick a reference parser output and the others show the difference?
#
vendan
yeah, might do something like "calculate minimum set all the parsers agree on" and then diff against that
#
KartikPrabhu
sounds good
#
vendan
I'm also going to be pulling in the test suite, and having its "expected output" be the comparison base for those
#
vendan
initial goal for this stuff is to serve as a tool for evaluating/refining https://github.com/microformats/tests
#
KartikPrabhu
👍
#
[tantek]
vendan++
#
Loqi
vendan has 1 karma over the last year
#
[vendan]
got a select list filled with all the tests now, and if you click a test, it'll parse that test. Still need to get it to render the test html to the textarea
ingoogni, gRegorLove, [KevinMarks] and [vendan] joined the channel
#
[vendan]
and now it renders the test html to the text area, diffs the parser outputs against the test's expected output, and turns the parser's header red if there is a difference
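A rough sketch of what diffing a parser's output against a test's expected output could involve, in Python. This is not the tool's actual code: the function name `diff`, the path format, and the sample data are all assumptions. "removed" means present in the expected output but missing from the parser output, and "added" means the reverse.

```python
def diff(expected, actual, path=""):
    """Yield (kind, path, value) for every difference between two JSON-like values."""
    if expected == actual:
        return
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(expected.keys() | actual.keys()):
            child = f"{path}/{key}"
            if key not in actual:
                yield ("removed", child, expected[key])
            elif key not in expected:
                yield ("added", child, actual[key])
            else:
                yield from diff(expected[key], actual[key], child)
    elif isinstance(expected, list) and isinstance(actual, list):
        for i in range(max(len(expected), len(actual))):
            child = f"{path}/{i}"
            if i >= len(actual):
                yield ("removed", child, expected[i])
            elif i >= len(expected):
                yield ("added", child, actual[i])
            else:
                yield from diff(expected[i], actual[i], child)
    else:
        # Scalars (or mismatched types): report the old and the new value.
        yield ("removed", path, expected)
        yield ("added", path, actual)


# Made-up example: the parser returned no items where one was expected.
expected = {"items": [{"type": ["h-geo"], "properties": {"latitude": ["37.77"]}}]}
actual = {"items": []}

changes = list(diff(expected, actual))
failed = bool(changes)  # e.g. turn the parser's header red when True
```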
#
[vendan]
sadly, it's triggering a lot on the php parser cause I think PHP's json output is a little... wonky
#
[vendan]
php serializes empty array as `[]`, even though it'd actually be used as an associative array, and thus `{....}` if there was anything in it
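To illustrate why this trips a structural diff, here is a small Python sketch. The two JSON strings are illustrative only, not captured from a real parser run, and `type_mismatches` is a hypothetical helper.

```python
import json

# Illustrative outputs only: an empty PHP array serializes as [] even where
# the microformats2 JSON structure calls for an object ({}).
expected = json.loads('{"items": [], "rels": {}, "rel-urls": {}}')
from_php = json.loads('{"items": [], "rels": [], "rel-urls": []}')


def type_mismatches(a, b):
    """Return the top-level keys whose JSON types differ between two outputs."""
    return sorted(k for k in a.keys() & b.keys() if type(a[k]) is not type(b[k]))


mismatched = type_mismatches(expected, from_php)
```

Both documents are "empty", yet a strict comparison treats `{}` and `[]` as different values, so every such test would be flagged.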
#
aaronpk
where are you seeing that? I thought the php parser fixed that ages ago
#
[vendan]
ehhh, might just be cause of something I'm doing?
#
[vendan]
use Mf2;
#
[vendan]
namespace MF2TestBench;
#
[vendan]
require 'vendor/autoload.php';
#
[vendan]
$html = file_get_contents('php://input');
#
[vendan]
echo json_encode($output);
#
[vendan]
$output = Mf2\parse($html, 'http://example.com/');
#
[vendan]
ugh, really? can't edit messages...
#
[vendan]
namespace MF2TestBench;
#
[vendan]
require 'vendor/autoload.php';
#
[vendan]
use Mf2;
#
[vendan]
$html = file_get_contents('php://input');
#
[vendan]
$output = Mf2\parse($html, 'http://example.com/');
#
[vendan]
echo json_encode($output);
#
[vendan]
using latest composer installed version as of 2 days ago or so
#
aaronpk
can't edit messages because edit doesn't mean anything in irc ;-)
#
Loqi
[microformats] php-mf2: php-mf2 is a pure, generic microformats-2 parser for PHP. It makes HTML as easy to consume as JSON.
#
[vendan]
looks like I need to turn jsonMode on
#
[vendan]
yeah, that
#
[vendan]
yeah, that cleans up the php parser for most of the tests
#
[vendan]
so far looks like there's some issues with whitespace stuff
#
[vendan]
based off of the parsing spec, the tests seem to be correct, and the parsers are being too generous about sanitizing whitespace
#
KartikPrabhu
welcome to the deep end of whitespace parsing :P
#
KartikPrabhu
here's your life-guard Zegnat
#
[vendan]
ehh, fortunately, the specs are pretty clear that it shouldn't be collapsed
#
KartikPrabhu
yes. and mostly it was found to not be very useful
#
KartikPrabhu
long story
#
KartikPrabhu
not sure where it is documented
#
[vendan]
(which also means it's in a bunch of the parsing rules, as `textContent`)
#
KartikPrabhu
yes. which is also ambiguous enough
#
[vendan]
textContent isn't really that ambiguous, it's pretty strongly defined, it's just going to have a bunch of newlines and such in there
#
[vendan]
and also be missing a bunch of newlines
#
[vendan]
(if you are thinking of `<br>` and such)
#
KartikPrabhu
well, I don't understand the definition so...
#
KartikPrabhu
here is another bunch of tests for whitespace while we were deciding the "correct" behaviour https://pin13.net/mf2/whitespace.html
#
[vendan]
concat all text node descendants, in tree order
#
KartikPrabhu
yeah that turned out to be not useful/ideal in practice
#
[vendan]
not saying otherwise, just saying that `textContent` is non-ambiguous. It's just not what most people would expect from looking at the displayed value in a browser
#
[vendan]
there's a bunch of other things, like you can hide a node and it won't be visible in a browser, but textContent would have it
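The behaviour described above can be approximated in a few lines of Python. This is only a sketch: the standard library's `html.parser` is not a full HTML5 tree builder, and `TextContent` and `text_content` are hypothetical names. It shows the points made in the discussion: `<br>` contributes no newline, source whitespace is kept as-is, and text inside a hidden element is still included.

```python
from html.parser import HTMLParser


class TextContent(HTMLParser):
    """Collect every text node in document order, ignoring all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_content(html):
    """Concatenate all text nodes of an HTML fragment, in tree order."""
    parser = TextContent()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


sample = '<p class="p-name">Jane<br>Doe <span hidden>(secret)</span>\n  </p>'
result = text_content(sample)
# result == "JaneDoe (secret)\n  "
```

A browser would display "Jane" and "Doe" on separate lines and hide "(secret)", which is why the raw value surprises people.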
#
[vendan]
I don't personally have a preference either way, I just think it needs to be encoded into the parsing spec and tests set up for it
#
KartikPrabhu
[vendan]: yes. http://microformats.org/wiki/textcontent-parsing is a draft spec for it. not sure which parser is implementing it fully. Possibly php-mf2
#
KartikPrabhu
mf2py might also be doing some parts of it. I have forgotten how much
#
KartikPrabhu
the tests might not have any of this
[Michael_Beckwit joined the channel
#
[vendan]
yeah, doesn't match up with the tests
#
KartikPrabhu
[vendan]: maybe as a first pass ignore the whitespace diffs?
#
KartikPrabhu
since the rest should be fairly specced out
#
[vendan]
mostly everything else is matching
#
[vendan]
like, so far I'm seeing whitespace issues and the php parser doesn't understand MFv1 geo
#
KartikPrabhu
aah so the tests probably do not test for some new stuff. like including the img alt and all that
#
[vendan]
`Latest commit 92b5893 on Sep 12, 2018`
#
[vendan]
so yeah, probably a little out of date
#
KartikPrabhu
oh hmm the mf2py you are using also does not do alt-text parsing, weird
#
KartikPrabhu
oh that is still behind a flag in mf2py
#
[vendan]
also looks like there might be an issue with mf2py not doing relative path resolving for photos
#
KartikPrabhu
[vendan]: why is php showing fail on the microformats-v1/geo/abbrpattern test?
#
KartikPrabhu
it "looks" the same except for ordering
#
[vendan]
ehhh, all the stuff inside the items array should be red? which means it's "deleted" by the diff of expected -> php output
#
[vendan]
php parser doesn't seem to handle the v1 geo stuff at all
#
KartikPrabhu
oh!! I totally mis-read the red part as being the "wrong part"
#
KartikPrabhu
but "red = missing"
#
[vendan]
yeah, it's like a red/green diff of a commit on github or something
#
[vendan]
wide open to suggestions for rendering that kind of thing better
#
[vendan]
also, still building out all the diffing stuff some
#
KartikPrabhu
also mf2py is not resolving the relative path for photos because those tests don't have a base url
#
[vendan]
I specify a base url for parsing of `http://example.com` for everything
#
[vendan]
and it works for `<img class="h-card" alt="Jane Doe" src="jane.jpeg"/>`
#
[vendan]
just not for the object `data="jane.jpeg"` ones
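For reference, the resolution expected here is plain relative-URL joining against the base URL, whichever attribute the value came from (`src` or `data`). A minimal Python sketch using the standard library, with `resolve` as a hypothetical helper name:

```python
from urllib.parse import urljoin

BASE_URL = "http://example.com/"  # the base URL passed to the parsers


def resolve(value, base=BASE_URL):
    """Resolve a possibly relative URL from src/data/href against the base URL."""
    return urljoin(base, value)


photo = resolve("jane.jpeg")
# photo == "http://example.com/jane.jpeg"
```

Absolute URLs pass through unchanged, so applying the same step to every URL-valued attribute is safe.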
#
KartikPrabhu
aah so mf2py might not be doing it for those hmmm
#
KartikPrabhu
that is a bug!
#
[vendan]
woot! win #1 for better test suite stuff!
#
Loqi
😊
#
[vendan]
next steps: clean up diffs, implement "base output calculation" for diffs of non-test suite parses, add more parsers, and start cleaning up and submitting changes for the test suite
#
KartikPrabhu
yup confirmed. you should file that bug at https://github.com/microformats/mf2py with the example
#
Loqi
vendan has 2 karma over the last year
#
KartikPrabhu
there also seem to be conflicting datetime formats