#microformats 2018-02-18

2018-02-18 UTC
tantek, [cleverdevil] and KartikPrabhu joined the channel
#
KartikPrabhu
updated mf2py for the link[title] and video[poster] parsing for p-* and u-*
tantek joined the channel
#
tantek
KartikPrabhu++
#
Loqi
kartikprabhu has 5 karma in this channel (168 overall)
#
KartikPrabhu
now we need to figure out how to incorporate these changes into the actual mf2py code base
#
tantek
still think push to upstream repo is the way
#
tantek
and we can attempt to contact tommorris e.g. on email
#
tantek
to move the repo
#
KartikPrabhu
I did push the new p-* and u-* rules
#
KartikPrabhu
for the implied name parsing I am waiting to see if it makes it into the spec
#
tantek
that's reasonable. having at least one parser demonstrate implementability is a condition of making the spec change
#
tantek
would be nice to hear from snarfed about if it works for his use-case
[mrkrndvs] and tantek joined the channel
#
KartikPrabhu
!tell [kevinmarks]: would be awesome if you could test https://github.com/kartikprabhu/mf2py/tree/implied-name-fix on the examples in https://github.com/microformats/microformats2-parsing/issues/6#issuecomment-366473390 for some reason my local install works correctly but not the installation on my server!
#
Loqi
Ok, I'll tell them that when I see them next
#
KartikPrabhu
and anyone else too ^
[eddie], [kevinmarks], [tantek], [mrkrndvs], nitot, [miklb], iwaim___, [jeremycherfas], 5EXAAPICA, barpthewire, tantek, [cleverdevil] and [squorch] joined the channel
#
KartikPrabhu
[kevinmarks]: would appreciate your help/insight on this https://github.com/kartikprabhu/mf2py/issues/58#issuecomment-366533803
#
Loqi
[kartikprabhu] The output depends on which html parser is being used internally. cc @kevinmarks In the example below `html5lib` gives a `name` and `url` for the `h-entry` but `html.parser` does not (which is the intended output). # Example ## HTML ``` html...
#
Zegnat
html5lib doesn’t even give a correct implied name there, if one goes by the old rules, KartikPrabhu.
#
Zegnat
Also, what’s up with that double url property?
#
KartikPrabhu
Zegnat: yeah not sure what is going on
#
Zegnat
Not to derail you, sorry :P
#
KartikPrabhu
sorry updated comment. the double url is only in html5lib not in html.parser
#
KartikPrabhu
html5lib is supposed to be better but has some funny quirks
#
KartikPrabhu
html5lib works with the old rules but seems to be doing funny things with the new ones!
#
KartikPrabhu
html.parser works with both rules
#
KartikPrabhu
so I am very confused since the mf2py code is the same but the parsers are different
#
Zegnat
Do they both output DOM with a common API, or how do you access their output?
#
Zegnat
is desperately fighting the urge to learn Python just to be part of the conversation
#
[kevinmarks]
Beautiful soup is on top of them both iirc.
KartikPrabhu joined the channel
#
sknebel
mf2py only tells BS which one to use
#
KartikPrabhu
Zegnat: this is happening because the <a>-tag in your 3rd example is not closed!
#
KartikPrabhu
and html5lib is doing some crazy stuff with it!
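
[Editor's sketch] One way to spot this kind of unclosed tag before comparing parser output is to count `<a>` start and end tags exactly as they are written. This is only a minimal sketch, assuming the standard library's `html.parser.HTMLParser` tokenizer (not BeautifulSoup or mf2py). The class `AnchorBalance`, the helper `unclosed_anchors`, and the shortened markup are hypothetical names and data made up for illustration; the markup is not the full example from the issue.

```python
from html.parser import HTMLParser


class AnchorBalance(HTMLParser):
    """Count <a> start and end tags exactly as written in the source."""

    def __init__(self):
        super().__init__()
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling the handlers.
        if tag == "a":
            self.opened += 1

    def handle_endtag(self, tag):
        if tag == "a":
            self.closed += 1


def unclosed_anchors(markup):
    """Return how many <a> start tags have no matching </a> in the markup."""
    parser = AnchorBalance()
    parser.feed(markup)
    parser.close()
    return parser.opened - parser.closed


# Hypothetical shortened markup: the <a> is never closed before </p>.
broken = (
    '<p>I really like <a class="p-name u-url" '
    'href="http://microformats.org/">Microformats</p><p>More text</p>'
)
# The same markup with the missing </a> added.
fixed = broken.replace("Microformats</p>", "Microformats</a></p>")

print(unclosed_anchors(broken), unclosed_anchors(fixed))
```

The tokenizer only reports tags as they appear and performs no repair, so a non-zero count suggests the input itself is faulty and that differences between tree builders are error recovery rather than mf2 parsing bugs.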
#
Zegnat
Oh, I’ll double check that in a second KartikPrabhu!
#
sknebel
yeah, that seems to be it
#
[kevinmarks]
S/crazy/spec compliant/
#
sknebel
the other examples seem like correct html
#
Zegnat
It is really interesting. Per spec, I thought it was supposed to close that A prior to the </p>
#
Zegnat
Either way, my fault, sorry
#
Loqi
[kartikprabhu] @kevinmarks I am going to disagree in this case. Taking the same example # HTML ``` html <article class="h-entry"> <div class="u-like-of h-cite"> <p>I really like <a class="p-name u-url" href="http://microformats.org/">Microformats</p> ...
#
KartikPrabhu
I have no idea why html5lib is supposed to be the "correct" spec compliant one
#
KartikPrabhu
no browser does it I think
#
KartikPrabhu
Zegnat: yes that is what I thought too
#
sknebel
KartikPrabhu: try it in a browser
#
sknebel
Chrome shows me the exact same DOM
#
Zegnat
I have fixed the HTML in the original microformats issue
#
KartikPrabhu
sknebel: with the repeated <a> ?
#
sknebel
I have to correct myself, Chrome repeats it 3 times
#
Zegnat
4 A elements total in Firefox
#
Zegnat
Interesssssting. Not what I expected.
#
KartikPrabhu
yeah FF has 4 elements <a>
#
Zegnat
Might be spec compliant though, in which case html5lib is completely right in what it did
#
[kevinmarks]
I'm not sure why the spec would expect that, but html5lib is very compatible with what browsers do.
#
KartikPrabhu
[kevinmarks]: here it depends on which browser since Chrome and FF do different things
#
sknebel
nah, I didn't expand one element - it's 4 in chrome too
#
KartikPrabhu
still different from the html5lib output
#
sknebel
I guess this happens because it tries to wrap the <a> around elements that can't be in an <a>, so it gets split into <a>s around the contents of these elements?
#
KartikPrabhu
<shrug> who knows!
#
KartikPrabhu
so I guess html5lib is the correct one
#
KartikPrabhu
or "more" correct
#
sknebel
or at least not "more wrong"...
#
KartikPrabhu
I wonder how other parsers handle this one
#
KartikPrabhu
mf2 parsers
#
Zegnat
Completely dependent on the error handling of the HTML parser they use. I don’t think any parser does the HTML parsing themselves
#
KartikPrabhu
Zegnat: yes but it would be good to see the differences since the mf2 parsed output depends on that
#
KartikPrabhu
Zegnat: with the corrected example mf2py with html5lib also works
[chrisaldrich] joined the channel
#
Zegnat
I am not sure how it helps anything to know how faulty HTML is being “corrected” by different parsers though. Even if it makes the final mf2 wrong, it will always be an upstream error :/
#
[kevinmarks]
I expect that the js one that uses browser dom would match. Php will depend on lxml doing its own thing.
#
[kevinmarks]
I am wondering if we should force html5lib in mf2py
#
[kevinmarks]
It is a bit slower
#
KartikPrabhu
[kevinmarks]: could you test that branch with lxml?
#
KartikPrabhu
I am not sure how to install a C lib in virtual env
#
sknebel
BeautifulSoup picks the best parser it has by default, so it'll use html5lib if available
#
KartikPrabhu
sknebel: mf2py has the option to specify a parser
#
sknebel
that too. but if you leave that empty, BS picks
#
KartikPrabhu
which it basically passes on to BS
#
sknebel
hm, given that html5lib is python it should be installable everywhere.
#
sknebel
I think snarfed used other parsers somewhere due to performance?
#
sknebel
I guess a step in between would be to explicitly make html5lib the default and require the user to override if they want something else?
#
KartikPrabhu
sknebel: yeah that seems reasonable. html5lib default unless user specified. if html5lib is not installed defer to BS defaults
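
[Editor's sketch] The policy described in this message (use the caller's parser if one is given, otherwise html5lib if it is installed, otherwise leave the choice to BeautifulSoup) could look roughly like the following. This is a hedged illustration, not mf2py's actual code: `choose_parser`, `is_installed`, and the `installed` parameter are hypothetical names, and only the standard library is used.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


def choose_parser(requested=None, installed=is_installed):
    """Pick the HTML parser name to hand to BeautifulSoup.

    requested: parser name given by the caller, or None.
    installed: callable that reports whether a module is available
               (injectable so the policy can be tested without html5lib).
    """
    if requested is not None:
        # An explicit choice from the caller always wins.
        return requested
    if installed("html5lib"):
        # Proposed default: prefer html5lib when it is available.
        return "html5lib"
    # None means "no preference", leaving the choice to BeautifulSoup.
    return None


print(choose_parser("lxml"), choose_parser(installed=lambda name: False))
```

The returned value would then be passed along as something like `BeautifulSoup(markup, features=parser)`; when it is `None`, BeautifulSoup falls back to its own preference order.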
#
sknebel
that's basically what's currently there, since BS will use html5lib by default if it is there
#
KartikPrabhu
no I think it goes for lxml first since that is faster
#
KartikPrabhu
is finding docs for that^
#
sknebel
oh, right
#
sknebel
it's not finding lxml here
#
KartikPrabhu
lxml > html5lib > html.parser
#
sknebel
ah, used the wrong python version. sorry for the confusion
#
KartikPrabhu
now check lxml
#
Loqi
[kartikprabhu] @kevinmarks ~~I am going to disagree in this case.~~ Taking the same example # HTML ``` html <article class="h-entry"> <div class="u-like-of h-cite"> <p>I really like <a class="p-name u-url" href="http://microformats.org/">Microformats<...
#
sknebel
KartikPrabhu++
#
Loqi
kartikprabhu has 6 karma in this channel (169 overall)
#
Loqi
[kartikprabhu] #59 Use html5lib by default
[eddie], [cleverdevil] and [kevinmarks] joined the channel
#
KartikPrabhu
ok whole new branch of mf2py with default html5lib and handling of redirect URLs along with implied-name-fixes https://github.com/kartikprabhu/mf2py/tree/parsing-into-bs4
#
KartikPrabhu
enough mf2py for this weekend (for now)
#
Zegnat
Woop for active parser development!
#
Zegnat
KartikPrabhu++
#
Loqi
kartikprabhu has 7 karma in this channel (170 overall)
GWG, [pfefferle], [miklb], KartikPrabhu and [cleverdevil] joined the channel