#microformats 2018-03-22

2018-03-22 UTC
[tantek], sebsel, webchat52, tantek, webchat249, webchat140, j12t and [unoabraham] joined the channel; Myth0s left the channel
#
Loqi
[gRegorLove] #161 Add failing tests and fixes for #158, #160
#
aaronpk
gRegorLove++
#
Loqi
gregorlove has 27 karma in this channel (227 overall)
KartikPrabhu joined the channel
#
aaronpk
way too late for me to look at this right now but I will review tomorrow!
tantek, [unoabraham], edsu_, nitot, voxpelli, echarlie and [pfefferle] joined the channel
#
Zegnat
Hmm, I might just have time to file a PR for the rel parsing changes today, gRegorLove! :D
#
Zegnat
About time I get my name on the contributors board for the PHP code
#
Zegnat
All the rel parsing seems to be inside a single method, so it is a good way to get my toes wet with the php-mf2 project
#
Zegnat
Oh, gRegorLove, you didn’t need to escape the hyphens in the regex for the new classes.
#
Zegnat
Hyphens only have a special meaning within character classes ([]). Escaping them outside one doesn’t hurt, but it is unnecessary.
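To illustrate the point about hyphens, here is a minimal sketch, assuming Python's `re` module rather than the PHP regex in the PR. The patterns are hypothetical and are not the ones from php-mf2:

```python
import re

# Hypothetical patterns for illustration only; not the regex from php-mf2.
escaped = re.compile(r"^h\-[a-z]+(\-[a-z]+)*$")
unescaped = re.compile(r"^h-[a-z]+(-[a-z]+)*$")

samples = ["h-card", "h-x-example", "hcard", "h-"]

# Outside a character class the hyphen is a literal either way,
# so both patterns accept and reject exactly the same strings.
same = all(bool(escaped.match(s)) == bool(unescaped.match(s)) for s in samples)

# Inside a character class an unescaped hyphen between two characters
# forms a range, which is where escaping starts to matter.
range_class = re.compile(r"[a-c]")     # matches a, b, or c
literal_class = re.compile(r"[a\-c]")  # matches a, -, or c
```

Only the character-class case changes meaning when the backslash is dropped.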
#
Zegnat
(The regex looked like the one I wrote for KartikPrabhu, except for the ^ $ and the hyphen escapes, which is why I spotted them in the first place.)
tantek, [kevinmarks] and nitot_ joined the channel
#
Zegnat
!tell gRegorLove,aaronpk reviews welcomed: https://github.com/indieweb/php-mf2/pull/162
#
Loqi
Ok, I'll tell them that when I see them next
#
Loqi
[Zegnat] #162 Improve rel parsing
nitot joined the channel
#
Zegnat
When I read things like this, I feel like tantek missed his calling as a rapper:
#
Zegnat
if url is not in the array of the key rel-value in the rels hash then add url to the array
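Read as code, that sentence is a small "append if absent" step. A rough sketch, assuming a plain dict for the rels hash; the function name `add_rel` and the example URL are made up here, and this is not the php-mf2 or mf2py implementation:

```python
def add_rel(rels, rel_value, url):
    """Add url to the array stored under rel_value, unless it is already there."""
    urls = rels.setdefault(rel_value, [])  # create the array on first use
    if url not in urls:
        urls.append(url)
    return rels


rels = {}
links = [
    ("me", "https://example.com/a"),
    ("me", "https://example.com/a"),      # duplicate, ignored
    ("author", "https://example.com/a"),
]
for rel_value, url in links:
    add_rel(rels, rel_value, url)
```

After the loop, each rel value maps to a list that contains the URL once.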
#
Zegnat
Alright, just had to push one more fix. I think it now matches Python’s output for all the things and adheres to the spec better.
#
Zegnat
Hmm, I can’t request reviewers?
[kevinmarks] joined the channel
#
Zegnat
“ "text": the text content of the element if any ” - does “any” include empty strings or not?
#
Loqi
[Tantek Çelik] microformats2 parsing specification
#
Zegnat
sknebel, maybe an idea? ^^^
#
sknebel
mh. I don't see the value in including empty strings here
#
Zegnat
Neither do I, but e.g. empty title-attributes do get included as there is no such check on there
#
sknebel
so I'd probably not allow empty strings here
#
Zegnat
PHP and Python both support empty string for title. Python accepts an empty string for text, but PHP does not.
#
Zegnat
So expected behaviour is “the textContent of the element unless that is an empty string”?
#
sknebel
I'd think so. a <link> doesn't have textContent anyways, so there can be cases where there is no content right?
#
Zegnat
I think textContent is an empty string on link elements
#
Zegnat
Pretty sure of that, actually. I think I made them clarify that in the DOM spec :P
#
Zegnat
Double checked. And yes: textContent of a link element will be an empty string. So discarding empty strings there sounds right.
#
ahliqiu
edited /get-started () "(-3477)"
barpthewire joined the channel
#
Loqi
[Zegnat] #32 Clarify attribute properties added to objects in rel-urls.
#
sknebel
Zegnat: sorry for bugging you about it, don't have my wiki login handy: spam to kill ^^^
#
zegnat
edited /get-started (+3477) "Undo revision 66729 by [[Special:Contributions/Ahliqiu|Ahliqiu]] ([[User talk:Ahliqiu|Talk]]) - spam"
#
Zegnat
Wow. Only the word “SHUTDOWN” is enough to pass the captcha now!
#
sknebel
I don't think it checks at all for those
#
Zegnat
I couldn’t submit with an empty field
#
Zegnat
I didn’t try a single character only
#
Zegnat
GitHub confirms it, this is my first PR on the mf2 PHP parser. I almost can’t believe that. Guess my contributions so far have just been bickering about the spec itself.
[kevinmarks] and Garbee joined the channel
#
KartikPrabhu
Zegnat: I think textContent should be specified more clearly in almost all places it occurs
#
Zegnat
Yes, although for the parsing of values on hyperlinks, I think it is fine to at least specify “text content” as being the true textContent property of the element
#
KartikPrabhu
sure but whether empty strings are allowed or not (and before/after leading space stripping) should be mentioned
#
Zegnat
Ah, I didn’t bring up space stripping on that issue I believe
#
Zegnat
Feel free to comment with that!
#
KartikPrabhu
also not sure if the "remove <style> and <script>" and "replace img" is relevant
#
KartikPrabhu
looking for issue
#
Loqi
[Zegnat] #32 Clarify attribute properties added to objects in rel-urls.
[manton] joined the channel
#
Zegnat
Since we are making some good progress on the rel parsing now, it would be nice to get that clarified (and shipped) before the next stable versions
#
Loqi
[kartikprabhu] `text` should also specify the following 1. Is empty string checked before/after stripping leading and trailing spaces i.e. is `text: " "` considered valid? 2. Should child `<style>` and `<script>` elements be dropped before? 3. Should child `...
#
Zegnat
Thanks KartikPrabhu! Updated my proposal :)
#
KartikPrabhu
Zegnat: your alt and src rule is missing <img>
#
KartikPrabhu
it might be better in the spec to specify textContent in one place and refer it from others
#
Zegnat
Huh, that must have been a weird GitHub & HTML bug. I copied this from #17. Will fix in a second
#
Zegnat
That’s probably true
#
Zegnat
I think bringing textContent out into its own chapter is something that must be done for #17, so I am willing to wait with that if we can get rel parsing clarified sooner
#
Zegnat
Ha, when I click edit on my comment, the <img> shows! Well done GitHub Markdown. I’ll go put ` around it
#
Zegnat
(updated)
[kevinmarks] joined the channel
#
Zegnat
Who has commit access to microformats/tests ? [kevinmarks], you are in the org right? Can you check?
#
Zegnat
would also like to submit a motion to grant more people access
[eddie], [cb], [tantek], KartikPrabhu, [colinwalker], j12t, nitot and tantek joined the channel
#
Zegnat
Working on the aaronpk-plaintext-whitespace-variant again :D
#
aaronpk
the what now
#
Loqi
aaronpk: Zegnat left you a message 8 hours, 42 minutes ago: reviews welcomed: https://github.com/indieweb/php-mf2/pull/162
#
aaronpk
ah nice
#
Zegnat
I have that testing JS implementation. But actually writing it down into an implementable (and comprehensible) spec, rather than just pointing at that code, is a different matter
#
Zegnat
Hmm, I am still not 100% clear on the process. Should I implement https://github.com/microformats/microformats2-parsing/issues/32 in my rel patch for php-mf2 so the spec can be updated to reflect the parser, or do I wait for more reactions on the parsing spec issue?
#
Loqi
[Zegnat] #32 Clarify attribute properties added to objects in rel-urls.
nitot, Kyle-K and KartikPrabhu joined the channel
#
KartikPrabhu
Zegnat: I have +1 ed it. Maybe get gRegorLove's thoughts too and we will satisfy "change control"
#
Zegnat
Yep, hoping gRegorLove will find the time to review my PR against php-mf2 anyway :)
#
KartikPrabhu
Zegnat: maybe add a test example with expected output that verifies your change. I can put this change in experimental mf2py too
#
Zegnat
I am not sure there are any proper tests for current expected output. So would need multiple tests to be sure to cover all cases.
#
Zegnat
Currently (still) busy writing up and testing a textContent algo, so those tests will have to wait a minute
#
KartikPrabhu
sure no worries
hurdygurd, tantek and nitot joined the channel
#
tantek
edited /Special:Log/block () "blocked [[User:Ahliqiu]] with an expiry time of infinite (account creation disabled): Spamming links to external sites: vandalism"
#
tantek
edited /Special:Log/block () "blocked [[User:Dagototo]] with an expiry time of infinite (account creation disabled): Spamming links to external sites"
#
tantek
edited /Special:Log/block () "blocked [[User:Parlaybola]] with an expiry time of infinite (account creation disabled): Spamming links to external sites"
#
gRegorLove
Zegnat: Re #32, do we have any real world examples of markup like `<a href="#a" rel="a" hreflang=""></a>
#
Loqi
gRegorLove: Zegnat left you a message 11 hours, 49 minutes ago: reviews welcomed: https://github.com/indieweb/php-mf2/pull/162
#
gRegorLove
<a href="#a" rel="a" hreflang="en"></a>`?
#
gRegorLove
I'm mostly +1 on that issue, but not sure about the "and not an empty string" part
#
Zegnat
No. That’s a synthetic example of why we would not want to keep empty strings when the document may later provide an actual value
[jeremycherfas] joined the channel
#
Zegnat
Actually, it looks like Python may already be overwriting empty values, so it could be we already have a parser implementing that
#
gRegorLove
I'll defer to others on that, but my inclination is to keep it if it's authored.
#
Zegnat
But both values are authored.
#
Zegnat
Basically I am saying “keep the first non-empty value”, rather than “keep the first value”.
#
Zegnat
Please do comment with hesitations though! All is important :)
#
gRegorLove
Understood. But it's discarding the first (empty) authored one. It "feels" like the parser's trying to fix a publisher mistake.
chrisaldrich joined the channel
#
KartikPrabhu
Zegnat: which mf2py are you looking at for that?
#
Zegnat
I always look at your dev version these days KartikPrabhu
#
KartikPrabhu
hmm it really shouldn't discard the empty values!
#
tantek
indeed, consider the author perspective before any theoretical / academic / purity perspectives
#
tantek
if an author explicitly provides an empty attribute, they went to some work to do so, therefore there is likely some intent there
#
Zegnat
I was considering an empty title for a URL, followed in the same document by a specified title for the same URL, to be likely not worth keeping. Instead keeping the authored value.
#
tantek
every title attribute is an authored value
#
KartikPrabhu
yeah it seems mf2py is discarding empty values. I thought I fixed that
#
Zegnat
Yes, but we keep only one of them. Even though all of the ones authored are valid ones.
#
Zegnat
So if all of the authored ones are valid, it made sense to me to at least keep the first non-empty one as the one we (already arbitrarily) pick
#
gRegorLove
Is a better question: should these rel attributes store multiple values? array?
#
tantek
example of keeping only one?
#
gRegorLove
Eh, backing up on that a bit -- would want examples before asking my question.
#
Zegnat
For that we will only set the `text` property for `#` to "Author". And skip "Permalink too!".
#
Zegnat
<a href="#" rel="author">Author</a><a href="#" rel="bookmark">Permalink too!</a>
#
Zegnat
We arbitrarily decide that the first one must be the one the author *really* wanted.
#
KartikPrabhu
ha! it seems mf2py always gets the last attribute for hreflang and others which is definitely a bug
#
tantek
Zegnat: no, not arbitrary, rather, it's a simple and predictable model
#
Zegnat
Yes, always last value is definitely a bug, KartikPrabhu :)
#
Zegnat
Where `text` will be "" and not "Permalink" if we keep empty values.
#
Zegnat
<link href="#" rel="author"><a href="#" rel="bookmark">Permalink</a>
#
Zegnat
I would say first non-empty is just as simple and predictable. Especially for e.g.:
#
KartikPrabhu
it should be first value per the current spec, right?
#
Zegnat
Once you set the key, you never overwrite it
#
Zegnat
You only add additional keys if they are defined by properties later in the document
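A sketch of the two rules being argued over, under stated assumptions: `merge_link` and its `skip_empty` flag are hypothetical names, the caller is assumed to have already extracted `text` and the attributes, and keeping `rels` in first-seen order is a choice made for this example. With `skip_empty=False` the first value wins even when empty; with `skip_empty=True` it models the "first non-empty value" proposal:

```python
def merge_link(rel_urls, url, rels, attrs, skip_empty=False):
    """Fold one link element into the rel-urls entry for its URL."""
    entry = rel_urls.setdefault(url, {"rels": []})
    for rel in rels:
        if rel not in entry["rels"]:
            entry["rels"].append(rel)
    for key, value in attrs.items():
        if key in entry:
            continue  # once a key is set it is never overwritten
        if skip_empty and value == "":
            continue  # proposed variant: wait for a non-empty value
        entry[key] = value
    return rel_urls


def parse(links, skip_empty):
    rel_urls = {}
    for url, rels, attrs in links:
        merge_link(rel_urls, url, rels, attrs, skip_empty)
    return rel_urls


# <link href="#" rel="author"><a href="#" rel="bookmark">Permalink</a>
links = [
    ("#", ["author"], {"text": ""}),
    ("#", ["bookmark"], {"text": "Permalink"}),
]
first_value = parse(links, skip_empty=False)      # text stays ""
first_non_empty = parse(links, skip_empty=True)   # text becomes "Permalink"
```

Both variants collect `author` and `bookmark` into `rels`; they differ only in which `text` survives.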
#
KartikPrabhu
right, that check is missing from mf2py
#
gRegorLove
php-mf2 doesn't get rel-urls.rels "author": http://pin13.net/mf2/?id=20180322213622296 Does that work in your latest PR, Zegnat?
#
tantek
Zegnat, except then you remove the ability for the author to force a blank value explicitly
#
tantek
which we currently have
#
tantek
I think you're overthinking it with a theoretical example
#
tantek
and IMO that's enough to reject that line of thinking
#
tantek
that is, a theoretical example is insufficient to justify a change
#
tantek
whereas I've just given you a theoretical *feature* that exists in the status quo. you provide no reason to remove it.
#
tantek
you cannot make assumptions like "Clearly the empty string adds no information about the URL"
#
tantek
the author's explicit authoring was considered, without any opinion about whether it adds information or not
#
Zegnat
Except it does not add any information in this specific case. An empty string for `media` means nothing at all. While if the author specifies a correct media value later in the page we should probably tell our consumer about that useful information.
#
Loqi
[kartikprabhu] #65 rel-urls use last value instead of first one
#
Zegnat
But please comment on the issue so the spec change proposal can be adjusted :)
#
Zegnat
This isn’t arbitrary text like a p- property, we are talking about metadata on URLs. Empty meta data has no meaning. If the author doesn’t have the correct meta data to provide, just leaving the HTML attributes off is perfectly fine.
#
Zegnat
But I am happy to revise the proposal again if I am the only one who thinks this
#
KartikPrabhu
actually from a debugging HTML point of view I would think an empty value would be more of a red flag
#
Zegnat
And, yes, gRegorLove. I do believe my patch will get `author` there
#
tantek
again you cannot assume this: "Empty meta data has no meaning." it's like saying zero has no meaning.
#
tantek
Zegnat "if I am the only one who thinks this" does not matter how many people agree on a theoretical
#
KartikPrabhu
no, now I am on tantek's side
#
tantek
you should challenge yourself to propose changes for empirical reasons
#
KartikPrabhu
i think the empty check should be there
#
tantek
and to reject changes for theoretical reasons
#
KartikPrabhu
should not*
#
Zegnat
The resource at URL is written in language "" and of type "".
#
Zegnat
I can say that because we are talking about very specific metadata here. What exactly does it mean when you say:
#
Zegnat
When the HTML provides us with a specific language and type later on the page, I would want to know about these. Because they only mean something when they are defined.
#
Zegnat
That’s the point I am trying to make. We are talking about very specific metadata about URLs with specific relationships.
#
KartikPrabhu
Zegnat: mf2 consumers are free to neglect that as meaningless but I think the mf2 parsed output should be closer to authored HTML
#
Zegnat
Then I propose it returns all the things it finds, if you want it to reflect everything authored in the HTML.
#
KartikPrabhu
as in not just the first value?
#
KartikPrabhu
that does seem fine to me
#
Zegnat
Yeah, return an array, like we usually do with multiple values
#
Zegnat
We aren’t keeping only the first rel-value we find for a URL either. We compound that from all the different elements as well.
#
KartikPrabhu
this time doc ordered array
#
Zegnat
source order definitely makes sense for title and text. I am wondering if hreflang, media, and type are unordered sets like rel or not.
#
KartikPrabhu
what was the consuming side use of the rel-urls properties again? They were added later
#
Zegnat
No clue. I only ever use the rels property. Do not think I have ever consumed rel-urls. I was just working on the PHP parser for it and wanted to iron out some things I thought were inconsistent.
nitot joined the channel
#
KartikPrabhu
!tell aaronpk snarfed: do you guys consume the rel-urls from the parsed mf2?
#
Loqi
Ok, I'll tell them that when I see them next
nitot joined the channel
#
Loqi
microformats2 parsing brainstorming
#
KartikPrabhu
so [kevinmarks] would be the one to ask
#
Zegnat
“no need for array for "name"/textContent - since there is always only one at most” - I don’t understand this argument
#
Zegnat
The spec incorporated adding more properties from links encountered later in the document. So it was already known that there could be multiple values.
#
Zegnat
Should add all of this to the issue later...
[kevinmarks] joined the channel
#
Zegnat
Pfff. Done. I can finally go to bed:
#
Zegnat
https://wiki.zegnat.net/media/textparsing.html - describes and implements an algorithm that extracts a plain text value from an element. Removes STYLE/SCRIPT and replaces IMG per customary mf2 rules, but also adds \n for P and BR elements per https://pin13.net/mf2/whitespace.html
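For readers who cannot open the link, a very rough sketch of the three behaviours named in that message, assuming Python's standard `html.parser`. It is not the algorithm from the linked page: it does no whitespace collapsing, does not resolve `src` to an absolute URL, and the class and function names are invented here:

```python
from html.parser import HTMLParser


class TextSketch(HTMLParser):
    """Simplified plain-text extraction; not the linked algorithm."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag == "img":
            attributes = dict(attrs)
            alt = attributes.get("alt")
            # Use alt when the attribute exists, otherwise fall back to src.
            self.parts.append(alt if alt is not None else attributes.get("src") or "")
        elif tag in ("br", "p"):
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p" and not self.skip_depth:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def extract_text(html):
    parser = TextSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


sample = '<p>Hello<br>world</p><script>x()</script><img alt="logo" src="l.png">'
result = extract_text(sample)
```

Stripping the outer whitespace at the end is one possible choice; where trimming should happen is exactly the kind of detail the spec text would need to pin down.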
#
Zegnat
!tell aaronpk Sneak peek of text content extraction that (should) match all of your whitespace examples: https://wiki.zegnat.net/media/textparsing.html
#
Loqi
Ok, I'll tell them that when I see them next
#
[kevinmarks]
The rel-urls use case was xfn where you have <a href="http://tantek.com" rel="friend colleague met" >t</a> and you don't want to have to walk the rels and collate urls yourself
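As a sketch of the collating a consumer would otherwise do by hand, assuming only the `rels` hash is available as a dict of rel value to URL list (the function name `collate_rels_by_url` is made up for this example):

```python
def collate_rels_by_url(rels):
    """Invert a rels hash (rel -> urls) into url -> list of rel values."""
    by_url = {}
    for rel, urls in rels.items():
        for url in urls:
            by_url.setdefault(url, []).append(rel)
    return by_url


# What the rels hash might look like for the XFN link above.
rels = {
    "friend": ["http://tantek.com"],
    "colleague": ["http://tantek.com"],
    "met": ["http://tantek.com"],
}
by_url = collate_rels_by_url(rels)
```

With rel-urls in the parsed output, that per-URL view is already there and the consumer can skip this step.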
#
Loqi
Tantek Çelik
#
KartikPrabhu
[kevinmarks]: yes. but why only get the first value? why not the array of all of them
#
KartikPrabhu
sorry first value of stuff like "text" and "media"
#
gRegorLove
Zegnat++ for textparsing algorithm!
#
Loqi
zegnat has 15 karma in this channel (187 overall)
#
[kevinmarks]
I think empirically we didn't find multiple ones
#
KartikPrabhu
it is still a departure from mf2 conventions where almost everything is an array
#
KartikPrabhu
or dictionary
#
KartikPrabhu
is surprised he never noticed that
[cb] and chrisaldrich joined the channel
#
@ChrisAldrich
@kaushalmodi Also, on your site I'm seeing rel=me instead of the rel="me" with the proper quotes around me. See http://microformats.org/wiki/rel-me for examples.
(twitter.com/_/status/976954987778539522)
kaushalmodi and KartikPrabhu joined the channel
#
@ChrisAldrich
@huby plain old semantic HTML with microformats in combination with the webmention protocol allow one to post "likes" to one's own website and send them to others. Here's a simple example: http://boffosocko.com/2018/01/11/1-million-webmentions/
(twitter.com/_/status/976965497429352450)
[tantek] joined the channel
#
[tantek]
KartikPrabhu the parsed rels were added afterwards, as kevinmarks notes, for specific use cases and real world examples
#
[tantek]
So I’m going to oppose all theoretical proposed changes to anything real related, especially if reasoned from a “consistencies in the [parser] code” perspective. That’s very bad plumbing-centric reasoning
#
[tantek]
rel* related
#
[tantek]
Seriously if you don’t have a real world example that you need to consume and the current parser spec is failing you, please stop proposing such changes. It’s a waste of time to pursue theoretical purity
#
[tantek]
That’s math/philosophy, not science. And here we are doing science
#
[tantek]
(As any spec / code that deals with actual human published data / content should )
#
KartikPrabhu
ok I am not very invested in this anyway. not my hill