#tantekis just now catching up on morning logs he missed.
snarfed and tilgovi joined the channel
#tantekJeena: re: FB "something a friend of mine commented showed up in my stream, I wanted to find it, but I can't on his timeline it doesn't show up and in my stream it's gone too" - check https://www.facebook.com/notifications
#KevinMarkswell, like pinterest did at the time when instagram didn't have a web view
#gRegorLoveKevinMarks: How can I get a person with pestagram?
snarfed, wolftune, JasonO, j12t and tantek joined the channel
#KartikPrabhu!tell snarfed: I see why hfeed2atom skipped some posts of yours. The first post for instance gives the mf2 name property as ["", ""]. I was assuming that the name always ends up as a non-blank string due to implicit name parsing. I should correct that
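A minimal Python sketch of the guard KartikPrabhu describes. It assumes the mf2 JSON shape in which every property value is a list (so a blank name arrives as `{"name": ["", ""]}`). `display_name` is a hypothetical helper for illustration, not hfeed2atom's actual code.

```python
def display_name(props):
    """Return the first non-blank string in an mf2 "name" property, or None.

    Assumes `props` is the "properties" dict of one parsed mf2 item, where
    each property maps to a list of values (hypothetical helper).
    """
    for value in props.get("name", []):
        # Values are usually strings; skip nested objects and blank strings.
        if isinstance(value, str) and value.strip():
            return value.strip()
    # No usable name: the caller should fall back to content, a date, etc.
    return None
```

Returning `None` instead of an empty string makes the "no real title" case explicit, so a feed generator can choose a fallback instead of silently skipping the post.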
#ben_thatmustbemeoops, almost at my train stop. surprised it held out this long
acegiak, j12t and fourtonfish joined the channel
#petermolnarI have a pretty nasty question: if I pull in everything that's in an h-entry for a webmention, I could pull in harmful JavaScript as well, right? Has anyone written any safeguards against this?
#rhiaropetermolnar: I'm gonna convert incoming webmention content into markdown to store... which hopefully resolves some of that
#petermolnarit won't, as markdown can have html in it, unless you write your own parser that filters out malicious code
#rhiaroit isn't one-to-one both ways is it? When I convert stuff to markdown it drops a lot of stuff
#rhiaroWell, I haven't really got far with this yet, I'll report back..
#petermolnarthat depends on whether the parser drops the elements it can't parse or passes them through as html
#cweiskei remember that there are XSS tests for webmention
#voxpelliI myself simply ignore HTML as there are other attack vectors in there as well – like an absolutely positioned iframe that takes over your entire site with a phishing one
#voxpelliWhitelisting would be the only way to go I think
interactivist joined the channel
#ben_thatmustbemethis is why most people opt not to use the html from an h-entry and use only content which should strip out tags like "<script>"
#voxpelliOne needs to make sure to filter rel attributes and classes as well, to avoid microformat injections that could, in the worst case, hijack one's identity
#voxpelliAnd style attributes, to prevent the HTML from breaking one's site
#Loqiben_thatmustbeme meant to say: this is why most people opt not to use the html from an h-entry and use only content which should strip out tags like "<script>" and "<style>"
nxd4n_ and nxd4n joined the channel
#Loqislack/snarfed: stripping *all* html may be a bit too far. i often put links in replies that are important for context
KartikPrabhu joined the channel
#Loqislack/snarfed: but a small whitelist generally makes sense. WordPress's and bleach's are both good afaik
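A minimal whitelist sanitizer in Python, built only on the standard library's `html.parser`, to illustrate the approach discussed above. The tag and attribute whitelist is a hypothetical example, not bleach's or WordPress's actual defaults. It keeps only whitelisted tags, drops `<script>`/`<style>` together with their contents, removes every non-whitelisted attribute (so `class`, `rel`, `style` and `on*` handlers go, which covers the microformat-injection and layout-hijacking concerns), and only allows `http(s)` links. It does not balance unclosed tags, so treat it as a sketch rather than a drop-in replacement for a maintained library.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration only.
ALLOWED_TAGS = {"a", "p", "br", "em", "strong", "blockquote", "code"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")
DROP_CONTENT = {"script", "style"}  # drop these elements *and* their text


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            # class, rel, style, on* etc. are never whitelisted, so they vanish.
            if name not in ALLOWED_ATTRS.get(tag, set()) or value is None:
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript:, data:, relative URLs, etc.
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag == "br":
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text was unescaped by the parser, so re-escape it on output.
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize('<p class="h-card">Hi <script>alert(1)</script><a href="javascript:alert(1)" rel="me">x</a></p>')` keeps the paragraph and link text but loses the class, the script, the `rel` and the unsafe `href`.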
#LoqiThe Vouch protocol is an anti-spam extension to Webmention. Webmention with Vouch depends on understanding Webmention https://indiewebcamp.com/Vouch
#LanceyWorkuse vouching to determine whether or not to strip html?
#ben_thatmustbemeyou could, but if you use vouch you have a reasonable spot to start from
#cweiskethe owner of a white-listed domain dies, the domain is squatted by a malicious person and boom
#KartikPrabhuso I haven't had any trouble with newlines in webmentions from notes
#voxpelliThere's no requirement to have a \n after a <br> though – and any \n will anyhow be treated as a regular space in the browser so one can't assume that a \n in the HTML means a line break :P
#voxpelliKartikPrabhu: the only kind-of-spec for extracting text from HTML actually requires that you apply the CSS, because of things like that, and applying CSS isn't really what one wants in a server side parser :/
#KartikPrabhuno. not on server side but if you want to present comments with white-space intact, that is how to do it. HTML should not dictate presentation anyway
#voxpelliben_thatmustbeme: You have to handle the fact that all block level elements will be followed by a line break as well – you may even need to handle <hr> if you want to be bullet proof :P
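A rough Python sketch of the text-extraction rules discussed here, using only the standard library: source newlines are collapsed like any other whitespace (as a browser would), while `<br>`, `<hr>` and a hypothetical, non-exhaustive set of block-level elements produce line breaks. It deliberately ignores CSS, so it is not equivalent to `.innerText`, and it is not the algorithm of any particular mf2 parser.

```python
import re
from html.parser import HTMLParser

# Hypothetical, non-exhaustive set of block-level elements.
BLOCK_TAGS = {"p", "div", "blockquote", "li", "ul", "ol", "h1", "h2", "h3",
              "h4", "h5", "h6", "pre", "hr", "table", "tr"}
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip += 1
        elif tag == "br" or tag in BLOCK_TAGS:
            self.parts.append("\n")  # <br>, <hr> and block starts break lines

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip = max(0, self.skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")  # block ends also break lines

    def handle_data(self, data):
        if not self.skip:
            # A "\n" in the HTML source is just whitespace, not a line break.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip()
```

With this approach `<p>one<br>two\nthree</p><p>four</p>` becomes `one`, `two three`, a blank line, then `four`: the `<br>` is honoured while the raw newline in the source is not.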
#Loqislack/snarfed: hence HTML influencing presentation, if not dictating it
#ben_thatmustbemeyou can remove the new line from block level elements
#voxpelliKartikPrabhu: glennjones made a note in the linked to issue about his approach which handles most cases (although currently not <br> actually)
#Loqislack/snarfed: if we say it's all css and not html, then it's even harder to convert to text with formatting, not easier
#Loqislack/snarfed: and would make this even harder :P
#ben_thatmustbemethen you just sanitize the html, remove JS, and scope the css
#voxpelliben_thatmustbeme: problem with including CSS is that then a "text-transform: lowercase" will also affect the parsed data :P (Currently the case when using .innerText in Chrome)
#Loqislack/snarfed: in practice, all of the edge cases I've seen in bridgy publish (dozens!) were html formatting. not a single one was css, and not a single user requested it
#Loqisnarfed: KartikPrabhu left you a message 9 hours, 47 minutes ago: I see why hfeed2atom skipped some posts of yours. The first post for instance gives the mf2 name property as ["", ""]. I was assuming that the name always ends up as a non-blank string due to implicit name parsing. I should correct that http://indiewebcamp.com/irc/2015-07-27/line/1438060499242
#aaronpkpetermolnar: I only use the plaintext content from the parser, so there's no HTML injection attacks possible
#KevinMarksyes, still debating whitespace collapsing
#aaronpkwow that's a lot of chatter about html sanitization
#voxpelliKartikPrabhu: I think it's desirable from the mf2 spec perspective that the plain text presentation is predictable and consistent across clients
#petermolnaraaronpk I was just wondering if we already have a solution specifically for webmentions but soon realized that this is basic HTML trouble
#KartikPrabhudoes HTML spec discuss anything about "proper" text representation of tags?
#voxpelliKartikPrabhu: stripping disregards the actual formatting of the HTML, and thus line breaks and spaces that aren't rendered on the web page itself might get introduced or lost
#KartikPrabhutantek: I don't know. people have been saying that <br> should be next line or something, but I think it is fine to be stripped
#voxpelliKartikPrabhu: closest I've heard of is the implementation of the ".innerText" – but it has met resistance amongst some browser vendors and thus never been standardized
#voxpelliKartikPrabhu: the one in the mf2 spec is fine unless most clients start to do their own additional processing; then it would be better to agree on a common line there, which is what, e.g., the discussion around <br> is about
#voxpelliKartikPrabhu: How else can one ensure that one's content will be correctly interpreted, or that one will correctly interpret others' content? If there's a common practice, shouldn't it be documented?
#Loqislack/snarfed: voxpelli++ eg replying to known makes me sad since i know links will be lost
#KartikPrabhuvoxpelli: i don't see how that is an issue. I have a <br/> tag. consume it as you will. On my site I can set br { margin: 5000px } who cares
#aaronpki treat replies as they are treated on twitter, you can't embed markup
#voxpelliKartikPrabhu: if you have a line break in your text then I want to be able to understand that to the best of my abilities so that I can give your content as respectful of a presentation as possible of course :)
#KartikPrabhuthen use the HTML not the content.value
#voxpelliKartikPrabhu: one doesn't exclude the other :) I want to be able to present the text only version of your content as accurately as possible as well
#KartikPrabhuwell then you are going to lose fidelity
#voxpelliAnd as a consumer I would want the mf2-client library to do all that hard work for me so that I know that what I get out from that is ready to be presented in a certain way out of the box
#KartikPrabhuseems like asking a lot from a mf2 parser, instead of using a reasonable HTML sanitiser
#Loqislack/snarfed: bestpractices++ for this, since writing your own code for your indieweb site is important, but is definitely the exception, not the rule, and will only get more so
#voxpelliIf that certain way is "whitespace: preserve" then I want that to be expected of me and to eg have Indiewebify.me alert me if I don't do that
#voxpelliSo many protocols in the Indieweb now that it's hard to know the ins and outs of them all – WebMention, Micropub, mf2-parsing etc
#voxpelliSo we need tools and best practices that can help us keep track of that
#KartikPrabhuaren't we just adding to the "ins and outs" by asking mf2-parsers to do a lot more
#KartikPrabhubest practices are fine. but shoving all those into a parser is asking a lot
#voxpelliKartikPrabhu: the mf2-clients are experts on this and they already do it to a certain degree today as pointed out in the issues above :)
#tantekvoxpelli: it's good to keep the building blocks small, modular, and potentially swappable
#tantekas well as layering things like mf2 helper libs ON TOP OF parsers, rather than incorporating into parsers
#KartikPrabhuyeah. use a parser to get mf2 props, then use an HTML sanitiser to do other things. If you are building a thing like Known, document how it handles them or something for end-users
#voxpelliThe Node.js parser, e.g., already has experimental, more advanced whitespace parsing built in
#KartikPrabhuI have added HTML tags to my white-list over time, it is unreasonable to say "mf2py should do that"
#gRegorLoveRight, it's not there. It came from me replacing the expected JSON :)
#gRegorLoveI was getting phpunit errors still after I was pretty sure I'd fixed the parser, so as a double check I ran the HTML through unmung and replaced it in the test.
#KartikPrabhuhuh, don't understand the context/goal in that link ^ "Make it easier for people to improve their Sense Making as a way of Augmenting Human Intellect"
#snarfedKevinMarks: sigh. we may still be able to get event members though. haven't investigated yet
#snarfedman i totally forgot implementing glyphicons and target= in my linkifier
#tanteksnarfed: hmm - I disagree with "In the the link text, Removes the leading http(s)://[www.] and ellipsizes at the end if necessary." - since that changes what the author typed in the note - and conveying https is important, as well as supporting the select/copy/paste text use case
#aaronpkhm i was hoping to finish the webmention notification clustering before deploying but it's proving to be a bigger challenge than originally anticipated so i should probably just deploy the better RSVPs
#tantekexactly. such confusion is why I don't think it's a good idea to remove "www." when autolinking URLs
#tanteksnarfed, PASTA seems to imply spaghetti (code, behaviors), so I'm wondering if we can work that in somehow
#gRegorLoveFor my like display, maybe I should ignore the p-name and just display "Liked this". I think I did that initially, but then switched at some point to using the content, since that was Bridgy's default content for likes.
#tantekor maybe PASTA is already overloaded, e.g. giant spaghetti monster, Pastafarian
#tantekthough it would be nice to try to keep them compatible
#aaronpkif the "properties" key stays, then they are still quite compatible
#aaronpke.g. add[properties][category][]= vs the simpler add[category][]=
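To make the two brainstormed shapes concrete, here is a small Python sketch that form-encodes both variants with `urllib.parse.urlencode`. Everything beyond the `add[...]` keys quoted above is an assumption: `update_body` is a hypothetical helper, and the `url` field is included only to show which post is being edited, not as a confirmed Micropub parameter.

```python
from urllib.parse import urlencode


def update_body(url, prop, values, nested=False):
    """Form-encode a hypothetical Micropub "add values to a property" request.

    nested=True  -> add[properties][<prop>][]=...  (mirrors mf2 JSON)
    nested=False -> add[<prop>][]=...              (the simpler form)
    """
    key = f"add[properties][{prop}][]" if nested else f"add[{prop}][]"
    # "url" is an assumed field naming the post being edited.
    pairs = [("url", url)] + [(key, v) for v in values]
    return urlencode(pairs)  # brackets are percent-encoded as %5B / %5D
```

Keeping `properties` costs one extra bracket level per key but leaves room for anything parsers might later put next to `type` and `properties`; dropping it gives shorter keys at the cost of that headroom.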
#tanteka-ha that's cool - thus mp CRUD brainstorming is helping to figure out the structure of what /edit post could be like
#tantekfor editing specific properties - though that still should (needs to be) connected to what the presentation of an edit post of a property would *look* like to users viewing the edit post
#aaronpkwe've been talking about adding additional properties to the parsed microformat result such as "lang" attributes
#tantekI guess we can hope it works out - and if not (discovered by way of someone implementing more detailed edit posts)
#aaronpki guess whether to keep "properties" for the micropub request is a matter of whether it's possible that microformats parsers will eventually include other things next to "type" and "properties" keys in the future, e.g. for person-tagging
#tantekI guess lang is reasonable to drive via non-english posting practices
#tantekright, I'm saying we need indieweb documentation of existing display text publishing practices with permalinks before jumping to needs / markup etc.
#Loqitantek meant to say: right, I'm saying we need indieweb documentation of existing non-english display text publishing practices with permalinks before jumping to needs / markup etc.
#tanteksure - probably better to have them on indiewebcamp.com
#tantekto focus on actual live indieweb publishing practices
#aaronpkwell anyway i wasn't talking about language support for micropub explicitly, more in general about whether there might be things added to the microformats spec that are *not* inside the "properties" object
#tantekwell we tried that with area shape & coords and it failed
#tantekben_thatmustbeme is the one that helped figure that out
#KevinMarkslinguistic analysis assumes things are in one language only
#tantekaside: we really do need to start creating those social web maps so we document things like the shrinking land bridge of the FB API from the indieweb to FB.
#tantekaaronpk - time to change the indiewebcamp.com header?
#JeenaI guess I would then have a cite with the blockquote text and a h-author and a bookmark around everything
#aaronpkJeena: yeah, you'd post your own h-entry which has an h-cite inside
#KevinMarks__Anyone have somewhere in sf I could do TWiG from tomorrow? I want to be up for hwc, and it will be 103F here again so podcasting from my garden would be hard