2017-01-05 UTC
tantek joined the channel
miklb and chrisaldrich joined the channel
miklb joined the channel
# 03:02 tantek what the what. since when has github wikis had "Edit mode:" where you can switch from Markdown to Mediawiki and back?!?
# 03:03 tantek oh haha it doesn't actually translate it back/forth, it just re-interprets the "source" ?
KevinMarks joined the channel
# 04:04 loqi.me edited /like (+105) "tantek added "https://www.fastcodesign.com/3066415/how-stories-solved-instagrams-biggest-threat-self-conscious-users" to "See Also""
chrisaldrich, tantek and cweiske joined the channel
# 11:09 myfreeweb why does woodwind say no feeds found for rhiaro.co.uk? there are some h-entries…
# 11:11 rhiaro It's probably not sending an accept header, and getting JSON instead of HTML
# 11:11 rhiaro Everyone is having this problem with my site at the moment :)
# 11:14 rhiaro Lesson: if you can only handle one content type, ask for it
# 11:14 rhiaro Lesson 2: rhiaro makes things too complicated, don't parse her stuff
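A minimal sketch of rhiaro's first lesson, assuming a Python client: a consumer that can only handle HTML says so in its Accept header instead of relying on the server's default. The function name and User-Agent string are hypothetical, and nothing is sent over the network here.

```python
from urllib.request import Request


def build_html_request(url: str) -> Request:
    """Build a GET request that explicitly asks for HTML."""
    return Request(
        url,
        headers={
            # Ask for the one content type this client can parse.
            "Accept": "text/html",
            # Hypothetical client name, for illustration only.
            "User-Agent": "example-feed-reader/0.1",
        },
    )


request = build_html_request("https://rhiaro.co.uk/")
print(request.get_header("Accept"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the fetch; a server that negotiates on Accept can then choose HTML rather than its JSON default.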
# 11:15 myfreeweb heh, i remember some content-type/accept issues on my stuff as well :D
# 11:33 Zegnat not defaulting to HTML is an interesting move on HTTP, rhiaro. Not sure how I feel about that.
kants and petermolnar joined the channel
# 12:24 rhiaro I figured that HTML is for browsers and browsers send accept headers
# 12:25 rhiaro Scripts which don't bother probably want JSON
# 12:26 rhiaro Except for all this microformats stuff, though that usually ends up as JSON anyway so things that are converting could request JSON
# 12:26 cweiske do you send a special content type with your mf json?
# 12:37 petermolnar advice for many: the setup of having a host which forwards with iptables into an lxc container, which has an nginx, which proxies based on domain into other lxc containers on the same host affects your network quite badly
# 12:41 myfreeweb heh i'm planning the setup for moving unrelenting.technology to a new vps… i think i'll set up an nginx in a jail with host networking, proxying to apps in their own jails over unix sockets
# 12:42 petermolnar unix sockets might be fine, and from the word jail I assume *BSD, which is a different beast
# 12:42 myfreeweb yeah i run freebsd everywhere i can
# 12:43 petermolnar had I done it over unix sockets, I probably would have prevented the issue
# 12:44 petermolnar for now I got tired and fed up with the thing and moved all the mind blowing number of 7 sites of friends onto the same lxc container
# 12:44 myfreeweb freebsd's veth like thing (vimage/vnet) was buggy for a while, they fixed it now… but there always was the other option — you just assigned ip addresses to jails, and they were set up as aliases on your net interface. i don't think linux has such a setup?
# 12:45 myfreeweb what exactly was the issue with veth? more latency?
# 12:47 petermolnar I've been planning to move to freebsd for more than a year, but never got the time to learn the bare minimums enough :(
# 12:50 petermolnar nice, I see ips trying to reach my no-more-existing wp-admin.php ... shall I assume those are not visitors? :D
# 13:32 GWG petermolnar, I see that from bots on sites run on and off WordPress
# 13:33 cweiske the 404 list on my page is very long, mostly from bots checking if I run some vulnerable software
# 15:07 aaronpk is there something i can do to punish bots that probe for wp-login.php? can i send back like super large files or something? redirect them to a 100mb video file?
# 15:22 aaronpk sadly this is not a visible improvement so won't count towards #100daysofindieweb
# 15:25 aaronpk eh, the goal was visible improvements specifically
# 15:25 aaronpk "It must be something with a publicly visible result (e.g. has a visible effect on the presentation of your web page, or is an improvement to an open source tool)"
# 15:30 aaronpk i just have to pick a small improvement and get that out of the way first :)
# 15:59 sebsel petermolnar So when I post a link here to petermolnar.net's wp-login.php, Loqi gets banned? :o
# 16:02 sebsel That's better. But someone else can potentially exclude your posts from showing up in webmention.io
# 16:03 aaronpk loqi and webmention.io both use XRay to fetch the post, which is on appengine, so it may use different IPs
# 16:03 petermolnar essentially forcing me to ban services by asking services to poke me
# 16:21 aaronpk hm should i be adding rel=nofollow to links in comments?
# 16:25 voxpelli aaronpk: if you don't trust the author of it, then yes?
# 17:17 Loqi Ok, I'll tell them that when I see them next
# 17:33 aaronpk i want to autolink @-mentions in text, but not if the @-mention is already in an <a> tag
KartikPrabhu and gRegorLove joined the channel
# 18:31 aaronpk sure would be nice if these variables were named something where i could understand what they are for
KartikPrabhu joined the channel
# 18:48 Loqi gRegorLove: tantek left you a message 2 weeks, 2 days ago: not seeing the link to LA/Planning from /Planning#Completed but that approach sounds good
tantek joined the channel
# 19:07 aaronpk autolinking is hard, why hasn't this been solved once?
# 19:07 aaronpk cassis does too much autolinking so i don't want to use it
# 19:08 aaronpk notices the "do embeds or not" option and tries it out
# 19:09 aaronpk oh yeah, with cassis autolink i can't override what happens with the @names
# 19:10 aaronpk i'm also mildly amused that cassis autolink doesn't catch tantek's permashortcitations
# 19:11 tantek it's a one-off library specific object just for folks to pull out that auto-linked user as an object. barnabywalters uses it
# 19:12 tantek what kind of @-name override functionality are you looking for?
# 19:12 aaronpk i want to override @-names from my nicknames cache
# 19:12 tantek also, permashortcitations are *not* supposed to be auto-linked
# 19:13 tantek if they were linked they would be permashortlinks
# 19:13 aaronpk okay well if PSCs aren't supposed to be autolinked, then it should probably not autolink the domain in them
# 19:13 aaronpk right now it turns (ttk.me t4m92) into (<a href="http://ttk.me/">ttk.me</a> t4m92)
# 19:14 tantek perhaps I'll not autolink a domain if it is preceded by '(' and suffixed with ' '
# 19:15 tantek I think that will precisely catch those cases
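The rule tantek proposes can be sketched roughly as follows. This is not CASSIS code: the domain pattern is a deliberately simplified assumption, the function name is hypothetical, and HTML escaping of the input is left out for brevity.

```python
import re

# Simplified, assumed domain pattern: dot-separated labels ending in a TLD.
DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def autolink_domains(text: str) -> str:
    """Link bare domains, except ones that look like permashortcitations."""

    def replace(match: re.Match) -> str:
        start, end = match.span()
        preceded_by_paren = start > 0 and text[start - 1] == "("
        followed_by_space = end < len(text) and text[end] == " "
        if preceded_by_paren and followed_by_space:
            # Looks like "(ttk.me t4m92)": leave the domain unlinked.
            return match.group(0)
        domain = match.group(0)
        return f'<a href="http://{domain}/">{domain}</a>'

    return DOMAIN.sub(replace, text)
```

With this sketch, `autolink_domains("(ttk.me t4m92)")` leaves the text unchanged, while a domain in running text such as `"see ttk.me now"` is still linked.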
# 19:16 aaronpk in <a href="http://example.com/">from example.com!</a> the autolinker catches the inner domain name and puts an <a> in the <a>
# 19:17 tantek wait a minute, I didn't say it can auto-link HTML !
# 19:17 tantek if it's HTML, you've presumably already got links!
# 19:17 tantek it would never generate <a href="http://example.com/">from example.com!</a>
# 19:19 tantek (so haven't had to scratch that itch yet ;) )
# 19:19 tantek aaronpk do you have suggestions for how to support @-names linking via a nicknames cache?
# 19:20 tantek have it check there first, and then if not found, just link to Twitter?
# 19:20 aaronpk hm that could work... the other option is providing a callback function so I can run whatever code when a nickname is found
# 19:20 aaronpk i think the dictionary leads to worse performance because it will load it for autolinking every comment even if there are no nicknames
# 19:20 tantek that dictionary being too big (i.e. too many people have their own websites) would be a good problem to have ;)
# 19:21 tantek it could call out to a custom @-name auto-link function
# 19:22 tantek if provided, otherwise it would just do its default @-name linking
# 19:22 aaronpk yeah maybe the cassis way of doing it would be to check for the existence of a named function (either window.whatever in JS or function_exists('whatever') in PHP)
# 19:22 aaronpk i'm not sure if you can do actual callback functions the same way in JS and PHP
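Both options discussed above (a nickname dictionary checked first, or a caller-supplied callback) can be combined in one function. The sketch below is an illustration in Python rather than the JS/PHP of CASSIS; the names `autolink_at_names`, `nicknames`, and `resolver` are hypothetical, and the Twitter fallback follows tantek's suggestion.

```python
import re
from typing import Callable, Dict, Optional

# An @-name that does not directly follow a word character, "@", "/" or ".",
# so e-mail addresses and URL paths are left alone.
AT_NAME = re.compile(r"(?<![\w@/.])@(\w+)")


def autolink_at_names(
    text: str,
    nicknames: Optional[Dict[str, str]] = None,
    resolver: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Link @-names via a callback, then a nickname cache, then Twitter."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        url = resolver(name) if resolver is not None else None
        if url is None and nicknames:
            url = nicknames.get(name.lower())
        if url is None:
            # Default behaviour when nothing else knows the name.
            url = f"https://twitter.com/{name}"
        return f'<a href="{url}">@{name}</a>'

    return AT_NAME.sub(replace, text)
```

A callback avoids loading the whole dictionary for comments that contain no nicknames, which is the performance concern aaronpk raises; the dictionary form is simpler when the cache is small.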
# 19:23 aaronpk so my broader problem is that I don't know whether the comment text is HTML or not
# 19:23 tantek wait a minute, I thought we totally solved that problem
# 19:23 aaronpk or another way of phrasing it is, "http://example.com" is valid HTML and I would still want to autolink that even if the author did not
# 19:24 tantek forget auto-linking, you need to know for escaping
# 19:24 aaronpk also even if HTML is provided, I can't be sure that the author has linked hashtags or @-names in their HTML so I want to do a pass on those too
# 19:24 tantek that's a different problem and I'm totally not going to solve that one
# 19:25 tantek processing arbitrary HTML -> HTML is definitely not on my list
# 19:25 aaronpk that's already mostly done for me by XRay. it sanitizes HTML, and leaves only a small list of tags as well as mf2 classes.
# 19:30 tantek auto-linking arbitrary (even semi-processed) HTML requires a very different approach. You basically have to parse the HTML completely, and then apply plain-text auto-linking to each text-node.
# 19:36 tantek I can see why for @-names in comments, but I'm not so sure about arbitrary URLs
# 19:36 tantek at least in articles, I've written domains and URLs that I deliberately don't want linked
# 19:37 aaronpk well the same problem applies to @-names and hashtags
# 19:37 tantek either to not give them search juice, or traffic, or to add a barrier to a bad site
# 19:37 aaronpk <a href="http://example.com">hello #world</a> should not be autolinked, but "hello #world" should
# 19:38 ben_thatmustbeme ironically in my irc client it autolinked 'http://example.com' because it doesn't understand html
# 19:41 ben_thatmustbeme although i clicked it and it created that room, so i guess thats technically the correct thing to do if IRC supports chat names like that
# 19:42 tantek I think my IRC client auto-handles clicks on any #-name or use of a nickname in plain text, and either joins that room or opens a pm window respectively
# 19:42 tantek cursor just changes from arrow to hand with pointer finger, no other visual cue
# 19:44 aaronpk okay yeah i can see dropping autolinking plaintext URLs in HTML
# 19:45 aaronpk so then the problem is just: avoid autolinking @-names and #hashtags if they are inside specific HTML elements
# 19:46 aaronpk and then a separate issue for p3k is knowing whether the comment was parsed as HTML vs plaintext which it does not right now
# 19:46 tantek a-ha! yes that's key information to pass along from the microformats parsing
# 19:47 aaronpk i think it gets lost from the webmention.io web hook to p3k so that should be easy
# 19:47 tantek avoid *any* auto-linking inside specific HTML elements (A, AREA, BUTTON, SCRIPT, TEXTAREA, ... ?)
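The approach described above (parse the HTML, then apply plain-text auto-linking to each text node, skipping text inside certain elements) could look roughly like this with Python's standard `html.parser`. It is a sketch under assumptions, not how p3k, XRay, or CASSIS do it: the link targets are placeholders, the skip list covers only the elements from the discussion that can contain text (AREA is void, so it needs no handling), and doctype declarations are dropped.

```python
import re
from html.parser import HTMLParser

# Elements whose text content must never be auto-linked.
SKIP = {"a", "button", "script", "textarea"}
# "@name" or "#tag" not directly preceded by a word character or "@#/.".
TOKEN = re.compile(r"(?<![\w@#/.])([@#])(\w+)")


def link_tokens(text: str) -> str:
    """Plain-text pass: link @-names and #hashtags (placeholder targets)."""

    def replace(match: re.Match) -> str:
        sigil, name = match.group(1), match.group(2)
        href = f"https://twitter.com/{name}" if sigil == "@" else f"/tag/{name}"
        return f'<a href="{href}">{sigil}{name}</a>'

    return TOKEN.sub(replace, text)


class TextNodeAutolinker(HTMLParser):
    """Re-emit the markup unchanged, auto-linking only eligible text nodes."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag in SKIP:
            self.skip_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        self.out.append(data if self.skip_depth else link_tokens(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def autolink_html(html: str) -> str:
    parser = TextNodeAutolinker()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For aaronpk's example, `autolink_html('<a href="http://example.com">hello #world</a>')` returns the input unchanged, while the plain text `hello #world` gets its hashtag linked.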
# 20:01 aaronpk ah yes, webmention.io indicates whether the comment was parsed as HTML or text
# 20:06 aaronpk okay this is progress. now i'm only running autolink on plaintext comments
# 20:11 Loqi [Aaron Parecki] Day 14: Posting to my Website from Alexa #100DaysOfIndieWeb
# 20:12 aaronpk not trying to messily autolink html solved a lot of my edge cases :)
# 20:12 Loqi tantek has 2 karma in this channel (311 overall)
pfefferle and tantek joined the channel
# 20:49 aaronpk heh, now that i show images in comments, i'm going to have to start archiving those and rewriting the img tag
# 20:53 tantek so I'm auto-internet-archiving any images I link to
# 20:53 tantek though potentially more helpful to more people? since the archive is available to anyone
# 20:54 aaronpk i'm just going to have the same problem i had with broken avatars soon
# 20:55 tantek though people don't often embed images in your comments do they?
# 20:55 tantek I mean, I should be seeing this problem with external images right?
# 20:55 tantek (though in my primary posts, not in comments obv)
KevinMarks joined the channel
# 21:03 tantek I am saying I don't think that is a "solved problem" from a security perspective, and that asserting "thoroughly" is perhaps a bit naive
# 21:06 tantek aaronpk, I'd want to see some security testimonials on their home page
# 21:06 tantek from security professionals who have code-reviewed and/or tried to exploit it
# 21:06 tantek that being said, it would be interesting to try to develop a subset of HTML that was "safe" as it were, a profile that could be used as the target of a purifier like that
# 21:07 tantek (nevermind that any security professional will first balk at them not using https!)
# 21:08 tantek aaronpk are you using that? and did you check their sha-1 checksums of their downloads?
# 21:09 aaronpk that's what i'm using yes. i installed it via composer
# 21:09 tantek huh, also looks like it is available via github, so that's https at least
# 21:10 tantek hey aaronpk, what no CASSIS auto_link love in your post? ;)
# 21:10 tantek what did you end up using? or did you fork it?
# 21:11 tantek I was wondering how / what you link hashtags in comments to
# 21:11 tantek hmm - do you think that reflects commenter intent?
# 21:12 aaronpk hard to say. for silo posts, i suppose the "correct" thing is to link to the tag page on the silo
# 21:13 tantek a tag reply is a reply that is tagging *your* original post
# 21:16 tantek aaronpk, so in that situation, the tag intention is clear, it *is* intended to be tagged within the context of your site
# 21:16 KevinMarks So when I do a tweet quote from a piece and add #indieweb does that qualify?
# 21:17 tantek a tag reply is very different than a reply with hashtags
# 21:17 aaronpk i don't really care enough about this to think about it right now
# 21:18 KevinMarks Well, when I do that my goal is to tag it in my bookmarks (on pinboard) and I know it will show up here too
# 21:18 tantek KevinMarks: right, personal context intent, not necessarily back to the original
# 21:19 KevinMarks Deciding how to link a tag yourself is fine. I cite Technorati tags as precedent
# 21:19 aaronpk because i'm only autolinking plaintext, the tweet replies from bridgy actually link to twitter hashtag pages
# 21:21 tantek KevinMarks, you'll have to disambiguate known
# 21:21 tantek waits for aaronpk to build his own personal taggregator
# 21:26 KevinMarks Ah, with known you get an html comment, so it links to the hashtags on my known site
# 21:30 tantek aaronpk, how much work would it be for Loqi to ping internet archive for all the links in anything said in IRC that it is saving / showing in the logs?
# 21:31 tantek e.g. if we paste a tweet here, then Loqi pings internet archive to archive it as well as saving the IRC text to its logs database
# 21:31 tantek the goal/effect is that all links in our IRC logs get saved in the internet archive, so if (when?) those links die, the IRC log could redirect to the internet archive version instead
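A rough sketch of that idea, not Loqi's actual code: pull the URLs out of a logged message, request a capture for each via the Wayback Machine's public `https://web.archive.org/save/<url>` form, and use `https://web.archive.org/web/<url>` as the fallback when the original link dies. The function names, the URL pattern, and the User-Agent string are assumptions for illustration, and rate limiting and error handling are left out.

```python
import re
from typing import List
from urllib.request import Request, urlopen

# Simplified, assumed URL pattern for links pasted into chat.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def extract_links(message: str) -> List[str]:
    """Find URLs in a chat message, trimming trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(message)]


def save_url(link: str) -> str:
    """URL that asks the Wayback Machine to capture `link`."""
    return "https://web.archive.org/save/" + link


def fallback_url(link: str) -> str:
    """URL the log could redirect to if `link` no longer resolves."""
    return "https://web.archive.org/web/" + link


def archive_links(message: str, timeout: float = 10.0) -> None:
    """Request a capture for every link in the message (network access)."""
    for link in extract_links(message):
        request = Request(
            save_url(link),
            headers={"User-Agent": "example-irc-logger/0.1"},  # hypothetical
        )
        with urlopen(request, timeout=timeout):
            pass  # The response body is not needed here.
```

Calling `archive_links` alongside the code that stores each message would cover the "ping on save" half; `fallback_url` covers the redirect half.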
# 21:32 tantek aaronpk, in that case there's an easy project for you ;)
# 21:33 tantek KevinMarks, so svgur.com pings IA for svgur posts themselves?
# 21:34 aaronpk i just added it to the code that does link previews
# 21:35 Loqi [Aaron Parecki] Day 16: Improved Comments Display for p3k #100DaysOfIndieWeb
# 21:40 bear hmm, that would be a fun project - a test app that checks html sanitizers against known attacks
# 21:42 Loqi sanitize, specifically "sanitizing HTML", "sanitizing for (display inside) HTML", or "sanitization" is a common operation performed by any site which displays content from external sources, including user entry https://indieweb.org/sanitize
# 21:43 tantek makes the same mistake he pointed out like a day or two ago
# 21:45 tantek KevinMarks: in this case I don't think I made the mistake *before* someone else did *and* I pointed it out! It was like pointing out the mistake trained my brain to then repeat it!
# 21:46 Loqi ok, I added "http://htmlpurifier.org/live/smoketests/xssAttacks.php" to the "See Also" section of /sanitize
# 21:49 petermolnar damn, I have a shortcode collision when the post time is the same on two posts
# 21:50 aaronpk looking up a URL posted in here on the wayback machine says it wasn't archived yet. but then changing https to http made it show up
# 21:57 tantek aaronpk, I've seen similar IA weirdness with http(s)
# 21:58 aaronpk i think it might just be a caching/delay thing actually
# 21:58 aaronpk so i'm pretty sure the automatic archiving from links here works but haven't actually confirmed it yet
# 21:59 tantek I'm guessing they may be particularly busy crawling things that could change drastically in ~15 days.
# 23:05 tantek hey, out of curiosity, does anyone's site here have a posted data retention policy?
# 23:06 tantek you know, of any data your server keeps about any visitor
# 23:07 tantek might want to figure that out in the next 15d
# 23:13 Loqi [@Pinboard] @printfJess delete whatever you can, don’t collect data, if you have to collect it, don’t store it, if you store it, don’t store it for long
# 23:14 tantek There's a natural tension between that and caching though
# 23:16 tantek How shall we document this for indieweb? E.g. what are best practices, examples thereof, software implementations etc.