aaronpk: because they'd still clobber each other cause they read the post contents into memory in order to manipulate the properties
aaronpk: also i'm using this storage wrapper which allows storing files on things that might not be the filesystem, so i don't actually have access to filesystem locking https://laravel.com/docs/5.2/filesystem
voxpelli: right, yeah, that would complicate things
aaronpk: either way, the problem is when two processes load the post into RAM, manipulate it there, then try to write it back to disk
aaronpk: breaking up the data into separate units (either by using different columns or tables in an RDBMS, or using different files for a file-based approach) is probably the best way to solve it
aaronpk: otherwise i basically have to build my own locking mechanism, and processes that might write the file would have to request a write lock at the time they open the file
tantek: exactly. building your own locking mechanism = probability of building your own deadlocking mechanism ;)
aaronpk: so for now, since everything is queued and processed in a background task, i'm just going to reduce to one process so that everything happens sequentially
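The read-modify-write race aaronpk describes can be closed by holding an exclusive lock across the whole load-mutate-save cycle. His Laravel storage abstraction doesn't expose filesystem locks, but for plain local files the idea looks roughly like this sketch (the function name and JSON post format are illustrative, not his actual code):

```python
import fcntl
import json

def update_post(path, mutate):
    """Read-modify-write a JSON post file under an exclusive advisory lock.

    Every writer must go through this function: flock() is advisory,
    so a process that skips the lock can still clobber the file.
    """
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until any other writer finishes
        post = json.load(f)
        mutate(post)                   # e.g. add POSSE syndication URLs
        f.seek(0)
        json.dump(post, f, indent=2)
        f.truncate()                   # shrink the file if the new content is shorter
        # the lock is released when the file is closed
```

Note this only works when all writers share a filesystem that supports `flock()`, which is exactly the guarantee a storage wrapper over S3-like backends can't give.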
sknebel_: aaronpk: that's why I put different generated bits into different files for now
sknebel_: but that probably also gets annoying once you get to the point where changed fields start to overlap between tasks
aaronpk: this wasn't a problem for me until i had two tasks that often take a long time. POSSEing being one, and fetching reply context (or repost content) being the other.
tantek: aaronpk: I'm surprised you're not fetching/caching the reply context as part of the authoring UI / flow
aaronpk: tantek: that would involve every client being responsible for that task
aaronpk: instead, i can have a client that sends only the "repost-of" or "in-reply-to" property as a URL
tantek: I suppose I would expect good UIs (clients) to *have to* do that purely for UX reasons
aaronpk: i'm less certain about it for reply contexts, but it was convenient to do so
tantek: hmm, convenience seems like a bad reason to do that, and how data stores end up getting bloated with things they shouldn't have
tantek: repost makes sense because you as the author want to absolutely capture the snapshot in time of the thing you reposted
aaronpk: well i have to store it *somewhere* in order to render it. it was easier to put it in the post file than come up with a scheme for storing it outside the post file
voxpelli: I would store reply-contexts the way I store webmentions, I think
tantek: right, my approach for storing that kind of thing is the same as for webmentions - stuff from other sources, not me
voxpelli: in a threaded conversation the reply-context could even be the exact same data as the webmention I have already received on another post
aaronpk: voxpelli: yeah that was what i was originally diagramming for this project, but i still haven't come up with a long term plan for storing webmention content
aaronpk: my disk storage in p3k-v1 ended up having some issues that i haven't figured out how to resolve yet
tantek: basically for each storage file I have (bim) I plan to store a second file of the cached stuff from others
tantek: and yes that means two file loads instead of one, but it also means a very simple data / file persistence / cache policy
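tantek's two-file approach (my content in one file, cached stuff from others in a sidecar) could be sketched like this; the `.cache.json` naming convention and JSON format here are hypothetical, not his actual bim layout:

```python
import json
import os

def load_post_with_context(post_path):
    """Load a post plus its sidecar cache of other people's content.

    Hypothetical convention: "2017/03/foo.json" has a sibling
    "2017/03/foo.cache.json" holding reply contexts and webmentions.
    Because my content and cached external content live in separate
    files, the POSSE task and the context-fetching task never write
    to the same file.
    """
    with open(post_path) as f:
        post = json.load(f)
    cache_path = os.path.splitext(post_path)[0] + ".cache.json"
    context = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            context = json.load(f)
    return post, context
```

The cost is the second file load tantek mentions; the payoff is that the cache file can be deleted and refetched at any time without risking the post itself.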
voxpelli: I store everything as JSON keyed on their normalized URLs and then make them into likes, reply-contexts etc. based on relations between that URL and another URL
Loqi: no-www is a movement to deprecate use of "www." at the start of URLs as being redundant, unnecessary, and a waste of resources https://indieweb.org/no-www
voxpelli: but I only use the normalized URL for matching different URLs against each other, to avoid duplicates and to allow my embed code to actually find all mentions
aaronpk: presumably two different URLs could overwrite each other in your storage tho?
voxpelli: yes, but only if someone has some very weird implementation
voxpelli: double /, trailing / and www. most often have the very same content as any URLs without them
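The normalization voxpelli describes (treating double slashes, trailing slashes, and "www." as equivalent for matching purposes only) might look like this sketch; the exact rules are an assumption, since his actual implementation isn't shown:

```python
import re
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Normalize a URL for duplicate matching only, never for display:
    lowercase the host, drop a leading "www.", collapse repeated
    slashes in the path, and strip a trailing slash.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    netloc = netloc.lower()
    if netloc.startswith("www."):
        netloc = netloc[4:]
    path = re.sub(r"/{2,}", "/", path).rstrip("/")
    return urlunsplit((scheme.lower(), netloc, path, query, fragment))
```

As aaronpk points out, this is lossy: `http://www.example.com/foo` and `http://example.com/foo` collapse to the same key, which is fine precisely when both serve the same content.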
tantek: KartikPrabhu: wat? they were not in the export?
tantek: I think I gave up on "meta" tags that sit outside the content storage
KartikPrabhu: they were, but they are stored as some "relational" thing and not as plaintext. So all the tags got exported but the "relation" between them and posts broke :(
tantek: inline hashtags in the content are harder to "lose"
KartikPrabhu: i did all this moving to make it easier to finally move to file-storage
voxpelli: very interesting though in relation to the earlier discussion of splitting up data among many files – it can make data loss easier
tantek: any chance you can check archive.org for your tags on your permalinks?
tantek: voxpelli: not quite sure I follow. the problem was the ethereal "relational" things, rather than different plain text stores
KartikPrabhu: right, my original database setup was that "tag" is an object and "post" is an object and there is a "relation" between them that MySQL magically manages
KartikPrabhu: i am no DB expert so I made this after reading things on the web
voxpelli: tantek: well, KartikPrabhu thought he had all the pieces, but he didn't – the more spread out one's stuff is, the easier it is to forget one part. Having everything in the same place makes that impossible
tantek: voxpelli: having everything in files in the file *system* is a form of everything in the same place, especially if they're in the same root folder
voxpelli: KartikPrabhu: usually there are no magic relations in MySQL; relations are something you most often define at query time there – matching one value against another. (Compare to e.g. Neo4j, where a relation is a first-class object)
voxpelli: tantek: well, formatting drives or erasing databases – both times you need to ensure that you have extracted everything you want, or else rely on having backups of things from before you erased it
voxpelli: KartikPrabhu: but since there's nothing magical, maybe you have exported it after all?
KartikPrabhu: yeah it is possible, I am looking at my export JSON to check that
voxpelli: if you can put parts of the export up somewhere, we can have some more eyes on it
KartikPrabhu: my text editor is having trouble with the large JSON file :P
KevinMarks: Usually you have an id on each, post_id and tag_id, and a table that has post_id, tag_id pairs in it
tantek: right, ids for tags just in case you want to "rename" the tags and have them stay assigned to all the posts you assigned them to, instead of having the name of the tag *be* its ID
KartikPrabhu: yeah i think that table which has the post-tag pairs didn't get exported
KevinMarks: Well, also to make the table rows fixed-size
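The id-plus-join-table scheme KevinMarks describes can be shown concretely with sqlite3 (table and column names here are the conventional ones, not necessarily what Django/MySQL generated for KartikPrabhu):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- the join table: one fixed-size row per (post, tag) pair
    CREATE TABLE post_tags (
        post_id INTEGER REFERENCES posts(id),
        tag_id  INTEGER REFERENCES tags(id),
        PRIMARY KEY (post_id, tag_id)
    );
""")
conn.execute("INSERT INTO posts VALUES (1, 'marginalia')")
conn.execute("INSERT INTO tags VALUES (1, 'indieweb'), (2, 'physics')")
conn.executemany("INSERT INTO post_tags VALUES (?, ?)", [(1, 1), (1, 2)])

# tantek's point: renaming a tag touches one row, and every post
# keeps the tag because posts reference its id, not its name
conn.execute("UPDATE tags SET name = 'science' WHERE id = 2")

# an export that dumps posts and tags but skips post_tags loses
# exactly the post-tag relation KartikPrabhu lost
rows = conn.execute("""
    SELECT t.name FROM tags t
    JOIN post_tags pt ON pt.tag_id = t.id
    WHERE pt.post_id = 1 ORDER BY t.name
""").fetchall()
```

The relation lives only in `post_tags`; nothing in the `posts` or `tags` rows themselves hints that it exists, which is why it is easy to miss in an export.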
KartikPrabhu: now I am thinking of just storing the tags as a comma-separated text field
KevinMarks: The question is, if you need to make tag pages, are you better off with a db or a generation script
tantek joined the channel
KartikPrabhu: KevinMarks: yes, that is what I need to look into
KevinMarks: With a static generator it will build the tag pages for each update
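With the comma-separated-field scheme KartikPrabhu is considering, a static generator doesn't need a database at all for tag pages: it can rebuild a tag index by scanning all posts on each update. A minimal sketch (the post dict shape is an assumption):

```python
from collections import defaultdict

def build_tag_index(posts):
    """Group post slugs by tag, given posts as dicts with a
    comma-separated "tags" field. A static generator would emit
    one page per key of the returned index on every rebuild.
    """
    index = defaultdict(list)
    for post in posts:
        for tag in filter(None, (t.strip() for t in post["tags"].split(","))):
            index[tag].append(post["slug"])
    return dict(index)
```

This trades the join table's query flexibility for having each post's tags stored inline with the post, which is exactly the property that makes them hard to lose in an export.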