2019-01-15 UTC
snarfed, [jgmac1106], [tantek] and [cleverdevil] joined the channel
# 01:15 Loqi denschub has 1 karma in this channel over the last year (2 in all channels)
# 01:16 [tantek] aaronpk, as someone who has had to implement bits of AP I must confess I'm surprised at your response. Or did you mean not interested in participating in an explicitly global public conversation?
# 01:17 [tantek] In contrast to public but contained discussion e.g. here in dev?
# 01:17 DenSchub oh, i don't mind that response! i can totally understand it, especially on such a controversial topic, which can also be a huge time sink
# 01:18 [tantek] I for one found aspects in denschub's post that could be used to improve nearly any standards discussion or community, including IndieWeb specs, microformats vocabularies etc
# 01:19 [tantek] FWIW learning from others' mistakes is one of the cheapest ways to learn
# 01:25 aaronpk I meant I don't have anything to add to the public conversation that's going on in those blog posts referenced
snarfed, KartikPrabhu and [eddie] joined the channel
# 05:25 [eddie] [cleverdevil] you mentioned previously that your podcast listens didn’t have the correct time entered on your site but since then you mentioned that you fixed it. Was it an issue with your script from Overcast or an issue in Known?
[cleverdevil] joined the channel
# 05:26 [eddie] Is the current gist fixed? Or was it beyond that script?
# 05:26 [cleverdevil] I’ve updated it slightly since then. I can update the gist later this evening.
# 05:27 [eddie] Cool, that’d be great. I’m gonna revise it to send via Micropub and do a single test run to see how it turns out.
# 05:31 [eddie] Haha wow. Yeah I’d definitely shy away from a service until Marco gets back to you 😆
# 05:31 [eddie] And it’ll be nice to have some different outputs: Known, Micropub, etc
# 05:33 [cleverdevil] If I were to do a service I’d want to update my approach to use Micropub against Known.
snarfed joined the channel
ichoquo0Aigh9ie, ichoquo0Aigh9ie_, KartikPrabhu, swentel, [tantek], cweiske, swentie, strugee, [mrkrndvs], leg, eli_oat and [kevinmarks] joined the channel
# 10:27 [kevinmarks] This bit “But it's not good enough: for example, people have expressed that they want others to be able to read messages, but not reply to them.”
# 10:32 [kevinmarks] You can't stop people from replying. You can stop displaying their replies.
# 10:35 [kevinmarks] So adding a "don't @ me" flag to your posts does what? Gives notice that webmentions of it will be ignored?
[jgmac1106] joined the channel
# 10:49 sknebel and potentially tells all other conforming implementations to discard posts that claim to be replies
# 10:54 sknebel or, depending on the protocol design, the reply never reaches others. E.g. if I remember correctly, in Diaspora a reply is only distributed through the thing it's replying to, so that server has control over it
krychu and [mrkrndvs] joined the channel
KartikPrabhu joined the channel
# 11:36 jeremycherfas That's what it does, but maybe it just doesn't have a name. Anyway, thanks.
KartikPrabhu, [Rose], [svandragt], [voss], [jgmac1106], [pfefferle], swentel, [eddie], snarfed, gRegorLove, [kim_landwehr], [tantek] and [cleverdevil] joined the channel
# 17:03 [cleverdevil] !tell [eddie] I updated the gist just now, fell asleep last night before I remembered to do it 😉
# 17:03 Loqi Ok, I'll tell them that when I see them next
[eddie] joined the channel
KartikPrabhu, krychu, snarfed, [zak], [chrisaldrich] and j4y_funabashi joined the channel
# 18:31 aaronpk so, Aperture is officially getting slow now, and I need to figure out a solution
# 18:31 aaronpk the database machine's disk usage is at like 100%
# 18:32 aaronpk notice how the latency climbs steadily from when aperture was launched in july
# 18:33 snarfed aaronpk: happy to help if you want someone to bounce ideas off of
# 18:33 aaronpk i put the 7-day limit on public accounts to build myself this escape route
# 18:33 aaronpk but my own account has been archiving everything forever
# 18:33 aaronpk and i think it just might not be sustainable to do that
# 18:34 snarfed it can be if you decide to; you'd just need a different architecture
# 18:34 snarfed maybe first decide what you want it to be, e.g. a permanent archive (even if only for some people) or no archive at all. then we can figure out an architecture to support that choice
# 18:35 sknebel shouldn't old stuff that's not accessed "only" take space, for the most part?
# 18:35 aaronpk sknebel: yea but indexes get updated and such around those old entries
# 18:35 aaronpk snarfed: yeah i'm leaning towards dropping the whole idea of it being any sort of permanent archive, since that greatly simplifies the requirements
# 18:36 aaronpk the problem is i still do want some sort of permanent archive of (some of) the channels i have set up
# 18:36 snarfed you have seemingly had good luck decomposing things into many microservices
[cleverdevil] joined the channel
# 18:36 snarfed also, try dropping archiving from aperture first and make sure it actually fixes the problem. seems likely but not guaranteed
# 18:37 [cleverdevil] [aaronpk] I may be able to help as well. I’ll check and see if I can get some infra for you.
# 18:37 sknebel have you made sure the DB can do everything based on the indexes?
# 18:37 aaronpk [cleverdevil]: thanks but i think throwing more hardware at the problem is just going to push the same issue down the road til later
# 18:37 aaronpk sknebel: i spot-checked the indexes for some of the slower and most common queries and they were being used
# 18:37 snarfed eh if it's enough hardware it can be a long way down the road, ie many years. esp if it's just you or maybe just a few ppl permanently archiving
# 18:38 Loqi throwmoneyattheproblem has 1 karma over the last year
# 18:38 Loqi throwsomeoneelsesmoneyattheproblem has 1 karma over the last year
# 18:38 sknebel throwsomeoneelsesmoneyattheproblemaslongasyoudonthavetospendtonsoftimeinreviewmeetings++
# 18:38 Loqi throwsomeoneelsesmoneyattheproblemaslongasyoudonthavetospendtonsoftimeinreviewmeetings has 1 karma over the last year
# 18:41 [cleverdevil] Another recommendation would be to consider making it super easy to install for folks like me.
# 18:41 [cleverdevil] I am currently putting load on hosted Aperture and would be fine self-hosting if it were a quick and easy thing to do.
# 18:41 sknebel I think the idea behind not doing that was that people like you (or me) write one that's easy to install so it's not everyone using Aperture :D
# 18:42 aaronpk yep there's that, and also i can't start charging until i at least know the cost and how this will need to scale
# 18:42 [cleverdevil] Well, I do think that someone was working on a WordPress-based microsub server.
# 18:42 snarfed honestly i'd discourage most of us from charging for our indieweb services unless it's someone who really wants to do a full-fledged startup
# 18:42 snarfed otherwise it'll usually be much more added pain (collecting money, support) than gain
# 18:46 GWG cleverdevil, jackjamieson. Yarns. It's in beta
# 18:46 aaronpk wonders what the plan is for Yarns around archiving content
# 18:48 aaronpk the ironic part of this is my original plans for building aperture (before microsub even) were to treat channels as folders of text files on disk, the same way I store my GPS data
# 18:49 aaronpk eh pretty sure most of the load on my hosted aperture is from myself
# 18:49 snarfed possible! you may also be an outlier in usage terms
# 18:49 sknebel how important is it to have access to the archive through aperture?
# 18:50 aaronpk my GPS database is 6.2gb and over 10 million records and it contributes nothing to the overall load of the server
# 18:50 sknebel e.g. you could see if the DB is happier if you move old posts to an archive table, or do the text-file export for that
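A minimal sketch of sknebel's archive-table idea, assuming `entries` has a `created_at` column (names are illustrative, not necessarily Aperture's actual schema):

```sql
-- Move entries older than 90 days into a cold archive table,
-- then delete them from the hot table, inside one transaction.
CREATE TABLE IF NOT EXISTS entries_archive LIKE entries;

START TRANSACTION;
INSERT INTO entries_archive
  SELECT * FROM entries
  WHERE created_at < NOW() - INTERVAL 90 DAY;
DELETE FROM entries
  WHERE created_at < NOW() - INTERVAL 90 DAY;
COMMIT;
```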
# 18:51 aaronpk so far, my actual use of Aperture/Monocle/etc hasn't really involved diving into super old archives
# 18:51 aaronpk and also i don't actually want to archive *all* channels, only a few of them
# 18:52 aaronpk so maybe i set up something separate that pulls content from an aperture channel and saves it as text files, totally outside of the aperture code base
# 18:52 GWG I think jackjamieson wants a working version before we worry about archiving
# 18:52 GWG I know my opinion was that archiving would be a bookmark post
# 18:53 aaronpk one of the other challenges is how to handle content from feed pages, since entries sometimes have bad or missing uids, or dates that are missing or super old even though the content is new. so another thought I had was to store only the items currently in a feed page, and anything not in that page gets deleted
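Sketched as SQL, that pruning idea might look like the following, with a hypothetical `uid` column; the uid list would come from the freshly fetched page:

```sql
-- Keep only entries still present in the feed's current page;
-- everything else stored for this source gets deleted.
DELETE FROM entries
WHERE source_id = 599
  AND uid NOT IN ('uid-1', 'uid-2', 'uid-3');  -- uids seen in the current fetch
```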
# 18:54 GWG aaronpk, how much do you want to archive personally?
# 18:55 snarfed GWG: i think most of our discussion of "archiving" here has been about keeping just the feed data itself long term/permanently, not fetching and archiving entire posts
# 18:55 snarfed primarily around managing server load over time, not archiving as a feature
# 19:00 aaronpk my theory is that if these tables aren't just infinitely growing in size that things will go faster
# 19:04 aaronpk here's an example of a query that is currently very slow even though it's using an index:
# 19:04 aaronpk select count(*) as aggregate from `entries` where `entries`.`source_id` = 599 and `entries`.`source_id` is not null;
# 19:05 snarfed you could probably add indices or tune to improve it. and you'll still have a decent write i/o burden even with a fixed size table, but load should stay fixed, not growing without bound
# 19:06 GWG Okay. I was focusing on user need
# 19:11 aaronpk i guess at least one thing i can do right now is try to remove that query
j4y_funabashi joined the channel
# 19:23 aaronpk oh i could do my old trick that saved my butt during my startup days: moving the longblob column to a new entries_data table, so the entries table stays smaller and has only fixed-width columns
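A sketch of that trick, assuming the blob column is named `data` (the real column name may differ):

```sql
-- Split the variable-length blob out of the hot table so that
-- `entries` keeps only fixed-width columns.
CREATE TABLE entries_data (
  entry_id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  data     LONGBLOB,
  FOREIGN KEY (entry_id) REFERENCES entries (id)
);

INSERT INTO entries_data (entry_id, data)
  SELECT id, data FROM entries;

ALTER TABLE entries DROP COLUMN data;
```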
# 19:25 j4y_funabashi also, it might not need the not null clause? doesn't the id = xx make the second clause redundant?
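In query form, j4y_funabashi's point is that the equality test already excludes NULL rows, so the second clause should be droppable without changing the result:

```sql
-- Equivalent query: source_id = 599 can never match a NULL source_id,
-- so the IS NOT NULL check is redundant.
select count(*) as aggregate from `entries` where `entries`.`source_id` = 599;
```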
[schmarty] joined the channel
# 19:27 [cleverdevil] Have you run an EXPLAIN on the query to be sure it's actually using the indexes?
# 19:27 aaronpk `| 1 | SIMPLE | entries | ref | entries_source_id_url_index | entries_source_id_url_index | 8 | const | 78690 | Using index |`
# 19:28 j4y_funabashi heh yeah it is a lot, but MySQL can definitely do sub-second queries on multi-million-row tables
# 19:35 aaronpk varies between 1-40 seconds depending on the rest of the server load of course
# 19:35 jacky thinks this'll make for an interesting blog post :)
# 19:35 aaronpk the trick with these things of course is that sometimes completely unrelated queries show up in the slow log when the whole server is under heavy load
# 19:38 j4y_funabashi yeah, on your graph it is I/O wait time that is spiking, so it might not be query efficiency related at all
# 19:39 aaronpk but looking at the slow query log that one comes up a lot
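For reference, MySQL's slow query log can be tuned at runtime to narrow down which statements to blame (standard MySQL settings, nothing Aperture-specific):

```sql
-- Log statements slower than 1 second; GLOBAL settings revert
-- on restart unless also persisted in my.cnf.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SHOW VARIABLES LIKE 'slow_query_log%';  -- confirm state and log file location
```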
# 19:47 j4y_funabashi given my limited understanding of EXPLAIN output, that one looks OK; as long as the actual count is around 78690, it isn't doing unnecessary scans
# 19:48 gRegorLove I forget, but is it more efficient to count() on an indexed column instead of count(*)?
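For InnoDB the two are generally equivalent: COUNT(*) is free to scan the smallest available index, while COUNT(col) additionally has to rule out NULLs unless the column is declared NOT NULL. A quick comparison, with assumed column names:

```sql
-- Both can typically be satisfied from the secondary index on source_id;
-- COUNT(id) behaves like COUNT(*) here because a primary key is NOT NULL.
SELECT COUNT(*)  FROM entries WHERE source_id = 599;
SELECT COUNT(id) FROM entries WHERE source_id = 599;
```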
# 19:50 aaronpk i thought i removed all instances of that code and switched to denormalizing it instead, but something somewhere is still calling it
# 19:51 aaronpk turns out all i needed to know was whether there are any entries for that source, not the exact number, so i just added a column to the sources table
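A sketch of that denormalization, with a hypothetical column name (Aperture's actual schema may differ):

```sql
-- Track entry existence on the source row instead of counting
-- tens of thousands of entries on every request.
ALTER TABLE sources ADD COLUMN has_entries TINYINT(1) NOT NULL DEFAULT 0;

-- Backfill once:
UPDATE sources s
SET s.has_entries = EXISTS (SELECT 1 FROM entries e WHERE e.source_id = s.id);

-- Then set has_entries = 1 in application code whenever an entry is stored.
```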
# 19:57 aaronpk something is trying to return all entries from a source without a limit and it's chrisaldrich's feed so there are 36,000 rows
krychu joined the channel
# 20:04 aaronpk ohh it's when someone adds an existing feed, it tries to go add all the entries to their channel
KartikPrabhu joined the channel
# 20:11 aaronpk i do need some way to report feed errors back to the user
# 20:11 aaronpk looking at the logs there are a bunch that are failing for various reasons
# 20:11 aaronpk some of them are tumblr returning http 401 for the request, some are granary.io instagram 401s
KartikPrabhu joined the channel
# 20:17 aaronpk 1) stop trying to add *all* past entries from a feed into a new channel, limit to the most recent 100
# 20:17 aaronpk 2) stop counting records when all you wanted to know was whether there are any
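Roughly what those two fixes could look like in SQL, with hypothetical table and column names (`channel_entries`, `published`):

```sql
-- Fix 1: backfill a new subscription with only the 100 most recent entries.
INSERT INTO channel_entries (channel_id, entry_id)
  SELECT 42, id FROM entries
  WHERE source_id = 599
  ORDER BY published DESC
  LIMIT 100;

-- Fix 2: an existence check instead of a full count.
SELECT EXISTS (SELECT 1 FROM entries WHERE source_id = 599);
```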
[chrisaldrich] joined the channel
# 20:24 [chrisaldrich] aaronpk, is there something my site is doing that's causing it to dump out that much data? (I'm presuming it's the known site, but do others do that too?)
# 20:24 [chrisaldrich] seems like it's been a while since I've been able to be an edge case that breaks something... 🙂
# 20:25 [chrisaldrich] I think my wordpress site has a reasonable RSS limit of maybe 40, which I'd upped since all my microposts stream by so quickly....
# 20:26 [chrisaldrich] what method are you using that returns so much data? And is it paginating all the way down?
# 20:26 Loqi It looks like we don't have a page for "method are you using that returns so much data" yet. Would you like to create it? (Or just say "method are you using that returns so much data is ____", a sentence describing the term)
# 20:26 aaronpk it's that i do actually store everything from a feed
[tantek] joined the channel
# 20:28 [tantek] aaronpk, sounds like your "archiving" use-case could be delegated to a separate service
# 20:29 [tantek] though yes, archiving "everything" (for some value of "everything") *and* indexing it sure does start to sound like a search engine
[eddie] joined the channel
[jgmac1106] joined the channel
# 20:36 Loqi aaronpk has 82 karma in this channel over the last year (261 in all channels)
# 20:42 Loqi throwmoneyattheproblem has 0 karma over the last year
# 20:47 Loqi throwlimit100attheproblem has 1 karma over the last year
KartikPrabhu, [benatwork] and [cleverdevil] joined the channel
# 21:06 Loqi [aaronpk] has 83 karma in this channel over the last year (262 in all channels)
j12t, krychu and [kevinmarks] joined the channel
# 21:45 [kevinmarks] i wonder if my tweet list via granary is causing pain - it has ~2000 unread usually
# 21:46 aaronpk i doubt it. the problem was really caused because multiple people subscribed to the same feed
[Rose], [smerrill] and [cleverdevil] joined the channel
# 22:37 [cleverdevil] FYI, Marco Arment has told me that the current rate limit on his OPML export is 10 hits per day per user.
# 22:37 [cleverdevil] So, if you are planning on using my scripts or writing your own to track your listens, I’d recommend staying well below that limit.
snarfed joined the channel
blueyed joined the channel
# 22:52 [smerrill] I’ve run it in Docker, but I also run it on a pair of Raspberry Pis. I configure my home router to point to my Pis for DNS. works a treat.
# 22:55 blueyed [smerrill]: Thanks!
[jgmac1106] joined the channel
# 23:51 jacky the more I look into microsub, the less likely it seems that I'll implement it within Koype