#dev 2019-08-13

2019-08-13 UTC
#
vika_nezrimaya
oh well it seems like I don't have alternative slugs supported in my Micropub endpoint, what an oversight
#
vika_nezrimaya
it seems like I have outgrown redis...
#
vika_nezrimaya
and my software isn't even production ready
#
vika_nezrimaya
has anyone used CouchDB before here? Is it good?
itsmekntDiscord[ joined the channel
#
[snarfed]
vika_nezrimaya you may be interested in https://indieweb.org/database-antipattern
#
vika_nezrimaya
@snarfed: file systems seem to be my bottleneck on an RPi with an SD card
guilbo joined the channel
#
vika_nezrimaya
Because on a laptop with a normal HDD it's a lot faster
#
[snarfed]
bottleneck...for what? request throughput? latency? you've measured?
#
[snarfed]
just curious
#
vika_nezrimaya
I just transferred it to a laptop and it became faster
#
[snarfed]
eh there are lots of differences btw an rpi and a laptop, right?
#
vika_nezrimaya
Architecture, clock speed, but disk I/O is I think the most impacting one here
#
vika_nezrimaya
'cause SD cards aren't built for server-grade applications
#
[snarfed]
maybe! measuring is always useful. intuition is nice but sometimes misleading
#
[snarfed]
you may also be interested in aaronpk's architecture, which uses filesystem for canonical storage, and also a db as a purely derived index + cache, to make querying easier and sometimes faster. https://indieweb.org/p3k#Indexing_and_Caching
#
[snarfed]
gotta run. gl!
#
vika_nezrimaya
I tried to transition to Redis to keep the dataset in memory at all times. But Redis isn't a proper document store, and with a proper document store it could be easier to query posts by properties (i.e. implement search or filtering by tags)
#
aaronpk
I'm a big fan of the hybrid approach
#
aaronpk
also SD card read speeds are generally fast enough for this kind of thing, especially if you can move the index to RAM
#
aaronpk
Redis is certainly more than capable of this, it'll just take some fiddling to figure out the right data structures to use
#
vika_nezrimaya
Right now I think I'll end up with a lot of auxiliary keys like category_{} = ["slug1", "slug2", "slug3"] to speed up tag feeds
#
vika_nezrimaya
I already have a "posts" hash with my posts in MF2
#
vika_nezrimaya
and a main feed list containing their order
#
aaronpk
tbh you could probably run MySQL with the data folder in ram and have a much easier time querying it :-D
#
aaronpk
Create and populate the database on boot from the files on disk 😂
#
vika_nezrimaya
MySQL? no thanks I've heard of enough problems with MySQL and now I'm scared of it 😂
#
aaronpk
Eh plenty of people run production workloads way more complicated than this on MySQL
#
vika_nezrimaya
but this could be a nice idea - to store posts in flat files but load them up in Redis as a cache
#
aaronpk
Yeah regardless of the DB I like that approach a lot
#
vika_nezrimaya
but then I lose on Redis' native replication...
#
aaronpk
Just rsync every little while
#
aaronpk
filesystems are good at that
#
vika_nezrimaya
could get out of sync between these whiles :3 rsyncing every second I think is a bad idea :D
#
aaronpk
Also means you can commit the storage files to git and easily keep them in multiple places
#
vika_nezrimaya
and we're returning to my old design basically but with a cache DB
#
vika_nezrimaya
and posts in MF2 JSON
#
vika_nezrimaya
instead of Hugo markdown with front-matter in YAML
#
aaronpk
If you really need, there are plenty of tools for creating fancy syncing distributed filesystems
#
vika_nezrimaya
I hope my Micropub endpoint code is modular enough so I can just swap out functions for reading and writing posts to switch a DB :3
#
vika_nezrimaya
right now it puts the post into the "posts" hash, adds it to the "posts_order" (which is a fancy way of naming my "main feed" list), updates "categories" (used on ?q=category for autocompletion) and updates "slugs" key with alternative slugs for a post
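A minimal sketch of the write path described above, using plain Python dicts and lists as a stand-in for the Redis keys so it runs without a server. The class name `PostIndex` and its method names are hypothetical; only the key names ("posts", "posts_order", "categories", "slugs", and the per-tag `category_{name}` lists) come from the chat. It assumes every tag is a plain string.

```python
from collections import defaultdict


class PostIndex:
    """In-memory stand-in for the Redis keys named in the chat (hypothetical API)."""

    def __init__(self):
        self.posts = {}                        # "posts" hash: slug -> MF2 JSON object
        self.posts_order = []                  # "posts_order" list, newest first
        self.categories = set()                # "categories": every tag seen so far
        self.slugs = {}                        # "slugs": alternative slug -> canonical slug
        self.by_category = defaultdict(list)   # "category_{name}" lists for tag feeds

    def add_post(self, slug, mf2, alt_slugs=()):
        """Store one post and update every auxiliary structure."""
        self.posts[slug] = mf2
        self.posts_order.insert(0, slug)
        for tag in mf2.get("properties", {}).get("category", []):
            self.categories.add(tag)           # assumes tags are strings
            self.by_category[tag].insert(0, slug)
        for alt in alt_slugs:
            self.slugs[alt] = slug

    def resolve(self, slug):
        """Map an alternative slug to its canonical slug (or return it unchanged)."""
        return self.slugs.get(slug, slug)

    def tag_feed(self, tag, limit=10):
        """Return up to `limit` posts for one tag without scanning all posts."""
        return [self.posts[s] for s in self.by_category.get(tag, [])[:limit]]
```

Swapping the storage backend, as hoped for a few lines earlier, would then mean replacing the body of these four methods while the Micropub handler keeps calling `add_post`.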
#
aaronpk
Sounds reasonable
#
vika_nezrimaya
If I were to switch it for local files, I would dump JSON into a file and... make some data structures for feed, probably also on the filesys
#
vika_nezrimaya
WAIT I GOT IT
#
vika_nezrimaya
I understand why my previous implementation was a snail
#
vika_nezrimaya
It read THE WHOLE DATASET from disk on EVERY SINGLE FEED QUERY
#
aaronpk
Well that would explain it :-)
#
vika_nezrimaya
more than 200 files
#
vika_nezrimaya
right now my dataset is at 299 posts
#
vika_nezrimaya
299 open() calls
#
aaronpk
Yes that sounds problematic :-)
#
vika_nezrimaya
it IS problematic :3
#
aaronpk
I was originally doing something similar before I started building out the DB index
#
aaronpk
I still don't store the post contents in the DB, but it means I open at most 10 files per page load
#
aaronpk
Figuring out what files to load is done entirely with DB queries
#
vika_nezrimaya
my dataset is 1.1M right now, this means it reads 1M of data from an SD card. I could continue storing JSON files, but build an index in Redis. In case of power outage the index could be easily rebuilt from the dataset in flat files since they contain A LOT of metadata
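The idea above (flat JSON files as canonical storage, a rebuildable index that decides which files to open) could look roughly like this. It is a sketch under assumptions, not anyone's actual implementation: the function names, the one-file-per-post `<slug>.json` layout, and sorting by the first `published` value are all hypothetical, and the index is a plain Python list rather than a Redis key.

```python
import json
from pathlib import Path


def build_index(post_dir):
    """Scan every JSON file once (e.g. at boot) and return slugs, newest first."""
    entries = []
    for path in Path(post_dir).glob("*.json"):
        with path.open(encoding="utf-8") as f:
            props = json.load(f).get("properties", {})
        # Assumes `published` holds ISO 8601 strings in one consistent format,
        # so that plain string comparison matches chronological order.
        published = (props.get("published") or [""])[0]
        entries.append((published, path.stem))
    entries.sort(reverse=True)
    return [slug for _, slug in entries]


def read_page(post_dir, index, page=0, per_page=10):
    """Open only the files needed for one page of the feed."""
    start = page * per_page
    posts = []
    for slug in index[start:start + per_page]:
        with (Path(post_dir) / f"{slug}.json").open(encoding="utf-8") as f:
            posts.append(json.load(f))
    return posts
```

With 299 posts and `per_page=10`, a feed request opens 10 files instead of 299; the full scan happens only when the index is rebuilt.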
#
vika_nezrimaya
this basically means ditching posts key
#
[tantek]
Huh, I'd avoid any heavy read/writing with an SD card, they tend to have a pretty dismal MTBF
#
vika_nezrimaya
I had f2fs on my last one
#
vika_nezrimaya
so that should've prolonged it a little bit
#
aaronpk
(Relatedly, read up on tips for prolonging SD card lifetimes in a Pi such as moving the /var/log folder to ram)
#
aaronpk
but yeah expect to get maybe 1-2 years off the card
#
vika_nezrimaya
My /var/ was on a USB-HDD
#
vika_nezrimaya
that really helped :3
#
aaronpk
my last one started failing in non-obvious ways, like individual characters in a file were getting swapped around, but as long as it was turned on it mostly worked. Next time it rebooted it died
#
vika_nezrimaya
I remember my first card dying spectacularly
#
aaronpk
So store your posts on a usb drive too then?
#
vika_nezrimaya
I had to reboot the Pi by BLINDLY logging on tty1 with a keyboard
#
vika_nezrimaya
it was a headless setup
#
vika_nezrimaya
SSH wasn't working
#
vika_nezrimaya
and I didn't have a serial converter
#
[tantek]
wonders if sorting cities by longitude is a reasonable linearization of an otherwise 2D data set, you know, as a Y-axis on a plot where time is the X-axis
#
vika_nezrimaya
didn't get to venues yet
#
aaronpk
longitude is an X axis, and also roughly represents time order due to timezones already
#
[tantek]
I'm thinking more like a map tilted 90deg counter-clockwise
#
[KevinMarks]
Google had a neat peano curve that favoured land for linearising latlong
#
aaronpk
has no idea what Tantek is trying to do
#
aaronpk
I can't read maps rotated 90 degrees
#
[KevinMarks]
Cities are sparse at higher latitudes in either direction, so there may be a cheat there
#
aaronpk
Oh I figured it out
#
aaronpk
I would order them by first appearance ;-)
#
[KevinMarks]
I have a project that is gently abusing the UK national grid, which mostly works until I make a mistake and get 0 latitude somehow and everything is super distorted
#
[tantek]
aaronpk, yes it will be roughly by appearance. turns out mixing that with rough longitude clustering works too
#
[KevinMarks]
The UK national grid is seductive as the country is small enough that you can treat it as flat and use metres and calculate distances in 2d with pythagoras without the errors getting big enough to matter, and angles just work.
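To illustrate the flat-plane shortcut just described: if two points are given as (easting, northing) pairs in metres, distance and bearing reduce to school trigonometry. The function names are hypothetical, and the sketch assumes the inputs are already grid coordinates; converting latitude/longitude to grid references is a separate step not shown here.

```python
import math


def grid_distance_m(a, b):
    """Straight-line distance in metres between two (easting, northing) pairs."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def grid_bearing_deg(a, b):
    """Bearing from a to b in degrees, clockwise from grid north."""
    return math.degrees(math.atan2(b[0] - a[0], b[1] - a[1])) % 360
```

For two made-up points 3000 m apart east-west and 4000 m apart north-south, `grid_distance_m` gives 5000 m, the familiar 3-4-5 triangle.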
#
[tantek]
related, is there any canonical algorithm for coloring cities? I know Dopplr had one
#
[KevinMarks]
And the channel islands are just about in range, but if you include Gibraltar things go dodgy.
#
vika_nezrimaya
oh wait I don't think I need CouchDB...
#
[KevinMarks]
How often are you rewriting things, as opposed to appending them?
#
[KevinMarks]
Tantek's bim model may suit you if you're primarily working by date
#
[KevinMarks]
What Simon has been doing with sqlite may also be an option if your data is more read than written https://simonwillison.net/2017/Nov/13/datasette/
#
[tantek]
darn it I don't want to "install" anything I just want to browse it online
#
[tantek]
that's almost valid CASSIS
#
[KevinMarks]
2 if you don't export it
#
aaronpk
And 200mb of dependencies and a build environment
#
[KevinMarks]
var md5 = require("md5");
#
[KevinMarks]
const blueshift = cityName => `#${md5(cityName).substr(0, 6)}`;
#
[tantek]
yeah, I wonder how much that md5 "require" pulls in
#
[tantek]
well at least it's in php
#
[tantek]
Dopplr had pages for cities ... that are in the internet archive
#
aaronpk
Aw I miss dopplr
#
[tantek]
I think I'm going to just get city colors one at a time from the internet archive instead of yakshaving re-implementing the Dopplr color algorithm (which if someone does, please put it up as a web service 🙂 )
#
Loqi
misses dopplr too
#
aaronpk
First 6 of the md5 hash? I bet I can write that on my phone while I wait for this plane to take off
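For comparison with the JavaScript one-liner quoted above, the same "first six hex digits of the MD5" idea in Python needs only the standard library. This is a sketch of that snippet, not a confirmed reimplementation of Dopplr's original algorithm; the function name `city_color` and the choice of UTF-8 encoding are assumptions, and the exact spelling of the input (for example "San Francisco" versus "San_Francisco") changes the resulting colour.

```python
import hashlib


def city_color(city_name):
    """Return '#' plus the first six hex digits of the MD5 of the UTF-8 name."""
    digest = hashlib.md5(city_name.encode("utf-8")).hexdigest()
    return "#" + digest[:6]
```

The result is always a valid six-digit CSS hex colour and is stable for a given spelling, so it can be computed on demand rather than stored.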
#
[tantek]
uh the code is literally in the logs up there ^^^ it's not about writing it, it's about getting it to execute and deploying 😂
#
[KevinMarks]
In any language with useful libraries (ie not js) it's no trouble
#
[tantek]
waits for citycolor.p3k.io/San_Francisco
#
[tantek]
aaronpk, since you first had no idea and then figured it out, here's the list based on that roughly clustered by longitude heuristic I mentioned: https://indieweb.org/cities#IndieWebCamp_Cities_By_Region
#
aaronpk
good luck doing that on a phone in node ;-)
#
aaronpk
will take longer to get nice URLs or a nice hostname tho ;-)
#
[tantek]
aaronpk++ :exploding_head:
#
Loqi
aaronpk has 43 karma in this channel over the last year (197 in all channels)
#
[tantek]
well that answers that question, München ist Munich
#
[tantek]
kind of an amazing coincidence that /Portland 's color is red like a rose and /San_Francisco 's color is almost orange like the Golden Gate bridge though also kinda pink/fuchsia
[timothy_chamber joined the channel
#
Loqi
ok, I added "https://pin13.net/city-color.php" to the "See Also" section of /cities https://indieweb.org/wiki/index.php?diff=64348&oldid=64344
#
[tantek]
aaronpk now bordered with colors from your city-color service: https://indieweb.org/cities#IndieWebCamp_Cities_By_Region
#
aaronpk
Haha that needs a little spacing or something
#
[tantek]
that's not supposed to be graphical, that's just a list and a way to stick the colors in there that didn't harm legibility
#
aaronpk
It kinda does because it kills the white space between the lines
#
Ruxton
you could just put a lil coloured box after each name too -- <span style="display: inline-block; border:solid 2px #0ef8f8;"></span>
#
[tantek]
or before
#
Ruxton
yeah use it for list dot
#
aaronpk
before 👍
#
Ruxton
much prefer before alos
#
Ruxton
*also
#
aaronpk
Better!
[Rose], Tevya, IWSlackGateway, KartikPrabhu, [tonz] and cweiske joined the channel
#
aaronpk
according to this, the plan is to move tumblr to wordpress as the backend! https://poststatus.com/automattic-has-purchased-tumblr/
loicm, KartikPrabhu and [tantek] joined the channel
#
[tantek]
presumably that's what Automattic has optimized to operationalize so it makes a lot of sense
#
[tantek]
one giant export, giant import into wordpress backends, then throw the DNS switch
#
aaronpk
don't forget about the tumblr dashboard
#
aaronpk
and figuring out what reblogging looks like in wordpress?
#
aaronpk
and how to deal with comments? because I don't think the mechanics of comments map 1:1
#
aaronpk
sounds like a huge project!
[grantcodes] joined the channel
#
[grantcodes]
It's also something they probably couldn't do without the new editor. They can provide a custom UI for Tumblr using standard WordPress functionality
#
[grantcodes]
Or they could use the rest API
#
[tantek]
the tumblr dashboard will become the Reader that WordPress never had
#
[grantcodes]
WordPress has always had a reader?
#
[tantek]
since when? it's never come up in the IndieWeb community (people have installed various reader plugins, but nothing built-in)
#
[grantcodes]
To .com not self hosted
#
[tantek]
so Automattic has a reader, not WordPress then (nothing open source)
IWSlackGateway and [tantek] joined the channel
#
[tantek]
will be interesting to see which will be made to support arbitrary feeds so it can 'read' the other
[grantcodes] joined the channel
#
[grantcodes]
WordPress.com is still WordPress. But Tumblr will be the same; I assume it won't be open source
#
[tantek]
naw, WordPress.com is WordPress plus a bunch of plugins, themes, and other services
#
[tantek]
by flipping all the Tumblr backends to WordPress backends, they'll be able to say an even higher % of the web runs on WordPress
#
[tantek]
after operational efficiency, I'd say that's a likely goal (acquisition as a path to marketshare)
#
[grantcodes]
You should say WordPress.org for what you're meaning. I think that's how it's generally separated.
#
[grantcodes]
Or WordPress core is the naming usually used for the actual main open source code.
[Rose], [Lewis_Cowles], jeremych_, [KevinMarks], jgmac1106, [pfefferle], [jgmac1106], [xavierroy], vika_nezrimaya, IWSlackGateway, [tantek], loicm and nloadholtes joined the channel
#
[tantek]
via sl007: ActivityPub Conference 2019 – Speakers / Talks announced https://redaktor.me/apconf/ (requires JS to view)
#
[tantek]
Some good topics worth broader discussion here too (noting here in the logs since content may disappear due to js;dr): “Keeping Unwanted Messages off the Fediverse” / Spam, scams and harassment pose a threat to all social networks, “Decentralised Hashtag Search and Subscription in Federated Social Networks”
#
[tantek]
"Privacy is becoming more and more central in shaping the future of tech and the data protection legislation has contributed significantly to making this happen. Privacy by default and design are core principles that are fundamental to how software should be envisioned. "
#
[tantek]
“The case for the unattributed message” / Despite its significant contribution to internet culture, the archetype of the anonymous image board has been largely ignored by protocol designers.
#
[tantek]
More talk description details (without JS) in cwebber's blog post: https://dustycloud.org/blog/activitypub-conf-2019-speakers/
#
jeremycherfas
!tell zegnat Is there any reason not to download that CSS I was calling from the cloud and serve it myself, rather than do the integrity check?
#
Loqi
Ok, I'll tell them that when I see them next
#
[tantek]
jeremycherfas, definitely worth serving subresources yourself first. measure and optimize for download perf as a separate task ("calling from the cloud" is like one possible aspect of that)
#
jeremycherfas
Thanks. For context, this is a CSS made available by a CSS framework. I don't want to incorporate it into my build process (because I don't have one) and so I am working with the CDN while I develop. But I am unlikely to need all of it in production.
[jgmac1106], IWSlackGateway, valuemachine, rainmanj_, jackjamieson, zoglesby, [tantek], KartikPrabhu and [snarfed] joined the channel
#
jackjamieson
[snarfed] I have a quick question about granary. When I convert my twitter timeline to HTML using granary, some tweets are given a u-category consisting of the author's h-card. Why is u-category used here?
#
KartikPrabhu
what is person-tag?
#
Loqi
A person tag (AKA people tag) is a person mention that is also a tag on a post that refers to a specific person by URL rather than just a word or phrase, and is done as an explicit tagging action by the user, beyond just mentioning a person via hyperlink / h-card / or @-name, autocompleted or not https://indieweb.org/person-tag
#
KartikPrabhu
jackjamieson: maybe that ^
#
jackjamieson
When I parse those tweets through my Microsub server they end up with a category property that causes alltogethernow.io to error
#
KartikPrabhu
hmm I suppose Together should fail gracefully
#
jackjamieson
Thanks KartikPrabhu, that makes sense
#
jackjamieson
Looks like that's it. My understanding is that Together has fairly strict expectations for data structures, but I agree it would be better to fail gracefully. I'll create an issue in Together
#
KartikPrabhu
yeah. for consuming microformats it is better to not be very strict, or at least not throw up errors if the data structure is not recognized
#
jackjamieson
I should also see how Aperture handles that case, since the flow of twitter->granary->aperture->Together works as far as I can tell
#
Loqi
it is probable
[schmarty] and nilocDiscord[m] joined the channel
#
[tantek]
anything that consumes microformats needs to handle the fact that there may always be more structured data than they needed / wanted
#
[snarfed]
tantek++
#
Loqi
tantek has 22 karma in this channel over the last year (118 in all channels)
#
sknebel
I guess with together the issue is that it interprets jf2, which mostly isn't supposed to have that, and thus expects the microsub endpoint to figure out how to deal with that
[grantcodes] joined the channel
#
[grantcodes]
Evening! So together does support categories. But I think it expects the category property to only contain strings at the moment, not objects
#
[snarfed]
lol bridgy has struggled with that before too, when it's hit objects in the url property. https://github.com/snarfed/bridgy/issues/511
#
Loqi
[snarfed] #511 OPD: handle composite url properties?
#
[snarfed]
i don't think we determined whether that was kosher, but evidently parsers do return it, eg for <span class="u-url h-card">
sblinnDiscord[m] joined the channel
#
[grantcodes]
That's sort of the goal of jf2 though isn't it? To be more opinionated / strict on data structures
#
[snarfed]
sure? this ^ is for normal mf2. i haven't followed the micropub/jf2 specific parts of the conversation
jackjamieson joined the channel
#
[snarfed]
=> #microformats
jackjamieson joined the channel
#
GWG
By the way, Microsub has strict rules because the server is supposed to normalize, not the client
#
GWG
I had this whole conversation with aaronpk about that when I was doing the same thing
jackjamieson joined the channel
#
GWG
So, in my opinion, the Microsub backend should get the issue, not Together
jackjamieson, loicm and [schmarty] joined the channel; truthDiscord[m] left the channel
#
jackjamieson
Thanks all for the additional context. I'll take care of this in Yarns then. GWG, do you reckon such cleaning should be part of Parse-This, or further downstream in Yarns?
#
sknebel
it's still a good question how to express a person tag in jf2
#
GWG
jackjamieson, normalization should be Parse This
#
jackjamieson
GWG, Thanks, I'll shift my issue over to Parse-This. I started writing a function to address this in Yarns so I'll adapt that into a PR
#
GWG
Perfect
#
alexmarcus[m]
What is Together?
#
Loqi
Together is a social reader that was initially conceived at the 2017 IndieWeb Summit in Portland by Jonathan LaCour and several others during the Putting it all together session https://indieweb.org/Together
#
alexmarcus[m]
Thanks Loqi
#
Loqi
you're welcome, alexmarcus[m]
#
jackjamieson
Sknebel, good point about expressing a person tag in jf2. To fit the current jf2 spec, one approach could be to simply take either the 'url' or 'name' property from the h-card. I'm open to suggestions if there are existing conventions etc.
#
jackjamieson
Also, regarding my original question about granary, the FAQ at the bottom of https://indieweb.org/person-tag states that Twitter handle links should not be person-tags
#
GWG
You also have references in jf2
#
[tantek]
correct. the "tag" aspect of "person-tag" is the *user* *explicit* action of tagging.
#
[tantek]
not some backendy automatic thing
#
KartikPrabhu
I think twitter does support person-tags, and the @-mentions would simply count as mentions then
#
[tantek]
correct
#
[tantek]
even if Twitter *didn't* support person-tags, the @-mentions are still *only* mentions
eli_oat joined the channel
#
jackjamieson
GWG, I think you're right that a reference makes sense. If my understanding is correct then a person-tag containing an h-card should be reduced to its URL, and the references property can contain the full h-card
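The normalization proposed here might be sketched as follows: any `category` entry that is an object with a URL is replaced by that URL, and the full object is kept under `references`. The function name is hypothetical and this is not the actual Parse-This or Yarns code; dropping objects that have no URL is one possible choice among several (falling back to `name` was also suggested above).

```python
def normalize_categories(post):
    """Return a copy of a jf2 post whose category values are all strings."""
    cats = post.get("category", [])
    if isinstance(cats, (str, dict)):       # jf2 may collapse a single value
        cats = [cats]

    categories = []
    references = dict(post.get("references", {}))
    for item in cats:
        if isinstance(item, str):
            categories.append(item)
            continue
        url = item.get("url") if isinstance(item, dict) else None
        if isinstance(url, list):           # tolerate multi-valued url
            url = url[0] if url else None
        if url:
            categories.append(url)
            references.setdefault(url, item)  # keep the full h-card by URL
        # Objects without a URL are dropped in this sketch.

    result = {k: v for k, v in post.items() if k not in ("category", "references")}
    if categories:
        result["category"] = categories
    if references:
        result["references"] = references
    return result
```

A client that expects `category` to contain only strings then receives strings, while a client that understands `references` can still look up the person's name or photo by URL.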
#
GWG
I think that is how the jf2 spec solves it
#
[tantek]
seems reasonable
#
jackjamieson
I'm off for now but I'll try to write that PR this week
[snarfed] joined the channel
#
[snarfed]
ah good point jackjamieson tantek, twitter @-mentions should indeed be mentions, as opposed to actual person tags in photos
#
[snarfed]
i'll fix that in granary