#dev 2024-08-29

2024-08-29 UTC
cuibonobo, aaaa, geoffo, thegreekgeek_, claudinec and [Otto_Rask] joined the channel
#
capjamesg[d]
[snarfed] Durability is a world unto itself!
#
capjamesg[d]
I have an idea on how my database could provide atomic durability on write() operations.
#
capjamesg[d]
1. Write to a journal. 2. Apply the change in memory. 3. Write exact changes made to the index to the log. 4. Delete the journal file.
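A minimal sketch of the four steps above, assuming a JSON key/value index, a single thread, and log appends that are never torn mid-line. `JournaledStore`, `pending.journal`, and `index.log` are hypothetical names for illustration, not JameSQL's actual API.

```python
import json
import os
import tempfile


class JournaledStore:
    """Hypothetical single-threaded key/value index with a write journal."""

    def __init__(self, directory):
        self.journal_path = os.path.join(directory, "pending.journal")
        self.log_path = os.path.join(directory, "index.log")
        self.index = {}
        self._recover()

    @staticmethod
    def _write_durably(path, mode, text):
        # Flush and fsync so the bytes reach the disk before we move on.
        with open(path, mode, encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())

    def write(self, key, value):
        change = json.dumps({"key": key, "value": value})
        self._write_durably(self.journal_path, "w", change)     # 1. journal
        self.index[key] = value                                  # 2. memory
        self._write_durably(self.log_path, "a", change + "\n")  # 3. log
        os.remove(self.journal_path)                             # 4. clear journal

    def _recover(self):
        # Rebuild the in-memory index by replaying the log top to bottom.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    change = json.loads(line)
                    self.index[change["key"]] = change["value"]
        # A leftover journal means a write was interrupted before step 4.
        if os.path.exists(self.journal_path):
            with open(self.journal_path, encoding="utf-8") as f:
                raw = f.read()
            try:
                change = json.loads(raw)
            except json.JSONDecodeError:
                change = None  # torn journal: step 1 never completed
            if change is not None:
                # Re-applying a "set" is idempotent, so a duplicate log
                # entry (crash between steps 3 and 4) is harmless.
                self.index[change["key"]] = change["value"]
                self._write_durably(self.log_path, "a", raw + "\n")
            os.remove(self.journal_path)


with tempfile.TemporaryDirectory() as directory:
    store = JournaledStore(directory)
    store.write("post-1", {"title": "hello"})
    reopened = JournaledStore(directory)
    print(reopened.index)
```

On startup the store replays the log, then checks for a leftover journal, which is what lets an interrupted write be either completed or discarded.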
#
capjamesg[d]
This naturally constrains operations to a single thread, but the system doesn't work with multiple threads anyway. I'm not sure how to make that work right now.
aaaa, aciditypossum[d], lazcorp, AramZS and [Otto_Rask]1 joined the channel
#
sebbu
now my mastodon and nostr account can talk
Kaguneh, [Otto_Rask]2, zicklepop, [schmarty] and [Joe_Crawford] joined the channel
#
sebbu
2 bridges consecutively :D
[mattl], btrem and [schmarty] joined the channel
AsherVo and gRegor joined the channel
#
[snarfed]
capjamesg++
#
Loqi
capjamesg has 44 karma in this channel over the last year (205 in all channels)
to2ds joined the channel
#
capjamesg[d]
This took me a _long time_ to figure out.
#
capjamesg[d]
It sort of works how SQLite does, where the database file is the single source of truth and the journal records pending transactions.
#
capjamesg[d]
I read ~half of the LinkedIn post you sent. It was incredibly helpful.
#
capjamesg[d]
The idea of having a separate log entity to which all data publishers send data and to which services can subscribe is fascinating.
#
capjamesg[d]
I guess a distributed version of my code above could be: (i) have two servers running; (ii) one is a primary, the other is a secondary; (iii) all index requests get sent to a central log, and messages are sent to all servers to update their indices; (iv) each server keeps track of the last record it indexed, so mechanisms can be built for services to re-request data that has been lost in transit.
#
capjamesg[d]
Presumably (iv) has many algorithms that could be used around acknowledgements, etc.
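Points (iii) and (iv) can be sketched in-process, assuming a single central log and replicas that remember the next offset they expect. `CentralLog` and `Replica` are hypothetical names, and a real system would add networking, acknowledgements, and persistence.

```python
class CentralLog:
    """Hypothetical append-only log that every index request goes through."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        # Return (offset, record) pairs starting at the given offset.
        return [(i, self.records[i]) for i in range(offset, len(self.records))]


class Replica:
    """Hypothetical server that tracks the next offset it expects to index."""

    def __init__(self, log):
        self.log = log
        self.index = {}
        self.next_offset = 0

    def _apply(self, record):
        self.index[record["key"]] = record["value"]

    def receive(self, offset, record):
        if offset != self.next_offset:
            # A message was lost or duplicated: re-request from the log.
            self.catch_up()
            return
        self._apply(record)
        self.next_offset += 1

    def catch_up(self):
        for offset, record in self.log.read_from(self.next_offset):
            self._apply(record)
            self.next_offset = offset + 1


log = CentralLog()
primary, secondary = Replica(log), Replica(log)
for i, record in enumerate([{"key": "a", "value": 1}, {"key": "b", "value": 2}]):
    offset = log.append(record)
    primary.receive(offset, record)
    if i != 0:  # simulate the first message to the secondary being lost
        secondary.receive(offset, record)
print(secondary.index == primary.index)
```

Because each replica only needs its own `next_offset` to recover, the log itself stays unaware of which servers are behind.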
#
capjamesg[d]
I don't have a use case for it and it's a bit far out of my realm of understanding right now, but suffice to say this is a fascinating topic!
#
[tantek]
is anyone here interested in microcopy for more user friendly renaming/reframing of "developer-ish" features? e.g. the things devs take for granted as user features on GitHub like "fork", "pull request", and "merge" which are all jargon
#
[tantek]
I really do think there are much more user-appealing alternatives (thought of a few in today's espresso chat with capjamesg[d]) and may blog about them
#
[tantek]
provided a few in #indieweb until Loqi yelled at me for "implementation" lol
Annika joined the channel
#
[tantek]
posting link(s) here to move the chat from #indieweb to here: https://chat.indieweb.org/2024-08-29/1724960250557500
#
Loqi
[preview] [[tantek]] capjamesg[d], on the topic of "edit suggestion" posts (user-friendlier renaming/reframing of "pull request"), which the original author could "accept suggestion" (e.g. for fixing typos etc.) I did write this over a decade ago: https://tantek.com/201...
#
Loqi
[preview] [[tantek]] inspired by IRL asking [KevinMarks] to fix mistypings of what I demo'd at IWC PDX when he was live-tooting the demo session
#
[tantek]
what if I could have done those with "edit suggestion" responses on my own site, in reply to his posts, which then only he and I by default could see, until he did an "accept suggestion" on my response, which then edited his post automatically, and recognized the contribution in the edit history of that post?
#
[mattl]
"pull request" is also GitHub-specific language, I think?
#
jimw
yes. the similar process on gitlab is referred to as merge requests, for example.
#
[tantek]
mattl, not GH-specific, but apparently GH-invented per https://news.ycombinator.com/item?id=11096061
#
[tantek]
though the action/verb applies to any DVCS system
superkuh_ joined the channel
#
[mattl]
Yeah, and at this point in time Git is basically the only DVCS people use? Maybe Mercurial? All the others seemed to fall out of favor.
#
jimw
Perforce is still popular with gamedev companies, I think.
#
[mattl]
I think Perforce isn't distributed, like CVS and SVN.
#
[mattl]
Certainly lots of things still use CVS and SVN too. I think most of the *BSD operating systems are all CVS.
#
jimw
Yeah, WordPress still uses SVN, I think.
#
jimw
I think Facebook/Meta uses something of their own but it is or was built on top of the same format as git.
#
[tantek]
good to know there's no particular reason to stick with the not only dev-centric but git-specific(!) phrase "pull request" since it apparently originated from git command line "git request-pull"
#
[mattl]
They were using Phabricator, I think.
#
epoch
I didn't know `git request-pull` was a thing. kind of figured the phrase "pull request" was an informal thing like, just asking someone to pull from a version of the git repo that you were hosting.
#
jimw
https://github.com/martinvonz/jj is what i was thinking of.
#
Loqi
[preview] [martinvonz] jj: A Git-compatible VCS that is both simple and powerful
#
jimw
ah, it's from someone who works at google, not meta. bad memory.
#
[mattl]
OpenBSD is also looking at https://gameoftrees.org
[Murray] joined the channel
#
[Murray]
With the standard http://webmention.io API, is there a way to fetch for pages with and without a trailing slash at once? I'm realising that my implementation is missing some if people are/aren't using them. Happy to ping it twice, but if possible would obviously prefer a single request
#
[Murray]
actually, typing that out made me realise that there's a multipage API isn't there, seems to do the trick 😅
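A small sketch of building one request for both URL variants, assuming the repeated `target[]` query parameter and the `/api/mentions.jf2` endpoint as described in webmention.io's README. `target_variants` and `mentions_url` are hypothetical helpers, and no request is actually sent here.

```python
from urllib.parse import urlencode

# Assumption: endpoint and target[] parameter as documented by webmention.io.
API_ENDPOINT = "https://webmention.io/api/mentions.jf2"


def target_variants(url):
    """Return the URL without and with a trailing slash."""
    base = url.rstrip("/")
    return [base, base + "/"]


def mentions_url(page_url):
    """Build a single query that asks for mentions of both variants."""
    params = [("target[]", variant) for variant in target_variants(page_url)]
    return API_ENDPOINT + "?" + urlencode(params)


print(mentions_url("https://example.com/2024/post/"))
```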
#
Loqi
[preview] Implementing a transaction log for JameSQL
#
capjamesg[d]
I realised while writing it that I don't have a good way to deal with delete operations.
#
capjamesg[d]
Is it valid to have a hash table of record UUIDs mapped to their start and end bytes in the file then, on delete, open the main index file, seek to the start, replace with empty spaces until the end, and save the changes?
#
capjamesg[d]
(And is my understanding that would be faster than writing the whole index correct?)
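The delete-by-blanking idea can be sketched as follows, assuming one JSON record per line and a hash table built at write time. `write_records`, `delete_record`, and `read_records` are hypothetical helpers. The delete writes only `end - start` bytes, so its cost tracks the record size rather than the file size.

```python
import json
import os
import tempfile


def write_records(path, records):
    """Write one JSON record per line; return {id: (start, end)} byte offsets."""
    offsets = {}
    with open(path, "wb") as f:
        for record_id, doc in records.items():
            line = json.dumps({"id": record_id, **doc}).encode("utf-8")
            start = f.tell()
            f.write(line)
            offsets[record_id] = (start, f.tell())
            f.write(b"\n")
    return offsets


def delete_record(path, offsets, record_id):
    """Overwrite the record's bytes with spaces, leaving the file size unchanged."""
    start, end = offsets.pop(record_id)
    with open(path, "r+b") as f:
        f.seek(start)
        f.write(b" " * (end - start))
        f.flush()
        os.fsync(f.fileno())


def read_records(path):
    """Read all records, skipping lines that were blanked out by a delete."""
    with open(path, "rb") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "index.jsonl")
    offsets = write_records(path, {"a": {"title": "one"}, "b": {"title": "two"}})
    delete_record(path, offsets, "a")
    print(read_records(path))
```

The blanked space is never reclaimed in this sketch, so the file only grows until something rewrites it.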
#
aaronpk
wow has really nobody made a progressively enhanced lyrics highlighter javascript library?
#
aaronpk
i want an <audio> tag and some lyrics below, and i want each line to highlight in sync with the audio player
#
aaronpk
but i don't want to double up the lyrics in a sidefile or javascript
zicklepop joined the channel
#
[snarfed]
capjamesg for serious uses, probably not, that wouldn't handle either concurrency or scaling well
#
[snarfed]
but you get to choose your req'ts!
#
[tantek]
aaronpk, sounds like karaoke 🙂
#
[tantek]
what is edit
#
Loqi
✂️✏️ An edit (AKA diff, change) is a special type of reply that indicates a set of suggested changes to the post it is replying to. A collection of (presumably related) suggested edits in open source is often called a patch or pull request https://indieweb.org/edit
#
capjamesg[d]
[snarfed] What would you recommend?
#
[tantek]
[KevinMarks] what was the AS1 activity you were thinking of for an edit response to someone else's post? no mention of such on /edit and web searching for anything in AS1 doesn't seem to yield any results
#
[snarfed]
capjamesg again too big and broad a question, and depends too much on the details of your req'ts, to answer succinctly here
#
[snarfed]
if you want some good background, find the original Google papers for Bigtable and/or SSTable, lots of old school storage eng wisdom there
#
capjamesg[d]
Thank you!!!
#
capjamesg[d]
I'm definitely looking to be pointed in the right direction so I can figure out the lay of the land.
#
[snarfed]
also the storage chapter(s) of any database textbook
#
capjamesg[d]
"Since SSTables are immutable, the problem of permanently removing deleted data is transformed to garbage collecting obsolete SSTables. Each tablet’s SSTables are registered in the METADATA table. The master removes obsolete SSTables as a mark-and-sweep garbage collection [25] over the set of SSTables, where the METADATA table contains the set of roots"
#
capjamesg[d]
This makes sense.
#
aaronpk
karaoke usually shows you only one or two lines of the song at a time, i want the entire lyrics on the page, and just highlight one line at a time
#
capjamesg[d]
You could have an index that describes all operations, and append a delete operation to the disk index. If the server is built from the index, operations can be applied top to bottom. A GC could run that reads all records, finds data / fields that were later deleted, and removes them from the index.
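A sketch of that append-only approach, assuming each operation is a dict with an `op` of `put` or `delete`. `replay` and `compact` are hypothetical names, and the "GC" here simply rewrites the log to contain only records that are still live.

```python
def replay(ops):
    """Apply operations top to bottom to rebuild the in-memory index."""
    index = {}
    for op in ops:
        if op["op"] == "put":
            index[op["id"]] = op["doc"]
        elif op["op"] == "delete":
            index.pop(op["id"], None)
    return index


def compact(ops):
    """Garbage-collect the log: keep one put per record that is still live."""
    live = replay(ops)
    return [{"op": "put", "id": record_id, "doc": doc}
            for record_id, doc in live.items()]


ops = [
    {"op": "put", "id": "a", "doc": {"title": "one"}},
    {"op": "put", "id": "b", "doc": {"title": "two"}},
    {"op": "delete", "id": "a"},  # deletes are appended, never applied in place
]
print(len(ops), len(compact(ops)))
```

Replaying the compacted log yields the same index as replaying the original, which is the property a background collector would need to preserve.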
#
capjamesg[d]
But I really need to read this whole thing to understand what's going on.
#
capjamesg[d]
And storing all data in a single file doesn't seem like the best idea in the world.
#
[snarfed]
not if you need to scale in either data volume or traffic
#
[snarfed]
which you might not! ok either way
#
capjamesg[d]
I suppose that at scale the database would be bottlenecked by locks and file write times on that one file that you are trying to write to.
#
capjamesg[d]
(This is simultaneously an exercise in *what do I need* vs. what's possible.)
#
capjamesg[d]
In looking at Elasticsearch, their disk indices are split up across lots of different files.
#
aaronpk
well that was a lot of work for what is mostly a shitpost but i got it working
#
capjamesg[d]
[snarfed] This really is a whole world unto itself.
#
capjamesg[d]
And I guess if data is split up across lots of different files, garbage collection can happen asynchronously.
#
capjamesg[d]
I'm reading GCP's overview of Bigtable's garbage collection now.
#
[snarfed]
yeah you have done quite the tour of substantial computer science fields over the last year or so 😁
#
aaronpk
this deserves its own blog post but here is an <audio> tag that highlights lyrics in the HTML below, and with no JS you can still play the audio and read the lyrics you just get no highlighting https://aaronparecki.com/2024/08/29/9/oauth-oh-yeah
#
capjamesg[d]
Information retrieval and storage are fascinating problems!
#
capjamesg[d]
I have the Designing Data Intensive Applications book in the other room. I should start reading it.
CRISPR joined the channel
#
[KevinMarks]
[tantek] I was thinking of the various accept/reject etc verbs. In AS1. Maybe they were using a post for the edit and relying on the recipient to diff.
CRISPR joined the channel
#
[tantek]
[KevinMarks] I'm going to post about the UX and how it could work with microformats h-entry etc for the specific use-case of offering content corrections, and then see if someone replies on Masto
CRISPR joined the channel
#
[KevinMarks]
Expressing an edit as a diff is non trivial
#
[tantek]
I mean Wikipedia/MediaWiki does it
#
[tantek]
as does Git
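For plain-text posts, one way to express an "edit suggestion" is a line-based unified diff. This is a minimal sketch using Python's standard `difflib`; `suggest_edit` is a hypothetical helper, and it says nothing about diffing HTML or rich content, which is where much of the difficulty lies.

```python
import difflib


def suggest_edit(original, corrected):
    """Return a unified diff describing how to turn original into corrected."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        corrected.splitlines(keepends=True),
        fromfile="original",
        tofile="suggested",
    ))


post = "I demoed the feture.\nIt worked.\n"
fixed = "I demoed the feature.\nIt worked.\n"
print(suggest_edit(post, fixed))
```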