#dev 2024-08-29
2024-08-29 UTC
cuibonobo, aaaa, geoffo, thegreekgeek_, claudinec and [Otto_Rask] joined the channel
# capjamesg[d] [snarfed] Durability is a world unto itself!
# capjamesg[d] I have an idea on how my database could provide atomic durability on write() operations.
# capjamesg[d] 1. Write to a journal. 2. Apply the change in memory. 3. Write exact changes made to the index to the log. 4. Delete the journal file.
# capjamesg[d] This naturally restricts operations to a single thread, and the system doesn't work with multiple threads as it stands. I'm not sure how to make that work right now.
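The four steps above can be sketched roughly as follows. This is a minimal illustration, not the JameSQL implementation: the class name `JournaledStore`, the file names `journal.jsonl` and `index.log`, and the JSON-lines record format are all assumptions made for the example. It also assumes a single writer thread, as described in the chat.

```python
import json
import os


class JournaledStore:
    """Toy key-value store that follows the four steps described above."""

    def __init__(self, directory):
        # Hypothetical file names, chosen for this sketch only.
        self.journal_path = os.path.join(directory, "journal.jsonl")
        self.log_path = os.path.join(directory, "index.log")
        self.data = {}
        self._recover()

    @staticmethod
    def _append_durably(path, record):
        # Append one JSON line and force it to disk before returning.
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    @staticmethod
    def _read_records(path):
        if not os.path.exists(path):
            return []
        records = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError:
                    break  # stop at a torn line left by a crash mid-write
        return records

    def write(self, key, value):
        record = {"key": key, "value": value}
        self._append_durably(self.journal_path, record)  # 1. write the journal
        self.data[key] = value                           # 2. apply in memory
        self._append_durably(self.log_path, record)      # 3. write the log
        os.remove(self.journal_path)                     # 4. delete the journal

    def _recover(self):
        # Rebuild state from the log, then replay any journal left by a crash.
        for record in self._read_records(self.log_path):
            self.data[record["key"]] = record["value"]
        for record in self._read_records(self.journal_path):
            self.data[record["key"]] = record["value"]
            self._append_durably(self.log_path, record)
        if os.path.exists(self.journal_path):
            os.remove(self.journal_path)
```

If a crash happens between steps 3 and 4, replaying the journal appends the same record to the log a second time. That is harmless here because setting a key is idempotent, but a real system would likely also need checksums and a sync of the containing directory.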
aaaa, aciditypossum[d], lazcorp, AramZS and [Otto_Rask]1 joined the channel
Kaguneh, [Otto_Rask]2, zicklepop, [schmarty] and [Joe_Crawford] joined the channel
# sebbu https://bsky.app/profile/npub14yct5d2heclcc3cmmurk4gu7pvluz8k2rc2l5rx4a44eav5aww9qpxe4fj.momostr.pink.ap.brid.gy && https://mastodon.social/@npub14yct5d2heclcc3cmmurk4gu7pvluz8k2rc2l5rx4a44eav5aww9qpxe4fj@momostr.pink vs https://primal.net/p/npub14yct5d2heclcc3cmmurk4gu7pvluz8k2rc2l5rx4a44eav5aww9qpxe4fj
[mattl], btrem and [schmarty] joined the channel
# capjamesg[d] https://trends.google.com/trends/embed/explore/TIMESERIES?eq=date%3Dall&geo=&q=we+should+improve+society+somewhat&req=%7B%22comparisonItem%22%3A%5B%7B%22keyword%22%3A%22we+should+improve+society+somewhat%22%2C%22geo%22%3A%22%22%2C%22time%22%3A%22all%22%7D%5D%2C%22category%22%3A0%2C%22property%22%3A%22%22%7D&tz=0
AsherVo and gRegor joined the channel
# capjamesg[d] [snarfed] I added journaling and state rebuilding: https://github.com/capjamesg/jamesql/commit/7c31650dc0881134a791ddd0ee77d63308043d25#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R610
to2ds joined the channel
# capjamesg[d] This took me a _long time_ to figure out.
# capjamesg[d] It sort of works how SQLite does, where the database file is the single source of truth and the journal records pending transactions.
# capjamesg[d] I read ~half of the LinkedIn post you sent. It was incredibly helpful.
# capjamesg[d] The idea of having a separate log entity to which all data publishers send data and to which services can subscribe is fascinating.
# capjamesg[d] I guess a distributed version of my code above could be: (i) have two servers running; (ii) one is a primary, the other is a secondary; (iii) all index requests get sent to a central log, and messages are sent to all servers to update their indices; (iv) each server keeps track of the last record it indexed, so mechanisms can be built for services to re-request data that has been lost in transit.
# capjamesg[d] Presumably (iv) has many algorithms that could be used around acknowledgements, etc.
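Points (iii) and (iv) could look something like the sketch below. It is an in-memory toy under stated assumptions: `CentralLog`, `IndexServer`, `receive`, and `catch_up` are hypothetical names, there is no networking, and the only acknowledgement scheme shown is "refuse anything that is not the next expected offset, then re-request from the log".

```python
class CentralLog:
    """Append-only list of index requests; the offset is the list position."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the record just appended

    def read_from(self, offset):
        # Return (offset, record) pairs starting at the given offset.
        return [(i, self.records[i]) for i in range(offset, len(self.records))]


class IndexServer:
    """A primary or secondary that tracks the next offset it expects."""

    def __init__(self, name):
        self.name = name
        self.index = {}
        self.next_offset = 0

    def receive(self, offset, record):
        # Refuse duplicates and out-of-order messages so gaps are detectable.
        if offset != self.next_offset:
            return False
        self.index[record["id"]] = record
        self.next_offset += 1
        return True

    def catch_up(self, log):
        # Re-request everything from the last indexed position onwards.
        for offset, record in log.read_from(self.next_offset):
            self.receive(offset, record)
```

A server that sees `receive` return `False` knows it has missed something and can call `catch_up` against the log to fill the gap.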
# capjamesg[d] I don't have a use case for it and it's a bit far out of my realm of understanding right now, but suffice to say this is a fascinating topic!
Annika joined the channel
# [tantek] posting link(s) here to move the chat from #indieweb to here: https://chat.indieweb.org/2024-08-29/1724960250557500
# Loqi [preview] [[tantek]] capjamesg[d], on the topic of "edit suggestion" posts (user-friendlier renaming/reframing of "pull request"), which the original author could "accept suggestion" (e.g. for fixing typos etc.) I did write this over a decade ago: https://tantek.com/201...
# [tantek] what if I could have done those with "edit suggestion" responses on my own site, in reply to his posts, which then only he and I by default could see, until he did an "accept suggestion" on my response, which then edited his post automatically, and recognized the contribution in the edit history of that post?
# [tantek] mattl, not GH-specific, but apparently GH-invented per https://news.ycombinator.com/item?id=11096061
superkuh_ joined the channel
# jimw https://github.com/martinvonz/jj is what i was thinking of.
# [mattl] OpenBSD is also looking at https://gameoftrees.org
[Murray] joined the channel
# [Murray] With the standard http://webmention.io API, is there a way to fetch for pages with and without a trailing slash at once? I'm realising that my implementation is missing some if people are/aren't using them. Happy to ping it twice, but if possible would obviously prefer a single request
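One possible approach, assuming the `target[]` query parameter described in the webmention.io README accepts several URLs in a single request, is to send both variants together. The helper name `mentions_url` is hypothetical, and the sketch only builds the URL. It does not handle URLs that carry a query string or fragment.

```python
from urllib.parse import urlencode


def mentions_url(page_url, endpoint="https://webmention.io/api/mentions.jf2"):
    """Build one request URL covering the page with and without a trailing slash."""
    base = page_url.rstrip("/")
    variants = [base, base + "/"]
    # Repeating the target[] key is the assumed way to pass multiple targets.
    query = urlencode([("target[]", variant) for variant in variants])
    return f"{endpoint}?{query}"
```

Calling `mentions_url("https://example.com/post/")` and `mentions_url("https://example.com/post")` produces the same request URL, so the caller does not need to know which form a sender used.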
# capjamesg[d] [snarfed] I wrote some learnings in https://jamesg.blog/2024/08/29/transaction-log-jamesql/.
# capjamesg[d] I realised while writing it that I don't have a good way to deal with delete operations.
# capjamesg[d] Is it valid to have a hash table of record UUIDs mapped to their start and end bytes in the file then, on delete, open the main index file, seek to the start, replace with empty spaces until the end, and save the changes?
# capjamesg[d] (And is my understanding that would be faster than writing the whole index correct?)
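The delete approach asked about above might be sketched like this. It assumes one record per line with no embedded newlines, and the function names and the in-memory `offsets` dict are hypothetical. The overwrite touches only the bytes of the deleted record, so the work is proportional to that record's size rather than to the size of the whole index file.

```python
import os


def append_record(path, offsets, record_id, payload):
    """Append payload (bytes, no newlines) and remember its byte range."""
    with open(path, "ab") as f:
        start = f.seek(0, os.SEEK_END)  # current end of file
        f.write(payload + b"\n")
    offsets[record_id] = (start, start + len(payload))


def delete_record(path, offsets, record_id):
    """Blank out the record in place without rewriting the rest of the file."""
    start, end = offsets.pop(record_id)
    with open(path, "r+b") as f:
        f.seek(start)
        f.write(b" " * (end - start))
        f.flush()
        os.fsync(f.fileno())


def read_live_records(path):
    """Return the remaining records, skipping lines that were blanked out."""
    with open(path, "rb") as f:
        return [line.strip() for line in f if line.strip()]
```

The file never shrinks with this scheme, so the blanked regions stay on disk until something rewrites or compacts the file.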
zicklepop joined the channel
# Loqi An edit (AKA diff, change) is a special type of reply that indicates a set of suggested changes to the post it is replying to. A collection of (presumably related) suggested edits in open source is often called a patch or pull request https://indieweb.org/edit
# capjamesg[d] [snarfed] What would you recommend?
# capjamesg[d] Thank you!!!
# capjamesg[d] I'm definitely looking to be pointed in the right direction so I can figure out the lay of the land.
# capjamesg[d] Hm...
# capjamesg[d] "Since SSTables are immutable, the problem of permanently removing deleted data is transformed to garbage collecting obsolete SSTables. Each tablet's SSTables are registered in the METADATA table. The master removes obsolete SSTables as a mark-and-sweep garbage collection [25] over the set of SSTables, where the METADATA table contains the set of roots"
# capjamesg[d] This makes sense.
# capjamesg[d] You could have an index that describes all operations, and append a delete operation to the disk index. If the server is built from the index, operations can be applied top to bottom. A GC could run that reads all records, finds data / fields that would later be deleted, and removes them from the index.
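A minimal sketch of that idea, with the operation log held as a list of dicts. The `put` / `delete` operation shape and the function names `replay` and `compact` are assumptions for illustration. This shows record-level deletes only, not field-level ones, and it is not how Bigtable does it.

```python
def replay(operations):
    """Rebuild the in-memory state by applying operations top to bottom."""
    state = {}
    for op in operations:
        if op["op"] == "put":
            state[op["id"]] = op["doc"]
        elif op["op"] == "delete":
            state.pop(op["id"], None)
    return state


def compact(operations):
    """Garbage-collect the log: keep only the last put of each live record."""
    live = replay(operations)
    last_put = {}
    for position, op in enumerate(operations):
        if op["op"] == "put" and op["id"] in live:
            last_put[op["id"]] = position
    keep = set(last_put.values())
    return [op for position, op in enumerate(operations) if position in keep]
```

Replaying the compacted log gives the same state as replaying the original, so the collector can run separately from writes and swap the shorter log in once it is done.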
# capjamesg[d] But I really need to read this whole thing to understand what's going on.
# capjamesg[d] And storing all data in a single file doesn't seem like the best idea in the world.
# capjamesg[d] I suppose that at scale the database would be bottlenecked by locks and file write times on the one file that you are trying to write to.
# capjamesg[d] (This is simultaneously an exercise in *what do I need* vs. what's possible.)
# capjamesg[d] In looking at Elasticsearch, their disk indices are split up across lots of different files.
# capjamesg[d] [snarfed] This really is a whole world unto itself.
# capjamesg[d] And I guess if data is split up across lots of different files, garbage collection can happen asynchronously.
# capjamesg[d] I'm reading GCP's overview of Bigtable's garbage collection now.
# aaronpk this deserves its own blog post but here is an <audio> tag that highlights lyrics in the HTML below, and with no JS you can still play the audio and read the lyrics you just get no highlighting https://aaronparecki.com/2024/08/29/9/oauth-oh-yeah
# capjamesg[d] Information retrieval and storage are fascinating problems!
# capjamesg[d] I have the Designing Data Intensive Applications book in the other room. I should start reading it.
CRISPR joined the channel
# [KevinMarks] [tantek] I was thinking of the various accept/reject etc. verbs in AS1. Maybe they were using a post for the edit and relying on the recipient to diff.
CRISPR joined the channel
CRISPR joined the channel
# [KevinMarks] Expressing an edit as a diff is non-trivial