#dev 2024-08-19
2024-08-19 UTC
oodani, AramZS, bterry, thegreekgeek_, srijan, ttybitnik, [Joe_Crawford] and [aciccarello] joined the channel
# [aciccarello] The number of times I've plugged localhost URLs into microformats parsers expecting a result is too damn high
beanbrain, gRegor and [tantek] joined the channel
# capjamesg[d] [KevinMarks] How much do you know about NoSQL databases?
# capjamesg[d] I have been implementing one and I'm a bit stuck on one piece.
# capjamesg[d] I learned about the concept of a Global Secondary Index that indexes a particular key in all your documents, so you can search them faster.
# capjamesg[d] If lots of documents have the same value for that key, checking to find all documents with that key is super fast.
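A minimal sketch of the secondary-index idea described above, assuming documents are plain Python dicts held in memory. The document ids, the `category` key, and the `build_secondary_index` helper are hypothetical and only for illustration; they are not taken from capjamesg's implementation.

```python
from collections import defaultdict

# Hypothetical documents; the ids and the `category` key are illustrative only.
documents = {
    1: {"title": "Flat white", "category": "coffee"},
    2: {"title": "Tabby", "category": "cats"},
    3: {"title": "Espresso", "category": "coffee"},
}


def build_secondary_index(docs, key):
    """Map each value of `key` to the set of ids of documents holding it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if key in doc:
            index[doc[key]].add(doc_id)
    return index


category_index = build_secondary_index(documents, "category")

# One dictionary lookup replaces a scan over every document.
coffee_ids = sorted(category_index.get("coffee", set()))
```

The lookup cost no longer depends on the total number of documents, only on how many share the value, which is why a key with many repeated values is cheap to query.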
# capjamesg[d] Then to check if documents start with something, you could construct a prefix-tree (trie) and use that for lookup.
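The prefix-tree lookup mentioned above could look roughly like this. It is a sketch under assumptions: the `Trie` and `TrieNode` classes are hypothetical, and storing document ids at every node is one possible design that trades memory for a fast `starts_with`.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        # Ids of every document whose value passes through this node.
        self.doc_ids = set()


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, value, doc_id):
        node = self.root
        node.doc_ids.add(doc_id)  # the empty prefix matches every value
        for ch in value:
            node = node.children.setdefault(ch, TrieNode())
            node.doc_ids.add(doc_id)

    def starts_with(self, prefix):
        """Return ids of documents whose value begins with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.doc_ids)


trie = Trie()
trie.insert("coffee", 1)
trie.insert("coffin", 2)
trie.insert("cats", 3)
```

A lookup walks one node per character of the prefix, so its cost depends on the prefix length and not on how many values are stored.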
# capjamesg[d] But I'm not sure about how to check if a key _contains_ something.
# capjamesg[d] Suppose I have a NoSQL database with five million documents. I want to check if `text`, which contains 1,000 words in each document, contains `coffee cats`. What would be the best way to do that search efficiently?
# capjamesg[d] My hunch is that for short strings, an efficient string search algorithm is sufficient.
# capjamesg[d] And for long strings, you should build a reverse index of the words in `text`?
# capjamesg[d] Ah, maybe you'd need a global index that maps every word in `text` to its doc id. Then you can look that up once!
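A rough sketch of that global word-to-doc-id index (often called an inverted index), assuming whole-word matching on a `text` key. The `tokenize`, `build_inverted_index`, and `contains_phrase` helpers are hypothetical names. Matching arbitrary substrings inside words would need a different structure, such as an n-gram index, which this sketch does not attempt.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs, key="text"):
    """Map every word found under `key` to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in tokenize(doc.get(key, "")):
            index[word].add(doc_id)
    return index


def contains_phrase(index, docs, phrase, key="text"):
    """Return ids of documents whose `key` contains the words of `phrase` in order."""
    words = tokenize(phrase)
    if not words:
        return set()
    # Intersect the per-word id sets, smallest first, to get candidates.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    candidates = set(postings[0])
    for ids in postings[1:]:
        candidates &= ids
    # Only the candidates are re-read to confirm the words are adjacent.
    n = len(words)
    matches = set()
    for doc_id in candidates:
        tokens = tokenize(docs[doc_id].get(key, ""))
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            matches.add(doc_id)
    return matches


# Hypothetical sample data.
docs = {
    1: {"text": "I love coffee cats and tea"},
    2: {"text": "cats drink coffee"},
    3: {"text": "Coffee, cats!"},
    4: {"text": "dogs only"},
}
index = build_inverted_index(docs)
result = contains_phrase(index, docs, "coffee cats")
```

The index narrows five million documents down to the few that contain every query word, so the more expensive adjacency check runs on a small set.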
# capjamesg[d] Ahhhh!
# capjamesg[d] I think I get it now.
# capjamesg[d] (But if any of this doesn't make sense, let me know!)
# capjamesg[d] I just wrote a long blog post on all of this and I was really stuck on this point.
[snarfed] joined the channel
# capjamesg[d] [snarfed] That's what I'm building!
# capjamesg[d] I wanted to know how tools like Elasticsearch work behind the scenes, so I'm trying to build one for myself
# capjamesg[d] I don't plan to use it for production, but it has been fun to tinker.
# capjamesg[d] Across 200,000 documents, a contains and starts_with query now takes 0.1s on my Mac with what I have built so far.
# ptramo[d] capjamesg[d] total payload bytes?
# capjamesg[d] Yeah.
# capjamesg[d] It seems like document stores are lots of indices
# ptramo[d] capjamesg[d] a fun problem is stemming, ie returning documents containing "searched" when looking up "searching"
# capjamesg[d] Ahhh I'm not that far yet
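To illustrate the stemming problem ptramo raised, here is a deliberately naive suffix stripper. It is a toy under stated assumptions, not the Porter algorithm or any library's stemmer, and `toy_stem` and `SUFFIXES` are hypothetical names. Real stemmers handle many more cases, such as doubled consonants and irregular forms.

```python
# Suffixes are checked in this order; the first one that matches is stripped.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Both the indexed word and the query word reduce to the same stem,
# so a lookup for "searching" can find a document containing "searched".
indexed_stem = toy_stem("searched")
query_stem = toy_stem("searching")
```

Applying the same function to words at index time and to query terms at search time is what lets the two forms meet at one index entry.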
# capjamesg[d] ptramo[d]
# capjamesg[d] Oops!
# capjamesg[d] @Ο++
# capjamesg[d] ptramo[d] ++
# capjamesg[d] There we go!
# ptramo[d] So my new job involves building automated web agents. Been having lots of fun building an obstacle course (https://challenge.xmit.dev). I'd be curious to get a list of problems around lack of accessibility to add test cases for them
jonnybarnes joined the channel
# capjamesg[d] ptramo[d] A link on the page where you get an incorrect answer back to the home page would be ideal.
# capjamesg[d] So I could keep using my keyboard to navigate the page.
# capjamesg[d] This is very cool by the way!
# capjamesg[d] It would be great if these items were spaced further apart:
# capjamesg[d] It is hard for me to read and understand them.
# capjamesg[d] Also, the white and black text is jarring in dark mode. Using an off-white or off-black colour makes text easier to read.
# ptramo[d] capjamesg[d] I don't set colors, blame your browser
# capjamesg[d] [Murray] At least you figured it out before going to bed!
gRegorLove_ and bterry joined the channel
# ptramo[d] capjamesg[d]: Ah so agents that don't know how to navigate the history get penalized but not stuck... why not. Do you know have cmd-back to go back with your keyboard though?
# ptramo[d] * not have. Ctrl-left on Linux/windows and cmd-left on Mac?
# ptramo[d] Hmmm for spacing I don't know what would be most appropriate. `a { padding: 0.25em }` maybe?
bret joined the channel