#sknebelgets a preview of "Show/Hide Transcript..." in -streams and yeah, that's what the microformats text form starts with. I wonder if that's something that can be improved
geoffo joined the channel
#capjamesghalts the Taylor Swift music to listen to TWITWAE
#Loqi[preview] [Danilo] Took about a week of steady work to gain feature parity, but I've got SvelteKit generating a static site via Netlify, and Ghost has been replaced. What a robust community Svelte has. I found backup on nearly everything I needed, from date formatters t...
#capjamesgbkil [KevinMarks] [snarfed] What is the best way to rank documents with attributes that have different data types?
#[snarfed]Sorry, I don't actually know much about IR or search
#capjamesgI have two attributes: vector similarity and time. I want to weigh records published in the last 60 days more than the rest, but not exclude documents further back.
#bkilAre you using an existing library or system, such as Elasticsearch, or is this fully custom?
#[KevinMarks]The basic idea is that you convert them into weighting factors and multiply them
#[KevinMarks]Making it clear what is going on in results is harder.
#capjamesgHow do you do that [KevinMarks]? I struggled with this in IndieWeb Search.
#capjamesgbkil I have a faiss vector datastore and a JSON file that maps the vectors to documents.
#[KevinMarks]With technorati, we had post tables with different time horizons. We'd look in the "last 24 hours" one first, then the last week, then last 28 days, then full.
#[KevinMarks]This was based on sorting by date primarily, and having a highly variable number of inlinks.
#capjamesgI wanted to ask my Bot something along the lines of "based on things I have said recently in the IndieWeb chat, suggest blog post topics" but the IR mechanism doesn't care about dates.
#bkilSo basically, as per the multiplication mentioned by [KevinMarks], you can multiply the existing weight by the inverse of the logarithm of the age of the given document. Which base you use for the logarithm is a matter for experimentation, of course.
#bkilBut I also like the binned approximation above - would probably be less resource intensive as well.
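The inverse-log weighting bkil describes could look roughly like the sketch below. It is only an illustration: the function names, the `(doc_id, similarity, age_days)` tuple layout, and the shift of the age by `base` (so that a brand-new document gets weight 1 and the logarithm never hits zero) are assumptions, not anything from the chat. It also assumes a higher similarity is better; with a distance metric where lower is better, the score would need inverting first.

```python
import math


def recency_weight(age_days: float, base: float = 2.0) -> float:
    """Inverse of the logarithm of the document's age.

    The age is shifted by `base` (an assumption of this sketch) so that
    age 0 gives log_base(base) = 1 and the division is always safe.
    """
    return 1.0 / math.log(base + max(age_days, 0.0), base)


def rerank(hits, base: float = 2.0):
    """Multiply each similarity by its recency weight and sort descending.

    `hits` is a list of (doc_id, similarity, age_days) tuples; this layout
    is hypothetical and chosen only for the example.
    """
    scored = [
        (doc_id, similarity * recency_weight(age_days, base))
        for doc_id, similarity, age_days in hits
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    hits = [("old-post", 0.9, 400), ("new-post", 0.8, 3)]
    for doc_id, score in rerank(hits):
        print(doc_id, round(score, 3))
```

Older documents are down-weighted but never excluded, and a larger `base` makes the decay gentler, which is the knob to experiment with.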
#capjamesgi[0] is an ordered list of indices for items in schema, where the order is the similarity between the user's query (as an embedding vector) and items in the vector data store.
#[KevinMarks]The binned idea works when you have a lot of variation in results for different queries, e.g. for Technorati we were returning links to an article, and an NYT article would have a lot of links on the day compared to a blog post from 2 days ago
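A minimal sketch of the binned idea, applied to similarity results rather than to Technorati's separate post tables. The horizons are the ones mentioned above (24 hours, a week, 28 days, everything); the function names and the `(doc_id, similarity, published)` tuple layout are assumptions made for the example. Results are grouped by the narrowest horizon they fall into, then ordered by similarity within each bin.

```python
from datetime import datetime, timedelta

# Time horizons from the discussion; None stands for "the full archive".
HORIZONS = [timedelta(hours=24), timedelta(days=7), timedelta(days=28), None]


def bin_index(age: timedelta) -> int:
    """Return the index of the narrowest horizon that contains this age."""
    for i, horizon in enumerate(HORIZONS):
        if horizon is None or age <= horizon:
            return i
    return len(HORIZONS) - 1


def binned_rank(hits, now: datetime, limit: int = 10):
    """Order hits by recency bin first, then by similarity (descending).

    `hits` is an iterable of (doc_id, similarity, published) tuples; this
    layout is hypothetical and chosen only for the example.
    """
    keyed = [
        (bin_index(now - published), -similarity, doc_id)
        for doc_id, similarity, published in hits
    ]
    keyed.sort()
    return [doc_id for _, _, doc_id in keyed[:limit]]


if __name__ == "__main__":
    now = datetime(2023, 1, 31, 12, 0)
    hits = [
        ("a", 0.9, now - timedelta(days=100)),
        ("b", 0.5, now - timedelta(hours=2)),
        ("c", 0.7, now - timedelta(days=3)),
        ("d", 0.8, now - timedelta(hours=5)),
    ]
    print(binned_rank(hits, now))
```

Because the bin always wins over similarity here, a weak match from today outranks a strong match from last year; in practice a similarity cutoff before binning would probably be needed.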