#dev 2023-04-01

2023-04-01 UTC
IWSlackGateway, geoffo, [schmarty], [snarfed], gxt__, angelo and [tantek] joined the channel
# 08:50 
@roytang New blog post: Webmention Spam  https://roytang.net/p/21cnqkib/ (twitter.com/_/status/1642085711900983296)
# 08:56 
IWDiscordRelay <c​apjamesg#4492> I have seen the same thing too ^
# 08:56 
IWDiscordRelay <c​apjamesg#4492> Maybe it’s pingbacks?
# 09:01 
IWDiscordRelay <L​unarequest#0122> By any chance is the website rummycircle?
strugee joined the channel
# 09:47 
IWDiscordRelay <c​apjamesg#4492> I’m seeing it from numerous different sites, Lunarequest[d].
# 09:47 
IWDiscordRelay <c​apjamesg#4492> I wonder if there is a way to separate pingbacks from webmentions on webmention.io…
# 09:56 
IWDiscordRelay <L​unarequest#0122> Hmmm funky, I suspected it was a website called rummycircle since they do shit like that and SMS bomb people
[jamietanna] joined the channel
# 11:58 
[jamietanna] https://twitter.com/aakashg0/status/1641976906982498310 is fun for those of us using Bridgy Publish
# 11:58 
@aakashg0 3. Links hurt, unless you have enough engagement Generally external links get you marked as spam. Unless you have enough engagement. https://pbs.twimg.com/media/Fsl5SwFXoAAolQx.jpg (twitter.com/_/status/1641976906982498310)
[schmarty] joined the channel
# 13:31 
aaronpk At this point I would probably recommend just not putting the Pingback header on your site
[TMichelleMoore], geoffo, chenghiz_, [pfefferle], nertzy, [tw2113_Slack_], [KevinMarks] and [snarfed] joined the channel
# 17:11 
[snarfed] Oh btw [tantek] re Bridgy Twitter Publish going away and whether you should build Twitter image upload into Falcon...
# 17:12 
[snarfed] It's not clear whether that will even be possible. As far as we can tell, they plan to turn off the v1.1 API entirely, and v2 doesn't have media upload yet. https://twittercommunity.com/t/i-got-a-error-when-i-posted-statuses-update-by-a-new-free-access-level/188995/2?u=schnarfed
# 17:12 
[snarfed] They probably don't deliberately mean to remove API image upload, but... 🤡
geoffo and jjuran_ joined the channel
# 17:38 
aaronpk are you serious
# 17:39 
capjamesg "Technically v1.1 is announced to be deprecated in 30 days" wtf
# 17:39 
capjamesg "technically"
# 17:42 
[snarfed] they said deprecated, but as far as we can tell they mean actually turned off
# 17:43 
[snarfed] but again, 🤡
# 17:44 
[KevinMarks] https://xoxo.zone/@KevinMarks/110120700195232020
# 17:44 
[snarfed] yup KevinMarks++
# 17:44 
Loqi KevinMarks has 14 karma in this channel over the last year (54 in all channels)
# 17:44 
Loqi [preview] [Kevin Marks] Elon tomorrow: I'm pleased to announce that we finally managed to get the clown car out of the gold mine.
[chrisbergr] joined the channel
# 17:54 
sknebel https://martymcgui.re/2023/04/01/this-week-in-the-indieweb-audio-edition--march-25th---31st-2023/
# 17:55 
sknebel gets a preview of "Show/Hide Transcript..." in -streams and yeah, that's what the microformats text form starts with. wonder if that's something that can be improved
geoffo joined the channel
# 18:01 
capjamesg halts the Taylor Swift music to listen to TWITWAE
geoffo joined the channel
# 18:50 
[KevinMarks] https://hachyderm.io/@danilo/110124959698749322
# 18:51 
Loqi [preview] [Danilo] Took about a week of steady work to gain feature parity, but I've got SvelteKit generating a static site via Netlify, and Ghost has been replaced. What a robust community Svelte has. I found backup on nearly everything I needed, from date formatters t...
# 19:11 
capjamesg bkil [KevinMarks] [snarfed] What is the best way to rank documents with attributes that have different data types?
# 19:11 
[snarfed] Sorry, I don't actually know much about IR or search
# 19:12 
capjamesg I have two attributes: vector similarity and time. I want to weigh records published in the last 60 days more than the rest, but not exclude documents further back.
# 19:13 
bkil Are you using some existing library or system, such as Elasticsearch, or is this fully custom?
# 19:13 
[KevinMarks] The basic idea is that you convert them into weighting factors and multiply them
# 19:13 
[KevinMarks] Making it clear what is going on in results is harder.
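A minimal sketch of the "convert them into weighting factors and multiply them" idea, applied to the two attributes capjamesg mentions (vector similarity and time). The 60-day window comes from the chat; the boost value of 2.0, the function names and the sample documents are assumptions made up for illustration. Each row keeps the raw similarity and the factor next to the final score, which is one simple way to make it clearer what is going on in the results.

```python
from datetime import datetime, timedelta, timezone

RECENT_WINDOW = timedelta(days=60)  # window mentioned in the chat
RECENT_BOOST = 2.0                  # assumed value; tune by experiment


def recency_factor(published, now):
    """Return a multiplier: boosted inside the window, neutral outside it."""
    return RECENT_BOOST if now - published <= RECENT_WINDOW else 1.0


def rank(docs, now):
    """docs: list of (doc_id, similarity, published); higher similarity is better."""
    scored = []
    for doc_id, similarity, published in docs:
        factor = recency_factor(published, now)
        # Keep both inputs beside the final score so a result can be explained.
        scored.append((doc_id, similarity * factor, similarity, factor))
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Hypothetical sample data.
now = datetime(2023, 4, 1, tzinfo=timezone.utc)
docs = [
    ("old-but-close", 0.90, now - timedelta(days=400)),
    ("recent", 0.60, now - timedelta(days=10)),
    ("old-and-far", 0.30, now - timedelta(days=200)),
]
ranked = rank(docs, now)
for doc_id, score, similarity, factor in ranked:
    print(f"{doc_id}: score={score:.2f} (similarity={similarity}, recency={factor})")
```

Older documents are never excluded here; they only lose out when a recent document is reasonably close in similarity.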
# 19:14 
capjamesg How do you do that [KevinMarks]? I struggled with this in IndieWeb Search.
# 19:14 
capjamesg bkil I have a faiss vector datastore and a JSON file that maps the vectors to documents.
# 19:15 
[KevinMarks] With technorati, we had post tables with different time horizons. We'd look in the "last 24 hours" one first, then the last week, then last 28 days, then full.
# 19:15 
[KevinMarks] This was based on sorting by date primarily, and having a very variable number of inbound links.
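The time-horizon lookup described above could be sketched roughly like this. It is not Technorati's implementation: a single in-memory list filtered by age stands in for the separate post tables, and the function name, the `wanted` parameter and the sample posts are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Horizons from the chat, narrowest first; None means "full" (no age limit).
HORIZONS = [timedelta(hours=24), timedelta(days=7), timedelta(days=28), None]


def tiered_search(posts, now, wanted):
    """posts: list of (post_id, published).

    Return up to `wanted` ids, newest first, taken from the narrowest
    horizon that holds at least `wanted` posts.
    """
    newest_first = sorted(posts, key=lambda p: p[1], reverse=True)
    hits = []
    for horizon in HORIZONS:
        hits = [pid for pid, published in newest_first
                if horizon is None or now - published <= horizon]
        if len(hits) >= wanted:
            break  # this horizon is enough; no need to look further back
    return hits[:wanted]


# Hypothetical sample data.
now = datetime(2023, 4, 1, tzinfo=timezone.utc)
posts = [
    ("a", now - timedelta(hours=2)),
    ("b", now - timedelta(days=3)),
    ("c", now - timedelta(days=20)),
    ("d", now - timedelta(days=300)),
]
print(tiered_search(posts, now, 2))
```

With separate tables per horizon, as in the description above, the early `break` is what saves work: the wider tables are only touched when the narrower ones come up short.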
# 19:16 
capjamesg I wanted to ask my Bot something along the lines of "based on things I have said recently in the IndieWeb chat, suggest blog post topics" but the IR mechanism doesn't care about dates.
# 19:16 
capjamesg Thus, the result was suboptimal.
# 19:16 
capjamesg How would you convert them to weighting factors?
# 19:17 
bkil You could also consider using a logarithmic time scale.
# 19:17 
capjamesg Can you elaborate?
# 19:20 
bkil I'm just guessing. Haven't implemented my own search engine yet, but am thinking about it. https://en.wikipedia.org/wiki/Temporal_information_retrieval#Time-aware_retrieval/ranking_models_(T-RModels)
# 19:23 
bkil So basically as per the multiplication mentioned by [KevinMarks], you may multiply the existing weight by the (inverse) of the logarithm of the age of the given document. What base you use for your logarithm is subject to experimentation of course.
# 19:24 
bkil But I also like the binned approximation above - would probably be less resource intensive as well.
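One possible reading of the inverse-logarithm suggestion, as a sketch only. The two `+ 1` terms are assumptions added so the weight stays finite and at most 1.0 for a document of age zero; the base defaults to 10 purely as a starting point for experimentation, and the function names are hypothetical.

```python
import math


def log_age_factor(age_days, base=10.0):
    """Inverse-log weight: 1.0 for a brand-new document, shrinking slowly with age."""
    # Assumption: 1 is added inside the log (so age 0 maps to log(1) = 0)
    # and to the denominator (so the division is always defined).
    return 1.0 / (1.0 + math.log(1.0 + max(age_days, 0.0), base))


def time_aware_score(similarity, age_days, base=10.0):
    """Multiply the existing weight (here, similarity) by the age factor."""
    return similarity * log_age_factor(age_days, base)


for age in (0, 9, 99, 999):
    print(age, round(log_age_factor(age), 3))
```

Because the decay is logarithmic, the gap between a 1-day-old and a 10-day-old document is as large as the gap between a 100-day-old and a 1000-day-old one, so old documents are demoted but not pushed out entirely.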
# 19:26 
capjamesg I see.
# 19:26 
capjamesg I'll give it a shot!
# 19:26 
capjamesg bkil++
# 19:26 
Loqi bkil has 2 karma in this channel over the last year (3 in all channels)
# 19:26 
capjamesg This is _so helpful_!
# 19:33 
bkil I'd be delighted to see your source if it's public. I collect links to such implementations: https://github.com/bkil/freedom-fighters/blob/master/en/service/web-search.md#user-content-foss https://github.com/bkil/freedom-fighters/blob/master/en/server/backend-optional-portal-search.md#user-content-references
# 19:35 
capjamesg The config required is non-trivial if you want to run it locally, but here's the code where I get info from the vector index: https://github.com/capjamesg/llm-chatbot/blob/main/PromptManager.py#L67
# 19:36 
capjamesg I could add a line to get the date published (if available) for each source (which are in the schema[i]["date"] values).
# 19:36 
capjamesg Then I'd need to do the actual ranking.
# 19:36 
capjamesg i[0] is an ordered list of indices for items in schema, where the order is the similarity between the user's query (as an embedding vector) and items in the vector data store.
# 19:36 
[KevinMarks] The binned idea works when you have a lot of variation in results for different queries - eg for Technorati we were returning links to an article, and an NYT article would have a lot of links on the day compared to a blog post from 2 days ago
# 19:36 
capjamesg *I[0]
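A rough sketch of the re-ranking step described above, assuming `I[0]` holds positions into `schema` and that each entry may carry a `"date"` value. This is not the code in the linked repository: the `D` scores, the ISO 8601 date format with a UTC offset, the `undated_weight` fallback and the sample data are all assumptions, and plain lists stand in for the arrays a vector index search would return.

```python
import math
from datetime import datetime, timezone


def rerank(indices, similarities, schema, now, base=10.0, undated_weight=0.5):
    """Re-order vector-search hits by similarity multiplied by a recency weight.

    indices: the I[0] row (positions in schema, best match first).
    similarities: matching scores where higher is better (an assumption;
    if the index returns distances, convert them first).
    """
    scored = []
    for i, similarity in zip(indices, similarities):
        date_text = schema[i].get("date")
        if date_text:
            # Assumes ISO 8601 dates that include a UTC offset.
            published = datetime.fromisoformat(date_text)
            age_days = max((now - published).total_seconds() / 86400, 0.0)
            weight = 1.0 / (1.0 + math.log(1.0 + age_days, base))
        else:
            weight = undated_weight  # assumed fallback when no date is available
        scored.append((i, similarity * weight))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-ins for the schema and the search output.
schema = [
    {"text": "older message", "date": "2021-01-01T00:00:00+00:00"},
    {"text": "recent message", "date": "2023-03-25T00:00:00+00:00"},
    {"text": "undated message"},
]
I = [[0, 1, 2]]
D = [[0.9, 0.8, 0.5]]
now = datetime(2023, 4, 1, tzinfo=timezone.utc)
reranked = rerank(I[0], D[0], schema, now)
print([schema[i]["text"] for i, _ in reranked])
```

Fetching more candidates than needed from the vector index before re-ranking gives recent but slightly less similar documents a chance to surface.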
IWDiscordRelay, mouse[d], IWDiscord, geoffo, IWSlackGateway and [tantek] joined the channel