#dev 2021-12-29

2021-12-29 UTC
#
marksuth[d]
as much as anything it was the most popular of the suggested names on the renaming section, it’s also sufficiently unique so as not to cause any clashes with other things & relatively easy to pronounce/spell
Allie left the channel
#
marksuth[d]
I believe it is credit to @chrisbergr for suggesting the name initially
gRegor joined the channel
#
GWG
I agree that the Pass part of it confused me initially
wackycity[d], gRegor, jjuran, akevinhuang, jeremycherfas, [snarfed], jessealama[d], [aciccarello] and cygnoir joined the channel
#
@ariadneconill
↩️ the market doesn’t care about SOLiD. it doesn’t care about IndieAuth. it doesn’t care about ActivityPub or WebMention or any of the other technologies that could actually shift the power dynamic toward the people. instead we get new masters, who just happen to have money.
(twitter.com/_/status/1476064609387917313)
Cygnoir[d] joined the channel
#
jjuran
Anyone know who the "new masters" are in that tweet?
#
[tantek]
coiners
[fluffy] joined the channel
#
capjamesg[d]
I have a difficult problem. I have created a search index that creates a reverse index of web documents per https://www.tbray.org/ongoing/When/200x/2003/06/18/HowSearchWorks and its proposed schema (but with JSON instead of XML).
#
capjamesg[d]
I am a bit stuck on how to create ranking factors.
#
capjamesg[d]
The algorithm above works for finding text.
#
capjamesg[d]
But, I can't figure out how to boost with word counts without reading every record saved.
#
capjamesg[d]
Do I need to keep the whole index in memory, including factors like word counts?
#
capjamesg[d]
Actually, I could build all scores when the program is first run.
#
capjamesg[d]
The disadvantage is that I would have to rebuild the index when a new record is added.
#
capjamesg[d]
That isn’t too bad though for now.
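A minimal sketch of one way to handle this: store per-document word counts in the index itself at build time, so ranking factors never require re-reading the saved records. The document shape and field names ("url", "text") are assumptions, not the schema capjamesg described.

```python
# Minimal sketch: build a reverse (inverted) index that stores term frequencies
# at index time, so ranking factors don't require re-reading every record.
import json
from collections import defaultdict

def build_index(documents):
    """documents: iterable of dicts like {"url": ..., "text": ...} (assumed shape)."""
    index = defaultdict(dict)   # term -> {url: count of that term in the document}
    word_counts = {}            # url -> total words in the document
    for doc in documents:
        words = doc["text"].lower().split()
        word_counts[doc["url"]] = len(words)
        for word in words:
            index[word][doc["url"]] = index[word].get(doc["url"], 0) + 1
    return index, word_counts

docs = [
    {"url": "https://example.com/coffee", "text": "James writes about coffee"},
    {"url": "https://example.com/blog", "text": "A blog about coffee and code"},
]
index, word_counts = build_index(docs)
print(json.dumps(index["coffee"], indent=2))   # postings with per-document counts
```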
#
petermolnar
> I can't figure out how to boost with word counts without reading every record saved
#
petermolnar
I don't think that's avoidable.
#
petermolnar
*something* needs to read the records to create the index
#
capjamesg[d]
Elasticsearch allows for query-time custom weights.
#
capjamesg[d]
I could build indexes for each value. So word counts could be queried as an index.
#
capjamesg[d]
But I may be overengineering. I don’t need this yet.
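For the query-time weighting idea, a rough sketch over a postings index shaped like the one above, with a separate word-count lookup acting as the per-value index. The scoring formula here is an arbitrary illustration, not Elasticsearch's, and no rebuild of the main index is needed to change it.

```python
import math

# Toy index shaped like the build-time sketch above: term -> {url: count},
# plus a per-document word-count lookup. Numbers are made up.
index = {"coffee": {"/coffee": 3, "/blog": 1}, "blog": {"/blog": 2}}
word_counts = {"/coffee": 120, "/blog": 480}

def search(index, word_counts, query, length_weight=0.5):
    """Apply a custom weight at query time; the stored index is untouched."""
    scores = {}
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            tf = count / word_counts[url]   # length-normalised term frequency
            scores[url] = scores.get(url, 0.0) + tf * (1 + length_weight * math.log(1 + count))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(search(index, word_counts, "coffee blog"))
```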
#
jessealama[d]
would it help to use Solr?
[tonz] joined the channel
#
capjamesg[d]
Thanks for the suggestion! I hadn’t thought about looking at Solr.
#
capjamesg[d]
I am trying to build something barebones to learn about search.
#
jessealama[d]
ah, I see, understood. Solr has some boosting stuff that might be a good match for your use case. The indices are stored in a binary format, but there's an API that you can use to get rather low-level information about the current state of the system
#
capjamesg[d]
Thank you for sharing!
#
capjamesg[d]
What does it mean that indices are stored in binary?
#
capjamesg[d]
Do you know what the benefit is of that approach?
tetov-irc joined the channel
#
petermolnar
if you want to go truly barebones, imo https://whoosh.readthedocs.io/en/latest/intro.html is a good start
#
Zegnat
I cheated on the ranking last time I did a search experiment. I got a limited number of search results through dumb means, and then of those results read the full texts into an in-memory SQLite FTS5-enabled db (this is surprisingly fast). By running the same search query against the minimal SQLite set and retrieving results ordered by rank, I made SQLite decide which pages were the “best” for the search.
#
capjamesg[d]
Thanks for sharing Zegnat!
#
Zegnat
It basically assumes that people only want 1 page of results, and that querying the big data set is smart enough to get the somewhat correct results for the page of results, and that ranking can be figured out as a second step. It does mean that ranking is decided on the fly and the big db never needs to be reindexed for changes in the ranking algo.
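A small sketch of that two-stage approach, assuming the candidate set has already been narrowed by the "dumb" first pass; it relies on SQLite being compiled with FTS5, which most Python builds include. The candidate data is made up.

```python
import sqlite3

# Candidate pages from the cheap first-pass query (url, full text).
candidates = [
    ("https://example.com/coffee", "James writes about coffee and espresso"),
    ("https://example.com/blog", "A blog about static sites and coffee"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(url UNINDEXED, body)")
conn.executemany("INSERT INTO pages (url, body) VALUES (?, ?)", candidates)

# FTS5 exposes a hidden `rank` column (BM25-based); lower is better, so ORDER BY rank.
for url, rank in conn.execute(
    "SELECT url, rank FROM pages WHERE pages MATCH ? ORDER BY rank", ("coffee",)
):
    print(url, rank)
```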
#
petermolnar
from the whoosh doc, their default ranking algo: https://en.wikipedia.org/wiki/Okapi_BM25
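For reference, a minimal implementation of the linked Okapi BM25 formula with the usual default parameters, over the same postings/word-count shapes sketched earlier:

```python
import math

def bm25(query_terms, postings, doc_lengths, k1=1.5, b=0.75):
    """postings: term -> {doc_id: term count}; doc_lengths: doc_id -> word count."""
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    scores = {}
    for term in query_terms:
        docs_with_term = postings.get(term, {})
        # Inverse document frequency: rarer terms contribute more.
        idf = math.log((n_docs - len(docs_with_term) + 0.5) / (len(docs_with_term) + 0.5) + 1)
        for doc_id, freq in docs_with_term.items():
            # Term frequency saturated by k1 and normalised by document length via b.
            norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * doc_lengths[doc_id] / avg_len))
            scores[doc_id] = scores.get(doc_id, 0.0) + idf * norm
    return scores

postings = {"coffee": {"/coffee": 3, "/blog": 1}}
doc_lengths = {"/coffee": 120, "/blog": 480}
print(bm25(["coffee"], postings, doc_lengths))
```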
#
capjamesg[d]
Is there an easy way to flatten a tree?
#
capjamesg[d]
(not the literal kind haha)
#
capjamesg[d]
I have a tree of words like "james" -> "coffee" -> "blog".
#
capjamesg[d]
If I type "james", I'd like to get the other two no
#
capjamesg[d]
*the other two nodes and their children
#
petermolnar
what is turtle?
#
Loqi
Turtle is nothing like XML https://indieweb.org/Turtle
#
petermolnar
our wiki isn't always graceful
#
petermolnar
but it's also not what I was after
#
capjamesg[d]
(the use case is autocomplete, I should have added)
#
Zegnat
An easy way to flatten a tree is probably to run through the tree with your code, and thus very language-specific?
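A language-specific sketch along those lines in Python, assuming the tree of words is nested dicts; the autocomplete lookup returns every descendant of the typed word.

```python
# Assumed tree shape: each word maps to a dict of child words.
tree = {"james": {"coffee": {"blog": {}}, "wiki": {}}}

def flatten(node):
    """Yield every word below `node`, depth first."""
    for word, children in node.items():
        yield word
        yield from flatten(children)

def suggestions(tree, typed):
    """All nodes and their children under the typed word."""
    return list(flatten(tree.get(typed, {})))

print(suggestions(tree, "james"))   # ['coffee', 'blog', 'wiki']
```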
#
capjamesg[d]
Thanks for your help everyone!
#
capjamesg[d]
petermolnar++
#
Loqi
petermolnar has 4 karma in this channel over the last year (34 in all channels)
#
Loqi
Zegnat has 10 karma in this channel over the last year (28 in all channels)
#
capjamesg[d]
petermolnar I am using TF/IDF for now but have had great success with BM25 in the past when working with SQLite.
juanchi, KartikPrabhu, [jacky] and MarkJR84[d] joined the channel
#
@FalconSensei
↩️ I went ahead and did the migration: https://geekosaur.com/ is officially using @eleven_ty instead of Hugo now. Still have to work on the design and include a few things (like Webmentions), but now I can create posts in the new blog without worry
(twitter.com/_/status/1476236490296418309)
#
jeremycherfas
Do any micropub clients have an API?
sp1ff joined the channel
#
capjamesg[d]
What do you want to build jeremycherfas?
#
jeremycherfas
I've been poking around in NewsBlur to see how its share-to services operate, and they are really rather simple, at least on the surface. (Busy writing up a post now.) It structures a post to the service in a format specific to that service. If there were a client that accepted such a request, perhaps it could send the request on to a site with an endpoint.
#
jeremycherfas
I think that might be easier than sending a request directly to a site's micropub endpoint, although if the user could add the URL of the endpoint and a token, perhaps that would work too.
#
Zegnat
The API is Micropub, so I am not sure there is a Micropub client that in its turn exposes Micropub. Feels like inception of some sort, haha
#
petermolnar
as Zegnat said: micropub itself is an api.
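For the "URL of the endpoint plus a token" option, a sketch of a direct Micropub create request using the spec's form-encoded flow; the endpoint URL, token, and post properties below are placeholders.

```python
import requests

MICROPUB_ENDPOINT = "https://example.com/micropub"   # placeholder endpoint URL
TOKEN = "xxxx"                                       # placeholder access token

# Form-encoded Micropub create: h=entry plus the post's properties.
response = requests.post(
    MICROPUB_ENDPOINT,
    headers={"Authorization": f"Bearer {TOKEN}"},
    data={
        "h": "entry",
        "content": "Shared from a feed reader",
        "bookmark-of": "https://example.com/some-article",
    },
)
# A successful create returns 201/202 with the new post's URL in Location.
print(response.status_code, response.headers.get("Location"))
```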
#
[KevinMarks]
You could try with commentpara.de first as that is relaxed about auth
#
petermolnar
what is playlist
#
Loqi
A playlist represents a collection of audio or video of some significance to the list's creator https://indieweb.org/playlist
#
Zegnat
I do not think commentpara.de does micropub, does it?
#
Zegnat
jeremycherfas: I was just looking, and Quill has bookmarklets for different post types, where it puts data like url, content, and name into a Quill URL.
#
Zegnat
So you might be able to use those. Link to Quill's bookmark page, and if you are logged in to Quill, it will post to your own site?
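A sketch of what handing post data to a client via query string could look like; the base URL and parameter names here are guesses for illustration, not Quill's documented bookmarklet interface, so check Quill's actual bookmarklet code for the real ones.

```python
from urllib.parse import urlencode

base = "https://quill.p3k.io/bookmark"   # assumed bookmarklet target URL
params = {
    "url": "https://example.com/some-article",   # page being bookmarked
    "name": "Some article",                      # page title
    "content": "Worth a read.",                  # optional note
}
# A share-to link built this way pre-fills the client's posting form.
print(f"{base}?{urlencode(params)}")
```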
#
Zegnat
I apparently cannot test it right now, it looks like Sink has died from neglect so I have no micropub endpoint
[jeremycherfas] joined the channel
#
[jeremycherfas]
I was looking at the Quill docs as that seemed like the best starting point, but I hadn’t thought to look at the bookmarklets. Thanks [Zegnat]
alephalpha0, angelo and tetov-irc joined the channel
#
[chrisaldrich]
Quill's use of URL structures in bookmarklets is a clever trick. Reminiscent of being able to use WordPress' structured URLs to create posts with many of the parts already filled in.
#
[chrisaldrich]
GWG, could it be possible for one to create posts with WP query parameters that target its micropub endpoint?
#
[chrisaldrich]
I've been slowly trying to nudge services like IFTTT to add micropub support... :uphill_climbing_emoji: