#dev 2021-07-21

2021-07-21 UTC
sp1ff and KartikPrabhu joined the channel
#
vikanezrimaya
question: do your webmention endpoints set a custom user-agent header when fetching replies for verification?
#
vikanezrimaya
yay, my webmention endpoint prototype can correctly generate an update command for Micropub! the only thing left is sending it to the endpoint
#
Loqi
yay!
#
vikanezrimaya
and here I need to make a decision - the endpoint should probably support being used for multi-website setups, and as such I will need to enable Kittybox (the main software) to issue an all-powerful token to the webmention endpoint so it could interact with all the websites hosted on a single Kittybox instance
#
vikanezrimaya
I got a security vulnerability. More like, a huge hole in the wall right next to the locked door.
#
vikanezrimaya
I forgot to check that the user updating or deleting a post matches the post author!!!
shoesNsocks1 and [schmarty] joined the channel
#
vikanezrimaya
Yeah. In the current version, if you manage to register another website on the instance and gain a token for it, you can do action=delete&url=https://fireburn.ru/posts/something to delete my posts!
#
vikanezrimaya
Thankfully there's only 5 lines of code needed to fix it... and also 5 lines of code to add support for an internal token that bypasses this restriction
#
[schmarty]
the trouble with (mega)tokens
#
vikanezrimaya
Obviously the token needs to be kept safe
#
vikanezrimaya
anyway I now check that `user.me` matches the URL-to-be-deleted's origin (because Kittybox is supposed to control the / of the website, it's a reasonable assumption)
#
vikanezrimaya
if they don't match, it gives a 403
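A minimal sketch of the check described above, in Python rather than Kittybox's own code. It assumes `user.me` and the target URL are compared by origin (scheme, host, port); the function names and the `is_internal_token` flag are hypothetical and only illustrate the idea of an internal token that bypasses the restriction.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports, so "https://example.com" and "https://example.com:443" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Reduce a URL to its (scheme, host, port) origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def authorize(user_me: str, target_url: str, is_internal_token: bool = False) -> int:
    """Return an HTTP status: 200 if the action may proceed, 403 otherwise.

    `is_internal_token` is a hypothetical flag standing in for the
    all-powerful internal token mentioned in the discussion.
    """
    if is_internal_token:
        return 200
    return 200 if origin(user_me) == origin(target_url) else 403
```

With this shape, a token issued for one site gets a 403 when it tries `action=delete` on a URL belonging to another site on the same instance.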
#
vikanezrimaya
Wow. designing software that's secure against a malicious fellow user is hard!
#
vikanezrimaya
it's a nice learning experience tho
#
vikanezrimaya
and I've added a unit test to prevent such a failure from occurring in the future
#
vikanezrimaya
And all of that happened because the scope checking code was a straight port from the older version that was not designed with multi-user support in mind
alex11, Jh7579, [pfefferle], capjamesg and [tw2113_Slack_] joined the channel
#
Loqi
ok, I added "https://whoosh.readthedocs.io/en/latest/intro.html" to the "See Also" section of /search https://indieweb.org/wiki/index.php?diff=76467&oldid=76409
hendursa1, Harry1, [pfefferle] and capjamesg joined the channel
#
vikanezrimaya
Whoosh, huh. Sounds neat.
#
capjamesg
Anyone doing search for a blog should check out SQLite's full-text search. It's so powerful.
#
capjamesg
I was thinking of moving to Elasticsearch, but that was going to be a big hassle. I thought Elasticsearch would give me a bit more control over weights, until I found out that SQLite's full-text search (FTS5) supports weights via the BM25 algorithm.
rockorager and [tantek] joined the channel
#
[tantek]
sounds like a good use-case for sqlite!
#
capjamesg
Now I can weight meta descriptions, titles, categories, etc.
#
capjamesg
Which is important because a keyword in a title / h1 should be weighted much more heavily than a keyword in a URL slug.
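A small sketch of column weighting with FTS5's `bm25()` function, using Python's built-in `sqlite3` module. It assumes the local SQLite build includes FTS5; the table name, columns, sample rows, and weights (10 for title, 5 for description, 1 for slug) are made up for illustration and are not capjamesg's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One FTS5 column per field we want to weight separately (hypothetical schema).
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(title, description, slug)")
conn.executemany(
    "INSERT INTO pages (title, description, slug) VALUES (?, ?, ?)",
    [
        ("Brewing coffee at home", "Notes on grinders and kettles", "brewing-at-home"),
        ("Weekend notes", "A walk, a film, a bike ride", "coffee-shop-visit"),
        ("About this site", "Who I am and how to reach me", "about"),
    ],
)


def search(query):
    """Return matching titles, best match first.

    bm25() takes one weight per column, in declaration order. It returns
    lower (more negative) values for better matches, so ascending order
    puts the best result first.
    """
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? "
        "ORDER BY bm25(pages, 10.0, 5.0, 1.0)",
        (query,),
    )
    return [row[0] for row in rows]


results = search("coffee")
```

Here the page with "coffee" in its title should rank above the page that only has it in the slug, because the title column carries ten times the weight.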
#
capjamesg
Performance is great on a small scale. I can't speak for growing it but for me it works really well.
hendursaga joined the channel
#
aaronpk
capjamesg++ very cool
#
Loqi
capjamesg has 3 karma in this channel over the last year (4 in all channels)
#
aaronpk
I need to replace elasticsearch with this on my site since it's become a bit of a burden
#
capjamesg
I have 338 pages indexed and 245 images indexed. The last crawl took 02:28 (on a low-powered cloud machine).
#
capjamesg
(But a lot more goes on than just adding pages to the DB :) )
#
sknebel
aaronpk: whatever database you are using probably has something similar, pretty much everyone has added something in that direction
#
sknebel
(although I don't know if you currently feed the content into the DB or just metadata)
#
capjamesg
One thing I think could be interesting is a "links" search that lets you search through external links.
#
capjamesg
Not that I have many though.
#
aaronpk
Last time I looked MySQL didn't have a good full text search like Postgres has. But it's been a few years
#
aaronpk
and no, currently the full text of my posts is not in MySQL
#
capjamesg
Postgres' full text search looks interesting.
#
capjamesg
They have ranking / balancing too.
martijnvdven[d], [tw2113_Slack_], [pfefferle], [grantcodes] and hendursaga joined the channel
#
petermolnar
postgres is powerful, but sqlite imo is a way nicer solution, given its portability.
[schmarty] joined the channel
#
[schmarty]
Would love to see an IndieWeb self-hosted search tool that leverages this https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/
#
Zegnat
I love SQLite for search. I also used in-memory SQLite with full-text search to combine multiple searches into one. (Run search queries on multiple indieweb sites, push all the results into SQLite, then rerun the search query and sort by rank.)
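The approach Zegnat describes could look roughly like the sketch below. It assumes the per-site results have already been fetched and reduced to `(url, title, summary)` tuples; that tuple shape, the table name, and the function name are hypothetical. It also assumes an SQLite build with FTS5.

```python
import sqlite3


def merge_and_rank(query, result_sets):
    """Combine results from several sites and re-rank them locally.

    result_sets: an iterable of lists of (url, title, summary) tuples,
    one list per site that was queried.
    """
    conn = sqlite3.connect(":memory:")
    # The URL is stored but not indexed; only title and summary are searched.
    conn.execute(
        "CREATE VIRTUAL TABLE merged USING fts5(url UNINDEXED, title, summary)"
    )
    for results in result_sets:
        conn.executemany(
            "INSERT INTO merged (url, title, summary) VALUES (?, ?, ?)", results
        )
    # Rerun the same query against the merged set; `rank` is FTS5's
    # built-in BM25-based ordering, best match first.
    rows = conn.execute(
        "SELECT url FROM merged WHERE merged MATCH ? ORDER BY rank", (query,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Note that the query string is passed straight to `MATCH`, so input containing FTS5 query syntax (quotes, `AND`, `*`) would need escaping in real use.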
#
petermolnar
see my call for a standard for publishing search corpus for a static site in sqlite from a few days ago
#
capjamesg
schmarty technically -- and this is a leap -- my crawler could run on any site.
#
capjamesg
I'd have to make a few adjustments but it has robots.txt / sitemap identification support so it would work on any well-structured site.
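As an illustration of what robots.txt and sitemap identification can involve (not capjamesg's crawler code), Python's standard `urllib.robotparser` can answer both questions. The sample robots.txt content, the `example-crawler` user agent, and the `read_robots` helper are assumptions made for this sketch; a real crawler would fetch `/robots.txt` from the site first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""


def read_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = read_robots(SAMPLE_ROBOTS)

# May this crawler fetch a given URL?
allowed = parser.can_fetch("example-crawler", "https://example.com/posts/1")
blocked = parser.can_fetch("example-crawler", "https://example.com/private/x")

# Sitemap URLs declared in robots.txt (None if there are none; Python 3.8+).
sitemaps = parser.site_maps()
```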
#
capjamesg
But it's a whole flask app :D
#
Zegnat
Note that the chat search (https://indiechat.search.cweiske.de/) run by cweiske uses a self hostable and open source engine as well: https://github.com/cweiske/phinde
#
capjamesg
Yeah. cweiske wrote a good blog post on that.
#
Zegnat
petermolnar: I must have missed that, I declared chat bankruptcy on -dev
#
Loqi
[petermolnar] (from -chat) re search: I wonder if there's a way to somehow "distribute" search. There's the whole opensearch idea, but now that I'm thinking about it, it wouldn't easily help to index pages at all. If instead there was a way to pre-generate an inde...
#
Zegnat
Interesting idea.
capjamesg joined the channel
#
Zegnat
Though I still like my previous experiment. I would scan a site for their opensearch. Then if I want to search multiple sites, I run the same search query against all their opensearch endpoints and mix the results together. That way the “search engine” never needs to have the data and you only need to make sure your own local search is up to date
#
capjamesg
What is opensearch?
#
Loqi
OpenSearch is a specification for search engine discovery and syndication of search results https://indieweb.org/OpenSearch
#
Zegnat
(opensearch might not actually be the best for this, but given as an example standard)
#
capjamesg
Is opensearch still supported by major browsers Zegnat?
#
petermolnar
it is in FF
#
petermolnar
although FF is less and less of a major browser
#
petermolnar
Zegnat: the opensearch approach would be very nice for a search over a small number of sites. I was aiming at how we could empower "indie" search engines that could easily be put together by, say, checking a meta header for a search db, pulling the db, and using it as another source. For example to create a search for all sites ever authed with indieauth.
#
Zegnat
I guess I do not know if I would trust that to hold longterm in any way. While the opensearch route works as people (hopefully) test their own local searches more. But yeah, definitely interesting.
#
petermolnar
the problem was highlighted in the response: trust.
#
petermolnar
but that's always a problem.
#
petermolnar
capjamesg next level: add opensearch :)
#
Zegnat
You could also put the micropub JSON or microformats2 parser output of every public post in a zip file. Or really, any export format could work. I am not sure if we are talking about a search specific thing at that point, or if we should just think of it as a making-available of all the public contents in one dump
#
petermolnar
the sqlite db is full text search and indexed already
#
petermolnar
unlike the other formats
#
Zegnat
Sure, but nearly the same comments apply. I am not sure it matters whether the indexing work has already run on my machine before making the dump available, or if you have to do it?
#
Zegnat
I am going to think a bit more about it on my walk home from the office, back later :D
#
capjamesg
petermolnar any good docs to take a look at?
#
petermolnar
what is opensearch?
#
Loqi
OpenSearch is a specification for search engine discovery and syndication of search results https://indieweb.org/OpenSearch
#
petermolnar
plus there's https://github.com/petermolnar/petermolnar.net/blob/master/opensearch.xml , the meta entry as `<link rel="search" type="application/opensearchdescription+xml" href="/opensearch.xml" title="petermolnar.net">` and see the XML section of https://github.com/petermolnar/petermolnar.net/blob/master/search.php
capjamesg joined the channel
#
capjamesg[d]
Thanks petermolnar.
#
capjamesg[d]
Loving the Discord bridge by the way petermolnar 🙂
IWSlackGateway, lahacker[d] and [schmarty] joined the channel
#
[schmarty]
gRegorLove and others, WP 5.8 finally has the plugin feature to specify where updates come from (so private plugins can't be so easily squatted) https://make.wordpress.org/core/2021/06/29/introducing-update-uri-plugin-header-in-wordpress-5-8/
[calumryan] joined the channel
#
Zegnat
Nice to see that ticket getting some closure
#
Zegnat
petermolnar: I think the main issue I bumped into with OpenSearch was that it was not clear if there was a standardised way to retrieve the results
#
GWG
[schmarty]: gRegor knows and was thrilled
#
Zegnat
petermolnar: for https://github.com/Zegnat/indiewebsearchengineproofofconcept I defaulted to using only WP sites, which actually had a search API. But if the same could be done with OpenSearch, that would be awesome
capjamesg, [arush] and [tw2113_Slack_] joined the channel
#
petermolnar
in theory, my search.php will return values for wp-like search
#
petermolnar
that's the reason why I have the json portion of that file
#
petermolnar
except it's not under the rest route
#
petermolnar
I can probably fix that though
#
petermolnar
oh, look what I found: http://whatnamespace.net/#opensearch - in case Zegnat wants to express something that ends up impossible with h- things
#
Zegnat
uMatrix is blocking the opensearch link if I try to open it in the browser :P
gRegor joined the channel
#
petermolnar
capjamesg, Zegnat: the <link> pointing at the opensearch xml tells someone where the rest of the opensearch meta info is; in that meta info (the xml itself) there's a definition of where to go to fetch search results and what response to expect. Following the preferred URL along with the search params will return the results.
#
petermolnar
So if one were to modify Zegnat's code to fetch the opensearch xml and then pick a Url from there it should work.
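A rough sketch of that flow: read the OpenSearch description document, pick a `Url` element, and fill in its template. The sample XML, the `example.com` URLs, and the `search_url` helper are made up for illustration; the namespace and the `{searchTerms}` placeholder come from the OpenSearch 1.1 description format. Fetching the XML and the results over HTTP is left out.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace of OpenSearch 1.1 description documents.
OS_NS = "{http://a9.com/-/spec/opensearch/1.1/}"

# Hypothetical description document, as it might be served at /opensearch.xml.
SAMPLE_DESCRIPTION = """\
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>example</ShortName>
  <Url type="text/html" template="https://example.com/search?q={searchTerms}"/>
  <Url type="application/json"
       template="https://example.com/search?q={searchTerms}&amp;json=1&amp;page={startPage?}"/>
</OpenSearchDescription>
"""


def search_url(description_xml, terms, preferred_type="application/json"):
    """Build a search URL from an OpenSearch description document.

    Prefers a Url element of `preferred_type`, falling back to the first
    Url listed. Returns None if the document declares no Url at all.
    """
    root = ET.fromstring(description_xml)
    urls = root.findall(f"{OS_NS}Url")
    if not urls:
        return None
    chosen = next((u for u in urls if u.get("type") == preferred_type), urls[0])
    template = chosen.get("template", "")
    url = template.replace("{searchTerms}", quote(terms, safe=""))
    # Optional parameters such as {startPage?} are simply left empty here.
    return re.sub(r"\{[^{}]*\?\}", "", url)
```

This only gets as far as the URL to request. As the conversation goes on to note, the format of what comes back (RSS, Atom, or some JSON shape) is the part that is not standardised in practice.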
#
Zegnat
I think I started with that, actually, during the indiewebcamp. But very few of the sites had opensearch, and even if they had, there was no real way to get the search results because I could not find a standardised results format
#
petermolnar
well... the standard is probably whatever WP returns.
#
petermolnar
for JSON.
#
petermolnar
for XML... there's probably a namespace for that.
#
Zegnat
"there’s probably a namespace for that" is something I need on a tshirt
jeremycherfas joined the channel
#
[tw2113_Slack_]
teehee
#
petermolnar
after some brief research, most of the xml results are either rss or atom.
#
petermolnar
and yes, that t-shirt is a must.
jeremycherfas and IWSlackGateway joined the channel
#
gRegor
+1 would buy
rockorager, gRegor and jeremycherfas joined the channel