#dev 2021-07-16

2021-07-16 UTC
[tw2113_Slack_], angelo_, Seirdy, [jacky], jeremycherfas, KartikPrabhu, jjuran, capjamesg and hendursaga joined the channel
#
@mauricerenck
If you use one of my @getkirby plugins, this may be interesting for you. I wrote down how I will continue working on the Podcaster, Komments and Webmention plugins. There will be breaking changes and some new features: https://maurice-renck.de/blog/2021/update-kirby-plugins
(twitter.com/_/status/1415941856203051008)
petermolnar and hendursaga joined the channel
#
capjamesg
More of a CS question than anything, but I'm wondering about the efficiency of Python's "in" keyword. Are there any resources on this?
#
capjamesg
If I had a list of, say, 1000 items, would a binary search be faster than in?
#
oenone
for lists, average would be O(n) for in: https://wiki.python.org/moin/TimeComplexity
#
capjamesg
Thanks oenone.
#
sknebel
The kind of thing I generally wouldn't worry about unless you have a profiling result showing it matters
#
sknebel
(performance-tuned python gets kind of ugly quickly. I've done it a bit for mf2py to help bridgy performance, but only after measuring ;))
capjamesg, alex11, alex_, jjuran_, Saphire and [KevinMarks] joined the channel
#
[KevinMarks]
with python, you're better off switching data structure to one that suits the task - dictionary or set are O(1) for in
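An illustrative sketch of the difference (the URLs and counts here are made up, not from the discussion):

```python
import bisect

# Hypothetical data: 1000 URLs, similar to the example above.
urls_list = [f"https://example.com/post-{i}" for i in range(1000)]
urls_set = set(urls_list)
target = "https://example.com/post-999"

target in urls_list   # list `in` scans elements one by one: O(n) on average
target in urls_set    # set `in` is a hash lookup: O(1) on average

# The binary-search alternative needs a sorted list and is O(log n):
sorted_urls = sorted(urls_list)
i = bisect.bisect_left(sorted_urls, target)
found = i < len(sorted_urls) and sorted_urls[i] == target
```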
#
capjamesg
KevinMarks. Very good.
#
capjamesg
The use case was to crawl all links on a page on my blog for my search engine. If one is discovered that is not in the sitemap, that link should be added to the to-crawl list.
#
capjamesg
But doing so involved checking if a link had already been crawled first before adding to the to-crawl list.
#
capjamesg
Which I could achieve by using a dictionary.
#
capjamesg
So that would definitely be more efficient.
#
[KevinMarks]
yes, and set or dictionary dedupe too
#
[KevinMarks]
same pattern works in javascript as well, but js objects can be a bit more complicated than python dicts
#
[KevinMarks]
a common pattern is reference counting, so you do `d[url]=d.get(url,0)+1` then you have a dict with url and number of references to it, which you can then maybe sort by count to decide which to crawl first
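A small sketch of that counting pattern with hypothetical data, sorted most-referenced first:

```python
# Hypothetical list of links found while crawling (the duplicates are the point).
discovered_links = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
]

references = {}
for url in discovered_links:
    references[url] = references.get(url, 0) + 1  # the d[url] = d.get(url, 0) + 1 pattern

# Sort URLs by how often they were referenced, most-linked first.
crawl_order = sorted(references, key=references.get, reverse=True)
```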
[Will_Monroe] and [tw2113_Slack_] joined the channel
#
capjamesg
Since this is just for my blog -- and the crawl time is only a few mins -- I don't think reference counting is necessary. But that is very, very interesting. I'll keep that in mind!
[snarfed] joined the channel
#
[snarfed]
anyone good at working with images? I’m looking for a way to automatically generate an alpha channel for a given image based on a background color, eg based on white in http://localhost/bridgy_logo.jpg, and get a full alpha channel (eg in a PNG version), not just on/off. any ideas?
#
[snarfed]
(I don’t have Photoshop etc, so ideally command line, eg ImageMagick, or web based)
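One command-line-free possibility, sketched here with Pillow rather than ImageMagick: assume a dark-foreground-on-white image and use inverted brightness as opacity. That gives a graded (full) alpha rather than an on/off mask, but it is only a rough approximation, and not how the logo was actually handled (see Affinity Photo below).

```python
from PIL import Image, ImageOps

# Hypothetical filenames. Assumes dark artwork on a white background.
img = Image.open("bridgy_logo.jpg").convert("RGB")

# Inverted brightness becomes the alpha channel: white -> fully transparent,
# black -> fully opaque, greys in between -> partial transparency.
alpha = ImageOps.invert(img.convert("L"))
img.putalpha(alpha)
img.save("bridgy_logo.png")
```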
KartikPrabhu joined the channel
#
aaronpk
I think I have done that with the bridgy logo before actually
#
[snarfed]
ahaha yes i actually figured, and looked at the b&w version on the bottom of https://indieweb.org/
#
[snarfed]
any chance you remember how?
#
aaronpk
all cleaned up
capjamesg joined the channel
#
aaronpk
done in Affinity Photo, my attempt to use less Adobe products :)
#
[snarfed]
aaronpk++ woo thank you!!!
#
Loqi
aaronpk has 50 karma in this channel over the last year (129 in all channels)
#
capjamesg
Any good reads on how search engines decide when to crawl content? I'm implementing recursion to immediately crawl any URL that is discovered. But this doesn't seem elegant or like the most practical solution.
#
capjamesg
Before this, I stored all URLs in a dict but I couldn't change it because you can't change a dict in a for loop in Python.
#
[snarfed]
wait, what? sure you can
#
capjamesg
(you can't change a dict you are iterating over already)
#
capjamesg
Basic logic is this: find URLs in sitemap, crawl page, if new URL is discovered I want to add it to the "to crawl" list, add information about page to DB, go on to next. And so on and so on.
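A rough sketch of that loop as a queue plus a set of already-seen URLs; `fetch_links` and `save_page` are hypothetical stand-ins for the actual fetching and DB code:

```python
from collections import deque

def crawl(sitemap_urls, fetch_links, save_page):
    """fetch_links(url) -> iterable of URLs found on the page; save_page(url) stores it."""
    to_crawl = deque(sitemap_urls)   # pages still to visit, in discovery order
    seen = set(sitemap_urls)         # every URL ever queued, so nothing is crawled twice

    while to_crawl:
        url = to_crawl.popleft()
        save_page(url)
        for link in fetch_links(url):
            if link not in seen:     # O(1) membership check on the set
                seen.add(link)
                to_crawl.append(link)
```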
#
capjamesg
The "to crawl" dict is already being iterated over
#
[snarfed]
ah. you can, it just may change the iteration
#
[snarfed]
no matter
#
capjamesg
I was getting a runtime error.
#
capjamesg
dictionary changed size during iteration
#
capjamesg
Ah, figured it out.
#
capjamesg
I now use a list to iterate and that list refers back to objects in the dict (which I need for in statements later).
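A minimal sketch of the error and that workaround, with made-up data:

```python
pages = {"https://example.com/": "crawled"}

# Adding a key while looping over the dict itself raises
# "RuntimeError: dictionary changed size during iteration":
#
#   for url in pages:
#       pages["https://example.com/new"] = "to crawl"

# Looping over a list snapshot of the keys avoids that; the dict can still
# grow inside the loop and remains available for later `in` checks.
for url in list(pages):
    pages.setdefault("https://example.com/new", "to crawl")
```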
#
capjamesg
snarfed++
#
Loqi
snarfed has 24 karma in this channel over the last year (48 in all channels)
#
capjamesg
(for making me challenge my logic)
#
@DocFreemo
RT @wiktor@metacode.biz Keyoxide could’ve supported IndieAuth tags: https://codeberg.org/keyoxide/keyoxide-web/issues/97 This would allow you to login to OpenID sites using your OpenPGP key (I did that to leave authenticated Wordpress comments). In general there are quite some interesting features (1/4)
(twitter.com/_/status/1416066739096457220)
capjamesg joined the channel
#
capjamesg
Well, thanks to my tendency to overengineer, my blog search engine now supports reading robots.txt directives and link discovery within articles (even though all of my links are always in my sitemap).
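For the robots.txt side, the standard library handles the directive parsing; a sketch with hypothetical URLs and user agent:

```python
import urllib.robotparser

robots = urllib.robotparser.RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()  # fetch and parse the file

# Only queue a discovered link if robots.txt allows this crawler to fetch it.
if robots.can_fetch("my-search-crawler", "https://example.com/some-page/"):
    pass  # add to the to-crawl list
```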
#
capjamesg
And now my code is quite messy so I need to tidy it.
[KevinMarks], [aciccarello], [schmarty] and alex11 joined the channel
#
capjamesg
I have a whole lot more respect for all the work that goes into Google now.
#
capjamesg
And more respect for Copilot by GitHub, which, to my surprise, has been a bit helpful.
[schmarty] and barnaby joined the channel
#
capjamesg
What is Lynx?
#
Loqi
It looks like we don't have a page for "Lynx" yet. Would you like to create it? (Or just say "Lynx is ____", a sentence describing the term)
#
capjamesg
Lynx is a text-based browser that has been in development since 1992.
alex11, barnaby, capjamesg, [jacky], [jeremycherfas], [jeremycherfas]1 and [KevinMarks] joined the channel
#
angelo_
what is SingleFileZ?
#
Loqi
It looks like we don't have a page for "SingleFileZ" yet. Would you like to create it? (Or just say "SingleFileZ is ____", a sentence describing the term)
#
angelo_
SingleFileZ is a tool that allows you to save a webpage as a self-extracting HTML file.
angelo joined the channel
#
Loqi
ok, I added "https://github.com/gildas-lormeau/SingleFileZ" to a brand new "See Also" section of /SingleFileZ https://indieweb.org/wiki/index.php?diff=76429&oldid=76428