capjamesg The use case was to crawl all links on a page of my blog for my search engine. If a link is discovered that is not in the sitemap, it should be added to the to-crawl list.
[KevinMarks] a common pattern is reference counting, so you do `d[url] = d.get(url, 0) + 1`; then you have a dict mapping each URL to the number of references to it, which you can then sort by count to decide which to crawl first
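(A quick sketch of the pattern KevinMarks describes, in Python; the `links` list here is a hypothetical input of discovered URLs, not from the chat:)

```python
# Count how many times each URL is referenced.
links = ["/a", "/b", "/a", "/c", "/a"]  # hypothetical discovered links
d = {}
for url in links:
    d[url] = d.get(url, 0) + 1

# Sort URLs by reference count, most-referenced first, to prioritize crawling.
crawl_order = sorted(d, key=d.get, reverse=True)
print(crawl_order)  # ['/a', '/b', '/c']
```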
[Will_Monroe] and [tw2113_Slack_] joined the channel
capjamesg Since this is just for my blog -- and the crawl time is only a few mins -- I don't think reference counting is necessary. But that is very, very interesting. I'll keep that in mind!
[snarfed] anyone good at working with images? I’m looking for a way to automatically generate an alpha channel for a given image based on a background color, e.g. based on white in http://localhost/bridgy_logo.jpg, and get a full alpha channel (e.g. in a PNG version), not just on/off. any ideas?
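(One possible approach, sketched with Pillow and numpy; neither library nor the white-background default comes from the chat. The idea is to map each pixel's color distance from the background to a graded alpha value rather than an on/off mask:)

```python
from PIL import Image
import numpy as np

def alpha_from_background(path, background=(255, 255, 255)):
    """Return an RGBA image whose alpha grades with distance from `background`."""
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=float)
    # Euclidean distance of each pixel from the background color (0..~441).
    dist = np.linalg.norm(pixels - np.array(background, dtype=float), axis=2)
    # Pixels matching the background become fully transparent; pixels more
    # than 255 units away become fully opaque, with a smooth ramp in between.
    alpha = np.clip(dist, 0, 255).astype(np.uint8)
    rgba = np.dstack([pixels.astype(np.uint8), alpha])
    return Image.fromarray(rgba, mode="RGBA")

alpha_from_background("bridgy_logo.jpg").save("bridgy_logo.png")
```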
capjamesg Any good reads on how search engines decide when to crawl content? I'm using recursion to immediately crawl any URL that is discovered, but that doesn't seem elegant or like the most practical solution.
capjamesg Basic logic is this: find URLs in the sitemap, crawl each page, add any newly discovered URL to the "to crawl" list, add information about the page to the DB, and go on to the next. And so on.
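(A minimal sketch of that loop in Python, replacing the recursion with a work queue; requests and BeautifulSoup are assumptions here, not capjamesg's actual stack:)

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(start_urls, domain):
    """Crawl start_urls breadth-first, queueing newly discovered same-domain links."""
    to_crawl = deque(start_urls)   # the "to crawl" list, seeded from the sitemap
    seen = set(start_urls)
    while to_crawl:
        url = to_crawl.popleft()
        html = requests.get(url, timeout=10).text
        # ... add information about the page to the DB here ...
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"])
            # Queue links we haven't seen yet, staying on the same domain.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                to_crawl.append(link)

crawl(["https://example.com/"], "example.com")  # hypothetical starting point
```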
capjamesg Well, thanks to my tendency to overengineer, my blog search engine now supports reading robots.txt directives and link discovery within articles (even though all of my links are always in my sitemap).
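(For the robots.txt part, Python's standard library covers the basics; a sketch, not necessarily what capjamesg built, with the user agent name and URLs as placeholders:)

```python
from urllib import robotparser

rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetch and parse the robots.txt file

# Check the directives before crawling a URL.
if rp.can_fetch("my-search-crawler", "https://example.com/some-page/"):
    ...  # safe to fetch this page
```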
Loqi It looks like we don't have a page for "SingleFileZ" yet. Would you like to create it? (Or just say "SingleFileZ is ____", a sentence describing the term)