#dev 2022-07-18

2022-07-18 UTC
nertzy, petermolnar, alex11, jonnybarnes, lagash, geoffo, [Murray], tetov-irc, [KevinMarks] and AramZS joined the channel; petermolnar left the channel
capjamesg_
I want to run the IndieWeb Search crawler across multiple computers. How can I do this?
capjamesg_
I know very little about deployment, but I think k8s / Docker might be in the realm of what I'm looking for.
capjamesg_
Basically, I want an easy way to control computers running the crawler so I can suspend / rebuild / start crawlers easily.
[KevinMarks]
what's it written in?
capjamesg1 joined the channel
capjamesg1
Python
jonnybarnes and [manton] joined the channel
[manton]
Docker and Kubernetes may be overkill to start with. Is it a script you run? I would start by manually running the processes you need on multiple servers and then automate the management of them when you need to. (This is Ruby-specific, but I use Sidekiq across a few servers to manage background tasks. Probably something similar for Python.)
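For Python, Celery is a common analogue to Sidekiq here. A minimal sketch of the pattern, assuming a Redis broker and a hypothetical crawl_domain task, neither of which is confirmed to match the actual IndieWeb Search setup:

```python
# tasks.py: a sketch of distributing crawl work with Celery.
# The broker URL and the task name are illustrative assumptions.
from celery import Celery

app = Celery("crawler", broker="redis://queue-host:6379/0")

@app.task
def crawl_domain(domain: str) -> None:
    # Placeholder: fetch and index the pages for one domain.
    ...
```

Each machine would run `celery -A tasks worker`, and a coordinator would enqueue work with `crawl_domain.delay("example.com")`; suspending a crawler is then just stopping that worker process.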
aaronpk
Agreed, it's more a matter of how you organize tasks than an ops or language question
[manton]
Yep. I think we’d need to know more about your current setup… Also, if you’re not already running multiple crawlers on a single server, that would be another place to start.
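"Multiple crawlers on a single server" can be as simple as spawning several worker processes from one script. A sketch, where worker_main is a hypothetical entry point for a single crawler instance:

```python
# run_workers.py: run several crawler processes on one machine.
from multiprocessing import Process

def worker_main(worker_id: int) -> None:
    # Placeholder: start one crawler instance, identified by worker_id.
    ...

if __name__ == "__main__":
    workers = [Process(target=worker_main, args=(i,)) for i in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```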
GWG
[manton]: How goes the location stuff?
[manton]
Pretty good, I think! Here’s a video sneak peek of my testing: https://www.manton.org/uploads/2022/b93d55f153.mov
GWG
[manton]: Fun. Also, new jsonfeed plugin for WordPress... now with next_url
[manton]
@GWG I saw that, great!
GWG
Maintaining it is not that big a chore
GWG
My latest work is calendar related
GWG
https://wpdev.gwg.us/5782/?jthdate=1 - I am still writing unit tests so this isn't in production yet
GWG
Nor does it have a pretty URL yet
capjamesg_
[manton] The crawler is multi-threaded, so the number of instances I can run is limited by resources.
capjamesg_
I have a message queue set up for relaying domains to crawl, and some other things.
capjamesg_
I can, after adding the right config, set up a crawler and have it send data to the main system from any machine. But I wanted to figure out how to automate this.
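With that architecture, each crawler can be a small loop that blocks on the shared queue. A sketch assuming Redis as the queue; the host, the list name, and the crawl function are all hypothetical:

```python
# worker.py: one crawler instance pulling domains off a shared queue.
import redis

def crawl(domain: str) -> None:
    # Placeholder: fetch the domain's pages and send results to the main system.
    ...

r = redis.Redis(host="queue-host", port=6379, decode_responses=True)
while True:
    _, domain = r.blpop("domains-to-crawl")  # blocks until a domain is queued
    crawl(domain)
```

Automating the rest is then mostly provisioning: copy the config to each new machine, run worker.py under a process supervisor such as systemd or supervisord, and stop the service to suspend that crawler.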
capjamesg_
Also for learning so :)
GWG
I was expecting snarfed would pop up because I mentioned unit tests
alex11 and jacky joined the channel
jacky
I got my foursquare export data!
jacky
[manton]: if it's helpful, I can send it to you for testing
jacky
I have no problem with that
jacky
it's quite small (1.1 MB)
jacky
this is what the file listing looks like https://imgur.com/a/yTqzKsp
geoffo and jonnybarnes joined the channel
[manton]
[jacky] I'm glad it's working for someone! No need to send it, but thanks. I have an old one from a few years ago… Maybe something is busted with my account now.
jacky
gotcha!
jacky, jonnybarnes, [tantek], win0err, [KevinMarks]1, [Murray], tetov-irc and [tw2113_Slack_] joined the channel