#capjamesg[d]Odd question: does anyone know anything about ideal cloud computing for compute-heavy operations (preferably something that will not break the bank)?
tetov-irc, rockorager and jamietanna joined the channel
#jamietannacapjamesg[d] give hetzner.cloud or Scaleway a go for potentially cheaper - but often to get compute heavy, it's gonna be fairly expensive :(
#sknebelcapjamesg[d]: can you be more specific? do you need something to run 24/7, what limit are you hitting right now, ...
#capjamesg[d]sknebel I am just thinking about scale.
#capjamesg[d]The way things are going I could feasibly index a few thousand pages per hour with the PythonAnywhere server I am using.
#capjamesg[d](But that server is limited by CPU usage so I'm only using it to relieve my computer of some work)
#capjamesg[d]This is indexing from IndieMap WARC files, not the web itself.
#capjamesg[d]Say I wanted to index 100 IndieWeb sites. That would take at least 10 days 😄 And that's from a file, not the web 🙂
#sknebelok, but python anywhere probably is fairly limited, so I'd just start with a $5-$10 small-ish VPS somewhere and see how fast that goes
#sknebel(when I hear "compute-heavy" my mind went a few levels larger ;))
#sknebelingesting WARCs is potentially something you could do more optimized when going deep into the options of some cloud offering, but tbh that sounds like more hassle to me
#sknebelso hetzner cloud or something sounds like a good starting point
#sknebel(if someone knows how they'd do that kind of thing with a more cloud-focussed setup I'd be curious though, I have no real sense how those compare at small scale)
#capjamesg[d]Ingesting from WARC is fast(ish) compared to actually crawling documents.
#capjamesg[d]Because then request time / processing needs to be taken into account.
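To illustrate why reading from a WARC file avoids per-request latency, here is a minimal sketch of iterating records from a local file using only the standard library. It assumes an uncompressed WARC stream with well-formed `Content-Length` headers; `iter_warc_records` is a hypothetical helper, not something from the discussion, and a dedicated library such as warcio would be more robust for real archives (which are usually gzip-compressed).

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, body) for each record in an uncompressed WARC stream.

    Assumption: every record carries a correct Content-Length header.
    Header names are lower-cased so lookups are case-insensitive.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line at record start: {line!r}")

        headers: Dict[str, str] = {}
        while True:
            header_line = stream.readline()
            if header_line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = header_line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()

        # Read the body by length, so blank lines inside it are harmless.
        body = stream.read(int(headers["content-length"]))
        yield headers, body


def make_record(uri: bytes, payload: bytes) -> bytes:
    """Build a tiny WARC record for demonstration purposes only."""
    return (
        b"WARC/1.0\r\n"
        b"WARC-Type: response\r\n"
        b"WARC-Target-URI: " + uri + b"\r\n"
        b"Content-Length: " + str(len(payload)).encode() + b"\r\n"
        b"\r\n" + payload + b"\r\n\r\n"
    )


sample = make_record(b"https://example.com/", b"hello") + make_record(
    b"https://example.org/", b"a\r\nb"
)
records = list(iter_warc_records(io.BytesIO(sample)))
```

Every record is read straight from disk, so the only cost left is parsing and indexing, with no network round trip per document.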
#capjamesg[d]With a bit of Python wizardry I think I can get it down to about 15 mins for 5,000 documents.
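For a rough sense of what that target means, here is a back-of-the-envelope sketch that turns "15 mins for 5,000 documents" into an hourly rate. The function names and the 100,000-document total are hypothetical, chosen only for illustration.

```python
def docs_per_hour(docs: int, minutes: float) -> float:
    """Convert a measured batch (docs processed in `minutes`) to an hourly rate."""
    return docs * 60 / minutes


def hours_to_index(total_docs: int, docs: int, minutes: float) -> float:
    """Estimate the hours needed for `total_docs` at the measured batch rate."""
    return total_docs / docs_per_hour(docs, minutes)


rate = docs_per_hour(5_000, 15)             # 20000.0 documents per hour
eta = hours_to_index(100_000, 5_000, 15)    # 5.0 hours for a hypothetical 100,000 documents
```

This ignores anything that does not scale linearly (disk contention, index growth), so treat it as an optimistic lower bound.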
#capjamesg[d]And my image search engine would take everything to another level.
#capjamesg[d]Because then images need to be downloaded, checked, compared, and optimized before being indexed. But I'm not even going to attempt indexing images.
#rockoragercapjamesg[d]: Linode has $100 credits so you could test that out for free for a few months (I think you have 60 days to use the $100)
#aaronpkIf you're working with warc files do you even need this to be in the cloud? Why not use a desktop computer where it's cheaper to get a fast processor?
#capjamesg[d]Good question. It's more preference and about not wanting to leave my desktop running almost maxed out while these operations go on.
#sknebel(I'm also planning a small crawler project, but will just run that on one of my VPSes for a bit and see how that goes)
#[jeremycherfas]I’ve got myself in a bit of a muddle since GitHub deprecated passwords. I have a PAT working fine on my desktop, but when I use the same PAT on my laptop I cannot authenticate. Is there a simple option? Am I best off creating a new PAT for the laptop only?
#capjamesg[d]I need to index HTML for featured snippet support.
#capjamesg[d]That doesn't work for multiple sites quite yet because the logic was oriented around my site and its structure. But I will index HTML anyway for when that support comes (i.e. for "who is" queries to show h-cards).
#GWG aaronpk Do you just manually edit the html for the spec? Or is there a trick to it?
#ZegnatI actually mess up spec edits constantly. You change public/source/index.php (and maybe some other file in /source), then you have to get the output HTML of that file and put it in public/spec/index.html
#ZegnatSo for every single change you have to edit at least 2 files
#[fluffy]Are there any codified standards for what an IndieAuth profile should include? I see that @rattroupe has added support for outgoing profiles but I don’t see any authoritative/canonical list of what fields are expected to be presented, just a vague-ish example in the IndieAuth spec itself.
#[fluffy]Authl has been requesting the profile scope for quite some time now but doesn’t actually make use of any fields returned because it just gets it from the h-card on the profile page.
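Since no canonical field list is established here, a client might read the profile defensively. This is a minimal sketch, assuming the response carries an optional `profile` object whose fields follow the example in the IndieAuth spec (`name`, `url`, `photo`, `email`); `extract_profile` is a hypothetical helper, and none of these fields is assumed to be present.

```python
from typing import Any, Dict, Optional

# Assumption: field names taken from the example in the IndieAuth spec;
# a server may return any subset of them, or no profile at all.
KNOWN_FIELDS = ("name", "url", "photo", "email")


def extract_profile(response: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Return the known profile fields, using None for anything missing or malformed."""
    profile = response.get("profile")
    if not isinstance(profile, dict):
        profile = {}
    return {
        field: profile[field] if isinstance(profile.get(field), str) else None
        for field in KNOWN_FIELDS
    }


example = {
    "me": "https://user.example.net/",
    "profile": {"name": "Example User", "url": "https://user.example.net/"},
}
fields = extract_profile(example)
```

Any field that comes back as `None` can then fall back to another source, such as the h-card on the profile page.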