#dev 2021-07-26

2021-07-26 UTC
jeremycherfas, samwilson, KartikPrabhu, stevestreza, Saphire, capjamesg and hendursa1 joined the channel
#
capjamesg
I know the webmention spec says that you have to process webmentions async.
#
capjamesg
Is it bad if I don't do that?
#
capjamesg
I have an almost-working webmention receiver ready but I can't do anything with it because I cannot figure out the async part.
[pfefferle] joined the channel
#
[pfefferle]
it is not “have to”
#
[pfefferle]
as far as I know, the spec recommends doing it asynchronously because of DoS attacks
#
[pfefferle]
the wordpress plugin is currently processing synchronously
#
capjamesg
Could those be mitigated by some kind of rate limit that occurs before the extensive validation pfefferle?
#
capjamesg
So the workflow would be: user makes HTTP POST request, basic validation occurs, if more than 10 requests made in last hour, block (unless from trusted party), then do the rest of validation.
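(Editor's sketch of the rate-limit step described above, not capjamesg's actual code. It assumes an in-memory sliding window keyed by the sending host; the names `allow`, `TRUSTED`, and `trusted.example` are hypothetical, and the limits are the "10 requests in the last hour" from the message.)

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600          # "in last hour" from the message above
MAX_REQUESTS = 10              # "more than 10 requests" gets blocked
TRUSTED = {"trusted.example"}  # hypothetical allow-list of trusted senders

# Timestamps of recent accepted requests, per sender (in memory only).
_recent = defaultdict(deque)


def allow(sender, now=None):
    """Return True if this sender may proceed to the expensive validation."""
    now = time.time() if now is None else now
    if sender in TRUSTED:
        return True
    hits = _recent[sender]
    # Drop timestamps that have fallen out of the one-hour window.
    while hits and now - hits[0] >= WINDOW_SECONDS:
        hits.popleft()
    if len(hits) >= MAX_REQUESTS:
        return False
    hits.append(now)
    return True
```

(In a Flask handler this check would run right after the basic validation and before any fetching of the source URL. The counters live in process memory, so they reset on restart and are not shared between worker processes.)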
#
[KevinMarks]
The slow (or rather unknown, async appropriate) part is fetching the source link and parsing it. The DoS problem is sending you a link to a large or slow source that you need to fetch and parse.
hendursaga and chenghiz_ joined the channel
#
[snarfed]
capjamesg in practice, afaik we haven’t seen any actual webmention DoS attacks yet. if async isn’t easy, i’d recommend you ignore it for now and revisit if you actually need it. plenty of us process wms synchronously, and async is only one of many techniques for scaling
KartikPrabhu joined the channel
#
superkuh
async makes it a lot easier to detect spam.
#
superkuh
Batching it.
#
capjamesg
snarfed good to know!
#
capjamesg
Async isn't easy with Flask as far as I can tell.
#
capjamesg
I see there is async/await syntax but because Flask / WSGI is by default single threaded it's hard to do async work.
#
[snarfed]
yeah you generally need to do it somewhere else architecturally, not within user-facing requests, regardless of how
#
capjamesg
Indeed.
#
capjamesg
So it's fine just to do the URL processing in thread?
#
capjamesg
It might mean a request takes a few secs extra but it's a lot easier than async (and means I can actually launch the endpoint!).
#
capjamesg
[KevinMarks] would adding a timeout help?
justache joined the channel
#
[snarfed]
capjamesg oh no I meant generally outside a thread, eg in a task processing queue like celery or wp-cron or PHP’s thing I forget
#
[snarfed]
most web servers and frameworks don’t have good ways to let you continue processing after you’ve returned a response, including threaded
#
[snarfed]
but feel free to try!
#
capjamesg
snarfed any good reads on celery? I couldn't grasp it the first time around.
#
[snarfed]
capjamesg hmm, not offhand. it’s probably overkill for a personal site though. feel free to try it if you want to learn!…but if you just want to get something working, I’d start with cron and a standalone script
shoesNsocks joined the channel
#
@askRodney
Here's how you can bring your own data in Gatsby, adding it to the GraphQL data layer. You see an example with the Webmentions. You can ♻️ upcycle the code here for other APIs. Hope you find it useful. https://rodneylab.com/add-data-gatsby-graphql/ #askRodney #gatsbyjs #Webmentions @GatsbyJS #GraphQL
(twitter.com/_/status/1419692974905217032)
#
capjamesg
snarfed tell me more.
#
capjamesg
Anything I can do to avoid implementing a redis and celery broker setup would be great.
#
capjamesg
So I would have a cron job that runs every, say, 5 mins. That job would execute a Python script that validates the webmention.
#
capjamesg
If the webmention is valid, it is marked as so in the db.
[jacky] joined the channel
#
[jacky]
could drop the info of the requests into a file folder if a db isn't your fancy
#
[jacky]
SQLite is always there (the only dependency is the static library and from there, you're breezing)
#
[jacky]
I've definitely used SQLite as a caching tool in some work apps that had similar constraints
#
[pfefferle]
[capjamesg] that is how WP-Cron works… when a webmention comes in, you schedule an event, that will be triggered on the next (wp-)cron run
#
capjamesg
pfefferle good to know! amazing!
#
[pfefferle]
and scheduling an event means adding an entry to the MySQL db
#
capjamesg
I can use sqlite3 and a "pending" attribute to distinguish processed vs. pending webmentions.
#
capjamesg
And use the cron job to execute a script that processes / validates all pending webmentions.
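(Editor's sketch of the sqlite3 "pending" queue described above, under stated assumptions: the table layout, the function names, and the `validate` callback are all hypothetical. The endpoint would only call `enqueue`; a cron-run script would call `process_pending`.)

```python
import sqlite3


def open_db(path="webmentions.db"):
    """Open the database and create the (hypothetical) queue table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS webmentions (
               id INTEGER PRIMARY KEY,
               source TEXT NOT NULL,
               target TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return conn


def enqueue(conn, source, target):
    """Called by the endpoint: store the request and return immediately."""
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO webmentions (source, target) VALUES (?, ?)",
            (source, target),
        )


def process_pending(conn, validate):
    """Called from cron: validate every pending row and record the result."""
    rows = conn.execute(
        "SELECT id, source, target FROM webmentions WHERE status = 'pending'"
    ).fetchall()
    for row_id, source, target in rows:
        status = "valid" if validate(source, target) else "invalid"
        with conn:
            conn.execute(
                "UPDATE webmentions SET status = ? WHERE id = ?",
                (status, row_id),
            )
    return len(rows)
```

(`validate` stands in for the slow part: fetching the source and checking that it links to the target. A crontab entry running the script every 5 minutes would match the interval mentioned earlier.)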
#
[pfefferle]
for example
#
[pfefferle]
we planned the async feature for webmentions the same way… save a comment to the db with only source and target and schedule an event, that should do the parsing stuff afterwards
#
GWG
[pfefferle]: We really need to figure out our next move
#
Loqi
GWG: capjamesg left you a message 23 hours, 15 minutes ago: I got my plant sensor set up. I have a web server on my local network that I can use to see graphs.
#
[snarfed]
pfefferle++ jacky++ exactly
#
Loqi
pfefferle has 1 karma in this channel over the last year (6 in all channels)
#
GWG
sensors++
#
Loqi
sensors has 1 karma over the last year
#
capjamesg
pfefferle so my solution makes sense architecturally?
#
capjamesg
If so, I can definitely launch the webmention endpoint.
#
capjamesg
It's something I have wanted to do for a while but have never gotten around to finishing.
#
capjamesg
Largely because async with flask is such a pain.
#
[KevinMarks]
A timeout is sensible in any case, yes. If you go to a queue model you may still need that to avoid the queue being blocked by one bad URL. Retry with exponential backoff is a common pattern too.
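(Editor's sketch of a bounded fetch, assuming Python's standard `urllib.request`; the function name and the default limits are hypothetical. It caps both how long a socket operation may block and how many bytes are read, which addresses the "large or slow source" problem raised earlier.)

```python
import urllib.request


def fetch_source(url, timeout=10, max_bytes=1_000_000):
    """Fetch at most max_bytes of the source document.

    The timeout applies to each blocking socket operation, not to the
    whole download, so a source that trickles data could still take
    longer than `timeout` seconds in total.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(max_bytes)
```

(A timeout surfaces as an exception from `urlopen` or `read`; the caller would catch it and treat the webmention as a failed attempt rather than letting it block the rest of the queue.)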
#
[KevinMarks]
Aeons ago when I wrote the technorati crawler in python I used Twisted for the async stuff, but there may be easier ways now.
#
[KevinMarks]
I'd put a number of tries in the db for the url as well, so you can schedule retries after the fresher ones, and give up after a few attempts.
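(Editor's sketch of the retry bookkeeping suggested above. It assumes the number of failed tries is stored per URL in the db; the base delay and the cutoff are hypothetical values, not anything stated in the channel.)

```python
BASE_DELAY_SECONDS = 300  # hypothetical: one 5-minute cron interval
MAX_ATTEMPTS = 5          # hypothetical: give up after this many failures


def next_retry(attempts, now):
    """Return the earliest time to retry, or None to give up.

    `attempts` is the number of failed tries so far (1 or more).
    The delay doubles with each failure: 300s, 600s, 1200s, ...
    """
    if attempts >= MAX_ATTEMPTS:
        return None
    return now + BASE_DELAY_SECONDS * 2 ** (attempts - 1)
```

(Storing the returned time next to the try count lets the cron script pick up fresh webmentions first and come back to failed ones only once their retry time has passed.)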
rockorager joined the channel
#
[snarfed]
KevinMarks++
#
Loqi
KevinMarks has 11 karma in this channel over the last year (42 in all channels)
#
rockorager
capjamesg: Did you make an importer? Or how are you handling existing data?
#
capjamesg
rockorager I am only handling test data right now.
#
capjamesg
I might copy my webmentions from webmention.io. Not sure.
rockorager joined the channel
#
vikanezrimaya
my software seems to sometimes glitch out and start sending HTTP 500 errors instead of my homepage
#
vikanezrimaya
I wonder if I should just make it restart whenever I get 4 of these in a row in span of maybe a minute
#
vikanezrimaya
* I wonder if I should just make it restart whenever I get 4 of these in a row in span of maybe 5 minutes
#
vikanezrimaya
or five minutes
#
vikanezrimaya
it looks like there's something wrong with the connection pool
#
vikanezrimaya
i would need to add more debug logging temporarily to see what exactly is going wrong there and why it times out
rockorager and angelo joined the channel
#
Zegnat
If you are looking for test webmentions, capjamesg, I guess you could loop whatever webmentions you already have on webmention.io and resend them to your new endpoint. Remember: webmentions for any resource can be sent by anyone :) Depending on what your endpoint ends up doing for work, that might actually be the easiest way to transfer information
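(Editor's sketch of resending one webmention, assuming only the standard library. A webmention is a form-encoded POST carrying `source` and `target`; the function name and the example URLs are hypothetical.)

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint, source, target):
    """Build (but do not send) the POST request for one webmention."""
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

(Looping over exported source/target pairs and passing each request to `urllib.request.urlopen(req, timeout=10)` would replay them against the new endpoint.)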
rockorager joined the channel
#
GWG
Zegnat: That does make me wonder if anyone has tried to figure out or log whether the sender is the Webmention endpoint for the source
[schmarty] joined the channel
#
[schmarty]
GWG: why would it be the same?
#
aaronpk
it most certainly is not the same for me
#
Loqi
aaronpk: GWG left you a message 1 day, 12 hours ago: https://github.com/indieweb/microsub/issues/24 I assume Monocle supports this, which you note you added to Aperture and is in Yarns...
#
aaronpk
and yes monocle supports that
#
[snarfed]
“sender” and “Webmention endpoint” aren’t even precisely defined enough to determine equality anyway. same process? host? source IP? system?
#
GWG
I'm thinking more of adding a parameter to the form on my site to detect someone filled that in
#
aaronpk
i have that
#
aaronpk
mainly so that the webmention endpoint can respond differently when it knows it's a person in a browser rather than a regular webmention sender
#
aaronpk
now i am second guessing that statement. I think I used to do that
#
aaronpk
but now webmention.io returns a 201 or a redirect based on whether it thinks a browser made the request
#
aaronpk
aha it was my previous website that did that extra parameter since the webmention endpoint was built in
shoesNsocks1 and [fluffy] joined the channel
#
GWG
aaronpk: I was thinking of adding it, nice to know someone else thought of it
odinho, jeremycherfas, [aciccarello] and rockorager joined the channel
#
rockorager
Zegnat: That's what I did, and it made me realize I had to implement the authorship spec in order to cover all the different ways to find an h-card for an h-entry
angelo and nertzy_ joined the channel
#
@RubygemsN
authorio (0.8.2): Rails engine to add IndieAuth authentication endpoint functionality https://rubygems.org/gems/authorio
(twitter.com/_/status/1419804363002843153)
#
[jacky]
random thought: using webmention in a micropub client to determine if a post is available on their site
#
vikanezrimaya
it sounds interesting but I do not understand what it means
#
[jacky]
mainly b/c I do post creation async and it might not be _exactly_ available (things like asset checks, sending syndications, etc)
#
[jacky]
er, not webmention, lol, _websub_. just thought about it more, I'm thinking more like the micropub endpoint gives a block for WebSub setup and go from there
#
[jacky]
that would require the site to send an update to a WebSub server to inform that it's changed so that might complicate things a bit
#
vikanezrimaya
wait, so a micropub client subscribes to websub notifications to be able to see when the post creation is done?
#
vikanezrimaya
sounds very complicated but interesting
#
vikanezrimaya
and also unavailable for those who can't use WebSub
#
vikanezrimaya
I am currently working on a hack to let me quickly import all my old webmentions from Webmention.io into Kittybox for safekeeping
#
[jacky]
yeah! this would be a "use it if we see it" kind of thing
#
vikanezrimaya
I exported all of them and now I'm gonna reuse some of my WIP webmention endpoint code as a console app that slurps a JSON list of sources and targets, fetches the webmention as it should and then sends it as if it was my webmention endpoint
#
[jacky]
oh nice
#
vikanezrimaya
Webmention.io API was very helpful in exporting, I just set a ridiculously large pagination size, got a 338KB JF2 file and processed it with jq to reduce extraneous info
#
vikanezrimaya
ugh. until I figure out a Rust MF2 parser (which will take a long time since I am lazy) i'll have to use Python
#
vikanezrimaya
it's a good thing I still remember a bit of Python
#
[jacky]
oh wait
#
[jacky]
there's one
#
[jacky]
tbh it's been powering a lot of my stuff, it's good
#
vikanezrimaya
wait there is now?!
#
vikanezrimaya
All my work on writing a Python webmention endpoint wasted?!!
#
vikanezrimaya
well it's still a perfectly good webmention endpoint
#
vikanezrimaya
i can't allow it to go to waste
#
vikanezrimaya
so I will at least publish it on Gitlab
#
[jacky]
yeah it's still early alpha (like not on crates.io)
#
[jacky]
I have an issue open and I've been working on a PR on the weekends to help clean up the interface
#
[jacky]
oh definitely publish it
#
[jacky]
write about it too! that code might help someone else 🙂
#
vikanezrimaya
it's somewhat Kittybox-specific tho since it uses Micropub edits to publish Webmentions for everyone to see
#
vikanezrimaya
but any software can be adapted to it without proprietary extensions
#
vikanezrimaya
ugh, trying to remember if the new version contains any extensions to Micropub protocol that I might've invented myself
#
vikanezrimaya
I was planning to do something like that but I don't remember if I abandoned these plans
#
vikanezrimaya
Ok, i'm reading the library source code and it looks like it's untyped JSON
#
[jacky]
yeah _for now_
#
vikanezrimaya
which is kinda good since it's flexible but kinda bad if you need strong type-checking
#
[jacky]
I have an issue open to help make it so that we can have a base concrete type for items and use that to be composable (MF2 is built around that)
#
vikanezrimaya
I remember that my draft contained a very neat type system for common MF2 objects and properties
#
[jacky]
like something like:
#
vikanezrimaya
you might actually remember it? I think we've discussed this before
#
vikanezrimaya
it was months ago tho