#dev 2024-03-03

2024-03-03 UTC
#
[tantek]
I need to write something up on how to maintain & improve a complex regex because I've now been doing it for years with the autolinker and maybe my tips can help others or people can tell me how I’m being inefficient :)
#
[tantek]
Though that being said, I’m also "stuck" a bit on rewriting / refactoring to add a substantial new feature so I feel I have more to figure out (about that regex debugging & improving process)
[snarfed] joined the channel
#
[snarfed]
there are a lot of great interactive regexp editors now
#
[snarfed]
(I've only used the Emacs one, but still, they're a great idea in general)
#
[snarfed]
that plus unit tests maybe
burley and [Joe_Crawford] joined the channel
#
[Joe_Crawford]
https://regex101.com is incredibly convenient to use for a big pile of cases to check against. I have not played much with the BBEdit regex explorer thing. "Pattern Playground" they call it.
burley and bterry joined the channel
#
[tantek]
what is regex
#
Loqi
A regular expression is a sequence of characters used to match, extract, and/or replace patterns in text https://indieweb.org/regex
#
[tantek]
^ [snarfed] [Joe_Crawford] please add the regex editors / tools / checkers that you use there
#
[Joe_Crawford]
regex101 added in 2019! A fine page, that is.
#
gRegor
Looks like they're in the see also there
#
gRegor
Ah and the classic...
#
gRegor
what is html regex
#
gRegor
Recommend the audio version of that
angelo_ joined the channel
#
[Al_Abut]
Can I get some fresh eyes on the thing I’ve been working on? I’m catching up on all the different techniques/factors for making responsive images and made a little sandbox to test out:
#
[Al_Abut]
Like don’t tweet it out or whatever, this is just for the indieweb fam until I write up a tutorial but I’d love to know if anyone can poke holes in this strategy.
#
[Al_Abut]
I’m taking a pause before I go down the rabbit hole of building a pipeline of assets that actually use it (applescript/automator/shortcuts to create image variations, MDX components to embed in markdown posts, etc)
#
[Al_Abut]
Oh and in case anyone’s wondering why I’d bother with all of that madness, here’s an animated GIF with a real world example: https://www.threads.net/@alabut/post/C3_Eo1xM-0R
bret and burley joined the channel
#
box464
Hi! Just wanted to share, I added basic microformatting to this ActivityPub educational tool Terence Eden is building. Hoping anyone that picks it up to learn AP will also include microformats in applications they build later. 😃 https://gitlab.com/edent/activitypub-single-php-file and my spun up instance of it https://ape.box464.social
burley_ and jacky joined the channel
#
capjamesg
box464 Very cool!
burley and [jeremycherfas] joined the channel
#
[jeremycherfas]
[Al_Abut] That demo is cool, although I wonder about the decision to crop the image for the smallest screens. Seems to be risky and does not convey the same idea, to me.
burley, geoffo, gRegor, oodani, Guest6, ttybitnik, jacky and superkuh joined the channel
#
[Al_Abut]
[jeremycherfas] thanks for checking it out. I wouldn’t crop the images automatically, that’s not clear from the demo. You’re right that it would be too risky - could cut off subjects or make it weird in other ways.
#
[Al_Abut]
The benefit of customizing images for mobile is that “normal” images (typically wide and made for desktop) look terrible. They come out microscopic because the wider you make an image, the shorter the height, so you end up with a postage stamp.
#
[Al_Abut]
(also just read your Seven Year Itch post, you’re hilarious)
burley, burley_ and jacky joined the channel
#
capjamesg
to2ds thank you again for your help with the NGINX configuration for my blog post.
mahboubine, geoffo and [Tilley] joined the channel
#
real_devastatia
I have a question about the innards of Firefox. Why is it that styles in an external stylesheet can't be accessed by, e.g., document.getElementById('whatever').style.whatever ? I've implemented a jQuery-esque workaround by adding an Element.prototype.css() method, which sniffs window.getComputedStyle(), but I'm still curious.
jacky joined the channel
#
[tantek]
element.style only reflects the element's inline style attribute. to access rules from external stylesheets, use the document.styleSheets collection: https://developer.mozilla.org/en-US/docs/Web/API/Document/styleSheets
#
real_devastatia
Thanks! I figured it had something to do with style being an attribute. Is document.styleSheets preferable to querying window.getComputedStyle, or are there specific intended use cases for each?
#
[tantek]
different use-cases for each
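A rough sketch of how those use-cases differ: `element.style` only reflects the inline `style` attribute, `window.getComputedStyle()` returns the final resolved values, and `document.styleSheets` exposes the rules as they were authored. The helper below is hypothetical (the name `declaredValues` and its `doc` parameter are assumptions for illustration, not something from the discussion). It walks a document's stylesheets and collects the values that one exact selector declares for a property.

```javascript
// Hypothetical helper: list the values that rules with exactly this selector
// declare for `property`, in stylesheet order.
function declaredValues(doc, selector, property) {
  const values = [];
  for (const sheet of Array.from(doc.styleSheets)) {
    let rules;
    try {
      rules = sheet.cssRules; // browsers throw here for cross-origin stylesheets
    } catch (err) {
      continue; // skip sheets we are not allowed to read
    }
    for (const rule of Array.from(rules)) {
      // Only style rules have selectorText and style; @media etc. are skipped.
      if (rule.selectorText === selector && rule.style) {
        const value = rule.style.getPropertyValue(property);
        if (value) values.push(value);
      }
    }
  }
  return values;
}
```

In a browser this would be called as `declaredValues(document, "#whatever", "color")`. It only compares selector text literally, so it says nothing about which rule actually wins the cascade; that question is what `getComputedStyle()` answers.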
#
real_devastatia
I find that this works beautifully as both a getter and setter. I can even pass CSS identifiers instead of the JavaScript camelcase equivalent to it. Is there a compelling reason I shouldn't do it this way?
#
real_devastatia
Element.prototype.css = function(property)
#
real_devastatia
{
#
real_devastatia
var value = window.getComputedStyle(this, null).getPropertyValue(property);
#
real_devastatia
if (arguments.length > 1) {
#
real_devastatia
this.style[property] = arguments[1];
#
real_devastatia
}
#
real_devastatia
return value;
#
real_devastatia
}; // css
#
[tantek]
alright, time for another attempt at figuring out matching of various @-@ domain plus optional path identifiers
#
[tantek]
real_devastatia, if code is working for your website, then go for it. I'm not sure this is the best place to ask for code optimization that's not directly related to indieweb building blocks
#
real_devastatia
Understood. That's basically my response to "you shouldn't extend prototypes of built-in objects" anyway.
#
[jeremycherfas]
[Al_Abut] Thanks, I think. The default mode of holding phones vertically does make horizontal aspect ratios odd, but I didn’t realise you were swapping out the phone image for a hand-made crop.
gRegor joined the channel
#
[tantek]
well that /regex page is a mess with a bunch of generic non-indieweb information that is instead linkable to Wikipedia
burley joined the channel
#
[tantek]
interesting, neither of the regex tool sites that people added to the /regex page "work" with my autolinking regex
#
[tantek]
and yet my autolinking regex does work when executed in JS or PHP
#
[tantek]
both regex101 and regexr fail
burley joined the channel
#
real_devastatia
Are you trying to get those sites to explain your autolinking regex?
#
[tantek]
or just verify that test examples work that I know work, so I can edit them and see if they still work
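One way to get that edit-and-recheck loop without a third-party tool is a small table of known-good examples run against the pattern after every change. This is only a sketch: `linkPattern` is a deliberately simplified stand-in, NOT the autolinker regex being discussed, and `runCases` is a hypothetical helper name.

```javascript
// Simplified stand-in pattern (an assumption, not the real autolinker regex).
const linkPattern = /(?:https?:\/\/)?(?:[a-z0-9-]+\.)+[a-z]{2,}/gi;

// Each case pairs an input with the exact list of matches expected from it.
const cases = [
  ["see tantek.com for more", ["tantek.com"]],
  ["https://indieweb.org and regex101.com", ["https://indieweb.org", "regex101.com"]],
  ["no links here", []],
];

// Run every case and return the ones whose matches differ from expectations.
function runCases(pattern, testCases) {
  const failed = [];
  for (const [input, expected] of testCases) {
    // With the g flag, match() returns all matches, or null when there are none.
    const actual = input.match(pattern) || [];
    if (JSON.stringify(actual) !== JSON.stringify(expected)) {
      failed.push({ input, expected, actual });
    }
  }
  return failed;
}

const failed = runCases(linkPattern, cases);
console.log(failed.length === 0 ? "all cases pass" : failed);
```

Because the pattern lives in the source file as a regex literal, editing it and re-running the cases avoids any copy, unescape, and re-escape round trip.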
#
real_devastatia
What are you parsing with your regex? URLs?
#
[tantek]
looks like it's due to escaping the backslashes
#
[tantek]
or rather, none of the tools handle or have an option for "unescape backslashes before applying regex"
#
[tantek]
which is literally what you need to do if you want to embed regexes as strings in most languages
#
[tantek]
(escape backslashes that is)
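To illustrate the round trip being described, here is a minimal sketch of the two conversions. The helper names and the sample pattern are assumptions for illustration only, and the helpers handle nothing but doubled backslashes (no `\n`, escaped quotes, or other string-literal escapes).

```javascript
// Hypothetical helpers; neither name comes from the tools mentioned above.

// Text as typed inside a quoted string (backslashes doubled) -> raw pattern
// suitable for pasting into an interactive regex tool.
function unescapeBackslashes(literalBody) {
  return literalBody.replace(/\\\\/g, "\\");
}

// Raw pattern from a tool -> text that can be pasted back between quotes.
function escapeBackslashes(rawPattern) {
  return rawPattern.replace(/\\/g, "\\\\");
}

// String.raw keeps backslashes exactly as typed, so `inSource` holds the
// characters as they would appear in source code between the quotes.
const inSource = String.raw`@[a-z0-9-]+(\\.[a-z0-9-]+)+`;
const forTool = unescapeBackslashes(inSource);

console.log(forTool);                              // the pattern a tool expects
console.log(escapeBackslashes(forTool) === inSource); // true
```

Scripting both directions at least makes the conversion repeatable, which removes some of the hand-editing risk.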
angelo joined the channel
#
real_devastatia
@line 1313: Would you need to escape the backslashes in the first place if that string was in double quotes rather than single quotes?
#
[tantek]
pretty sure double quotes makes it worse
#
real_devastatia
I use filter_var to validate URLs and e-mail addresses. Validating them with regex is a rabbithole I don't want to go down. I messed with it a long time ago and discovered there are as many different solutions as there are e-mail addresses.
#
real_devastatia
[tantek]: That was gonna be my next question. lol
#
[tantek]
nah that fails for typical autolink use-cases like this: tantek.com <-- autolinkers will link that, filter_var will fail to see it as a URL
#
[tantek]
filter_var doesn't work for the auto-linking use-case
#
[tantek]
also @-names and @-@-names have no filter_var option either
#
real_devastatia
[tantek]: That's surprising. It's about as typical as a URL gets.
#
[tantek]
nope, not surprising at all because filter_var insists on some strict RFC syntax which is not even what browsers support
#
[tantek]
e.g. if you type or copy paste tantek . com (without the spaces around the . ) into a browser, it will work to navigate. it will also work to auto-link in nearly any text messaging or posting UI.
#
real_devastatia
Oh, so you're saying filter_var fails without the scheme (http:// or https://). Yeah, that's true.
#
[tantek]
this is a typical problem with most programming language built-ins and libraries. strict adherence to a syntax in a spec somewhere that is disconnected from the reality of the UX that users expect to "just work"
#
real_devastatia
I'd imagine there's some additional pre-processing in UX-driven apps to infer a spec-conformant URL. Browsers are certainly good at figuring out bad HTML in quirks mode.
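The gap between strict validation and what users expect can be sketched in JavaScript, using the built-in `URL` constructor as a stand-in for a strict parser (it is not PHP's `filter_var`, though it also rejects scheme-less input). The loose pattern, the function names, and the choice to prepend `https://` are all assumptions made for illustration.

```javascript
// Strict check: the URL constructor throws on input without a scheme.
function isStrictUrl(text) {
  try {
    new URL(text);
    return true;
  } catch {
    return false;
  }
}

// Loose check (simplified assumption): optional scheme, dot-separated labels,
// an alphabetic top-level label, and an optional path.
const looseLink = /^(?:https?:\/\/)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:\/\S*)?$/i;

function looksLinkable(text) {
  return looseLink.test(text);
}

// Pre-processing step: infer a parseable URL by adding a scheme if missing.
function toHref(text) {
  return /^https?:\/\//i.test(text) ? text : "https://" + text;
}

console.log(isStrictUrl("tantek.com"));         // false
console.log(looksLinkable("tantek.com"));       // true
console.log(isStrictUrl(toHref("tantek.com"))); // true
```

The loose check decides what to link, and the strict parser only sees the text after the scheme has been inferred.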
#
real_devastatia
The problems I address with filter_var are much simpler than yours, at any rate. When I add a link to a page, I always include the scheme. It broke in one case when a visitor omitted the scheme from their website URL when they posted to my guestbook, but I went into phpMyAdmin and fixed it. I don't encounter the issue frequently enough to need a more robust solution.
#
angelo
verbose/extended mode is useful for complicated regexes
#
Loqi
angelo: jacky left you a message on 2024-01-06 at 7:31pm UTC: Have you done much with Braid recently?
#
angelo
PHP has an "extended" mode. here's some tricks for doing the same in JavaScript: https://stackoverflow.com/questions/15463257/commenting-regular-expressions
#
angelo
making it work in both languages might be an exercise
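JavaScript has no built-in extended flag, so one common workaround is to strip whitespace and comments before compiling. The sketch below is one such approach under stated assumptions: `verboseRegex` is a hypothetical helper, the mention pattern is a simplified example rather than the real autolinker regex, and literal spaces or `#` characters would need to be written as `\s` or `\x23` inside the pattern.

```javascript
// Hypothetical helper: compile a pattern written with whitespace and
// "# comment" annotations, loosely imitating an extended/verbose mode.
function verboseRegex(source, flags) {
  const compact = source
    .split("\n")
    .map((line) => line.replace(/(^|\s)#.*$/, "")) // drop trailing comments
    .join("")
    .replace(/\s+/g, "");                          // drop layout whitespace
  return new RegExp(compact, flags);
}

// String.raw keeps backslashes as typed, so no doubling is needed here.
const mention = verboseRegex(String.raw`
  @                     # leading at-sign
  ([a-z0-9-]+           # first domain label
    (?:\.[a-z0-9-]+)+   # one or more further dot-separated labels
  )
  (/[^\s@]*)?           # optional path
`, "i");

const m = mention.exec("hi @tantek.com/about");
console.log(m[1], m[2]); // tantek.com /about
```

A PHP version would not need the helper, so sharing one pattern between both languages still means keeping the annotated source somewhere both can read it.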
#
[tantek]
angelo, "extended" didn't help for resolving the escaped backslashes
#
[tantek]
yes, making regexes work in both PHP & JS takes a little work but isn't too hard
#
angelo
putting it into verbose/extended format might let you better debug from the code itself without having to extract it into a third-party tool
#
[tantek]
I'm looking more for interactive editing of a regex, not debugging
#
[tantek]
I was hoping these tools would work, but they don't without the hassle of manual unescaping / re-escaping, which of course creates multiple chances to introduce errors
burley and adele joined the channel
#
angelo
i believe the regex is split cleanly between @-[@-] parsing and URL parsing with the former only accounting for ~60 characters and two escapes -- work on that chunk specifically?
to2ds joined the channel
#
to2ds
You're most welcome capjamesg. Very happy to help.
#
to2ds
Based on exchanges with iammnchrm the other day, I popped up a hyper-simple Python SSG, intended as a learning tool primarily at https://github.com/toddpresta/w1dino-swg -- it might be kind of oldschool, but I just needed an excuse to create a GH project on my neglected account 😊
#
[tantek]
angelo sort of. am considering refactoring to better handle @-domain mentions and @-url-with-path mentions and @-domain-@-domain and @-username-@-url-with-path
#
[tantek]
plus my local to my site version has footnote auto-linking too
#
[tantek]
how do folks like to document their complex / non-trivial regular expressions? like "big" ones?
#
superkuh
I take out sections of the regex and put them in lines of comments above the regex line saying what it matches/does in english.
#
gRegor
heh, was just going to say I took a glance at that regex then decided to slowly back away :)
#
[tantek]
lol gRegor
#
to2ds
Complex regexes are like Zen Koans and should be exempt from documentation.
#
gRegor
Not sure I've done any that big, but sometimes I concatenate the pattern over multiple lines (PHP) with a comment describing each part
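The same concatenate-and-comment idea carries over to JavaScript. The following is a sketch only: the pieces form a simplified @-mention pattern invented for illustration (it is not the autolinker regex under discussion), and the names `mentionParts` and `mentionPattern` are assumptions.

```javascript
// Each piece of the pattern sits on its own line with a comment saying what
// it matches. Backslashes are doubled because these are ordinary strings.
const mentionParts = [
  "@",                             // every mention starts with an at-sign
  "(?:([a-z0-9_]+)@)?",            // optional username plus second at-sign (@-@ form)
  "((?:[a-z0-9-]+\\.)+[a-z]{2,})", // domain: dot-separated labels, then a TLD
  "(\\/[^\\s@]*)?",                // optional path after the domain
];

const mentionPattern = new RegExp(mentionParts.join(""), "i");

const plain = mentionPattern.exec("@tantek.com");
console.log(plain[2]); // tantek.com

const full = mentionPattern.exec("@user@example.com/about");
console.log(full[1], full[2], full[3]); // user example.com /about
```

Keeping one piece per line also makes diffs easier to read when a single part of the pattern changes.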