#capjamesgDoes anyone know how graphs are represented in code?
#capjamesgSay I have a graph with "microformats is structured data" and "structured data is a way to represent information"
#capjamesgIf I said "define microformats" and wanted to traverse the graph, is this something I could do with a dictionary?
#jackyit kinda depends on how the parser does it; for mf2-rust, it's a straight up tree after traversing the DOM, so I implemented some helper methods to help 'walk' the tree if looking for specific things
#jackya dictionary _could_ work but you'd be doing a bit of manual searching
#jackyhas a plan to refactor this so it's a giant flat list but restitching relationships would be annoying
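A minimal sketch of the dictionary approach discussed above, assuming each term maps to the terms its definition points at; the node names and the `define` helper are made up for illustration:

```python
# Adjacency-list graph as a plain dictionary: each term maps to the
# terms its definition points at (edge labels omitted for brevity).
graph = {
    "microformats": ["structured data"],
    "structured data": ["a way to represent information"],
}

def define(term, graph, depth=2):
    """Walk outward from `term`, collecting everything reachable
    within `depth` hops (a simple breadth-first traversal)."""
    seen, frontier, results = {term}, [term], []
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    results.append(f"{node} is {neighbor}")
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return results

print(define("microformats", graph))
# ['microformats is structured data',
#  'structured data is a way to represent information']
```

This is the "manual searching" jacky mentions: the dictionary only gives cheap lookup per node; the walking logic is still yours to write.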
#[manton]Apple has been going more into charging developers for things, presumably as part of increasing their services revenue. I think we’ll continue to see more web services APIs that have free and paid tiers.
#GWG[manton]: I don't mind a free/paid tier philosophy
#[manton]Usually free/paid tiers make a lot of sense, but the Apple situation is kind of unique when you consider the basically mandatory $99/year dev program and the 30% tax on app purchases. I don’t love using developers as a revenue source for a company that has traditionally been product-focused.
#@lordmatt↩️ No, but there is a module system that would enable someone to write a #webmention add-on. It might be bigger than the framework and would need a DB or read access to a folder. (twitter.com/_/status/1570468760217067522)
#[manton]Full disclosure, I had a bug earlier today that sent webmentions in an infinite loop… 🙄 I think mostly internal to Micro.blog, and hopefully didn’t hit anyone else’s servers much.
gRegor, [tonz], darth_mall, jacky and [jeremycherfas] joined the channel
#[tonz]Today at Netherlands WordCamp a speaker (Joost, of the Yoast WP plugin) called attention to reducing a website’s footprint by reducing hits from crawlers and bots, highlighting how WordPress by default has all kinds of active URLs that no site owner actually needs but that get actively crawled all the time. E.g. a single-author site like my personal WP blog has an author archive. His slides are here:
#[snarfed]granted, it feels messy and wasteful, and it may actually add up to noticeable bandwidth etc cost for big sites, but for most smaller sites I expect it's negligible
#GWGI don't want to have pages I don't care about indexed
jjuran joined the channel
#GWGFor example, the WordPress author page....I have a single author site... I'd like to just disable it
#[tantek]4snarfed, all the arguments about minimizing (attack) surface. also makes sense from a URL maintenance perspective (e.g. less work to switch /URL_design or to a different CMS etc.)
#[tantek]4less work to migrate = more freedom to migrate
#[snarfed]sure. still probably low priority for the average personal site, but understood
#[tantek]4also good advice from a setup perspective, before you unintentionally create a (maintenance) mess
#LoqiIt looks like we don't have a page for "URL footprint" yet. Would you like to create it? (Or just say "URL footprint is ____", a sentence describing the term)
#angelowhat would you call the collection of URLs served at your domain?
#GWGI'd just write a simple few lines of code in plugin form to disable what I don't want
#angeloi suppose it depends on how crawlers use what they find in the sitemap.. i've always thought of it as a list of recommended URLs to make /sure/ the robot will find them, but the bot will still merrily crawl all of the other things it finds that aren't blocked by robots.txt
#angeloin other words, keeping a small sitemap wouldn't necessarily reduce your overall "URL footprint"
#[tantek]4there's got to be some prior terminology for this, as web IAs (information architects) have been thinking about it (and writing about it) deliberately since the late 1990s
#angelocan you have a catch-all `Disallow: /` and then hand-curate your urlscape via sitemaps.xml?
#[schmarty]angelo: a behaving bot should check all URLs against robots.txt even if it learned them from a sitemap
#capjamesgangelo You should crawl robots.txt before anything else.
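For the point about checking every URL against robots.txt, even URLs learned from a sitemap, here is a minimal sketch using Python's standard-library robots.txt parser; the domain and URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse robots.txt once, then test every candidate URL
# against it -- including URLs discovered via a sitemap.
robots = RobotFileParser("https://example.com/robots.txt")
robots.read()

candidates = [
    "https://example.com/posts/hello-world",
    "https://example.com/author/admin/",   # e.g. a disallowed author archive
]

for url in candidates:
    if robots.can_fetch("my-crawler", url):
        print("ok to crawl:", url)
    else:
        print("blocked by robots.txt:", url)
```

On Python 3.8+, `RobotFileParser.site_maps()` also returns any `Sitemap:` lines from robots.txt, which ties into the question above about curating URLs via sitemaps.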
[jgarber]1 and [schmarty]1 joined the channel
#capjamesgA bigger site may see its robots.txt fetched multiple times per day by search engines, presumably to ensure the crawler adheres to the directives as best it can.
#capjamesg[tantek] Do you have any resources re: what to include in a sitemap?
#capjamesgangelo Sitemaps are one method of discovery for URLs. I put all my blog URLs (even page URLs) in my sitemap just to make the job of the crawler easier.
#capjamesgBut my site is a few thousand pages. I don't know how massive sites do it.
#capjamesgangelo IndieWeb Search does this: 1. crawl robots.txt, 2. crawl any sitemaps found in robots.txt, 3. try to crawl sitemap.xml (I think) if it exists, 4. compare each URL to the robots.txt directives, 5. if a URL can be crawled, crawl it, do URL discovery, and continue this for every URL.
#capjamesgangelo "can you have a catch-all `Disallow: /` and then hand-curate your urlscape via sitemaps.xml?" If you say disallow /, the crawler to which that applies will not crawl anything.
#capjamesgActually, I should have a provision for that in IndieWeb Search.
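A rough sketch of that crawl sequence in Python, not the actual IndieWeb Search code; `extract_links`, the user agent, and the fetch details are simplified placeholders:

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser
import requests

USER_AGENT = "example-crawler"

def extract_links(text):
    """Placeholder: parse URLs out of a sitemap or HTML page."""
    return []

def crawl_site(base):
    # 1. crawl robots.txt
    robots = RobotFileParser(urljoin(base, "/robots.txt"))
    robots.read()

    # 2. sitemaps listed in robots.txt, 3. fall back to /sitemap.xml
    sitemaps = robots.site_maps() or [urljoin(base, "/sitemap.xml")]
    queue, seen = [], set()
    for sitemap in sitemaps:
        queue.extend(extract_links(requests.get(sitemap).text))

    while queue:
        url = queue.pop(0)
        # 4. compare each URL to the robots.txt directives
        if url in seen or not robots.can_fetch(USER_AGENT, url):
            continue
        seen.add(url)
        # 5. crawl it, do URL discovery, and repeat
        page = requests.get(url, headers={"User-Agent": USER_AGENT})
        queue.extend(extract_links(page.text))
    return seen
```

Note that with a catch-all `Disallow: /`, step 4 rejects every URL the sitemaps offer up, which matches the behaviour described above.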