#indieweb 2023-02-26

2023-02-26 UTC
[tantek] and geoffo joined the channel
# 00:23 
[tantek] Very good points [snarfed]. Seems like there's an opportunity for an AP-specific search engine that starts spidering with mastodon.social and strictly obeys robots.txt (including rechecking maybe once a day/week or so to remove/hide sites that change their mind)
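A minimal sketch of a per-site robots.txt check with a periodic recheck, using Python's standard `urllib.robotparser`. The `fetch` callable, the `RobotsCache` class, and the one-day default interval are hypothetical illustrations. They are not part of any existing search engine.

```python
import time
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical fetcher: takes a robots.txt URL, returns its body ("" if absent).
Fetcher = Callable[[str], str]

ONE_DAY = 24 * 60 * 60  # assumed recheck interval, in seconds


class RobotsCache:
    """Caches one parsed robots.txt per origin and re-fetches it after `ttl`."""

    def __init__(self, fetch: Fetcher, ttl: float = ONE_DAY) -> None:
        self.fetch = fetch
        self.ttl = ttl
        self._cache: Dict[str, Tuple[float, RobotFileParser]] = {}

    def _parser_for(self, url: str, now: float) -> RobotFileParser:
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        cached = self._cache.get(origin)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]
        # Missing or stale: fetch again so sites that change their mind are honored.
        parser = RobotFileParser()
        parser.parse(self.fetch(origin + "/robots.txt").splitlines())
        self._cache[origin] = (now, parser)
        return parser

    def allowed(self, user_agent: str, url: str, now: Optional[float] = None) -> bool:
        """True if the origin's robots.txt lets `user_agent` fetch `url`."""
        if now is None:
            now = time.time()
        return self._parser_for(url, now).can_fetch(user_agent, url)
```

A crawler would call `allowed()` before fetching each post. It would also drop already-indexed pages from a site once a recheck starts returning False.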
bterry, IWSlackGateway, [tantek] and [KevinMarks] joined the channel
# 01:58 
[KevinMarks] The thing that is a bit tricky is that the servers all cache copies of posts from other servers and afaik don't propagate the robots.txt status of them
# 02:08 
[tantek] why is that tricky? a proper search engine would retrieve each robots.txt directly
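The sketch below shows what retrieving each robots.txt directly could mean for cached fediverse copies. It assumes the crawler knows the original URL of each cached post; in ActivityPub, the object's `id` points at the origin server. The function names, the `fetch_robots` callable, and the rule that both hosts must allow the fetch are assumptions for illustration.

```python
from typing import Callable
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url(url: str) -> str:
    """robots.txt lives at the root of the URL's own scheme + host."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def allowed_by(url: str, user_agent: str, fetch_robots: Callable[[str], str]) -> bool:
    """Check `url` against the robots.txt of the host that `url` points at."""
    parser = RobotFileParser()
    parser.parse(fetch_robots(robots_url(url)).splitlines())
    return parser.can_fetch(user_agent, url)


def may_index_copy(copy_url: str, original_url: str, user_agent: str,
                   fetch_robots: Callable[[str], str]) -> bool:
    # Conservative choice: a cached copy counts as indexable only if both the
    # server serving the copy and the original author's server allow it.
    return (allowed_by(copy_url, user_agent, fetch_robots)
            and allowed_by(original_url, user_agent, fetch_robots))
```

The key design choice is that the original author's robots.txt travels with the post, whichever server happens to serve the copy.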
# 02:09 
aaronpk same problem with showing comments from webmentions
# 02:10 
aaronpk my site allows search engines to index it, but if tantek comments on my post, and tantek has a robots.txt that blocks search engines, his comment would get indexed from my copy of it that shows up on my site
gRegor, ren, n8chz and [aciccarello] joined the channel
# 06:30 
[aciccarello] Too bad there's no way to noindex just a portion of a page
# 06:30 
[aciccarello] Google does have a way to prevent something from showing in the text snippet though.
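Google documents a `data-nosnippet` HTML attribute that can be placed on `span`, `div`, and `section` elements. It keeps the marked text out of search result snippets, but it is not a per-section noindex. The sketch below applies it to received comments whose author's robots.txt disallows Googlebot. The `render_mention` function, the `comment` class, and the policy itself are hypothetical.

```python
import html
from urllib.robotparser import RobotFileParser


def render_mention(comment_text: str, author_url: str, author_robots_txt: str) -> str:
    """Render a received comment, marking it data-nosnippet if its author
    disallows Googlebot (hypothetical policy for illustration)."""
    parser = RobotFileParser()
    parser.parse(author_robots_txt.splitlines())
    body = html.escape(comment_text)  # never emit remote text unescaped
    if parser.can_fetch("Googlebot", author_url):
        return f'<div class="comment">{body}</div>'
    # Keeps this text out of Google snippets; the page itself can still be indexed.
    return f'<div class="comment" data-nosnippet>{body}</div>'
```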
[schmarty], petermolnar, btrem, IWSlackGateway, seekr, rocto, rvalue, [tantek], [KevinMarks], lanodan, gRegor, bozo_, mro, ren, geoffo, bterry, n8chz, AramZS, dreamLogic, [felix_wenzel73] and [jacky] joined the channel
# 18:46 
[jacky] that's actually not completely _impossible_ (re: noindex-ing a part of a page). if you collect that info as you process webmentions, you could probably use it to filter the visibility of some mentions
# 18:47 
[jacky] I would have to think on this more, but say my site allows things like IA to archive but not Google to index, and your site, aciccarello, has a robots.txt that prohibits IA _and_ Google. if my site is visited by either of them (identified by referrer, or hinted by user agent, YMMV), then I can see that being a way to _exclude_ content from a page
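The sketch below illustrates that idea. It assumes the receiver snapshots the commenter's robots.txt when the webmention arrives. At render time, it matches crawler tokens in the request's User-Agent header, which can be spoofed or missing. The crawler tokens and all names are illustrative assumptions, not an existing webmention library.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.robotparser import RobotFileParser

# Illustrative crawler tokens; check each crawler's own documentation.
CRAWLERS = ["Googlebot", "archive.org_bot"]


@dataclass
class Mention:
    source: str  # URL of the commenter's post
    html: str
    # crawler token -> did the author's robots.txt allow it when received?
    allowed: Dict[str, bool] = field(default_factory=dict)


def record_mention(source: str, html: str, author_robots_txt: str) -> Mention:
    """Snapshot the author's robots.txt decisions while processing the webmention."""
    parser = RobotFileParser()
    parser.parse(author_robots_txt.splitlines())
    return Mention(source, html, {ua: parser.can_fetch(ua, source) for ua in CRAWLERS})


def visible_mentions(mentions: List[Mention], request_user_agent: str) -> List[Mention]:
    ua = request_user_agent.lower()
    matched = [tok for tok in CRAWLERS if tok.lower() in ua]
    if not matched:  # ordinary visitor: show everything
        return list(mentions)
    # Crawler visit: hide mentions whose author disallowed that crawler.
    return [m for m in mentions if all(m.allowed.get(tok, True) for tok in matched)]
```

Snapshotting at receipt time is the simplest option. A site that wants to honor later robots.txt changes would need to re-fetch periodically.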
# 18:47 
[jacky] this is kind of the issue, tbh, with consent management being forced on individual sites versus indexing tools being more aware of it (and respecting it)
# 18:48 
[jacky] but if it's done right, I can't see it _not_ being something more people would want to enable
mro, geoffo and [snarfed] joined the channel
# 20:14 
[snarfed] afaik IA started ignoring robots.txt a while ago
# 20:16 
[tantek] IA uses robots.txt for whether it should display results. They archive regardless
geoffo joined the channel
# 20:17 
[snarfed] tantek++
# 20:17 
Loqi tantek has 17 karma in this channel over the last year (79 in all channels)
btrem, geoffo, pmlnr, AramZS, bterry, gRegor and mdemo joined the channel