#indieweb 2023-02-26

2023-02-26 UTC
[tantek] and geoffo joined the channel
#
[tantek]
Very good points [snarfed]. Seems like there's an opportunity for an AP-specific search engine that starts spidering with mastodon.social and strictly obeys robots.txt (including rechecking maybe once a day/week or so to remove/hide sites that change their mind)
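
A minimal sketch of the "strictly obeys robots.txt and rechecks periodically" idea above, using Python's standard `urllib.robotparser`. The `RobotsCache` class, the `fetch_text` callable, the once-a-day interval and the crawler name are all assumptions made for illustration; nothing here describes an existing AP search engine.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

RECHECK_SECONDS = 24 * 60 * 60  # assumption: recheck once a day


class RobotsCache:
    """Caches one parsed robots.txt per origin and re-fetches it when stale."""

    def __init__(self, fetch_text, user_agent, max_age=RECHECK_SECONDS, clock=time.time):
        self.fetch_text = fetch_text  # hypothetical callable: robots.txt URL -> text
        self.user_agent = user_agent
        self.max_age = max_age
        self.clock = clock
        self._cache = {}  # origin -> (fetched_at, parser)

    def _parser_for(self, url):
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        now = self.clock()
        cached = self._cache.get(origin)
        # Re-fetch when we have no copy yet or the copy is older than max_age,
        # so a site that changes its mind is picked up on the next recheck.
        if cached is None or now - cached[0] >= self.max_age:
            parser = RobotFileParser()
            parser.parse(self.fetch_text(origin + "/robots.txt").splitlines())
            cached = (now, parser)
            self._cache[origin] = cached
        return cached[1]

    def allowed(self, url):
        """True if this crawler may fetch (and so index) the given URL."""
        return self._parser_for(url).can_fetch(self.user_agent, url)
```

A crawler built this way would call `allowed(url)` before each fetch and also re-run it over already indexed URLs after each recheck, hiding any that are now disallowed.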
bterry, IWSlackGateway, [tantek] and [KevinMarks] joined the channel
#
[KevinMarks]
The thing that is a bit tricky is that the servers all cache copies of the posts from other ones and afaik don't propagate the robots.txt status of them
#
[tantek]
why is that tricky? a proper search engine would retrieve each robots.txt directly
#
aaronpk
same problem with showing comments from webmentions
#
aaronpk
my site allows search engines to index it, but if tantek comments on my post, and tantek has a robots.txt that blocks search engines, his comment would get indexed from my copy of it that shows up on my site
gRegor, ren, n8chz and [aciccarello] joined the channel
#
[aciccarello]
Too bad there's no way to noindex just a portion of a page
#
[aciccarello]
Google does have a way to prevent something from showing in the text snippet though.
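
The mechanism referred to here is most likely Google's `data-nosnippet` HTML attribute, which Google documents for `span`, `div` and `section` elements. It only keeps the marked text out of the search result snippet; it does not stop the page or the text from being indexed. A rough sketch of how a site might apply it when rendering a received comment, where `render_mention` and the `allow_snippet` flag are hypothetical names:

```python
from html import escape


def render_mention(author, text, allow_snippet):
    """Render one received comment as HTML.

    When allow_snippet is False the wrapper div carries data-nosnippet,
    which asks Google not to use this text in result snippets.
    """
    attr = "" if allow_snippet else " data-nosnippet"
    return (
        f'<div class="mention"{attr}>'
        f"<cite>{escape(author)}</cite>"
        f"<p>{escape(text)}</p>"
        "</div>"
    )
```

Other search engines are not known to honor this attribute, so it is a partial measure at best.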
[schmarty], petermolnar, btrem, IWSlackGateway, seekr, rocto, rvalue, [tantek], [KevinMarks], lanodan, gRegor, bozo_, mro, ren, geoffo, bterry, n8chz, AramZS, dreamLogic, [felix_wenzel73] and [jacky] joined the channel
#
[jacky]
that's actually not completely _impossible_ (re: noindex-ing a part of a page). if one collects that info as they process webmentions, you could probably use that for filtering visibility of some mentions
#
[jacky]
I would have to think on this more, but say my site allows things like IA to archive but not Google to index, and your site, aciccarello, has a robots.txt that prohibits IA _and_ Google. If my site is visited by either of them (identified by referrer, or hinted by a user agent, which, YMMV), then I can see that being a way to _exclude_ content on a page
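
A sketch of the filtering idea above, under several assumptions: the receiving site stored each source site's robots.txt text while processing webmentions (the `robots_by_origin` mapping), and it has already decided which crawler token, if any, the current visitor corresponds to. The function names and the mention dictionary shape are hypothetical, and user agent detection itself is left out since, as noted, it is unreliable.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def source_allows(robots_txt, crawler_token, source_url):
    """Check the source site's own robots.txt rules for this crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(crawler_token, source_url)


def visible_mentions(mentions, crawler_token, robots_by_origin):
    """Return the mentions to render for this visitor.

    crawler_token is None for ordinary visitors, who see everything.
    For a crawler, a mention is dropped when the robots.txt of the site
    it came from would not let that crawler fetch the original post.
    """
    if crawler_token is None:
        return list(mentions)
    shown = []
    for mention in mentions:
        parts = urlsplit(mention["source"])
        origin = f"{parts.scheme}://{parts.netloc}"
        # Assumption: no stored robots.txt is treated as "no restrictions".
        robots_txt = robots_by_origin.get(origin, "")
        if source_allows(robots_txt, crawler_token, mention["source"]):
            shown.append(mention)
    return shown
```

This puts the consent check on the receiving site, which is the burden discussed in the following messages; it also depends on the stored robots.txt copies being refreshed often enough to stay accurate.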
#
[jacky]
this is kind of the issue, tbh, with consent management being forced on individual sites versus indexing tools being more aware of it (and respecting it)
#
[jacky]
but if it's done right, I can't see it _not_ being something more people would want to enable
mro, geoffo and [snarfed] joined the channel
#
[snarfed]
afaik IA started ignoring robots.txt a while ago
#
[tantek]
IA uses robots.txt for whether it should display results. They archive regardless
geoffo joined the channel
#
[snarfed]
tantek++
#
Loqi
tantek has 17 karma in this channel over the last year (79 in all channels)
btrem, geoffo, pmlnr, AramZS, bterry, gRegor and mdemo joined the channel