#[tantek]Very good points [snarfed]. Seems like there's an opportunity for an AP-specific search engine that starts spidering with mastodon.social and strictly obeys robots.txt (including rechecking maybe once a day/week or so to remove/hide sites that change their mind)
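The "recheck once a day/week" idea above can be sketched roughly as follows. This is only an illustration: `RECHECK_INTERVAL`, `due_for_recheck`, and `sites_to_hide` are hypothetical names, and the one-day interval is an assumption taken from the loose "day/week" suggestion.

```python
from datetime import datetime, timedelta

# Assumed cadence; the discussion only says "once a day/week or so".
RECHECK_INTERVAL = timedelta(days=1)

def due_for_recheck(last_checked: datetime, now: datetime,
                    interval: timedelta = RECHECK_INTERVAL) -> bool:
    """Return True when a site's robots.txt should be fetched again."""
    return now - last_checked >= interval

def sites_to_hide(current_permissions: dict[str, bool]) -> list[str]:
    """Given site -> 'robots.txt currently allows us', list sites to remove/hide."""
    return sorted(site for site, allowed in current_permissions.items() if not allowed)
```

A crawler loop would call `due_for_recheck` per site, refresh the permissions, then drop or hide whatever `sites_to_hide` returns.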
bterry, IWSlackGateway, [tantek] and [KevinMarks] joined the channel
#[KevinMarks]The thing that is a bit tricky is that the servers all cache copies of the posts from other ones and afaik don't propagate the robots.txt status of them
#[tantek]why is that tricky? a proper search engine would retrieve each robots.txt directly
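A minimal sketch of "retrieve each robots.txt directly", using Python's standard `urllib.robotparser`. The idea is to check the post's origin server rather than the server holding the cached copy. `robots_url_for` and `origin_allows` are hypothetical helper names, and the robots.txt text is passed in as a string so the example stays offline; a real crawler would fetch it from the origin first.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def robots_url_for(post_url: str) -> str:
    """Build the robots.txt URL for the server a post originally came from."""
    parts = urlsplit(post_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

def origin_allows(robots_txt: str, user_agent: str, post_url: str) -> bool:
    """Check the origin's robots.txt rules for this crawler and this post URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, post_url)
```

For a post cached on another server, the crawler would look up `robots_url_for(original_post_url)` and skip the cached copy when `origin_allows` returns `False`.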
#aaronpk same problem with showing comments from webmentions
#aaronpk my site allows search engines to index it, but if tantek comments on my post, and tantek has a robots.txt that blocks search engines, his comment would get indexed from my copy of it that shows up on my site
gRegor, ren, n8chz and [aciccarello] joined the channel
#[aciccarello]Too bad there's no way to noindex just a portion of a page
#[aciccarello]Google does have a way to prevent something from showing in the text snippet though.
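The snippet control mentioned here is presumably Google's `data-nosnippet` HTML attribute, which Google documents for `span`, `div`, and `section` elements. It only keeps the text out of the search result snippet; it does not stop the page or the text from being indexed. A small sketch of wrapping a received comment that way, where `wrap_nosnippet` is a hypothetical helper name:

```python
from html import escape

def wrap_nosnippet(comment_text: str) -> str:
    """Wrap escaped comment text in a div marked with data-nosnippet."""
    return f"<div data-nosnippet>{escape(comment_text)}</div>"
```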
[schmarty], petermolnar, btrem, IWSlackGateway, seekr, rocto, rvalue, [tantek], [KevinMarks], lanodan, gRegor, bozo_, mro, ren, geoffo, bterry, n8chz, AramZS, dreamLogic, [felix_wenzel73] and [jacky] joined the channel
#[jacky]that's actually not completely _impossible_ (re: noindex-ing a part of a page). if one collects that info as they process webmentions, you could probably use that for filtering visibility of some mentions
#[jacky]I would have to think on this more but if my site allowed things like IA to archive but not Google to index and your site, aciccarello, has a robots.txt that prohibits IA _and_ Google; if my site's visited by referrer by either of them (or hinted by a user agent, which, YMMV), then I can see that being a way to _exclude_ content on a page
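One way to picture the filtering described above, as a rough sketch only. It assumes the receiving site saved each source's robots.txt text when it processed the webmention, and it detects crawlers by User-Agent substring, which (as noted, YMMV) is easy to spoof. `Mention`, `KNOWN_CRAWLERS`, `crawler_token`, and `visible_mentions` are hypothetical names, and the crawler tokens are illustrative.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.robotparser import RobotFileParser

# Illustrative crawler tokens; a real list would need maintenance.
KNOWN_CRAWLERS = ("Googlebot", "archive.org_bot")

@dataclass
class Mention:
    source_url: str
    content: str
    source_robots_txt: str  # assumed to be saved when the webmention was processed

def crawler_token(user_agent_header: str) -> Optional[str]:
    """Return the matching crawler token, or None for ordinary visitors."""
    ua = user_agent_header.lower()
    for token in KNOWN_CRAWLERS:
        if token.lower() in ua:
            return token
    return None

def visible_mentions(mentions: list[Mention], user_agent_header: str) -> list[Mention]:
    """Hide mentions whose source site disallows the crawler making this request."""
    token = crawler_token(user_agent_header)
    if token is None:
        return list(mentions)  # regular visitors see everything
    visible = []
    for mention in mentions:
        parser = RobotFileParser()
        parser.parse(mention.source_robots_txt.splitlines())
        if parser.can_fetch(token, mention.source_url):
            visible.append(mention)
    return visible
```

The page template would then render only `visible_mentions(all_mentions, request_user_agent)` for each request.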
#[jacky]this is kind of the issue, tbh, with consent management being forced on individual sites versus indexing tools being more aware of it (and respecting it)
#[jacky]but if it's done right, I can't see it _not_ being something more people would want to enable
mro, geoffo and [snarfed] joined the channel
#[snarfed]afaik IA started ignoring robots.txt a while ago
#[tantek]IA uses robots.txt for whether it should display results. They archive regardless