[tantek]Very good points [snarfed]. Seems like there's an opportunity for an AP-specific search engine that starts spidering with mastodon.social and strictly obeys robots.txt (including rechecking maybe once a day/week or so to remove/hide sites that change their mind)
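A minimal Python sketch of the robots.txt half of that idea. It assumes a spider that caches each host's robots.txt and re-fetches it after a fixed interval (one day here, which is an assumed value), then prunes already-indexed URLs whose site has since disallowed the crawler. `RobotsCache`, `prune_index`, the `fetch_robots` hook and the crawler name are hypothetical. Only `urllib.robotparser` comes from the standard library.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumption: re-check each host's robots.txt once a day (the chat suggests a day or a week).
RECHECK_SECONDS = 24 * 60 * 60


class RobotsCache:
    """Per-host robots.txt cache that re-fetches a host's rules once they are older than ttl."""

    def __init__(self, fetch_robots, ttl=RECHECK_SECONDS, clock=time.time):
        # fetch_robots(host) -> robots.txt text. Hypothetical hook, injected so the
        # sketch runs without network access; a real spider would do an HTTP GET here.
        self.fetch_robots = fetch_robots
        self.ttl = ttl
        self.clock = clock
        self._cache = {}  # host -> (fetched_at, RobotFileParser)

    def _parser_for(self, host):
        now = self.clock()
        entry = self._cache.get(host)
        if entry is None or now - entry[0] >= self.ttl:
            # First visit, or the cached rules are stale: fetch and parse again.
            parser = RobotFileParser()
            parser.parse(self.fetch_robots(host).splitlines())
            entry = (now, parser)
            self._cache[host] = entry
        return entry[1]

    def allowed(self, user_agent, url):
        host = urlsplit(url).netloc
        return self._parser_for(host).can_fetch(user_agent, url)


def prune_index(index, robots, user_agent):
    """Drop indexed URLs whose site has since disallowed this crawler (it 'changed its mind')."""
    for url in list(index):
        if not robots.allowed(user_agent, url):
            del index[url]
    return index
```

Running `prune_index` on a schedule is what turns "strictly obeys robots.txt" into "also forgets sites that opt out later". Until a host's cached rules expire, a newly added `Disallow` is not yet seen.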
bterry, IWSlackGateway, [tantek] and [KevinMarks] joined the channel
[KevinMarks]The thing that is a bit tricky is that the servers all cache copies of posts from other servers and afaik don't propagate the robots.txt status along with them
aaronpk my site allows search engines to index it, but if tantek comments on my post, and tantek has a robots.txt that blocks search engines, his comment would still get indexed via the copy of it that shows up on my site
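One way a robots-aware indexer could avoid that leak is sketched below in Python. It assumes that each embedded copy on a page (e.g. a comment received via webmention) carries the URL where it was originally published. The hosting page's robots.txt gates the page itself, and each copy is additionally gated by the robots.txt of its origin. The `Page`/`Embedded` types, the `.example` hosts, and the "unknown host means allowed" default are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


@dataclass
class Embedded:
    """A copy of someone else's post shown on a page (e.g. a comment received via webmention)."""
    origin_url: str  # where the content was originally published
    text: str


@dataclass
class Page:
    url: str
    text: str
    embedded: list = field(default_factory=list)


def _can_fetch(robots_txt_by_host, user_agent, url):
    parser = RobotFileParser()
    # Assumption: a host with no known robots.txt is treated as allowing everything.
    parser.parse(robots_txt_by_host.get(urlsplit(url).netloc, "").splitlines())
    return parser.can_fetch(user_agent, url)


def indexable_text(page, robots_txt_by_host, user_agent):
    """Return the parts of a page a robots-aware indexer could index.

    The hosting page's robots.txt gates the page; each embedded copy is
    additionally gated by the robots.txt of the site it was copied from.
    """
    if not _can_fetch(robots_txt_by_host, user_agent, page.url):
        return []
    parts = [page.text]
    for item in page.embedded:
        if _can_fetch(robots_txt_by_host, user_agent, item.origin_url):
            parts.append(item.text)
    return parts
```

This only works if the indexer can tell which parts of a page are copies and where they came from. That attribution is exactly what gets lost when servers cache posts without carrying the origin's robots.txt status along.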
gRegor, ren, n8chz and [aciccarello] joined the channel
[jacky]that's actually not completely _impossible_ (re: noindex-ing a part of a page). if you collect that info as you process webmentions, you could probably use it to filter the visibility of some mentions
[jacky]I would have to think on this more, but say my site allows things like IA to archive it but not Google to index it, and your site, aciccarello, has a robots.txt that prohibits both IA _and_ Google. If my site is visited by either of them (detected via referrer, or hinted by the user agent, which, YMMV), then I can see that being a way to _exclude_ that content from the page
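A rough Python sketch of the two steps jacky describes. The first step records the source site's robots.txt when a webmention is processed. The second step, at render time, hides mentions whose source disallows the crawler making the request. It assumes the crawler is recognized from the User-Agent header via a small lookup table. `CRAWLER_TOKENS`, `Mention`, `receive_mention` and `visible_mentions` are hypothetical names, and the table entries are illustrative because real crawler UA strings vary. A lookup is needed because `RobotFileParser.can_fetch` matches on the product token (the part before the first `/`), so a full browser-style UA string would not match directly.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser

# Assumption: illustrative mapping from a substring of the User-Agent header to the
# robots.txt token that crawler honors. Real crawler UA strings vary (YMMV).
CRAWLER_TOKENS = {
    "googlebot": "Googlebot",
    "archive.org_bot": "archive.org_bot",
}


@dataclass
class Mention:
    source_url: str
    html: str
    robots: RobotFileParser  # source site's rules, captured while processing the webmention


def receive_mention(source_url, html, source_robots_txt):
    """Store a webmention together with the source site's robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(source_robots_txt.splitlines())
    return Mention(source_url, html, parser)


def crawler_token(user_agent_header):
    ua = user_agent_header.lower()
    for needle, token in CRAWLER_TOKENS.items():
        if needle in ua:
            return token
    return None  # not a recognized crawler: treat as a regular visitor


def visible_mentions(mentions, user_agent_header):
    """Hide mentions whose source site disallows the crawler making this request."""
    token = crawler_token(user_agent_header)
    if token is None:
        return list(mentions)
    return [m for m in mentions if m.robots.can_fetch(token, m.source_url)]
```

Referrer-based detection is not modeled here. Because the User-Agent header is trivially spoofed, this stays a best-effort heuristic. It is not an enforcement mechanism.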
[jacky]this is kind of the issue, tbh: consent management gets forced onto individual sites instead of indexing tools being more aware of it (and respecting it)