#capjamesgangelo IndieWeb Search does this: 1. crawl robots.txt, 2. crawl any sitemaps found in robots.txt, 3. try to crawl sitemap.xml (I think) if it exists, 4. compare each URL against the robots.txt directives, 5. if the URL can be crawled, crawl it, do URL discovery, and repeat this for every discovered URL.
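
A rough sketch of those five steps in Python, for illustration only. This is not IndieWeb Search's actual code: the `crawl` function, the `fetch` callback, the `USER_AGENT` name, and the same-host restriction are all assumptions made here, and sitemap parsing is reduced to a naive `<loc>` regex. Only the standard library is used (`RobotFileParser.site_maps()` needs Python 3.8+).

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-crawler"  # hypothetical user agent name


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (the URL discovery step)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def sitemap_urls(xml_text):
    # Naive <loc> extraction; a real crawler would also follow sitemap indexes.
    return re.findall(r"<loc>\s*(.*?)\s*</loc>", xml_text)


def crawl(root, fetch, limit=100):
    """Crawl one site. fetch(url) returns the body as str, or None if missing."""
    # 1. crawl robots.txt
    robots = RobotFileParser()
    robots_txt = fetch(urljoin(root, "/robots.txt")) or ""
    robots.parse(robots_txt.splitlines())

    queue = deque([root])
    # 2. sitemaps listed in robots.txt, 3. plus /sitemap.xml if it exists
    sitemaps = list(robots.site_maps() or [])
    default = urljoin(root, "/sitemap.xml")
    if default not in sitemaps:
        sitemaps.append(default)
    for sitemap in sitemaps:
        body = fetch(sitemap)
        if body:
            queue.extend(sitemap_urls(body))

    host = urlparse(root).netloc
    seen, crawled = set(), []
    while queue and len(crawled) < limit:
        url = queue.popleft()
        if url in seen or urlparse(url).netloc != host:
            continue  # assumption: stay on the same host
        seen.add(url)
        # 4. compare the URL against the robots.txt directives
        if not robots.can_fetch(USER_AGENT, url):
            continue
        # 5. crawl it, discover more URLs, and keep going
        body = fetch(url)
        if body is None:
            continue
        crawled.append(url)
        parser = LinkParser()
        parser.feed(body)
        queue.extend(urljoin(url, href) for href in parser.links)
    return crawled
```

Passing `fetch` in as a callback keeps the sketch testable without network access: an in-memory `dict.get` works for a dry run, and a function wrapping `urllib.request.urlopen` could be swapped in for real requests.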