Did You Know? Site Auditor’s Crawling Checklist
There are two levels to understanding the Internet.
At first it’s the nebulous cloud thing in which information appears. Poof! Then, it’s a network of computers where data is shared. Once you start to understand the wizard behind the curtain, it’s easier and easier to build your expertise.
The same is true of Raven’s Site Auditor. The more expertise you build, the more powerful the tool will become to you. Let’s talk about how Raven crawls your site.
What does Auditor do after you click the Start Crawling button?
Depending on how many pages your site has and how many crawls are running across Raven, it can take a while to get your crawl back. (Click the Help Center in Raven if it’s been over 24 hours and we can take a look to see if something is wrong.)
Once it’s your site’s turn to be crawled, here is Auditor’s rough checklist.
1. Check for redirects.
Auditor will first check to see if your Campaign URL is a redirect. If it is, Auditor will follow it until it lands on a site. So, if site.com redirects to www.site.com and www.site.com redirects to www.site.com/index.php, then that’s where Auditor starts.
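To make the redirect step concrete, here is a minimal Python sketch of following a redirect chain to its final landing URL. It is not Raven’s actual code: the `resolve_start_url` function, the `redirects` lookup table and the `max_hops` limit are all hypothetical, and the table stands in for real HTTP requests so the example runs offline.

```python
def resolve_start_url(url, redirects, max_hops=10):
    """Follow redirects from `url` and return the URL where the chain ends.

    `redirects` is a hypothetical lookup table (source -> target) standing in
    for real HTTP 301/302 responses. `max_hops` is an assumed safety limit.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        if hops == max_hops:
            raise ValueError("too many redirects")
        target = redirects[url]
        if target in seen:
            # A chain that revisits a URL would never land anywhere.
            raise ValueError(f"redirect loop at {target}")
        seen.add(target)
        url = target
        hops += 1
    return url


# The chain described in step 1.
redirects = {
    "site.com": "www.site.com",
    "www.site.com": "www.site.com/index.php",
}
start = resolve_start_url("site.com", redirects)
print(start)
```

The loop check and the hop limit are design choices for the sketch: without them, two pages that redirect to each other would keep a crawler busy forever.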
2. Make sure the site can be crawled.
3. Obey instructions in the robots.txt file.
If there are no issues, Auditor will append /robots.txt to the domain it’s crawling to find your robots.txt file. Here’s an example. This file lays out some rules for all crawlers, and Raven obeys your site’s wishes.
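If you want to see how a crawler might read those rules, the sketch below uses Python’s standard `urllib.robotparser` module. The robots.txt contents, the `ExampleBot` user agent and the URLs are made up for illustration; this is how a generic well-behaved crawler could do it, not a description of Raven’s internals.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Build the robots.txt location for the site that hosts `page_url`."""
    return urljoin(page_url, "/robots.txt")


# A made-up robots.txt with one rule that applies to all crawlers.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network request is needed.
parser.parse(SAMPLE_ROBOTS.splitlines())

location = robots_url("https://www.site.com/index.php")
allowed = parser.can_fetch("ExampleBot", "https://www.site.com/blog/")
blocked = parser.can_fetch("ExampleBot", "https://www.site.com/private/notes.html")
print(location, allowed, blocked)
```

Against a live site you would normally call `set_url()` and `read()` on the parser instead of feeding it a string.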
4. Obey instructions from the crawler’s settings.
After Site Auditor first crawls your site, you can go to Tool Options > Customize Settings to add Website Path Exclusions if there are parts of your site that aren’t useful to crawl. If you have any exclusions set up, Raven won’t crawl those parts of your site.
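As a rough illustration of what a path exclusion does, the Python sketch below skips any URL whose path starts with an excluded prefix. Prefix matching is an assumption made for this example, as are the `is_excluded` function and the sample paths; check Raven’s settings screen for how exclusions are actually matched.

```python
from urllib.parse import urlparse


def is_excluded(url, excluded_paths):
    """Return True if the URL's path starts with any excluded prefix.

    Prefix matching is an assumption for this sketch, not documented behavior.
    """
    path = urlparse(url).path or "/"
    return any(path.startswith(prefix) for prefix in excluded_paths)


# Hypothetical exclusions for parts of a site that aren't useful to crawl.
exclusions = ["/tag/", "/cart/"]

print(is_excluded("https://www.site.com/tag/seo/", exclusions))
print(is_excluded("https://www.site.com/blog/", exclusions))
```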
5. Crawl links on the homepage.
Then Auditor will crawl your homepage and review all the links it finds. If it finds links to pages within your domain, it will crawl those pages next. So if your homepage has internal links to a blog, an about page and a contact page, these three pages are crawled next.
6. Crawl links found on secondary pages. Repeat.
Next, Auditor will review any links on your secondary pages and start crawling at that level. In this step, Auditor may find some specific blog posts and maybe a “next 10 pages” page on your blog. In this way, Auditor keeps crawling one level deeper, until it either can’t find any more internal pages or it reaches its limit of 1,000 pages.
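Steps 5 and 6 describe what programmers call a breadth-first crawl: finish one level of links before moving to the next. Here is a minimal Python sketch of that idea with a page limit. The `crawl` function, the `get_links` callback and the tiny in-memory `SITE` are hypothetical stand-ins (a real crawler would fetch and parse HTML), and the sketch is not Raven’s implementation.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_pages=1000):
    """Visit internal pages level by level until none remain or the limit hits.

    `get_links(url)` is a hypothetical callback returning the links on a page.
    """
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        crawled.append(url)
        for link in get_links(url):
            # Only follow links within the same domain, and only once each.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


# A made-up site: the homepage links to three internal pages and one external one.
SITE = {
    "https://www.site.com/": [
        "https://www.site.com/blog/",
        "https://www.site.com/about/",
        "https://www.site.com/contact/",
        "https://twitter.com/example",
    ],
    "https://www.site.com/blog/": [
        "https://www.site.com/blog/post-1/",
        "https://www.site.com/",
    ],
}

order = crawl("https://www.site.com/", lambda url: SITE.get(url, []))
print(order)
```

The queue is what keeps the crawl level by level: the blog, about and contact pages are all visited before the blog post found one level down.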
7. Send findings to Raven.
Finally, all the data is converted into what you see in Raven and sent to you for analysis.
So that’s how Site Auditor crawls!
And if you’re wondering how to make the most of the data from your crawl, check out Jon’s post on how to do an SEO site audit like a boss.