Did you know? How to set up exclusions in Site Auditor
In the two months since Site Auditor launched, we’ve crawled more than 4 million pages. We continue to receive great feedback from our users on how to improve it, and have been updating it — adding new features and fixing bugs — almost daily.
Most recently, we added the ability to exclude errors and specific paths, allowing you to fine-tune your crawl so that you’re only notified about issues important to your specific goals.
Let’s go through both of these new exclusion options in detail.
Website path exclusions
In some instances, you’ll want to exclude certain parts of your site from being crawled.
For example, WordPress creates separate archive pages for each tag and category you use on your site. When you apply multiple tags or categories to a post, that post appears on the archive page for each one. When Site Auditor crawls the site, it flags those repeated posts as duplicate content.
Since you know this isn’t an issue to be concerned about, you can stop Site Auditor from crawling these tag and category archives by setting up a Website Path Exclusion. To do this, go to Site > Auditor and click on the wrench icon to get to Settings.
In the Website Path Exclusions section, click the “Create New Exclusion” button.
You can exclude various paths from the crawl. For this example, we’ll exclude all category pages on the Raven blog by entering /category/* as our exclusion.
Adding the asterisk after category/ tells Site Auditor to ignore the URL raventools.com/blog/category and all folders and files below it, like raventools.com/blog/category/seo/ and raventools.com/blog/category/raven/.
Click the “Create New Exclusion” button, and your exclusion will be saved. Going forward, Site Auditor will skip any URL that matches one of your exclusions.
If you ever change your mind and want Site Auditor to start crawling those excluded URLs again, just go back to the Website Path Exclusions section in Settings and delete one or all of the exclusions you created.
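To make the wildcard behavior concrete, here is a minimal Python sketch of how a pattern like /category/* could be matched against URLs. It is only an illustration of the behavior described above, not Site Auditor’s actual matching logic, and the function name is_excluded is hypothetical.

```python
from urllib.parse import urlparse


def is_excluded(url: str, pattern: str) -> bool:
    """Return True if the URL's path falls under the exclusion pattern.

    Assumption: a pattern such as "/category/*" covers the folder itself
    and every folder or file below it, wherever it appears in the path.
    """
    # Accept URLs written without a scheme, e.g. "raventools.com/blog/category".
    parsed = urlparse(url if "://" in url else "http://" + url)
    # Normalize the path so it always ends with exactly one slash.
    path = parsed.path.rstrip("/") + "/"
    # "/category/*" becomes "/category/": the folder plus everything under it.
    prefix = pattern.rstrip("*").rstrip("/") + "/"
    return prefix in path


for url in (
    "raventools.com/blog/category",
    "raventools.com/blog/category/seo/",
    "raventools.com/blog/",
):
    print(url, is_excluded(url, "/category/*"))
```

Under these assumptions, the first two URLs are treated as excluded and the blog home page is not.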
Error exclusions
If Site Auditor reports certain issues that you already know about but don’t consider a problem, or don’t want to report to your client, you can exclude them from your total number of issues and from showing up on the summary sub-tabs of Site Auditor.
For example, let’s say that you have your robots.txt file set up to block login pages, or maybe your 404 error page, from being crawled. The first time Site Auditor crawls your site, seeing these URLs in your report is probably a helpful reminder. But if you don’t want this information to be factored into your total crawl issues going forward, you can exclude this metric from future crawls.
To do this, go into Site Auditor > Settings (the wrench icon again), and find the Exclude Errors from Report section. From the drop-down, select “Pages blocked by robots.txt” and then save.
Immediately after saving, “Pages blocked by robots.txt” will no longer appear on the Summary tab under Visibility Issues, and the number of issues related to this metric will no longer count toward the total number of issues reported for your site.
You will still be able to find metrics for the errors you’ve excluded under the main tabs, in case you want to check in from time to time to make sure nothing has gone awry with your site.
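The effect on your totals can be pictured with a small Python sketch. The counts below are invented for illustration, and summary_total is a hypothetical helper, not part of Site Auditor: the excluded metric drops out of the reported total, while its underlying data stays available.

```python
# Hypothetical issue counts from a crawl (illustrative numbers only).
issue_counts = {
    "Pages blocked by robots.txt": 12,
    "Duplicate content": 5,
}

# Metrics chosen in the "Exclude Errors from Report" setting.
excluded_metrics = {"Pages blocked by robots.txt"}


def summary_total(counts: dict[str, int], excluded: set[str]) -> int:
    """Sum only the issues whose metric has not been excluded."""
    return sum(n for name, n in counts.items() if name not in excluded)


print(summary_total(issue_counts, set()))             # total with nothing excluded
print(summary_total(issue_counts, excluded_metrics))  # total after the exclusion
# The excluded metric's own count is untouched and can still be inspected.
print(issue_counts["Pages blocked by robots.txt"])
```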
If you haven’t crawled your site yet, now’s a great time to get started and try out these new features. Let us know what you think!