Did you know? Frequently asked questions about Site Auditor
In the short time it’s been available in beta, we’ve been getting some great feedback on our latest Raven addition, Site Auditor.
Our developers are busy making updates to it every day, and we really appreciate all of the suggestions and bug reports our users have been sending in.
We’ve also been receiving plenty of good questions about the tool, so I wanted to share some of the most frequently asked ones along with their answers here. Read on to make the most of Site Auditor.
I’ve fixed some of the errors Site Auditor reported. Can I re-crawl my site?
Yes, Site Auditor now supports manual crawls. To re-crawl your site, click the Settings icon and then set the crawl frequency to manual crawls.
Then, back on the Summary tab, a Run Crawl button will appear. Click this button to initiate a manual crawl.
When Site Auditor has completed your manual crawl, an email will be sent to the email address of the user who initiated it. If you would like the website to return to being crawled on a schedule, just click the Settings icon again and select either a weekly or monthly crawl schedule.
How long does a crawl take?
The time it takes to crawl a website in Site Auditor depends on two factors: how many pages need to be crawled on your website and how many other websites are in the queue to be crawled. Because of these variables, we aren’t able to give a specific ETA for when your crawl will be completed, but if you find you’ve been waiting for more than 24 hours, please contact us at firstname.lastname@example.org.
Sites are crawled on a first-come, first-served basis, and only one site per account is crawled at a time. If you have requested crawls for more than one website, your second site will be added to the queue once your first site’s crawl is complete.
How will I know when my crawl is complete?
When Site Auditor has completed your crawl, an email will be sent to the email address of the user who initiated it. If you’re sharing a Raven login, make sure someone regularly checks the inbox for the email address tied to that account so you don’t miss your notification message.
Why won’t Site Auditor crawl my website?
If you’ve tried to crawl a website and received an error stating that your website could not be crawled, there are several possible explanations. Some of the most common ones we’ve seen are:
- You’re blocking IP addresses. Raven runs on Amazon Web Services (AWS), so if you’re blocking the AWS range of IP addresses, Site Auditor won’t be able to crawl your site. In this case, you can write an exception that allows access to the user agent ‘RavenCrawler’. This should also work if your site is in development and you’re giving access only to specific user agents.
- Your site is blocking search engines. If your robots.txt file is set to disallow page crawls (see robotstxt.org for details), our crawler will not be able to access your site.
If your site can’t be crawled and none of these issues apply, please email email@example.com and we’ll be glad to take a look.
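If you want to check the robots.txt case yourself before emailing us, here is a minimal sketch using Python's standard `urllib.robotparser` module. The two robots.txt bodies and the example.com URL are made up for illustration, and the sketch assumes that RavenCrawler follows a robots.txt group addressed to its own user agent name, which we haven't spelled out above.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks every crawler from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant with an exception for RavenCrawler.
# An empty Disallow line means "nothing is disallowed" for that user agent.
ALLOW_RAVEN = """\
User-agent: RavenCrawler
Disallow:

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    url = "http://www.example.com/"  # placeholder URL, not a real customer site
    print(is_allowed(BLOCK_ALL, "RavenCrawler", url))    # False
    print(is_allowed(ALLOW_RAVEN, "RavenCrawler", url))  # True
```

Paste in the contents of your own robots.txt file to see whether a crawler identifying itself as RavenCrawler would be turned away.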
Why was only one page of my website crawled?
There are various reasons why Site Auditor might not be able to access your entire site, but the most common are:
- Your home page is a splash page. If your home page is created using Flash and doesn’t contain any links, RavenCrawler will assume there is nothing more to crawl and will stop.
- Your links lead outside the directory you asked us to crawl. When we crawled www.huffingtonpost.com/women, only one page was crawled. On closer inspection, we found that every single link on the Huffington Post’s Women’s page points to a page that lives off their home page or in another folder.
For example, looking at www.huffingtonpost.com/women, we see there is a link to an article called “How Women Are — And Aren’t — Better Off Than They Were in 1963.” But the link to this article is http://www.huffingtonpost.com/2013/02/20/feminine-mystique-at-50-better-off-grandmother-history_n_2725202.html?utm_hp_ref=women&ir=Women.
Site Auditor does not follow this link because it’s not contained within the /women sub-directory that we told it we wanted crawled.
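To make the sub-directory rule concrete, here is a rough sketch of that kind of scope check. It is not Site Auditor's actual code: the function name `in_crawl_scope` and the matching logic are assumptions made for illustration, and the second URL in the example is invented.

```python
from urllib.parse import urlparse


def in_crawl_scope(start_url: str, link: str) -> bool:
    """Return True if link is on the same host as start_url and under its path."""
    start = urlparse(start_url)
    target = urlparse(link)
    if start.netloc.lower() != target.netloc.lower():
        return False
    # Compare whole path segments so /women does not match /womens-health.
    prefix = start.path.rstrip("/") + "/"
    return (target.path.rstrip("/") + "/").startswith(prefix)


if __name__ == "__main__":
    start = "http://www.huffingtonpost.com/women"
    article = (
        "http://www.huffingtonpost.com/2013/02/20/"
        "feminine-mystique-at-50-better-off-grandmother-history_n_2725202.html"
        "?utm_hp_ref=women&ir=Women"
    )
    # The article lives under /2013/, not /women/, so it is out of scope.
    print(in_crawl_scope(start, article))  # False
    # A made-up URL under /women/ would be in scope.
    print(in_crawl_scope(start, "http://www.huffingtonpost.com/women/example"))  # True
```

If you want links like the article above to be crawled, start the crawl from a URL whose path contains them, such as the site's home page.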
What is my usage limit?
While Site Auditor is in beta, Pro and Agency users can crawl up to 1,000 pages per website, up to a total of 10,000 pages per account each day. If you reach your limit one day, it resets the next. There are currently no overage charges for Site Auditor, as we do not allow you to exceed your limit.
Trial users can crawl only one website, up to 1,000 pages.
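For Pro and Agency accounts, the arithmetic behind those two limits can be sketched as follows. The limits come from the numbers above, but the function name and the idea that sites use up the daily allowance in the order they were requested are assumptions for illustration only. How Site Auditor handles a site that arrives when the allowance is partly used may differ.

```python
PAGES_PER_SITE = 1_000              # beta limit per website
PAGES_PER_ACCOUNT_PER_DAY = 10_000  # beta limit per account, resets daily


def pages_crawled_today(site_page_counts):
    """Estimate how many pages of each site fit within one day's limits.

    Assumes sites draw on the daily allowance in the order given.
    """
    remaining = PAGES_PER_ACCOUNT_PER_DAY
    crawled = []
    for pages in site_page_counts:
        allowed = min(pages, PAGES_PER_SITE, remaining)
        crawled.append(allowed)
        remaining -= allowed
    return crawled


if __name__ == "__main__":
    # A 250-page site, a 5,000-page site, and a 1,000-page site.
    print(pages_crawled_today([250, 5_000, 1_000]))  # [250, 1000, 1000]
```

Under these assumptions, ten full-size sites would use up the whole daily allowance, and an eleventh would have to wait for the next day's reset.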
How do I contact you for help?
If you see a bug, have questions or would like to submit feedback, please use the Report an Issue icon at the top right of your screen to send a message to our Support team. You can also email firstname.lastname@example.org at any time.