There are many SEO tips and tricks that help in optimizing a site, but one whose importance is sometimes underestimated is the sitemap. A sitemap, as the name implies, is just a map of your site: on a single page you show the structure of your site, its sections, the links between them, and so on. Sitemaps make navigating your site easier, and keeping an updated sitemap on your site is good both for your users and for search engines. Sitemaps are also an important way of communicating with search engines. While in robots.txt you tell search engines which parts of your site to exclude from indexing, in your sitemap you tell search engines where you’d like them to go.
Sitemaps are not a novelty. They have always been part of best Web design practices, but now that search engines have adopted them, they have become even more important. However, if you are interested in sitemaps mainly from an SEO point of view, you can’t rely on the conventional sitemap alone (though currently Yahoo! and MSN still stick to the standard HTML format). For instance, Google Sitemaps uses a special XML format that is different from the ordinary HTML sitemap for human visitors.
One might ask why two sitemaps are necessary. The answer is simple: one is for humans, the other is for spiders (for now mainly Googlebot, but it is reasonable to expect other crawlers to join the club shortly). It should also be clarified that having two sitemaps is not regarded as duplicate content. In ‘Introduction to Sitemaps’, Google explicitly states that using a sitemap will never lead to a penalty for your site.
Why Use a Sitemap
Using sitemaps has many benefits beyond easier navigation and better visibility to search engines. Sitemaps let you inform search engines immediately about any changes on your site. Of course, you cannot expect search engines to rush right away to index your changed pages, but the changes will be indexed faster than they would be without a sitemap.
Also, when you have a sitemap and submit it to the search engines, you rely less on external links to bring search engines to your site. Sitemaps can even help with messy internal links, for instance if you accidentally have broken internal links or orphaned pages that cannot be reached in any other way (though it is clearly much better to fix your errors than to rely on a sitemap).
If your site is new, or if you have a significant number of new (or recently updated) pages, then using a sitemap can be vital to your success. Although you can still do without a sitemap, it is likely that sitemaps will soon become the standard way of submitting a site to search engines. Spiders will certainly continue to crawl the Web, and sitemaps will not make the standard crawling procedures obsolete, but it is logical to expect the importance of sitemaps to keep increasing.
Sitemaps also help in classifying your site content, though search engines are by no means obliged to classify a page as belonging to a particular category or as matching a particular keyword only because you have told them so.
Bearing in mind that the sitemap programs of the major search engines (especially Google’s) are still in beta, using a sitemap might not generate huge advantages right away. But as search engines improve their sitemap indexing algorithms, more and more sites are expected to be indexed quickly via sitemaps.
Generating and Submitting the Sitemap
The steps you need to perform in order to have a sitemap for your site are simple. First, you need to generate it, then you upload it to your site, and finally you notify Google about it.
Depending on your technical skills, there are two ways to generate a sitemap: download and install a sitemap generator, or use an online sitemap generation tool. The first is more difficult, but it gives you more control over the output. Google offers a downloadable sitemap generator; after you download the package, follow the installation and configuration instructions included in it. This generator is a Python script, so your Web server must have Python 2.2 or later installed in order to run it.
The second way to generate a sitemap is easier. There are many free online tools that can do the job for you; for instance, have a look at Google’s collection of third-party sitemap tools. Although Google explicitly says that it has neither tested nor verified them, the list is useful because it includes links to online generators, downloadable sitemap generators, sitemap plugins for popular content management systems, and more, so you will be able to find exactly what you need.
After you have created the sitemap, you need to upload it to your site (if it is not already there) and notify Google about it. Notifying Google involves adding the site to your Google Sitemaps account, so if you do not have a Google account, it is high time to open one. Another detail worth knowing in advance is that, in order to add the sitemap to your account, you need to verify that you are the legitimate owner of the site.
Currently Yahoo! and MSN do not support sitemaps, or at least not in the XML format used by Google. Yahoo! allows webmasters to submit “a text file with a list of URLs” (which can actually be a stripped-down version of a sitemap), while MSN does not even offer that, though there are rumors that it indexes sitemaps when they are available on-site. Most likely this situation will change in the near future and both Yahoo! and MSN will catch up with Google, because user-submitted sitemaps are simply too powerful an SEO tool to ignore.
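A URL-list file of this kind is easy to produce from the same list of pages you would put in an XML sitemap. The sketch below assumes the file is simply one absolute URL per line in UTF-8, which is also the plain-text sitemap format described at sitemaps.org; the function name and the `urllist.txt` filename in the usage note are hypothetical.

```python
from pathlib import Path


def write_text_sitemap(urls, path):
    """Write a plain-text URL list: one absolute URL per line, UTF-8 encoded.

    Blank entries are skipped and duplicates are dropped (first occurrence wins).
    Returns the number of URLs written.
    """
    cleaned = [u.strip() for u in urls if u.strip()]
    unique = list(dict.fromkeys(cleaned))  # dict keys preserve insertion order
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)
```

For example, `write_text_sitemap(["https://www.example.com/", "https://www.example.com/about"], "urllist.txt")` writes a two-line file.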
The Importance Of XML Sitemaps
In the early days of search engines, I wasn’t much of a believer in XML sitemaps. But over time, I began to see firsthand how they can benefit websites. XML sitemaps serve as a way to communicate directly with the search engines, alerting them to new or changed content very quickly and helping to ensure that the content is indexed faster.
For content publishers, it’s become critical to help Google, specifically, understand whether your site is the original publisher of content. Why? Panda.
Content Syndication, Duplicate Content & Panda
It’s not uncommon for publishers to syndicate their content on other websites. Further, it’s also not uncommon for publishers to have their site’s content “curated” by other websites without a formal syndication agreement.
Unfortunately, the definition of content curation is fuzzy at best. In a quick Google search for a recent Search Engine Land article, I found over 47 copies of the article on other sites. (Editor’s note: these are not authorized copies.)
For every publisher site offering syndicated content or having content curated by others (with or without permission), the stakes could not be higher with Google. The Panda algorithm update focused in part on removing duplicate content from search engine results pages — meaning that if a site is not deemed the content originator, it’s at risk of being excluded from the results altogether.
XML sitemaps are just one tool that can help content creators establish their stake as the content originator.
Just how profound can XML sitemaps be for indicating content origination?
In theory, the content originator would likely have the earliest indexed timestamp for the content. But consider this example from a publisher that is not using XML sitemaps: the curating or syndicating site is getting the same content indexed nearly 40 minutes earlier than the original content.
How To Get Started
So, how should you get started? First, you’ll need to create an XML sitemap for your site. Some content management systems (CMS) can auto-generate XML sitemaps. For WordPress users, I recommend the Yoast SEO plugin, as WordPress did not have built-in sitemap generation at the time of writing. (If you are already using Yoast for SEO, make sure you have updated to the most recent version.)
Ideally, you’ll want to use a plugin for your CMS (or built-in CMS functionality) to create a sitemap, because these tools normally update your sitemap automatically as content is added or changed. If you don’t use a CMS or WordPress, you can also create an XML sitemap with tools such as xml-sitemaps.com; however, you’ll need to update your sitemap manually on a regular basis to ensure that its information is correct and up to date.
If you have a particularly large website, you may also need to employ a sitemap index. The sitemaps protocol allows at most 50,000 URLs (and 50 MB uncompressed) per sitemap file, so if your site has more than 50,000 URLs, you’ll need to use an index to tie multiple sitemaps together. You can learn how to create indices and more about sitemaps at sitemaps.org.
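Without a CMS, the manual update step can at least be scripted and rerun after every change. The sketch below assumes a hypothetical static site where every `.html` file under one directory is a page worth listing, and it uses each file’s modification time as `<lastmod>`. It maps `index.html` to `/index.html` rather than `/`, which is a simplification you may want to change.

```python
import datetime
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_from_dir(root_dir, base_url):
    """Map every .html file under root_dir to a <url> entry; file mtime becomes <lastmod>."""
    root = Path(root_dir)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in sorted(root.rglob("*.html")):
        if not page.is_file():
            continue
        # Percent-encode characters such as spaces so the <loc> value is a valid URL.
        rel_path = quote(page.relative_to(root).as_posix())
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + rel_path
        modified = datetime.datetime.fromtimestamp(page.stat().st_mtime, tz=datetime.timezone.utc)
        # A plain YYYY-MM-DD date is a valid W3C Datetime value for <lastmod>.
        ET.SubElement(url, "lastmod").text = modified.date().isoformat()
    return ET.tostring(urlset, encoding="unicode")
```

To publish the result, prepend `<?xml version="1.0" encoding="UTF-8"?>` and write it to your sitemap file, for example from a scheduled job that runs after each site update.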
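Splitting a large URL list into several sitemaps plus one index can be sketched as follows. This sketch enforces only the 50,000-URL limit, not the 50 MB size limit, and the `sitemap-N.xml` filenames are hypothetical. The index uses the `<sitemapindex>`/`<sitemap>`/`<loc>` structure defined at sitemaps.org.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split the URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_urlset(urls):
    """Build a minimal <urlset> containing only <loc> for each URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


def build_sitemaps_with_index(urls, base_url, size=MAX_URLS_PER_SITEMAP):
    """Return ({filename: sitemap_xml}, index_xml) for the given URLs."""
    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n, part in enumerate(chunk(urls, size), start=1):
        name = f"sitemap-{n}.xml"  # hypothetical naming scheme
        files[name] = build_urlset(part)
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url.rstrip('/')}/{name}"
    return files, ET.tostring(index, encoding="unicode")
```

You would upload every generated file and then submit only the index URL; the search engine discovers the individual sitemaps from it.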
After you’ve created your sitemaps (and potentially sitemap indices), you’ll need to register them with the various search engines. Both Google and Bing encourage webmasters to register sitemaps and RSS feeds through Google Webmaster Tools and Bing Webmaster Tools.
Taking this step helps the search engines identify where your sitemap is — meaning that as soon as the sitemap is updated, the search engines can react faster to index the new content. Also, content curators or syndicators may be using your RSS feeds to automatically pull your content into their sites.
Registering your sitemap (or RSS feed) with Google and Bing gives the search engines a signal that your content has been created or updated before they find it on the other sites. It’s really a very simple process with both engines. To submit a sitemap to Google:
- Ensure that the XML Sitemap is on your web server and accessible via its URL.
- Log in to Google Webmaster Tools.
- Under “Crawl,” choose “Sitemaps.”
- Click on the red button in the upper right marked “Add/Test Sitemap.” Enter the URL of the sitemap and click “Submit Sitemap.”
To register a sitemap with Bing:
- Ensure that the XML Sitemap is on your web server and accessible via its URL.
- Log in to Bing Webmaster Tools.
- Click on “Configure My Site” and “Sitemaps.”
- Enter the full URL of the sitemap in the “Submit a Sitemap” text box.
- Click “Submit.”
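Both checklists start by making sure the sitemap is on your server and reachable at its URL. A quick pre-submission check can be sketched with Python’s standard library, assuming an uncompressed `<urlset>` sitemap; gzip-compressed files and sitemap indexes are not handled, and the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(data: bytes) -> int:
    """Parse sitemap bytes and return the number of <url> entries.

    Raises xml.etree.ElementTree.ParseError for malformed XML and ValueError
    if the root element is not a sitemaps.org <urlset>.
    """
    root = ET.fromstring(data)
    if root.tag != SITEMAP_NS + "urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    return len(root.findall(SITEMAP_NS + "url"))


def check_sitemap_url(url: str, timeout: float = 10.0) -> int:
    """Fetch the sitemap and return its URL count; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return count_sitemap_urls(resp.read())
```

For example, `check_sitemap_url("https://www.example.com/sitemap.xml")` should return a positive count before you submit that URL in either tool.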
Another great reason to register sitemaps with Google specifically is to catch sitemap errors. Google Webmaster Tools provides detailed information about the status of each sitemap and any errors it finds.
For sites with multiple types of content, there are also additional sitemap types that can be used, including image, video and mobile sitemaps.
Way back in the “good old days” of SEO, many “SEO firms” made a pretty good living “submitting your website to thousands of search engines.” While that has never been a sound tactic/method of achieving SEO nirvana, today’s SEO provides us with opportunities to ensure that we get our content – in all shapes, sizes, and forms – indexed in the search engines, to the best of our ability.
When it comes to the crawling phase of SEO and bot visibility, we often check first what we withhold from search engines via robots.txt and meta robots tags. But equally important are the content and URLs that we feed search engines.
Long ago, the best practice was to create an HTML sitemap of at least all your higher-level pages and link to it from the footer of every page on the site. This gave search engines a buffet of site URLs reachable from any page on your site.
Then along came XML sitemaps. XML (Extensible Markup Language) is the format search engines prefer for digesting this kind of data.
With this tool at our disposal, a site administrator can tell search engines which pages of the site they want crawled, the priority or hierarchy of site content, and when each page was last updated.
Let’s walk through the first steps of creating sitemaps for various content types.
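An HTML sitemap of this kind is just a page of grouped links. The sketch below generates that markup from a hypothetical mapping of section titles to (page title, URL) pairs. The `sitemap` class name is an arbitrary assumption, and `html.escape` keeps titles and URLs containing `&` or quotes valid in the output.

```python
from html import escape


def html_sitemap(sections):
    """Render {section title: [(page title, url), ...]} as a nested list of links."""
    out = ['<nav class="sitemap">']
    for section, pages in sections.items():
        out.append(f"  <h2>{escape(section)}</h2>")
        out.append("  <ul>")
        for title, url in pages:
            out.append(f'    <li><a href="{escape(url)}">{escape(title)}</a></li>')
        out.append("  </ul>")
    out.append("</nav>")
    return "\n".join(out)
```

The generated fragment can be placed on a dedicated sitemap page, which is then linked from the site footer.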
How to Build a Standard XML Sitemap
A standard XML sitemap URL entry consists of a required <loc> element (the page URL) plus the optional <lastmod>, <changefreq>, and <priority> elements.
These correspond to the areas I noted above: they let you list the URLs you want crawled and provide additional information about each one.
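To make that anatomy concrete, the sketch below builds a single URL entry with Python’s `xml.etree.ElementTree`. The element names and allowed values follow the sitemaps.org protocol, while the URL, date, and priority are made-up example data.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build one <url> element; only <loc> is required by the protocol."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        ET.SubElement(url, "lastmod").text = lastmod  # W3C Datetime, e.g. 2024-01-31
    if changefreq is not None:
        # One of: always, hourly, daily, weekly, monthly, yearly, never
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        ET.SubElement(url, "priority").text = f"{priority:.1f}"  # 0.0 to 1.0; default is 0.5
    return url


urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
urlset.append(url_entry("https://www.example.com/", "2024-01-31", "weekly", 0.8))
print(ET.tostring(urlset, encoding="unicode"))
```

Search engines treat <changefreq> and <priority> as hints at most, so <loc> and an accurate <lastmod> are the parts worth getting right.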
Some content management systems can generate sitemaps dynamically or automatically. Is this easy? Yes. Is it error-free? No. More on that in a moment.
If your CMS can’t generate a sitemap, you must create an XML sitemap from scratch. You wouldn’t want to do this by hand because of the time involved; that’s why there are tools for it.
There are many XML sitemap generators. Some are free, but they often cap the number of URLs they will crawl, which defeats the purpose for larger sites.
Most good sitemap generators are paid. One fairly straightforward tool you can use for sitemap generation is Sitemap Writer Pro. It’s well worth the $25.
If you do choose another tool, pick one that lets you review the crawled URLs and easily remove duplicate URLs, dynamic parameters, excluded URLs, etc. Remember, you only want to include the pages on the site that you want a search engine to index and value.
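If your tool exports the raw crawl, the cleanup described above can be sketched as a small filter. This sketch assumes that query strings are always disposable dynamic parameters, which is a blunt rule; if some of your pages genuinely depend on parameters, pass `drop_query=False`. The excluded path prefixes are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_urls(urls, excluded_prefixes=(), drop_query=True):
    """Normalize crawled URLs, drop excluded paths and duplicates, keep first-seen order."""
    seen = set()
    result = []
    for raw in urls:
        parts = urlsplit(raw.strip())
        path = parts.path or "/"
        if any(path.startswith(prefix) for prefix in excluded_prefixes):
            continue
        # Dynamic parameters such as ?sessionid=... often create duplicate URLs.
        query = "" if drop_query else parts.query
        # Lowercase scheme and host (case-insensitive); always drop the #fragment.
        url = urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

For example, a session-ID variant of a product page collapses into the plain product URL, and anything under an excluded prefix such as `/cart/` is dropped.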
How to Upload and Submit Your Sitemap
Now that the standard XML sitemap is built, you need to upload the file to your site. It should reside in the site’s root directory, with a relevant name such as /sitemap.xml.
Once you’ve done this, go to Google Webmaster Tools and submit the sitemap.
Then do the same in Bing Webmaster Tools.
Yes, the search engines may find the sitemap on your site on their own, but it’s smart to feed them this information directly and give Google and Bing the ability to report on indexing issues.
How to Find Sitemap Errors
You’ve given your URLs to the top search engines in their preferred XML markup, but how are they indexing the content? Are they having any issues? The great benefit of providing this information directly through Webmaster Tools accounts is that you can review what content you may be withholding from search engines by accident.
Google has done a much better job than Bing of making sitemap issues transparent; Bing provides much less data for review.
In this instance, we submitted an XML sitemap and received an error saying that URLs in the sitemap are blocked by the robots.txt file.
It’s important to pay attention to this type of error and warning information. Errors may mean that the search engines can’t even read the XML sitemap, and they can also reveal important URLs that we are accidentally withholding from crawlers in the robots.txt file.
This ties back to the downside of dynamically generated sitemaps mentioned earlier: they can often include many URLs that are intentionally excluded from search engines in the robots.txt file. The last thing we want is to tell a search engine to crawl and not crawl the same page at the same time.
Sitemap monitoring is essential for any SEO initiative. At the most basic level, it tells you how many URLs you have submitted in your XML sitemap, how many of them are currently indexed in Google, and when the sitemap file was last processed.
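You can catch this conflict before submitting by testing every sitemap URL against your robots.txt rules. The sketch below uses Python’s standard `urllib.robotparser`. It assumes you already have the robots.txt text and the list of sitemap URLs, and it uses `Googlebot` as the user agent by default. Python’s parser may not interpret every pattern (for example, wildcards) exactly as Googlebot does, so treat its output as a first pass.

```python
from urllib.robotparser import RobotFileParser


def urls_blocked_by_robots(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return [u for u in sitemap_urls if not parser.can_fetch(user_agent, u)]
```

Any URL this returns should be removed from the sitemap or unblocked in robots.txt, whichever reflects your intent.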
Wash, Rinse, Repeat
You may have run through the process above and feel pretty confident about the transparency and delivery of your site URLs to the search giants. But beyond the standard XML sitemap information, Google and Bing will also accept information about your site’s image, video, news, and mobile content.
Conveniently, these types of sitemaps can be created, placed on the site, and submitted in the same fashion as the standard XML sitemap. The tool I recommended earlier can also create these sitemaps.
Anatomy of Supporting XML Sitemaps
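An image sitemap is one example of a supporting sitemap. It uses Google’s image extension: each ordinary <url> entry can contain one or more <image:image> elements, each holding an <image:loc> with the image URL, declared in the namespace http://www.google.com/schemas/sitemap-image/1.1. The sketch below builds such a file from a hypothetical mapping of page URLs to the image URLs they display.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
# Serialize the sitemap namespace as the default xmlns and the image one as "image:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def image_sitemap(pages):
    """pages: mapping of page URL -> list of image URLs shown on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for img in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = img
    return ET.tostring(urlset, encoding="unicode")
```

Video and news sitemaps follow the same pattern, with their own extension namespaces and elements nested inside each <url>.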
Build and submit a sitemap
This page describes how to build a sitemap and make it available to Google.