Making Your Site Visible to Search Engines with Sitemaps

Sitemaps are one of the most recognizable website navigation elements on the Internet. A sitemap makes a website easy for humans to navigate, and highly visible and indexable by search engines. Despite this, there is often confusion about what sitemaps are, how they are implemented, and how they can best be leveraged with search engine tools for search engine optimization (SEO).

There are generally two categories of sitemaps: human-readable sitemaps and machine-readable sitemaps.

Human-readable sitemaps are just that: sitemaps which a human being can read and use to understand the layout and contents of a website. This post, however, focuses on machine-readable sitemaps.

XML Sitemaps

Machine-readable sitemaps also describe the layout and contents of a website; however, they are formatted so that computers can read and understand them. This is generally achieved with a markup language such as the ubiquitous XML. The most widespread sitemap format is described by the Sitemap Protocol defined at sitemaps.org (which is maintained by Google, Yahoo!, and Microsoft). This format conforms to the XML standard, and uses a custom XML schema to describe the layout and content of a website.

Here’s how an example ‘page’ might appear in the sitemap protocol:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.smetoolkit.org/smetoolkit/en/content/en/793/Creating-an-Effective-Business-Plan</loc>
      <lastmod>2009-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>

Basically, the sitemap specifies the URL of a resource, the last time it was updated, how frequently it is expected to change, and how important the resource is to the site overall. At a minimum, a URL must be specified; the other elements are optional.

Generally, many such resources are defined in a single sitemap. Many content management systems, such as WordPress, provide functionality or plugins which can automatically generate sitemaps conforming to this format.
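If your site isn't backed by a CMS that generates one for you, a minimal sitemap can be produced by a short script. Here is a sketch using only Python's standard library; the function name, page data, and example.com URL are illustrative, not part of the protocol:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a sitemap XML string from (loc, lastmod, changefreq, priority) tuples."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without a namespace prefix
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for loc, lastmod, changefreq, priority in pages:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = loc
        # Only <loc> is required; skip the optional elements when empty.
        if lastmod:
            ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS).text = lastmod
        if changefreq:
            ET.SubElement(url, "{%s}changefreq" % SITEMAP_NS).text = changefreq
        if priority:
            ET.SubElement(url, "{%s}priority" % SITEMAP_NS).text = priority
    return ET.tostring(urlset, encoding="unicode")

pages = [("http://www.example.com/", "2011-03-16", "monthly", "0.8")]
print(build_sitemap(pages))
```

In practice, the list of pages would be pulled from your site's database or file system rather than hard-coded.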

However, there are some additional tricks to ensuring that your sitemap provides the most value to search engines.

Providing an Index for Multiple Sitemaps

If your sitemap includes an extremely large number of resources (more than 50,000 URLs, or greater than 10 MB in size), you should instead use multiple sitemaps linked together with a sitemap index.

The sitemap index, also described by the sitemap protocol, provides the URLs to multiple, separate sitemaps. This allows search engines to locate and make use of multiple sitemaps when a single, huge sitemap may be too large to be processed effectively.

Here’s how an example sitemap may appear in a sitemap index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.smetoolkit.org/smetoolkit/en/sitemap_1.xml</loc>
      <lastmod>2011-03-16</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.smetoolkit.org/smetoolkit/en/sitemap_2.xml</loc>
      <lastmod>2011-03-16</lastmod>
   </sitemap>
</sitemapindex>

At a minimum, the sitemap index must provide a URL for each sitemap; the last modification date is optional. Essentially, the sitemap index specifies the location of the actual sitemaps which describe a site, and when each was last changed.
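A sitemap index can be generated the same way as a sitemap itself. The following sketch builds an index from a list of child sitemap URLs; the function name and example.com URLs are placeholders for illustration:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(sitemap_urls, lastmod=None):
    """Build a sitemap index XML string pointing at each child sitemap URL."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without a namespace prefix
    index = ET.Element("{%s}sitemapindex" % SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "{%s}sitemap" % SITEMAP_NS)
        ET.SubElement(sitemap, "{%s}loc" % SITEMAP_NS).text = loc
        if lastmod:  # <lastmod> is optional
            ET.SubElement(sitemap, "{%s}lastmod" % SITEMAP_NS).text = lastmod
    return ET.tostring(index, encoding="unicode")

# Hypothetical child sitemaps, e.g. one per 50,000-URL chunk of a large site.
children = ["http://www.example.com/sitemap_1.xml",
            "http://www.example.com/sitemap_2.xml"]
print(build_sitemap_index(children, lastmod="2011-03-16"))
```

On a large site, you would first split the full URL list into chunks that respect the 50,000-URL and 10 MB limits, write each chunk to its own sitemap file, and then list those files in the index.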

Making Sure that Your Sitemap Can Be Located & Accessed

A sitemap is usually named ‘sitemap.xml’ and located at the root of the domain, such as http://www.smetoolkit.org/sitemap.xml. However, on more complicated sites, or sites with content management systems, the sitemap may be located elsewhere or named differently. Search engines cannot guess every possible location of a sitemap. Therefore, it is often very helpful to include the location of the sitemap in the robots.txt file (which is always located at the root of the domain).

The sitemap directive of the robots.txt file specifies the location of a site’s primary sitemap or sitemap index. The directive takes an absolute URL as an argument, and may specify multiple sitemaps.

An example robots.txt file might read:

User-agent: *
Sitemap: http://www.smetoolkit.org/smetoolkit/en/sitemap.xml
Disallow: /javascript/
Disallow: /styles/

In addition to specifying the name and location of an irregularly placed sitemap, the robots.txt directive allows you to signal trust of a sitemap provided by another domain. This can be useful, for example, if you host the sitemap for your site on an alternate domain.
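To see how a crawler might discover these directives, here is a small sketch that extracts every Sitemap line from a robots.txt body. The parsing is deliberately simplified (real crawlers are more lenient about whitespace and malformed records), and the function name and example.com URL are hypothetical:

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt body.

    The Sitemap directive is case-insensitive and may appear multiple times.
    """
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the "http://" in the URL survives.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

robots = """User-agent: *
Sitemap: http://www.example.com/sitemap.xml
Disallow: /javascript/
Disallow: /styles/
"""
print(sitemap_urls_from_robots(robots))  # → ['http://www.example.com/sitemap.xml']
```

Note that the directive stands apart from any User-agent group: it applies to all crawlers regardless of where it appears in the file.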

Submitting Your Sitemap to Search Engine Tools

While the robots.txt file provides a convenient method to specify the location of your sitemap, it is often helpful to utilize the proprietary tools provided by a search engine to submit your sitemap. For example, major search engines such as Google and Bing provide a ‘webmaster’ interface to submit sitemaps to the search engine. These tools often provide more features, such as tracking impressions and clicks, and monitoring HTTP 404 (not found) errors.

Each of the major search engines provides a page to access (and sign up for) its webmaster tools.

Because the features provided by these search engines are somewhat privileged and private, you must usually verify that you control or own the site before signing up. Ordinarily, this is accomplished with meta tags, an HTML file placed on the site, or DNS entries.

Conclusion

While the sitemap may seem like a quick, ultra-simple method to improve a website’s SEO, it is important to understand the characteristics of sitemaps, how they are best created and managed, and how they can be leveraged with a search engine’s tools for maximum effectiveness. Too many sites create sitemaps but do not submit them to search engines, or place them in an obscure location which a search engine cannot find automatically. While search engines will generally crawl enough links to find most pages on a site, the sitemap can make this process faster and ensure that all resources are indexed even if few or no links to them exist.

Conversation
  • brent says:

    Great article, what are your thoughts on the change frequency attribute? Some automatic sitemap generators use one value for all the pages. Do you think it is better to not include the attribute at all or to have some pages described incorrectly?

  • Daan says:

    Hi there

    Nice to read. Thanks for the info.
    One question:
    I have a xml sitemap (Drupal module) and added it to google. should i include a link to this sitemap on my website? or is it enough to have the xml file or does it need to be visible?

    thx.

  • Justin Kulesza says:

    Hi brent. The change frequency attribute does not have to be 100% accurate. It is only a hint for search engines as to how frequently a page may change. Some search engines will ignore the value completely, and those that recognize it won’t penalize you if it is not accurate.

  • Justin Kulesza says:

    Hi Daan. Usually it is not necessary to have a link to your XML sitemap from your website. Submitting your XML sitemap to major search engines, and making its location known using the robots.txt file is generally considered the best practice.

  • mike webb says:

    I have a vista web site for my company, I am a Fencer/Landscaper, I just want my company to be visable when people type in Fencer in Milton Keynes.

    Regards

    Mike

  • Kingsley says:

    Hi, nice article.
    I did an auto generation of a sitemap for my site, i’ve been trying to submit it but to avail. I had to use my feeds url. I know fully well that sitemap submission is very important. Can you please offer any help. Thanks.

  • Comments are closed.