Sitemaps are XML files that tell search engines which pages to crawl and how frequently to check each page for changes.
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data, allowing crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages will be included in search indexes, but it provides hints that help web crawlers do a better job of crawling your site.
The Sitemap protocol format consists of XML tags. The file must be encoded in UTF-8.
The Sitemap must begin with <urlset> and end with </urlset>, and the namespace must be specified on the <urlset> tag. A <url> entity must be included as a parent XML tag for each URL, and each <url> parent tag must contain a <loc> child element (the URL of the page). Optionally, each <url> parent tag may also contain <lastmod> (the date of last modification of the file, in YYYY-MM-DD format), <changefreq> (how frequently the page is likely to change), and <priority> (the priority of this URL relative to other URLs on your site) child elements.
A sample XML Sitemap that contains a single URL (this is the example from the Sitemaps.org protocol page):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>
For more information: http://www.sitemaps.org/protocol.php
In November 2006, Google, Yahoo and Microsoft announced that they would all support the same Sitemap protocol, described at Sitemaps.org, to index sites around the web. (For more information: http://www.techcrunch.com/2006/11/15/google-yahoo-and-microsoft-agree-to-standard-sitemaps-protocol/)
In April 2007, they announced support for discovering Sitemaps through robots.txt, allowing webmasters to share their Sitemaps with all search engines at once. To do this, simply add the following line to your robots.txt file, replacing the URL with your Sitemap's location:

Sitemap: http://www.example.com/sitemap.xml

(For more information: http://www.ysearchblog.com/archives/000437.html)
All requests to IIS are handled through Internet Server Application Programming Interface (ISAPI) extensions. ASP.NET has its own filter to ensure pages are processed appropriately. By default, the ASP.NET ISAPI filter (aspnet_isapi.dll) only handles ASPX, ASMX, and the other .NET-specific file extensions. However, this filter can be registered for additional extensions so that it handles requests to those file types too, but that will be covered later.
Every request flows through a number of HTTP modules, which cover various areas of the application (e.g. authentication and session information). After passing through each module, the request is assigned to a single HTTP handler, which determines how the system will respond to the request. Upon completion of the request handler, the response flows back through the HTTP modules to the user.
HTTP modules are executed before and after the handler and provide a method for interacting with the request. Custom modules must implement the System.Web.IHttpModule interface. Modules typically hook events of the System.Web.HttpApplication class (the same pipeline events exposed within the Global.asax.cs or .vb file).
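As a sketch of what a custom module looks like (this one is purely illustrative and is not part of the sitemap handler; the class and header names are made up), a module subscribes to HttpApplication events in its Init method:

```csharp
using System;
using System.Web;

// Illustrative module: stamps a custom header on every response.
public class SampleHeaderModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        // Subscribe to a pipeline event exposed by HttpApplication.
        application.BeginRequest += new EventHandler(OnBeginRequest);
    }

    private void OnBeginRequest(object sender, EventArgs e)
    {
        HttpApplication application = (HttpApplication)sender;
        // Runs before the HTTP handler for every request.
        application.Context.Response.AppendHeader("X-Sample-Module", "1");
    }

    public void Dispose() { }
}
```

Like handlers, modules are registered in Web.config (under <httpModules> in IIS 6 / classic mode).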
HTTP handlers process the request and are generally responsible for initiating necessary business logic tied to the request. Custom handlers must implement the System.Web.IHttpHandler interface. Additionally, a handler factory can be created which will analyze a request to determine what HTTP handler is appropriate. Custom handler factories implement the System.Web.IHttpHandlerFactory interface.
Here is a brief overview of how the SiteMap HttpHandler will work: a request for the sitemap will be intercepted and passed to our SiteMap HttpHandler, which will generate the Sitemap XML.
Step 1: Create HttpHandler
Inside the App_Code folder, create SiteMapHandler.cs.
Here is the code for the ASP.NET Sitemap handler, implementing the IHttpHandler interface. I commented out the loop that adds pages; you may get the URLs of your pages from the web.sitemap file, a database, or another sitemap provider.
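The original code listing is not reproduced here, but a minimal handler along these lines might look like the following sketch (the example URL and metadata values are placeholders; in practice you would loop over your real page list):

```csharp
using System;
using System.Text;
using System.Web;
using System.Xml;

public class SiteMapHandler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/xml";

        XmlTextWriter writer = new XmlTextWriter(context.Response.OutputStream, Encoding.UTF8);
        writer.WriteStartDocument();
        writer.WriteStartElement("urlset");
        writer.WriteAttributeString("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9");

        // One <url> entry per page. Replace this single placeholder entry
        // with a loop over web.sitemap, a database, or a sitemap provider.
        writer.WriteStartElement("url");
        writer.WriteElementString("loc", "http://www.example.com/default.aspx"); // placeholder URL
        writer.WriteElementString("lastmod", DateTime.UtcNow.ToString("yyyy-MM-dd"));
        writer.WriteElementString("changefreq", "daily");
        writer.WriteElementString("priority", "0.5");
        writer.WriteEndElement(); // </url>

        writer.WriteEndElement(); // </urlset>
        writer.WriteEndDocument();
        writer.Flush();
    }
}
```

IsReusable returns true because the handler keeps no per-request state, so ASP.NET may reuse a single instance across requests.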
Step 2: Modify Web.config
Add the following section inside <system.web> (the type attribute must match the handler's class name):

<httpHandlers>
  <add verb="*" path="sitemap.axd" type="SiteMapHandler" />
</httpHandlers>
To test your sitemap, browse to the sitemap.axd file in your application root.
Step 3: Use robots.txt to announce Sitemap to Search Engines
Create a text file in the root of your application and name it robots.txt.
Insert the following line, changing bloggingdeveloper.com to your domain name:

Sitemap: http://www.bloggingdeveloper.com/sitemap.axd
For an online demo: http://www.bloggingdeveloper.com/sitemap.axd
Download Visual Studio Project demo