Sitemaps are XML files for search engines to learn what pages to crawl and how frequently to check for changes on each page.


Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site. 

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.

The Sitemap protocol format consists of XML tags. The file must be UTF-8 encoded.

The Sitemap must begin with an opening <urlset> tag and end with a closing </urlset> tag, and the namespace must be specified on the <urlset> element. Each URL gets its own <url> parent element, which must contain a <loc> child element holding the URL of the page. Each <url> element may also include three optional child elements: <lastmod> (the date the page was last modified, in YYYY-MM-DD format), <changefreq> (how frequently the page is likely to change), and <priority> (the priority of this URL relative to other URLs on your site).

Here is a sample XML Sitemap that contains a single URL:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.bloggingdeveloper.com/</loc>
      <lastmod>2007-09-06</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>

For more information: http://www.sitemaps.org/protocol.php

In November 2006, Google, Yahoo and Microsoft announced that they all use the same sitemap protocol described in Sitemaps.org to index sites around the web. (For more information: http://www.techcrunch.com/2006/11/15/google-yahoo-and-microsoft-agree-to-standard-sitemaps-protocol/)

In April 2007, they announced support for Sitemap autodiscovery through robots.txt, which lets webmasters tell crawlers where their Sitemaps are. To do this, simply add the following line to your robots.txt file. (For more information: http://www.ysearchblog.com/archives/000437.html)

Sitemap: http://www.bloggingdeveloper.com/sitemap.xml


All requests to IIS are handled through Internet Server Application Programming Interface (ISAPI) extensions. ASP.NET has its own filter to ensure pages are processed appropriately. By default, the ASP.NET ISAPI filter (aspnet_isapi.dll) only handles ASPX, ASMX, and all other non-display file formats used by .NET and Visual Studio. However, this filter can be registered with other extensions in order to handle requests to those file types, too, but that will be covered later.

Every request flows through a number of HTTP modules, which cover various areas of the application (e.g. authentication and session information). After passing through each module, the request is assigned to a single HTTP handler, which determines how the system will respond to the request. When the handler completes, the response flows back through the HTTP modules to the user.

HTTP modules are executed before and after the handler and provide a method for interacting with the request. Custom modules must implement the System.Web.IHttpModule interface. Modules are typically synchronized with events of the System.Web.HttpApplication class (implemented within the Global.asax.cs or .vb file).
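To make the module description more concrete, here is a minimal sketch of a custom module. The class name RequestTimingModule and the "RequestStart" item key are made up for this illustration; only IHttpModule, HttpApplication and their members come from the framework.

```csharp
using System;
using System.Diagnostics;
using System.Web;

// Hypothetical module for illustration: measures how long each request takes.
public class RequestTimingModule : IHttpModule
{
    // Called once when the module is loaded; subscribe to HttpApplication events here.
    public void Init(HttpApplication application)
    {
        application.BeginRequest += OnBeginRequest;
        application.EndRequest += OnEndRequest;
    }

    // Runs before the handler: remember when the request started.
    private static void OnBeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        context.Items["RequestStart"] = DateTime.UtcNow;
    }

    // Runs after the handler: report the elapsed time to the trace listeners.
    private static void OnEndRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        object start = context.Items["RequestStart"];
        if (start is DateTime)
        {
            TimeSpan elapsed = DateTime.UtcNow - (DateTime)start;
            Trace.WriteLine(context.Request.RawUrl + " took " + elapsed.TotalMilliseconds + " ms");
        }
    }

    public void Dispose()
    {
        // Nothing to release in this sketch.
    }
}
```

A module like this only runs once it is registered in the httpModules section inside system.web in web.config.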

HTTP handlers process the request and are generally responsible for initiating necessary business logic tied to the request. Custom handlers must implement the System.Web.IHttpHandler interface. Additionally, a handler factory can be created which will analyze a request to determine what HTTP handler is appropriate. Custom handler factories implement the System.Web.IHttpHandlerFactory interface.
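As a rough illustration of how a handler factory picks a handler, the sketch below chooses between two toy handlers based on a query string value. PlainTextHandler, XmlHandler, FormatHandlerFactory and the "format" parameter are all hypothetical names invented for this example; the interfaces and their method signatures are the framework's.

```csharp
using System;
using System.Web;

// Hypothetical handler: answers with plain text.
public class PlainTextHandler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        context.Response.Write("Hello from PlainTextHandler");
    }
}

// Hypothetical handler: answers with a small XML document.
public class XmlHandler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/xml";
        context.Response.Write("<message>Hello from XmlHandler</message>");
    }
}

// Hypothetical factory: inspects the request and returns the matching handler.
public class FormatHandlerFactory : IHttpHandlerFactory
{
    public IHttpHandler GetHandler(HttpContext context, string requestType,
                                   string url, string pathTranslated)
    {
        // Assumed convention for this sketch: ?format=text selects plain text.
        string format = context.Request.QueryString["format"];
        if (string.Equals(format, "text", StringComparison.OrdinalIgnoreCase))
        {
            return new PlainTextHandler();
        }
        return new XmlHandler();
    }

    public void ReleaseHandler(IHttpHandler handler)
    {
        // Nothing to pool or clean up in this sketch.
    }
}
```

A factory is registered in web.config the same way as a handler: the type attribute simply names the factory class instead of a handler class.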

For more information:
http://geekswithblogs.net//flanakin/articles/ModuleHandlerIntro.aspx

Overview


Here is a brief overview of how the SiteMap HttpHandler will work: a request for the SiteMap will be intercepted and passed to our SiteMap HttpHandler, which will generate the SiteMap XML.

Step 1: Create HttpHandler


Inside the App_Code folder, create SiteMapHandler.cs.

Add App_Code Folder


Add SiteMapHandler.cs


Here is the code for the ASP.NET Sitemap handler, which implements the IHttpHandler interface.

SitemapHandler.cs


I commented out the loop that adds pages. You can get the URLs of your pages from the web.sitemap file, a database, or another sitemap provider.
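In text form, a handler along these lines might look like the sketch below. It is a reconstruction under assumptions, not necessarily the exact listing from the demo project: the WriteUrl helper, the changefreq and priority values, and the use of the current date as lastmod are placeholders, and the commented-out loop assumes that your web.sitemap nodes expose application-absolute paths through node.Url.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Web;
using System.Xml;

// Sketch of a sitemap handler; the class name matches the type registered in web.config.
public class SitemapHandler : IHttpHandler
{
    private const string SitemapNamespace = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // The handler keeps no per-request state, so one instance can serve many requests.
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/xml";
        context.Response.ContentEncoding = Encoding.UTF8;

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false); // UTF-8 without a byte order mark
        settings.Indent = true;

        // Scheme and host of the current request, e.g. http://www.example.com
        string authority = context.Request.Url.GetLeftPart(UriPartial.Authority);

        using (XmlWriter writer = XmlWriter.Create(context.Response.OutputStream, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("urlset", SitemapNamespace);

            // Home page. Today's date is only a placeholder for the real modification date.
            string home = authority + VirtualPathUtility.ToAbsolute("~/");
            WriteUrl(writer, home, DateTime.UtcNow, "monthly", "0.8");

            // Other pages: uncomment to list every node of web.sitemap.
            // foreach (SiteMapNode node in SiteMap.RootNode.GetAllNodes())
            // {
            //     if (!string.IsNullOrEmpty(node.Url))
            //     {
            //         WriteUrl(writer, authority + node.Url, DateTime.UtcNow, "monthly", "0.5");
            //     }
            // }

            writer.WriteEndElement(); // </urlset>
            writer.WriteEndDocument();
        }
    }

    // Hypothetical helper: writes one <url> entry with its child elements.
    private static void WriteUrl(XmlWriter writer, string loc, DateTime lastModified,
                                 string changeFrequency, string priority)
    {
        writer.WriteStartElement("url", SitemapNamespace);
        writer.WriteElementString("loc", SitemapNamespace, loc);
        writer.WriteElementString("lastmod", SitemapNamespace,
            lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        writer.WriteElementString("changefreq", SitemapNamespace, changeFrequency);
        writer.WriteElementString("priority", SitemapNamespace, priority);
        writer.WriteEndElement(); // </url>
    }
}
```

Writing through XmlWriter rather than concatenating strings means special characters in URLs, such as the ampersand in a query string, are escaped automatically.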

Step 2: Modify web.config


Add the following section inside the system.web element of your web.config file:

<httpHandlers>
   <add verb="*" path="sitemap.axd"
        type="SitemapHandler" validate="false"/>
</httpHandlers>

To test your sitemap, browse to sitemap.axd in your application.
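If you would rather check the output from code than by eye, a small helper like the one below can list the URLs in a sitemap document. This is only a sketch: the SitemapCheck class and ListLocations method are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for checking a generated sitemap.
public static class SitemapCheck
{
    private const string SitemapNamespace = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Returns the text of every <loc> element found under <urlset>/<url>.
    public static List<string> ListLocations(XmlDocument document)
    {
        // Sitemap elements live in a namespace, so XPath needs a prefix mapped to it.
        XmlNamespaceManager namespaces = new XmlNamespaceManager(document.NameTable);
        namespaces.AddNamespace("sm", SitemapNamespace);

        List<string> locations = new List<string>();
        foreach (XmlNode node in document.SelectNodes("/sm:urlset/sm:url/sm:loc", namespaces))
        {
            locations.Add(node.InnerText);
        }
        return locations;
    }
}
```

To use it, create an XmlDocument, call its Load method with the address of your handler (for example http://localhost/sitemap.axd, an assumed local address), and pass the document to SitemapCheck.ListLocations. An empty list usually means the namespace on urlset is missing or misspelled.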

Step 3: Use robots.txt to announce Sitemap to Search Engines


Create a text file in the root of your application and name it robots.txt.

Insert the following line, replacing bloggingdeveloper.com with your own domain name:

Sitemap: http://www.bloggingdeveloper.com/sitemap.axd

For an online demo: http://www.bloggingdeveloper.com/sitemap.axd

