Some websites serve multiple copies of the same content under different URLs. Often, these sites indicate the preferred version of the content by including a special tag that declares the canonical URL (typically a link element with rel="canonical"). Sitemap Creator can optionally honor such preferences and automatically ignore non-canonical URLs.
To enable or disable canonical URL support:
- From the Project Properties dialogue, select the URL Normalization category
- Check or uncheck the Honor canonical URLs option
When the option is set, any URL which does not contain a canonical URL declaration in the page header will be included in the sitemap as normal. If a canonical URL declaration is present, however, the URL will be automatically excluded from the sitemap unless it matches the canonical URL.
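The decision described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Sitemap Creator's actual implementation: the names CanonicalLinkParser, find_canonical_url and include_in_sitemap are hypothetical, the declaration is assumed to be a link element with rel="canonical", and URLs are compared as exact strings after resolving relative references, whereas a real crawler may apply further normalization first.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical declaration is kept (an assumption).
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_values and href:
            self.canonical = href


def find_canonical_url(page_url: str, html: str) -> Optional[str]:
    """Return the declared canonical URL as an absolute URL, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    if parser.canonical is None:
        return None
    # Resolve relative declarations against the URL of the page itself.
    return urljoin(page_url, parser.canonical)


def include_in_sitemap(page_url: str, html: str) -> bool:
    """Apply the rule: no declaration means include; otherwise the URL must match."""
    canonical = find_canonical_url(page_url, html)
    return canonical is None or canonical == page_url
```

For example, a page fetched from https://example.com/products/widget?ref=home that declares /products/widget as canonical would be excluded, while the same page fetched from https://example.com/products/widget would be included.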