Some sites contain a mix of prefixed and non-prefixed URLs. For example, the domain http://www.example.com may have links to both http://www.example.com and http://example.com. If the site is configured correctly, the non-WWW links should redirect to the WWW links (or vice versa). If it is not, or if you want to copy a site without the interim redirects, Sitemap Creator can help.
To automatically fix mixed prefixes
- From the Project Properties dialogue, select the URL Normalization category
- Check the Ensure internal links match domain prefix option
When this option is set, Sitemap Creator automatically rewrites any link whose host differs from the source host only by the WWW prefix, so that the link uses the same prefix as the source. The prefix is added if the source has one and the link does not, and removed if the link has one and the source does not.
Note that this option only applies to URLs matching the host domain; it will not apply to any external domains included in the copy.
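The behaviour described above can be illustrated with a minimal Python sketch. This is not Sitemap Creator's implementation; the function names `strip_www` and `match_domain_prefix` are hypothetical and exist only for illustration. The sketch assumes the prefix is always the literal label `www.` and that host names are compared case-insensitively.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_www(host: str) -> str:
    """Return the host without a leading "www." label."""
    return host[4:] if host.startswith("www.") else host


def match_domain_prefix(link: str, source: str) -> str:
    """Rewrite link so its WWW prefix matches the source URL's host.

    Hypothetical helper: links to other domains are returned unchanged.
    """
    link_parts = urlsplit(link)
    source_host = (urlsplit(source).hostname or "").lower()
    link_host = (link_parts.hostname or "").lower()

    # Relative links and links that already match need no change.
    if not link_host or link_host == source_host:
        return link

    # External domains are left alone; only the WWW prefix may differ.
    if strip_www(link_host) != strip_www(source_host):
        return link

    # Swap in the source host, keeping any explicit port.
    # (User info in the URL, if any, is dropped in this sketch.)
    netloc = source_host
    if link_parts.port is not None:
        netloc = f"{source_host}:{link_parts.port}"
    return urlunsplit(link_parts._replace(netloc=netloc))
```

With a source of http://www.example.com, a link to http://example.com/about would be rewritten to use the www.example.com host, while a link to an unrelated domain would pass through untouched.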