Sitemap Creator has a fixed set of rules that govern how it crawls a document, such as an HTML page or style sheet, for additional resources. With the rise of responsive websites, these default rules might not always be sufficient. For example, custom data attributes may be applied to the img tag in order to support retina images. Sitemap Creator allows you to specify additional attributes to scan, using either simple names or more complex XPath expressions.
To scan custom attributes for links to other resources
- From the Project Properties dialogue, expand the Advanced category and select Custom Attributes
- In the edit field, enter each additional custom attribute you wish to scan
For more advanced scenarios, you can use XPath expressions. For example, if a document contains a p tag and an img tag, each with a custom attribute named data-original, you can scan only those on the img tag by using the expression //img/@data-original.
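To see what the expression //img/@data-original selects, the following Python sketch applies an equivalent query to a small, made-up XHTML fragment. It only illustrates the XPath semantics and is not how Sitemap Creator is implemented; the fragment, the file names and the helper custom_attribute_values are hypothetical. Python's standard xml.etree.ElementTree supports only a subset of XPath and cannot return attribute nodes directly, so the sketch selects the img elements that carry the attribute and then reads its value.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both tags carry the same custom attribute.
SAMPLE = """
<html>
  <body>
    <p data-original="notes.html">Some text</p>
    <img src="photo.png" data-original="photo@2x.png" />
  </body>
</html>
"""


def custom_attribute_values(markup, tag, attribute):
    """Return the values of `attribute` found on `tag` elements only.

    Approximates the XPath expression //tag/@attribute using the XPath
    subset that ElementTree supports.
    """
    root = ET.fromstring(markup)
    # .//tag[@attribute] matches every `tag` element that has the attribute.
    matches = root.findall(f".//{tag}[@{attribute}]")
    return [element.get(attribute) for element in matches]


if __name__ == "__main__":
    # Only the value on the img tag is reported; the p tag is ignored.
    print(custom_attribute_values(SAMPLE, "img", "data-original"))
```

Entering the plain name data-original instead would correspond to querying every tag that carries the attribute, which in this fragment would also pick up the value on the p tag.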
Note
Enter only one attribute name per line.
Important
Sitemap Creator does not currently support custom attributes where multiple URLs are contained in a single attribute, or the attribute value includes additional content around the URL.
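The kinds of value affected by this limitation can be illustrated with a rough pre-check of your own markup. The Python sketch below is a heuristic under stated assumptions, not Sitemap Creator's own logic: it assumes a usable value is a single token without whitespace, and the sample values and the function name is_single_url_value are hypothetical.

```python
def is_single_url_value(value):
    """Return True if `value` looks like one bare URL.

    Assumption: URLs are written without unencoded whitespace, so a value
    that splits into more than one token holds several URLs or extra text.
    """
    tokens = value.split()
    return len(tokens) == 1


if __name__ == "__main__":
    samples = [
        "images/photo@2x.png",         # one URL
        "small.png 1x, large.png 2x",  # several URLs plus descriptors
        "large.png 1200w",             # URL with trailing content
    ]
    for sample in samples:
        print(sample, "->", is_single_url_value(sample))
```

Values that fail a check like this are the ones to restructure, for example by giving each URL its own attribute, before adding the attribute name to the Custom Attributes list.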