Crawling can be left to scan as much of a website as the crawler can access, or it can be limited to a specific depth.
Note
Scan depth checks only apply to the main domain being crawled
How does Sitemap Creator determine depth?
Sitemap Creator determines the depth of a URL by counting its path components, excluding the document name where one is present.
| URL | Depth |
| --- | --- |
| http://www.example.com/ | 0 |
| http://www.example.com/index.html | 0 |
| http://www.example.com/products/ | 1 |
| http://www.example.com/products/index.html | 1 |
| http://www.example.com/products/webcopy | 2 |
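
The exact calculation is internal to Sitemap Creator, but a minimal Python sketch of the idea reproduces the table above. It assumes (as an illustration only) that a trailing path segment containing a dot is treated as a document name and therefore does not add to the depth; `url_depth` is a hypothetical helper, not part of the product.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Approximate the depth calculation: count path segments,
    excluding a trailing document name such as index.html."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Assumption: a final segment containing a dot is a document
    # name rather than a directory, so it is not counted.
    if segments and "." in segments[-1]:
        segments = segments[:-1]
    return len(segments)


for url in (
    "http://www.example.com/",
    "http://www.example.com/index.html",
    "http://www.example.com/products/",
    "http://www.example.com/products/index.html",
    "http://www.example.com/products/webcopy",
):
    print(url, url_depth(url))  # prints the depths shown in the table
```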
Configuring a scan depth
- From the Project Properties dialogue, select the General category
- Check the Limit crawl depth option
- Enter the maximum level that Sitemap Creator will scan
Important
Scan depth is measured from the base domain, not from the starting address
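
For example, if a crawl starts at http://www.example.com/products/, a page at http://www.example.com/products/webcopy/help/ is still at depth 3 because depth is counted from the domain root, not from the starting address. The snippet below is a hypothetical illustration using the same counting approach as the sketch above.

```python
from urllib.parse import urlparse

# Depth is counted from the domain root, not from wherever the crawl started.
start_address = "http://www.example.com/products/"          # crawl starts here
discovered = "http://www.example.com/products/webcopy/help/"  # link found during the crawl

depth = len([s for s in urlparse(discovered).path.split("/") if s])
print(depth)  # 3 -- three path components below www.example.com/
```

With a crawl depth limit of 2, this URL would be excluded even though it is only one level below the starting address.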
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Modifying page titles
- Origin reports
- Overwriting read only files
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive