If you set the crawl address for a project to a nested URL, by default WebCopy will not crawl outside of this base path. In some circumstances it may be desirable to allow this without moving the root URL to a higher level.
To enable or disable crawling outside the base URL
- From the Project Properties dialogue, select the Advanced category
- Check or uncheck the Crawl above the root URL option
Examples with outer URL crawling disabled
The following example table shows which URLs would be copied when the Crawl above the root URL setting is disabled (default), assuming a base URL of /features/.
Address | Skip |
---|---|
/auth/ | Yes |
/elements/ | Yes |
/features/ | No |
/features/sub_feature | No |
/resources/ | Yes |
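The table above is consistent with a path-prefix check against the base path. The following is a minimal sketch of that rule, not WebCopy's actual implementation. The function names are hypothetical. The sketch assumes the base path ends with "/" so that a sibling such as /features-old/ is not mistaken for a child of /features/.

```python
def is_within_base_path(path: str, base_path: str) -> bool:
    """Return True if `path` is the base path itself or lies below it."""
    # Hypothetical helper: normalise the base so it always ends with "/",
    # which stops "/features-old/" from matching "/features".
    if not base_path.endswith("/"):
        base_path += "/"
    # Treat "/features" (no trailing slash) as the base itself.
    return path == base_path.rstrip("/") or path.startswith(base_path)


def should_skip(path: str, base_path: str) -> bool:
    # With "Crawl above the root URL" disabled, anything outside
    # the base path is skipped.
    return not is_within_base_path(path, base_path)


base = "/features/"
for address in ["/auth/", "/elements/", "/features/",
                "/features/sub_feature", "/resources/"]:
    print(f"{address:<24} skip={should_skip(address, base)}")
```

Running the loop reproduces the Skip column of the table for the base URL /features/.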
Examples with outer URL crawling enabled
The following example table shows which URLs would be copied when the Crawl above the root URL setting is enabled, assuming a base URL of /features/.
Address | Skip |
---|---|
/auth/ | No |
/elements/ | No |
/features/ | No |
/features/sub_feature | No |
/resources/ | No |
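The following sketch models both tables with a single flag. It works on full URLs rather than bare paths. The function, the `crawl_above_root` parameter and the example.com host are hypothetical. The sketch also assumes that enabling the option only lifts the path restriction and that URLs on other hosts are still skipped. Other domains are governed by separate settings (see Including additional domains), so treat this as an illustration rather than a description of WebCopy's internals.

```python
from urllib.parse import urlsplit


def should_skip(url: str, root_url: str, crawl_above_root: bool = False) -> bool:
    """Hypothetical model of the "Crawl above the root URL" option."""
    target = urlsplit(url)
    root = urlsplit(root_url)

    # Assumption: this option never allows crawling a different host.
    if target.netloc.lower() != root.netloc.lower():
        return True

    if crawl_above_root:
        # Enabled: any path on the same host is eligible.
        return False

    # Disabled (default): the base path is the root URL's directory,
    # e.g. "/features/" for "https://example.com/features/".
    base_path = root.path[: root.path.rfind("/") + 1] or "/"
    path = target.path or "/"
    return not path.startswith(base_path)


root = "https://example.com/features/"
for path in ["/auth/", "/elements/", "/features/",
             "/features/sub_feature", "/resources/"]:
    url = "https://example.com" + path
    print(f"{path:<24} disabled={should_skip(url, root)} "
          f"enabled={should_skip(url, root, crawl_above_root=True)}")
```

With the flag off, the output matches the first table. With it on, every path in the second table is copied.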
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive