In some cases, you may wish to download only a single page and any secondary or tertiary resources linked to it. You can instruct WebCopy to exclude all URLs that fall outside a maximum distance from the address being copied.
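Conceptually, a distance limit behaves like a breadth-first crawl that stops following links once a page lies more than the configured number of hops from the starting URL. The sketch below is a minimal illustration of that idea in Python, not WebCopy's implementation; the fetch_links helper and its behaviour are assumptions standing in for the real download step.

```python
from collections import deque

def crawl_with_distance_limit(root_url, fetch_links, max_distance):
    """Breadth-first crawl that skips pages further than max_distance
    hops from root_url. fetch_links(url) is assumed to return the URLs
    linked from a page; it stands in for the real download step."""
    visited = {root_url}
    queue = deque([(root_url, 0)])   # (url, distance from root)
    downloaded = []

    while queue:
        url, distance = queue.popleft()
        downloaded.append(url)
        if distance == max_distance:
            continue  # links found on this page would exceed the limit
        for link in fetch_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, distance + 1))

    return downloaded
```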
Configuring a scan distance
- From the Project Properties dialogue, select the General category
- Check the Limit distance from root URL option
- Enter the maximum level that WebCopy will scan
Example
The diagram below shows an example website. Each column represents URLs that are the specified distance from the root URL. If the crawl project was set to use 2 as the limit, then fourth.html would not be downloaded.
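As a rough model of this example, the snippet below reuses the sketch above with a hypothetical link graph: only fourth.html sits three hops from the root, so a limit of 2 excludes it. Every page name except fourth.html is invented for illustration.

```python
# Hypothetical link structure approximating the example diagram;
# only fourth.html lies more than 2 hops from the root URL.
links = {
    "index.html":  ["first.html"],
    "first.html":  ["second.html", "third.html"],
    "second.html": ["fourth.html"],
    "third.html":  [],
    "fourth.html": [],
}

result = crawl_with_distance_limit("index.html", lambda url: links.get(url, []), 2)
print(result)  # fourth.html is not included in the output
```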
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive