By default, WebCopy will only scan the primary host you specify, for example http://example.com.
If you need to copy non-HTML resources from other domains (for example a CDN), this is normally handled automatically by the Download all resources option. WebCopy can also automatically crawl HTML located on sub domains and sibling domains.
Important
Some project settings are ignored when crawling additional domains, for example crawling above the root URL.
Automatically crawling sub or sibling domains
- From the Project Properties dialogue, select the General category
- Select a mode from the Crawl Mode group
| Option | Notes |
| --- | --- |
| Site Only | Only crawls URLs that match the host name specified in the crawl URL |
| Sub domains | Includes any sub domains of the host URL |
| Sibling domains | Includes both sub domains and sibling domains of the host URL |
| Everything | Crawls any discovered HTTP or HTTPS URL unless excluded via other settings |
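WebCopy is a GUI application and does not expose this behaviour as code, but the following Python sketch illustrates how the four crawl modes might scope a discovered URL against the base host. The names used here (`CrawlMode`, `is_in_scope`, `registered_domain`) are hypothetical and for illustration only; they are not part of WebCopy.

```python
from enum import Enum
from urllib.parse import urlsplit


class CrawlMode(Enum):
    SITE_ONLY = 1        # host must match the crawl URL exactly
    SUB_DOMAINS = 2      # host may be a sub domain of the crawl URL
    SIBLING_DOMAINS = 3  # host may be a sub or sibling domain
    EVERYTHING = 4       # any HTTP/HTTPS host is in scope


def registered_domain(host: str) -> str:
    # Naive "parent domain" guess: the last two labels (example.com).
    # Real public-suffix handling (e.g. co.uk) would need a suffix list.
    return ".".join(host.split(".")[-2:])


def is_in_scope(base_url: str, candidate_url: str, mode: CrawlMode) -> bool:
    base_host = urlsplit(base_url).hostname or ""
    candidate = urlsplit(candidate_url)
    if candidate.scheme not in ("http", "https") or not candidate.hostname:
        return False
    host = candidate.hostname
    if mode is CrawlMode.EVERYTHING:
        return True
    if mode is CrawlMode.SITE_ONLY:
        return host == base_host
    if mode is CrawlMode.SUB_DOMAINS:
        return host == base_host or host.endswith("." + base_host)
    # SIBLING_DOMAINS: any host sharing the same parent domain
    return registered_domain(host) == registered_domain(base_host)


# With a base of http://www.example.com:
#   SITE_ONLY       -> only www.example.com
#   SUB_DOMAINS     -> www.example.com, static.www.example.com, ...
#   SIBLING_DOMAINS -> also cdn.example.com, blog.example.com, ...
#   EVERYTHING      -> any http(s) URL
print(is_in_scope("http://www.example.com", "http://cdn.example.com/a.css",
                  CrawlMode.SIBLING_DOMAINS))  # True
```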
Important
Regardless of the crawl mode selected above, if the Download all resources option is checked, WebCopy will still query resources on other domains and download any non-HTML content, unless the URL is excluded by custom rules.
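As a rough sketch of the behaviour described in the note above (reusing the hypothetical `is_in_scope` helper from the earlier sketch, and taking the rule check and HTML detection as caller-supplied functions rather than WebCopy's actual internals), the decision to fetch a given URL might look like this:

```python
def should_fetch(base_url, url, mode, download_all_resources,
                 is_excluded_by_rules, looks_like_html):
    # URLs excluded by custom rules are never fetched.
    if is_excluded_by_rules(url):
        return False
    # In-scope URLs are crawled normally.
    if is_in_scope(base_url, url, mode):
        return True
    # Out-of-scope URLs: only non-HTML resources are downloaded,
    # and only when "Download all resources" is enabled.
    return download_all_resources and not looks_like_html(url)


# For example, a CSS file hosted on an unrelated CDN is still fetched
# when "Download all resources" is enabled, even in Site Only mode:
# should_fetch("http://www.example.com", "http://cdn.other.net/site.css",
#              CrawlMode.SITE_ONLY, True,
#              is_excluded_by_rules=lambda u: False,
#              looks_like_html=lambda u: u.endswith((".htm", ".html")))
# -> True
```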
Important
The Everything option is not recommended; it should only be used on sites that are self-contained, or where rules explicitly exclude addresses. Using this option may cause WebCopy to become unstable.
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive