By default, WebCopy will only scan the primary host you specify, for example http://example.com.
If you need to copy non-HTML resources from other domains (e.g. a CDN), this is normally handled automatically by the Download all resources option. WebCopy can also automatically crawl HTML located on sub and sibling domains.
Some project settings are ignored when crawling additional domains, for example crawling above the root URL.
| Option | Notes |
| --- | --- |
| Site Only | Only crawls URLs that match the host name specified in the crawl URL |
| Sub domains | Includes any sub domains of the host URL |
| Sibling domains | Includes both sub domains and sibling domains of the host URL |
| Everything | Crawls any discovered HTTP or HTTPS URL unless excluded via other settings |
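To make the scope options concrete, the sketch below classifies a discovered URL against the crawl URL's host. This is an illustrative approximation, not WebCopy's actual implementation: the `crawl_scope` function is hypothetical, and it uses a naive two-label rule to find the parent domain (which would misjudge hosts under suffixes like `.co.uk`).

```python
from urllib.parse import urlparse

def crawl_scope(start_url: str, candidate_url: str) -> str:
    """Return the narrowest scope option that would include candidate_url.

    Hypothetical helper for illustration only; WebCopy's real matching
    logic may differ.
    """
    start_host = urlparse(start_url).hostname
    host = urlparse(candidate_url).hostname
    if host == start_host:
        return "Site Only"
    if host.endswith("." + start_host):
        return "Sub domains"
    # Naive parent-domain guess: keep the last two labels of the start host.
    parent = ".".join(start_host.split(".")[-2:])
    if host == parent or host.endswith("." + parent):
        return "Sibling domains"
    return "Everything"

print(crawl_scope("http://www.example.com", "http://www.example.com/page"))
print(crawl_scope("http://example.com", "http://cdn.example.com/app.js"))
print(crawl_scope("http://www.example.com", "http://images.example.com/a.png"))
print(crawl_scope("http://www.example.com", "http://cdn.other.net/lib.js"))
```

Under these assumptions, the four calls print `Site Only`, `Sub domains`, `Sibling domains`, and `Everything` respectively.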
Regardless of the setting above, if the Download all resources option is checked, WebCopy will still query resources on other domains and download any non-HTML content, unless the URL is excluded by custom rules.
The Everything option is not recommended; use it only on sites that are self-contained, or where rules explicitly exclude addresses. Using this option may cause WebCopy to become unstable.