Because it is impractical for WebCopy to crawl the entire Internet, crawling is limited by default to the domain of the crawl URI. This can be changed as required.
Option | Notes |
---|---|
Site Only | Only crawls URIs that match the host name specified in the crawl URI |
Sub domains | Includes any subdomains of the host URI |
Sibling domains | Includes both subdomains and sibling domains of the host URI |
Everything | Will crawl any HTTP or HTTPS URI detected |
The Everything option is not recommended. Use it only on sites that are self-contained, or where rules are used to explicitly exclude addresses. Using this option may cause WebCopy to become unstable.
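
The four options in the table can be read as progressively wider host checks. The following Python sketch illustrates that idea only; it is not WebCopy's implementation. The function name `in_scope` and the mode strings are hypothetical, and the treatment of a "sibling domain" as any host sharing the crawl host's parent domain (the host with its first label removed) is an assumption, since the exact definition is not given here.

```python
from urllib.parse import urlsplit

# Hypothetical mode names mirroring the options in the table.
MODES = ("site_only", "sub_domains", "sibling_domains", "everything")


def in_scope(candidate, crawl_uri, mode):
    """Return True if `candidate` would be crawled under `mode` (illustrative only)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    parts = urlsplit(candidate)
    host = (parts.hostname or "").lower()
    root = (urlsplit(crawl_uri).hostname or "").lower()
    if not host or not root:
        return False

    if mode == "everything":
        # Any HTTP or HTTPS URI is accepted, regardless of host.
        return parts.scheme in ("http", "https")

    if host == root:
        # Exact host match is in scope for every remaining mode.
        return True
    if mode == "site_only":
        return False

    if host.endswith("." + root):
        # Subdomain of the crawl host.
        return True
    if mode == "sub_domains":
        return False

    # sibling_domains. Assumption: a sibling shares the crawl host's parent
    # domain. Requiring a dot in the parent stops a bare TLD such as "com"
    # from being treated as a parent.
    parent = root.partition(".")[2]
    return "." in parent and host.endswith("." + parent)
```

With a crawl URI of `http://www.example.com/`, this sketch accepts `http://blog.example.com/` only under the sibling and everything modes. A real implementation would likely need a public suffix list to find the parent domain reliably.
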
Even if you do not wish to crawl external sites, you can still download files that are linked directly from the site you are crawling. When the Download all resources option is set, WebCopy automatically downloads any external file whose reported content type is not text/html. The downloaded file is not crawled, which makes it easy to download linked images, sounds and other files.
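
The rule for external files can be summarised as a small decision function. This is a minimal sketch of the behaviour described above, not WebCopy's code: the name `plan_external_resource` and its parameters are hypothetical, and stripping parameters such as `charset` from the content type before comparing is an assumption.

```python
def plan_external_resource(content_type, download_all_resources):
    """Return (download, crawl) for a URI outside the crawl scope.

    `content_type` is the value reported by the server, for example
    "text/html; charset=utf-8". Illustrative only.
    """
    # Assumption: compare only the media type, ignoring parameters and case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    # External files are downloaded only when the option is set and the
    # reported type is not an HTML page.
    download = bool(download_all_resources) and media_type != "text/html"

    # External files are never crawled, so links inside them are not followed.
    return download, False
```

For example, an externally hosted image reported as `image/png` would be downloaded when the option is set, while an external page reported as `text/html` would be skipped.
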