Because it is impractical for WebCopy to crawl the entire Internet, crawling is limited by default to the domain of the crawl URI. This can be changed as required.
- From the Project Properties dialog, select the General category
- Select a mode from the Crawl Mode group
- Optionally, check the Download all resources option to always download linked resources
|Site Only|Only crawls URIs that match the host name specified in the crawl URI|
|Sub domains|Includes any subdomains of the host URI|
|Sibling domains|Includes both subdomains and sibling domains of the host URI|
|Everything|Crawls any HTTP or HTTPS URI detected|
The Everything option is not recommended. Use it only on sites that are self-contained or where rules are used to limit the crawl, as it may cause WebCopy to become unstable.
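The way the four crawl modes narrow or widen the crawl can be sketched in a few lines of Python. This is only an illustration of the descriptions above, not WebCopy's actual implementation: the names `CrawlMode` and `in_scope` are hypothetical, and the sketch assumes that a "sibling" is any host sharing the crawl host's parent domain, where the parent is found naively by dropping the first label of the host name.

```python
from enum import Enum
from urllib.parse import urlsplit


class CrawlMode(Enum):
    # Hypothetical names mirroring the options in the Crawl Mode group.
    SITE_ONLY = "Site Only"
    SUB_DOMAINS = "Sub domains"
    SIBLING_DOMAINS = "Sibling domains"
    EVERYTHING = "Everything"


def _is_subdomain(host: str, base: str) -> bool:
    # "blog.example.com" is a subdomain of "example.com".
    return host.endswith("." + base)


def in_scope(candidate_uri: str, crawl_uri: str, mode: CrawlMode) -> bool:
    """Return True if candidate_uri would be crawled under the given mode."""
    parts = urlsplit(candidate_uri)
    # Even Everything is limited to HTTP and HTTPS URIs.
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname or ""
    base = urlsplit(crawl_uri).hostname or ""
    if not host or not base:
        return False
    if mode is CrawlMode.EVERYTHING:
        return True
    if host == base:
        return True  # every mode includes the crawl host itself
    if mode is CrawlMode.SITE_ONLY:
        return False
    if _is_subdomain(host, base):
        return True
    if mode is CrawlMode.SUB_DOMAINS:
        return False
    # SIBLING_DOMAINS (assumption): siblings share the crawl host's parent
    # domain. Dropping the first label is a naive heuristic; the check for
    # a dot stops "example.com" from treating all of ".com" as siblings.
    parent = base.partition(".")[2]
    return "." in parent and (host == parent or _is_subdomain(host, parent))
```

Under these assumptions, with a crawl URI of `https://www.example.com/`, the host `forum.example.com` is out of scope for Site Only and Sub domains but in scope for Sibling domains.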
Downloading all resources
While you may not wish to crawl external sites, you can still download files linked directly from the site you are crawling. When the Download all resources option is set, WebCopy automatically downloads any external file whose reported content type is not text/html. Downloaded files are not crawled, which makes it easy to fetch linked images, sounds and other files.
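The decision described above can be expressed as a small function. This is a sketch of the rule as stated, not WebCopy's own code: `external_action` and `media_type` are hypothetical names, and the sketch assumes that a missing content type counts as "not text/html" and that parameters such as `charset` are ignored when comparing.

```python
from typing import Optional


def media_type(content_type_header: Optional[str]) -> str:
    """Return the bare media type, e.g. 'text/html' from 'text/html; charset=utf-8'."""
    if not content_type_header:
        return ""
    return content_type_header.split(";", 1)[0].strip().lower()


def external_action(content_type_header: Optional[str],
                    download_all_resources: bool) -> str:
    """Return 'download' or 'skip' for a URI that lies outside the crawl scope."""
    if not download_all_resources:
        return "skip"  # external URIs are ignored unless the option is set
    if media_type(content_type_header) == "text/html":
        return "skip"  # external pages are never fetched for crawling
    # Assumption: a missing content type is treated as "not text/html".
    return "download"  # the file is saved but not scanned for further links
```

For example, an external `image/png` link yields `"download"` when the option is set, while an external page served as `text/html; charset=utf-8` yields `"skip"`.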