By default, WebCopy will only scan the primary host you specify, for example http://example.com.

If you need to copy non-HTML resources from other domains (e.g. a CDN), this is normally handled automatically by the Download all resources option. WebCopy can also automatically crawl HTML located on sub domains and sibling domains.

Important

Some project settings are ignored when crawling additional domains, for example crawling above the root URL.

Automatically crawling sub or sibling domains

  1. From the Project Properties dialog, select the General category
  2. Select a mode from the Crawl Mode group
  Option            Notes
  Site Only         Only crawls URLs that match the host name specified in the crawl URL
  Sub domains       Includes any sub domains of the host URL
  Sibling domains   Includes both sub domains and sibling domains of the host URL
  Everything        Crawls any discovered HTTP or HTTPS URL unless excluded via other settings
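The scope rules above can be sketched as a small host-matching function. This is an illustration of the described behaviour only, not WebCopy's actual implementation; the function name and the sibling-domain heuristic are assumptions.

```python
from urllib.parse import urlparse

def host_in_scope(crawl_host: str, url: str, mode: str) -> bool:
    """Illustrative sketch: decide whether a discovered URL's host falls
    within the crawl scope for a given Crawl Mode. Not WebCopy's code."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if mode == "Everything":
        # Any HTTP or HTTPS URL is in scope (other settings may still exclude it)
        return parsed.scheme in ("http", "https")
    if mode == "Site Only":
        return host == crawl_host
    if mode == "Sub domains":
        return host == crawl_host or host.endswith("." + crawl_host)
    if mode == "Sibling domains":
        # Assumed heuristic: siblings share the parent domain, e.g.
        # blog.example.com and shop.example.com under example.com.
        parent = crawl_host.split(".", 1)[1] if crawl_host.count(".") > 1 else crawl_host
        return host == parent or host.endswith("." + parent)
    return False
```

For example, with a crawl URL of http://store.example.com, Sub domains would accept eu.store.example.com, while Sibling domains would also accept blog.example.com.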

Regardless of the setting above, if the Download all resources option is checked, WebCopy will still query resources on other domains and download any non-HTML content, unless the URL is excluded by custom rules.
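The interaction between the crawl scope and the Download all resources option might be summarised in a sketch like the following. The function name and parameters are hypothetical, based only on the behaviour described above.

```python
def should_download(is_html: bool, in_crawl_scope: bool,
                    download_all_resources: bool, excluded: bool) -> bool:
    """Assumed decision logic, for illustration only: HTML is fetched only
    when its host is within the crawl scope, while non-HTML content on any
    domain is fetched when Download all resources is enabled, unless a
    custom rule excludes the URL."""
    if excluded:
        return False
    if is_html:
        return in_crawl_scope
    return in_crawl_scope or download_all_resources
```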

The Everything option is not recommended; it should only be used on sites that are self-contained, or where rules explicitly exclude addresses. Using this option may cause WebCopy to become unstable.

See Also

Configuring the Crawler

Working with local files

Controlling the crawl

JavaScript

Security

Modifying URLs

Creating a site map

Advanced

Deprecated features

© 2010-2024 Cyotek Ltd. All Rights Reserved.
Documentation version 1.10 (buildref #186.15944), last modified 2024-08-18. Generated 2024-08-18 08:00 using Cyotek HelpWrite Professional version 6.20.0