By default, WebCopy scans only the primary host you specify, for example http://example.com. You can instruct WebCopy to include other hosts, even ones with completely different domain names, for example if the site you are copying makes use of a CDN.

Some project settings are ignored when crawling additional hosts, for example the option to crawl above the root URL.

Important

Consider using the Download all resources option for scenarios where non-HTML content is located on secondary servers. The Crawl Mode option can be used to include subdomains or sibling domains of the root host.

Configuring additional hosts

  1. From the Project Properties dialog, select the Additional Hosts category.
  2. Enter each additional host you want to crawl, one host per line. Do not enter protocol or path information; include only the domain name. You can use regular expressions if required.
  3. Click OK to save your changes. When you next crawl this website, any URLs belonging to the hosts you specify will no longer be skipped, but will be crawled as though they were part of the primary project URL.
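
As a rough illustration of step 2, the following Python sketch reduces a full URL to the bare host name, which is the form the Additional Hosts list expects. It is not part of WebCopy: the helper name host_only and the sample URLs are hypothetical, and dropping the port number is an assumption, since only protocol and path are mentioned above.

```python
from urllib.parse import urlsplit


def host_only(url: str) -> str:
    """Return just the host name of a URL, without protocol or path.

    Hypothetical helper for illustration; the port is also dropped,
    which is an assumption about what the host list expects.
    """
    # urlsplit only recognises the host when a scheme or a leading "//"
    # is present, so prefix bare values such as "cdn.example.net/assets".
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host found in {url!r}")
    return host


# Sample URLs are made up for the example.
for url in ("https://cdn.example.net/assets/app.js",
            "static.example.org/img/logo.png"):
    print(host_only(url))
```

Each printed value could then be entered on its own line in the Additional Hosts category.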

Important

If your expression includes any of the ^, [, ., $, {, *, (, \, +, ), |, ?, <, > characters and you want them to be processed as plain text, you need to "escape" each such character by preceding it with a backslash. For example, the expression application/epub+zip would need to be written as application/epub\+zip, otherwise the + character would have a special meaning and no matches would be made. Similarly, the expression example.com should be written as example\.com, because . means "any character", which could lead to unexpected matches.
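
The effect of escaping can be tried out with a short sketch. It uses Python's re module purely for illustration; WebCopy's own expression engine may differ in detail, and the use of whole-string matching in the hypothetical helper matches is an assumption, not documented WebCopy behaviour.

```python
import re


def matches(expression: str, text: str) -> bool:
    """Return True if the expression matches the whole of text.

    Hypothetical helper; whole-string matching is an assumption.
    """
    return re.fullmatch(expression, text) is not None


# Unescaped, "." stands for any character, so an unintended host matches too.
print(matches("example.com", "example.com"))     # intended match
print(matches("example.com", "exampleXcom"))     # unintended match
print(matches(r"example\.com", "exampleXcom"))   # escaped: no longer matches

# Unescaped, "+" means "one or more of the preceding character",
# so the literal text is not matched at all.
print(matches("application/epub+zip", "application/epub+zip"))
print(matches(r"application/epub\+zip", "application/epub+zip"))

# re.escape adds the backslashes for you.
print(re.escape("example.com"))
```

In Python, re.escape produces an escaped form of a plain-text value; when typing expressions into WebCopy directly, the backslashes need to be added by hand as described above.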
