By default, WebCopy will only scan the primary host you specify, for example http://example.com.
If you need to copy non-HTML resources from other domains (e.g. a CDN), this is normally handled automatically by the Download all resources option. However, if you want to crawl HTML that isn't located on a sub- or sibling-domain, you can configure WebCopy to download HTML from additional domains.
Some project settings are ignored when crawling additional domains, for example, crawling above the root URL.
If your expression includes any of the following characters and you want them to be processed as plain text, you need to "escape" the character by preceding it with a backslash:

^ [ . $ { * ( \ + ) | ? < >

For example, if your expression was application/epub+zip this would need to be written as application/epub\+zip, otherwise the + character would have a special meaning and no matches would be made. Similarly, if the expression was example.com, this should be written as example\.com, as . means "any character", which could lead to unexpected matches.
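The effect of escaping can be checked outside WebCopy. The following is a minimal sketch that uses Python's standard re module purely as a stand-in. It assumes WebCopy's regular expression engine treats + and . the same way for these simple cases, and the test string examplexcom is a made-up lookalike, not something from WebCopy itself.

```python
import re

mime = "application/epub+zip"

# Unescaped: "b+" means "one or more b", so the literal "+" in the
# text is never matched and the search fails.
unescaped_hit = re.search(r"application/epub+zip", mime) is not None

# Escaped: "\+" matches a literal plus sign.
escaped_hit = re.search(r"application/epub\+zip", mime) is not None

# Hypothetical lookalike host, used only to show the over-matching.
lookalike = "examplexcom"

# Unescaped: "." matches any character, including the "x".
dot_unescaped_hit = re.search(r"example.com", lookalike) is not None

# Escaped: "\." only matches a literal dot, so the lookalike is rejected.
dot_escaped_hit = re.search(r"example\.com", lookalike) is not None

# re.escape adds the backslashes for you when the text is meant literally.
auto = re.escape(mime)

print(unescaped_hit, escaped_hit, dot_unescaped_hit, dot_escaped_hit)
```

In this sketch only the escaped patterns behave as plain text: the unescaped MIME type finds nothing, and the unescaped host name also accepts the lookalike.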