WebCopy attempts to download every document it can find on a given website. Supported document types, such as HTML pages and style sheets, are also scanned to detect additional resources such as images, videos, and file downloads. The tables below describe the default rules that WebCopy uses to scan content.
These rules apply to any document with a content type of text/html.
Element | Attribute | Notes |
---|---|---|
(Any) | href | The href attribute of the base element is not directly scanned. |
(Any) | src | |
(Any) | style | Content is parsed according to the CSS Scanning Rules in the next section. |
img | srcset | |
meta | content | Only meta elements containing an http-equiv attribute with the value refresh are scanned. |
object | codebase | |
object | data | |
param | movie | Only if param is a child of object |
source | srcset | Only if source is a child of picture |
style | n/a | Content is parsed according to the CSS Scanning Rules in the next section. |
video | poster | |
In addition to the above rules, you can configure your own using either simple attributes or more complex XPath expressions.
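
As a rough illustration of the default HTML rules in the table above, the following Python sketch collects candidate URLs from a page. It is not WebCopy's implementation: `LinkScanner` and `scan_html` are hypothetical names, the srcset and meta refresh parsing is deliberately simplified, and the style attribute, style element, and param rules are left out.

```python
import re
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not tracked as parents.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class LinkScanner(HTMLParser):
    """Hypothetical scanner that mirrors part of the table above."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self._stack = []  # names of currently open elements

    def _add_srcset(self, value):
        # Simplification: each comma-separated candidate is "URL [descriptor]".
        for candidate in value.split(","):
            parts = candidate.split()
            if parts:
                self.urls.append(parts[0])

    def handle_starttag(self, tag, attrs):
        a = {name: value for name, value in attrs if value}
        parent = self._stack[-1] if self._stack else None

        # (Any) href, except on the base element; (Any) src.
        if "href" in a and tag != "base":
            self.urls.append(a["href"])
        if "src" in a:
            self.urls.append(a["src"])

        # img srcset, and source srcset only inside picture.
        if "srcset" in a and (tag == "img" or (tag == "source" and parent == "picture")):
            self._add_srcset(a["srcset"])

        # meta content, only for http-equiv="refresh".
        if tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", a.get("content", ""), re.I)
            if match:
                self.urls.append(match.group(1).strip())

        # object codebase / data, and video poster.
        if tag == "object":
            for name in ("codebase", "data"):
                if name in a:
                    self.urls.append(a[name])
        if tag == "video" and "poster" in a:
            self.urls.append(a["poster"])

        if tag not in VOID:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, ignoring stray end tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass


def scan_html(text):
    """Return the URLs found in an HTML string, in document order."""
    scanner = LinkScanner()
    scanner.feed(text)
    scanner.close()
    return scanner.urls
```

Calling `scan_html(page_source)` returns the raw attribute values; resolving them against the page URL (or a base element) would be a separate step, for example with `urllib.parse.urljoin`.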
These rules apply to any document with a content type of text/css. Note that content within CSS comments (/* ... */) is currently ignored.
Directive / Selector | Value | Notes |
---|---|---|
@import | (Any) | Supports URLs wrapped in url() or given as a standalone URL. |
(Any) | url() | Any property which uses the url() syntax will be scanned. The inner value can be wrapped in single quotes or double quotes, or left unquoted. |
Additional content types can be supported via plugins.
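
The CSS rules in the table above can be approximated with a couple of regular expressions. This is only a sketch under stated assumptions, not WebCopy's actual scanner: `scan_css` is a hypothetical name, a standalone @import URL is assumed to be a quoted string, and edge cases such as escaped characters inside URLs are not handled.

```python
import re

# Content inside /* ... */ comments is ignored, so strip it first.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)

# Either url(...) with a double-quoted, single-quoted or unquoted value,
# or @import followed directly by a quoted string.
TOKEN_RE = re.compile(
    r"""url\(\s*(?:"([^"]*)"|'([^']*)'|([^)'"\s]+))\s*\)"""
    r"""|@import\s+(?:"([^"]*)"|'([^']*)')""",
    re.IGNORECASE,
)


def scan_css(text):
    """Return the URLs referenced by a CSS string, in source order."""
    text = COMMENT_RE.sub("", text)
    urls = []
    for match in TOKEN_RE.finditer(text):
        # Exactly one capture group is non-empty for a useful match.
        url = next((group for group in match.groups() if group), None)
        if url:
            urls.append(url.strip())
    return urls
```

An `@import url(...)` statement needs no special handling here because the generic url() pattern already picks it up.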