
How does WebCopy scan pages?

WebCopy attempts to download every document it can find on a given website. Supported documents, such as HTML pages or style sheets, are also scanned to detect additional resources, such as images, videos, and file downloads. The tables below describe the default rules that WebCopy uses to scan content.

HTML Scanning Rules

These rules apply to any document with a content type of text/html.

Element	Attribute	Notes
(Any)	`href`	The `href` attribute of the `base` element is not directly scanned.
(Any)	`src`
`img`	`srcset`
`source`	`srcset`	Only if `source` is a child of `picture`
`meta`	`content`	Only `meta` elements that have an `http-equiv` attribute with the value `refresh` are scanned.
`object`	`data`
`object`	`codebase`
(Any)	`style`	Content is parsed according to the CSS Scanning Rules in the next section.
`style`	n/a	Content is parsed according to the CSS Scanning Rules in the next section.
`param`	`movie`	Only if `param` is a child of `object`

In addition to the above rules, you can configure your own using either simple attributes or more complex XPath expressions.
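
As a rough illustration of the default HTML rules in the table above, here is a minimal Python sketch built on the standard library's `html.parser`. It is not WebCopy's implementation: the `LinkScanner` class, the `srcset_urls` helper, and the way `srcset` and `refresh` values are split are all assumptions made for this example, and the `style` and `param` rules are left out to keep it short.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as parents.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


def srcset_urls(value):
    """Return the URL part of each comma-separated srcset candidate."""
    return [c.split()[0] for c in value.split(",") if c.strip()]


class LinkScanner(HTMLParser):
    """Hypothetical scanner that collects URLs using the default rules."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self._open = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        a = {k: v for k, v in attrs if v}
        parent = self._open[-1] if self._open else None
        # (Any) href, except on the base element
        if "href" in a and tag != "base":
            self.urls.append(a["href"])
        # (Any) src
        if "src" in a:
            self.urls.append(a["src"])
        # img srcset, or source srcset when the parent is picture
        if "srcset" in a and (tag == "img" or (tag == "source" and parent == "picture")):
            self.urls.extend(srcset_urls(a["srcset"]))
        # meta content, only for http-equiv="refresh"
        if tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            m = re.search(r"url\s*=\s*['\"]?([^'\";\s]+)", a.get("content", ""), re.I)
            if m:
                self.urls.append(m.group(1))
        # object data and object codebase
        if tag == "object":
            self.urls.extend(a[k] for k in ("data", "codebase") if k in a)
        if tag not in VOID:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        if tag in self._open:
            while self._open.pop() != tag:
                pass
```

To try it, create a `LinkScanner`, pass a page's markup to `feed()`, and read the collected values from `urls`. The values are returned exactly as written in the page, so relative URLs would still need to be resolved against the page address.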

CSS Scanning Rules

These rules apply to any document with a content type of text/css. Note that content within CSS comments (/* ... */) is currently ignored.

Directive / Selector	Value	Notes
`@import`	(Any)	Supports URLs wrapped in `url()` or just a standalone URL.
(Any)	`url()`	Any property that uses the `url()` syntax will be scanned. The inner value can be wrapped in single quotes or double quotes, or left unquoted.
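
The CSS rules above can be approximated with a few regular expressions. The sketch below is an illustration only, not WebCopy's parser: the `css_urls` function and the pattern names are hypothetical, and it assumes that a "standalone URL" in `@import` means a quoted string.

```python
import re

# Comments are removed first, since content inside them is ignored.
COMMENT = re.compile(r"/\*.*?\*/", re.S)
# url(...) with a single-quoted, double-quoted, or unquoted inner value.
URL_FUNC = re.compile(r"url\(\s*(?:'([^']*)'|\"([^\"]*)\"|([^)'\"\s]+))\s*\)", re.I)
# @import followed directly by a quoted string (assumed standalone form).
IMPORT_STR = re.compile(r"@import\s+(?:'([^']*)'|\"([^\"]*)\")", re.I)


def css_urls(css):
    """Return URLs found in a style sheet: quoted @import targets first, then url() values."""
    css = COMMENT.sub("", css)
    found = []
    for pattern in (IMPORT_STR, URL_FUNC):
        for m in pattern.finditer(css):
            # Exactly one alternative matched; take its captured text.
            found.append(next(g for g in m.groups() if g is not None))
    return found
```

An `@import url(...)` directive is picked up by the general `url()` pattern, so it needs no separate handling here.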

Additional content types can be supported via plugins.

Cyotek WebCopy Help

See Also

Getting Started