How does WebCopy scan pages?

WebCopy will attempt to download any document it can find on a given website. Supported documents, such as HTML pages or style sheets, will also be scanned to try to detect additional resources, such as images, videos, and file downloads. The tables below describe the default rules that WebCopy uses to scan content.
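The download-and-scan cycle described above can be pictured as a breadth-first crawl in which every reachable document is downloaded, but only documents with a supported content type are scanned for further links. The following Python sketch illustrates that idea under stated assumptions. It is not WebCopy's implementation: the in-memory `SITE` dictionary stands in for real HTTP requests, `crude_html_links`, `crude_css_links`, `SCANNERS` and `crawl` are hypothetical names, and the regular-expression scanners are far simpler than the rules in the tables below.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Hypothetical stand-in for HTTP: URL -> (content type, body).
SITE = {
    "http://example.com/": ("text/html", '<link href="site.css"><img src="logo.png">'),
    "http://example.com/site.css": ("text/css", "body { background: url(bg.png); }"),
    "http://example.com/logo.png": ("image/png", ""),
    "http://example.com/bg.png": ("image/png", ""),
}


def crude_html_links(body):
    # Deliberately crude: only double-quoted href/src attributes.
    return re.findall(r'(?:href|src)="([^"]+)"', body)


def crude_css_links(body):
    # Deliberately crude: url(...) values with optional quotes.
    return re.findall(r"""url\(\s*['"]?([^'")]+)['"]?\s*\)""", body)


# Scanners keyed by content type; other types are downloaded but not scanned.
SCANNERS = {"text/html": crude_html_links, "text/css": crude_css_links}


def crawl(start):
    """Return URLs in the order they were 'downloaded'."""
    seen = {start}
    queue = deque([start])
    downloaded = []
    while queue:
        url = queue.popleft()
        if url not in SITE:
            continue  # a real crawler would record a missing page here
        content_type, body = SITE[url]
        downloaded.append(url)
        scanner = SCANNERS.get(content_type)
        if scanner is None:
            continue  # unsupported type: keep the file, look no further
        for ref in scanner(body):
            absolute = urljoin(url, ref)  # resolve relative references
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return downloaded
```

Here `logo.png` and `bg.png` have an image content type, so they are downloaded but never scanned. Keying scanners by content type keeps the loop independent of the formats it understands; supporting another type only means adding an entry.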

HTML Scanning Rules

These rules apply to any document with a content type of text/html.

Element	Attribute	Notes
(Any)	`href`	The `href` attribute of the `base` element is not directly scanned.
(Any)	`src`
(Any)	`style`	Content is parsed according to the CSS Scanning Rules in the next section.
`img`	`srcset`
`meta`	`content`	Only `meta` elements containing an `http-equiv` attribute with the value `refresh` will be scanned.
`object`	`codebase`
`object`	`data`
`param`	`movie`	Only if `param` is a child of `object`.
`source`	`srcset`	Only if `source` is a child of `picture`.
`style`	n/a	Content is parsed according to the CSS Scanning Rules in the next section.
`video`	`poster`
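As a rough illustration of how the HTML rules in the table above might be applied, the following sketch uses Python's standard `html.parser` module to collect candidate URLs. It is an approximation under stated assumptions, not WebCopy's own scanner: `LinkScanner`, `extract_links` and the helper functions are hypothetical names, the `param` rule is omitted, "child of `picture`" is approximated as "anywhere inside a `picture` element", and `srcset` values are split naively on commas.

```python
import re
from html.parser import HTMLParser

# Matches url(...) with single, double, or no quotes around the value.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_urls(text):
    """Return url() values from a style attribute or <style> block."""
    return [m.group(2) for m in CSS_URL.finditer(text) if m.group(2)]


def srcset_urls(value):
    """Take the URL part of each comma-separated srcset candidate (naive split)."""
    return [part.split()[0] for part in value.split(",") if part.strip()]


def refresh_url(content):
    """Extract the target from a refresh value such as '5; url=next.html'."""
    match = re.search(r"""url\s*=\s*['"]?([^'"]+)""", content, re.IGNORECASE)
    return [match.group(1).strip()] if match else []


class LinkScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
        self.picture_depth = 0    # > 0 while inside a <picture> element
        self.style_buffer = None  # collects <style> text until </style>

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; normalise them to "".
        a = {name: value or "" for name, value in attrs}
        if tag == "picture":
            self.picture_depth += 1
        elif tag == "style":
            self.style_buffer = []

        # (Any) element: href (except on <base>), src and inline style.
        if a.get("href") and tag != "base":
            self.links.append(a["href"])
        if a.get("src"):
            self.links.append(a["src"])
        if a.get("style"):
            self.links.extend(css_urls(a["style"]))

        # Element-specific rules.
        if tag == "img" and a.get("srcset"):
            self.links.extend(srcset_urls(a["srcset"]))
        elif tag == "source" and self.picture_depth and a.get("srcset"):
            self.links.extend(srcset_urls(a["srcset"]))
        elif tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            self.links.extend(refresh_url(a.get("content", "")))
        elif tag == "object":
            self.links.extend(a[n] for n in ("codebase", "data") if a.get(n))
        elif tag == "video" and a.get("poster"):
            self.links.append(a["poster"])

    def handle_data(self, data):
        if self.style_buffer is not None:
            self.style_buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "picture" and self.picture_depth:
            self.picture_depth -= 1
        elif tag == "style" and self.style_buffer is not None:
            self.links.extend(css_urls("".join(self.style_buffer)))
            self.style_buffer = None


def extract_links(html):
    scanner = LinkScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.links
```

For example, `extract_links('<img src="a.png" srcset="a-1x.png 1x, a-2x.png 2x">')` returns `['a.png', 'a-1x.png', 'a-2x.png']`. In a full crawler, each returned value would then be resolved to an absolute URL, for instance with `urllib.parse.urljoin`.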

In addition to the above rules, you can configure your own using either simple attributes or more complex XPath expressions.

CSS Scanning Rules

These rules apply to any document with a content type of text/css. Note that content within CSS comments (/* ... */) is currently ignored.

Directive / Selector	Value	Notes
`@import`	(Any)	Supports URLs wrapped in `url()` or given as a standalone URL.
(Any)	`url()`	Any property that uses the `url()` syntax will be scanned. The inner value can be enclosed in single or double quotes, or left unquoted.
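The CSS rules above can be approximated with regular expressions, as in the following sketch: comments are stripped first, then `url()` values (quoted or unquoted) and quoted `@import` targets are collected in document order. This is a hypothetical `scan_css` function, not WebCopy's parser, and a regex approach can misfire on edge cases such as `/*` inside a string or escaped characters, which a proper CSS tokenizer would handle.

```python
import re

# Block comments are removed before scanning, so URLs inside them are ignored.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# Either url(...) with optional single/double quotes,
# or @import followed directly by a quoted string.
TOKEN = re.compile(
    r"""url\(\s*(?P<q>['"]?)(?P<url>.*?)(?P=q)\s*\)"""
    r"""|@import\s+(?P<iq>['"])(?P<imp>.*?)(?P=iq)""",
    re.IGNORECASE,
)


def scan_css(css):
    """Return referenced URLs in the order they appear in the style sheet."""
    css = COMMENT.sub("", css)
    found = []
    for match in TOKEN.finditer(css):
        # Exactly one of the two alternatives matched; pick its value.
        value = match.group("url") if match.group("url") is not None else match.group("imp")
        if value:  # skip empty url()
            found.append(value)
    return found
```

Because `@import url(...)` is matched by the `url()` alternative, each import is reported once rather than twice.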

Additional content types can be supported via plugins.

Cyotek WebCopy Help

See Also

Getting Started