We're no longer updating this content regularly.

What is a rule?

At the most basic level, a rule is a pattern that is matched against a URI, together with one or more action flags. Currently, WebCopy only supports regular expressions for the pattern.

When a website is crawled, all enabled rules are run for each detected URI. If the pattern for a rule matches the URI, the rule's action flags are processed. This could mean that the URI is excluded from the crawl, or that additional processing is performed on the URI.
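The matching step described above can be sketched in a few lines of Python. This is only an illustration, not WebCopy's implementation: the `Rule` class, its field names, and the `matching_rules` function are hypothetical, and Python's `re` module may behave differently from the regular expression engine WebCopy uses.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    # Hypothetical stand-in for a WebCopy rule; field names are assumptions.
    pattern: str            # regular expression to test against the URI
    exclude: bool = False   # example action flag
    enabled: bool = True    # disabled rules are never run


def matching_rules(uri: str, rules: List[Rule]) -> List[Rule]:
    """Return every enabled rule whose pattern matches the URI."""
    return [r for r in rules if r.enabled and re.search(r.pattern, uri)]


rules = [
    Rule(r"\.pdf$", exclude=True),                  # matches URIs ending in .pdf
    Rule(r"^/blog/", exclude=True, enabled=False),  # disabled, so never run
]

matched = matching_rules("/docs/manual.pdf", rules)
# The action flags of the matched rules decide what happens to the URI.
is_excluded = any(r.exclude for r in matched)
```

Here only the first rule is considered for `/docs/manual.pdf`, because the second rule is disabled.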

If a rule fails to execute, for example because its regular expression is invalid, the rule is automatically disabled so that the remainder of the site can still be copied.
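A minimal sketch of this fail-safe behaviour, assuming a rule is represented as a plain dictionary with hypothetical `pattern` and `enabled` keys. It uses Python's `re` module for illustration only; WebCopy's own error handling is not documented here.

```python
import re


def try_match(rule: dict, uri: str) -> bool:
    """Test a rule against a URI, disabling the rule if its pattern is invalid."""
    if not rule["enabled"]:
        return False
    try:
        return re.search(rule["pattern"], uri) is not None
    except re.error:
        # An invalid expression must not abort the whole crawl,
        # so the rule is switched off and treated as a non-match.
        rule["enabled"] = False
        return False


broken = {"pattern": r"(unclosed", "enabled": True}  # missing closing parenthesis
result = try_match(broken, "/index.html")
```

After the call, `broken["enabled"]` is `False`, so the rule is skipped for every later URI while the other rules continue to run.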

Option	Description
Expression	A regular expression which is matched against a URL while analysing or copying a website. If a match is found, and the rule is enabled, then the attributes below are processed.
Exclude	Specifies that the URL should be excluded
Crawl Content	Specifies that although the URL is excluded, its contents should still be scanned
Include	Specifies that the URL should be included
Don't Crawl Content	Specifies that although the URL is included, its contents should not be scanned
Use Full URI	By default, the pattern is only matched against the path and query string of the URL. If this option is specified, the pattern is checked against the entire URL, including the scheme, domain etc.
Enable this rule	Specifies whether the rule is enabled
Stop processing more rules	By default, WebCopy will try to process all rules. You can use this flag to control this process; if it is set and the rule is matched, no further rules will be processed
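
The Use Full URI and Stop processing more rules options can be illustrated with a short Python sketch. Everything here is an assumption made for the example: the dictionary keys (`name`, `pattern`, `enabled`, `use_full_uri`, `stop`), the function names, and the exact form of the path and query string (for instance, whether WebCopy includes the leading slash) are not taken from WebCopy itself.

```python
import re
from urllib.parse import urlsplit


def match_target(url: str, use_full_uri: bool) -> str:
    """Return the text a pattern is tested against."""
    if use_full_uri:
        return url  # entire URL, including scheme and domain
    parts = urlsplit(url)
    # Default: path plus query string only (assumed format).
    return parts.path + ("?" + parts.query if parts.query else "")


def apply_rules(url: str, rules: list) -> list:
    """Return the names of the rules that matched, honouring the stop flag."""
    matched = []
    for rule in rules:
        if not rule.get("enabled", True):
            continue
        target = match_target(url, rule.get("use_full_uri", False))
        if re.search(rule["pattern"], target):
            matched.append(rule["name"])
            if rule.get("stop", False):
                break  # no further rules are processed
    return matched


url = "https://example.com/blog/post?id=1"
rules = [
    {"name": "blog", "pattern": r"^/blog/", "stop": True},
    {"name": "posts", "pattern": r"post"},  # never reached for this URL
]
names = apply_rules(url, rules)
```

With the default setting, the `blog` pattern is tested against `/blog/post?id=1`. Because that rule sets the stop flag, the `posts` rule is not evaluated even though its pattern would also match.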

The "Do not allow children to inherit this rule" and "Reverse" flags have been deprecated and will be removed in a future version of WebCopy.

See Also

Working with Rules