At the most basic level, a rule is a pattern that is matched against a URI, combined with one or more action flags. Currently WebCopy only supports regular expressions for the pattern.

When a website is crawled, all enabled rules are run against each detected URI. If the pattern for a rule matches the URI, the rule's action flags are processed. This could mean that the URI is excluded from the crawl, or that additional processing is performed on the URI.
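
The matching step described above can be sketched roughly as follows. This is a minimal illustration in Python, not WebCopy's actual implementation: the `Rule` class, its field names, the sample patterns and the use of an unanchored `re.search` are all assumptions made for the example, and WebCopy's own regular expression engine may differ from Python's in places.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical stand-in for a WebCopy rule; field names are illustrative.
    pattern: str           # regular expression to test against the URI
    exclude: bool = False  # example action flag
    enabled: bool = True   # disabled rules are never run


def matching_rules(uri, rules):
    """Return the enabled rules whose pattern matches the given URI."""
    return [r for r in rules if r.enabled and re.search(r.pattern, uri)]


rules = [
    Rule(pattern=r"\.pdf$", exclude=True),
    Rule(pattern=r"^/blog/"),
    Rule(pattern=r"^/private/", exclude=True, enabled=False),
]

for uri in ["/docs/manual.pdf", "/blog/post-1", "/private/notes"]:
    hits = matching_rules(uri, rules)
    # A URI is treated as excluded here if any matching rule carries the flag.
    excluded = any(r.exclude for r in hits)
    print(uri, "excluded" if excluded else "kept")
```

Note that the third rule never fires because it is disabled, so `/private/notes` is kept in this sketch.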

If a rule fails to execute, for example because its regular expression is invalid, the rule is automatically disabled so that the remainder of the site can still be copied.
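
As a rough illustration of this fail-safe behaviour, the sketch below disables a rule the first time its pattern cannot be compiled. It is written in Python under stated assumptions: the dictionary layout and the `run_rule` helper are hypothetical, and WebCopy may detect and report failures differently.

```python
import re


def run_rule(rule, uri):
    """Return True if the rule matches; disable the rule if its pattern is invalid."""
    if not rule["enabled"]:
        return False
    try:
        return re.search(rule["pattern"], uri) is not None
    except re.error:
        # An invalid expression should not abort the crawl, so the rule
        # is switched off and treated as a non-match from now on.
        rule["enabled"] = False
        return False


rules = [
    {"pattern": r"[unclosed", "enabled": True},  # invalid: unterminated character set
    {"pattern": r"\.zip$", "enabled": True},
]

results = [run_rule(rule, "/files/archive.zip") for rule in rules]
```

After this runs, the first rule has been disabled while the second has still been evaluated normally.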

Option | Description
Expression | A regular expression that is matched against a URL while analysing or copying a website. If a match is found, and the rule is enabled, then the attributes below are processed.
Exclude | Specifies that the URL should be excluded.
Crawl Content | Specifies that although the URL is excluded, its contents should still be scanned.
Include | Specifies that the URL should be included.
Don't Crawl Content | Specifies that although the URL is included, its contents should not be scanned.
Use Full URI | By default, the pattern is only matched against the path and query string of the URL. If this option is specified, the pattern is checked against the entire URL, including the scheme, domain and so on.
Enable this rule | Specifies whether the rule is enabled.
Stop processing more rules | By default, WebCopy will try to process all rules. You can use this flag to control this process; if it is set and the rule matches, no further rules are processed.
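
To show how the Use Full URI and Stop processing more rules options could interact, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Rule` fields, the `match_target` and `apply_rules` helpers, the example hosts, and in particular the exact form of the "path and query string" (this sketch keeps the leading slash, which WebCopy may not do).

```python
import re
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Rule:
    # Hypothetical model of the options listed above.
    expression: str
    exclude: bool = False
    use_full_uri: bool = False
    enabled: bool = True
    stop_processing: bool = False


def match_target(url, rule):
    """Return the text the expression is tested against for this rule."""
    if rule.use_full_uri:
        return url  # entire URL, including scheme and domain
    parts = urlsplit(url)
    # Path plus query string only (assumed format).
    return parts.path + ("?" + parts.query if parts.query else "")


def apply_rules(url, rules):
    """Return the rules that matched, honouring the stop-processing flag."""
    matched = []
    for rule in rules:
        if not rule.enabled:
            continue
        if re.search(rule.expression, match_target(url, rule)):
            matched.append(rule)
            if rule.stop_processing:
                break  # no further rules are considered
    return matched


rules = [
    Rule(r"^https://cdn\.example\.com/", exclude=True,
         use_full_uri=True, stop_processing=True),
    Rule(r"^/downloads/", exclude=True),
    Rule(r"\.png$"),
]

cdn_hits = apply_rules("https://cdn.example.com/downloads/logo.png", rules)
www_hits = apply_rules("https://www.example.com/downloads/logo.png", rules)
```

In this sketch the CDN URL matches only the first rule, because that rule stops further processing, while the `www` URL skips the first rule and goes on to match the remaining two.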

The "Do not allow children to inherit this rule" and "Reverse" flags have been deprecated and will be removed in a future version of WebCopy.

© 2010-2018 Cyotek Ltd. All Rights Reserved.
Documentation version 1.7 (buildref #600.-), last modified 2018-11-23. Generated 2023-04-02 08:04 using Cyotek HelpWrite Professional version 6.19.1