Adding a new rule via the Rules Editor

  1. Select Rules from the Project menu, or press Control+R, to display the Rules Editor.
  2. Click the Add button to add a blank rule.
  3. Enter a search expression in the Expression field.
  4. In the Compare with field, choose which component the search expression will be matched against.
  5. Select any other options to control how the rule behaves.
  6. Click the OK button to save the new rule and close the editor.

Important

If your expression includes any of the ^, [, ., $, {, *, (, \, +, ), |, ?, <, > characters and you want them to be processed as plain text, you need to "escape" each one by preceding it with a backslash. For example, the expression application/epub+zip must be written as application/epub\+zip; otherwise the + character has a special meaning and no matches will be made. Similarly, example.com should be written as example\.com, because . means "any character", which could lead to unexpected matches.
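The effect of escaping can be illustrated with a short sketch. It uses Python's `re` module as a stand-in for the rule engine, whose own regular-expression dialect may differ in details, and it assumes a rule matches when its expression is found anywhere in the compared value. The helper names `escape_rule_text` and `rule_matches` are hypothetical.

```python
import re

# The characters listed above as having a special meaning in rule expressions.
SPECIAL_CHARS = set("^[.${*(\\+)|?<>")


def escape_rule_text(text: str) -> str:
    """Prefix each special character with a backslash so it is matched literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def rule_matches(expression: str, value: str) -> bool:
    """Assumption: a rule matches if the expression is found anywhere in the value."""
    return re.search(expression, value) is not None


mime = "application/epub+zip"
# Unescaped, "b+" means "one or more b characters", so the literal "+" is never matched.
print(rule_matches("application/epub+zip", mime))                    # False
print(rule_matches(escape_rule_text("application/epub+zip"), mime))  # True

# Unescaped, "." matches any character, so unrelated text can match.
print(rule_matches("example.com", "exampleXcom"))                    # True
print(rule_matches(escape_rule_text("example.com"), "exampleXcom"))  # False
```

Escaping only the listed characters keeps the rest of the expression readable; if you need wildcards in one part of an expression, escape just the literal portions.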

Compare Options

This table outlines the compare options available. The example values are based on the following sample address:

http://www.example.com/folder/products?sort=name&order=asc

| Option | Description | Example |
|---|---|---|
| Authority | The URL domain | www.example.com |
| Authority, Path, and Query String | The domain, path and query string of the URL | www.example.com/folder/products?sort=name&order=asc |
| Content Type | The detected content type of the URL | n/a |
| Entire URL | The complete URL | http://www.example.com/folder/products?sort=name&order=asc |
| Path | The path of the URL, including file names if applicable | folder/products |
| Path and Query String | The path and query string of the URL | folder/products?sort=name&order=asc |
| Query String | The query string of the URL | sort=name&order=asc |
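To see how each compare option slices up the sample address, here is a minimal sketch using Python's `urllib.parse.urlsplit`. It is an approximation, not the product's implementation: the function name `compare_values` is hypothetical, the leading slash is stripped from the path only to mirror the examples in the table, and Content Type is omitted because it is detected from the downloaded resource rather than the URL text.

```python
from urllib.parse import urlsplit


def compare_values(url: str) -> dict:
    """Return the text each URL-based compare option would be matched against."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")  # the table shows paths without a leading slash
    query = parts.query
    path_and_query = f"{path}?{query}" if query else path
    return {
        "Authority": parts.netloc,
        "Authority, Path, and Query String": f"{parts.netloc}/{path_and_query}",
        "Entire URL": url,
        "Path": path,
        "Path and Query String": path_and_query,
        "Query String": query,
    }


sample = "http://www.example.com/folder/products?sort=name&order=asc"
for option, value in compare_values(sample).items():
    print(f"{option}: {value}")
```

Choosing a narrower component, such as Query String, avoids accidental matches against the domain or path.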

| Operand | Description |
|---|---|
| Matches | Specifies that the rule will be processed if the given input matches the rule expression |
| Does Not Match | Specifies that the rule will be processed if the given input does not match the rule expression |
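The two operands simply invert the result of the expression test. The sketch below assumes, as before, that an expression matches when it is found anywhere in the compared value, and uses Python's `re` module as a stand-in for the rule engine; `rule_applies` is a hypothetical name.

```python
import re


def rule_applies(expression: str, value: str, operand: str = "Matches") -> bool:
    """Decide whether a rule is processed for the given input and operand."""
    matched = re.search(expression, value) is not None
    if operand == "Matches":
        return matched
    if operand == "Does Not Match":
        return not matched
    raise ValueError(f"unknown operand: {operand}")


authority = "www.example.com"
print(rule_applies(r"example\.com", authority))                    # True
print(rule_applies(r"example\.com", authority, "Does Not Match"))  # False
print(rule_applies(r"example\.org", authority, "Does Not Match"))  # True
```

Does Not Match is useful for "everything except" rules, for example excluding any URL whose Authority is not the site being copied.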

Rule Options

| Option | Description |
|---|---|
| Enable this rule | Specifies whether the rule is enabled. Disabled rules are ignored. |
| Exclude | Specifies that matching URLs should be excluded. |
| Include | Specifies that matching URLs should be included. This allows you to have a broad rule that excludes content, followed by a narrower rule that includes specific content. |
| Crawl Content | Specifies that although the URL is excluded, its contents should still be scanned (applies to HTML documents only). Although a permanent copy of the URL is not downloaded, a temporary copy is still made so that it can be scanned for additional URLs to crawl. |
| Don't Crawl Content | Specifies that although the URL is included, its contents should not be scanned (applies to HTML documents only). A permanent copy of the URL is saved, but it is not scanned for additional URLs to crawl. |
| Stop processing more rules | By default, all rules are processed sequentially. If this option is set and the rule matches, no further rules are processed. |
| Download Priority | Changes the download priority for URLs matching the rule. High priority means the URL is downloaded immediately, while Low means it is downloaded after all other URLs have been processed.¹ |

¹ The Download Priority option is only supported for rules that match against a URL; it is ignored for rules that match against content types.
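The interaction between Enable this rule, Exclude, Include, and Stop processing more rules can be sketched as a loop over the rules in order. This is an illustration under stated assumptions, not the product's algorithm: URLs are assumed to be included unless a rule excludes them, a later matching rule is assumed to override an earlier one (which is what makes the "broad exclude, then narrow include" pattern work), and each expression is tested anywhere in the Entire URL. The `Rule` class and `is_excluded` function are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    # Hypothetical rule model; field names are illustrative only.
    expression: str
    exclude: bool = True           # True = Exclude, False = Include
    enabled: bool = True           # "Enable this rule"
    stop_processing: bool = False  # "Stop processing more rules"


def is_excluded(url: str, rules: List[Rule]) -> bool:
    excluded = False  # assumption: URLs are included unless a rule excludes them
    for rule in rules:  # rules are processed sequentially
        if not rule.enabled:
            continue  # disabled rules are ignored
        if re.search(rule.expression, url) is None:
            continue  # rule does not match this URL
        excluded = rule.exclude  # assumption: later matches override earlier ones
        if rule.stop_processing:
            break  # no further rules are processed
    return excluded


rules = [
    Rule(r"/folder/", exclude=True),           # broad rule: exclude the folder
    Rule(r"/folder/products", exclude=False),  # narrow rule: include products
]
print(is_excluded("http://www.example.com/folder/products?sort=name", rules))  # False
print(is_excluded("http://www.example.com/folder/about", rules))               # True
print(is_excluded("http://www.example.com/index.html", rules))                 # False
```

Setting `stop_processing=True` on the first rule would exclude the products URL as well, because the narrower Include rule would never be evaluated.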