The third tutorial covers rules. Rules allow you to configure how the website is downloaded.

This tutorial assumes you have followed the steps in the first tutorial.

Adding a new rule

  1. Select Rules from the Project menu, or press Control+R to display the rule editor.
  2. Click the Add button to add a new blank rule and select it for editing.
  3. In the Expression field, enter \.gif (the backslash escapes the dot so that it matches a literal full stop). This field accepts regular expressions, which are matched against each crawled URL.
  4. The new rule automatically defaults to the Excluded flag, meaning any URL matching the expression will be excluded from the crawl. By changing these options you can set up more powerful crawl behaviour, for example downloading only images.
  5. Click OK to save the rule and close the editor.
  6. Press Shift+F5 to copy the project.

When the copy has finished, the Skipped table will show that all URLs containing .gif were skipped. A yellow icon indicates that the file was skipped due to a rule.
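The effect of the \.gif expression can be sketched outside WebCopy. The snippet below is only an illustration in Python: it assumes the expression is searched for anywhere in the URL (which is what "containing .gif" suggests), the sample URLs and the is_excluded helper are hypothetical, and WebCopy's own regular expression engine may differ in detail from Python's re module.

```python
import re

# Hypothetical sample URLs, used only for illustration.
urls = [
    "http://somewhere.com/test.gif",
    "http://somewhere.com/test.gif?value1=a",
    "http://somewhere.com/my.gifts/index.html",
    "http://somewhere.com/index.html",
]

# The expression entered in step 3: a literal dot followed by "gif".
simple_rule = re.compile(r"\.gif")


def is_excluded(url, rule=simple_rule):
    """Return True if the rule matches anywhere in the URL (assumed behaviour)."""
    return rule.search(url) is not None


# URLs that an Excluded rule with this expression would skip in this sketch.
skipped = [url for url in urls if is_excluded(url)]

for url in skipped:
    print("skipped:", url)
```

Note that the third URL is skipped as well, even though it is not a GIF image: the expression only checks that .gif appears somewhere in the URL.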

Editing a rule

  1. Select Rules from the Project menu, or press Control+R to display the rule editor.
  2. Select the rule from the list.
  3. Enter \.gif(?:$|#|\?) into the Expression field.
  4. Click OK to save the rule and close the editor.

If you copy the website now, you'll get the same results as before. However, the rule is now a little more robust - instead of blindly ignoring any URL containing .gif, it will only ignore a URL which:

  • ends in .gif (http://somewhere.com/test.gif)
  • has .gif before the fragment (http://somewhere.com/test.gif#bookmark)
  • has .gif before the query string (http://somewhere.com/test.gif?value1=a)
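The three cases above can be checked with a short sketch. As before, this is an illustration in Python rather than WebCopy itself: the matches_rule helper and the two non-matching URLs are hypothetical, and the sketch assumes the expression is searched for anywhere in the URL.

```python
import re

# The refined expression: ".gif" followed by the end of the URL,
# a fragment marker (#) or the start of a query string (?).
robust_rule = re.compile(r"\.gif(?:$|#|\?)")


def matches_rule(url):
    """Return True if the refined expression matches the URL."""
    return robust_rule.search(url) is not None


# The three example URLs from the list above.
print(matches_rule("http://somewhere.com/test.gif"))
print(matches_rule("http://somewhere.com/test.gif#bookmark"))
print(matches_rule("http://somewhere.com/test.gif?value1=a"))

# Hypothetical URLs that merely contain ".gif" as part of a longer name.
print(matches_rule("http://somewhere.com/my.gifts/index.html"))
print(matches_rule("http://somewhere.com/test.gift"))
```

The first three calls report a match, while the last two do not, because .gif is followed by further characters rather than by the end of the URL, # or ?.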

By entering regular expressions as rules, you have powerful control over what content is downloaded and what content is skipped. WebCopy includes a regular expression editor to help build and test rule expressions.

For another example of how to use rules to control the crawl, see the "how to only copy images" example topic.

© 2010-2024 Cyotek Ltd. All Rights Reserved.
Documentation version 1.10 (buildref #186.15944), last modified 2024-08-18. Generated 2024-08-18 08:01 using Cyotek HelpWrite Professional version 6.20.0