The third tutorial covers rules. Rules let you control which parts of a website are downloaded and which are skipped.
This tutorial assumes you have followed the steps in the first tutorial.
Adding a new rule
- Select Rules from the Project menu, or press Control+R to display the rule editor.
- Enter \.gif into the Pattern field. This field accepts a regular expression, which is matched against each crawled URI.
- Click the Add button to add the rule
- Click OK to save the rule and close the editor
- Press Shift+F5 to copy the project
When the copy has finished, the Skipped table will show that all URLs containing .gif were skipped. A yellow icon indicates that a file was skipped due to a rule.
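To see why this pattern skips every URL containing .gif, it can help to try the expression outside the program. The following is only a rough sketch using Python's re module, on the assumption that the rule behaves like an unanchored search over the URI. The is_skipped function and the sample URLs are hypothetical and are not part of the application, and its actual regular expression engine and case handling may differ.

```python
import re

# The pattern entered in the tutorial. The backslash makes the dot literal,
# so the expression looks for the text ".gif" anywhere in the URI.
RULE = re.compile(r"\.gif")


def is_skipped(uri: str) -> bool:
    """Hypothetical helper: True if the rule matches anywhere in the URI."""
    return RULE.search(uri) is not None


# Illustrative URLs only; they are not taken from a real crawl.
sample_uris = [
    "http://somewhere.com/index.html",
    "http://somewhere.com/test.gif",
    "http://somewhere.com/test.gif?value1=a",
    "http://somewhere.com/my.gifts/list.html",
]

skipped = [uri for uri in sample_uris if is_skipped(uri)]

for uri in sample_uris:
    print("skipped" if uri in skipped else "copied ", uri)
```

Note that in this sketch the last URL is also skipped, even though it is not a GIF image, because the text .gif appears inside the folder name.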
Editing a rule
- Select Rules from the Project menu, or press Control+R to display the rule editor.
- Select the rule from the list
- Enter \.gif(?:$|#|\?) into the Pattern field
- Click OK to save the rule and close the editor
If you copy the website now, you'll get the same results as before. However, the rule is now a little more robust. Instead of blindly ignoring any URL containing .gif, it will only ignore a URL which
- ends in .gif (http://somewhere.com/test.gif)
- has .gif before the fragment (http://somewhere.com/test.gif#bookmark)
- has .gif before the query string (http://somewhere.com/test.gif?value1=a)
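The three cases above can be checked with a short script. This is a minimal sketch using Python's re module, assuming the rule is applied as an unanchored search over the URI. The matches_rule function and the last two sample URLs are hypothetical additions for illustration, and the application's own regular expression engine may treat edge cases differently.

```python
import re

# The refined pattern from the tutorial: ".gif" must be followed by the end
# of the URI ($), a fragment marker (#), or a query string marker (\?).
RULE = re.compile(r"\.gif(?:$|#|\?)")


def matches_rule(uri: str) -> bool:
    """Hypothetical helper: True if the refined rule matches the URI."""
    return RULE.search(uri) is not None


# The first three URLs come from the list above; the last two are made-up
# examples of URLs that merely contain ".gif" somewhere in the path.
checks = {
    "http://somewhere.com/test.gif": matches_rule("http://somewhere.com/test.gif"),
    "http://somewhere.com/test.gif#bookmark": matches_rule("http://somewhere.com/test.gif#bookmark"),
    "http://somewhere.com/test.gif?value1=a": matches_rule("http://somewhere.com/test.gif?value1=a"),
    "http://somewhere.com/my.gifts/list.html": matches_rule("http://somewhere.com/my.gifts/list.html"),
    "http://somewhere.com/test.gif.html": matches_rule("http://somewhere.com/test.gif.html"),
}

for uri, matched in checks.items():
    print("skipped" if matched else "copied ", uri)
```

The group (?:...) is non-capturing, so it only groups the three alternatives without storing what was matched.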
By entering regular expressions as rules, you have powerful control over what content is downloaded and what content is skipped.