Configuring the crawler

WebCopy requires two user-provided pieces of information before a website can be crawled: the primary address to copy, and the location in which to store downloaded materials.

The crawl process can be configured in many ways, from technical settings that control the HTTP protocol to rules that control what content is downloaded and what is ignored. The following topics detail these options.

Important

Web crawling is not an exact science. While the default crawl settings should work for many websites, some customisation, and some knowledge of how the website to be copied is structured and built, may be required.

To display the project properties dialog

From the Project menu, click Project Properties.

Tip

If you don't know exactly where a setting is located, press Ctrl+E and enter a setting name such as "cookie" or "zip"; the dialog will be filtered to show only the categories containing the search phrase.