Downloading a website can take considerable time, and sometimes you might wish to quickly check the structure of a site in order to fine-tune which documents to copy and which to exclude. WebCopy's Quick Scan functionality can help with this.
The Quick Scan currently processes each unique page only once, ignoring query strings. For example, page1.html?value=number would appear as page1.html in the results, and only the first occurrence would be crawled.
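The deduplication described above can be illustrated with a short sketch. This is not WebCopy's actual implementation, just an assumed approach: strip the query string and fragment from each URL so that variants of the same page compare equal, and crawl only the first occurrence.

```python
from urllib.parse import urlsplit, urlunsplit

def normalise(url):
    """Strip the query string and fragment so variants of a page compare equal."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Hypothetical example URLs; both page1.html variants collapse to one entry.
seen = set()
for url in [
    "http://example.com/page1.html?value=1",
    "http://example.com/page1.html?value=2",
    "http://example.com/page2.html",
]:
    key = normalise(url)
    if key not in seen:
        seen.add(key)          # only the first occurrence is queued for crawling
```

With this scheme, page1.html?value=1 and page1.html?value=2 both normalise to page1.html, matching the behaviour described above.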
Once the scan is complete, a diagram of all found documents is displayed. Each node is colour coded to show how it will be processed when the website is copied.
After the initial scan has completed, additional controls are displayed that let you choose how the website should be crawled. Changing an option here automatically updates the diagram to show how the new mode would affect a copy.
The Everything option is not recommended and should only be used on sites that are self-contained, or where rules are used to explicitly exclude addresses. Using this option may cause WebCopy to become unstable.
By default, WebCopy won't crawl any domain that doesn't match the primary host. Changing the crawl mode settings allows you to expand the crawl to include subdomains, sibling domains, or linked resources no matter where they are located.
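The crawl modes above can be sketched as a simple host-matching check. The mode names and matching rules here are illustrative assumptions, not WebCopy's internal logic; in particular, the sibling check naively treats everything after the first label as the parent domain.

```python
from urllib.parse import urlsplit

def in_scope(url, primary_host, mode):
    """Decide whether a linked URL falls inside the crawl for a given mode.

    Modes (hypothetical names): "host" matches only the primary host,
    "subdomains" also matches hosts beneath it, "siblings" matches any
    host under the same parent domain, "everything" matches any host.
    """
    host = urlsplit(url).netloc.lower()
    primary = primary_host.lower()
    if mode == "everything":
        return True
    if mode == "host":
        return host == primary
    # Naive parent derivation: "www.example.com" -> "example.com".
    parent = primary.split(".", 1)[1] if primary.count(".") > 1 else primary
    if mode == "subdomains":
        return host == primary or host.endswith("." + primary)
    if mode == "siblings":
        return host == parent or host.endswith("." + parent)
    return False
```

Under this sketch, with www.example.com as the primary host, blog.example.com is out of scope in "host" mode but in scope in "siblings" mode, mirroring the default-and-expanded behaviour described above.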
WebCopy includes the ability to specify additional domains that will be included in the crawl. You can easily configure these from the Quick Scan window: right-click the root node on the diagram for the domain you wish to include, then click the Exclude option to toggle its status. The domain should now be highlighted in green, indicating it will be copied.
If you change your mind, repeat the process and the domain will be excluded again.
To include or exclude any page, right click the relevant diagram node and click Exclude to toggle the status.
Click OK to update your project with the specified configuration changes and close the Quick Scan window.