Downloading a website can take considerable time, and sometimes you might wish to quickly check the structure of a site in order to fine-tune which documents to copy and which to exclude. WebCopy's Quick Scan functionality can help with this.
Performing a Quick Scan
- From the Project menu, click Quick Scan or press Shift+F6.
- Select the maximum depth to scan.
- Select the number of pages per domain to process.
- Click the Scan button to perform the scan.
The quick scan currently only processes each unique page once. For example, page1.html?valueb=number would appear as page1.html in the results, and only the first occurrence would be crawled.
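To make the deduplication rule concrete, here is a minimal Python sketch of a bounded, breadth-first scan. It is not WebCopy's implementation. It assumes a page's identity is its URL with the query string and fragment removed, that the start page sits at depth 0, and it uses a hypothetical in-memory `links` dictionary in place of downloading pages. The names `page_key` and `quick_scan` are illustrative only.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def page_key(url: str) -> str:
    """Hypothetical identity for a page: scheme, host and path only."""
    parts = urlsplit(url)
    # Drop the query string and fragment so page1.html?a=1 and page1.html?a=2 match.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def quick_scan(start_url, links, max_depth, max_pages_per_domain):
    """Breadth-first scan over a link graph (a sketch, not WebCopy's code).

    links maps a URL to the URLs it links to, standing in for fetched pages.
    Returns page keys in the order they were first visited.
    """
    seen = set()
    pages_per_domain = {}
    visited = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        key = page_key(url)
        if key in seen:
            continue  # only the first occurrence of a page is crawled
        host = urlsplit(url).netloc.lower()
        if pages_per_domain.get(host, 0) >= max_pages_per_domain:
            continue  # this domain has reached its page limit
        seen.add(key)
        pages_per_domain[host] = pages_per_domain.get(host, 0) + 1
        visited.append(key)
        if depth < max_depth:
            for target in links.get(url, []):
                queue.append((target, depth + 1))
    return visited


if __name__ == "__main__":
    links = {
        "http://example.com/": [
            "http://example.com/page1.html?valueb=1",
            "http://example.com/page1.html?valueb=2",
            "http://example.com/about.html",
        ],
        "http://example.com/page1.html?valueb=1": ["http://example.com/deep.html"],
    }
    # Prints the start page, page1.html (once) and about.html.
    print(quick_scan("http://example.com/", links, max_depth=1, max_pages_per_domain=10))
```

With `max_depth=1`, the two `page1.html` links collapse into a single result and `deep.html` is never reached, because it is only linked from a page at the maximum depth.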
Viewing the results
Once the scan is complete, a diagram of all documents found is displayed. Each node is colour-coded to show how it will be processed when the website is copied.
- The document will be copied
- The document will not be copied
- The document is a non-HTML resource located on a different domain from the website being copied; however, it will be copied with the website because the Copy all resources option is set
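The colour legend amounts to a small decision rule. The Python sketch below models it under stated assumptions: `in_scope` is a hypothetical flag meaning the document is on a domain the crawl includes and has not been excluded, and `NodeStatus` and `classify` are invented names for illustration, not part of WebCopy.

```python
from enum import Enum


class NodeStatus(Enum):
    # Hypothetical labels mirroring the three colours in the legend.
    COPIED = "will be copied"
    NOT_COPIED = "will not be copied"
    EXTERNAL_RESOURCE_COPIED = "external resource copied via Copy all resources"


def classify(in_scope: bool, is_html: bool, copy_all_resources: bool) -> NodeStatus:
    """Assumed rule: in-scope documents are copied; out-of-scope non-HTML
    resources are copied only when Copy all resources is set."""
    if in_scope:
        return NodeStatus.COPIED
    if not is_html and copy_all_resources:
        return NodeStatus.EXTERNAL_RESOURCE_COPIED
    return NodeStatus.NOT_COPIED
```

For example, an image hosted on another domain is classified as `EXTERNAL_RESOURCE_COPIED` only while the option is on; an HTML page on that same domain stays `NOT_COPIED` either way.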
Changing the crawl mode
After the initial scan has completed, additional controls are displayed, allowing you to select how you want the website to be crawled. Changing an option here automatically updates the diagram to show how the new mode would affect a copy.
Use of the Everything option is not recommended and should only be used on sites which are self-contained or where rules are used to explicitly exclude addresses. Use of this option may cause WebCopy to become unstable.
Including or excluding a domain
By default, WebCopy won't crawl any domain that doesn't match the primary host. Changing the crawl mode settings allows you to expand the crawl to include subdomains, sibling domains, or linked resources no matter where they are located.
WebCopy includes the ability to specify additional domains that will be included in the crawl. You can easily configure these from the Quick Scan window: right-click the root node on the diagram for the domain you wish to include and click the Exclude option to toggle its status. The domain should now be highlighted in green, indicating it will be copied.
If you change your mind, repeat the process and the domain will be excluded again.
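The domain rules in this section can be approximated as a host-matching function. This is a hypothetical sketch rather than WebCopy's logic: the mode strings (`"site"`, `"subdomains"`, `"siblings"`, `"everything"`) are invented names, `additional_domains` stands in for domains you have included from the diagram, and it assumes a sibling domain is any host sharing the primary host's parent domain. The parent is found with a naive label split that ignores multi-part suffixes such as `co.uk`.

```python
def host_in_scope(primary_host, host, mode, additional_domains=()):
    """Return True if host would be crawled under the (hypothetical) mode."""
    primary_host = primary_host.lower()
    host = host.lower()
    extra = {d.lower() for d in additional_domains}

    if mode == "everything":
        return True  # no domain restriction at all
    if host == primary_host or host in extra:
        return True  # the primary host and explicitly included domains
    if mode == "subdomains":
        return host.endswith("." + primary_host)
    if mode == "siblings":
        # Naive parent: drop the first label only when at least three labels exist,
        # so "example.com" is not widened to every ".com" host.
        labels = primary_host.split(".")
        parent = ".".join(labels[1:]) if len(labels) > 2 else primary_host
        return host == parent or host.endswith("." + parent)
    return False  # "site" (the default): primary host only
```

Under these assumptions, starting from `www.example.com`, the host `blog.example.com` is rejected in `"site"` mode but accepted in `"siblings"` mode.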
Including or excluding individual pages
To include or exclude any page, right-click the relevant diagram node and click Exclude to toggle its status.
Click OK to update your project with the specified configuration changes and close the Quick Scan window.