We're no longer updating this content regularly.

Quickly scanning a website

Downloading a website can take considerable time and sometimes you might wish to quickly check the structure of a site in order to fine tune which documents to copy and which to exclude. WebCopy's Quick Scan functionality can help with this.

Performing a Quick Scan

From the Project menu, click Quick Scan or press Shift+F6.
Select the maximum depth you wish to scan at
Select the number of pages per domain to process
Click the Scan button to perform the scan

The quick scan currently processes each unique page only once, ignoring query strings and fragments when comparing addresses. For example, page1.html#fragment, page1.html?valuea=string and page1.html?valueb=number would all appear as page1.html in the results, and only the first occurrence would be crawled.
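The de-duplication described above can be illustrated with a short Python sketch. This is not WebCopy's own code; the function names page_key and unique_pages are hypothetical, and the sketch simply assumes that two addresses count as the same page when they match after the query string and fragment are removed.

```python
from urllib.parse import urlsplit, urlunsplit


def page_key(url: str) -> str:
    """Return the address with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host and path; blank out the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def unique_pages(urls):
    """Keep only the first address seen for each page key (assumed behaviour)."""
    seen = set()
    first_occurrences = []
    for url in urls:
        key = page_key(url)
        if key not in seen:
            seen.add(key)
            first_occurrences.append(url)
    return first_occurrences


if __name__ == "__main__":
    found = [
        "http://example.com/page1.html#fragment",
        "http://example.com/page1.html?valuea=string",
        "http://example.com/page1.html?valueb=number",
    ]
    for url in unique_pages(found):
        print(page_key(url), "<-", url)
```

In this sketch, all three example addresses share one key, so only the first would be queued for crawling.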

Viewing the results

Once the scan is complete, a diagram of all found documents is displayed. Each node is colour coded to show how it will be processed when the website is copied.

Green: The document will be copied
Red: The document will not be copied
Yellow: The document is a non-HTML resource located on a different domain to the website being copied. It will still be copied with the website because the Copy all resources option is set.
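The colour key can be read as a small decision rule. The Python sketch below is a simplified illustration only: the Node class, its fields and the node_colour function are hypothetical, and it ignores crawl modes and project rules, which also influence whether a document is copied.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A scanned document (hypothetical structure for illustration)."""
    url: str
    is_html: bool
    same_domain: bool       # True if on the domain of the website being copied
    excluded: bool = False  # True if the user or a rule excluded it


def node_colour(node: Node, copy_all_resources: bool) -> str:
    """Map a node to the colour described in the key above."""
    if node.excluded:
        return "red"        # will not be copied
    if node.same_domain:
        return "green"      # will be copied
    if not node.is_html and copy_all_resources:
        return "yellow"     # external non-HTML resource, copied anyway
    return "red"            # external document, will not be copied


if __name__ == "__main__":
    logo = Node("http://cdn.example.net/logo.png", is_html=False, same_domain=False)
    print(node_colour(logo, copy_all_resources=True))
    print(node_colour(logo, copy_all_resources=False))
```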

Changing the crawl mode

After the initial scan has completed, additional controls will be displayed allowing you to select how you want the website to be crawled. Changing an option here will automatically update the diagram to indicate how the new mode would affect a copy.

The Everything option is not recommended and should only be used on sites which are self-contained or where rules are used to explicitly exclude addresses. Using this option may cause WebCopy to become unstable.

Including or excluding a domain

By default, WebCopy won't crawl any domain that doesn't match the primary host. Changing the crawl mode settings allows you to expand the crawl to include subdomains or sibling domains, or linked resources no matter where they are located.
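To make the difference between these modes concrete, here is a rough Python sketch of how a host name might be tested against the primary host. The mode names and the in_scope function are hypothetical, not WebCopy settings or API, and the sibling check is deliberately naive: it treats everything after the first label as the parent domain, which a real implementation would need to refine.

```python
def in_scope(host: str, primary: str, mode: str) -> bool:
    """Decide whether host would be crawled under the given (hypothetical) mode."""
    host, primary = host.lower(), primary.lower()
    if mode == "everything":
        return True                      # any host, wherever it is located
    if host == primary:
        return True                      # the primary host is always in scope
    if mode == "site_only":
        return False                     # default: primary host only
    if mode == "subdomains":
        return host.endswith("." + primary)
    if mode == "subdomains_and_siblings":
        labels = primary.split(".")
        # Naive assumption: the parent domain is the primary host minus its first label.
        parent = ".".join(labels[1:]) if len(labels) > 2 else primary
        return host == parent or host.endswith("." + parent)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    primary = "www.example.com"
    for mode in ("site_only", "subdomains", "subdomains_and_siblings", "everything"):
        print(mode, in_scope("cdn.example.com", primary, mode))
```

Here cdn.example.com is a sibling of www.example.com, so in this sketch it is only in scope for the sibling and everything modes.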

WebCopy includes the ability to specify additional domains that will be included in the crawl. You can configure these from the Quick Scan window: right-click the root node on the diagram for the domain you wish to include and click the Exclude option to toggle the status. The domain should now be highlighted in green, indicating it will be copied.

If you change your mind, repeat the process and the domain will be excluded again.

Including or excluding individual pages

To include or exclude any page, right-click the relevant diagram node and click Exclude to toggle the status.

Saving changes

Click OK to update your project with the specified configuration changes and close the Quick Scan window.

Cyotek WebCopy Help