Downloading a website can take considerable time, and sometimes you may wish to quickly check the structure of a site in order to fine-tune which documents to copy and which to exclude. WebCopy's Quick Scan functionality can help with this.
Quick scan functionality is deliberately restricted to limit the ability to scan an entire website - use the Scan or Download options for unrestricted scanning.
The quick scan currently processes each page only once. For example, page1.html#fragment, page1.html?valuea=string, and page1.html?valueb=number would all appear as page1.html in the results, and only the first occurrence would be crawled.
The Maximum depth and Maximum pages per host options in the Quick scan settings group apply only to the quick scan, and have no effect on full scans or downloads. In addition, these settings are not saved with the project.
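To illustrate how limits like these might bound a crawl, here is a hedged Python sketch; the `quick_scan` function, its callbacks, and the limit values are illustrative assumptions, not WebCopy's implementation:

```python
# Hypothetical sketch of a depth- and host-limited scan; link extraction
# is stubbed out via callbacks and the limit values are examples only.
from collections import deque

MAX_DEPTH = 3             # analogous to the Maximum depth setting
MAX_PAGES_PER_HOST = 100  # analogous to the Maximum pages per host setting

def quick_scan(start_url, get_links, get_host):
    seen, pages_per_host = set(), {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        host = get_host(url)
        if url in seen or depth > MAX_DEPTH:
            continue  # already processed, or deeper than the depth limit
        if pages_per_host.get(host, 0) >= MAX_PAGES_PER_HOST:
            continue  # host quota reached; skip further pages on this host
        seen.add(url)
        pages_per_host[host] = pages_per_host.get(host, 0) + 1
        for link in get_links(url):
            queue.append((link, depth + 1))
    return seen

# Toy usage with a stubbed link graph
links = {"a.com/": ["a.com/1", "a.com/2"], "a.com/1": ["a.com/"], "a.com/2": []}
found = quick_scan("a.com/", get_links=lambda u: links.get(u, []),
                   get_host=lambda u: u.split("/", 1)[0])
print(sorted(found))  # ['a.com/', 'a.com/1', 'a.com/2']
```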
Once the scan is complete, a diagram of all found documents is displayed. Each node is colour-coded to show how it will be processed when the website is copied.
| Colour Key | Description |
| ---------- | ----------- |
| Green      | The document will be copied |
| Yellow     | The document is a non-HTML resource located on a different domain to the website being copied; however, it will be copied with the website because the Copy all resources option is set |
| Red        | The document will not be copied |
After the initial scan has completed, additional controls are displayed allowing you to select how you want the website to be crawled. Changing an option here automatically updates the diagram to indicate how the new mode would affect a copy.
Although you can configure the Limit crawl depth setting, it is temporarily overridden by the Maximum depth field in the Quick scan settings group.
The Everything option is not recommended and should only be used on sites which are self-contained, or where rules are used to explicitly exclude addresses. Use of this option may cause WebCopy to become unstable.
By default, WebCopy won't crawl any domain that doesn't match the primary host. Changing the crawl mode settings allows you to expand the crawl to include subdomains or sibling domains, or linked resources no matter where they are located.
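The following Python sketch illustrates, under assumed mode names, how such host matching might be decided; it is a simplification and not WebCopy's actual logic:

```python
# Hypothetical sketch of crawl-mode host matching; mode names loosely
# mirror the UI options and the parent-domain split is deliberately naive.
def host_allowed(primary: str, candidate: str, mode: str) -> bool:
    if mode == "everything":
        return True                      # follow links anywhere
    if candidate == primary:
        return True                      # always follow the primary host
    if mode == "subdomains":
        return candidate.endswith("." + primary)  # e.g. blog.example.com
    if mode == "siblings":
        # Share a parent domain, e.g. a.example.com and b.example.com;
        # naive split for illustration only.
        parent = primary.split(".", 1)[-1]
        return candidate.endswith("." + parent)
    return False                         # default: primary host only

print(host_allowed("example.com", "blog.example.com", "subdomains"))    # True
print(host_allowed("www.example.com", "shop.example.com", "siblings"))  # True
print(host_allowed("example.com", "other.net", "primary"))              # False
```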
WebCopy includes the ability to specify additional domains to include in the crawl. You can easily configure these from the Quick Scan window: right-click the root node on the diagram for the domain you wish to include and click the Exclude option to toggle its status. The domain should now be highlighted in green, indicating it will be copied.
If you change your mind, repeat the process and the domain will be excluded.
To include or exclude any page, right-click the relevant diagram node and click Exclude to toggle its status.
Click OK to update your project with the specified configuration changes and close the Quick Scan window.