In some cases, you may wish to download only a single page and any secondary or tertiary resources linked to it. You can instruct WebCopy to exclude all URLs that contain HTML content and that lie beyond a specified distance from the address being copied.

Configuring a scan distance

  1. From the Project Properties dialogue, select the General category
  2. Check the Limit distance from root URL option
  3. Enter the maximum level that WebCopy will scan

Important

This setting only applies to pages with a content type of text/html. It does not apply to any other resource. This allows resources to be downloaded no matter where they are located, while restricting the scanning of HTML pages to the configured distance.

Example

The diagram below shows an example website. Each column represents the URLs that are the specified distance from the root URL. If the crawl project were set to use 2 as the limit, fourth.html would not be downloaded.
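
The effect of a distance limit can also be sketched in code. The following Python snippet is a minimal illustration only and is not WebCopy's implementation: the SITE table, its page names, and the crawl function are all hypothetical, and it assumes the root URL counts as distance 0. It walks the links breadth-first, skips any text/html page found beyond the limit, and still downloads other resources regardless of where they are found.

```python
from collections import deque

# Hypothetical site used only for illustration (not taken from WebCopy).
# Each URL maps to its content type and the URLs it links to.
SITE = {
    "index.html": ("text/html", ["first.html", "style.css"]),
    "first.html": ("text/html", ["second.html", "logo.png"]),
    "second.html": ("text/html", ["third.html", "photo.jpg"]),
    "third.html": ("text/html", []),
    "style.css": ("text/css", []),
    "logo.png": ("image/png", []),
    "photo.jpg": ("image/jpeg", []),
}


def crawl(site, root, max_distance):
    """Return the URLs that would be downloaded, in breadth-first order.

    Assumption: the root URL is at distance 0 and each followed link
    adds 1 to the distance.
    """
    downloaded = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        url, distance = queue.popleft()
        content_type, links = site[url]
        # Only HTML pages are subject to the limit; other resources
        # are downloaded no matter how far from the root they are.
        if content_type == "text/html" and distance > max_distance:
            continue
        downloaded.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, distance + 1))
    return downloaded


result = crawl(SITE, "index.html", 2)
print(result)
```

With a limit of 2 in this sketch, third.html (distance 3) is skipped, while photo.jpg, which is also found at distance 3, is still downloaded because it is not an HTML page.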


© 2010-2024 Cyotek Ltd. All Rights Reserved.
Documentation version 1.9 (buildref #182.15707), last modified 2024-03-15. Generated 2024-03-15 22:36 using Cyotek HelpWrite Professional version 6.19.1