In some cases, you may wish to download only a single page and any secondary or tertiary resources linked to it. You can instruct WebCopy to exclude all URLs that contain HTML content and are not within a specific distance of the address being copied.
This setting applies only to pages with a content type of text/html. It does not apply to any other resource. This allows resources to be downloaded no matter where they are located, but restricts the scanning of HTML pages to the specified distance.
The diagram below shows an example website. Each column represents URLs that are the specified distance from the root URL. If the crawl project was set to use 2 as the limit, then fourth.html would not be downloaded.
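The rule described above can be illustrated with a small sketch. This is not WebCopy's implementation, only a breadth-first walk over a made-up site that applies the same idea: HTML pages beyond the limit are skipped, while other resources are downloaded wherever they are found. The SITE table, the crawl function, and every file name other than fourth.html are hypothetical, and the sketch assumes the root URL counts as distance 0.

```python
from collections import deque

# Hypothetical site: URL -> (content type, links found on that URL).
SITE = {
    "index.html": ("text/html", ["second.html", "style.css"]),
    "second.html": ("text/html", ["third.html"]),
    "third.html": ("text/html", ["fourth.html", "logo.png"]),
    "fourth.html": ("text/html", ["deep.png"]),
    "style.css": ("text/css", []),
    "logo.png": ("image/png", []),
    "deep.png": ("image/png", []),
}


def crawl(root, max_distance):
    """Return the URLs that would be downloaded, in breadth-first order."""
    downloaded = []
    seen = {root}
    queue = deque([(root, 0)])  # assumption: the root URL is at distance 0
    while queue:
        url, distance = queue.popleft()
        content_type, links = SITE[url]
        # The limit applies only to HTML pages; other resources always pass.
        if content_type == "text/html" and distance > max_distance:
            continue  # skipped pages are neither downloaded nor scanned
        downloaded.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, distance + 1))
    return downloaded


print(crawl("index.html", 2))
# ['index.html', 'second.html', 'style.css', 'third.html', 'logo.png']
```

With a limit of 2, fourth.html is skipped because it sits at distance 3, yet logo.png is still downloaded even though it is also three links from the root. In this sketch deep.png is never reached, because the only page linking to it was not scanned.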