Crawling private areas

The crawl engine is designed to scan every page it can access and understand. However, it has no understanding of a page's purpose. While a human would avoid clicking a Delete button, WebCopy will follow it if it can.

On most well-written websites, such a button is an actual BUTTON or INPUT element with a backing FORM. WebCopy ignores these, because it does not submit the forms it detects. The same applies if the "button" is a hyperlink with JavaScript events bound to it, as WebCopy cannot execute JavaScript. But if the button is a simple A element pointing to delete.asp without a confirmation (or with a confirmation that exists only as JavaScript on that A tag), then following the link could change or destroy data.
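One way to check a page for this kind of link before crawling is sketched below. It is only an illustration and does not reflect how WebCopy itself works: the keyword list, the class name RiskyLinkFinder and the function find_risky_links are all assumptions made for the example, and a keyword match is only a hint that a link deserves a closer look.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical keyword list; adjust it to match your own application's URLs.
RISKY_WORDS = ("delete", "remove", "destroy", "drop", "purge")


class RiskyLinkFinder(HTMLParser):
    """Collects plain <a href> targets whose URL hints at a destructive action."""

    def __init__(self):
        super().__init__()
        self.risky = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # form buttons and inputs are not plain links, so skip them
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("javascript:"):
            return  # script-only links need a JavaScript engine to do anything
        parsed = urlparse(href)
        target = (parsed.path + "?" + parsed.query).lower()
        if any(word in target for word in RISKY_WORDS):
            self.risky.append(href)


def find_risky_links(html):
    """Return the href values of anchors that look like destructive GET links."""
    finder = RiskyLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.risky
```

Running find_risky_links over the HTML of a private page returns the links worth inspecting by hand. An empty result does not prove the page is safe, since a destructive URL can have any name.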

For that reason, allowing WebCopy to crawl private areas of a website is not recommended unless you have verified that it won't do any harm. And if you find that your website allows data changes via GET or HEAD requests, upgrade your software!
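On the server side, the usual fix is to refuse GET and HEAD requests for any URL that changes data. The following is a minimal sketch using Python's standard http.server module. The path /delete.asp, the function status_for and the class GuardedHandler are hypothetical names chosen for the example, not part of WebCopy or of any particular web application.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical set of paths that change or destroy data.
UNSAFE_PATHS = {"/delete.asp"}
# Methods that a crawler following links will use.
SAFE_METHODS = {"GET", "HEAD"}


def status_for(method, path):
    """Return the HTTP status a request should receive."""
    route = path.split("?", 1)[0]  # ignore the query string
    if route in UNSAFE_PATHS and method.upper() in SAFE_METHODS:
        return 405  # Method Not Allowed: following a link cannot change data
    return 200


class GuardedHandler(BaseHTTPRequestHandler):
    """Answers every request with the status chosen by status_for."""

    def _respond(self):
        status = status_for(self.command, self.path)
        self.send_response(status)
        if status == 405:
            self.send_header("Allow", "POST")
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Route all three methods through the same check.
    do_GET = do_HEAD = do_POST = _respond
```

With a guard like this in place, a crawler that follows a link to /delete.asp receives a 405 response and nothing is deleted. A real application would carry out the deletion only when handling the POST request.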

As a final point, question why you want to scan the private area. It is almost certain that data management pages will no longer function in the copy, and that pages listed in a sitemap will not be accessible, so consider the benefit of making the copy or sitemap in the first place.

Important

As per the license agreement, WebCopy is provided "AS IS" and we are not responsible for how you use this software.

Cyotek WebCopy Help

See Also

Configuring the Crawler

Working with local files

Controlling the crawl

JavaScript

Security

Modifying URLs

Creating a site map

Advanced

Deprecated features