The crawl engine is designed to scan all pages it can access and understand. However, it has no context for a page's purpose: while a human would avoid clicking a Delete button, WebCopy will, if it can.
Most websites are properly written, so the previously mentioned button is an actual BUTTON or INPUT element with a backing FORM. WebCopy will ignore these; it doesn't submit forms it detects. This also applies if the "button" is a hyperlink with JavaScript events bound to it, as WebCopy cannot execute JavaScript. But if the button is a simple A element pointing to delete.asp without a confirmation (or with a confirmation that only exists as JavaScript on that A tag), then following the link could lead to data change or destruction.
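One way to check a private page before crawling it is to list the plain links whose URL suggests a state change. The following is a minimal sketch in Python, not part of WebCopy and not a description of how its crawl engine works. The keyword list and the names RiskyLinkFinder and find_risky_links are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; adjust it to match your own site's URLs.
RISKY_WORDS = ("delete", "remove", "destroy", "logout")


class RiskyLinkFinder(HTMLParser):
    """Collects href values of A elements whose URL hints at a state change."""

    def __init__(self):
        super().__init__()
        self.risky = []

    def handle_starttag(self, tag, attrs):
        # Only plain hyperlinks matter here; FORM, BUTTON and INPUT are skipped
        # because a crawler that never submits forms cannot trigger them.
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # javascript: links need a script engine, so they are skipped as well.
        if href.lower().startswith("javascript:"):
            return
        if any(word in href.lower() for word in RISKY_WORDS):
            self.risky.append(href)


def find_risky_links(html):
    """Return the hrefs in the given HTML that look like destructive links."""
    finder = RiskyLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.risky
```

Any link this reports is worth opening by hand, or excluding from the crawl, before the private area is scanned. A keyword match is only a hint; a link with an innocent-looking URL can still change data.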
For that reason, it is not recommended to allow WebCopy to crawl private areas of websites unless you have verified that it won't do any harm. And if you do find that your website allows data changes via GET or HEAD requests - upgrade your software!
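On the server side, the usual fix is to refuse state-changing actions that arrive via GET or HEAD. The sketch below uses Python's standard http.server module and is illustrative only. The path set, the status_for helper and the Handler class are hypothetical, and a real application would enforce the same rule in its own framework.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical set of paths that change data on this example site.
STATE_CHANGING_PATHS = {"/delete.asp"}
SAFE_METHODS = {"GET", "HEAD"}


def status_for(method, path):
    """Return the HTTP status a request should get before any work is done."""
    route = path.split("?", 1)[0]  # ignore the query string
    if route in STATE_CHANGING_PATHS and method in SAFE_METHODS:
        return 405  # Method Not Allowed
    return 200


class Handler(BaseHTTPRequestHandler):
    """Answers every request with the status chosen by status_for."""

    def _respond(self):
        status = status_for(self.command, self.path)
        self.send_response(status)
        if status == 405:
            # Tell the client which method the action accepts.
            self.send_header("Allow", "POST")
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = do_HEAD = do_POST = _respond
```

With a rule like this in place, a crawler that follows a plain link to delete.asp receives a 405 response and nothing is deleted; only a deliberate POST reaches the action.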
As a final point, question why you want to scan the private area. It is almost certain that any data management pages in the copy will no longer function, and that pages listed in a sitemap will not be accessible, so consider the benefit of making the copy or sitemap in the first place.
As per the license agreement, WebCopy is provided "AS IS" and we are not responsible for how you use this software.