The crawl engine is designed to scan all pages it can access and understand. However, it is important to note that it has no context of a page's purpose. So while a human would avoid clicking a Delete button, WebCopy will follow it if it can.
Most websites are properly written, so the previously mentioned button is an actual INPUT element with a backing form rather than a link. However, if a site exposes such an action as a plain A tag pointing to a URL that performs it, then following the link could lead to data change or destruction.
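To see why the markup matters, consider what a link-following crawler extracts from a page. The Python sketch below uses the standard library's html.parser to collect the href of every A element while ignoring FORM actions. It is a simplified model under that assumption, not how WebCopy itself parses pages; the LinkCollector class, the sample page and the example.com URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical model of a link-following crawler.

    It records the href of every A element (URLs the crawler would request)
    and, separately, FORM actions (URLs it would never request, because
    submitting a form is an explicit user action, not a link to follow).
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []         # would be requested with GET
        self.form_actions = []  # would NOT be requested

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "form" and attributes.get("action"):
            self.form_actions.append(urljoin(self.base_url, attributes["action"]))


# Two "Delete" controls that look the same to a visitor.
PAGE = """
<form method="post" action="/items/1/delete">
  <input type="submit" value="Delete">
</form>
<a href="/items/2/delete">Delete</a>
<a href="/items/3">View</a>
"""

collector = LinkCollector("https://example.com/admin/")
collector.feed(PAGE)
collector.close()
print(collector.links)
# ['https://example.com/items/2/delete', 'https://example.com/items/3']
print(collector.form_actions)
# ['https://example.com/items/1/delete']
```

Both Delete controls render the same way, but only the link-based one ends up in the list of URLs a crawler would request.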
For that reason, it is not recommended to allow WebCopy to crawl private areas of websites unless you have verified that it won't do any harm. And if you do find that your website allows data to be changed via GET or HEAD requests, upgrade your software!
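Under HTTP semantics, GET and HEAD are "safe" methods that should never change data. The Python sketch below shows that principle on the server side: a hypothetical handle_request() function for an illustrative in-memory item store refuses to delete anything unless the request is a POST, so a crawler following a delete link receives 405 Method Not Allowed instead of destroying the item. The routes, the ITEMS store and the function name are assumptions for illustration, not part of WebCopy or of any particular web framework.

```python
# Hypothetical sketch: routes, ITEMS and handle_request() are illustrative only.
SAFE_METHODS = {"GET", "HEAD"}

ITEMS = {"1": "first item", "2": "second item"}


def handle_request(method, path, items):
    """Return an HTTP status code for a tiny item store.

    /items/<id>          -> read-only view, allowed for GET and HEAD
    /items/<id>/delete   -> changes data, so it only accepts POST
    """
    parts = path.strip("/").split("/")

    if len(parts) == 3 and parts[0] == "items" and parts[2] == "delete":
        if method != "POST":
            # Method Not Allowed: a link followed with GET (or HEAD)
            # cannot trigger the delete. A real server would also send
            # an Allow: POST header.
            return 405
        if items.pop(parts[1], None) is None:
            return 404  # nothing to delete
        return 303  # See Other: redirect back after a successful change

    if len(parts) == 2 and parts[0] == "items" and method in SAFE_METHODS:
        return 200 if parts[1] in items else 404

    return 404


store = dict(ITEMS)  # work on a copy so ITEMS stays untouched
print(handle_request("GET", "/items/2/delete", store))   # 405
print("2" in store)                                      # True
print(handle_request("POST", "/items/2/delete", store))  # 303
print("2" in store)                                      # False
```

Returning 303 after the POST follows the common post/redirect/get pattern; the essential part is only that the state change is unreachable through GET or HEAD.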
As a final point, question why you want to scan the private area at all. Any data management pages in the copy will almost certainly no longer function, and the pages in a site map will most likely not be accessible either, so consider whether the copy or site map is worth making in the first place.
As per the license agreement, WebCopy is provided "AS IS" and we are not responsible for how you use this software.
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
Creating a site map
- Aborting the crawl using HTTP status codes
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive