The crawl engine is designed to scan all pages it can access and understand; however, it has no context of a page's purpose. So while a human would avoid clicking a Delete button, Sitemap Creator will, if it can.
Most websites are properly written, so the button mentioned above is an actual BUTTON or INPUT element with a backing FORM. Sitemap Creator will ignore these; it doesn't submit any forms it detects. This also applies if the "button" is a hyperlink with JavaScript events bound to it, as Sitemap Creator cannot execute JavaScript. But if the button is a simple A element pointing to delete.asp without a confirmation (or with a confirmation that exists only as JavaScript on that A tag), then following the link could lead to data change or destruction.
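To make the difference concrete, here is a minimal HTML sketch; delete.asp comes from the example above, while the id parameter and the confirmation text are invented for illustration.

    <!-- Unsafe pattern: a plain link that changes data via a GET request.
         Sitemap Creator follows ordinary links, so crawling this page could
         delete the record; the onclick confirmation does not help because
         the crawler never executes JavaScript. -->
    <a href="delete.asp?id=42" onclick="return confirm('Delete this item?');">Delete</a>

    <!-- Safer pattern: a real BUTTON submitting a FORM via POST.
         Sitemap Creator does not submit forms, so this is never triggered. -->
    <form method="post" action="delete.asp">
      <input type="hidden" name="id" value="42">
      <button type="submit">Delete</button>
    </form>

The second pattern also keeps data changes out of GET and HEAD requests, which is the behaviour the next paragraph expects of well-written software.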
For that reason it is not recommended to allow Sitemap Creator to crawl private areas of websites unless you have verified that it won't do any harm. And if you do find that your website allows data changes via GET or HEAD requests, upgrade your software!
As a final point, question why you want to scan the private area at all: it is next to certain that any data management pages will no longer function in the copy, or be accessible from the sitemap, so consider the benefit of making the copy or sitemap in the first place.
Important
As per the license agreement, Sitemap Creator is provided "AS IS" and we are not responsible for how you use this software.
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
Modifying URLs
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Modifying page titles
- Origin reports
- Overwriting read only files
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive