Performing a scan allows you to detect the structure of a website without spending time downloading resources such as images or archives. It also allows you to test that you have configured your crawl correctly, e.g. to exclude areas you don't want to copy, customise cookies or headers, or set any of the myriad other options.
Although this is a scan operation, it still needs to download HTML content in order to crawl it and detect other pages and resources on the target site.
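To illustrate the idea, here is a minimal conceptual sketch in Python of how a scan can map a site's structure: HTML pages are downloaded so their links can be followed, while other resources are only recorded, not downloaded. This is not the application's implementation; the `scan` function, `LinkExtractor` class, and page limit are illustrative, and it assumes the third-party `requests` package is available.

```python
# Conceptual sketch only: shows how a "scan" can discover a site's structure
# by downloading and parsing HTML, while merely recording (not downloading)
# other resources such as images or archives.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

import requests


class LinkExtractor(HTMLParser):
    """Collects href/src attribute values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def scan(base_url, max_pages=100):
    """Crawl pages under base_url; return (pages, resources, errors)."""
    queue = deque([base_url])
    seen = {base_url}
    pages, resources, errors = [], {}, {}

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            # Stream so headers can be inspected before reading the body.
            response = requests.get(url, timeout=10, stream=True)
            response.raise_for_status()
        except requests.RequestException as exc:
            errors[url] = str(exc)
            continue

        content_type = response.headers.get("Content-Type", "").split(";")[0].strip()
        if content_type != "text/html":
            # Non-HTML resource: record its content type and skip the body.
            resources[url] = content_type
            response.close()
            continue

        # HTML must actually be downloaded so its links can be extracted.
        pages.append(url)
        parser = LinkExtractor()
        parser.feed(response.text)

        for link in parser.links:
            absolute, _ = urldefrag(urljoin(url, link))
            if absolute.startswith(base_url) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

    return pages, resources, errors
```

A real crawler would refine this in many ways covered by the topics below, for example by issuing HEAD rather than GET for preliminary requests, honouring content-type and depth limits, or applying rules to exclude areas of the site.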
Before performing a full scan, you may wish to make use of the quick scan to detect any obvious areas of a website to exclude.
Scanning a website
- From the Project menu, click Scan Website or press F6.
It is currently not possible to pause and resume a scan.
After the scan has completed, you can use the information panels at the bottom of the window to explore the structure of the site, view errors, or review information about detected files and pages.
Scanning and Copying
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Creating a site map
- Aborting the crawl using HTTP status codes
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive