WebCopy requires two user provided pieces of information before a website can be crawled, the first being the primary address to copy and the second the location where to store downloaded materials.
The crawl process can be configured in many ways, from the use of technical settings for controlling the HTTP protocol to using rules for control what content is download and what is ignored. The following topics detail these options.
Important
Web crawling is not an exact science and while the default crawl settings should work for many websites, some customisation and knowledge of how the website to be copied is structured and built may be required
To display the project properties dialog
- From the Project menu, click Project Properties
.
Tip
If you don't know exactly where a setting lies, press Ctrl+E and enter a setting name such as "cookie" or "zip" and the dialogue will be filtered to only show categories containing the search phrase.
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive