Before you can copy a web site, you need to specify the folder where the downloaded files will be saved. You can also choose to flatten the downloaded site hierarchy.
Specifying the save folder
- From the Project Properties dialog, select the Folder category
- Enter the folder where the downloaded files will be stored in the Save folder field
Creating sub-folders by domain
By default, WebCopy will download all web sites to C:\Downloaded Web Sites. Each new project will also have the Create folder for domain option enabled, which will force WebCopy to store downloaded material in a sub-folder based on the domain being copied. For example, if the site being copied was the WebCopy demonstration site https://demo.cyotek.com, the download location would be C:\Downloaded Web Sites\demo.cyotek.com.
To disable this behaviour, uncheck the Create folder for domain option and ensure the Save folder is set to a unique path.
The default save location for new crawler projects can be configured from the Options dialog.
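The effect of the Create folder for domain option on the download location can be illustrated with a short Python sketch. This is not WebCopy's own code; the function name `resolve_save_folder` and its parameters are hypothetical and only model the behaviour described above.

```python
import ntpath  # Windows-style path handling, works on any platform
from urllib.parse import urlsplit


def resolve_save_folder(save_folder: str, url: str,
                        create_folder_for_domain: bool = True) -> str:
    """Return the folder that downloaded files would be written to.

    Hypothetical helper: models the documented behaviour only.
    """
    if not create_folder_for_domain:
        # Option unchecked: files go directly into the save folder.
        return save_folder
    host = urlsplit(url).hostname
    # Option checked: add a sub-folder named after the domain being copied.
    return ntpath.join(save_folder, host) if host else save_folder


print(resolve_save_folder(r"C:\Downloaded Web Sites", "https://demo.cyotek.com"))
# prints C:\Downloaded Web Sites\demo.cyotek.com
```

With the option disabled, every project that shares a save folder writes into the same location, which is why a unique Save folder is recommended in that case.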
Flattening website folders
WebCopy will try to mirror the folder structure of the website in the local copy. However, in some cases this may not be appropriate. If you set the Flatten website folder option, all downloaded content will be placed in the same folder regardless of its location on the remote site.
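The difference between mirroring and flattening can be sketched in Python as follows. The function `local_path` is hypothetical, as is the `index.html` fallback for URLs without a file name. How WebCopy itself names files, or resolves two remote files with the same name when flattening, is not covered by this sketch.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlsplit


def local_path(save_folder: str, url: str, flatten: bool) -> str:
    """Map a remote URL to a local file path (illustrative only)."""
    remote = PurePosixPath(urlsplit(url).path)
    # Drop the leading "/" so only folder and file names remain.
    parts = [p for p in remote.parts if p != "/"]
    if not parts:
        parts = ["index.html"]  # assumption: placeholder name for the site root
    if flatten:
        # Flattened: keep only the file name, discard remote folders.
        parts = parts[-1:]
    return str(PureWindowsPath(save_folder, *parts))


url = "https://demo.cyotek.com/images/logo.png"
print(local_path(r"C:\Sites", url, flatten=False))  # C:\Sites\images\logo.png
print(local_path(r"C:\Sites", url, flatten=True))   # C:\Sites\logo.png
```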
Emptying the website folder
If you are rescanning an existing website, WebCopy will try to reuse existing filenames. However, if the website URLs change frequently, or if you do not save the link metadata into the project, you can set the Empty website folder before copy option to delete the contents of the save folder prior to copying the web site.
Check the Use Recycle Bin option to have WebCopy move all deleted files into the Recycle Bin. If this option is unchecked, files will be permanently erased.
The WebCopy GUI application will prompt you to continue if the Empty website folder before copy option is set and files are present in the destination folder. The console client will not prompt.
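The prompt-then-delete behaviour described above can be modelled with a minimal Python sketch. The function `empty_folder` and its `confirm` callback are hypothetical: passing a callback stands in for the GUI prompt, and passing `None` stands in for the console client, which does not prompt. The sketch deletes permanently, as when Use Recycle Bin is unchecked. Moving files to the Recycle Bin is not shown, since the Python standard library has no call for it.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def empty_folder(folder: Path,
                 confirm: Optional[Callable[[Path], bool]] = None) -> bool:
    """Permanently delete the contents of `folder`, keeping the folder itself.

    Returns False if the user declined, True otherwise (illustrative only).
    """
    entries = list(folder.iterdir())
    # Only ask when there is something to delete and a prompt is available.
    if entries and confirm is not None and not confirm(folder):
        return False
    for entry in entries:
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a sub-folder and everything in it
        else:
            entry.unlink()         # remove a file or a symbolic link
    return True
```

A caller could pass, for example, `confirm=lambda p: input(f"Delete all files in {p}? [y/N] ").lower() == "y"` to reproduce an interactive prompt.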