Some sites may contain a mix of HTTP and HTTPS URLs for the same host. For example, the domain www.example.com may have links to both http://www.example.com and https://www.example.com. If configured correctly, HTTP requests should automatically redirect to the more secure HTTPS protocol, but if they don't, or if you want to copy a site without the interim redirects, WebCopy can help.
Note
When WebCopy forces a URL from HTTP to HTTPS, it marks the original URL as a redirect, just as it would if the website itself redirected HTTP to HTTPS. However, it never makes any requests to the original URL.
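Conceptually, forcing HTTPS is a URL normalization step applied before any request is made. The Python sketch below is illustrative only and is not WebCopy's actual code; it simply models the behaviour described above, where an HTTP URL is rewritten to HTTPS, the original URL is recorded as a redirect source, and the original is never fetched.

```python
from urllib.parse import urlsplit, urlunsplit

def force_https(url):
    """Return (url_to_request, original_url_recorded_as_redirect_or_None)."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Rewrite the scheme; keep host, path, query and fragment unchanged.
        secure = urlunsplit(("https",) + parts[1:])
        return secure, url  # request the HTTPS URL; note the HTTP URL as a redirect
    return url, None        # already HTTPS (or another scheme); leave it alone

print(force_https("http://www.example.com/about"))
# -> ('https://www.example.com/about', 'http://www.example.com/about')
```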
To automatically force all URLs to be HTTPS
- From the Project Properties dialogue, select the URL Normalization category
- Select the Always option in the Force HTTPS group
Important
If a site doesn't support HTTPS, forcing HTTPS will cause copy jobs to fail
To automatically force URLs matching one or more hosts to be HTTPS
- From the Project Properties dialogue, select the URL Normalization category
- Select the Custom option in the Force HTTPS group
- Enter the host names that you wish to force to be HTTPS; a sketch of this host matching follows below
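With the Custom option, the rewrite applies only to URLs whose host matches one of the names you entered. The following Python sketch is illustrative only, not WebCopy's implementation, and uses a hypothetical FORCED_HOSTS set as a stand-in for the configured list.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for the host names entered in the Force HTTPS group.
FORCED_HOSTS = {"www.example.com"}

def normalize(url):
    parts = urlsplit(url)
    # Only rewrite HTTP URLs whose host is in the configured list.
    if parts.scheme == "http" and parts.hostname in FORCED_HOSTS:
        return urlunsplit(("https",) + parts[1:])
    return url

print(normalize("http://www.example.com/index.html"))  # rewritten to https://
print(normalize("http://cdn.example.net/logo.png"))    # left unchanged
```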
To disable forced HTTPS
- From the Project Properties dialogue, select the URL Normalization category
- Select the Never option in the Force HTTPS group
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive