HTTP defines a number of status codes used to redirect one request to another, for example when the requested URL has been moved to another location, or when the URL can't be exactly matched but similar locations are available. By default, WebCopy follows internal redirects only.
To configure how redirects are processed
- From the Project Properties dialogue, select the Redirects category
- Select a mode from the Redirects group
- Optionally, enter a value into the Maximum redirect chain length field. This value is used to break chains where one redirect points to another without ever resolving to a concrete URL.
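To illustrate why a maximum chain length is useful, the following is a minimal sketch of chain resolution and not WebCopy's actual implementation. The function name `resolve_chain`, the `redirects` dictionary standing in for server responses, and the choice to return `None` when the limit is reached are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple


def resolve_chain(
    start: str,
    redirects: Dict[str, str],
    max_chain_length: int,
) -> Tuple[Optional[str], List[str]]:
    """Follow redirects from `start` until a URL that does not redirect is found.

    `redirects` is a hypothetical stand-in for the server: it maps a URL to
    the URL it redirects to. Returns (final_url, visited); final_url is None
    when the chain is abandoned because it exceeded `max_chain_length` hops.
    """
    visited = [start]
    current = start
    hops = 0
    while current in redirects:
        if hops >= max_chain_length:
            # Limit reached without resolving to a concrete URL: give up.
            return None, visited
        current = redirects[current]
        hops += 1
        visited.append(current)
    return current, visited


# A short chain that resolves, and a loop that never does.
chain = {"/a": "/b", "/b": "/c"}
loop = {"/x": "/y", "/y": "/x"}

print(resolve_chain("/a", chain, 5))
print(resolve_chain("/x", loop, 5))
```

In this sketch the first call resolves to `/c` after two hops, while the second is abandoned once five hops have been followed, which is the situation the chain length setting is intended to guard against.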
Redirect modes
Option | Notes |
---|---|
Don't follow redirects | Specifies that no redirects should be followed |
Follow internal redirects | Follows redirects that point to the site currently being crawled |
Follow all redirects | Follows any redirect, regardless of destination. Note that crawl modes and rules should be used to control whether any actions are taken on the external URL |
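The three modes can be thought of as a simple decision made for each redirect target. The sketch below is illustrative only: the names `RedirectMode` and `should_follow` are hypothetical, and "internal" is approximated here as "same host name as the URL being crawled", which may differ from how WebCopy actually decides whether a URL belongs to the current site.

```python
from enum import Enum
from urllib.parse import urlsplit


class RedirectMode(Enum):
    """Hypothetical names mirroring the three options in the table above."""
    NONE = "Don't follow redirects"
    INTERNAL = "Follow internal redirects"
    ALL = "Follow all redirects"


def should_follow(mode: RedirectMode, base_url: str, target_url: str) -> bool:
    """Decide whether a redirect to `target_url` should be followed."""
    if mode is RedirectMode.NONE:
        return False
    if mode is RedirectMode.ALL:
        return True
    # Assumption: a redirect is "internal" when both URLs share a host name.
    return urlsplit(base_url).hostname == urlsplit(target_url).hostname


base = "https://example.com/"
print(should_follow(RedirectMode.INTERNAL, base, "https://example.com/new"))
print(should_follow(RedirectMode.INTERNAL, base, "https://other.example.org/"))
print(should_follow(RedirectMode.ALL, base, "https://other.example.org/"))
```

Following a redirect only determines whether the target URL is requested; as noted in the table, crawl modes and rules still control what is done with an external URL.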
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive