Automated crawlers such as WebCopy can scan websites substantially faster than a human using a browser. This can overload small servers, or prompt server administrators to put automated blocks in place for clients that pull too much data at once. WebCopy includes basic limit settings that the discerning user can enable to comply with the rules of remote hosts.
Disabling limits
To disable all limits and let the crawler run at maximum speed:
- From the Project Properties dialogue, select the Speed Limits category
- Select the Do not use limits option
Limiting to specific requests per second
To allow only a maximum number of URLs to be processed per second:
- From the Project Properties dialogue, select the Speed Limits category
- Select the Limit to requests per second option
- Enter the maximum number of requests WebCopy is allowed to perform in the Maximum requests per second field
Limiting to specific requests per minute
To allow only a maximum number of URLs to be processed per minute:
- From the Project Properties dialogue, select the Speed Limits category
- Select the Limit to requests per minute option
- Enter the maximum number of requests WebCopy is allowed to perform in the Maximum requests per minute field
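WebCopy's internal implementation of these limits is not documented here, but both options behave like an interval-based throttle: a cap of N requests per period spaces requests at least period/N seconds apart. The following is a minimal sketch of that behaviour in Python (the `RateLimiter` class and its parameters are illustrative, not part of WebCopy):

```python
import time


class RateLimiter:
    """Illustrative interval-based throttle: permits at most
    max_requests per period seconds by spacing requests evenly."""

    def __init__(self, max_requests: int, period: float = 1.0):
        # Minimum gap, in seconds, between consecutive requests.
        self.interval = period / max_requests
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed to start."""
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
            now = time.monotonic()
        self.next_allowed = now + self.interval


# A "Maximum requests per second" value of 5:
per_second = RateLimiter(5, period=1.0)   # one request every 0.2 s

# A "Maximum requests per minute" value of 30:
per_minute = RateLimiter(30, period=60.0)  # one request every 2 s
```

A crawler loop would call `wait()` before each download; the per-minute option simply uses a longer period, which is useful when a host tolerates short bursts but objects to sustained load.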
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive