HTTP persistent connection, also called HTTP keep-alive, or HTTP connection reuse, is the idea of using a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair. This can improve crawl performance but can sometimes cause requests to fail with a protocol violation error.
To disable keep-alive
- From the Project Properties dialogue, select the Advanced category
- Uncheck the Keep alive option
Note
It is not recommended this option unless you are having specific problems with a given website.
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types