Many web servers can compress data prior to sending it, using a variety of different compression methods. When a client makes a request to a server, it includes the Accept-Encoding header, which informs the server which encoding methods the client supports. WebCopy supports the following compression methods:
- Compress (Legacy)
- Deflate
- GZip
- Brotli (Non-standard)
- BZip2 (Non-standard)
- Identity (no compression, not directly selectable)
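For example, if the Deflate, GZip and Brotli options are enabled, WebCopy would send a request along these lines (an illustrative request; the exact token order may differ):

```http
GET /index.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip, deflate, br
```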
Note
Disabling compression will cause the Identity value to be sent for the Accept-Encoding header, informing the web server not to compress content before serving it.
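With compression disabled, the request would therefore carry a header similar to the following:

```http
Accept-Encoding: identity
```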
Important
Encoding options generally only apply to static content, such as HTML, CSS and JavaScript. Other files such as PDF or ZIP are already compressed and normally won't be recompressed by a server.
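For example, a server will typically compress an HTML page but serve a PDF without any Content-Encoding (illustrative response headers; actual behaviour depends on the server's configuration):

```http
HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip

HTTP/1.1 200 OK
Content-Type: application/pdf
```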
To enable or disable compression
- From the Project Properties dialogue, select the HTTP Compression option group.
- Check or uncheck the types of compression methods you wish to support.
Tip
It is recommended to always ensure that at least the Deflate and GZip options are enabled, as this helps make downloads of HTML and other static content smaller and faster.
Important
Some servers may not support all available compression options. If you receive 406 (Not Acceptable) errors when trying to copy a website, specifying an unsupported encoding could be the cause.
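Such a failure might look like the following exchange (illustrative; the exact status text and response body vary by server):

```http
GET /index.html HTTP/1.1
Host: example.com
Accept-Encoding: bzip2

HTTP/1.1 406 Not Acceptable
```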
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive