HTTP supports a number of "verbs" for issuing commands to a remote server. The most common is GET, which requests a representation of the specified resource. A less common one is HEAD, which is similar to GET but returns only the metadata about the resource, not the resource itself. If remote hosts support it, this can reduce load, because a resource that would subsequently go unused is never transferred in full.
WebCopy uses the metadata (such as content type) to determine whether a resource should be fully downloaded or skipped, and so attempts to use HEAD requests by default. The HTTP specification states that if a server does not support HEAD, it should return status 405 (Method Not Allowed), but some servers return a misleading code such as 404 (Not Found) or 401 (Unauthorized).
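
To illustrate the idea of deciding from metadata alone, here is a minimal Python sketch. It is not WebCopy's implementation; the function names `fetch_content_type` and `should_download`, and the default list of wanted types, are assumptions made for this example.

```python
import urllib.request


def fetch_content_type(url, timeout=10.0):
    """Send a HEAD request and return the raw Content-Type header.

    Only the response headers are transferred; no body is downloaded.
    Returns an empty string when the server sends no Content-Type.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")


def should_download(content_type, wanted_types=("text/html",)):
    """Decide whether a resource is worth a full GET.

    The header may carry parameters ("text/html; charset=utf-8"),
    so only the media type before the first ';' is compared.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in wanted_types
```

A crawler built this way would call `fetch_content_type(url)` first and issue a GET only when `should_download(...)` returns true. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a server that rejects HEAD surfaces as an exception rather than a return value.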
When a new host is encountered during a crawl and HEAD checking is enabled, WebCopy tests the host by attempting to request the root document via HEAD. If this succeeds, HEAD requests are enabled for that host. If not, WebCopy automatically disables HEAD requests for that host.
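
The per-host test described above can be sketched as follows. This is an illustration only, not WebCopy's code: the class name `HeadSupportCache`, the `head_status` helper, and the choice to treat any 2xx status as "successful" are all assumptions.

```python
import urllib.request
from urllib.parse import urlsplit


def head_status(url, timeout=10.0):
    """Return the status code of a HEAD request (raises on 4xx/5xx)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


class HeadSupportCache:
    """Remember, per host, whether HEAD requests should be used."""

    def __init__(self, probe=head_status):
        self._probe = probe   # callable(url) -> status code
        self._hosts = {}      # host -> bool

    def use_head(self, url):
        parts = urlsplit(url)
        host = parts.netloc.lower()
        if host not in self._hosts:
            # First time this host is seen: probe its root document.
            root = f"{parts.scheme}://{parts.netloc}/"
            try:
                status = self._probe(root)
                # Assumption: only 2xx counts as a successful probe.
                self._hosts[host] = 200 <= status < 300
            except OSError:
                # HTTPError and URLError are OSError subclasses, so
                # 405/404/401 and network failures all disable HEAD.
                self._hosts[host] = False
        return self._hosts[host]
```

Each host is probed at most once; later URLs on the same host reuse the cached answer, which mirrors the "test once when a new host is encountered" behaviour described in the text.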
Unfortunately, some web servers support HEAD only in piecemeal fashion. In one support request it was discovered that HEAD worked fine for requests that returned HTML, but requests for images returned a 404.
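
If you suspect a server behaves this way, one way to check is to compare the HEAD and GET status codes for a handful of URLs of different types. The sketch below is a standalone diagnostic, not a WebCopy feature; `request_status` and `find_head_mismatches` are hypothetical names chosen for this example.

```python
import urllib.error
import urllib.request


def request_status(method, url, timeout=10.0):
    """Return the HTTP status code for the given method and URL."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions; report their code.
        return error.code


def find_head_mismatches(urls, status_of=request_status):
    """List (url, head_status, get_status) where the two disagree."""
    mismatches = []
    for url in urls:
        head = status_of("HEAD", url)
        get = status_of("GET", url)
        if head != get:
            mismatches.append((url, head, get))
    return mismatches
```

A non-empty result for image URLs but not for HTML pages would match the piecemeal behaviour described above, and suggests that HEAD checking is not reliable for that site.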
WebCopy allows you to disable HEAD checking at the project level. If this option is set, all automatic detection is disabled and all requests to retrieve resources will use