While you may not wish to crawl external sites, you can still download any files directly linked from the site you are crawling without changing multiple settings. When the Download all resources option is enabled, WebCopy will automatically download any external file, as long as its reported content type is not text/html. The downloaded file will not be crawled, allowing easy downloading of linked images, sounds and other files.
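The decision WebCopy makes for each linked resource can be sketched roughly as follows. This is an illustrative model only; the function and parameter names are hypothetical and do not reflect WebCopy's internal API.

```python
# Hypothetical sketch of the "Download all resources" behaviour.
# All names here are illustrative, not part of WebCopy itself.
def resolve_resource(content_type, is_external, download_all_resources):
    """Return what happens to a linked resource:
    'crawl'    - downloaded and scanned for further links
    'download' - downloaded but not crawled
    'skip'     - ignored entirely
    """
    if not is_external:
        # Internal content follows the normal crawl settings and rules.
        return "crawl"
    if download_all_resources and content_type != "text/html":
        # External non-HTML files (images, sounds, etc.) are fetched
        # but never scanned for links.
        return "download"
    # External HTML pages are not followed unless other settings allow it.
    return "skip"
```

For example, with the option enabled, an externally hosted image (`image/png`) would be downloaded but not crawled, while an external HTML page would be skipped.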
Enabling the downloading of all resources
- From the Project Properties dialogue, select the General category
- Check the Download all resources option to always download non-HTML content
Note
Rules can still be used to exclude content even when this setting is enabled.
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive