As a simpler alternative to creating rules, you can give WebCopy a list of content types that you want to download; it will then scan the website, downloading resources with the allowed types and ignoring everything else.
Including all content types
To reset WebCopy to the default behaviour and include all resources regardless of type
- From the Project Properties dialogue, select the Content Types category
- In the Content Types group, select Include all
Including only the selected content types
Important
This functionality does not work correctly in WebCopy 1.8 and lower if text/html or text/css is excluded. Please update to version 1.9 or higher.
To automatically download only a given set of content types and ignore all others
- From the Project Properties dialogue, select the Content Types category
- In the Content Types group, select Include only resources with the content types listed below
- In the Types to include field, enter each content type you wish to include, one per line
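For example, to copy only web pages, style sheets and common image formats, the Types to include field might contain entries such as the following (the exact list depends on the site being copied)

    text/html
    text/css
    image/png
    image/jpeg
    image/gif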
Tip
Click Select Types to display a dialogue box for selecting content types either from those detected in the site to be copied, or from a global database
Including everything except selected content types
To automatically download all documents except those matching specific content types
- From the Project Properties dialogue, select the Content Types category
- In the Content Types group, select Include all resources except for the content types listed below
- In the Types to exclude field, enter each content type you wish to exclude, one per line
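For example, to download everything except video and archive files, the Types to exclude field might contain entries such as

    video/mp4
    audio/mpeg
    application/zip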
Tip
Click Select Types to display a dialogue box for selecting content types either from those detected in the site to be copied, or from a global database
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive