Important
This functionality is currently under review and may be removed in a future version of WebCopy. If you currently use this feature, we would be grateful if you could email [email protected] and explain your use case for the feature.
If the site to be copied has links to domains that you wish to automatically convert to the domain being crawled, you can use the Domain Aliases function.
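As a rough illustration of the idea only (not WebCopy's actual implementation), the sketch below rewrites the host of any URL whose host matches an alias expression so that it points at the domain being crawled. The alias pattern `cdn\.example\.com` and the target host `example.com` are invented values for this example.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: regular expression -> host of the site being crawled.
# Both the pattern and the target host are made up for this sketch.
DOMAIN_ALIASES = {
    r"cdn\.example\.com": "example.com",
}

def apply_domain_aliases(url: str) -> str:
    """Return the URL with an aliased host replaced by the crawled domain."""
    parts = urlsplit(url)
    for pattern, target_host in DOMAIN_ALIASES.items():
        if re.fullmatch(pattern, parts.hostname or ""):
            return urlunsplit((parts.scheme, target_host, parts.path,
                               parts.query, parts.fragment))
    return url

print(apply_domain_aliases("https://cdn.example.com/styles/site.css"))
# -> https://example.com/styles/site.css
```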
To customise domain aliases
- From the Project Properties dialogue, select the Domain Aliases category
Adding a new alias
- Click the Add button
- In the Expression field, enter the name of the domain that will be aliased. This field supports regular expressions
Important
If your expression includes any of the characters ^ [ . $ { * ( \ + ) | ? < > and you want them to be processed as plain text, you need to "escape" the character by preceding it with a backslash. For example, if your expression was application/epub+zip, it would need to be written as application/epub\+zip, otherwise the + character would have a special meaning and no matches would be made. Similarly, if the expression was example.com, it should be written as example\.com, as . means "any character", which could lead to unexpected matches.
Deleting an alias
- Select one or more of the aliases you wish to remove from the list
- Click the Delete button
Updating an alias
- Select the alias you wish to edit from the list. The Expression field will be updated to match the selection
- Enter a new value for the alias pattern.
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Saving link data in a Crawler Project
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive