When you analyse a website using WebCopy, a map is created of all incoming and outgoing links for each scanned URL, together with other relevant attributes such as the response code and content type.
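As a conceptual illustration only (this is not WebCopy's internal or on-disk format, and the class and field names are invented for the example), a link map can be thought of as a dictionary keyed by URL, where each entry records the attributes described above:

```python
from dataclasses import dataclass, field

@dataclass
class LinkMapEntry:
    """One scanned URL plus the attributes a crawler typically records for it."""
    url: str
    response_code: int                                  # e.g. 200, 301, 404
    content_type: str                                   # e.g. "text/html"
    incoming: set[str] = field(default_factory=set)     # URLs that link to this one
    outgoing: set[str] = field(default_factory=set)     # URLs this page links to

# The link map itself is then simply a lookup keyed by URL
link_map: dict[str, LinkMapEntry] = {}

entry = LinkMapEntry("https://example.com/", 200, "text/html")
entry.outgoing.add("https://example.com/about.html")
link_map[entry.url] = entry
```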
By default, link map information is saved into your project file; however, if your web site has many links, you may wish to disable this. Also by default, the link map is automatically preserved each time the source web site is scanned. This can also be changed on a per-project basis.
The link map can be viewed from within WebCopy.
Much of WebCopy's advanced functionality requires a link map to be present. If you disable the saving of link map data, some features may be impaired.
Important
If link information is not preserved in a crawler project, or is cleared prior to starting a crawl, features such as only downloading new or updated files will not be available.
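The sketch below illustrates, in general terms, why this matters: a crawler can only skip unchanged files if it still has validators (such as ETag or Last-Modified values) saved from a previous crawl to send as conditional request headers. This is a minimal sketch of the general technique, not WebCopy's implementation; the function name and parameters are invented for the example.

```python
import urllib.error
import urllib.request
from typing import Optional

def is_updated(url: str,
               saved_etag: Optional[str],
               saved_last_modified: Optional[str]) -> bool:
    """Return True if the server reports newer content than the saved link data.

    Sends a conditional request using validators recorded during a previous
    crawl; a 304 Not Modified response means the file can be skipped.
    """
    request = urllib.request.Request(url)
    if saved_etag:
        request.add_header("If-None-Match", saved_etag)
    if saved_last_modified:
        request.add_header("If-Modified-Since", saved_last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status != 304
    except urllib.error.HTTPError as error:
        # urllib raises for non-2xx codes, including 304
        return error.code != 304
```

Without the saved validators, every file would have to be downloaded again in full.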
To toggle the saving of the link map in a project file
- From the Project Properties dialogue, expand the Advanced category and select the Link Map category
- Check or uncheck the Save link information in project field
- Optionally, check the Include headers option to save HTTP request and response headers
Note
Saving headers may cause project files to be much larger, and the performance of open and save operations may be affected. Required information, such as content type and content size, is always stored regardless of whether full header data is stored.
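As a rough illustration of this trade-off (not WebCopy's actual behaviour; the names and the required set below are invented for the example), a crawler might always retain a small required subset of headers and only keep the full set when header saving is enabled:

```python
# Hypothetical required subset that is always kept
REQUIRED_HEADERS = {"Content-Type", "Content-Length"}

def headers_to_store(response_headers: dict[str, str],
                     include_headers: bool) -> dict[str, str]:
    """Decide which response headers to persist with a link map entry."""
    if include_headers:
        return dict(response_headers)               # keep everything (larger project file)
    return {name: value for name, value in response_headers.items()
            if name in REQUIRED_HEADERS}            # keep only the essentials
```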
To toggle the clearing of link maps before analysing a web site
- From the Project Properties dialogue, expand the Advanced category and select the Link Map category
- Check or uncheck the Clear link information before scan field
See Also
Configuring the Crawler
Working with local files
- Extracting inline data
- Remapping extensions
- Remapping local files
- Updating local time stamps
- Using query string parameters in local filenames
Controlling the crawl
- Content types
- Crawling multiple URLs
- Crawling outside the base URL
- Downloading all resources
- Including additional domains
- Including sub and sibling domains
- Limiting downloads by file count
- Limiting downloads by size
- Limiting scans by depth
- Limiting scans by distance
- Scanning data attributes
- Setting speed limits
- Working with Rules
JavaScript
Security
- Crawling private areas
- Manually logging into a website
- TLS/SSL certificate options
- Working with Forms
- Working with Passwords
Modifying URLs
Creating a site map
Advanced
- Aborting the crawl using HTTP status codes
- Cookies
- Defining custom headers
- HEAD vs GET for preliminary requests
- HTTP Compression
- Origin reports
- Redirects
- Setting the web page language
- Specifying a User Agent
- Specifying accepted content types
- Using Keep-Alive