You can configure your project to automatically apply one or more cookies before crawling commences. This can be used to provide authentication cookies when the built-in authentication features of WebCopy are not sufficient.
Cookies stored in a WebCopy project must conform to the Set-Cookie header syntax. The minimum required is <cookie-name>=<cookie-value>.
<cookie-value> should be URL encoded if appropriate; WebCopy does not perform any automatic encoding.
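Because WebCopy does not encode cookie values for you, values containing reserved characters can be percent-encoded before being entered into the Data field. A minimal sketch using Python's standard library (the cookie name and value below are illustrative only):

```python
from urllib.parse import quote

# Hypothetical raw cookie value containing characters that need encoding.
raw_value = "token with spaces;and=symbols"

# Percent-encode the value; safe="" ensures "/" is encoded too.
encoded = quote(raw_value, safe="")

# The string to paste into WebCopy's Data field.
cookie_data = f"session={encoded}"
print(cookie_data)  # session=token%20with%20spaces%3Band%3Dsymbols
```
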
If you use Forms, Passwords or Cookies to authenticate with a website, you should consider adding a custom rule to exclude any logout pages. Otherwise, if WebCopy discovers such a page it will eventually access it, logging out your session and potentially affecting the remainder of the crawl.
To customise cookies
- From the Project Properties dialogue, expand the Advanced category and select the Cookies sub-category
Adding a cookie
- Click the Add button
- Enter the cookie data into the Data field
Deleting a cookie
- Select one or more cookies that you wish to remove
- Click the Delete button
Updating a cookie
- Select the cookie to edit from the list. The Data field will be updated to contain the value of the cookie
- Enter the new value for the cookie data
Reading cookies from an external file
To read cookies from an external file, enter the file name in the Read Cookies From field, or click the Browse button to select a file.
Only cookies in the Netscape cookie file format are supported.
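For reference, a Netscape-format cookie file is a plain text file with one cookie per line; the seven tab-separated fields are the domain, a subdomain flag, the path, a secure flag, the expiry as a Unix timestamp, the cookie name and the cookie value. An illustrative file (the domain and values are examples only) might look like:

```
# Netscape HTTP Cookie File
.example.com	TRUE	/	FALSE	2147483647	session	abc123
```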
Discarding session cookies
To discard any session cookies from the external file, ensure the Discard session cookies option is checked. When set, any cookies without an expiry date will be skipped.
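The effect of this option can be sketched with Python's standard library, which also reads Netscape-format cookie files (the file name and cookie data below are illustrative only):

```python
from http.cookiejar import MozillaCookieJar

# Create a minimal Netscape-format cookie file for the demonstration.
with open("cookies.txt", "w") as f:
    f.write("# Netscape HTTP Cookie File\n")
    f.write(".example.com\tTRUE\t/\tFALSE\t2147483647\tsid\tabc123\n")

jar = MozillaCookieJar("cookies.txt")
jar.load()

# A session cookie has no expiry date; discarding session cookies
# means keeping only those that do have one, as below.
persistent = [c for c in jar if c.expires is not None]
print(len(persistent))  # prints 1: the single persistent cookie above
```
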
Writing cookies to an external file
Once a copy operation has completed, WebCopy can optionally write all cookies into a file using the Netscape cookie format. To write cookies to a file, enter the file name in the Write Cookies To field, or click the Browse button to select a file.
Cookies are only written when performing a copy, not when performing a read-only scan.