Sometimes you may wish to adjust a URL that WebCopy has detected when crawling a page, before the URL is further processed. For example, you might want to convert simple JavaScript navigation links into a URL, or remove a middle-man redirection page.
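As a rough illustration of the two cases above, the following sketch uses Python's re module to mimic what a transform does: a search pattern is matched against the detected URL and the match is replaced. The openPage function name, the redirect page address and both patterns are hypothetical examples, and Python is used purely for demonstration. The replacement syntax WebCopy expects (for example how captured groups are referenced) may differ from Python's \1.

```python
import re

# Hypothetical case 1: a simple JavaScript navigation link.
js_expression = r"^javascript:openPage\('([^']+)'\)$"
js_link = "javascript:openPage('/products/list.html')"
js_result = re.sub(js_expression, r"\1", js_link)
print(js_result)

# Hypothetical case 2: a middle-man redirection page that carries
# the real destination in a query string parameter.
redirect_expression = r"^https://example\.com/redirect\?url=(.+)$"
redirect_link = "https://example.com/redirect?url=https://example.org/page"
redirect_result = re.sub(redirect_expression, r"\1", redirect_link)
print(redirect_result)
```

In both cases the part of the URL captured by the parentheses is kept and everything else is discarded.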

To configure the URL transforms

  • From the Project Properties dialogue, expand the Advanced category and select the URL Transforms option

Adding a new transform

  1. Click the Add button
  2. In the Expression field, enter the search pattern using a regular expression
  3. Enter the text to replace the matched pattern with in the Replacement field
  4. Optionally, if you want the transform to run only for specific pages, enter the URL of those pages into the URL Expression field. This field supports regular expressions.
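The way the three fields fit together can be sketched as below. This is a minimal model, not WebCopy's actual implementation: the apply_transform function and its parameter names are hypothetical, and it assumes that an empty URL Expression means "all pages" and that the URL Expression is tested against the address of the page on which the URL was found.

```python
import re
from typing import Optional


def apply_transform(url: str, page_url: str, expression: str,
                    replacement: str,
                    url_expression: Optional[str] = None) -> str:
    """Return url rewritten by the transform, or unchanged if out of scope."""
    # Assumption: no URL Expression means the transform applies everywhere.
    if url_expression and not re.search(url_expression, page_url):
        return url
    # Replace whatever the Expression matches with the Replacement text.
    return re.sub(expression, replacement, url)


# The transform only runs for pages whose address contains /blog/.
in_scope = apply_transform("http://example.com/a",
                           "http://example.com/blog/post",
                           r"^http://", "https://", r"/blog/")
out_of_scope = apply_transform("http://example.com/a",
                               "http://example.com/shop/item",
                               r"^http://", "https://", r"/blog/")
print(in_scope)
print(out_of_scope)
```

The first call rewrites the URL because the page matches the URL Expression; the second leaves it untouched.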

If your expression includes any of the ^, [, ., $, {, *, (, \, +, ), |, ?, <, > characters and you want them to be processed as plain text, you need to "escape" each character by preceding it with a backslash. For example, the expression application/epub+zip needs to be written as application/epub\+zip; otherwise the + character has a special meaning (one or more of the preceding character) and the literal text will not be matched. Similarly, example.com should be written as example\.com, because an unescaped . means "any character", which can lead to unexpected matches.
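The effect of escaping can be checked with a short experiment. The sketch below uses Python's re module only as a stand-in for a regular expression engine; the metacharacters shown here behave the same way in most engines, but WebCopy's own engine is not being exercised. The variable names are illustrative.

```python
import re

text = "application/epub+zip"

# Unescaped: "b+" means "one or more b", so the literal "+" is never matched.
unescaped_match = re.search(r"application/epub+zip", text)

# Escaped: "\+" matches a literal plus sign.
escaped_match = re.search(r"application/epub\+zip", text)

print(unescaped_match is None)
print(escaped_match is not None)

# An unescaped dot matches any character, including ones you did not intend.
loose_match = re.search(r"example.com", "examplexcom")
strict_match = re.search(r"example\.com", "examplexcom")

print(loose_match is not None)
print(strict_match is None)
```

All four lines print True: the unescaped patterns either miss the intended text or match unintended text, while the escaped versions behave as plain text.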

Deleting a transform

  1. Select the transform you wish to remove from the list
  2. Click the Delete button

Updating a transform

  1. Select the transform you wish to edit from the list. The Expression, Replacement and URL Expression fields will be updated to match the selection
  2. Enter new values as appropriate. The selected item in the list will be updated with the changes you specify

Changing the order in which transforms are processed

  1. Select the transform you wish to move
  2. Click the Move Up or Move Down button to re-order the transform list
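Order can matter when one transform changes text that another transform looks for. The sketch below assumes that transforms are applied from the top of the list to the bottom and that each one receives the output of the previous one; this is an assumption made for illustration, and the apply_all helper and both transforms are hypothetical.

```python
import re


def apply_all(url, transforms):
    """Apply each (expression, replacement) pair in list order."""
    for expression, replacement in transforms:
        url = re.sub(expression, replacement, url)
    return url


to_https = (r"^http://", "https://")
strip_www = (r"^https://www\.", "https://")

url = "http://www.example.com/"

# to_https runs first, so strip_www sees an https:// address and matches.
first_order = apply_all(url, [to_https, strip_www])

# strip_www runs first, finds no https:// prefix, and does nothing.
second_order = apply_all(url, [strip_www, to_https])

print(first_order)
print(second_order)
```

Under these assumptions the first ordering yields https://example.com/ while the second yields https://www.example.com/, so a transform that depends on another should be placed below it in the list.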

See Also

Configuring the Crawler

© 2010-2024 Cyotek Ltd. All Rights Reserved.
Documentation version 1.10 (buildref #186.15944), last modified 2024-08-18. Generated 2024-08-18 08:01 using Cyotek HelpWrite Professional version 6.20.0