WebCopy attempts to create local filenames based on the remote URL, appending a numeric suffix if a file with that name already exists.
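The following is a minimal sketch of how suffix-based collision handling makes the resulting names depend on processing order. The helpers `base_name` and `assign_names` are hypothetical. The sketch assumes the base name comes from the URL path alone, ignoring the query string, and that every page is saved as `.html`. WebCopy's actual naming rules may differ.

```python
from urllib.parse import urlsplit


def base_name(url: str) -> str:
    """Hypothetical: derive a base filename from the URL path, ignoring the query string."""
    path = urlsplit(url).path.strip("/") or "index"
    return path.replace("/", "_")


def assign_names(urls: list[str]) -> dict[str, str]:
    """Assign local filenames in processing order, appending -1, -2, ... on collision."""
    used: set[str] = set()
    names: dict[str, str] = {}
    for url in urls:
        stem = base_name(url)
        candidate = f"{stem}.html"
        counter = 1
        # Keep incrementing the numeric suffix until the name is free.
        while candidate in used:
            candidate = f"{stem}-{counter}.html"
            counter += 1
        used.add(candidate)
        names[url] = candidate
    return names


# The same two URLs, processed in a different order, e.g. by different threads.
run_a = assign_names(["/products", "/products?onlyinstock=1"])
run_b = assign_names(["/products?onlyinstock=1", "/products"])
print(run_a["/products"], run_b["/products"])
```

In `run_a`, `/products` is saved as `products.html`. In `run_b`, the same URL becomes `products-1.html` because another URL claimed the unsuffixed name first.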
Because multi-threaded crawls process URLs in an unpredictable order, the same URL can receive a different numeric suffix from one run to the next. This makes crawls difficult to compare, breaks automated workflows that depend on consistent paths, and prevents reliable incremental updates. To address this, WebCopy can embed a CRC32 checksum of the remote URL in the filename, so the same URL always produces the same local filename regardless of processing order.
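The checksum approach can be sketched with Python's standard `zlib.crc32`. The function `deterministic_name` is hypothetical. This document does not specify which string WebCopy hashes (for example, the full absolute URL or only the path and query), so the checksums this sketch produces will not necessarily match the example filenames that follow.

```python
import zlib
from urllib.parse import urlsplit


def deterministic_name(url: str) -> str:
    """Hypothetical sketch: embed a CRC32 checksum of the URL in the local filename."""
    # Base name from the URL path, ignoring the query string (an assumption).
    stem = urlsplit(url).path.strip("/").replace("/", "_") or "index"
    # Mask to an unsigned 32-bit value, then format as 8 lowercase hex digits.
    checksum = zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF
    return f"{stem}-{checksum:08x}.html"


print(deterministic_name("/products"))
print(deterministic_name("/products?onlyinstock=1"))
```

The name depends only on the URL itself, not on which other URLs were processed first. As a result, two runs that visit pages in different orders still produce identical filenames.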
Example filenames:

| Remote URL | Non-deterministic | Deterministic |
|---|---|---|
| /products | products.html | products-6dd9dcef.html |
| /products?onlyinstock=1 | products-1.html | products-9169ec94.html |
Filenames can still be non-deterministic if the full path is longer than 260 characters, because WebCopy then truncates it automatically.
This feature is enabled automatically for new WebCopy projects. It remains disabled for projects saved with older versions of WebCopy, but it can be switched on manually.