Cyotek provides a demonstration website, demo.cyotek.com, that can be used to test many WebCopy features. A corresponding project, demo.cwp, ships with WebCopy; it is designed to work with this site and demonstrate that functionality.

This topic describes the customisations made to the project as a guide to how you may wish to use similar functionality when creating your own copy jobs.

Forms

The website has a faux protected area that you can log into using a form. The Capture tool was used to create a definition for automatically logging into the website before copying starts.

The presence of features/authenticationprofile.html in the downloaded website indicates the login was successful.
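As a rough illustration of what a form login definition amounts to, the sketch below builds (but does not send) the kind of POST request a crawler would submit before copying starts. It is not how WebCopy itself is implemented, and the URL and the field names username and password are hypothetical rather than taken from the demonstration site.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(action_url, fields):
    """Build a form-encoded POST request without sending it."""
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical form action and field names, for illustration only.
login = build_login_request(
    "https://example.com/login",
    {"username": "guest", "password": "secret"},
)
print(login.get_method(), login.data)
```

A real login form may also carry hidden fields, which is one reason to capture the form rather than define it by hand.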

Passwords

Form-based authentication, as described above, is now the more common approach, but some websites still issue 401 challenges. The project has two passwords defined: a global one, which is used for any challenge by the site, and another that is tied to a specific page.

If you copy the website and open statuscodes/4xx/401basic.html you will see that it references guest2, whilst statuscodes/4xx/401digest.html references guest3, showing that different credentials were used for each page.
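The idea of a global password with a page-specific override can be sketched as a simple lookup. This is a minimal illustration, not WebCopy's own logic. It assumes that guest2 is the global credential and guest3 is the page-specific one (the topic does not say which is which), it uses the copied file names as stand-ins for the real page URLs, and the password values are placeholders.

```python
def pick_credential(page, page_credentials, global_credential):
    """Return the credential tied to this page, else the global one."""
    return page_credentials.get(page, global_credential)


# Assumed mapping and placeholder passwords, for illustration only.
GLOBAL_CREDENTIAL = ("guest2", "<password>")
PAGE_CREDENTIALS = {
    "statuscodes/4xx/401digest.html": ("guest3", "<password>"),
}

for page in ("statuscodes/4xx/401basic.html", "statuscodes/4xx/401digest.html"):
    user, _ = pick_credential(page, PAGE_CREDENTIALS, GLOBAL_CREDENTIAL)
    print(page, "->", user)
```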

Rules

Rules can be used to exclude parts of a website. In this demo, a rule has been created to exclude the /features/downloadtest.php URL, as the files it creates aren't actually valid.

A second rule excludes /features/authenticationlogout.php, as it is somewhat pointless logging into a website only to log right back out again!
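The effect of the two exclusion rules can be approximated with a pair of regular expressions. The sketch below is an assumption about how such patterns might be written and evaluated; it is not the project's actual rule definitions or WebCopy's rule engine.

```python
import re

# Illustrative patterns for the two excluded URLs described above.
EXCLUDE_PATTERNS = [
    re.compile(r"^/features/downloadtest\.php"),
    re.compile(r"^/features/authenticationlogout\.php"),
]


def is_excluded(path):
    """Return True if any exclusion pattern matches the URL path."""
    return any(pattern.search(path) for pattern in EXCLUDE_PATTERNS)


for path in (
    "/features/downloadtest.php",
    "/features/authenticationlogout.php",
    "/features/authenticationprofile.html",
):
    print(path, "excluded" if is_excluded(path) else "copied")
```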

Download All Resources

The Download All Resources option allows non-HTML resources to be downloaded from any location, unless explicitly disabled by rules. The option is enabled by default for new projects, but the demonstration project disables it, which stops the crawl from reaching third-party sites.
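One way to picture the option is as a decision made for each discovered URL. The function below is a simplified sketch under stated assumptions: anything on the site's own host is always eligible, off-site HTML is never followed, and rules are ignored. The host cdn.example.com is hypothetical.

```python
from urllib.parse import urlsplit


def should_download(url, is_html, site_host, download_all_resources):
    """Decide whether a URL is eligible for download (rules not considered)."""
    if urlsplit(url).hostname == site_host:
        return True
    # Off-site: only non-HTML resources, and only when the option is on.
    return download_all_resources and not is_html


SITE = "demo.cyotek.com"
script = "https://cdn.example.com/lib.js"  # hypothetical third-party resource

print(should_download(script, False, SITE, download_all_resources=True))
print(should_download(script, False, SITE, download_all_resources=False))
```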

Additional URLs

On occasion there may not be a direct link to a resource you want to copy. Several pages on the demonstration site are only accessible via JavaScript, so that a default crawl does not display a bunch of errors (it is a testing website after all!). The project therefore includes a pair of additional URLs to hit the 401 challenge pages mentioned above.

The free text field also demonstrates the use of comments and whitespace to break up the URL list.
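A list like this is straightforward to process: blank lines and comment lines are skipped, and everything else is treated as a URL. The sketch below assumes that comments start with #, which may not match WebCopy's actual syntax, and the URLs in the sample are hypothetical.

```python
def parse_url_list(text, comment_prefix="#"):
    """Return the URLs in a free-text list, skipping blanks and comments."""
    urls = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith(comment_prefix):
            continue
        urls.append(entry)
    return urls


# Hypothetical list content, for illustration only.
SAMPLE = """
# Pages that are only reachable via JavaScript
https://example.com/first-page

# Another group
https://example.com/second-page
"""

print(parse_url_list(SAMPLE))
```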

Custom Attributes

Some websites use data attributes to contain links to other resources, typically images. The project defines one custom attribute, data-original, which is used to pick up a pair of images in the website that otherwise wouldn't be detected.

The files assets/img/background3.png and assets/img/background4.png will be present in the copied website if this attribute is defined.
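To show why the extra attribute matters, the sketch below scans HTML for link-bearing attributes using Python's standard html.parser module. It is only an illustration of the idea: the sample markup and the placeholder.png file name are hypothetical, and WebCopy's own link detection is not shown here.

```python
from html.parser import HTMLParser


class AttributeLinkParser(HTMLParser):
    """Collect the values of the named attributes from every tag."""

    def __init__(self, attribute_names):
        super().__init__()
        self.attribute_names = set(attribute_names)
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.attribute_names and value:
                self.links.append(value)


def find_links(html, attribute_names):
    parser = AttributeLinkParser(attribute_names)
    parser.feed(html)
    return parser.links


# Hypothetical markup: the real image is only named in data-original.
HTML = '<img src="assets/img/placeholder.png" data-original="assets/img/background3.png">'

print(find_links(HTML, {"src"}))
print(find_links(HTML, {"src", "data-original"}))
```

Without data-original in the attribute set, only the placeholder image is found.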

Custom Headers

Request headers can be used to direct the server how to respond, for example to use a different language or enable specific compression options. Custom headers are also supported and could be used for various purposes, for example sending an API key to access a resource. The project defines a single custom header, X-Transport-Version, to send with every request.

If you copy the website and open features/requestheaders.html you will see the custom header and its value referenced on that page.
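For comparison, attaching a custom header to a request in code looks something like the sketch below, which uses Python's urllib and builds the request without sending it. The header value 1.0 and the https scheme are assumptions; the value configured in the project is not stated in this topic.

```python
from urllib.request import Request


def build_request(url, custom_headers):
    """Create a request carrying the given custom headers (not sent)."""
    request = Request(url)
    for name, value in custom_headers.items():
        request.add_header(name, value)
    return request


# The header value is a placeholder, for illustration only.
request = build_request(
    "https://demo.cyotek.com/",
    {"X-Transport-Version": "1.0"},
)

# urllib stores header names with only the first letter capitalised;
# HTTP header names are case-insensitive, so this does not affect the server.
print(request.header_items())
```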