Template Actions

A template can perform two kinds of actions:

  • Full page load
  • JavaScript

Full page load actions are used to navigate to a completely new web page, and JavaScript actions are used to trigger JavaScript events and AJAX requests on the current web page.

Detecting the Appropriate Action Automatically

Visual Web Ripper can often detect the most appropriate template action automatically. If you set the template action to AutoDetect, Visual Web Ripper configures the action the first time you open the template in the editor.

Full Page Load Options

Selecting the Full page load option will give you access to the following action options:

Start new web browser

(applicable only to Link and FormSubmit templates)

When Visual Web Ripper runs in WebBrowser mode, a single browser instance will be used to navigate the website. If you are iterating through a long list of links, Visual Web Ripper will click on a link to open a new webpage, extract data from the new webpage, and then move back to the list of links in order to proceed with the next link.

This can be a slow process, because Visual Web Ripper needs to navigate two webpages for each link in the list. First it navigates the link and then it navigates back to the list of links.

If you were navigating the list of links manually, you would probably open each link in a new window or browser tab to avoid navigating back to the list of links each time. Visual Web Ripper can do the same if you select the Start new web browser option.

AJAX before full page load

When Visual Web Ripper navigates to a new webpage, it expects navigation to start immediately. It does not wait indefinitely for navigation to start. Instead, it assumes navigation has failed if it does not begin within a few seconds. Some websites run JavaScript before navigation, for example to show a "Please wait..." message. Such a script may take more than a few seconds to execute, especially if it makes an AJAX call. In such a case, select the AJAX before full page load option so that Visual Web Ripper does not report a page load failure.

Data collector

(applicable only to Link and LinkArea templates)

Visual Web Ripper can extract data in WebBrowser or WebCrawler mode. WebCrawler mode is much faster, but it does not work on all webpages, because it ignores JavaScript and all dynamic content.

Websites often use dynamic content only on some webpages. For example, a website may use JavaScript for page navigation on a search result webpage, but once you click on the detail link for each search result, JavaScript is no longer used. In such cases, you can improve data extraction performance significantly by switching to WebCrawler mode when clicking on the detail links.

See WebCrawler Collector for more information about extracting data in WebCrawler mode.

Link transformation

(applicable only to Link and LinkArea templates)

Link transformation is used to transform a link URL before Visual Web Ripper opens the URL. This is often useful when the link calls a JavaScript function that opens a new webpage. Because the link calls JavaScript, you cannot use WebCrawler mode on it directly. However, you can often determine what the static URL would be and use Link transformation to transform the JavaScript call into a static URL, which allows you to use WebCrawler mode to open the webpage.

For example, a link may call JavaScript that looks like this:

openNewWebPage('/products/category/products.aspx?id=3456')

You can use Link transformation to transform the above JavaScript into the following static URL:

/products/category/products.aspx?id=3456

See Transformation Scripts for more information about transformation scripts.
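As a rough illustration of the idea, the transformation above amounts to pulling the quoted argument out of the JavaScript call. The sketch below is written in Python purely for illustration. It is not Visual Web Ripper's transformation script syntax, and the helper name extract_static_url and the regular expression are assumptions.

```python
import re
from typing import Optional

# Matches a call such as openNewWebPage('/some/path') and captures the quoted argument.
# The function name openNewWebPage comes from the example above; the pattern itself
# is an illustrative assumption.
_CALL_PATTERN = re.compile(r"openNewWebPage\('([^']+)'\)")


def extract_static_url(link_value: str) -> Optional[str]:
    """Return the static URL embedded in the JavaScript call, or None if nothing matches."""
    match = _CALL_PATTERN.search(link_value)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_static_url("openNewWebPage('/products/category/products.aspx?id=3456')"))
```

Returning None for a link that does not match makes it easy to fall back to the untransformed link in that case.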

JavaScript Options

There are three different kinds of JavaScript actions.

  1. Asynchronous. JavaScript executes synchronously by default, but a script can use the setTimeout function to run code after a given timeout, so a JavaScript function or event may return before all related code has executed. Visual Web Ripper cannot automatically wait for asynchronous JavaScript, so you need to configure additional options to help it determine when the JavaScript has finished executing.
  2. Synchronous. Synchronous JavaScript does not return before all JavaScript code has been executed, so Visual Web Ripper can automatically wait for synchronous JavaScript to execute.
  3. AJAX. AJAX is a special form of asynchronous JavaScript. Visual Web Ripper can often hook into AJAX requests and therefore automatically determine when an asynchronous AJAX request has completed. It cannot always hook into AJAX requests, however, and in such cases you should use the Asynchronous action instead.
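The difference between synchronous and asynchronous execution can be pictured with a small Python analogy. This is only a sketch: FakeEventLoop and click_handler are hypothetical names that stand in for the browser's event loop and a JavaScript click handler that uses setTimeout. The point is that the handler returns before the deferred work has run.

```python
from typing import Callable, List


class FakeEventLoop:
    """A tiny stand-in for a browser event loop (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._pending: List[Callable[[], None]] = []

    def set_timeout(self, callback: Callable[[], None]) -> None:
        # Like setTimeout: the callback is queued, not run immediately.
        self._pending.append(callback)

    def run_pending(self) -> None:
        # Run queued callbacks in order, including any queued while running.
        while self._pending:
            self._pending.pop(0)()


def click_handler(loop: FakeEventLoop, log: List[str]) -> None:
    """Simulated event handler that defers part of its work."""
    log.append("handler started")
    loop.set_timeout(lambda: log.append("deferred update"))
    log.append("handler returned")


if __name__ == "__main__":
    loop, log = FakeEventLoop(), []
    click_handler(loop, log)
    print(log)  # the deferred update has not run yet
    loop.run_pending()
    print(log)  # now the deferred update appears last
```

A tool that only waits for the handler to return would read the page before the deferred update, which is why asynchronous actions need an extra signal such as a changed element.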

Selecting the Asynchronous JavaScript option gives you access to the following action options:

Asynchronous JavaScript Options

Wait for element

Visual Web Ripper is unable to detect when asynchronous JavaScript has completed, except by looking for changes on the webpage. Visual Web Ripper will continue waiting for asynchronous JavaScript until the content of the selected Wait Element changes.

The Wait Element can be any appropriate template or content element in your project. Visual Web Ripper will wait for the HTML element that is selected by the Wait Element. If you do not choose a Wait Element, Visual Web Ripper automatically chooses the first content element in the template as the Wait Element.
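The waiting behavior described above can be pictured as a polling loop that compares the current content of the Wait Element with the content captured before the action. This is only a sketch in Python under stated assumptions: Visual Web Ripper's internal mechanism is not documented here, and wait_for_change, read_content, the timeout, and the polling interval are all hypothetical.

```python
import time
from typing import Callable


def wait_for_change(read_content: Callable[[], str],
                    before: str,
                    timeout: float = 10.0,
                    interval: float = 0.1) -> bool:
    """Poll read_content() until it differs from `before` or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if read_content() != before:
            return True  # the Wait Element content has changed
        if time.monotonic() >= deadline:
            return False  # gave up: no change was observed in time
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated Wait Element whose content changes on the third read.
    reads = iter(["old", "old", "new"])
    print(wait_for_change(lambda: next(reads), before="old", interval=0.0))
```

A timeout is included so that an element that never changes cannot stall the loop forever.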

Script wait condition

Visual Web Ripper goes through a default list of steps when waiting for asynchronous JavaScript to complete. The default steps work for many scenarios, but sometimes you need to add a wait script to tell Visual Web Ripper when it should stop waiting for asynchronous JavaScript to complete.

Advanced Options

The following advanced action options are rarely used, but may be needed to extract data from some websites:

Advanced Action Options

Partial page load

(applicable only to Full page load actions)

When a webpage loads, it passes through the following three states:

  1. Loading
  2. Interactive
  3. Completed

When a webpage is Loading, it is not accessible. Visual Web Ripper always has to wait for this state to pass. When a webpage is Interactive, some parts of the webpage have loaded, but other parts, such as images, are still loading. The webpage is accessible in Interactive mode, but Visual Web Ripper may not have access to all the elements on the webpage. When a webpage is Completed, it has completely finished loading all the elements of the webpage (except for dynamic content that is delay-loaded with AJAX).

Sometimes a website may load external content, such as ad images. The external content may load very slowly, which can significantly slow down data extraction performance. You can use the Partial page load option to tell Visual Web Ripper that it can begin processing a webpage when the page enters Interactive mode. Visual Web Ripper will wait a given time interval in Interactive mode before beginning to process the webpage. The time interval is set in the Connection tab for project options.

Sometimes you may want Partial page load to apply to all actions in a project and not just a specific template action. You can set the Partial page load option for the entire project in the Connection tab for project options.
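The decision described above can be summarized as a small function over the three page states. The following Python sketch is an interpretation of the text, not Visual Web Ripper's implementation. The names PageState, can_process, seconds_interactive, and interactive_wait are assumptions, with interactive_wait standing for the time interval set in the Connection tab.

```python
from enum import Enum


class PageState(Enum):
    LOADING = 1
    INTERACTIVE = 2
    COMPLETED = 3


def can_process(state: PageState,
                partial_page_load: bool,
                seconds_interactive: float,
                interactive_wait: float) -> bool:
    """Decide whether processing of the webpage may begin."""
    if state is PageState.COMPLETED:
        return True
    if state is PageState.INTERACTIVE and partial_page_load:
        # With Partial page load, processing starts once the configured
        # interval has been spent in the Interactive state.
        return seconds_interactive >= interactive_wait
    # A Loading page is never accessible, and an Interactive page is not
    # processed unless Partial page load is enabled.
    return False
```

For example, with an interval of 2 seconds, a page that has been Interactive for 3 seconds is processed only if Partial page load is enabled.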

Click to get new URL

(applicable only to Full page load actions)

When you switch from the WebBrowser collector to the WebCrawler collector or use the Start new web browser option, Visual Web Ripper tries to extract the URL from the selected link element and then navigates to this URL in WebCrawler mode or in the new web browser.

Sometimes the selected link element uses JavaScript to navigate to the new webpage. In that case, Visual Web Ripper will be unable to extract a static URL from the link element.

You can use Click to get new URL to tell Visual Web Ripper that it should click on the link to try to extract the URL. Visual Web Ripper will click on the link and begin navigating, but as soon as it knows where the browser is navigating it will stop navigation and extract the URL.

AJAX in page areas

(applicable only to JavaScript actions)

PageArea templates limit element selections to a specific area of a webpage. If a PageArea template has a Link sub-template and this sub-template is opened, the page area no longer has any effect, because Visual Web Ripper has navigated to a new webpage and the page area no longer exists.

If the Link sub-template has an AJAX or JavaScript action, Visual Web Ripper does not navigate to a new webpage, but may instead load dynamic content onto the same webpage. The page area may therefore still exist after an AJAX or JavaScript action, and you can use the AJAX in page areas option to specify that the page area should still be applicable after an AJAX or JavaScript action.

Form submit links

(applicable only to Full page load actions in WebCrawler mode)

When Visual Web Ripper opens a Link template in WebBrowser mode, it clicks on the selected link to navigate to a new webpage. Because Visual Web Ripper clicks on the link, it will still navigate to the new webpage regardless of whether it is a real link or a form button.

In WebCrawler mode, Visual Web Ripper always extracts the URL property of a link and navigates to that URL. If the selected link element is a form button, the Web Crawler is unable to navigate, because the form button doesn't have a URL property. The Form submit links option can be used to tell Visual Web Ripper that the link element is actually a form button and that it should try to submit the form in order to navigate to the new webpage.

Action events

Default template actions always fire a single-click event on the selected element. Sometimes you may need to fire other events to perform the desired action. For example, an AJAX action may need to hover over an HTML element in order to activate a dynamic popup window and extract data from it. In such a case, you can add the onMouseOver event, causing Visual Web Ripper to fire that event instead of the click event.

Visual Web Ripper supports three non-standard events. The domclick event fires the click event and performs the default action on the selected element. The domscroll event scrolls to the bottom right corner of the selected element. The windowscroll event scrolls the window of the selected element to the bottom right corner. 

Redirect on meta refresh

(applicable only to Full page load actions)

Some webpages refresh or redirect after a short time interval. Visual Web Ripper redirects immediately if a meta refresh HTML tag is identified, but you can use this option if you want Visual Web Ripper to stay on the webpage without redirecting.

Visit each page only once

(applicable only to Full page load actions)

A Link list template follows all selected links, even duplicate links. You can use this option to tell Visual Web Ripper that it should not navigate a link that has already been navigated.

Visual Web Ripper looks only at the static URL of a link element. Many websites use sessions or post variables to generate different versions of a webpage with the same URL, so you should use this option carefully. Otherwise, you may end up extracting data from only a single webpage when you expected to extract data from many more webpages.
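The pitfall is easier to see in a small example. The Python sketch below filters a list of links by static URL only, as described above. The function name links_to_visit and the sample URLs are hypothetical.

```python
from typing import Iterable, List, Set


def links_to_visit(urls: Iterable[str]) -> List[str]:
    """Return URLs in their original order, skipping any URL already seen."""
    seen: Set[str] = set()
    result: List[str] = []
    for url in urls:
        if url in seen:
            continue  # already navigated, so skip this link
        seen.add(url)
        result.append(url)
    return result


if __name__ == "__main__":
    # Three links share one static URL (for example, result pages driven by
    # post variables), so only the first of them would be visited.
    links = ["/results.aspx", "/results.aspx", "/results.aspx", "/about.aspx"]
    print(links_to_visit(links))
```

Here only two of the four links are followed, even if each /results.aspx link would have shown different content.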

Click on exact element

If a Link template selects an HTML element that is not a link element, Visual Web Ripper looks for child elements that are link elements. If it finds a link element, it clicks on this element instead of the selected element.

Some websites use JavaScript to attach click events to HTML elements that are not normally link elements. You can use the Click on exact element option to tell Visual Web Ripper that it should click on the selected element and not look for child link elements.

Delay after action

Some websites fail or block you if they detect that you are navigating too quickly within a website. You can use the Delay after action option to set a fixed delay after a template action.

If you want to apply this option to all actions in the entire project, use the Page load delay option in the Connection tab for project options.

Is block popup

(applicable only to Full page load actions)

Sometimes a website may generate a popup window as part of the page load event. Visual Web Ripper navigates automatically to the URL of this popup window. If, for example, the popup window is an advertisement that does not interest you, use this option to block such popup windows.

Restart browser session

(applicable only to Full page load actions)

This option can be used to emulate a web browser restart before the action.

Repeated AJAX calls

(applicable only to AJAX actions)

Use this option if you want to repeat an AJAX action as long as the action results in an AJAX callback. This option can be used in combination with the actions windowscroll or domscroll to process content that is dynamically loaded when a window or a web element is scrolled down.
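In outline, the option behaves like a loop that keeps firing the action until no further AJAX callback occurs. The Python sketch below illustrates that loop under assumptions: fire_action is a hypothetical function that performs the action (for example, a windowscroll) and reports whether an AJAX callback followed, and max_repeats is a safety limit added for this sketch, not a documented setting.

```python
from typing import Callable


def repeat_while_callback(fire_action: Callable[[], bool], max_repeats: int = 100) -> int:
    """Fire the action until it no longer results in an AJAX callback.

    Returns how many times the action produced a callback.
    """
    callbacks = 0
    for _ in range(max_repeats):
        if not fire_action():
            break  # no callback this time, so there is nothing more to load
        callbacks += 1
    return callbacks


if __name__ == "__main__":
    # Simulated page: three scrolls load more content, the fourth does not.
    outcomes = iter([True, True, True, False])
    print(repeat_while_callback(lambda: next(outcomes)))
```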

Wait element change optional

(applicable only to JavaScript actions)

Sometimes Visual Web Ripper should not wait for asynchronous JavaScript to make changes to a Wait element, but should only wait until the Wait element exists on the webpage.

Wait element change optional on first action

(applicable only to JavaScript actions)

Use this option when you want Visual Web Ripper to wait for a Wait element, except when clicking on the first link in a list.

For example, you may be extracting images from an image gallery and have a list of links that use AJAX to load the different images into the gallery. The first image in the gallery may have been pre-loaded when the webpage first loaded, so waiting for the image to change after clicking the first image link would fail.
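The rule can be stated as a one-line decision. The Python sketch below is an interpretation of the option's description. The function name must_wait_for_change and its parameters are hypothetical, and link_index counts the links in the list from zero.

```python
def must_wait_for_change(link_index: int, change_optional_on_first: bool) -> bool:
    """Return True if the wait should require the Wait element's content to change."""
    if change_optional_on_first and link_index == 0:
        # First link: the content may already be loaded, so only the
        # existence of the Wait element is required.
        return False
    return True
```

In the image gallery example, the first link would not require a change, while every later link would.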