Extracting Content from a Webpage
To extract content from a webpage, add content elements to a template in your data extraction project. You can add a content element to a template by using any of these methods:
- Right-click an HTML element in the web browser and select Add Content from the context menu.
- Left-click an HTML element in the web browser, and then click the New button in the Elements window.
- Hold down Shift while you left-click an HTML element in the web browser.
After you add a content element of type Element, you can choose its Capture Type in the options window. Visual Web Ripper can extract any property of the selected HTML element, such as its text, its HTML, or the value of one of its attributes.
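To make the difference between capture types concrete, here is a minimal Python sketch that performs the three kinds of capture on a small HTML fragment using the standard library's `xml.etree.ElementTree`. It is not Visual Web Ripper's API: the `capture` function, its `"text"`, `"html"` and `"attribute"` values, and the sample fragment are hypothetical. The sketch assumes the fragment is well-formed XML (which this parser requires) and that an HTML capture returns the element's outer HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment standing in for the element
# selected in the web browser.
SNIPPET = '<a href="/products/42" class="title">Widget <b>Pro</b></a>'


def capture(element: ET.Element, capture_type: str, attribute: str = ""):
    """Return one property of the element (hypothetical capture types)."""
    if capture_type == "text":
        # Concatenate all text nodes, including text inside child elements.
        return "".join(element.itertext())
    if capture_type == "html":
        # Serialize the element itself plus everything inside it (outer HTML).
        return ET.tostring(element, encoding="unicode")
    if capture_type == "attribute":
        # Read a single attribute value; returns None if it is absent.
        return element.get(attribute)
    raise ValueError(f"unknown capture type: {capture_type}")


element = ET.fromstring(SNIPPET)
print(capture(element, "text"))               # Widget Pro
print(capture(element, "html"))               # <a href="/products/42" class="title">Widget <b>Pro</b></a>
print(capture(element, "attribute", "href"))  # /products/42
```

The same selected element yields three different data values depending only on which property is captured.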
Fine-Tuning a Content Selection
The same content may appear in different locations on webpages of the same type. For example, search results may span several pages, and a content selection that extracts content correctly from the first page may fail on the following pages. In that case, fine-tune the content selection manually so that it works on every page of the search results.
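The following Python sketch illustrates why a selection can work on one page and fail on the next. It models a content selection as an XPath-style path evaluated with the limited XPath subset in `xml.etree.ElementTree`; the two pages, both paths, and the `select` helper are hypothetical. A position-based path recorded on the first page stops matching once a later page adds a navigation block above the results, while a path anchored to an attribute that every result shares works on both pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical first page of search results.
PAGE_1 = """<html><body>
<div class="result"><h3>Alpha</h3></div>
<div class="result"><h3>Beta</h3></div>
</body></html>"""

# Hypothetical second page: a "Previous" navigation block now comes first,
# shifting the position of every result.
PAGE_2 = """<html><body>
<div class="nav"><a href="?page=1">Previous</a></div>
<div class="result"><h3>Gamma</h3></div>
<div class="result"><h3>Delta</h3></div>
</body></html>"""

# Position-based selection recorded on page 1: "the h3 in the first div".
POSITIONAL = "./body/div[1]/h3"
# Fine-tuned selection: every h3 inside a div whose class is "result".
BY_CLASS = ".//div[@class='result']/h3"


def select(page: str, path: str) -> list[str]:
    """Return the text of every element the path matches on the page."""
    root = ET.fromstring(page)
    return [el.text for el in root.findall(path)]


for name, page in (("page 1", PAGE_1), ("page 2", PAGE_2)):
    print(name, select(page, POSITIONAL), select(page, BY_CLASS))
```

On page 2 the positional path finds nothing, because the first `div` is now the navigation block. Anchoring a selection to stable attributes rather than element positions is one common way to make it robust across pages.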
A content transformation script is often used together with a content element to modify the extracted data, for example to extract a smaller piece of it. Suppose a single HTML element contains a full address: the content element extracts the whole address into one data field, and a content transformation script can then extract just the state or the zip code from that address.
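As an illustration of the transformation logic only, here is a minimal Python sketch, assuming US-style addresses that end in a comma, a two-letter state code, and a five-digit or ZIP+4 zip code. The `extract_state_and_zip` function and its regular expression are hypothetical and are not Visual Web Ripper's scripting interface; real address data usually needs a more forgiving pattern.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: ", ST 12345" or ", ST 12345-6789" at the end of the text.
STATE_ZIP = re.compile(r",\s*([A-Z]{2})\s+(\d{5}(?:-\d{4})?)\s*$")


def extract_state_and_zip(full_address: str) -> Tuple[Optional[str], Optional[str]]:
    """Split the state and zip code out of a full address string."""
    match = STATE_ZIP.search(full_address)
    if match is None:
        # The address does not follow the assumed format.
        return None, None
    return match.group(1), match.group(2)


print(extract_state_and_zip("123 Main St, Springfield, IL 62701"))  # ('IL', '62701')
```

Returning `(None, None)` rather than raising an error lets a scraping run continue when an address does not follow the expected format.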