
Highlighted features

Project Summary


Extracting data from a product catalogue

requested: 2/4/2010 version: 2.30.11

Demonstrates optional templates and "Continuing link area" page navigation templates.

Target URL: www.airgas.com

Download demo project and sample data extract Airgas.zip

Request

Starting at the airgas.com site, the goal is to extract the image, part number, description, price, and unit of measure (UOM) for each product offered. Ideally, one project would navigate through all the main categories, such as Gases, Safety Products, and Janitorial, and through all of the sub-categories under them, to return every product offered on the site.

An Excel spreadsheet would be preferred for the output.

 

Thank you.  

   

Solution

This project demonstrates standard data extraction from a product catalogue.

The "Optional" template option is used for the category templates because the number of category levels varies. With this option set, Visual Web Ripper skips a category level if it doesn't exist.
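The idea of an optional category level can be sketched in a few lines of Python. This is only a conceptual illustration, not Visual Web Ripper's own implementation or API. The page structure (`categories` and `products` keys), the function name `extract_products`, the sub-category names, and the part numbers are all made up for the example; only the three main category names come from the request.

```python
# Hypothetical in-memory stand-in for the catalogue pages.
# A page either lists sub-category pages or lists products directly.
ROOT = {
    "categories": [
        {"name": "Gases", "categories": [
            {"name": "Industrial", "categories": [
                {"name": "Argon", "products": ["AR-300"]},
            ]},
        ]},
        {"name": "Safety Products", "categories": [
            {"name": "Gloves", "products": ["GL-10", "GL-20"]},
        ]},
        {"name": "Janitorial", "products": ["MOP-1"]},
    ]
}


def extract_products(root: dict, max_levels: int = 3) -> list[str]:
    """Apply up to `max_levels` category levels, each one optional."""
    pages = [root]
    for _ in range(max_levels):
        next_pages = []
        for page in pages:
            children = page.get("categories")
            # Optional level: a page with no sub-categories is passed
            # through unchanged instead of stopping the extraction.
            next_pages.extend(children if children else [page])
        pages = next_pages
    # After the last category level, read the products from each page.
    return [part for page in pages for part in page.get("products", [])]


print(extract_products(ROOT))
```

Here "Janitorial" has no sub-categories, so both remaining levels are skipped for it, while "Gases" uses all three levels.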

The website uses a rarely seen page navigation concept (Google Search uses the same one): the page navigation bar automatically moves forward as you click on page numbers. This kind of page navigation is handled by the "Continuing link area" option in the page navigation template.
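To see why a moving navigation bar needs special handling, here is a rough Python simulation. It is a sketch under stated assumptions, not how Visual Web Ripper works internally: the function names `visible_page_links` and `crawl`, the window size of five links, and the centering rule are all hypothetical choices made for the illustration.

```python
def visible_page_links(current: int, last: int, window: int = 5) -> list[int]:
    """Page numbers shown in the navigation bar while on page `current`.

    Assumed behaviour: the bar shows `window` links, roughly centered
    on the current page and clamped to the range 1..last.
    """
    start = max(1, min(current - window // 2, last - window + 1))
    end = min(last, start + window - 1)
    return list(range(start, end + 1))


def crawl(last: int, window: int = 5) -> list[int]:
    """Visit every page by re-reading the link area after each click."""
    visited = []
    current = 1
    while True:
        visited.append(current)
        # The link area shifts forward, so it is read again on every page.
        ahead = [p for p in visible_page_links(current, last, window) if p > current]
        if not ahead:
            return visited
        current = ahead[0]


print(visible_page_links(1, 12))  # links visible on the first page only
print(crawl(12))                  # pages reached by following the moving bar
```

Collecting the links once from the first page would only reach the first five pages in this simulation. Continuing from the link area on each new page is what reaches the rest.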

I normally recommend using the WebCrawler collector for large product catalogues. However, this website contains quite a lot of invalid HTML, which makes it hard to design the project for the WebCrawler collector. The project performs quite well in WebBrowser mode, so no effort has been made to optimize performance.

Many of the first products on the website don't have an image, which is why the sample data extract doesn't contain images. The project will still extract any images it finds and save them to local disk.
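As a rough illustration of how records with a missing image can still end up in a spreadsheet, the sketch below writes product rows to CSV, which Excel can open. This is not the project's actual export: CSV is used here instead of a native Excel file to keep the example within the standard library, and the column names, the function name `to_csv`, and the sample rows are invented for the example.

```python
import csv
import io

# Hypothetical column names based on the fields listed in the request.
FIELDS = ["part_number", "description", "price", "uom", "image_file"]


def to_csv(products: list[dict]) -> str:
    """Serialize product rows to CSV text; missing fields become empty cells."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(products)
    return buffer.getvalue()


# Made-up sample rows: the first product has no image, the second has one.
rows = [
    {"part_number": "AR-300", "description": "Argon cylinder",
     "price": "12.50", "uom": "EA"},
    {"part_number": "GL-10", "description": "Work gloves",
     "price": "4.25", "uom": "PR", "image_file": "GL-10.jpg"},
]

print(to_csv(rows))
```

A product without an image simply gets an empty `image_file` cell, so the row layout stays the same for every product.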


Discuss this project

2/11/2010

Could you tell me if there is a tutorial available which is similar in scope to what was done for the Airgas project? Would it be possible to provide a more detailed step by step process on how this project was created?
 
Thank you.

2/11/2010

Sequentum Support

All the techniques used to create this project are explained in the main introduction video, except for the navigation template. Navigation templates are explained in the second introduction video (although it is a bit outdated).

  • Very user friendly visual project designer.
  • Extract complete data structures, such as product catalogues.
  • Repeatedly submit forms for all possible input values.
  • Extract data from highly dynamic web sites including AJAX web sites.
  • Web data extraction scheduler with email notifications and logging.
  • Custom post-processing and comprehensive API.
  • Only $299 including 1 year maintenance.

© 2009-2010 Sequentum