Extracting data from a product catalogue
requested: 2/4/2010
version: 2.30.11
Demonstrates optional templates and "Continuing link area" page navigation templates.
Target URL: www.airgas.com
Download demo project and sample data extract
Airgas.zip
Starting at the airgas.com site, the goal is to extract the image, part number, description, price and UOM (unit of measure) for each product offered. Ideally, one project would navigate through all the main categories, such as Gases, Safety Products, Janitorial, etc., and through all of the sub-categories under them, returning every product offered on the site.
An Excel spreadsheet would be the preferred output format.
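As a rough illustration of the requested output (not a feature of Visual Web Ripper itself), the following Python sketch writes one row per product, with the fields listed above, to CSV, a format Excel opens directly. The `Product` class, its field names and the `write_products_csv` helper are hypothetical; prices are kept as text on the assumption that the site's own formatting should be preserved.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


# Hypothetical record layout for one catalogue product; field names are illustrative.
@dataclass
class Product:
    part_number: str
    description: str
    price: str                        # kept as text to preserve the site's formatting
    uom: str                          # unit of measure, e.g. "EA"
    image_file: Optional[str] = None  # local path of the saved image, if any


def write_products_csv(products, stream):
    """Write one CSV row per product, with a header row, to a text stream."""
    names = [f.name for f in fields(Product)]
    writer = csv.DictWriter(stream, fieldnames=names)
    writer.writeheader()
    for p in products:
        row = asdict(p)
        # Products without an image get an empty cell rather than "None".
        row["image_file"] = row["image_file"] or ""
        writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_products_csv([Product("ABC123", "Safety glasses", "4.99", "EA")], buf)
    print(buf.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.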
Thank you.
This project demonstrates standard data extraction from a product catalogue.
The "Optional" template option is used for the category templates because the number of category levels varies from one part of the catalogue to another; marking them optional lets Visual Web Ripper skip a category level when it doesn't exist.
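The effect of optional category templates can be illustrated outside Visual Web Ripper with a small recursive walk over a hypothetical category tree whose depth differs per branch: where a sub-category level is missing, the walk goes straight to the product list instead of failing. This is a sketch of the idea only; the nested-dictionary stand-in for the site and the `collect_products` function are assumptions, not how Visual Web Ripper implements it.

```python
def collect_products(node, path=()):
    """Yield (category path, product) pairs, descending however many levels exist.

    `node` is a hypothetical stand-in for a category page: a dict mapping
    category names either to another dict (a further category level) or to a
    list of products (no further level exists).
    """
    for name, child in node.items():
        here = path + (name,)
        if isinstance(child, dict):
            # Another category level exists here: descend into it.
            yield from collect_products(child, here)
        else:
            # No further category level: this page lists products directly.
            for product in child:
                yield here, product


if __name__ == "__main__":
    site = {
        "Gases": {"Specialty": ["G1", "G2"]},
        "Janitorial": ["J1"],                            # no sub-categories
        "Safety Products": {"Eye": {"Goggles": ["S1"]}},  # two sub-category levels
    }
    for category_path, product in collect_products(site):
        print(" > ".join(category_path), product)
```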
The website uses a rarely seen page navigation concept (the same one used by Google search): the range of page numbers shown in the navigation bar shifts forward as you click on page numbers. This kind of page navigation is handled by the "Continuing link area" option in the page navigation template.
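The Python sketch below models why a continuing link area is needed. `visible_page_links` is a hypothetical model of a Google-style pager that shows a sliding window of page numbers (the window width of 10 and the centring rule are assumptions, not taken from airgas.com), and `walk_pages` re-reads the link area on every page and follows the link to the next page number. Reading the links only once, from the first page, would reach just pages 1 to 10 of a 25-page listing.

```python
def visible_page_links(current, total, width=10):
    """Hypothetical Google-style pager: a window of page numbers that slides
    forward as the current page advances (width and centring are assumptions)."""
    start = max(1, min(current - width // 2, total - width + 1))
    end = min(total, start + width - 1)
    return list(range(start, end + 1))


def walk_pages(total, width=10):
    """Visit every page by re-reading the link area on each page and following
    the link to the next page number, since the window moves as pages change."""
    current = 1
    visited = [current]
    while True:
        links = visible_page_links(current, total, width)
        nxt = current + 1
        if nxt not in links:
            break  # no link to a later page: the last page has been reached
        current = nxt
        visited.append(current)
    return visited


if __name__ == "__main__":
    print(visible_page_links(1, 25))   # only the first window is visible on page 1
    print(len(walk_pages(25)))         # the continuing walk reaches every page
```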
I normally recommend using the WebCrawler collector for large product catalogues. However, this website contains quite a lot of invalid HTML, which makes it hard to design the project for the WebCrawler collector. Since the project performs quite well in WebBrowser mode, no effort has been made to optimize its performance.
The first large batch of products on the website has no images, which is why the sample data extract contains none. The project will, however, extract any images it finds and save them to local disk.
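Saving images only for products that have one can be sketched in Python as follows. The `save_product_image` helper, the file-naming scheme (part number plus the image's extension) and the `.jpg` fallback are assumptions for illustration, not Visual Web Ripper behaviour; the `fetch` parameter lets the real download (`urllib.request.urlopen`) be swapped for a fake, as the demo does.

```python
import re
import tempfile
import urllib.request
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import urlparse


def fetch_bytes(url: str) -> bytes:
    """Download a URL and return the response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def save_product_image(image_url: Optional[str], part_number: str, out_dir: Path,
                       fetch: Callable[[str], bytes] = fetch_bytes) -> Optional[Path]:
    """Save a product image as <part number><extension>; return None if there is none."""
    if not image_url:
        return None  # product has no image: nothing to save
    # Replace characters that are unsafe in file names (e.g. "/" in a part number).
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", part_number)
    suffix = Path(urlparse(image_url).path).suffix or ".jpg"  # assumed fallback extension
    target = out_dir / f"{safe_name}{suffix}"
    target.write_bytes(fetch(image_url))
    return target


if __name__ == "__main__":
    # Demo with a fake fetcher so no network access is needed.
    with tempfile.TemporaryDirectory() as d:
        saved = save_product_image("http://example.com/img/abc.png", "AB/12", Path(d),
                                   fetch=lambda url: b"fake image bytes")
        print(saved.name)  # AB_12.png
```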