

Project Summary


Yellow page website

requested: 3/5/2010 version: 2.33.12

Demonstrates how to extract data from a yellow page website, including email from a hidden form field.

Target URL: http://yellowpages.com.au/search/listingsSearch.do?region=australia&headingCode=11894&sortByDetail=t...

Download demo project and sample data extract YellowpagesEmail.zip

Request

This is a search results page for Yellow Pages in Australia. The results are listed as a series of blocks running down the page, separated by grey lines.
Each address block has an element id or page area id. Within the address block is the name of the club, which has an element id; there is also an address element and a phone number element. The website link, which may or may not appear at the bottom of each address block, contains a string holding the web address it links to. The email link, which also may or may not appear at the bottom of each address block, opens a pop-up window, and the page source of that window contains the email address used to contact the relevant company.

What I would like to extract from each block is: the name at the top of the block, the address and phone number within the block, the website URL contained in the website link (if it exists), and the email address contained in the page source of the pop-up window opened by the email link (if it exists).

I would like the project to be able to accept a list of URLs (the individual page URLs from a multi-page search result).

Solution

This should be an easy project to create, except for these two issues:

1. The email address is placed in a hidden form field and cannot be selected in the browser. It is necessary to use the tree view to select the hidden form field. Select a visible form field close to the hidden form field, and click the tree view toolbar button. The tree view will open and mark the visible form field you selected. Now look for the hidden form field containing the email address and right-click on that element. Choose "Select Element in Browser" from the context menu. The hidden form field is still not visible in the browser, but you can see in the Capture Window that the element is selected.

2. The next page navigation link is an image with a URL that includes a session ID. By default, Visual Web Ripper includes the session ID in the selection path, so the selection stops working when the session expires. The selection path needs to be manually edited to remove the session ID.
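Outside Visual Web Ripper, the two issues above can be illustrated with a small Python sketch. It is not how the tool works internally; it only shows the underlying ideas: a hidden form field is still present in the page source even though the browser does not render it, and a link can be matched reliably once the session ID is stripped from its URL. The sample HTML, the field name `emailAddress`, and the Java-style `;jsessionid=` format are all assumptions made for illustration, since the real page markup is not shown here.

```python
import re
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Hidden inputs are not rendered, but they are in the page source.
        if (attributes.get("type") or "").lower() == "hidden":
            name = attributes.get("name")
            if name:
                self.hidden_fields[name] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden form field names to their values."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.hidden_fields


# Assumed session ID format (Java-style ";jsessionid=..."); the real site
# may encode its session ID differently.
SESSION_ID_PATTERN = re.compile(r";jsessionid=[^?#]*", re.IGNORECASE)


def strip_session_id(url):
    """Remove the session ID so the URL stays stable across sessions."""
    return SESSION_ID_PATTERN.sub("", url)


# Hypothetical pop-up page source, invented for this example.
SAMPLE_POPUP_HTML = """
<form action="/sendEmail.do" method="post">
  <input type="text" name="senderName">
  <input type="hidden" name="emailAddress" value="club@example.com">
</form>
"""

fields = extract_hidden_fields(SAMPLE_POPUP_HTML)
print(fields.get("emailAddress"))
print(strip_session_id("/search/nextPage.do;jsessionid=ABC123?page=2"))
```

With the sample input, the first call prints the value of the hidden field and the second prints the link URL without its session ID. Editing the selection path by hand in Visual Web Ripper achieves the same effect as `strip_session_id`: the selection no longer depends on a value that changes with every session.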

 


Discuss this project

3/11/2010

Sadly the download link does not work. As can be seen from the attached screen capture image, I am not allowed access.

I have copied and pasted the URL from the email, clicked directly on the link, and typed it in by hand, with the same result each time.

I have tried in both IE and Firefox.
 
regards
 
jim barnes

Attachment: error.jpg

4/17/2010

Sequentum Support

Please note that the target website has changed and it is no longer possible to extract email addresses, since they are no longer sent to the client browser.


© 2009-2010 Sequentum