This is a search results page from Yellow Pages Australia. The results are listed as a series of blocks running down the page, separated by grey lines.
Each address block has an element ID (or page area ID). Within the address block, the name of the club has its own element ID, and there are also an address element and a phone number element. The website link, which may or may not appear at the bottom of each address block, contains the linked web address as a string. The email link, which also may or may not appear at the bottom of each address block, opens a pop-up window; the page source of that window contains the email address used to contact the relevant company.
What I would like to extract from each block is: the name at the top of the block, the address and phone number within the block, the website URL contained in the website link (if it exists), and the email address contained in the page source of the pop-up window opened by the email link (if it exists).
I would like the project to be able to accept a list of URLs (the individual page URLs from a multi-page search result).
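For readers who want to see the extraction logic outside Visual Web Ripper, here is a minimal Python sketch of the per-block extraction described above. The markup in `SAMPLE_PAGE`, the class names (`listing`, `name`, `address`, `phone`, `website`) and the function `extract_listings` are all assumptions made for illustration; they are not the real Yellow Pages markup and not part of the demo project. The email address is left out here because it comes from a separate pop-up page.

```python
from html.parser import HTMLParser

# Hypothetical markup: two listing blocks, the second without a website link.
SAMPLE_PAGE = """
<div class="listing" id="listing-1">
  <span class="name">Example Club</span>
  <span class="address">1 Example St, Sydney NSW 2000</span>
  <span class="phone">(02) 5550 0000</span>
  <a class="website" href="http://www.example.com">Website</a>
</div>
<hr>
<div class="listing" id="listing-2">
  <span class="name">Another Club</span>
  <span class="address">2 Sample Rd, Melbourne VIC 3000</span>
  <span class="phone">(03) 5550 0001</span>
</div>
"""


class ListingParser(HTMLParser):
    """Collect one dict per listing block; missing optional fields stay None."""

    FIELDS = ("name", "address", "phone")  # assumed class names

    def __init__(self):
        super().__init__()
        self.listings = []
        self._current = None  # dict for the block being parsed
        self._depth = 0       # <div> nesting depth inside the block
        self._field = None    # text field currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._current is None:
            # Only a listing <div> opens a new block.
            if tag == "div" and "listing" in classes:
                self._current = dict.fromkeys(self.FIELDS + ("website",))
                self._depth = 1
            return
        if tag == "div":
            self._depth += 1
        elif tag == "a" and "website" in classes:
            # The website URL is the href of the link, not its text.
            self._current["website"] = attrs.get("href")
        else:
            for field in self.FIELDS:
                if field in classes:
                    self._field = field

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and self._field and text:
            previous = self._current[self._field]
            self._current[self._field] = f"{previous} {text}" if previous else text

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:  # the listing block is closed
                self.listings.append(self._current)
                self._current = None
        else:
            self._field = None


def extract_listings(html):
    """Return a list of dicts, one per listing block found in the page."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.listings


for row in extract_listings(SAMPLE_PAGE):
    print(row)
```

To handle a multi-page search result, the same function could be applied to the HTML fetched from each URL in the list and the rows concatenated.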
This should be an easy project to create, except for these two issues:
1.
The email address is placed in a hidden form field and cannot be selected in the browser, so the tree view must be used to select it. Select a visible form field close to the hidden form field and click the tree view toolbar button. The tree view will open and mark the visible form field you selected. Now look for the hidden form field containing the email address and right-click on that element. Choose "Select Element in Browser" from the context menu. The hidden form field is still not visible in the browser, but you can see in the Capture Window that the element is selected.
2.
The next page navigation link is an image with a URL that includes a session ID. By default, Visual Web Ripper will include the session ID in the selection path, so the selection will stop working when the session expires. The selection path needs to be edited manually to remove the session ID.
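The two issues above can also be illustrated in plain Python. This is only a sketch of the underlying ideas, not Visual Web Ripper's selection path syntax: the first function reads the value of hidden form fields from the pop-up page source, and the second removes a session ID from a URL so that matching does not depend on it. The field name `emailAddress`, the parameter names in `SESSION_PARAMS`, the sample HTML and the example URL are all assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical pop-up page source with the email in a hidden form field.
POPUP_HTML = """
<form action="/sendEmail" method="post">
  <input type="text" name="senderName">
  <input type="hidden" name="emailAddress" value="info@example.com">
</form>
"""

# Assumed names for the session parameter; the real name may differ.
SESSION_PARAMS = {"sessionid", "jsessionid", "sid"}


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.hidden[name] = attrs.get("value") or ""


def hidden_fields(html):
    """Return a dict of hidden form field names to their values."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.hidden


def strip_session_id(url):
    """Return the URL with any assumed session ID query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(hidden_fields(POPUP_HTML).get("emailAddress"))
print(strip_session_id(
    "http://www.example.com/images/next.gif?sessionId=ABC123&size=small"
))
```

The hidden field is invisible in a browser but is ordinary markup in the page source, which is why it can be read this way. Stripping the session ID first means the next page link can be recognised by the stable part of its image URL.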
Download demo project and sample data extract
YellowpagesEmail.zip