Experiencing issues with variable tables
JC
#1 Posted : Thursday, July 21, 2011 5:43:21 PM
Groups: Registered
Joined: 11/8/2010
Posts: 18
Hello,

I am trying to crawl a site that presents its info in separate tables down each listing's page.

It seems every listing removes a table when there is no info available, which seems to throw the collection out of whack.

When I crawl it, the software simply does not collect anything below the first table set when the # of tables changes from the original content selections. Is there an efficient way to tell VWR to ignore the variable positioning and simply collect what's available?

I have attached the project.

Thank you!
File Attachment(s):
tab_var.rip (236kb) downloaded 25 time(s).
Sequentum Support
#2 Posted : Sunday, July 24, 2011 11:37:36 PM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
Hi,

You can use Filter to get a precise selection.
JC
#3 Posted : Monday, July 25, 2011 12:58:47 PM
Groups: Registered
Joined: 11/8/2010
Posts: 18
Sequentum Support wrote:
Hi,

You can use Filter to get a precise selection.
Well, I used the filter option to specify the rows I needed, and I even tried setting it up so that it would just collect everything and then I'd chuck it during cleanup.

Both attempts were again skewed when the results showed a variable # of tables.

We've paid for a lot of projects from you guys - and I'd honestly prefer not to pay for another one when we have 90% of it done. It would be great if you could maybe specify a bit on this particular problem? Using the filter option in two different project builds by selecting either elements or row text did not work. Do you have any other suggestions please?

Thank you
Sequentum Support
#4 Posted : Wednesday, July 27, 2011 2:09:20 AM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
I'm not quite sure which element you are having a problem with. I've created a sample element named "MySelection" using a manually edited XPath. It selects data based on the table header and the first column.
File Attachment(s):
tab_var.rip (246kb) downloaded 29 time(s).
JC
#5 Posted : Tuesday, August 09, 2011 2:12:46 PM
Groups: Registered
Joined: 11/8/2010
Posts: 18
Sequentum Support wrote:
I'm not quite sure which element you are having a problem with. I've created a sample element named "MySelection" using a manually edited XPath. It selects data based on the table header and the first column.

Thank you for taking a crack at it but it looks like the site has changed now. Though my original question is still valid... just looking for some clarity please.

The HOAs and condos are mixed together. If you go here, type a space in the first field, and then search, you'll find that there are two different layouts of data throughout the site.

Example 1 (HOAs): No committee member tables:
https://secure.utah.gov/hoa/details.html?id=1

Example 2 (Condos): Listed committee member tables as well as very different information given:
https://secure.utah.gov/hoa/details.html?id=19


So... my main question is, how do I get VWR to recognize and collect the data we want from each page when there are variations to how the content is displayed down the list of links?

Thank you
Sequentum Support
#6 Posted : Thursday, August 11, 2011 2:54:51 AM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
Hi,

You can look at the MySelection element as an example. You might need to study XPath to customise your selection.

//TABLE[TBODY/TR[1]/TH[1][startswith(., 'Committee Member')]][2]/TBODY/TR[TD[1][.='Zip:']]/TD[2]

In the example above, I select the tables whose title starts with "Committee Member". There can be many such tables, but I select only the second one (notice the [2]). If that table is found, I then select the row whose first column contains the text "Zip:" and take the data from the second column.
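
For anyone who wants to test this selection logic outside VWR, here is a minimal Python sketch of the same idea using only the standard library. It is an approximation, not what VWR runs internally. The function name `select_value`, the two sample pages, and the zip codes are all made up for illustration. Unlike the XPath, it counts matching tables in document order across the whole page and trims whitespace before comparing text. Note also that standard XPath 1.0 spells the function `starts-with`; whether other XPath engines accept the `startswith` spelling shown above is something I would not assume.

```python
import xml.etree.ElementTree as ET


def cell_text(element):
    """Return the element's full text content with surrounding whitespace removed."""
    return "".join(element.itertext()).strip()


def select_value(root, header_prefix, table_index, row_label):
    """Pick the Nth table (1-based) whose first header cell starts with
    header_prefix, then return the second cell of the row whose first cell
    equals row_label. Returns None when the table or row is missing."""
    matching = []
    for table in root.iter("TABLE"):
        first_header = table.find("TBODY/TR[1]/TH[1]")
        if first_header is not None and cell_text(first_header).startswith(header_prefix):
            matching.append(table)
    if len(matching) < table_index:
        return None  # this page has fewer matching tables than requested
    for row in matching[table_index - 1].findall("TBODY/TR"):
        cells = row.findall("TD")
        if len(cells) >= 2 and cell_text(cells[0]) == row_label:
            return cell_text(cells[1])
    return None


# Invented sample pages: one with committee member tables, one without.
CONDO_PAGE = """<HTML><BODY>
<TABLE><TBODY><TR><TH>Association</TH></TR><TR><TD>Zip:</TD><TD>84101</TD></TR></TBODY></TABLE>
<TABLE><TBODY><TR><TH>Committee Member 1</TH></TR><TR><TD>Zip:</TD><TD>84102</TD></TR></TBODY></TABLE>
<TABLE><TBODY><TR><TH>Committee Member 2</TH></TR><TR><TD>Zip:</TD><TD>84103</TD></TR></TBODY></TABLE>
</BODY></HTML>"""

HOA_PAGE = """<HTML><BODY>
<TABLE><TBODY><TR><TH>Association</TH></TR><TR><TD>Zip:</TD><TD>84101</TD></TR></TBODY></TABLE>
</BODY></HTML>"""

print(select_value(ET.fromstring(CONDO_PAGE), "Committee Member", 2, "Zip:"))  # 84103
print(select_value(ET.fromstring(HOA_PAGE), "Committee Member", 2, "Zip:"))    # None
```

The point of selecting by header text and row label rather than by position is that a page with fewer tables simply yields no value for that field, instead of shifting every later selection onto the wrong table.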
File Attachment(s):
tab_var.rip (247kb) downloaded 26 time(s).