Device not ready
Moffice
#1 Posted : Tuesday, August 24, 2010 2:21:17 AM
Groups: Registered
Joined: 8/24/2010
Posts: 18
I ran the attached project for 1h 30min and then it stopped with that error (Device not ready).
I don't know if it's a bug or if it's related to my computer or the site I was scanning.

I used the latest version of Visual Web Ripper on Windows Server 2003, Service Pack 2.

I would also like to ask another question: How can I extract the class from a div?
For example: I have a list, and one attribute is only present in the div's class. It would suffice if I could add a column that marks whether the word "lelvel0" appears in the div's attributes. So far I've managed to extract this data by making separate lists/tables, each named after the attribute, but this makes the data painful to analyse, and patching it back together takes time.
File Attachment(s):
Eniro-complete-B2+.rip (947kb) downloaded 28 time(s).
Sequentum Support
#2 Posted : Tuesday, August 24, 2010 4:00:36 AM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
The error is most likely caused by a hardware failure, or because your computer goes into sleep mode. Try and turn off sleep mode on your computer.

You can create lists in many ways. Sometimes you may even have to manually edit the selection XPath to make it work correctly. In your case the easiest way is to use the "Free repeat" list option. This will give you a selection that includes a little bit too much, so you can right click on the company link and select the filter option "Must include this element". See attached project.
File Attachment(s):
Eniro-complete-B2%2b.rip (145kb) downloaded 30 time(s).
Moffice
#3 Posted : Tuesday, August 24, 2010 4:34:20 AM
Groups: Registered
Joined: 8/24/2010
Posts: 18
That's not the problem. I know how to select them all, but I want to get the data behind the visible text, from the div's attributes (the pay level, and maybe other data on other web pages).

Can this be done? Or does it have to be done using a regular expression in a post-processing operation after extracting all the HTML content from the div?
Sequentum Support
#4 Posted : Tuesday, August 24, 2010 7:35:23 PM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
I'm not completely sure I understand what you want to do.

You can extract the class attribute from the DIV element and then use regex content transformation to get any substrings from the attribute.

If you don't want to use regex, you can select directly on the class attribute. For example:

//DIV[contains(@class,'lelvel0')]

This will select the DIV element if its class attribute contains the string lelvel0.
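
Outside Visual Web Ripper, the two approaches above can be sketched in a few lines of Python using only the standard library. This is an illustration, not how the product works internally: the sample markup, the names DivClassCollector and level_column, and the assumption that levels follow the pattern lelvel plus a number are all hypothetical. The substring test mirrors contains(@class,'lelvel0'), and the regex pulls the level out of the captured class attribute.

```python
import re
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Collect the class attribute of every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # A missing or valueless class attribute is stored as "".
            self.classes.append(dict(attrs).get("class") or "")


# Hypothetical markup; the real page structure is not shown in the thread.
SAMPLE = (
    '<div class="company lelvel0">Alpha</div>'
    '<div class="company lelvel1">Beta</div>'
    '<div class="company">Gamma</div>'
)


def level_column(html):
    """Return (class attribute, has lelvel0, level or None) for each div."""
    parser = DivClassCollector()
    parser.feed(html)
    parser.close()
    rows = []
    for cls in parser.classes:
        # Same substring test as contains(@class,'lelvel0').
        has_level0 = "lelvel0" in cls
        # Assumed pattern: "lelvel" followed by one or more digits.
        match = re.search(r"lelvel(\d+)", cls)
        rows.append((cls, has_level0, match.group(1) if match else None))
    return rows


for row in level_column(SAMPLE):
    print(row)
```

Note that contains() is a plain substring test, so it would also match a class such as lelvel01. If that matters, compare against the whole class token instead.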
Guest
#5 Posted : Wednesday, August 25, 2010 4:05:25 AM
Groups:
Joined: 4/10/2010
Posts: 112
Thanks, I didn't see the attribute option for capture (sorry). Actually, that's what I was asking.

Also, the "contains" function in XPath is good to know. Where can I find the complete syntax for XPath?

Thanks for all the help.
Sequentum Support
#6 Posted : Wednesday, August 25, 2010 4:25:03 AM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
You'll get a list of supported XPath functions if you select the XPath options tab and then click the Help button.

Here is a good XPath reference:

http://www.w3schools.com/XPath/xpath_syntax.asp
Moffice
#7 Posted : Wednesday, August 25, 2010 7:45:56 AM
Groups: Registered
Joined: 8/24/2010
Posts: 18
Great, thanks - didn't know it existed, I thought that it was one of your features :).
These tools make things a lot easier.

I'll go to work then, thanks for the great support.
Moffice
#8 Posted : Wednesday, August 25, 2010 9:15:40 AM
Groups: Registered
Joined: 8/24/2010
Posts: 18
I have another question (it's not related to the topic, but it's the same project):
How does the data row cache work when you have multithreading enabled for the web crawler?

I have a list item named "Company" that I would like to write to the HDD every 1000 entries. I've tried to set the data_row_catch=1000 but it didn't do anything and it went all the way to 3000 entries without writing anything to the HDD.

Setting it on the company category just makes a mess, since some categories have 1 company while others have 2000.

Setting it on the company letter is the only option that I think will work, but I'm afraid that the program might crash due to out-of-memory issues.
Sequentum Support
#9 Posted : Thursday, August 26, 2010 3:15:37 AM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
Visual Web Ripper looks at a specific page loop when it determines whether data should be written to disk, so you'll need to set the cache size on the category or category letter template. However, I don't think you'll have problems with memory in this project. Have you already tried it?
Moffice
#10 Posted : Thursday, August 26, 2010 9:12:17 AM
Groups: Registered
Joined: 8/24/2010
Posts: 18
It ran OK on the Company Category Letter, thanks :)
The old version had some bugs; I see this one runs more smoothly.