Hello
Thanks for the fast reply and the files you sent me; they both work.
I have another question: do you have any idea how to remove watermarks from images on the fly when extracting?
For example, see http://buletinulauto.ro/vanzari-masini-Audi-an-fabricatie-1993-capacitate-1896-cmc--351870.html
I hope you know which C# or VB classes can be used for post-processing.
Regards
George
|
I'm not sure it's so easy to remove watermarks, but if you just want to crop the images to get rid of the yellow/orange URLs, then you should be able to use the Bitmap class. Here is a tutorial:
|
Hi
I want to extract info from a detail page, but some pages/products have certain attributes while others do not.
Please tell me how to extract the attributes Size, Sizes, Clothing Sizes, Colours and Flavour, which are present on some pages and absent on others (the title, description and price seem to work well, but I'd appreciate it if you could also send these fields). There are also two types of images: some are handled by JavaScript and act as links when clicked, while others are not links. I don't know how to handle each kind of image (I need the image itself as well as the URL of the online image).
PS. Here are a few of the URLs for products with specific attributes:
http://www.milletsports.co.uk/triathlon/bikes-and-accessories/lights-and-computers/polar-elastic-straps/
http://www.milletsports.co.uk/triathlon/shoes/mens/brooks-adrenaline-gts-9-mens-running-shoes-black/
http://www.milletsports.co.uk/running/sports-nutrition/powders-and-sachets/maximuscle-promax-extreme-high-protein-908g-2lbs/
http://www.milletsports.co.uk/hockey/clothing/mens/tops/adidas-mens-t8-clima-polo/
The URLs with different kinds of images:
http://www.milletsports.co.uk/triathlon/goggles-and-masks/speedo-rift-tri-power-adult-goggles/
http://www.milletsports.co.uk/triathlon/shoes/mens/brooks-adrenaline-gts-9-mens-running-shoes-black/
Thanks for your help
|
It all depends on how you want to structure your data, but you can have a look at the attached project which should work on all your listed URLs.
As for the images, you can use an alternate content element to cover both possible image positions, or you can just change the selection XPath slightly, which is what I've done in the attached project.
Attachment: Milletsports3.zip
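For readers without the attachment, the "cover both possible image positions" idea can be sketched outside Visual Web Ripper. This is only an illustration in Python: the markup below is invented and much simpler than the real pages, and `image_url` is a hypothetical helper, not part of any product. It tries the linked image position first and falls back to a plain image.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified product markup; the real pages will differ.
LINKED = '<div id="product"><a href="/img/large.jpg"><img src="/img/small.jpg"/></a></div>'
PLAIN = '<div id="product"><img src="/img/only.jpg"/></div>'


def image_url(markup):
    """Return an image URL, trying the linked position before the plain one."""
    root = ET.fromstring(markup)
    # First position: an image wrapped in a link; assume the link target
    # is the image we want.
    link = root.find("./a[@href]")
    if link is not None and link.find("img") is not None:
        return link.get("href")
    # Alternate position: a plain image that is not a link.
    img = root.find(".//img")
    return img.get("src") if img is not None else None


print(image_url(LINKED))
print(image_url(PLAIN))
```

The same "try one position, then the other" pattern applies to attributes such as Size or Flavour that exist only on some pages: return an empty value when the element is missing instead of failing.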
|
Hi
Thanks for your help. There is another site I would like to extract data from. Can you give it a try?
http://www.tgw.com/customer/search2.jsp?scid=1086&sortmfr=N
PS. I am currently using Web Content Extractor from http://newprosoft.com for my extraction needs and am looking for software that is better and cheaper.
|
It does take at least 1-2 hours for us to create a demo project, so we can only create one demo project per user, but if you have specific questions, then you are more than welcome to submit a support enquiry.
I don't think you'll find anything that's cheaper than "Web Content Extractor", but you can probably find something that's better.
|
My specific question is: how do I extract info from a page with dependent listboxes, as on this page?
If you select the #3 Wood value on the Wood(s) listbox, the value on the Loft listbox is 19.0
but if you select the Wood(s) to be #4, then Loft is 22.0
I hope this is not difficult to answer.
|
You need two "buttonless" form templates in this case, one for each listbox. Each of the form fields needs to have a page load action, and you also need to set the option "Always fire event" on both form fields. The loft listbox is not available for some brands, so you need to set the "Optional" option on the loft form field.
See attached project and sample data extract.
Attachment: Tgw.rar
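The logic behind the two templates can be sketched as a nested loop: select a value in the first listbox, wait for the page to reload, then read whatever the second listbox now offers. The Python below is only a simulation of that flow, not Visual Web Ripper's implementation. `LOFTS_BY_WOOD` and `select_wood` are hypothetical stand-ins for the website; the #3 and #4 values come from the question above, and the "#5 Wood" entry is invented to show the optional case.

```python
# Hypothetical stand-in for the site: the Loft options offered after a
# given Wood(s) value has been selected and the page has reloaded.
LOFTS_BY_WOOD = {
    "#3 Wood": ["19.0"],
    "#4 Wood": ["22.0"],
    "#5 Wood": [],  # invented entry: no loft listbox for this choice
}


def select_wood(wood):
    """Pretend to select a value and wait for the full page load."""
    return LOFTS_BY_WOOD.get(wood, [])


def extract_combinations(woods):
    """Collect one (wood, loft) row per valid combination."""
    rows = []
    for wood in woods:
        lofts = select_wood(wood)
        if not lofts:
            # Mirrors the "Optional" setting: keep the row without a loft.
            rows.append((wood, None))
            continue
        for loft in lofts:
            rows.append((wood, loft))
    return rows


for row in extract_combinations(["#3 Wood", "#4 Wood", "#5 Wood"]):
    print(row)
```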
|
Thank you very much for your fast reply.
I can't see an "Always fire event" option. Is it on the Options tab? Which subtab?
You said that each form field needs to have a page load action. Do you mean the "Full page load" radio button on the Action subtab of the Options tab?
Regards, and thank you again for your help. I sent the same question to WCE support, but I have still received no response.
I will ask the customer I extract the info for to move to Visual Web Ripper.
|
"Always fire event" is an advanced option and is located in the "More" options tab. Normally, Visual Web Ripper will not fire an event when selecting a value that is already selected, but in this case the listbox fires the event all the time.
Yes, I mean the "Full page load" option. This causes Visual Web Ripper to wait for a full page load when a value is selected in the listbox.
Normally you don't need two form templates in such scenarios, but your target website behaves a bit oddly with these list boxes.
|
I have played with the dependent listboxes all day today, but it doesn't work correctly. Can you upload a small demo file here?
Thank you for your valued help
|
Did the demo project I attached earlier (included in Tgw.rar) not work correctly?
|
It does work, but it doesn't include extraction from a detail page with dependent listboxes.
|
Here is the file you sent me. I changed it a little to include page navigation, but it doesn't work correctly. I can do pagination if there is a next >> link on the page, as in your demos, but for pages with 1, 2, 3 links I'm not sure how to manage it. Can you take a look?
Also, please include a dependent listboxes extractor in the same file and send it back to me.
Thank you very much
Attachment: Tgw.zip
|
I must have misunderstood something. My original project handled the dependent list boxes Brand and Loft, but you've removed Loft from the project, so I assume you're referring to some other dependent list boxes?
When adding a link area navigation template, you need to select a list of links. See attached project where I've selected a list of links, but excluded the last link "all", since that's not really a page.
Attachment: Tgw.rar
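The "select a list of links, but exclude all" step can be illustrated in plain Python. This is a sketch under assumptions: the pager markup in `SAMPLE` is invented, and `LinkCollector` and `page_links` are hypothetical names. It collects every link in the pager and drops the one whose text is "all".

```python
from html.parser import HTMLParser

# Hypothetical pager markup; the real page will differ.
SAMPLE = ('<div><a href="?page=1">1</a> <a href="?page=2">2</a> '
          '<a href="?page=3">3</a> <a href="?page=all">all</a></div>')


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def page_links(html):
    """Return the hrefs of the pager links, skipping the "all" link."""
    parser = LinkCollector()
    parser.feed(html)
    return [href for text, href in parser.links if text.lower() != "all"]


print(page_links(SAMPLE))
```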
|
I meant a dependent listbox extractor applied to this page - http://www.tgw.com/customer/category/product.jsp/SUBCATEGORY_ID/14670/refScid/1086/sattr0/Loft/sattrVal0/ht%2813%29
PS. If you have the original Tgw.rar you posted the first time, I would like to download it again, because it seems that when you upload a new file, it overwrites the previous one.
|
Ok, that's a completely different scenario of course. See attached project that extracts hand and loft for a single product.
Attachment: Tgw2.rar
|
Hello
It's almost what I wanted. Can you extend the extractor to all the child listboxes (Shaft Type, Flex)? I tried it myself, but it's not 100% clean.
Please apply the logic for this page - http://www.tgw.com/customer/category/product.jsp/SUBCATEGORY_ID/10542/refScid/1086/sattr0/Shaft+Type/sattrVal0/steel
|
It would be better if you try this link - http://www.tgw.com/customer/category/product.jsp/SUBCATEGORY_ID/9706/refScid/1086/sattr0/Shaft+Type/sattrVal0/steel
Here the price changes too; I want to extract the price and the SKU#.
Can you do this, please?
|
Attachment: Tgw2.rar
|
Hi
I want to extract some info from:
http://www.golfonline.co.uk/taylormade-r9-supertri-tp-driver-2010-p-6719.html, http://www.golfonline.co.uk/yes-golf-stealex-callie-putter-shaft-weighting-system-p-6840.html and
http://www.golfonline.co.uk/ben-sayers-ladies-mx4-golf-set-graphite-shaft-p-6808.html
I need only info from dependent listboxes, under the Available Options:
Options
Bag
Colours
Loft
Flex
Shaft Options
Length
Weight
Can you send me a demo?
Regards
George
|
George, please submit a separate request for a demo project here:
|