Visual Web Ripper


Project Summary


Thumb images loading full size images using AJAX

requested: 2/18/2010 version: 2.31.10

Demonstrates how to extract full-size images from a list of thumbnails that load the full-size images using AJAX.

Target URL: http://www.milletsports.co.uk/hockey/bags/stick-bags/mercian-skb-hockey-stick-kit-bag/

Download demo projects and sample data extracts Milletsports.zip

Request

I installed your software today, and I'm having a small problem handling the AJAX images at this URL: http://www.milletsports.co.uk/hockey/bags/stick-bags/mercian-skb-hockey-stick-kit-bag/

Please give me an idea of how to handle those AJAX images, which are thumbnails of the main image.

Solution

The images can be extracted in two different ways, so I have attached two projects, one demonstrating each approach.
 
The first project demonstrates how to extract the images using a link template with an AJAX action. Note that the first image is shown by default, so when Visual Web Ripper clicks the first thumbnail, the main image does not change and Visual Web Ripper should not wait for it to change. Therefore, I've set the AJAX option "Wait is optional".
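One way to picture the "Wait is optional" setting is as a poll with a timeout where running out of time is not treated as an error. The Python sketch below is a toy model under that assumption only; `wait_for_change`, its parameters, and the image names are hypothetical and do not reflect how Visual Web Ripper is implemented.

```python
import time
from typing import Callable


def wait_for_change(get_main_src: Callable[[], str], old_src: str,
                    timeout: float, optional: bool,
                    poll_interval: float = 0.05) -> bool:
    """Poll until the main image source differs from old_src.

    Returns True as soon as a change is seen. When the timeout expires,
    returns False if the wait is optional, otherwise raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_main_src() != old_src:
            return True
        time.sleep(poll_interval)
    if optional:
        # Clicking the first thumbnail: its full-size image is already shown,
        # so no change is expected and extraction should simply continue.
        return False
    raise TimeoutError("main image did not change before the timeout")


shown = "image-1-n.jpg"  # hypothetical: first image displayed by default
print(wait_for_change(lambda: shown, shown, timeout=0.2, optional=True))  # False
```

Without the optional flag, the first click would either stall until the timeout or fail, even though the page is already in the expected state.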
 
The second project demonstrates how to extract the images by ignoring AJAX and using content transformation instead, which is much faster. Notice that all the thumbnail images end with -x.jpg, while the full-size images have the same name but end with -n.jpg. I use this by simply extracting each thumbnail image and adding a content transformation script that replaces "-x." with "-n.", so Visual Web Ripper downloads the full-size images instead of the thumbnails.
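The rewrite itself is a simple string substitution. The project's actual transformation script is not reproduced here; the Python sketch below only illustrates the idea. It anchors the match to the end of the URL, a slightly stricter choice than a plain "-x." to "-n." replace, so a "-x." elsewhere in the path is left alone. The example URL is hypothetical.

```python
import re

# Thumbnail and full-size images share a name; only the suffix differs:
#   thumbnail: ...-x.jpg    full size: ...-n.jpg
_THUMB_SUFFIX = re.compile(r"-x(\.jpg)$", re.IGNORECASE)


def thumb_to_full(url: str) -> str:
    """Return the full-size image URL for a thumbnail URL.

    Only the suffix at the very end of the URL is rewritten; URLs without
    the thumbnail suffix are returned unchanged.
    """
    return _THUMB_SUFFIX.sub(r"-n\1", url)


print(thumb_to_full("http://example.com/images/mercian-skb-1-x.jpg"))
# http://example.com/images/mercian-skb-1-n.jpg
```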
 


Discuss this project

2/18/2010

Hello
 
Thanks for the fast reply and the files you sent me; they both work.
 
I have another question: do you have any idea how to remove watermarks from images on the fly when extracting?
For example, see http://buletinulauto.ro/vanzari-masini-Audi-an-fabricatie-1993-capacitate-1896-cmc--351870.html
 
I hope you know which C# or VB classes can be used for post-processing.
 
Regards
George

2/19/2010

Sequentum Support

I'm not sure it's that easy to remove watermarks, but if you just want to crop the images to get rid of the yellow/orange URLs, you should be able to use the .NET Bitmap class. Here is a tutorial:
 

2/28/2010

Hi
 
I'm still trying to learn how to use your software to extract info from http://www.milletsports.co.uk, but I'm still not able to finish it.
 
I want to extract info from a detail page, but some pages/products have certain attributes while others do not.
Please tell me how to extract the attributes Size, Sizes, Clothing Sizes, Colours and Flavour, which are present on some pages but absent on others (the title, description and price seem to work well, but I'd appreciate it if you also included these fields). There are also 2 types of images: ones handled by JavaScript that act as links when clicked, and others that are not links. I don't know how to handle each case (I need the image itself as well as the URL of the online image).
 
PS. Here are a few URLs of products with specific attributes:
http://www.milletsports.co.uk/triathlon/bikes-and-accessories/lights-and-computers/polar-elastic-straps/
http://www.milletsports.co.uk/triathlon/shoes/mens/brooks-adrenaline-gts-9-mens-running-shoes-black/
http://www.milletsports.co.uk/running/sports-nutrition/powders-and-sachets/maximuscle-promax-extreme-high-protein-908g-2lbs/
http://www.milletsports.co.uk/hockey/clothing/mens/tops/adidas-mens-t8-clima-polo/
 
The URLs with different kinds of images:
http://www.milletsports.co.uk/triathlon/goggles-and-masks/speedo-rift-tri-power-adult-goggles/
http://www.milletsports.co.uk/triathlon/shoes/mens/brooks-adrenaline-gts-9-mens-running-shoes-black/
 
Thanks for your help

2/28/2010

Sequentum Support

It all depends on how you want to structure your data, but you can have a look at the attached project, which should work on all the URLs you listed.
 
Regarding the images, you can use an alternate content element to cover both possible image positions, or you can just change the selection XPath slightly, which is what I've done in the attached project.

Attachment: Milletsports3.zip
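The two options for covering both image positions can be sketched with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The markup below is hypothetical and much simpler than the real product pages: one variant wraps the main image in a JavaScript link, the other does not. `find_main_image` tries candidate paths in order, in the spirit of an alternate content element, while `RELAXED_PATH` shows a single, slightly changed path that matches both variants.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

# Hypothetical markup for the two page variants.
LINKED = '<div id="main"><a href="#"><img src="main-n.jpg"/></a></div>'
PLAIN = '<div id="main"><img src="main-n.jpg"/></div>'

# Option 1: try specific paths in order, like an alternate content element.
CANDIDATE_PATHS = ("./a/img", "./img")

# Option 2: relax the path so one expression matches both variants.
RELAXED_PATH = ".//img"


def find_main_image(html: str, paths: Sequence[str] = CANDIDATE_PATHS) -> Optional[str]:
    """Return the src of the first element matched by any of the paths."""
    root = ET.fromstring(html)
    for path in paths:
        node = root.find(path)
        if node is not None:
            return node.get("src")
    return None


for page in (LINKED, PLAIN):
    print(find_main_image(page), find_main_image(page, (RELAXED_PATH,)))
```

The relaxed path is shorter, but it also matches any other image inside the container, so the ordered fallback is the safer choice when the container holds more than one image.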

3/6/2010

Hi
 
Thanks for your help. There is another site I would like to extract data from; can you give it a try?
http://www.tgw.com/customer/search2.jsp?scid=1086&sortmfr=N

 
PS. I'm currently using Web Content Extractor from http://newprosoft.com for my extraction needs and am looking for software that is better and cheaper.

3/7/2010

Sequentum Support

It does take at least 1-2 hours for us to create a demo project, so we can only create one demo project per user, but if you have specific questions, then you are more than welcome to submit a support enquiry.
 
I don't think you'll find anything that's cheaper than "Web Content Extractor", but you can probably find something that's better.

3/7/2010

My specific question is how to extract info from a page with dependent listboxes, as on this page
 
If you select the #3 Wood value in the Wood(s) listbox, the value in the Loft listbox is 19.0, but if you select #4 in Wood(s), then Loft is 22.0.
 
I hope this is not difficult to answer.

3/7/2010

Sequentum Support

You need two "buttonless" form templates in this case, one for each listbox. Each form field needs to have a page load action, and you also need to set the option "Always fire event" on both form fields. The Loft listbox is not available for some brands, so you need to set the "Optional" option on the Loft form field.
 
See attached project and sample data extract.

Attachment: Tgw.rar
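The nesting of the two form templates can be pictured as a nested loop in which the second listbox is read again after each selection in the first, and an empty second listbox still produces a row. This Python sketch only models that logic under those assumptions; the function name and the `LOFTS_BY_WOOD` data are hypothetical, apart from the two Wood/Loft pairs given in the question.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical stand-in for the page. The #3 -> 19.0 and #4 -> 22.0 pairs come
# from the question above; "#7 Wood" with no lofts is invented to model a
# product where the Loft listbox is missing.
LOFTS_BY_WOOD: Dict[str, List[str]] = {
    "#3 Wood": ["19.0"],
    "#4 Wood": ["22.0"],
    "#7 Wood": [],
}


def enumerate_combinations(
    first_options: List[str],
    load_second_options: Callable[[str], List[str]],
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (first, second) pairs for two dependent listboxes.

    The second listbox is re-read after every first-level selection, because
    selecting a value reloads the page. It is treated as optional: when it has
    no options, one (first, None) row is yielded instead of nothing.
    """
    for first in first_options:
        second_options = load_second_options(first)
        if not second_options:
            yield first, None
            continue
        for second in second_options:
            yield first, second


rows = list(enumerate_combinations(list(LOFTS_BY_WOOD), lambda w: LOFTS_BY_WOOD[w]))
print(rows)  # [('#3 Wood', '19.0'), ('#4 Wood', '22.0'), ('#7 Wood', None)]
```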

3/7/2010

Thank you very much for your fast reply.
I can't see "Always fire event". Is it on the Options tab? Which sub-tab?
You said that each form field needs to have a page load action. Do you mean the "Full page load" radio button on the Action sub-tab of the Options tab?
 
Regards, and thank you again for your help. I sent the same question to WCE support, but I still haven't received a response.
I will ask the customer I extract the info for to move to Visual Web Ripper.

3/7/2010

Sequentum Support

The "Always fire event" option is an advanced option located on the "More" options tab. Normally, Visual Web Ripper does not fire an event when selecting a value that is already selected, but in this case the listbox fires the event every time.
 
Yes, I mean the "Full page load" option. This causes Visual Web Ripper to wait for a full page load when a value is selected in the listbox.
 
Normally you don't need two form templates in such scenarios, but your target website behaves a bit oddly with these list boxes.
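The effect of "Always fire event" can be illustrated with a toy listbox model in Python. The `ListBox` class and its `always_fire` flag are hypothetical and only mirror the behavior described above, not Visual Web Ripper's code. In the model, the first option starts out selected, so by default re-selecting it fires no change event and the dependent content is never reloaded for that value.

```python
from typing import Callable, List


class ListBox:
    """Toy model of a listbox whose change handler reloads dependent content."""

    def __init__(self, options: List[str], on_change: Callable[[str], None]) -> None:
        self.options = options
        self.selected = options[0]  # the page preselects the first option
        self.on_change = on_change

    def select(self, value: str, always_fire: bool = False) -> bool:
        """Select value; return True if the change event was fired."""
        if value not in self.options:
            raise ValueError(f"unknown option: {value}")
        if value == self.selected and not always_fire:
            # Default behavior: re-selecting the current value fires nothing.
            return False
        self.selected = value
        self.on_change(value)
        return True


reloads: List[str] = []
box = ListBox(["#3 Wood", "#4 Wood"], reloads.append)
print(box.select("#3 Wood"))                    # False: already selected, no event
print(box.select("#3 Wood", always_fire=True))  # True: event fired anyway
print(reloads)                                  # ['#3 Wood']
```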

3/8/2010

I have played with dependent listboxes all day today, but it doesn't work correctly. Can you upload a small demo file here?
 
Thank you for your valued help

3/8/2010

Sequentum Support

Did the demo project I attached earlier (included in Tgw.rar) not work correctly?

3/8/2010

It does work, but it doesn't include extraction from a detail page with dependent listboxes.

3/8/2010

Here is the file you sent me. I changed it a little to include page navigation, but it doesn't work correctly. I can do pagination if there is a next >> link on the page, as in your demos, but for pages with 1, 2, 3 links I'm not sure how to manage it. Can you take a look?
Also, please include a dependent listbox extractor in the same file and send it back to me.
 
Thank you very much

Attachment: Tgw.zip

3/8/2010

Sequentum Support

I must have misunderstood something. My original project handled the dependent list boxes Brand and Loft, but you've removed Loft from the project, so I assume you're referring to some other dependent list boxes?

When adding a link area navigation template, you need to select a list of links. See the attached project, where I've selected a list of links but excluded the last link, "all", since that's not really a page.


Attachment: Tgw.rar
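Selecting "a list of links but not the last one" amounts to collecting every pager link and filtering out the one that is not a real page. The Python sketch below uses the standard-library `html.parser`; the pager markup and the choice to filter on the link text "all" are assumptions for illustration, not what the attached project does internally.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element in a snippet."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def page_links(pager_html: str) -> List[str]:
    """Return page URLs from a numbered pager, skipping the "all" link."""
    collector = LinkCollector()
    collector.feed(pager_html)
    collector.close()
    return [href for text, href in collector.links if text.lower() != "all"]


# Hypothetical pager markup; the real site's HTML will differ.
PAGER = ('<div class="pager"><a href="?page=1">1</a> <a href="?page=2">2</a> '
         '<a href="?page=3">3</a> <a href="?view=all">all</a></div>')
print(page_links(PAGER))  # ['?page=1', '?page=2', '?page=3']
```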

3/8/2010

I meant a dependent listbox extractor applied to this page - http://www.tgw.com/customer/category/product.jsp/SUBCATEGORY_ID/14670/refScid/1086/sattr0/Loft/sattrVal0/ht%2813%29
 
PS. If you still have the original Tgw.rar you posted the first time, I would like to download it again, because it seems that uploading a new file overwrites the previous one.

3/9/2010

Sequentum Support

Ok, that's a completely different scenario of course. See attached project that extracts hand and loft for a single product.

Attachment: Tgw2.rar

3/9/2010

Hello
 
It's almost what I wanted. Can you extend the extractor to all the child listboxes (Shaft Type, Flex)? I tried it myself, but it's not 100% clean.
Please apply the logic for this page - http://www.tgw.com/customer/category/product.jsp/SUBCATEGORY_ID/10542/refScid/1086/sattr0/Shaft+Type/sattrVal0/steel
 
 

3/9/2010

It would be better if you tried this link - http://www.tgw.com/customer/category/product.jsp/SUBCATEGORY_ID/9706/refScid/1086/sattr0/Shaft+Type/sattrVal0/steel
 
Here the price changes too; I want to extract the price and the SKU#.
 
Can you do this, please?

3/9/2010

Sequentum Support

See attached project.

Attachment: Tgw2.rar
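The attached project is not reproduced here. As a rough illustration of extending the two-listbox idea to any number of dependent listboxes (for example Shaft Type, then Flex) and reading the price and SKU# once a selection is complete, the Python sketch below walks the options depth-first. `walk_listboxes`, the option tree, and the placeholder SKU values are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Selection = Tuple[str, ...]


def walk_listboxes(
    options_for: Callable[[Selection], List[str]],
    depth: int,
    read_details: Callable[[Selection], Dict[str, str]],
    chosen: Selection = (),
) -> Iterator[Dict[str, str]]:
    """Depth-first walk over `depth` dependent listboxes.

    options_for(chosen) returns the options of the next listbox given the values
    already selected. read_details(chosen) reads fields such as price and SKU#
    once every listbox is set, or once a listbox turns out to be missing.
    """
    options = options_for(chosen) if len(chosen) < depth else []
    if not options:
        # Complete selection, or an optional listbox with no options: emit one row.
        yield {"selection": " / ".join(chosen), **read_details(chosen)}
        return
    for option in options:
        yield from walk_listboxes(options_for, depth, read_details, chosen + (option,))


# Hypothetical option tree (Shaft Type -> Flex); values are invented.
OPTIONS: Dict[Selection, List[str]] = {
    (): ["Steel", "Graphite"],
    ("Steel",): ["Regular", "Stiff"],
    ("Graphite",): [],  # no Flex listbox for this shaft type
}


def fake_details(chosen: Selection) -> Dict[str, str]:
    # Placeholder for reading the price and SKU# shown after the selection.
    return {"sku": "SKU-" + "-".join(chosen)}


for row in walk_listboxes(lambda c: OPTIONS.get(c, []), 2, fake_details):
    print(row)
```

Because the details are read only at the leaves, each output row corresponds to exactly one complete (or partially optional) combination of listbox values.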

5/20/2010

Hi
 
I want to extract some info from:
http://www.golfonline.co.uk/taylormade-r9-supertri-tp-driver-2010-p-6719.html, http://www.golfonline.co.uk/yes-golf-stealex-callie-putter-shaft-weighting-system-p-6840.html and
http://www.golfonline.co.uk/ben-sayers-ladies-mx4-golf-set-graphite-shaft-p-6808.html 
 
I need only info from dependent listboxes, under the Available Options:
Options
Bag
Colours
Loft
Flex
Shaft Options
Length
Weight
 
Can you send me a demo?
Regards
George

5/20/2010

Sequentum Support

George, please submit a separate request for a demo project here:
 

  • Very user friendly visual project designer.
  • Extract complete data structures, such as product catalogues.
  • Repeatedly submit forms for all possible input values.
  • Extract data from highly dynamic web sites including AJAX web sites.
  • Web data extraction scheduler with email notifications and logging.
  • Custom post-processing and comprehensive API.
  • Only $299 including 1 year maintenance.

© 2009-2010 Sequentum