Custom XPath functions in web scraping projects
Sequentum Support
#1 Posted : Thursday, September 09, 2010 5:46:14 AM

XPaths are used to select HTML elements on a web page. XPath syntax is very flexible, but sometimes you may need to create a custom XPath function to get the exact selection you want.

Custom XPath functions are written as C# or VB.NET scripts. They are only available when you manually edit the selection XPath of a content element or template.

This example shows how to create an XPath that selects all HTML elements whose text matches a Visual Web Ripper Input Parameter.

Step 1

Add the input parameter that will contain the text to search for.



Step 2

Create the custom XPath function that will compare the current node text with the Input Parameter.

Custom XPath functions are created and edited on the Advanced options tab, and they are shared across the entire project.



Custom XPath functions can have any number of parameters. The WrXpathArguments parameter is optional, but if you include it, it must be the first parameter. The return type does not have to be a Boolean; it could be an Integer, for example.

Code:
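using System;
using mshtml;
using VisualWebRipper;
public class Script
{
    // Returns true when tagText equals the "tag_text" input parameter, ignoring case.
    // Visual Web Ripper supplies args; tagText is the value passed in the XPath.
    public static bool SelectTagByText(WrXpathArguments args, string tagText)
    {
        try
        {
            // Only compare when the input parameter has been defined.
            if(args.InputParameters.Contains("tag_text"))
                return args.InputParameters["tag_text"].Equals(tagText,
                    StringComparison.InvariantCultureIgnoreCase);
            else
                return false;
        }
        catch(Exception exp)
        {
            // Log the error and treat the node as not matching.
            args.WriteDebug(exp.Message);
            return false;
        }
    }
}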
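
Step 2 mentions that a custom XPath function may return an Integer instead of a Boolean. The following is only a minimal sketch of what that could look like. The function name TextLength is hypothetical and not part of Visual Web Ripper, and the empty WrXpathArguments class is a stand-in added so the sketch compiles on its own.

```csharp
using System;

// Stand-in so this sketch compiles on its own (an assumption for illustration).
// In a real project, delete this class and add "using VisualWebRipper;" instead.
public class WrXpathArguments { }

public class Script
{
    // Hypothetical function: returns the number of characters in the node text,
    // not counting leading and trailing whitespace. Returns 0 for null text.
    public static int TextLength(WrXpathArguments args, string tagText)
    {
        if (tagText == null)
            return 0;
        return tagText.Trim().Length;
    }
}
```

Assuming integer results can be compared inside an XPath predicate, such a function might be used as //*[TextLength(.) > 20]. That usage has not been verified against Visual Web Ripper.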


Step 3

Add a content element and set the XPath manually to:

//*[SelectTagByText(.)]

This XPath selects every HTML element on the web page whose inner text equals the value of the input parameter "tag_text". The comparison ignores case.
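
The function in Step 2 compares with Equals, so an element is selected only when its whole text is identical to the parameter value apart from case. If substring matching is wanted instead, the comparison could be swapped for something like the sketch below. The class and method names (TextMatch, ContainsIgnoreCase) are hypothetical helpers, not part of Visual Web Ripper.

```csharp
using System;

public static class TextMatch
{
    // Hypothetical helper: true when nodeText contains searchText, ignoring case.
    // Null text or an empty search string is treated as "no match".
    public static bool ContainsIgnoreCase(string nodeText, string searchText)
    {
        if (nodeText == null || string.IsNullOrEmpty(searchText))
            return false;
        return nodeText.IndexOf(searchText,
            StringComparison.InvariantCultureIgnoreCase) >= 0;
    }
}
```

Inside SelectTagByText, the Equals call could then be replaced with a call that passes tagText and args.InputParameters["tag_text"] to this helper. Be aware that with //* a substring match is also likely to select ancestor elements, since their inner text includes the text of the elements they contain.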

Note that Visual Web Ripper supplies the WrXpathArguments argument automatically, so you do not specify it when you call the custom XPath function. In the XPath above, the "." is passed as the tagText parameter.

Please see this post for more information about Input Parameters:

http://www.visualwebripp...-scraping-projects.aspx

File Attachment(s):
xpathDemo.rip (26kb) downloaded 98 time(s).