Scanning Websites for Presence of Analytics JS code
jkatinger
#1 Posted : Friday, July 16, 2010 10:28:43 AM
Groups: Registered
Joined: 7/16/2010
Posts: 4
Location: USA
Hi, I'm trying to use your product to scan my client websites and look at the SOURCE of each page to detect the presence of web analytics code. I want to tell your tool to crawl all pages of www.examplesite.com and look through the source code for an instance of "/media/js/hbx_parameters.js" (or any other string I define). The result should be a report, CSV preferably, where each row is a page URL and column 2 would be "Yes" or "No" for the presence of the string I fed it.

Additionally, if I could give the tool a JS variable name and have it return the value it finds, instead of just Yes or No, that would be even better. For example, I tell it to scan www.examplesite.com for ['pageTracker._setAccount', 'XXXXXXXX'] and it would tell me whether it finds anything, and if it does, it would give me the value of XXXXXXXX.
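To make the request concrete, here is a rough Python sketch of the check described above. It is not a Visual Web Ripper feature. It assumes the page source is already available as a string, and the function names, the column names, and the regular expression for `_setAccount` are all illustrative assumptions.

```python
import csv
import io
import re

# Assumed pattern: matches both pageTracker._setAccount("...") and
# ['_setAccount', '...'] style calls, capturing the quoted value.
SET_ACCOUNT_RE = re.compile(r"""_setAccount['"]?\s*[,(]\s*['"]([^'"]+)['"]""")


def contains_marker(html, needle):
    """Return "Yes" if the literal string appears in the page source."""
    return "Yes" if needle in html else "No"


def extract_set_account(html):
    """Return the value passed to _setAccount, or "" if none is found."""
    match = SET_ACCOUNT_RE.search(html)
    return match.group(1) if match else ""


def build_report(pages, needle):
    """Build CSV text with one row per page: URL, Yes/No, account value.

    `pages` maps each URL to its already-downloaded HTML source.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "marker_found", "account"])
    for url, html in pages.items():
        writer.writerow(
            [url, contains_marker(html, needle), extract_set_account(html)]
        )
    return out.getvalue()
```

Calling `build_report({"http://www.examplesite.com/": source}, "/media/js/hbx_parameters.js")` would give one row per page in the shape described: URL first, then Yes or No, then whatever value was found.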

Hope this makes sense?
Sequentum Support
#2 Posted : Friday, July 16, 2010 10:22:31 PM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
Visual Web Ripper is not a web spider and it cannot scan a website. It follows navigation and extraction patterns you define in a data extraction project. So basically, you need to tell Visual Web Ripper how to navigate a website. It can't just scan all web pages.

It may be easy enough to get Visual Web Ripper to navigate the complete website (depending on the complexity of the website), and then you could do what you want. I'm not sure if www.examplesite.com is the actual website you want to target, because it doesn't seem to contain any instances of "/media/js/hbx_parameters.js".
jkatinger
#3 Posted : Sunday, July 18, 2010 7:36:05 PM
Groups: Registered
Joined: 7/16/2010
Posts: 4
Location: USA
The site I'm actually interested in analyzing is www.webcpa.com. But it sounds like this isn't what your tool does. What if I had a series of URLs in a CSV file that I could instruct your tool to open and check for the presence of JS?
Sequentum Support
#4 Posted : Sunday, July 18, 2010 7:44:21 PM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
That should work. If you attach a short list of real URLs I can show you how this can be done.
jkatinger
#5 Posted : Sunday, July 18, 2010 8:33:48 PM
Groups: Registered
Joined: 7/16/2010
Posts: 4
Location: USA
How about a massive list of URLs? :) See attached; cut down the list as you see fit. The object I'm looking for in the source code of each page is the Google Analytics account number UA-219761-62.
File Attachment(s):
toppages-subdomain-www.webcpa.com.xls (1,109kb) downloaded 39 time(s).
Sequentum Support
#6 Posted : Sunday, July 18, 2010 8:59:22 PM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
The input file must be a CSV file and not an Excel file, but it can be as big as you want.

I've attached a demo project, an input file, and a sample data extract. The demo project extracts any analytics account number found on the page (it doesn't need to be UA-219761-62), or an empty value if no account number is found.

File Attachment(s):
Webcpa.zip (108kb) downloaded 44 time(s).
jkatinger
#7 Posted : Sunday, July 18, 2010 10:40:32 PM
Groups: Registered
Joined: 7/16/2010
Posts: 4
Location: USA
That definitely seems to do the trick. However, I have no idea how you did that. And when I try to change the account number, or feed it something different to look for in that list of URLs, it doesn't seem to "take." Any chance I could get a quick phone/Skype demo of how to modify this report?
Sequentum Support
#8 Posted : Sunday, July 18, 2010 11:11:06 PM

Groups: Administrators
Joined: 4/10/2010
Posts: 1,239
Location: Sydney, Australia
Here is how you do it:

1. Set the input data source. Click the "Input Data Source" toolbar button and select your input CSV file.

2. Click the toolbar button "Project options" and open the "Start URLs" tab.

3. Select the radio button "Feed URLs from input data source" and then close the project options window.

4. Add a content element of type PageAttribute, and select HTML in the Element options tab to the right.

5. While still editing the content element, click the More options tab and find the "Content transformation" option, then click "Click to edit script".

6. Enter the regex script used to find the analytics JavaScript match. You can change this regex, but you need to know regex in order to do this. See http://www.regular-expre...ons.info/reference.html for a regex reference.

7. Save the new content element, and you're ready to run the project.
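For anyone who wants to see the same idea outside the tool, the workflow above (feed URLs from a CSV, grab each page's HTML, apply a regex, write out the match or an empty value) can be sketched in Python. This is not the script from the demo project, which is not reproduced in this thread. The regex for the account number format and the names `UA_RE`, `fetch_html`, `find_account`, and `scan_urls` are assumptions made for illustration.

```python
import csv
import io
import re
import urllib.request

# Assumed shape of a Google Analytics account number, e.g. UA-219761-62.
UA_RE = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def fetch_html(url, timeout=10):
    """Download a page and return its source as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_account(html):
    """Return the first account number in the source, or "" if none."""
    match = UA_RE.search(html)
    return match.group(0) if match else ""


def scan_urls(input_csv, fetch=fetch_html):
    """Read one URL per row from CSV text and return CSV text of results.

    `fetch` can be swapped out, for example to read saved pages from disk.
    """
    reader = csv.reader(io.StringIO(input_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "account"])
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank rows
        url = row[0].strip()
        try:
            html = fetch(url)
        except OSError:
            html = ""  # unreachable page: report an empty value
        writer.writerow([url, find_account(html)])
    return out.getvalue()
```

To look for something other than an account number, only the pattern in `UA_RE` needs to change, which mirrors step 6 above: the regex is the one piece you edit.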