Data Mining

This is a discussion on "Data Mining" within the Classic ASP section. This forum and the thread "Data Mining" are both part of the Program Your Website category.


 Subscribe in a reader

Go Back   Webforumz.com > Main Forums > Program Your Website > Classic ASP

Notices




Closed Thread
 
LinkBack Thread Tools
  #1  
Old Mar 15th, 2005, 15:21
New Member
Join Date: Mar 2005
Posts: 8
Thanks: 0
Thanked 0 Times in 0 Posts
Data Mining

Good Day,

I'm looking for some tips on code to prevent data miners from data mining my site...

Currently I have an idea like this...
Code: Select all
   Const iTimeToBrowse = 120   ' Window length in seconds
   Const iMaxRequests  = 100   ' Maximum number of hits per window
   Const iBanTime      = 60    ' Prevent access for this many seconds

   ' Log RemoteIP, Accesses, lastAccessTime
   ' If Now() < DateAdd("s", iTimeToBrowse, lastAccessTime) Then
   '     Log RemoteIP, Accesses + 1, lastAccessTime (previous)
   '     If Accesses > iMaxRequests Then
   '          Response.Status = "503 Service Unavailable"   ' ban code in here
   '          ' make it exit all future routines
   '     End If
   ' End If
(First, the idea and about 1/6 of the code come from a PHP/Perl? script I found on another forum this past weekend, at webmasterworld? or some place...)

Also, the code is just pseudo code, as I'm sleepy right now =)
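
For reference, here is one way the pseudo code above could be fleshed out in Classic ASP. It is only a sketch under assumptions: the Application key suffixes ("Start", "Hits", "BanUntil") are made-up names, the state lives in the Application object of a single server, and entries for old IPs are never cleaned up. It counts hits per IP in a fixed window of iTimeToBrowse seconds and refuses requests for iBanTime seconds once iMaxRequests is exceeded.

```vbscript
<%
   Const iTimeToBrowse = 120   ' Window length in seconds
   Const iMaxRequests  = 100   ' Maximum number of hits per window
   Const iBanTime      = 60    ' Prevent access for this many seconds

   ' The key suffixes "Start", "Hits" and "BanUntil" are hypothetical
   ' names chosen for this sketch.
   Dim sIP, dNow, bBanned, bNewWindow
   sIP = Request.ServerVariables("REMOTE_ADDR")
   dNow = Now()
   bBanned = False

   Application.Lock
   ' Still inside an earlier ban?
   If IsDate(Application(sIP & "BanUntil")) Then
      bBanned = (dNow < Application(sIP & "BanUntil"))
   End If

   If Not bBanned Then
      ' A new window starts on the first hit or when the old one expired
      bNewWindow = True
      If IsDate(Application(sIP & "Start")) Then
         bNewWindow = (dNow >= DateAdd("s", iTimeToBrowse, Application(sIP & "Start")))
      End If

      If bNewWindow Then
         Application(sIP & "Start") = dNow
         Application(sIP & "Hits") = 1
      Else
         Application(sIP & "Hits") = Application(sIP & "Hits") + 1
         If Application(sIP & "Hits") > iMaxRequests Then
            Application(sIP & "BanUntil") = DateAdd("s", iBanTime, dNow)
            Application(sIP & "Start") = Empty   ' Fresh window after the ban
            bBanned = True
         End If
      End If
   End If
   Application.Unlock

   If bBanned Then
      Response.Status = "503 Service Unavailable"
      Response.Write "<html><body><p>Too many requests. Please try again later.</p></body></html>"
      Response.End   ' Nothing after this line runs for this request
   End If
%>
```

Because the whole read-modify-write happens between Application.Lock and Application.Unlock, two simultaneous requests from the same IP cannot both read the old hit count. The trade-off is that every request briefly serializes on that lock.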

Does anybody have any ideas on how to prevent data mining without creating an even heavier load on the server(s) while trying to prevent it?

  #2  
Old Mar 16th, 2005, 14:36
New Member
Join Date: Mar 2005
Posts: 8
Thanks: 0
Thanked 0 Times in 0 Posts
OK, here is the code I came up with last night (based on a PHP version; I will try to get a link, but it will take some time...) -- I didn't write this from scratch; I converted some PHP code to ASP... (AKA I'm not trying to rip off someone else's work, I just can't find it at the moment =( )

Code: Select all
   <%
      Const cTimeRange   = 5    ' Seconds added to the clear time per hit
      Const cHitLimit    = 40   ' Limit of max hits
      Const cPenaltyTime = 60   ' Penalty in seconds

      Dim sIP, dTime, dFuture
      sIP = Request.ServerVariables("REMOTE_ADDR")
      dTime = Now()

      ' Hold the lock for the whole read-modify-write so that concurrent
      ' requests from the same IP cannot interleave their updates
      Application.Lock
      If IsEmpty(Application(sIP)) Then
         ' First attempt at site
         Application(sIP) = dTime
         Application(sIP & "Count") = 0
      Else
         ' N'th attempt at site
         Application(sIP & "Count") = CLng(Application(sIP & "Count")) + 1
         If Application(sIP) < dTime Then
            ' It's been a while, clear them
            Application(sIP) = dTime
            Application(sIP & "Count") = 0
         End If
         dFuture = DateAdd("s", cTimeRange, Application(sIP))

         If dFuture >= DateAdd("s", cTimeRange * cHitLimit, dTime) Then
            ' Penalize them!
            Application(sIP) = DateAdd("s", cTimeRange * (cHitLimit - 1) + cPenaltyTime, dTime)
            Application.Unlock
            Response.Status = "503 Service Unavailable"
            Response.Write "<html><body><h1>Server under heavy load</h1><p>"
            Response.Write "Please wait " & cPenaltyTime & " seconds and try again</p></body></html>"
            Response.End
         Else
            Application(sIP) = dFuture
         End If
      End If
      Application.Unlock
   %>
   Time to Clear <%= Application(sIP) %>

   Count <%= Application(sIP & "Count") %>
Also note that the time delays need to be adjusted. The values above are the ones I used for testing...

Anyone have ideas or suggestions on this code? Good, bad, or otherwise?
(I also have it posted here: http://forums.aspfree.com/t51140/s.html, as no one seems to be worried about miners (or has problems with them)...)
  #3  
Old Mar 16th, 2005, 16:03
Highly Reputable Member
Join Date: Jul 2003
Location: Ipswich, UK
Posts: 690
Thanks: 0
Thanked 0 Times in 0 Posts
hi, can you explain more about what you are trying to stop people from doing?
  #4  
Old Mar 16th, 2005, 16:31
New Member
Join Date: Mar 2005
Posts: 8
Thanks: 0
Thanked 0 Times in 0 Posts
What am I trying to stop?

Currently we house court information... Our searching ability allows for a LOT of play (... it needs fixing, I know ...)

People do a search like: all information of this type from 1/1/1990 to 1/1/1991
And then, with those results, they loop through each one until they have all the information that we have... (Could be 0 or 10,000+)

In itself it's not that bad: we don't want them doing it, but we can't actually stop them. The problem is that some of these people are in a hurry, and when they are, the searches they perform can be rather taxing on the servers and cause a "500 Internal Server Error" for ANYONE who accesses our site at that point, because the servers are busy responding to those people's data mining operations.

I can't base it on the User-Agent, as it's generic IE and, well, who doesn't use IE? =)

We get about 21-24 million hits per month (900,000ish per day) and our servers do great EXCEPT when the miners run wild...

All I need is to throttle the miners so they don't hose my servers at this point in time.
  #5  
Old Mar 16th, 2005, 17:31
Most Reputable Member
Join Date: Jul 2003
Posts: 1,856
Thanks: 0
Thanked 0 Times in 0 Posts
Could you not use a session or cookie to check the last time a visitor accessed a page on your site? You could add in a minimum amount of time between requests...say... 1 second maybe 2?

As this is based on a search and searches are taxing you could make the minimum time slightly longer and regular visitors wouldn't mind...
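
A minimal sketch of that minimum-gap idea in Classic ASP, assuming sessions are enabled and the client returns the ASP session cookie. The constant cMinGapSecs and the session key "LastHit" are illustrative names, not taken from any existing code. Keep in mind that ASP sessions are tracked per browser through a cookie, not per IP, so a client that discards cookies gets a new session on every request and would never be slowed down by this check.

```vbscript
<%
   ' cMinGapSecs and the "LastHit" session key are hypothetical names
   Const cMinGapSecs = 2   ' Minimum seconds between requests per visitor

   Dim dLastHit
   dLastHit = Session("LastHit")   ' Empty on the first request
   Session("LastHit") = Now()      ' Record this request, allowed or not

   If IsDate(dLastHit) Then
      If DateDiff("s", dLastHit, Now()) < cMinGapSecs Then
         Response.Status = "503 Service Unavailable"
         Response.AddHeader "Retry-After", CStr(cMinGapSecs)
         Response.Write "<html><body><p>Please wait a moment between searches.</p></body></html>"
         Response.End
      End If
   End If
%>
```

Since the timestamp is updated even for refused requests, a client that keeps hammering stays blocked until it actually pauses for cMinGapSecs seconds.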
  #6  
Old Mar 16th, 2005, 18:02
New Member
Join Date: Mar 2005
Posts: 8
Thanks: 0
Thanked 0 Times in 0 Posts
Session:
We don't really use sessions because we have a cluster (MS sessions all have problems with clusters) -- we simulate sessions through a database, BUT everyone shares the same session... Complicated to explain, but there is a reason behind it...
Also, I am not sure the server is seeing sessions from the miners (not sure how to tell, to be honest...). I do know that MOST of the miners have no referrers (not all, but most).

Cookie:
Cookies can be easily avoided, so the check should remain server side (I myself disable cookies on a few domains)

Application:
I know this is a PER Server thing (like session) but it is more global so if by chance they have 4 browsers all hitting the server (or simulation of 4 browsers) it will still catch them...

SQL Server:
The only way I can figure to keep all servers in sync is to make them log to a Database but I'm not sure I really want the extra traffic to my database and it will make me have to run extra tasks to keep the database fairly clean (I won't need data longer than 1 day so why keep it when I have it logged in the IIS Logs)
I really don't know how performance would be with this kind of thing... (It is possible I will go this way though; just seems like overkill)

Files:
This could be a mess - save IPs into text files? Ouch...
1: Clean up of unused files
2: TONS of files in 1 directory (or series of dirs (more load))
3: Server Space -- Most data is in the DB, so space sometimes becomes an issue (when they forget to archive the logs)

Files would be OK for a small site without so many users (5000ish?)
Cookies would be ok if we required cookies to get the info
Sessions I am unsure of
SQL I am unsure of
Application - the only other option I see?
(Memory shouldn't be an issue... that I can see =)

How does IIS handle Sessions? Is it per browser or per IP? If per browser, it will be over once the miner figures that out... IPs will be rather difficult for a few of the miners to change, as they are registered specifically to them...

My biggest thing is to implement this WITHOUT affecting other ASP Code so it will either be an include file to call a routine or another COM Object before the one already there for the primary script.
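
A rough sketch of the include-file packaging described above. The file path, the routine name ThrottleCheck, and the "BanUntil" key suffix are all hypothetical, and the ban test shown is only a stand-in: some other code is assumed to store a ban expiry date in the Application object when it decides to ban an IP.

```vbscript
<%
   ' File: /includes/throttle.asp  (hypothetical path)
   ' ThrottleCheck and the "BanUntil" key suffix are illustrative names.
   ' Some other code is assumed to store a ban expiry date under
   ' Application(<ip> & "BanUntil") when it decides to ban an IP.
   Sub ThrottleCheck()
      Dim sIP, dUntil
      sIP = Request.ServerVariables("REMOTE_ADDR")
      dUntil = Application(sIP & "BanUntil")   ' Empty if never banned

      If IsDate(dUntil) Then
         If Now() < dUntil Then
            Response.Status = "503 Service Unavailable"
            Response.Write "<html><body><p>Please try again later.</p></body></html>"
            Response.End   ' Stops the page here; the primary script never runs
         End If
      End If
   End Sub
%>
```

The existing page would then need only two extra lines at the very top, above all other work:

```vbscript
<!--#include virtual="/includes/throttle.asp"-->
<% ThrottleCheck %>
```

Response.End inside the routine ends processing of the whole request, so nothing in the existing script has to know about the check.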

Don't get me wrong, anyone can still sell me on another idea, but I need to see the Whys of another method or how mine fails...

Quote:
As this is based on a search and searches are taxing you could make the minimum time slightly longer and regular visitors wouldn't mind...
This I do agree with =)
  #7  
Old Mar 16th, 2005, 23:07
spinal007's Avatar
Moderator
Join Date: Mar 2004
Location: Good Ol'London
Age: 23
Posts: 1,669
Blog Entries: 1
Thanks: 1
Thanked 4 Times in 4 Posts
micah:
Because your method only tracks the user by their IP and doesn't take the page address into account, won't it stop you from using Response.Redirect?
How do you work around it, or do you just not use it?
  #8  
Old Mar 17th, 2005, 08:58
Highly Reputable Member
Join Date: Jul 2003
Location: Ipswich, UK
Posts: 690
Thanks: 0
Thanked 0 Times in 0 Posts
You should make your users register before they can use your site. Then, if they want to search, they have to log in, and you can count the searches per logged-in user and limit each user to a certain number of searches per day/hour/minute.

Or

Search cache - store the results of similar searches as flat HTML pages so that the number of hits on your database decreases. Or I think SQL Server has something built in for caching queries; I'm not sure, you'll have to search Google.

Also, ASP.NET has a lot of data caching features that mean less pressure on your database. Maybe it's time to upgrade?
  #9  
Old Mar 17th, 2005, 15:33
Most Reputable Member
Join Date: Jul 2003
Posts: 1,856
Thanks: 0
Thanked 0 Times in 0 Posts
You might think of what you're trying to prevent as a small scale DoS attack. Any of the resources on preventing those should apply:

http://www.google.com/search?en&q=pr...ng+dos+attacks
  #10  
Old Mar 18th, 2005, 12:33
New Member
Join Date: Mar 2005
Posts: 8
Thanks: 0
Thanked 0 Times in 0 Posts
Make the users register... I like it but my bosses would freak! =) hehe
(Lots of ticked-off users too... Bad ATM... We plan to offer some paid services one day, we just haven't been able to find the time to work on it.)

Caching searches could make sense in one respect... (The current code really doesn't allow this (explaining why would be a possible security risk))

But this wouldn't help with the miners at all, as we have tons of information...
(I have no statistics for you, but I know the volume is really high and it is all live data) ---
We have 12 different Databases and each database may have 1 year to 10 years online... (doesn't sound so bad until you start thinking of individual cases)

Also, I know the miner will have multiple page requests within the same second...

A Reminder (I think I posted it above) - I cannot modify code at this time =( -- I can create a new function on the ASP Side, but I can't modify any of the code used to process data...

... Oh, thought of a change to my previous code...
Code: Select all
      ' Penalize them!
In this block, instead of sending a 503 error, send them one of those images with text and make them type it in to continue... Possibly, each time they get the image, make the background of the image darker (In case they have OCR Software)

This might be better, as then the server says it's up and forces a user to type in a code to continue... It should only affect the miners and not agitate the normal users... (May need to explain the reason for it on the page though =)
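
A sketch of how the penalty branch could hand off to a challenge page instead of returning a 503. Everything named here is an assumption for illustration: the routine SendToChallenge, the page /challenge.asp, its "return" parameter, and the "NeedsChallenge" key suffix. Generating and checking the image itself is not shown.

```vbscript
<%
   ' Hypothetical names: SendToChallenge, "/challenge.asp", the "return"
   ' parameter and the "NeedsChallenge" key suffix are illustrative only.
   Sub SendToChallenge(sIP)
      Dim sReturnTo

      ' Remember that this IP must pass the challenge before continuing
      Application.Lock
      Application(sIP & "NeedsChallenge") = True
      Application.Unlock

      ' Rebuild the requested address so the challenge page can send
      ' the visitor back after a correct answer
      sReturnTo = Request.ServerVariables("URL")
      If Len(Request.ServerVariables("QUERY_STRING")) > 0 Then
         sReturnTo = sReturnTo & "?" & Request.ServerVariables("QUERY_STRING")
      End If

      Response.Redirect "/challenge.asp?return=" & Server.URLEncode(sReturnTo)
   End Sub
%>
```

Inside the penalty branch, the lines that send the 503 response would be replaced by a call such as `SendToChallenge sIP`. The challenge page must not run the throttle check itself, or the redirect would loop, and it should only accept local paths in the "return" value before sending the visitor back.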

Catalyst -> Thanks, I will look into your search results =)
Everyone else -> Thanks =)

I'm still game for ideas... I just can't change code... The reason my code above will work is that it only changes one script (2 lines of code within the script, and they go above all other work)

The ideas you have would be really useful later on (When I'm allowed to change code - or rewrite more like it)
  #11  
Old Mar 18th, 2005, 13:21
spinal007's Avatar
Moderator
Join Date: Mar 2004
Location: Good Ol'London
Age: 23
Posts: 1,669
Blog Entries: 1
Thanks: 1
Thanked 4 Times in 4 Posts
Quote:
send them one of those images with text and make them type it in to continue...
Sounds like a good idea, and very professional.
I'd be impressed if I saw that on a website (of course I wouldn't, coz I'm no risk...) 8)
  #12  
Old Mar 18th, 2005, 13:27
Highly Reputable Member
Join Date: Jul 2003
Location: Ipswich, UK
Posts: 690
Thanks: 0
Thanked 0 Times in 0 Posts
http://www.captcha.net/
  #13  
Old Mar 18th, 2005, 14:22
New Member
Join Date: Mar 2005
Posts: 8
Thanks: 0
Thanked 0 Times in 0 Posts
So you all like the image idea better than the temporary ban idea?

(I like it better to be honest.)
© 2003-2008 Webforumz.com : All Rights Reserved
