This is a discussion on "Data Mining" within the Classic ASP section. This forum and the thread "Data Mining" are both part of the Program Your Website category.
Data Mining
#1
Data Mining
Good Day,
I'm looking for some tips on code to prevent data miners from data mining my site... Currently I have an idea like this...
Also, the code is just pseudocode as I'm sleepy right now =) Does anybody have any ideas on how to prevent data mining without creating an even heavier load on the server(s) while trying to prevent it?
#2
OK, here is the code I came up with last night. It is based on a PHP version (I will try to get a link, but it will take some time...). I didn't write this from scratch; I converted some PHP code to ASP. (In other words, I'm not trying to rip off someone else's work; I just can't find it at the moment =( )
Anyone have ideas or suggestions on this code? Good, bad or otherwise? (I also have it posted here: http://forums.aspfree.com/t51140/s.html, as no one seems to be worried about miners, or no one has problems with them...)
#3
Hi, can you explain more about what you are trying to stop people from doing?
#4
What am I trying to stop?
Currently we house court information. Our searching ability allows for a LOT of play (it needs fixing, I know). People do a search like: all information of this type from 1/1/1990 to 1/1/1991. Then, with those results, they loop through each one until they have all the information that we have (could be 0 or 10,000+ records).

In itself that's not so bad: we don't want them doing it, but we can't actually stop them. The problem is that some of these people are in a hurry, and when they are in a hurry the searches they perform can be rather taxing on the servers and cause a "500 Internal Server Error" for ANYONE who accesses our site at that point, because the servers are busy responding to the data mining operations.

I can't base it on the User-Agent as it's generic IE and, well, who doesn't use IE? =)

We get about 21-24 million hits per month (900,000ish per day) and our servers do great EXCEPT when the miners run wild... All I need is to throttle the miners so they don't hose my servers at this point in time.
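One way to picture the kind of per-IP throttle asked for here is a sliding-window counter. The following is only a minimal sketch in Python for illustration (the thread's own ASP code is not shown here); the class name `IpThrottle`, the limit of 10 hits and the 1-second window are all assumptions, not values from the site.

```python
import time
from collections import defaultdict, deque


class IpThrottle:
    """Allow at most max_hits requests per IP within window_seconds.

    Hypothetical helper: the limits are illustrative guesses.
    """

    def __init__(self, max_hits=10, window_seconds=1.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # over the limit: caller should delay or reject
        recent.append(now)
        return True
```

Because this state lives in one process, it would behave like a per-server counter on a cluster, and the dictionary would need occasional pruning of idle IPs.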
#5
Could you not use a session or cookie to check the last time a visitor accessed a page on your site? You could add in a minimum amount of time between requests...say... 1 second maybe 2?
As this is based on a search, and searches are taxing, you could make the minimum time slightly longer and regular visitors wouldn't mind...
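The minimum-time-between-requests idea could look roughly like the sketch below. It is written in Python purely for illustration; `MinIntervalGate` and `visitor_key` are hypothetical names, and the key could be a session ID, a cookie value or an IP address depending on what the site can rely on.

```python
import time


class MinIntervalGate:
    """Require at least min_interval seconds between accepted requests."""

    def __init__(self, min_interval=2.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_seen = {}  # visitor_key -> time of last accepted request

    def seconds_to_wait(self, visitor_key):
        """Return 0.0 if the request may proceed, else the remaining wait."""
        now = self.clock()
        last = self.last_seen.get(visitor_key)
        if last is not None and now - last < self.min_interval:
            return self.min_interval - (now - last)
        # Only accepted requests move the timestamp forward.
        self.last_seen[visitor_key] = now
        return 0.0
```

A page could show a "please wait" message whenever the returned value is greater than zero.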
#6
Session: We don't really use sessions because we have a cluster (MS sessions all have problems with clusters). We simulate sessions through a database, BUT everyone shares the same session... Complicated to explain, but there is a reason behind it. Also, I am not sure the server is seeing sessions on the miners (not sure how to tell, to be honest). I do know that MOST of the miners have no referrers (not all, but most).

Cookie: This can be easily avoided, so the check should remain server side (I myself disable cookies on a few domains).

Application: I know this is a PER-server thing (like Session), but it is more global, so if by chance they have 4 browsers all hitting the server (or a simulation of 4 browsers) it will still catch them...

SQL Server: The only way I can figure to keep all servers in sync is to make them log to a database, but I'm not sure I really want the extra traffic to my database, and I would have to run extra tasks to keep the database fairly clean (I won't need data older than 1 day, so why keep it when I have it logged in the IIS logs?). I really don't know how performance would be with this kind of thing... (It is possible I will go this way though; it just seems like overkill.)

Files: This could be a mess. Save IPs into text files? Ouch...
1: Cleanup of unused files
2: TONS of files in 1 directory (or a series of dirs, which means more load)
3: Server space -- most data is in the DB, so space sometimes becomes an issue (when they forget to archive the logs)

Files would be OK for a small site with not so many users (5000ish?)
Cookies would be OK if we required cookies to get the info
Sessions I am unsure of
SQL I am unsure of
Application - the only other option I see? (Memory shouldn't be an issue... that I can see =)

How does IIS handle sessions? Isn't it per browser or per IP? If per browser, when the miner figures that out it will be over... IPs for a few miners will be rather difficult for them to change as they are registered specifically to them...

My biggest thing is to implement this WITHOUT affecting other ASP code, so it will either be an include file that calls a routine or another COM object placed before the one already there for the primary script. Don't get me wrong, anyone can still sell me on another idea, but I need to see the whys of another method or how mine fails...
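For the SQL Server option, the shape of the idea (every server logs hits to one shared table, and a cleanup task drops anything older than a day) can be sketched as follows. This is an illustration only: it uses Python with SQLite as a stand-in for SQL Server, and the table name `hits`, the function names and the 60-second window are assumptions.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open the shared hit log (SQLite stands in for the real database)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS hits (ip TEXT NOT NULL, ts REAL NOT NULL)")
    db.execute("CREATE INDEX IF NOT EXISTS hits_ip_ts ON hits (ip, ts)")
    return db


def record_and_count(db, ip, now, window_seconds=60.0):
    """Log one hit and return how many hits this IP made within the window."""
    db.execute("INSERT INTO hits (ip, ts) VALUES (?, ?)", (ip, now))
    row = db.execute(
        "SELECT COUNT(*) FROM hits WHERE ip = ? AND ts > ?",
        (ip, now - window_seconds),
    ).fetchone()
    db.commit()
    return row[0]


def purge_old(db, now, max_age_seconds=86400.0):
    """Delete hits older than max_age_seconds; return the number removed."""
    cur = db.execute("DELETE FROM hits WHERE ts <= ?", (now - max_age_seconds,))
    db.commit()
    return cur.rowcount
```

The cost being weighed in the post is visible here: each page request adds one insert and one indexed count, plus a periodic purge.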
#7
micah:
Because your method only tracks the user by their IP and doesn't take the page address into account, won't it stop you from using Response.Redirect? How do you work around it, or do you just not use it?
#8
You should make your users register before they can use your site. Then, if they want to search, they have to log in, and you can count the searches per logged-in user and limit each user to a certain number of searches per day/hour/minute.
Or use a search cache: store the results of similar searches as flat HTML pages so that the number of hits to your database is decreased. I think SQL Server also has something built in for caching queries; I'm not sure, you'll have to search Google. ASP.NET also has a lot of data caching features, which means less pressure on your database. Maybe it's time to upgrade?
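The search-cache suggestion amounts to keying stored results on the search parameters and reusing them for a while. A minimal sketch, in Python for illustration only: `SearchCache`, the 300-second lifetime and the `run_search` callback are hypothetical, and results are kept in memory rather than written out as flat HTML files.

```python
import time


class SearchCache:
    """Reuse results of identical searches for ttl_seconds."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # normalized params -> (stored_at, result)

    def get_or_run(self, params, run_search):
        # Sort the parameters so their order does not change the key.
        key = tuple(sorted(params.items()))
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: skip the database
        result = run_search(params)
        self.entries[key] = (now, result)
        return result
```

This only helps when many visitors repeat the same search; it does nothing for a crawler that walks through distinct records one by one.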
#9
You might think of what you're trying to prevent as a small scale DoS attack. Any of the resources on preventing those should apply:
http://www.google.com/search?en&q=pr...ng+dos+attacks
#10
Make the users register... I like it, but my bosses would freak! =) hehe
(Lots of ticked-off users too... Bad ATM... We plan for some pay-for services one day; we just haven't been able to get the time to work on it.)

Caching searches could make sense in one respect (the current code really doesn't allow this; explaining why would be a possible security risk). But it wouldn't help with the miners at all, as we have tons of information. (I have no statistics for you, but I know it is really high, and all live data.) We have 12 different databases and each database may have 1 to 10 years online (doesn't sound so bad until you start thinking of individual cases). Also, I know the miners will make multiple page requests within the same second...

A reminder (I think I posted it above): I cannot modify code at this time =( I can create a new function on the ASP side, but I can't modify any of the code used to process data...

Oh, I thought of a change to my previous code...
This might be better, as then the server says it's up and forces a user to type in a code to continue... It should only affect the miners and not agitate the normal users. (I may need to explain the reason for it on the page though =)

Catalyst -> Thanks, I will look into your search results =)
Everyone else -> Thanks =) I'm still open to ideas... I just can't change code. The reason my code above will work is that it only changes a script (2 lines of code within the script, placed above all other work). The ideas you have would be really useful later on (when I'm allowed to change code, or rewrite it, more like).
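The "type in a code to continue" variant could be modelled as below. This is a hedged sketch in Python, not the ASP change described in the post: `ChallengeGate` and its methods are hypothetical, it assumes the challenge is tied to the client IP, and how the code is displayed to the visitor is left out.

```python
import secrets


class ChallengeGate:
    """Hold a flagged IP until it types back a one-time code."""

    def __init__(self):
        self.pending = {}  # ip -> code the visitor must type back

    def challenge(self, ip):
        """Flag an IP (e.g. after it trips a throttle) and return its code."""
        code = "%06d" % secrets.randbelow(1000000)
        self.pending[ip] = code
        return code

    def is_blocked(self, ip):
        return ip in self.pending

    def answer(self, ip, typed):
        """Clear the block if the typed code matches; otherwise keep it."""
        expected = self.pending.get(ip)
        if expected is not None and secrets.compare_digest(expected, typed):
            del self.pending[ip]
            return True
        return False
```

Normal visitors never see the challenge unless they trip the limit, which is the appeal over an outright temporary ban.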
#11
Quote:
I'd be impressed if I saw that on a website (of course I wouldn't, 'cause I'm no risk...) 8)
#12
#13
So you all like the image idea better than the temporary ban idea?
(I like it better, to be honest.)