Links a crawler should ignore

This is a discussion on "Links a crawler should ignore" within the Web Page Design section. This forum and the thread "Links a crawler should ignore" are both part of the Design Your Website category.



Oct 6th, 2006, 18:55 (GMT)
Junior Member
Join Date: Mar 2006
Age: 21
Posts: 20
Links a crawler should ignore

Hi, I have developed some code that crawls web pages looking for links. I need to filter out irrelevant links, such as those that point to CSS files, JavaScript functions and favicons; this is simple enough to achieve with regex. What other irrelevant links am I likely to find on web pages?
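A minimal Python sketch of that kind of filter, assuming the links have already been pulled out of the page as raw `href` strings. The function name `is_crawlable` and the skip rules are hypothetical: CSS, JavaScript and favicons come from the question, while the image/PDF/archive extensions and the `mailto:` and `#fragment` rules are illustrative assumptions, not an exhaustive list. Resolving each link with `urljoin` first means the scheme and extension checks work on a full URL rather than on whatever the page author happened to write.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical filter rules. CSS, JavaScript and favicons come from the
# question above; the other extensions are illustrative assumptions.
ALLOWED_SCHEMES = {"http", "https"}
SKIP_EXTENSIONS = re.compile(
    r"\.(css|js|ico|png|jpe?g|gif|svg|pdf|zip)$", re.IGNORECASE
)


def is_crawlable(href: str, base_url: str) -> bool:
    """Heuristic: True if href looks like an HTML page worth fetching."""
    href = href.strip()
    if not href or href.startswith("#"):
        return False  # empty link or same-page anchor
    # Resolve relative links against the page they were found on.
    absolute = urljoin(base_url, href)
    parsed = urlparse(absolute)
    # Drops javascript:, mailto:, data: and any other non-web scheme.
    if parsed.scheme.lower() not in ALLOWED_SCHEMES:
        return False
    # Check only the path, so "style.css?v=2" is still caught.
    if SKIP_EXTENSIONS.search(parsed.path):
        return False
    return True


base = "http://example.com/index.html"
links = ["/about.html", "style.css?v=2", "javascript:void(0)",
         "#top", "mailto:me@example.com", "/favicon.ico"]
print([link for link in links if is_crawlable(link, base)])  # prints ['/about.html']
```

Checking the parsed path instead of the raw string keeps query strings from hiding a `.css` or `.js` file.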
Also, is there a name for links of the following form?

http://www.bbc.co.uk/go/homepage/www/lht/h2/t/-/http://www.tvlicensing.co.uk/index.jsp
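Links like this appear to be what are usually called redirect or click-tracking links: the hosting site presumably logs the click under its own `/go/` path and then forwards the visitor to the URL appended at the end. For a crawler, the embedded target is normally the link worth following. A minimal Python sketch, assuming (as in the example above) that the destination is appended after a slash, possibly percent-encoded; `unwrap_redirect` is a hypothetical helper and does not handle redirects that carry the target in a query parameter.

```python
from urllib.parse import unquote


def unwrap_redirect(url: str) -> str:
    """Return the target URL embedded in a path-style redirect link.

    Assumes the target is appended to the tracking prefix after a slash,
    possibly percent-encoded; otherwise the URL is returned unchanged.
    """
    decoded = unquote(url)
    # Find the last embedded absolute URL (http or https).
    pos = max(decoded.rfind("/http://"), decoded.rfind("/https://"))
    if pos == -1:
        return url  # no embedded target found
    return decoded[pos + 1:]


link = ("http://www.bbc.co.uk/go/homepage/www/lht/h2/t/-/"
        "http://www.tvlicensing.co.uk/index.jsp")
print(unwrap_redirect(link))  # prints http://www.tvlicensing.co.uk/index.jsp
```

Taking the last match means a chain of nested redirects resolves to its final destination.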

Cheers

Don

Tags
unusual links, web crawler

© 2003-2008 Webforumz.com : All Rights Reserved