Web Design and Development Forums

Advice on robots.txt

This is a discussion on "Advice on robots.txt" within the Search Engine Optimization (SEO) section. This forum and the thread "Advice on robots.txt" are both part of the Search Engines and SEO category.

Feb 14th, 2008, 05:40   #1
Oak
 
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
Advice on robots.txt

Hi,

Please give advice about adding a robots.txt file to your website directory, and how it affects search engines.

Thanks


Last edited by saltedm8; Feb 15th, 2008 at 22:52. Reason: title correction
Feb 14th, 2008, 08:44   #2
Highly Reputable Member
 
Join Date: Oct 2007
Location: Stockport
Age: 16
Posts: 738
Blog Entries: 1
Re: Advice on robots.txt

I think this will help you:

http://www.robotstxt.org/

craig
Feb 14th, 2008, 16:16   #3
Oak
 
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
Re: Advice on robots.txt

Yes, I have been on that page, but I am still not sure. What is best practice for allowing and disallowing?

Will this do?

Quote:
# robots.txt for http://www.mysite.com

User-agent: Google
Disallow:

User-agent: *
Disallow: /
Thanks
Feb 14th, 2008, 16:25   #4
Moderator
 
Join Date: Mar 2004
Location: Good Ol'London
Age: 22
Posts: 1,609
Blog Entries: 1
Re: Advice on robots.txt

The above will stop every search engine from indexing your entire website.
It will work if that's what you want.
__________________
Diego - SEO Consultant London (My Blog | Fight Me)
jQuery: Star Rating - Multiple File Upload - FCKEditor/Codepress
Before we work on artificial intelligence why don't we do something about natural stupidity?
Feb 14th, 2008, 16:35   #5
Oak
 
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
Re: Advice on robots.txt

Oh, I thought that would stop all search engines except Google. Please can you recommend the best approach? Obviously I don't want spam or hackers, so what should I do?
Feb 14th, 2008, 16:59   #6
Chief Moderator
 
Join Date: Oct 2007
Location: UK
Posts: 714
Blog Entries: 2
Re: Advice on robots.txt

If you're trying to protect against spam bots, robots.txt isn't going to help you, since most malicious bots ignore it anyway.

To only allow Google (out of all the 'honest' bots), swap the two statements you have, so that you first declare all user-agents disallowed, then overwrite the permissions for Google.
Feb 14th, 2008, 17:07   #7
Oak
 
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
Re: Advice on robots.txt

Quote:
To only allow Google (out of all the 'honest' bots), swap the two statements you have, so that you first declare all user-agents disallowed, then overwrite the permissions for Google.
Like this?
# robots.txt for http://www.mysite.com

User-agent: *
Disallow: /

User-agent: Google
Disallow:
Does * refer to all bots?

Last edited by Oak; Feb 14th, 2008 at 17:45.
Feb 14th, 2008, 18:06   #8
Chief Moderator
 
Join Date: Oct 2007
Location: UK
Posts: 714
Blog Entries: 2
Re: Advice on robots.txt

Yeah. But you might want to wait for someone else to confirm this (I'm not 100% sure).

EDIT: Just checked it with Google Webmaster tools. All's good except you need to use Googlebot instead of just 'Google'.
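One rough way to double-check a file like this offline is Python 3's standard-library `urllib.robotparser`. The sketch below parses the corrected file (Googlebot allowed, everything else disallowed) from a string rather than fetching it; `www.mysite.com` is just the placeholder domain from this thread and `build_parser` is a hypothetical helper name. Note that Python's parser matches bot names more loosely than Google does (it treats the User-agent value as a substring of the bot's name), so it won't reproduce the "Google" vs "Googlebot" problem and is no substitute for Google's own checker.

```python
from urllib.robotparser import RobotFileParser

# The corrected file from this thread: block every bot, then let Googlebot in.
ROBOTS_TXT = """\
# robots.txt for http://www.mysite.com

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of downloading them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    # Slurp (Yahoo's crawler) stands in for "any other bot".
    for agent in ("Googlebot", "Slurp"):
        allowed = rules.can_fetch(agent, "http://www.mysite.com/index.html")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Run as a script, it reports Googlebot as allowed and Slurp as blocked, because the empty `Disallow:` line means "allow everything" for the named record.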
Last edited by aso186; Feb 14th, 2008 at 18:09.
Feb 14th, 2008, 18:12   #9
Oak
 
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
Re: Advice on robots.txt

hmm

The code below was taken from http://www.robotstxt.org/robotstxt.html:

To allow a single robot

User-agent: Google
Disallow:

User-agent: *
Disallow: /
Feb 14th, 2008, 18:18   #10
Chief Moderator
 
Join Date: Oct 2007
Location: UK
Posts: 714
Blog Entries: 2
Re: Advice on robots.txt

Google's robots.txt checker confirms that Googlebot does not recognise "Google" as its name.

You must use Googlebot for it to work. Otherwise, it doesn't matter which order the records are in.
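The ordering point can be illustrated with the same Python 3 standard-library parser (again only a sketch). The original robots.txt convention says the `*` record applies to any robot that hasn't matched another record, and Python's `RobotFileParser` follows that by checking named records first and falling back to `*`, so both orderings below give the same answers. Slurp (Yahoo) and msnbot (MSN) are just example names for "other crawlers", and `verdicts` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# The same two records, written in opposite orders.
GOOGLEBOT_FIRST = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

CATCH_ALL_FIRST = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""


def verdicts(text, agents, url="http://www.mysite.com/"):
    """Return {agent: True if allowed to fetch url} for one robots.txt text."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    bots = ["Googlebot", "Slurp", "msnbot"]
    first = verdicts(GOOGLEBOT_FIRST, bots)
    second = verdicts(CATCH_ALL_FIRST, bots)
    print(first)
    print("Same answers either way:", first == second)
```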
Feb 14th, 2008, 18:20   #11
Oak
 
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
Re: Advice on robots.txt

Ok, you understand it better than I do!
Feb 15th, 2008, 09:22   #12
Oak
 
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
Re: Advice on robots.txt

That's done now. Cheers!

Are there any other bots that should be allowed?
Feb 15th, 2008, 11:20   #13
Moderator
 
Join Date: Mar 2004
Location: Good Ol'London
Age: 22
Posts: 1,609
Blog Entries: 1
Re: Advice on robots.txt

OK, my 2 cents:
robots.txt is a convention. It is not a way of protecting your website; it's just a way of telling bots how you'd like them to behave, and there's no guarantee that they'll obey it.

If you want to stop bad bots, use server-side scripting: check the user agent and respond with a 404 if you don't like it. But even then, some bad bots identify themselves as a browser, so you'll never know...
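A minimal sketch of that idea, assuming Python 3's built-in `http.server` (fine for experimenting, not for a production site; on real hosting you'd do the same check in whatever server-side language or web server config you already use). The entries in `BLOCKED_AGENT_SUBSTRINGS` are made-up examples, and as the post says, a bot that claims to be a browser walks straight past this check.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical blocklist: lowercase substrings of User-Agent headers you
# have decided you don't want. Bots can fake this header, so this only
# filters bots that identify themselves honestly.
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "emailharvester")


def is_blocked(user_agent: str) -> bool:
    """True if the User-Agent contains a blocked substring (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)


class FilteringHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_blocked(self.headers.get("User-Agent", "")):
            # Pretend the page doesn't exist for unwanted bots.
            self.send_error(404)
            return
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 on localhost is an arbitrary choice for local testing.
    HTTPServer(("127.0.0.1", 8000), FilteringHandler).serve_forever()
```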
Feb 15th, 2008, 20:06   #14
Up'n'Coming Member
 
Join Date: Oct 2006
Location: Durham
Posts: 72
Re: Advice on robots.txt

It's very difficult to classify bots as good or bad. Some bots just crawl your site to steal your content or email addresses, while others help you by indexing your content for search engines.
You have to study your log files to decide which bots are helpful and should therefore be allowed.
Not all bots obey the robots.txt file.
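As a rough illustration of what "studying your log file" can mean, here's a Python 3 sketch that counts requests per User-Agent. It assumes your server writes the common Apache/Nginx "combined" log format and that the log lives at a hypothetical path `access.log` (your host can tell you where yours is); lines in any other format are simply skipped.

```python
import re
from collections import Counter

# "Combined" log format: the User-Agent is the last quoted field, e.g.
# 1.2.3.4 - - [15/Feb/2008:20:06:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_user_agents(lines):
    """Count requests per User-Agent string in combined-format log lines."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if match:  # silently skip lines in any other format
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    # "access.log" is a hypothetical path.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for agent, hits in count_user_agents(log).most_common(20):
            print(f"{hits:6d}  {agent}")
```

The busiest agents at the top of the list are the ones worth looking up before deciding whether to allow them.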
__________________
cPanel Hosting VPS Hosting
Feb 16th, 2008, 12:09   #15
Oak
 
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
Re: Advice on robots.txt

Cheers, it's becoming more clear!

Quote:
If you want to stop bad bots, use server-side scripting: check the user agent and respond with a 404 if you don't like it. But even then, some bad bots identify themselves as a browser, so you'll never know...
OK, well, I would not know how to set this up.

Quote:
You have to study your log files to decide which bots are helpful and should therefore be allowed.
Not all bots obey the robots.txt file.
Log file?
Feb 16th, 2008, 12:26   #16
Moderator
 
Join Date: Mar 2004
Location: Good Ol'London
Age: 22
Posts: 1,609
Blog Entries: 1
Re: Advice on robots.txt

My point is: what you are trying to do requires a lot of work if you want to make it reliable; otherwise it's just a waste of time on something that doesn't actually work. And in most cases you cannot stop anyone from crawling your pages unless you password-protect them.
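For the "password-protect them" option, the usual mechanism is HTTP Basic authentication. This Python 3 sketch shows only the credential check itself; `is_authorized`, `EXPECTED_USER` and `EXPECTED_PASSWORD` are hypothetical names, and real credentials shouldn't be hard-coded like this. Basic auth sends the password merely base64-encoded, so it should only be used over HTTPS.

```python
import base64
import hmac

# Hypothetical credentials for illustration only.
EXPECTED_USER = "oak"
EXPECTED_PASSWORD = "change-me"


def is_authorized(auth_header: str) -> bool:
    """Check an HTTP Basic 'Authorization' header against the expected login."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[6:], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # bad base64 or bad text
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(user.encode("utf-8"), EXPECTED_USER.encode("utf-8"))
    password_ok = hmac.compare_digest(
        password.encode("utf-8"), EXPECTED_PASSWORD.encode("utf-8")
    )
    return user_ok and password_ok
```

When the check fails, the server should answer `401 Unauthorized` with a `WWW-Authenticate: Basic` header so the browser prompts for a login. On typical shared hosting, Apache's own `.htaccess` password protection (`AuthType Basic`) does the same job without writing any code.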


But why do you want to do this anyway? What's the issue?
Feb 16th, 2008, 15:12   #17
Oak
 
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
Re: Advice on robots.txt

Ok, I take your meaning.

I thought that you were meant to specify the search engines that you want to search your site, so I did that for Googlebot, but that's it. So there's no other reason than that.
Feb 16th, 2008, 15:23   #18
Moderator
 
Join Date: Mar 2004
Location: Good Ol'London
Age: 22
Posts: 1,609
Blog Entries: 1
Re: Advice on robots.txt

So, to sum up:
The purpose of robots.txt is to ask a bot not to index your site.
You don't need this if you want a bot to index your site.
And if you didn't, this wouldn't work anyway.

Case closed!