| Welcome to Webforumz.com. |
|
Feb 14th, 2008, 05:40
|
#1 (permalink)
|
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
|
Advice on robots.txt
Hi,
Please give me some advice about adding a robots.txt file to my website directory, and how it affects search engines.
Thanks

Last edited by saltedm8; Feb 15th, 2008 at 22:52.
Reason: title correction
|
|
|
Feb 14th, 2008, 08:44
|
#2 (permalink)
|
|
Highly Reputable Member
Join Date: Oct 2007
Location: Stockport
Age: 16
Posts: 738
|
Re: Advice on robots.txt
I think this will help you:
http://www.robotstxt.org/
craig
|
|
|
Feb 14th, 2008, 16:16
|
#3 (permalink)
|
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
|
Re: Advice on robots.txt
Yes, I have been on that page, but I am still not sure. What is best practice for allowing and disallowing?
Will this do?
Thanks
|
|
|
Feb 14th, 2008, 16:25
|
#4 (permalink)
|
|
Moderator
Join Date: Mar 2004
Location: Good Ol'London
Age: 22
Posts: 1,609
|
Re: Advice on robots.txt
The above will stop every search engine from indexing your entire website.
It will work if that's what you want.
|
|
|
Feb 14th, 2008, 16:35
|
#5 (permalink)
|
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
|
Re: Advice on robots.txt
Oh, I thought that would stop all search engines except for Google. Can you recommend the best procedure? Obviously I don't want spam or hackers, so what should I do?
|
|
|
Feb 14th, 2008, 16:59
|
#6 (permalink)
|
|
Chief Moderator
Join Date: Oct 2007
Location: UK
Posts: 714
|
Re: Advice on robots.txt
If you're trying to protect against spam-bots, robots.txt isn't going to help you, since most malicious bots will ignore it anyway.
To only allow Google (out of all the 'honest' bots), swap the two statements you have, so that you first declare all user-agents disallowed, then overwrite the permissions for Google.
|
|
|
Feb 14th, 2008, 17:07
|
#7 (permalink)
|
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
|
Re: Advice on robots.txt
Quote:
|
To only allow Google (out of all the 'honest' bots), swap the two statements you have, so that you first declare all user-agents disallowed, then overwrite the permissions for Google.
|
like this?
# robots.txt for http://www.mysite.com
User-agent: *
Disallow: /

User-agent: Google
Disallow:
Does * refer to all bots?
Last edited by Oak; Feb 14th, 2008 at 17:45.
|
|
|
Feb 14th, 2008, 18:06
|
#8 (permalink)
|
|
Chief Moderator
Join Date: Oct 2007
Location: UK
Posts: 714
|
Re: Advice on robots.txt
Yeah. But you might want to wait for someone else to confirm this (I'm not 100% sure).
EDIT: Just checked it with Google Webmaster tools. All's good except you need to use Googlebot instead of just 'Google'.
Last edited by aso186; Feb 14th, 2008 at 18:09.
|
|
|
Feb 14th, 2008, 18:18
|
#10 (permalink)
|
|
Chief Moderator
Join Date: Oct 2007
Location: UK
Posts: 714
|
Re: Advice on robots.txt
Google's robots.txt checker confirms that the user-agent name 'Google' is not recognised by Googlebot.
You must use Googlebot in order for it to work. Apart from that, it doesn't matter which order the statements are in.
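For anyone who wants a quick local sanity check, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It assumes the file from post #7 with `Googlebot` swapped in for `Google`; the helper name `is_allowed` and the bot name `SomeOtherBot` are made up for illustration. Python's parser is not Google's and its matching rules may differ, so this does not replace Google's own checker.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the thread, with 'Googlebot' in place of 'Google'.
ROBOTS_TXT = """\
# robots.txt for http://www.mysite.com
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.mysite.com/index.html"
    # 'SomeOtherBot' is a hypothetical bot name used only for this demo.
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

This only tells you what a well-behaved bot should do with the file. It says nothing about bots that never read robots.txt in the first place.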
|
|
|
Feb 14th, 2008, 18:20
|
#11 (permalink)
|
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
|
Re: Advice on robots.txt
Ok, you understand it better than I do!
|
|
|
Feb 15th, 2008, 09:22
|
#12 (permalink)
|
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
|
Re: Advice on robots.txt
That's done now. Cheers!
Are there any other bots that should be allowed?
|
|
|
Feb 15th, 2008, 11:20
|
#13 (permalink)
|
|
Moderator
Join Date: Mar 2004
Location: Good Ol'London
Age: 22
Posts: 1,609
|
Re: Advice on robots.txt
Ok, my 2 cents:
robots.txt is a convention. It is not a way of protecting your website; it's just a way of telling bots how you'd like them to behave, and there's no guarantee that they'll obey it.
If you want to stop bad bots, use server-side scripting: check the user agent and respond with a 404 if you don't like it. But even then, some bad bots will identify themselves as a browser, so you'll never know.
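To make the server-side idea concrete, here is a rough sketch as a small Python WSGI wrapper. Everything in it is an assumption for illustration: the blocklist entries (`badbot`, `evilscraper`) are invented names, and `site` stands in for whatever application actually serves your pages. It is one possible shape for the check, not a recommended blocklist.

```python
# Hypothetical substrings to look for in the User-Agent header (lowercase).
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "evilscraper")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains a blocked substring."""
    ua = user_agent.lower()
    return any(name in ua for name in BLOCKED_AGENT_SUBSTRINGS)


def block_bad_bots(app):
    """Wrap a WSGI app so that blocked user agents get a 404."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        # Anything not on the blocklist is passed through untouched.
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Placeholder for the real website application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


application = block_bad_bots(site)
```

As the post says, a bot that sends a browser's User-Agent string passes straight through this check, so it only filters bots that identify themselves honestly.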
|
|
|
Feb 15th, 2008, 20:06
|
#14 (permalink)
|
|
Up'n'Coming Member
Join Date: Oct 2006
Location: Durham
Posts: 72
|
Re: Advice on robots.txt
It's very difficult to classify bots as good or bad. Some crawl your site to steal your content or email addresses, while others help you by indexing your content for search engines.
You have to study your log file to decide which bots are helpful and should therefore be allowed.
Not all bots obey the robots.txt file.
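The log file here is the web server's access log, which records each request along with the User-Agent string the client sent. As a rough sketch of "studying" it, the Python below counts requests per user agent. It assumes the log is in the common "combined" format, where the User-Agent is the last quoted field on each line; the sample lines and the `ExampleScraper` name are invented for the demo, and your server's format may differ.

```python
import re
from collections import Counter

# In the combined format each line ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def count_user_agents(lines):
    """Count how many requests each User-Agent string made."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [15/Feb/2008:20:06:00 +0000] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [15/Feb/2008:20:06:02 +0000] "GET / HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [15/Feb/2008:20:07:10 +0000] "GET /contact.html HTTP/1.1" '
    '200 2048 "-" "ExampleScraper/0.1"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

With a real log you would pass an open file object instead of `SAMPLE_LOG`. Keep in mind that the User-Agent is whatever the client chooses to send, so the counts show who bots claim to be, not who they are.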
|
|
|
Feb 16th, 2008, 12:09
|
#15 (permalink)
|
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
|
Re: Advice on robots.txt
Cheers, it's becoming clearer!
Quote:
|
If you want to stop bad bots, use server-side scripting: check the user agent and respond with a 404 if you don't like it. But even then, some bad bots will identify themselves as a browser, so you'll never know.
|
OK, well, I would not know how to set this up.
Quote:
You have to study your log file to decide which bots are helpful and should therefore be allowed.
Not all bots obey the robots.txt file.
|
Log file?
|
|
|
Feb 16th, 2008, 12:26
|
#16 (permalink)
|
|
Moderator
Join Date: Mar 2004
Location: Good Ol'London
Age: 22
Posts: 1,609
|
Re: Advice on robots.txt
My point is: what you are trying to do requires a lot of work if you want to make it reliable. Otherwise it's just a waste of time on something that doesn't actually work. And, in most cases, you cannot stop anyone from crawling your pages unless you password-protect them.
But why do you want to do this anyway? What's the issue?
|
|
|
Feb 16th, 2008, 15:12
|
#17 (permalink)
|
Join Date: Dec 2007
Location: London
Age: 35
Posts: 266
|
Re: Advice on robots.txt
OK, I take your meaning.
I thought that you were meant to specify the search engines that you want to crawl your site, so I did that for Googlebot, but that's it. So no other reason than that.
|
|
|
Feb 16th, 2008, 15:23
|
#18 (permalink)
|
|
Moderator
Join Date: Mar 2004
Location: Good Ol'London
Age: 22
Posts: 1,609
|
Re: Advice on robots.txt
So, to sum up:
The purpose of robots.txt is to ask a bot not to index your site.
You don't need this if you want a bot to index your site.
And if you didn't want that, there's no guarantee it would work anyway, because bots are free to ignore it.
Case closed!
|
|