I've heard that I need a robots.txt file to get the whole of my site on Google and not just the first page, but what is a robots.txt file and how does it work?
I can customise your phpBB board. Send me a PM.
lynxphp - info, tutorials and scripts
"A forum post should be like a skirt; long enough to cover the subject but short enough to keep things interesting."
Here's an example of a robots.txt file http://x10hosting.com/robots.txt
It's basically just a file that tells "robots" not to access certain pages on your site or add them to a search engine (assuming the robot complies with it).
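A robots.txt file only counts when it sits at the root of a host, which is why the example above lives at http://x10hosting.com/robots.txt and not in some subfolder. A compliant robot works out that address from whatever page it wants to crawl. Here's a minimal Python sketch of that step; `robots_url` is a hypothetical helper name and example.com is a placeholder.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt address a crawler would check before fetching page_url.

    Joining with an absolute path ("/robots.txt") keeps the scheme and host
    but throws away the page's own path and query string.
    """
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://example.com/blog/post.html?id=3"))
# http://example.com/robots.txt
```

Python's standard `urllib.robotparser.RobotFileParser` can then download and read that file via its `set_url()` and `read()` methods.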
Basically, a robots.txt file with no restrictions would just say this:
User-agent: *
Disallow:
That allows any robot to visit any file on your website. (I don't think you even need robots.txt for this, though; robots.txt is meant more for keeping robots away from pages you don't want to be public.)
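To see how a robot that obeys the rules reads the two extremes, here's a sketch using Python's standard-library `urllib.robotparser`. It's only one parser (search engines use their own), and `AnyBot` and example.com are placeholders. The empty `Disallow:` above lets everything through, while `Disallow: /` shuts compliant robots out of the whole site.

```python
from urllib.robotparser import RobotFileParser

# The "no restrictions" file quoted above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# The opposite: a single slash after Disallow blocks every path.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""


def make_parser(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


open_site = make_parser(ALLOW_ALL)
closed_site = make_parser(DISALLOW_ALL)

# Compare what a compliant robot may fetch under each file.
for path in ["/", "/index.html", "/photos/2008/"]:
    url = "http://example.com" + path
    print(path, open_site.can_fetch("AnyBot", url), closed_site.can_fetch("AnyBot", url))
```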
Edit: Just to clarify, a "robot" is a program that crawls around the internet and can be used for several different things; this is called web spidering. Search engines like Google use robots to find new websites and add them, so technically you don't need to do any work to get a search engine to grab your website. That's why robots.txt came about: to create rules that web robots are SUPPOSED to obey.
Last edited by ttony21; 03-02-2008 at 12:08 PM.
Yeah, I don't think you need a robots.txt file to let robots search your site... Google frankly doesn't care about your privacy =p. They crawled my site within minutes of my setting it up.
______
BlackQuantum
Yeah, it's simply a file that controls the way a "spider" crawls your website. But that's true, Google would probably ignore it anyway :P.
Oh, Google does care; their search engine just tends to be lazy/buggy sometimes. Besides using a robots file to tell bots what to look at and what not to, and which bots may look around at all, there is also a way, if you know the syntax, to tell robots to wait a set number of seconds between requests (useful if your site is VERY busy and the bots are slowing you down).
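Telling different bots different things is done with separate `User-agent` groups. A sketch with Python's standard `urllib.robotparser`, assuming a made-up bot called `BadBot` and a hypothetical `/admin/` directory: the first group shuts `BadBot` out entirely, and the `*` group applies to every other robot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one bot completely, keep everyone else out of /admin/.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The BadBot group wins for BadBot; every other agent falls back to the * group.
for agent in ["BadBot", "Googlebot"]:
    for path in ["/index.html", "/admin/login.php"]:
        allowed = parser.can_fetch(agent, "http://example.com" + path)
        print(f"{agent:10} {path:18} {'allowed' if allowed else 'blocked'}")
```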
My signature likes cookies! Do you? :D
Oh, lol, I realize the original poster probably forgot they posted this, but in case they come back to look, I found something else interesting on the x10hosting FTP: the default robots.txt file for each user is this:
User-agent: *
Crawl-delay: 10
Notice the Crawl-delay that Smith mentioned.
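Here's how Python's standard `urllib.robotparser` (one parser among many) reads that default file; `crawl_delay()` needs Python 3.6 or later, and `AnyBot` and example.com are placeholders. Keep in mind Crawl-delay isn't part of the original robots.txt standard, so not every crawler honours it.

```python
from urllib.robotparser import RobotFileParser

# x10hosting's default per-user file, as quoted above.
X10_DEFAULT = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(X10_DEFAULT.splitlines())

# No Disallow lines, so every path is still allowed...
print(parser.can_fetch("AnyBot", "http://example.com/index.html"))  # True
# ...but a robot that honours Crawl-delay should wait 10 seconds between requests.
print(parser.crawl_delay("AnyBot"))  # 10
```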
Yes! That's it. It's very useful if you have a very busy site with loads of bots popping in every second and you don't want bots hogging resources. It's also a good idea to use the delay on free hosts that pack a massive number of accounts onto each server, as some search engines, like Yahoo, are known to crawl sites every second at times. Just last week Yahoo did that to my web server: every second for half an hour it was loading some page on one of the sites I host here. It wasn't a problem, as hardly anyone visits these sites, but if I hosted some busy sites, that would have been a pretty big problem.
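The arithmetic behind that: one hit per second for half an hour is 1,800 requests, while a robot honouring `Crawl-delay: 10` could make at most 180 in the same window. A minimal sketch of how a polite crawler might space out its requests to one host; `CrawlDelayThrottle`, `polite_pause` and `max_requests` are hypothetical names, not part of any library, and times are passed in explicitly so the logic is easy to check.

```python
import time
from typing import Optional


class CrawlDelayThrottle:
    """Hypothetical helper: keeps requests to one host at least `delay` seconds apart."""

    def __init__(self, delay_seconds: float) -> None:
        self.delay = delay_seconds
        self.last_request: Optional[float] = None  # time of the previous request, if any

    def wait_time(self, now: float) -> float:
        """Seconds to wait at time `now` before the next request is allowed."""
        if self.last_request is None:
            return 0.0  # the first request can go straight away
        return max(0.0, self.last_request + self.delay - now)

    def record_request(self, now: float) -> None:
        self.last_request = now


def polite_pause(throttle: CrawlDelayThrottle) -> None:
    """Real-clock usage: call just before each request to the host."""
    time.sleep(throttle.wait_time(time.monotonic()))
    throttle.record_request(time.monotonic())


def max_requests(window_seconds: float, delay_seconds: float) -> int:
    """Most requests that fit in a half-open window when the first is at time 0
    and each later one waits `delay_seconds`."""
    return int(window_seconds // delay_seconds)


throttle = CrawlDelayThrottle(10)
throttle.record_request(now=0.0)
print(throttle.wait_time(now=3.0))  # 7.0
print(max_requests(30 * 60, 1))     # 1800
print(max_requests(30 * 60, 10))    # 180
```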
Also note: if you don't want a bot or anyone else accessing a directory, like this one from the x10hosting file:
Disallow: /oldhidden
make sure that directory isn't accessible to the general public either. For example, if you actually go there, you'll get an error:
http://x10hosting.com/oldhidden
I've seen some people add a Disallow line to their robots file to keep robots from indexing critical directories, but you could still go to those directories and view them.
And as someone else said, some robots don't even look at robots.txt, so they can still index the directory.
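A quick way to see that Disallow is only advice: Python's standard `urllib.robotparser` reports `/oldhidden` as off limits, but that answer only matters to a robot that asks. Nothing here stops a browser, and since robots.txt is itself public, listing a directory in it tells everyone where to look. Rules also match by prefix, so `Disallow: /oldhidden` covers `/oldhiddenstuff.html` too; `Disallow: /oldhidden/` would limit the rule to paths inside the directory. `AnyBot` and example.com are placeholders.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /oldhidden
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# can_fetch() only answers "may a polite robot fetch this?"; it blocks nothing.
for path in ["/oldhidden", "/oldhidden/secret.html", "/oldhiddenstuff.html", "/index.html"]:
    print(path, parser.can_fetch("AnyBot", "http://example.com" + path))
```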
I have actually found that Google follows the rules most of the time. But like everyone before me has said, robots are programs that crawl the web and find sites to add to their databases of millions upon millions of already-added sites.