
Thread: What is a robot.txt ??

  1. #1
    callumacrae is offline not alex mac callumacrae is just really nice
    Join Date
    Dec 2007
    Location
    Wellesbourne, England
    Posts
    5,161

    What is a robot.txt ??

    I've heard that I need a robot.txt to get the whole of my site on google and not just the first page, but what is a robot.txt and how does it work?
    I can customise your phpBB board. Send me a PM.
    lynxphp - info, tutorials and scripts
    "A forum post should be like a skirt; long enough to cover the subject but short enough to keep things interesting."

  2. #2
    ttony21 is offline x10Hosting Member ttony21 is an unknown quantity at this point
    Join Date
    Feb 2008
    Location
    New York
    Posts
    7

    Re: What is a robot.txt ??

    Here's an example of a robots.txt file http://x10hosting.com/robots.txt

    It's basically just a file that tells "robots" not to access certain pages on your site and add them to a search engine (assuming the robot complies with it).

    A robots.txt file with no restrictions would just say this:
    User-agent: *
    Disallow:

    That allows any robot to visit any file on your website (though I don't think you need a robots.txt for this; robots.txt is meant more for keeping robots away from pages that you don't want showing up in search engines).
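As a rough check of the "allow everything" rules above, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The bot name `ExampleBot`, the `example.com` URL and the helper `is_allowed` are made up for illustration; they are not anything x10hosting or Google uses.

```python
from urllib.robotparser import RobotFileParser

# The two-line "allow everything" file from the post above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# For contrast: a file that asks every robot to stay out entirely.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and the URL are hypothetical values for illustration only.
print(is_allowed(ALLOW_ALL, "ExampleBot", "http://example.com/any/page.html"))
print(is_allowed(BLOCK_ALL, "ExampleBot", "http://example.com/any/page.html"))
```

With this parser an empty robots.txt behaves the same as the allow-all file, which fits the point that you don't need one just to be crawled.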

    Edit: Just to clarify, a "robot" is a program that crawls around the internet. Robots are used for several different things; this is called web spidering, and search engines like Google use them to find new websites and add them to their index. Technically you don't need to do any work to get a search engine to pick up your website, which is why robots.txt exists: it sets rules that web robots are SUPPOSED to obey.
    Last edited by ttony21; 03-02-2008 at 12:08 PM.
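To make "spidering" a bit more concrete, here is a very small Python sketch of the link-finding step a robot performs on each page. It only parses a hard-coded sample page; the class `LinkCollector`, the function `extract_links` and the `example.com` addresses are illustrative assumptions, and a real crawler does far more (fetching, queueing, de-duplicating, checking robots.txt).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the targets of <a href="..."> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# A made-up page standing in for something the robot just downloaded.
SAMPLE_PAGE = (
    '<p><a href="/about.html">About</a> '
    '<a href="http://example.org/">Elsewhere</a></p>'
)
print(extract_links(SAMPLE_PAGE, "http://example.com/index.html"))
```

A spider would then visit each of those links in turn and repeat, which is how a new site gets found without its owner doing anything.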

  3. #3
    intenex is offline x10 Sophmore intenex is an unknown quantity at this point
    Join Date
    Feb 2008
    Posts
    194

    Re: What is a robot.txt ??

    Yeah, I don't think you need a robots.txt to let robots search your site...Google frankly doesn't care about your privacy =p. They crawled my site within minutes of me setting it up.

    ______

    BlackQuantum





  4. #4
    Sohail is offline x10 Spammer Sohail is an unknown quantity at this point
    Join Date
    Sep 2007
    Location
    London, UK
    Posts
    3,052

    Re: What is a robot.txt ??

    Yeah it's simply a file that controls the way a "spider" crawls your website. But that's true, Google would probably ignore it anyway :P.

  5. #5
    Smith6612 is offline <<< wants a Turkey Smith6612 has a spectacular aura about
    Join Date
    Dec 2007
    Location
    Exploded
    Posts
    6,483

    Re: What is a robot.txt ??

    Quote Originally Posted by intenex View Post
    Yeah, I don't think you need a robots.txt to let robots search your site...Google frankly doesn't care about your privacy =p. They crawled my site within minutes of me setting it up.
    Oh, Google does care; their search engine just tends to be lazy/buggy sometimes. Besides using a robots file to tell bots which pages they may and may not look at, and which bots are allowed to look around at all, you can also (if you know the syntax) tell robots to wait a set number of seconds between requests. That helps if your site is VERY busy and the bots are slowing you down.

    My signature likes cookies! Do you? :D

  6. #6
    ttony21 is offline x10Hosting Member ttony21 is an unknown quantity at this point
    Join Date
    Feb 2008
    Location
    New York
    Posts
    7

    Re: What is a robot.txt ??

    Oh, lol, I realize the author of this thread has probably forgotten that they posted it, but in case they do come back to look at it: I found something else interesting in the x10hosting FTP. The default robots.txt file for each user is this:

    User-agent: *
    Crawl-delay: 10

    Notice the crawl-delay that Smith mentioned
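A quick way to see how a robot might read that default file is Python's `urllib.robotparser`, which has had a `crawl_delay()` method since Python 3.6. This is only a sketch: `ExampleBot` and the helper `crawl_delay_for` are hypothetical, and Crawl-delay is a non-standard directive that not every crawler honours. Where it is honoured, the number is conventionally read as seconds.

```python
from urllib.robotparser import RobotFileParser

# The default per-user file quoted in the post above.
DEFAULT_RULES = """\
User-agent: *
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# "ExampleBot" is a made-up robot name for illustration.
print(crawl_delay_for(DEFAULT_RULES, "ExampleBot"))
```

Note that this file has no Disallow lines, so it slows robots down without keeping them out of anything.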

  7. #7
    Smith6612 is offline <<< wants a Turkey Smith6612 has a spectacular aura about
    Join Date
    Dec 2007
    Location
    Exploded
    Posts
    6,483

    Re: What is a robot.txt ??

    Yes! That's it. It's very useful if you have a very busy site with loads of bots popping in every second and you don't want them hogging resources. It's also a good idea to use the delay on free hosts that have a massive number of accounts per server, as some search engines like Yahoo are known to crawl sites every second sometimes. Just last week Yahoo did that to my web server: every second for half an hour it was loading some page on one of the sites I host here. It wasn't a problem, as hardly anyone visits those sites, but if I hosted some busy sites it would be a pretty big problem.
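From the robot's side, honouring a delay just means spacing requests out. The sketch below shows one way a well-behaved bot could do that in Python; `PoliteFetcher` and `wait_turn` are hypothetical names, not part of any real crawler, and the 10-second figure is taken from the default file quoted earlier.

```python
import time


class PoliteFetcher:
    """Hypothetical helper that spaces requests out by a fixed delay in seconds."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self._clock = clock    # injectable so the timing can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait_turn(self):
        """Block until the delay has elapsed since the last call.

        Returns the number of seconds actually waited.
        """
        waited = 0.0
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited


# Usage sketch (commented out because it really would pause for 10 seconds):
# fetcher = PoliteFetcher(10)
# for url in ["http://example.com/a", "http://example.com/b"]:
#     fetcher.wait_turn()
#     ...  # download url here
```

A bot hitting a site once a second, as described above, is effectively running with a delay of 1 or less.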


  8. #8
    masshuu is offline Head of the Geese masshuu has a spectacular aura about
    Join Date
    Oct 2007
    Location
    Las Colinas, Tx
    Posts
    2,262

    Re: What is a robot.txt ??

    Also note, if you don't want a bot or anyone else accessing a directory,
    like this one from the x10hosting file:
    Code:
    Disallow: /oldhidden
    make sure that the directory is not accessible to the general public either.
    For example, if you actually go there you'll get an error:
    http://x10hosting.com/oldhidden

    I've seen some people add a Disallow to a robots.txt file to keep robots from indexing critical directories, but anyone could still go to those directories and view them.
    And as someone else said, some robots don't even look at robots.txt, so they can still index the directory.
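The difference between "asked not to" and "can't" can be sketched in Python. The check below is what a polite robot would run before fetching; nothing in it stops a browser or a rude robot from requesting the URL anyway. The `User-agent: *` line is an assumption (the post only quotes the Disallow line), and `ExampleBot` and `polite_bot_may_fetch` are made-up names.

```python
from urllib.robotparser import RobotFileParser

# The Disallow line quoted above, with an assumed "User-agent: *" header.
RULES = """\
User-agent: *
Disallow: /oldhidden
"""


def polite_bot_may_fetch(robots_txt, url, user_agent="ExampleBot"):
    """Return True if a robot that obeys robots.txt would fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant robot skips the directory...
print(polite_bot_may_fetch(RULES, "http://x10hosting.com/oldhidden"))
# ...but still crawls everything else.
print(polite_bot_may_fetch(RULES, "http://x10hosting.com/index.html"))

# This check runs entirely inside the robot. Actually blocking visitors has to
# be done on the server (permissions, password protection, or removing the
# directory), which is why the link above gives an error.
```

Listing a directory in robots.txt also tells anyone who reads the file that the directory exists, so it is a poor place to hide anything sensitive.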
    Just leading the flock.
    Livewire
    Masshuu ------ carl6969
    descalzo ------------------- Smith6612
    Bryon--------------------------------- Corey
    If you find any post helpful or useful, duck
    \ / This for that post and rep it up.

  9. #9
    AutoItKing is offline x10Hosting Member AutoItKing is an unknown quantity at this point
    Join Date
    Feb 2008
    Posts
    32

    Re: What is a robot.txt ??

    I have found that Google actually follows the rules most of the time. But like everyone before me has said, robots are programs that crawl the web and find sites to add to their database of millions upon millions of already-indexed sites.

  10. #10
    callumacrae is offline not alex mac callumacrae is just really nice
    Join Date
    Dec 2007
    Location
    Wellesbourne, England
    Posts
    5,161

    Re: What is a robot.txt ??

    Quote Originally Posted by ttony21 View Post
    Oh, lol, I realize the author of this thread has probably forgotten that they posted it, but in case they do come back to look at it: I found something else interesting in the x10hosting FTP. The default robots.txt file for each user is this:

    User-agent: *
    Crawl-delay: 10

    Notice the crawl-delay that Smith mentioned
    Would the ten be seconds or milliseconds?

    And I did remember, but I was trying to fix my site, which broke.
