
Thread: robots.txt ??

  1. #1
    mikel2k3 (x10 Lieutenant) | Joined Aug 2005 | West Yorkshire - UK | 374 posts

    robots.txt ??

    heya...

    How many of you have a robots.txt file for your websites?

    And do I need one? I seem to be getting a lot of robots visiting, and I'm confused about it.

    I have no idea what to write in the .txt file, so if anybody could tell me or help me out with that, it would be great.

    thanks, mike
    -----------------------------------
    -----------------------------------

    http://www.DistrasDesigns.com

    -----------------------------------
    -----------------------------------

  2. #2
    Chris Z (x10 Spammer) | Joined Sep 2005 | Alabama, USA | 2,802 posts

    Re: robots.txt ??

    I'm not really sure about the syntax for the file. But I'm pretty sure that it's just a file that allows or disallows the specified robots.
    -Chris Z
    Retired Account Manager


  3. #3
    t2t2t (x10 Elder) | Joined Sep 2006 | Europe, Estonia | 690 posts

    Re: robots.txt ??

    Here's my robots.txt for one of my sites:

    Code:
    User-agent: * 
    Disallow: /admin/ 
    Disallow: /contrib/ 
    Disallow: /doc/ 
    Disallow: /lib/ 
    Disallow: /modules/ 
    Disallow: /plugins/ 
    Disallow: /scripts/ 
    Disallow: /tmp/
    Robots.txt syntax
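To see what a file like this tells a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are a shortened copy of the ones in the post; the bot name `ExampleBot` and the `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules taken from the post above, shortened to three directories.
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /scripts/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "ExampleBot" and example.com are hypothetical; any compliant bot
# without its own group falls under "User-agent: *".
admin_allowed = parser.can_fetch("ExampleBot", "http://example.com/admin/login.php")
index_allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")

print(admin_allowed)  # False
print(index_allowed)  # True
```

Each `Disallow` line is a path prefix, so everything under `/admin/` is off limits to compliant bots while the rest of the site stays crawlable.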


  4. #4
    Cubeform (x10 Lieutenant) | Joined Aug 2006 | 127.0.0.1 | 339 posts

    Re: robots.txt ??

    Here is a good page on how to author Robots.txt files and what they are:
    http://www.robotstxt.org/wc/robots.html

    Note that you have to place it at the root of your site (in the public_html folder), because robots only look for it at /robots.txt.
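As a rough illustration of why the location matters: a crawler derives the robots.txt address from the scheme and host alone, whatever page it is about to fetch. The helper name `robots_url` and the `example.com` address below are assumptions for the sketch, not anything from this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com/blog/post.html?id=3"))
# http://example.com/robots.txt
```

A robots.txt placed in a subfolder would never be requested, so it would have no effect.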

    The CMS I use comes with a Robots.txt. It's quite long, so here's part of it:
    Code:
    User-agent: *
    Crawl-delay: 10
    # Directories
    Disallow: /database/
    Disallow: /includes/
    Disallow: /misc/
    Disallow: /modules/
    Disallow: /sites/
    Disallow: /themes/
    Disallow: /scripts/
    Disallow: /updates/
    Disallow: /profiles/
    # Files
    Disallow: /xmlrpc.php
    Disallow: /cron.php
    Disallow: /update.php
    Disallow: /install.php
    Disallow: /INSTALL.mysql.txt
    Disallow: /INSTALL.pgsql.txt
    Disallow: /CHANGELOG.txt
    Disallow: /MAINTAINERS.txt
    Disallow: /LICENSE.txt
    Disallow: /UPGRADE.txt
    
    # I've cut off the rest of the file from this point forward #
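A small sketch of how a crawler written in Python might read the `Crawl-delay` line, assuming Python 3.6 or later (where `crawl_delay` was added to `urllib.robotparser`). `ExampleBot` and `example.com` are hypothetical. `Crawl-delay` is not part of the original robots.txt standard, and not every crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules quoted above.
RULES = """\
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /includes/
# Files
Disallow: /cron.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Seconds a polite bot should wait between requests (None if unset).
delay = parser.crawl_delay("ExampleBot")
cron_allowed = parser.can_fetch("ExampleBot", "http://example.com/cron.php")

print(delay)         # 10
print(cron_allowed)  # False
```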
    CUBEFORM
    XHTML | CSS | PHP | JavaScript
    THIS WEEK


  5. #5
    mikel2k3 (x10 Lieutenant) | Joined Aug 2005 | West Yorkshire - UK | 374 posts

    Re: robots.txt ??

    OK, thanks for the help...

    What I'd really like to know is: do I really, really need a robots.txt file?

  6. #6
    Chris Z (x10 Spammer) | Joined Sep 2005 | Alabama, USA | 2,802 posts

    Re: robots.txt ??

    You don't absolutely need one. All it does is limit what the robots can read. So if you want them to be able to index all of your directories, delete the robots.txt.
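As a quick check of the "no rules means everything is allowed" idea, here is a sketch with Python's `urllib.robotparser`. It compares an empty rule set with an explicit allow-all file. `ExampleBot` and the URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

URL = "http://example.com/private/page.html"  # hypothetical address

# Case 1: no rules at all, as with an empty robots.txt.
empty = RobotFileParser()
empty.parse([])
allowed_without_rules = empty.can_fetch("ExampleBot", URL)

# Case 2: an explicit allow-all file (an empty Disallow blocks nothing).
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])
allowed_with_empty_disallow = allow_all.can_fetch("ExampleBot", URL)

print(allowed_without_rules)        # True
print(allowed_with_empty_disallow)  # True
```

Either way a compliant bot may crawl the whole site. The explicit file mainly saves the server from logging a "not found" error each time a robot asks for robots.txt.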


  7. #7
    Micro (Retired staff, 11-12-2008) | Joined Jul 2006 | West Midlands | 1,301 posts

    Re: robots.txt ??

    Be warned, though, that some bots do not comply with (or even read) the robots.txt file, so don't use it for security against bots.
    Micro

  8. #8
    dest581 (x10 Lieutenant) | Joined Sep 2006 | 348 posts

    Re: robots.txt ??

    robots.txt isn't useful for anything but controlling what search engines index. Beyond that, it's useless.

  9. #9
    Cubeform (x10 Lieutenant) | Joined Aug 2006 | 127.0.0.1 | 339 posts

    Re: robots.txt ??

    Originally posted by dest581:
    robots.txt isn't useful for anything but controlling what search engines index. Beyond that, it's useless.
    Robots.txt doesn't really control, either--like Micro pointed out, it is only a suggestion.

    For those who do take the suggestion, it's great if a particular search bot starts bombarding your site with requests; you can just block it with robots.txt.
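A minimal sketch of blocking one bot while leaving the site open to everyone else, checked with Python's `urllib.robotparser`. The names `BadBot` and `GoodBot` and the `example.com` URL are placeholders; in practice you would use the user-agent token the offending crawler publishes.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

URL = "http://example.com/index.html"  # hypothetical page
badbot_allowed = parser.can_fetch("BadBot", URL)
goodbot_allowed = parser.can_fetch("GoodBot", URL)

print(badbot_allowed)   # False
print(goodbot_allowed)  # True
```

This only works for bots that read and respect the file; a bot that ignores robots.txt has to be blocked on the server side instead.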
    Last edited by Cubeform; 03-24-2007 at 06:39 PM.


  10. #10
    mikel2k3 (x10 Lieutenant) | Joined Aug 2005 | West Yorkshire - UK | 374 posts

    Re: robots.txt ??

    I just used a robots.txt generator and it came up with this:

    Code:
    User-agent: Googlebot
    Disallow: 
    User-agent: Googlebot-Image
    Disallow: 
    User-agent: MSNBot
    Disallow: 
    User-agent: Slurp
    Disallow: 
    User-agent: Teoma
    Disallow: /
    User-agent: Gigabot
    Disallow: /
    User-agent: Scrubby
    Disallow: 
    User-agent: Robozilla
    Disallow: /
    User-agent: Nutch
    Disallow: /
    User-agent: ia_archiver
    Disallow: /
    User-agent: baiduspider
    Disallow: /
    User-agent: yahoo-mmcrawler
    Disallow: /
    User-agent: psbot
    Disallow: /
    User-agent: asterias
    Disallow: /
    User-agent: yahoo-blogs/v3.9
    Disallow: /
    User-agent: *
    Disallow: 
    Crawl-delay: 5
    Disallow: /cgi-bin/
    Disallow: /
    Is all this OK, or is there something simpler I could use?

    And would it be a bad thing if I just disallowed ALL robots?
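Both questions can be explored with Python's `urllib.robotparser`; this is a sketch, not a verdict on the generator. The first file disallows everything, which also shuts out compliant search engines such as Googlebot. The second is the last group of the generated file, which mixes an empty `Disallow:` with `Disallow: /`. Python's parser uses the first matching rule, so it reads that group as "allow everything", while parsers that pick the longest matching rule may read it as "block everything". `ExampleBot` and `example.com` are made up.

```python
from urllib.robotparser import RobotFileParser

URL = "http://example.com/index.html"  # hypothetical page

# Disallowing ALL robots: compliant search engines stop crawling too.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])
googlebot_allowed = block_all.can_fetch("Googlebot", URL)

# The final group from the generated file, copied as posted.
MIXED = """\
User-agent: *
Disallow:
Crawl-delay: 5
Disallow: /cgi-bin/
Disallow: /
"""
mixed = RobotFileParser()
mixed.parse(MIXED.splitlines())
# First-match parsing: the empty Disallow wins in Python's parser.
mixed_allowed = mixed.can_fetch("ExampleBot", URL)

print(googlebot_allowed)  # False
print(mixed_allowed)      # True
```

Since different parsers can disagree on the mixed group, a simpler file that states each intent once (for example a single `User-agent: *` group with only the paths you want hidden) is less likely to be misread.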
