
Thread: Robots.txt ~tutorial

  1. #1
    lair360, x10 Sophomore
    Join Date
    Dec 2008
    Posts
    200

    Robots.txt ~tutorial

    Version: 15.2
    Revision: 65 Build 32

    Robots.txt ~tutorial

    Introduction:
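    This tutorial shows how to ask a specific robot, or several robots, not to crawl or index your files, folders, documents and particular file extensions. This lowers the chance of that content being picked up by search engines. Keep in mind that "robots.txt" is itself publicly readable and only well-behaved robots follow it, so it is not a substitute for real access control on private data.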

    "Robots.txt" is a regular text file that has a special meaning to the majority of "honourable" robots on the web. By defining a few rules in it, you can ask robots not to crawl and index certain files or directories within your site, or the whole site. For example, you may not want Google to crawl the "/images" directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that...

    Note: when you create the text file, make sure it is named exactly "robots.txt" (all lowercase). The file must also be uploaded to the root (publicly accessible) directory of your site, not a subdirectory...

    Example: http://www.mysite.com/robots.txt but NOT http://www.mysite.com/sub_folder/robots.txt
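
    To make the location rule concrete, here is a minimal Python sketch that works out where a robot would look for the file, given any page address on a site. The helper name robots_url is made up for this example; it only uses the standard urllib.parse module.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt address for the site of page_url.

    robots_url is a hypothetical helper name used only in this sketch.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.mysite.com/sub_folder/page.html"))
# prints http://www.mysite.com/robots.txt
```

    Whatever subfolder the page is in, the answer is always the file at the root of the host.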

    Syntax
    ----------------------------------------


    User-agent - the robot(s) that the following rules apply to...
    Disallow - the URL path you want to block...

    ----------------------------------------
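
    If it helps to see the syntax as data, the following is a rough Python sketch that reads "robots.txt" text and groups each set of User-agent lines with the rules that follow them. It is a simplified illustration under my own assumptions (the function name parse_robots is invented, and only User-agent, Disallow and Allow lines are handled), not how any particular search engine parses the file.

```python
def parse_robots(text):
    """Split robots.txt text into a list of (user_agents, rules) groups.

    Simplified sketch: other fields are ignored, and rule lines that
    appear before any User-agent line are skipped.
    """
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:
                # A User-agent line after some rules starts a new group.
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("disallow", "allow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


print(parse_robots("User-agent: *\nDisallow: /\n"))
# prints [(['*'], [('disallow', '/')])]
```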

    1.] To block all robots from crawling anything on your website, use the following code.
    ---Copy Source Code---
    Code:
    User-agent: *
    Disallow: /
    ---End Source Code---

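    2.] To block a directory and everything in it, use the following code. The second line is shown only as an illustration: blocking "/random-directory-one/" already covers every subdirectory inside it.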
    ---Copy Source Code---
    Code:
    User-agent: *
    Disallow: /random-directory-one/
    Disallow: /random-directory-one/random-directory-two/
    ---End Source Code---
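
    You can try rules like examples 1 and 2 locally before uploading them. Below is a small sketch using urllib.robotparser from the Python standard library, which does plain prefix matching on paths. The robot name "ExampleBot" and the helper name allowed are placeholders I picked for the demonstration.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules_text, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

BLOCK_DIR = """\
User-agent: *
Disallow: /random-directory-one/
"""

# Example 1: everything is off limits.
print(allowed(BLOCK_ALL, "ExampleBot", "http://www.mysite.com/index.html"))
# prints False

# Example 2: one Disallow line also covers the nested directory.
print(allowed(BLOCK_DIR, "ExampleBot",
              "http://www.mysite.com/random-directory-one/random-directory-two/page.html"))
# prints False

# Pages outside the blocked directory stay crawlable.
print(allowed(BLOCK_DIR, "ExampleBot", "http://www.mysite.com/index.html"))
# prints True
```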

    3.] To block a single page or file, just list the path you want to block.
    ---Copy Source Code---
    Code:
    User-agent: *
    Disallow: /private_file.html
    Disallow: /random-directory-one/style.css
    ---End Source Code---

    4.] To remove specific images from Google's image search, add the following code.
    ---Copy Source Code---
    Code:
    User-agent: Googlebot-Image
    Disallow: /image1.gif
    Disallow: /random-directory-one/image2.png
    ---End Source Code---

    5.] To remove all images on your site from search results, keep them in one folder and block that folder, as in this example.
    ---Copy Source Code---
    Code:
    User-agent: *
    Disallow: /image_folder/
    ---End Source Code---

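    6.] To block files with a specific extension, use this example. The "*" and "$" pattern characters are understood by Googlebot and other major crawlers, but not necessarily by every robot.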
    ---Copy Source Code---
    Code:
    User-agent: *
    Disallow: /*.gif$
    Disallow: /*.jpeg$
    Disallow: /image_folder/*.png$
    Disallow: /image_folder/*.jpeg$
    ---End Source Code---

    7.] To block a folder for robots in general while still letting Google crawl it, use this example...
    ---Copy Source Code---
    Code:
    User-agent: *
    Disallow: /folder1/
    
    User-agent: Googlebot
    Allow: /folder1/
    ---End Source Code---

    8.] To match any sequence of characters, use an asterisk [*]. For example, this blocks access to all subdirectories whose names begin with "file_directories".
    ---Copy Source Code---
    Code:
    User-agent: Googlebot
    Disallow: /file_directories*/
    ---End Source Code---

    9.] To match the end of a URL, use the $ symbol. For instance, this blocks any URL that ends with .zip...
    ---Copy Source Code---
    Code:
    User-agent: Googlebot
    Disallow: /*.zip$
    ---End Source Code---
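
    To see what the "*" and "$" characters from examples 8 and 9 actually match, here is a rough Python sketch that turns a Disallow pattern into a regular expression. The function names are invented for this post, and it is only an approximation of how wildcard-aware crawlers behave. Note that Python's own urllib.robotparser does plain prefix matching and does not understand these wildcards, which is why the matching is written by hand here.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the URL path. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """Return True if path matches pattern, starting from the first character."""
    return pattern_to_regex(pattern).match(path) is not None


print(is_blocked("/*.zip$", "/downloads/archive.zip"))
# prints True
print(is_blocked("/*.zip$", "/archive.zip.html"))
# prints False
print(is_blocked("/file_directories*/", "/file_directories_old/index.html"))
# prints True
```

    Without the trailing "$", a pattern behaves like a prefix, so "/*.zip" would also catch "/archive.zip.html".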

    10.] You can give different robots different rules in "robots.txt". For instance, say you want to block all other search engines and allow only Google to crawl your website, except for "cgi-bin" and "privatedir".
    ---Copy Source Code---
    Code:
    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Disallow: /cgi-bin/
    Disallow: /privatedir/
    ---End Source Code---
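
    The idea in example 10 is that a robot follows the group that names it and falls back to the "*" group only if no group does. Here is a short sketch to check that with Python's standard urllib.robotparser. "ExampleBot" stands in for any other crawler, and can_crawl is just a helper name chosen for the demonstration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def can_crawl(agent, path):
    """Ask the parsed rules whether agent may fetch path on the example site."""
    return parser.can_fetch(agent, "http://www.mysite.com" + path)


print(can_crawl("Googlebot", "/index.html"))         # prints True
print(can_crawl("Googlebot", "/cgi-bin/script.pl"))  # prints False
print(can_crawl("ExampleBot", "/index.html"))        # prints False
```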

    11.] To block multiple extensions, you can use this example...
    ---Copy Source Code---
    Code:
    User-agent: *
    Disallow: /*.xls$
    Disallow: /*.gif$
    Disallow: /*.jpg$
    Disallow: /*.jpeg$
    Disallow: /*.pdf$
    Disallow: /*.rar$
    Disallow: /*.zip$
    ---End Source Code---
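
    If the list of extensions gets long, you could generate the lines instead of typing them. A minimal sketch, assuming you keep the extensions in a Python list (the function name extension_rules is made up for this example):

```python
def extension_rules(extensions, user_agent="*"):
    """Build robots.txt text that blocks URLs ending in each extension."""
    lines = [f"User-agent: {user_agent}"]
    for ext in extensions:
        # Accept both "zip" and ".zip" as input.
        lines.append(f"Disallow: /*.{ext.lstrip('.')}$")
    return "\n".join(lines) + "\n"


print(extension_rules(["xls", "gif", "jpg", "jpeg", "pdf", "rar", "zip"]))
```

    This prints the same rules as example 11, ready to paste into "robots.txt".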
    Copyright 2008 ~Lair360

  2. #2
    dbojan, x10Hosting Member
    Join Date
    Nov 2008
    Location
    Bosnia and Herzegovina
    Posts
    99

    Re: Robots.txt ~tutorial

    Thank you lair360. This is a great tutorial.

  3. #3
    lair360, x10 Sophomore
    Join Date
    Dec 2008
    Posts
    200

    Re: Robots.txt ~tutorial

    Originally posted by dbojan:
    Thank you lair360. This is a great tutorial.
    Thank you very much for your generous support!

    Best regards,
    Lair360

  4. #4
    DarkDragonLord, x10 Elder
    Join Date
    Mar 2007
    Location
    Brazil
    Posts
    782

    Re: Robots.txt ~tutorial

    Great tutorial!

    +rep
    Last edited by DarkDragonLord; 01-21-2009 at 11:15 AM.
    Regards,
    Raphael DDL


  5. #5
    lair360, x10 Sophomore
    Join Date
    Dec 2008
    Posts
    200

    Re: Robots.txt ~tutorial

    Originally posted by DarkDragonLord:
    Great tutorial!

    +rep
    Thank you very much for your support!
    You have given me a key to break the boundaries between knowledge and support!

  6. #6
    zer0ne1337, x10Hosting Member
    Join Date
    Nov 2008
    Location
    Zion City
    Posts
    84

    Re: Robots.txt ~tutorial

    Thanks lair360, it is a very helpful tutorial!

  7. #7
    RRJJMM, x10Hosting Member
    Join Date
    May 2008
    Location
    Ohio
    Posts
    42

    Re: Robots.txt ~tutorial

    Thanks for the information. This is good stuff for us control freaks that like to "pull the shades" every now and then.

    Cheers,
    RJM

  8. #8
    lair360, x10 Sophomore
    Join Date
    Dec 2008
    Posts
    200

    Re: Robots.txt ~tutorial

    Originally posted by RRJJMM:
    Thanks for the information. This is good stuff for us control freaks that like to "pull the shades" every now and then.

    Cheers,
    Thank you very much for your feedback! However, "robots.txt" is very powerful, so you'll have to be very careful about what you tell the robots to do...
