
Thread: Deal With Spybots, Spambots and Scrapers

  1. #1
    rockee
    Join Date
    Mar 2008
    Posts
    120

    Deal With Spybots, Spambots and Scrapers

    THE HTACCESS APPROACH
    I was going to write a comprehensive tutorial on banning scrapers, spy bots and other misbehaving robots: the ones that don't follow the instructions in your robots.txt file, ignore the file altogether, or read it and then carry on scraping and spidering your web site anyway.

    There are already many such tutorials on the Internet, and a Google search using all of the key words below in a single query found more info than I could post here (the forum has a size limit on posts):

    bad bots spam bots online downloaders

    The search returns many pages of results, so don't stop at the first page. Further in you may find more goodies such as a useful PHP-Nuke module, spider traps and country-specific bad bot lists.

    The tutorials that seem easiest to follow, and whose format is very much what I would have used here, are located at these addresses:

    How to block spambots, ban spybots, and tell unwanted robots to go to hell.

    List of Bad Bots

    Blocking Bad Bots and Scrapers with .htaccess

    htaccess guide - Blocking offline browsers and 'bad bots'


    The sites worth checking out are many and varied, but they all have very useful information if you want to squish some of these server-hogging pests.


    Because this is not a new subject, you may find some of the information and sites a bit dated, but the principle is sound, especially if you can find a more recent list of bad bots or you are a prolific reader and analyzer of your site's log files.
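To make the principle concrete, here is a minimal sketch in Python of the matching step that bad-bot lists boil down to: compare the request's User-Agent against a list of known substrings. The names `BAD_BOT_PATTERNS` and `is_bad_bot` are made up for this illustration, and the three entries are placeholders rather than a vetted list, so swap in entries from a current list or from your own logs.

```python
import re

# Hypothetical blocklist: illustrative placeholders only, not a vetted
# or current list of bad bots.
BAD_BOT_PATTERNS = ["EmailCollector", "WebCopier", "SiteSnagger"]

# One case-insensitive regex that matches if any listed substring
# appears anywhere in the User-Agent string.
BAD_BOT_RE = re.compile(
    "|".join(re.escape(p) for p in BAD_BOT_PATTERNS),
    re.IGNORECASE,
)


def is_bad_bot(user_agent: str) -> bool:
    """Return True if the User-Agent contains a blocklisted substring."""
    return BAD_BOT_RE.search(user_agent or "") is not None
```

Keep in mind that a User-Agent string is supplied by the client and can be faked, so a match like this only catches bots that announce themselves.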

    One thing to remember is that some of these bad bots actually hijack web sites and servers (zombies) so they can roam the Internet under someone else's identity. Log file analysis may let you spot these, and perhaps find a common reference point that allows an effective .htaccess entry.
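As a rough sketch of that kind of log analysis, the Python below tallies requests per User-Agent and lists agents that made several requests without ever asking for /robots.txt. It assumes the Apache "combined" log format. The function name `agents_skipping_robots_txt` and the `min_requests` threshold are inventions for this example. Ordinary browsers never fetch robots.txt either, so treat the output as a list of candidates to look at, not a verdict.

```python
import re
from collections import Counter

# Assumes the Apache "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def agents_skipping_robots_txt(lines, min_requests=3):
    """Map User-Agent -> request count for agents that made at least
    min_requests requests and never requested /robots.txt."""
    totals = Counter()
    fetched_robots = set()
    for line in lines:
        match = LOG_LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent")
        totals[agent] += 1
        parts = match.group("request").split()
        # parts is normally [method, path, protocol]
        if len(parts) >= 2 and parts[1].split("?")[0] == "/robots.txt":
            fetched_robots.add(agent)
    return {
        agent: count
        for agent, count in totals.items()
        if count >= min_requests and agent not in fetched_robots
    }
```

Typical use would be `agents_skipping_robots_txt(open("access.log"))`, where the file name is whatever your host calls its access log.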


    THE ROBOTS TEXT FILE APPROACH
    An alternative to the .htaccess file is the robots.txt file but, as I outlined above, it is only effective if these bad bots read the dang thing and follow your instructions - most don't.
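To show what "following your instructions" means in practice, here is a small sketch using Python's standard `urllib.robotparser` module, which is the check a well-behaved robot performs before fetching a page. The robots.txt content, the bot names `BadBot` and `GoodBot`, and the example.com URLs are all made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one named bot entirely and keep every
# other robot out of /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite robot asks before each fetch and skips anything disallowed.
badbot_may_fetch_home = parser.can_fetch("BadBot", "http://example.com/index.html")
goodbot_may_fetch_home = parser.can_fetch("GoodBot", "http://example.com/index.html")
goodbot_may_fetch_private = parser.can_fetch("GoodBot", "http://example.com/private/page.html")
```

Nothing on the server enforces these answers. The check runs inside the robot, which is exactly why a bad bot can skip it.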


    A good place to start, and the authority on all matters relating to the robots.txt file, is located here:

    The Web Robots Pages at robotstxt.org

    They include pages on the items below, plus much more, and they use Previous/Next navigation for easier reading:
    • database of robots
    • robots.txt file checker
    • robot related meta tag information
    • IP look up
    • how to get the best listing in search engines
    The above site is worth a visit so you can get a handle on this robots.txt file and use it to your best advantage.

    Here is a useful link to Wikipedia relating to BotNet


    I hope this article is of use. Please post back if you can add to it with your own current lists of known mischievous robots and your experiences.

    Regards,
    Rocky

  2. #2
    CoolFinalFan
    Join Date
    Oct 2005
    Location
    Myrtle Beach, SC USA
    Posts
    311

    Re: Deal With Spybots, Spambots and Scrapers

    Hey, thanks for the FYI here!



  3. #3
    tittat
    Join Date
    Sep 2007
    Location
    Kerala,India
    Posts
    2,479

    Re: Deal With Spybots, Spambots and Scrapers

    One question... not related to this topic.

    I have an .htaccess file with rewrites and a lot of other stuff in it.
    If my .htaccess file gets too big, will that affect my site's response time?
    Former X10 Forum Senior Moderator(Retired)


  4. #4
    rockee
    Join Date
    Mar 2008
    Posts
    120

    Re: Deal With Spybots, Spambots and Scrapers

    You would not notice any overhead from a large .htaccess file running mod_rewrite rules or doing any of its other tasks. It is only a very small, folder-by-folder extension of the server's httpd.conf file anyway. Imagine the size of a hosting company like X10 Hosting and the huge server configuration files it uses, yet you would not notice much overhead at the browser level from those conf files being parsed.

    My .htaccess file is huge by normal standards and is about 80% mod_rewrite rules, and there is no noticeable overhead. In any case, how would you measure that latency, if there is any at all?

    Even with many entries and jobs to do, an .htaccess file is usually much smaller than 10k, and most are smaller than 1k. Compared with serving a 30k web page or a 60k graphic image, the .htaccess file is only a flea bite on the server's resources.
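On the question of how you might measure it, one crude way to get a feel for the numbers is to time a batch of regex rules against a request path. The Python sketch below does that under loud assumptions: it uses Python's `re` engine rather than Apache's, it ignores the cost of reading the file from disk, and the rule patterns, the function name `time_rule_matching` and its defaults are all hypothetical. It is a stand-in for the order of magnitude, not a benchmark of any real server.

```python
import re
import timeit


def time_rule_matching(n_rules=500, path="/some/page.html", repeats=1000):
    """Return the average seconds spent testing one path against
    n_rules regex patterns (a stand-in for rewrite rules)."""
    # Hypothetical patterns; none of them match the path, so every
    # rule is tried on every pass (the worst case).
    rules = [re.compile(rf"^/old-section-{i}/(.*)$") for i in range(n_rules)]

    def check():
        for rule in rules:
            if rule.match(path):
                return True
        return False

    total_seconds = timeit.timeit(check, number=repeats)
    return total_seconds / repeats


seconds_per_request = time_rule_matching()
```

For a real answer you would compare response times on the actual server with and without the rules in place, since only that includes everything the server does per request.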

    Regards,
    Rocky

  5. #5
    tittat
    Join Date
    Sep 2007
    Location
    Kerala,India
    Posts
    2,479

    Re: Deal With Spybots, Spambots and Scrapers

    Quote from rockee: "You would not notice any overhead from a large .htaccess"

    Thanks rockee, this is what I wished to hear.

    Does anyone else have a different opinion?


  6. #6
    rockee
    Join Date
    Mar 2008
    Posts
    120

    Re: Deal With Spybots, Spambots and Scrapers

    If you want a definitive answer or an informed opinion, you should post your question in the forum the tech support staff frequent most, as they are the only people at X10 Hosting who can give you the correct answer for their servers.

    Before I retired, the parsing of .htaccess files used by clients on my servers did not noticeably affect those servers. What did affect them was the massive amount of needless traffic from bad bots and scrapers, which the .htaccess files and the measures in place at the servers effectively reduced.

    Regards,
    Rocky

  7. #7
    tittat
    Join Date
    Sep 2007
    Location
    Kerala,India
    Posts
    2,479

    Re: Deal With Spybots, Spambots and Scrapers

    Whenever I read rockee's comments I am forced to give him reputation points... and I did.
    I will say rockee will become famous soon.
    Regards,
    Subeesh


  8. #8
    rockee
    Join Date
    Mar 2008
    Posts
    120

    Re: Deal With Spybots, Spambots and Scrapers

    Thank you kindly, Subeesh. I can appreciate your hunger for knowledge, as I too have been there and done that, but for the life of me I still can't satisfy my own hunger. ;)

    Kindest regards and best wishes always,
    Rocky

  9. #9
    Zangetsu
    Join Date
    Mar 2008
    Location
    somewhere out there
    Posts
    491

    Re: Deal With Spybots, Spambots and Scrapers

    Cool, but is there also a way to get rid of those spiders?