Thread: Yahoo is raiding my forum!

  1. #1
    Smith6612 <<< wants a Turkey
    Join Date
    Dec 2007
    Location
    Exploded
    Posts
    6,485

    Yahoo is raiding my forum!

    Look at what Yahoo is doing to my forum. I think I need to change my robots.txt file again.
    Attached thumbnail: Yahoo is raiding my forum!-wtf.png

    My signature likes cookies! Do you? :D

  2. #2
    dWhite Guest

    Re: Yahoo is raiding my forum!

    That's normal. Search engines have the ability to send out hundreds upon hundreds of spiders at the same time.

  3. #3
    Smith6612 <<< wants a Turkey

    Re: Yahoo is raiding my forum!

    I know it's normal :P My home server still gets raided by search engines regularly, about once a week.

  4. #4
    bleachsk x10Hosting Member
    Join Date
    Sep 2007
    Posts
    50

    Re: Yahoo is raiding my forum!

    That's insane, haha.

  5. #5
    jodani x10Hosting Member
    Join Date
    Apr 2008
    Posts
    2

    Re: Yahoo is raiding my forum!

    That's shocking man
    :eek4:

  6. #6
    rockee x10 Sophmore
    Join Date
    Mar 2008
    Posts
    120

    Re: Yahoo is raiding my forum!

    As you know, bandwidth is a precious commodity and not an infinite resource on free X10 Hosting.

    So if you want to conserve some of that bandwidth by limiting your visitors to the humans who read and contribute to your forum, then you have two very strong tools to achieve this goal.

    One, which you have already mentioned, is the robots.txt file, but it is purely advisory and is not always honoured by what I call "rude bots". Your log files will show them fetching robots.txt, but they then carry on spidering your web site regardless of any restrictions it contains, and I can assure you there are lots of those little beasties out there.

    There are many examples online of how to configure a robots.txt file to Disallow individual robots from your whole site or from individual files and folders, and to limit how often they visit and the gap between GET requests (through non-standard extensions such as Crawl-delay, which only some crawlers honour) so they don't hog the connections your real visitors need. If you need help with the robots.txt file, watch out for my next tutorial in the Tutorials Forum or search Google for robots.txt.
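
    A well-behaved crawler downloads robots.txt and checks each URL against it before requesting it; a rude bot simply skips that step. As a minimal sketch of that check, Python's standard-library urllib.robotparser can parse a robots.txt and answer the same questions a polite crawler asks. The robots.txt contents, the bot name BadBot and the example.com URLs are made up for illustration, and build_parser is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; "BadBot" is a made-up name.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
Disallow: /search.php
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    base = "http://example.com"
    checks = [("BadBot", "/index.php"), ("Slurp", "/index.php"), ("Slurp", "/private/notes.html")]
    for agent, path in checks:
        # A polite crawler asks this before every request; a rude bot never does.
        print(agent, path, "allowed" if rp.can_fetch(agent, base + path) else "disallowed")
    # Crawl-delay is a non-standard extension: the parser reads it, but crawlers may ignore it.
    print("Crawl-delay for Slurp:", rp.crawl_delay("Slurp"))
```

    Run as a script, it should report BadBot as disallowed everywhere, Slurp as allowed on /index.php but not under /private/, and a crawl delay of 10 for Slurp. Nothing in this check is enforced by your server, which is why bots that never run it have to be stopped some other way.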

    So to really put the mockers on those, or on any other non-human robot that spiders your site, you can use mod_rewrite directives in a .htaccess file in your site's web root (public_html).

    Here's how.
    I have included these entries from my own .htaccess file, which has been very successful at keeping my log files and bandwidth under my control.

    The list is quite comprehensive because it has been built up over the years. As such, some of these bots may no longer exist, and newer faces on the block have yet to be added; they will be if and when I see them in my site's log files.

    You can pick and choose: add the ones that you feel ignore your robots.txt file, as well as those that don't even bother requesting it at all, which are mostly spambots looking for email addresses, much to a spammer's joy.

    I have used this list without issue for many years on just about every hosting service I have used (and, incidentally, owned), and most of the entries came from the log files of dedicated servers I owned.

    So if you get errors after adding or editing your .htaccess file (typically a "500 Internal Server Error" for the whole site), check that you have not made a typo or a copy-and-paste error.
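
    To catch a few common copy-and-paste slips before uploading, here is a minimal Python sketch of a checker. lint_htaccess is a hypothetical helper, not an existing tool: it only looks at RewriteCond and RewriteRule lines and flags unbalanced double quotes, an unclosed [flags] bracket, and a "#" after a directive (Apache's documentation states that comments may not be included on the same line as a configuration directive).

```python
import re
from typing import List

DIRECTIVES = ("RewriteCond", "RewriteRule")


def lint_htaccess(text: str) -> List[str]:
    """Return warnings for a few common copy-and-paste slips in mod_rewrite lines.

    Hypothetical helper: simple heuristics only, no substitute for testing
    the file on a real server.
    """
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line.startswith(DIRECTIVES):
            continue  # blank lines, whole-line comments and other directives are skipped
        if line.count('"') % 2:
            warnings.append(f"line {number}: unbalanced double quote")
        if re.search(r"\s#", line):
            warnings.append(f"line {number}: '#' after a directive; put comments on their own line")
        last = line.split()[-1]
        if last.startswith("[") and "]" not in last:
            warnings.append(f"line {number}: flags bracket is not closed")
    return warnings


if __name__ == "__main__":
    sample = """RewriteEngine on
# spambot
RewriteCond %{HTTP_USER_AGENT} "Indy Library" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Larbin [NC,OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} "mister pix [NC,OR
RewriteRule .* - [F,L]
"""
    for warning in lint_htaccess(sample):
        print(warning)
```

    On the sample it flags line 4 (inline comment) and line 5 (missing quote and missing "]"). A clean result only means these particular slips are absent.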

    Make a backup of any existing .htaccess file before adding all or part of this list.

    I can also assure you that it will not affect the human visitors you want on your site, although even they can be denied access, if they play up, by using a different directive in the same .htaccess file.

    Code:
    RewriteEngine on
    RewriteBase /
    # User-Agents with no privileges (mostly spambots/spybots/offline downloaders (OD) that ignore robots.txt)
    # Cyveillance spybot
    RewriteCond %{REMOTE_ADDR} "^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$" [OR]
    # NameProtect spybot
    RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
    # NameProtect spybot
    RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
    # Turnitin spybot
    RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6([6-9])$ [OR]
    # rude bot
    RewriteCond %{REMOTE_ADDR} ^216\.169\.(9[6-9]|1[01][0-9]|12[0-7])\. [OR]
    # log spambot
    RewriteCond %{HTTP_REFERER} citylinkz\.com [NC,OR]
    # spambot
    RewriteCond %{HTTP_REFERER} iaea\.org [NC,OR]
    # rude bot
    RewriteCond %{HTTP_REFERER} netfactual\.com [NC,OR]
    # log spambot
    RewriteCond %{HTTP_REFERER} traffixer\.com [NC,OR]
    # rude bot
    RewriteCond %{HTTP_REFERER} web\.ask\.com [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$ [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} anarchie [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} AOLserver-Tcl/3\.5\.6 [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} Atomz [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} cherry.?picker [NC,OR]
    # spambot (note extra space before semicolon)
    RewriteCond %{HTTP_USER_AGENT} "compatible ; MSIE 6.0" [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} crescent [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} "^DA \d\.\d+" [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} "DTS Agent" [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} "^Download" [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} EasyDL/\d\.\d+ [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} EmeraldShield [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} e?mail.?(collector|magnet|reaper|siphon|sweeper|harvest|collect|wolf) [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} express [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} extractor [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} "Fetch API Request" [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} flashget [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} FlickBot [NC,OR]
    # stupid user trying to edit my site
    RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} getright [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} go.?zilla [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} "efp@gmx\.net" [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} Gigabot [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} Girafabot [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} grabber [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} grub [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} "Hosting Client" [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} HostItCheap [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} Hotbar [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} imagefetch [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} "Indy Library" [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} "^Internet Explore" [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} ^IE\ \d\.\d\ Compatible.*Browser$ [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} Larbin [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} "libwww-perl/5\.68" [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} "LINKS ARoMATIZED" [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} MediBot [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [NC,OR]
    # unknown
    RewriteCond %{HTTP_USER_AGENT} "^Microsoft-WebDAV-MiniRedir/5\.1\.2600$" [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} "mister pix" [NC,OR]
    # dumb bot
    RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0$" [NC,OR]
    # formmail attacker
    RewriteCond %{HTTP_USER_AGENT} "^Mozilla/\?\?$" [NC,OR]
    # IE's "make available offline" mode
    RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
    # unknown bot
    RewriteCond %{HTTP_USER_AGENT} ^NG [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "^obot$" [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} offline [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} net.?(ants|mechanic|spider|vampire|zip) [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} Netcraft [NC,OR]
    # spambot
    RewriteCond %{HTTP_USER_AGENT} nicerspro [NC,OR]
    # Download Ninja OD
    RewriteCond %{HTTP_USER_AGENT} ninja [NC,OR]
    # NameProtect spybot
    RewriteCond %{HTTP_USER_AGENT} NPBot [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} PersonaPilot [NC,OR]
    # image thief bot
    RewriteCond %{HTTP_USER_AGENT} psbot [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} Scooter [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} semanticdiscovery [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} snagger [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} Sqworm [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} SurveyBot [NC,OR]
    # OD
    RewriteCond %{HTTP_USER_AGENT} tele(port|soft) [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} Teoma [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} T-H-U-N-D-E-R-S-T-O-N-E [NC,OR]
    # rude torrent crawler
    RewriteCond %{HTTP_USER_AGENT} "Torrent Crawler" [NC,OR]
    # Turnitin spybot
    RewriteCond %{HTTP_USER_AGENT} TurnitinBot [NC,OR]
    # experimental bot
    RewriteCond %{HTTP_USER_AGENT} twiceler [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} VoilaBot [NC,OR]
    # ODs
    RewriteCond %{HTTP_USER_AGENT} web.?(auto|bandit|collector|copier|devil|downloader|fetch|hook|mole|miner|mirror|reaper|sauger|sucker|site|snake|stripper|weasel|zip) [NC,OR]
    # dumb bot, doesn't know how to follow links, generates lots of 404s
    RewriteCond %{HTTP_USER_AGENT} vayala [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} zeus [NC,OR]
    # rude bot
    RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 compatible ZyBorg/1\.0 \(wn\.zyborg@looksmart\.net; http://www\.WISEnutbot\.com\)$" [NC]
    RewriteRule .* - [F,L]
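
    The REMOTE_ADDR conditions are hand-written IP ranges. If you prefer to think in CIDR blocks (the notation that IP-based access rules such as Apache's Require ip accept), this minimal Python sketch checks that each regex matches exactly the block it appears to stand for, by testing every address in a wider surrounding block. The CIDR values are an interpretation of the regexes rather than something from the original list, the .66-.69 range is written as two /31 blocks because it is not a single CIDR block, and regex_agrees_with_networks is a hypothetical helper.

```python
import ipaddress
import re

# (REMOTE_ADDR pattern from the list, networks it should match, wider block to test over)
# The CIDR blocks are an interpretation of each regex, not part of the original list.
CASES = [
    (r"^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$", ["63.148.99.224/27"], "63.148.99.0/24"),
    (r"^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$", ["12.148.196.128/25"], "12.148.196.0/24"),
    (r"^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$", ["12.148.209.192/26"], "12.148.209.0/24"),
    (r"^64\.140\.49\.6([6-9])$", ["64.140.49.66/31", "64.140.49.68/31"], "64.140.49.0/24"),
    (r"^216\.169\.(9[6-9]|1[01][0-9]|12[0-7])\.", ["216.169.96.0/19"], "216.169.0.0/16"),
]


def regex_agrees_with_networks(pattern, cidrs, wider):
    """True if the regex matches exactly the addresses inside `cidrs`, checked over `wider`."""
    regex = re.compile(pattern)
    networks = [ipaddress.ip_network(c) for c in cidrs]
    for addr in ipaddress.ip_network(wider):  # iterates every address, .0 and .255 included
        inside = any(addr in net for net in networks)
        if bool(regex.search(str(addr))) != inside:
            return False
    return True


if __name__ == "__main__":
    for pattern, cidrs, wider in CASES:
        verdict = "OK" if regex_agrees_with_networks(pattern, cidrs, wider) else "MISMATCH"
        print(", ".join(cidrs), verdict)
```

    The same check is handy when you add a range of your own: an unanchored pattern such as ^63\.148\.99\.2 would also catch .2, .20 to .29 and .200 to .223, and the check reports that as a mismatch.
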
    If you wanted to add Yahoo, for example, then to keep the list in alphabetical order, just place it after the vayala entry, like so:
    Code:
    # dumb bot, doesn't know how to follow links, generates lots of 404 error pages
    RewriteCond %{HTTP_USER_AGENT} vayala [NC,OR]
    # bandwidth hog bot
    RewriteCond %{HTTP_USER_AGENT} "Yahoo! Slurp" [NC,OR]
    Do likewise if you want to add more that you know of.
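
    Before uploading, you can sanity-check which user-agent strings a set of these conditions would catch. Below is a minimal Python sketch, assuming Python's re module behaves like Apache's PCRE for simple patterns like these, and treating [NC] as re.IGNORECASE. would_be_blocked is a hypothetical helper and only a sample of the list is included; paste in the patterns you actually use. The Slurp string is an example of how Yahoo's crawler identified itself, so check your own logs for the exact form.

```python
import re

# A sample of the user-agent conditions above plus the new "Yahoo! Slurp" line.
# [NC] (no case) is modelled with re.IGNORECASE.
UA_PATTERNS = [
    r"e?mail.?(collector|magnet|reaper|siphon|sweeper|harvest|collect|wolf)",
    r"httrack",
    r"Indy Library",
    r"^Mozilla/4\.0$",
    r"vayala",
    r"Yahoo! Slurp",
    r"zeus",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in UA_PATTERNS]


def would_be_blocked(user_agent: str) -> bool:
    """True if any of the OR-chained conditions matches, i.e. the [F] rule would fire."""
    return any(regex.search(user_agent) for regex in COMPILED)


if __name__ == "__main__":
    for ua in [
        "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
        "Mozilla/4.0",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    ]:
        print("403" if would_be_blocked(ua) else "ok ", ua)
```

    Short substring patterns such as zeus match anywhere in the string, so it is worth running a few real browser user agents from your logs through the check as well.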

    Or, if you just want to block Yahoo to get started and test the waters, so to speak, and maybe you feel Google is the only search engine worth using (Yahoo looks like it will be swallowed up by the greedy Micro$oft entity anyway), then just add this to your .htaccess file. It can be the last entry in the file, after all the others, if you wish:

    Code:
    RewriteEngine on
    RewriteBase /
    # bandwidth hog bot
    RewriteCond %{HTTP_USER_AGENT} "Yahoo! Slurp" [NC]
    RewriteRule .* - [F,L]
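
    The last (or only) RewriteCond before a RewriteRule should not carry the OR flag. Here is a minimal Python sketch of why, written from a reading of the condition loop in mod_rewrite's source; treat it as a simplified model, not a documented API, and note that Cond and rule_applies are hypothetical names. A failed [OR] condition is not fatal, because a later one might still match, so if the failing condition is the last one the chain simply runs out without ever failing and the rule fires. With a lone "Yahoo! Slurp" condition flagged [NC,OR], every visitor, not just Slurp, would get a 403.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cond:
    matches: bool          # did this RewriteCond's pattern match the request?
    or_next: bool = False  # does it carry the [OR] flag?


def rule_applies(conds: List[Cond]) -> bool:
    """Simplified model of how mod_rewrite chains RewriteConds before a RewriteRule.

    Conditions are ANDed unless [OR] links a condition to the next one.
    """
    i = 0
    while i < len(conds):
        if conds[i].or_next:
            if conds[i].matches:
                # One member of the OR group matched: skip the rest of the group,
                # including the condition without [OR] that closes it.
                while i < len(conds) and conds[i].or_next:
                    i += 1
            # Either way move on; a failed [OR] condition is not fatal.
            i += 1
            continue
        if not conds[i].matches:
            return False  # an ANDed condition failed, so the rule is skipped
        i += 1
    return True  # no condition failed: the rule (here, [F]) applies


if __name__ == "__main__":
    # An ordinary browser request: the only condition does not match.
    print(rule_applies([Cond(matches=False, or_next=True)]))  # True: trailing [OR], browser gets a 403
    print(rule_applies([Cond(matches=False)]))                # False: plain [NC], browser is let through
```
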
    I hope this helps keep your log files and bandwidth under your control; it has for me. I am also in the process of creating a new tutorial for the Tutorials Forum. As I pointed out, though, many bots simply choose to ignore your wishes and steal your bandwidth for their own greedy self-interest, which is a pet hate of mine, especially when one is paying for that bandwidth.

    Regards,
    Rocky

  7. #7
    Smith6612 <<< wants a Turkey

    Re: Yahoo is raiding my forum!

    Thanks for the .htaccess tutorial. If I ever need to use it, I'll be sure to check back here. As for running out of bandwidth, all I have to do is point my domain name at my home connection. Since I download my database nightly and back up the home directory weekly, if something happens I just take those backups, throw them on the server, change the DNS for the domain, and I'm all set.

  8. #8
    Brandon Former Senior Account Rep
    Join Date
    Jun 2006
    Location
    Tewksbury, MA
    Posts
    9,589

    Re: Yahoo is raiding my forum!

    Yahoo is always on my forums, usually with at least 2-5 spiders at a time. I am not sure why, but they must have a lot of them.
    Thanks,
    Brandon Long

  9. #9
    DeadBattery Community Support Team
    Join Date
    Mar 2008
    Location
    localhost
    Posts
    4,019

    Re: Yahoo is raiding my forum!

    Maybe they are trying to beat Google.


  10. #10
    Smith6612 <<< wants a Turkey

    Re: Yahoo is raiding my forum!

    Yahoo probably does, as SMF 2.0 logged over a thousand instances of Yahoo being on my forum, four times the number for Google. So yeah, there are two bots sitting in the forum right now, and Yahoo never leaves :P

x10hosting free hosting for the masses
dedicated servers