Results 1 to 10 of 13

Thread: Hotlinking Question...

  1. #1
    learning_brain (x10 Sophmore)
    Join Date: Apr 2010 | Location: UK, Midlands | Posts: 170

    Hotlinking Question...

    Some of you may know that I now run an image search engine that crawls for high-quality pics and graphics.

    Unfortunately, due to the somewhat unpredictable nature of my 'good' free host, my account has been deleted, so I can't demonstrate the problem.

    Essentially, there are two parts (well, more, but two critical ones) to the crawling process.

    1) Find and store the URL of every <img src="whatever"> on a page.
    2) Create and store a small JPG thumbnail of each image for reference.
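    For anyone following along, step 1 might look roughly like this. It is a minimal Python sketch, not the crawler's actual code (which uses PHP and cURL); it uses only the standard library's HTML parser, resolves relative src values against the page URL, and skips the thumbnail step because that needs an imaging library. The function and class names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects the absolute URL of every <img src="..."> on one page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative paths like "pics/a.jpg" against the page URL.
                self.srcs.append(urljoin(self.page_url, value))


def find_image_urls(page_url: str, html: str) -> list[str]:
    collector = ImgSrcCollector(page_url)
    collector.feed(html)
    collector.close()
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(collector.srcs))
```

    Self-closing tags such as <img src="x.jpg"/> are handled too, because HTMLParser's default handle_startendtag calls handle_starttag.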

    Simple so far?

    Yep, but my index page search has an interesting feature. First, it loads all the thumbnails related to the search (fast loading). In addition, however, it also hotlinks each full-size image (i.e. loads it directly from the originating site) so that a large preview can be shown when you hover over the thumbnail.

    Sounds great, so what's the problem?

    The problem is that "hotlinking" is frowned upon by many. It eats into the originating site's bandwidth and, as I'm using high-resolution images, this can be a severe hit to someone with limited resources. The secondary hang-up for my site is that the page takes a while to load, as each results page has about 16 large images on it.

    This practice is not always frowned upon. My site provides the originating site's address, creating a backlink for them and driving traffic to them. Also, the ability to create direct links is kinda the point of the World Wide Web, is it not?

    I have two options:

    1) Drop the large image preview and stick to the 120x80px thumbs stored on my site... which will make it more like Google and lose part of my USP (unique selling point). This will also speed up my page loads.

    2) Keep the hotlinked large image preview, maintain my site's uniqueness and risk the consequences... if there are any :S (other than the known "switcheroo" problem, where the originating site may swap the image for something unsuitable without warning)

    Any opinions?

  2. #2
    lemon-tree (x10 Minion)
    Join Date: Nov 2007 | Posts: 1,420

    Re: Hotlinking Question...

    If the large images are what defines your site, then removing them wouldn't really get you anywhere. However, another alternative would be to only load the large images when the user clicks or mouses over the image; this strikes a balance between Google's approach of taking you off the page just to see a large image (which is frustrating) and your technique of loading them all up front (which is slow).
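    One way to set up "load on demand" from the server side is to put only the thumbnail in src and park the full-size URL in a data attribute, which the browser does not fetch. A few lines of JavaScript on mouseover would then copy data-full into the preview's src. Below is a minimal Python sketch of the markup side only; the dictionary keys (page_url, thumb_url, full_url, title) and the data-full attribute name are assumptions for illustration.

```python
from html import escape


def render_results(results: list[dict]) -> str:
    """Render thumbnails only; each full-size URL waits in data-full.

    Nothing in data-full is downloaded until a script copies it into a
    real src (e.g. on mouseover), so previews nobody looks at cost the
    originating site no bandwidth.
    """
    items = []
    for r in results:
        # Escape everything: crawled URLs can contain &, quotes, etc.
        page = escape(r["page_url"])
        thumb = escape(r["thumb_url"])
        full = escape(r["full_url"])
        alt = escape(r.get("title", ""))
        items.append(
            f'<a href="{page}"><img class="thumb" src="{thumb}" '
            f'data-full="{full}" alt="{alt}"></a>'
        )
    return "\n".join(items)
```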
    There shouldn't be any consequences of loading images from people's sites, as any site owners who don't want their images shared can use hotlink protection. I assume you have code built into your crawler to ensure the image actually exists and hotlinking can take place.
    Finally, it's no surprise you were suspended from a free host if you were running a crawling bot on it; for this sort of computational load and accessibility, you should really be looking at something more like a VPS.

  3. #3
    descalzo (Grim Squeaker)
    Join Date: Jul 2009 | Location: Ankh-Morpork | Posts: 7,636

    Re: Hotlinking Question...

    Opinion?

    Hotlinking is theft of bandwidth. And theft is theft.
    Nothing is always absolutely so.

  4. #4
    learning_brain (x10 Sophmore)
    Join Date: Apr 2010 | Location: UK, Midlands | Posts: 170

    Re: Hotlinking Question...

    Interesting diversity of opinion... as I feared.

    Yes, I do have functionality built into the crawler to ensure hotlinking can take place, but I'm mimicking a browser's user agent in cURL. The check comes when I try to get info about the image (like its size). If I can't obtain the info, chances are they don't permit hotlinking, and the root URL (with directories) gets ignored.
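    A rough Python equivalent of that check is sketched below. It is an assumption-laden sketch, not the crawler's real code: besides a browser-like User-Agent it also sends a Referer pointing at your own site, on the assumption that the target's hotlink protection keys off the Referer header (the usual .htaccess approach). MY_SITE and BROWSER_UA are placeholders. Caveats: some servers reject HEAD requests, and some hotlink protections answer with a 200 and a substitute "no hotlinking" image, which this check cannot tell apart from the real thing.

```python
import urllib.request
from urllib.parse import urlsplit

# Placeholders: use your own site's URL and a real browser UA string.
MY_SITE = "http://www.example.com/"
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"


def looks_hotlinkable(status: int, content_type: str) -> bool:
    """Treat a 200 response with an image/* type as 'hotlinking allowed'."""
    return status == 200 and content_type.lower().startswith("image/")


def probe_image(url: str, timeout: float = 10.0) -> bool:
    # Send the Referer a browser would send when your page embeds the
    # image, so Referer-based protection gets the chance to refuse.
    req = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": BROWSER_UA, "Referer": MY_SITE},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return looks_hotlinkable(resp.status, resp.headers.get("Content-Type", ""))
    except OSError:
        # HTTPError (403 etc.), URLError and timeouts are all OSError subclasses.
        return False


def directory_prefix(url: str) -> str:
    """'http://a.com/x/y/pic.jpg' -> 'http://a.com/x/y/' (the prefix to skip)."""
    parts = urlsplit(url)
    path = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{path}"
```

    When probe_image returns False, directory_prefix gives the "root URL with directories" that the crawler could add to its ignore list.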

    I did check my host's ToS carefully before hosting with them, but I think they must have cottoned on to the large processing power needed. One URL is OK, but when it spiders out to numerous URLs, it gets more interesting! And yes, I am looking at a paid service, either a VPS or a dedicated server. ATM, I'm developing using XAMPP.

    Descalzo - your point is also noted and it is for this reason I posted.

    The really interesting point you raise, lemon-tree, is the "load on hover"... which I assume would be done in JavaScript (assuming the browser has JS enabled). Unfortunately, I have no experience with JavaScript and no idea where to start with it... I'll have to do some googling.

    Thank you both.

    Rich

  5. #5
    lemon-tree (x10 Minion)
    Join Date: Nov 2007 | Posts: 1,420

    Re: Hotlinking Question...

    The JavaScript required would really be quite trivial: it would just be a case of populating the img tags' src attributes from an array containing the URLs of the images on the current page. Adding a 'loading' image overlay whilst the image is still transferring would also improve the UI for any particularly slow-loading images.
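    The "array of URLs" usually has to come from the server, since that is where the search results live. A minimal Python sketch of emitting it is below (in PHP, json_encode plays the same role). The variable name is a hypothetical choice and must be a trusted identifier, never user input. The replacement of "</" guards against a crawled URL that happens to contain "</script>" closing the tag early; inside a JSON string, "\/" still decodes to "/".

```python
import json


def urls_as_script(var_name: str, urls: list[str]) -> str:
    """Emit a <script> defining a JS array of full-size image URLs.

    var_name must be a fixed, trusted JS identifier (not user input).
    json.dumps produces valid JavaScript literals for a list of strings.
    """
    # "</" can only occur inside the string literals; "<\/" is equivalent
    # in JS/JSON but cannot terminate the surrounding <script> element.
    payload = json.dumps(urls).replace("</", "<\\/")
    return f"<script>var {var_name} = {payload};</script>"
```

    The hover handler would then read, for example, fullImages[i] for the i-th thumbnail and assign it to the preview image's src.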
    This technique would also reduce the amount of hot-linking required as only the user's desired images are ever loaded fully; this should move it more to the situation that descalzo favours.

  6. #6
    learning_brain (x10 Sophmore)
    Join Date: Apr 2010 | Location: UK, Midlands | Posts: 170

    Re: Hotlinking Question...

    LOL - even trivial JS is hard for me! BTW - I like the Overlay Image idea.

    Still reading up....

    I've also found a way to avoid the nasty switcheroo problem.

    When the crawler finds an image, I could use exif_read_data() and store a concatenated string of its values as a fingerprint of that image. Then, on the image view page (not quite the same as the index thumbs and preview), I can repeat the check and compare each value. If the fingerprints don't match, all I'd have to do is display a "Sorry" message and mark that record to be ignored in future... It would slow down the crawler even more, though.
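    A possible variation, sketched in Python: PHP's exif_read_data() only reads JPEG and TIFF files, and many images carry no EXIF data at all, so this sketch swaps in a different technique, a fingerprint made from the file size plus a SHA-256 hash of the raw bytes. It assumes the crawler already has the bytes (it downloads the image to make the thumbnail anyway). Any change to the file changes the fingerprint, so a harmless re-save by the site owner will also be flagged; that errs on the safe side for the switcheroo problem.

```python
import hashlib


def fingerprint(image_bytes: bytes) -> str:
    """Size plus SHA-256 of the raw file; any change to the file changes it."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{len(image_bytes)}:{digest}"


def is_same_image(stored: str, current_bytes: bytes) -> bool:
    # On the view page: re-download (or reuse bytes already fetched),
    # recompute, and compare with the fingerprint saved at crawl time.
    return fingerprint(current_bytes) == stored
```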

  7. #7
    leafypiggy (Community Advocate)
    Join Date: Aug 2007 | Location: Massachusetts | Posts: 2,228

    Content delivery network (CDN)?

    That's what Google uses. It might be helpful for you to read up on it and maybe build your own. That way, you can cache the high-res images on the CDN and serve them from there when needed. It will probably save loading time as well, since visitors would get the images from a server close to them.
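    A full CDN is a big step, but the core idea (keep your own copy and serve it, so the originating site is hit once rather than on every view) can be tried at small scale. The Python sketch below is a plain on-disk cache, not a CDN; the fetch function is injected and hypothetical, so any downloader can be plugged in. URLs are hashed to get safe, fixed-length file names.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_image(url: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return a local copy of url, downloading it only on the first request."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any URL maps to a safe, fixed-length file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / name
    if not path.exists():
        data = fetch(url)
        tmp = path.with_suffix(".part")
        tmp.write_bytes(data)
        # Rename into place so readers never see a half-written file.
        tmp.replace(path)
    return path
```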
    Neil Hanlon | x10Hosting Support Representative
    Neil[at]x10hosting.com

  8. #8
    learning_brain (x10 Sophmore)
    Join Date: Apr 2010 | Location: UK, Midlands | Posts: 170

    Re: Hotlinking Question...

    Wow - CDNs are complex! (Content Delivery Network?)

    Yes, this would be the perfect solution.... if I had enough traffic.... and enough money! CDNs are notoriously difficult to set up on your own by the look of it, and professional services are extortionate! Some CDNs claim to offer better performance, but in reality that depends on how close their PoPs (points of presence) and servers are to the visitor.

    The JS pre-load is an idea I could use for the overlay (simple to you, I know), but it becomes tricky to incorporate because my images are already in a PHP array and I'm already using an "image-over" CSS trick with a span.

    Head hurts.. tomorrow I'll look at it again.
    Last edited by learning_brain; 07-15-2010 at 05:51 PM.

  9. #9
    cybrax (x10 Elder)
    Join Date: Aug 2009 | Location: UK | Posts: 699

    Re: Hotlinking Question...

    It's a classic 'Rat Hole' project, not because the script is impossible but because nobody will host it for you on a shared server [free or paid] due to the high CPU usage. The other thing, of course, is that the server running the crawler is going to have a fixed IP address, and webmasters of sites on the crawl list are going to spot that something is wrong fairly quickly... as will Google AdSense, but that's another issue altogether.

    Plan 'A':
    Now it would be nice if all the scraping could be done client-side on the visitor's PC, using their processor power, IP address, bandwidth etc. Alas, there is no real way of doing this due to browser security restrictions on cross-domain scripting. The closest you can get is by using YQL and jQuery/JSON, but it's far from ideal.

    Plan 'B':
    Run a server at home on your own Internet connection to do the heavy work of 'searching & storing' data with whatever script you like and to serve the results of any query. Use a Pro hosted site (free or paid) as a 'Web Anchor' for static content, indexing and SEO, and link the two together...


    Plan 'C':
    'Know Thy Visitor' - why crawl the web for every user request? People are predictable, or rather they follow certain trends, and a little digging around the web's photo stock sites reveals what folk are asking them to find. So...

    Querying a database that returns stored image URL information for a given rubric gives the same perceived effect to the visitor. Grab an object/noun word list from the web and run it through a trial builder version of 'Djuggler' to generate the DB.
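    As a rough illustration of Plan C, the lookup side can be very small. The sketch below uses Python's built-in sqlite3 as a stand-in for whatever database the site actually uses (MySQL would be the natural choice on XAMPP); the table and column names are hypothetical, and the default limit of 16 simply mirrors the "about 16 images per page" mentioned earlier in the thread.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # The composite primary key doubles as an index for keyword lookups
    # and stops the same URL being stored twice under one keyword.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " keyword TEXT NOT NULL, url TEXT NOT NULL,"
        " PRIMARY KEY (keyword, url))"
    )
    return conn


def add_image(conn: sqlite3.Connection, keyword: str, url: str) -> None:
    # Call conn.commit() after a batch of inserts when using a file DB.
    conn.execute(
        "INSERT OR IGNORE INTO images (keyword, url) VALUES (?, ?)",
        (keyword.lower(), url),
    )


def search(conn: sqlite3.Connection, keyword: str, limit: int = 16) -> list[str]:
    rows = conn.execute(
        "SELECT url FROM images WHERE keyword = ? ORDER BY url LIMIT ?",
        (keyword.lower(), limit),
    )
    return [url for (url,) in rows]
```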


  10. #10
    learning_brain (x10 Sophmore)
    Join Date: Apr 2010 | Location: UK, Midlands | Posts: 170

    Re: Hotlinking Question...

    Thanks, Cybrax

    I'm shortly going to be moving to paid hosting and have now allocated a .com domain name for it. (not very interesting but great for SEO)

    I have already checked with the host about the CPU usage for the crawler and they don't have a problem with it, either a) because they don't have enough clients or, more likely, b) because they have no concept of the degree of processing it will require. Ho hum, that's their problem.

    I quite like your Plan B idea, although this is a very new venture and I want to gain some experience of "knowing mine visitor"! Investing in a dedicated server ATM is not a preferred option.

    As for your Plan C, I had already organised the script to do just that. It will prefer domains that include keywords such as "wallpaper" (and other common searches), which will target the content more precisely than if I let it do its own thing.
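    One way to make a crawler "prefer" such domains is a priority queue for its to-visit list. The Python sketch below is not the script described above, just an illustration under assumptions: the PREFERRED keyword list is hypothetical, matching is a simple substring test on the host name, and a counter keeps first-in, first-out order among URLs of equal priority.

```python
import heapq
import itertools
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical keyword list; extend it with whatever searches are popular.
PREFERRED = ("wallpaper", "photo")


def priority(url: str) -> int:
    """Lower is crawled first: 0 if the host contains a preferred keyword."""
    host = (urlsplit(url).hostname or "").lower()
    return 0 if any(word in host for word in PREFERRED) else 1


class Frontier:
    """Crawl queue that pops preferred domains first, FIFO within a tier."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seen: set[str] = set()
        self._counter = itertools.count()

    def push(self, url: str) -> None:
        if url in self._seen:
            return  # never queue the same URL twice
        self._seen.add(url)
        heapq.heappush(self._heap, (priority(url), next(self._counter), url))

    def pop(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```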

    The hotlinking issue is still bugging me. I haven't worked out the JS code yet for selective image loads, but the issue still remains to a certain degree. Still working on that....
    Last edited by learning_brain; 07-17-2010 at 01:50 PM.



x10hosting free hosting for the masses