The problem with this sort of captcha is that it depends heavily on what a user perceives an image to be of. For example, this image:

It has more than one plausible answer: Boat, Ship, Ocean, Sea, Tree and any number of variations. I entered boat and got an invalid captcha; what one person calls a boat, another may call a ship. Essentially, any captcha that leaves room for ambiguity about the correct answer will only frustrate your users.
What you are trying to do here is reinvent the wheel when there are already viable systems that have been shown to be effective. Additionally, image recognition captchas have been shown to be insecure in the past. For example, I could index the first thousand or more images returned by a search for 'Ship', and then every time you show one of those images I could break the system on the first attempt. If I don't have the image, then I may still be able to make a good guess based on its content, e.g. if the image is mostly blue with a white shape then I may guess 'Ship' or 'Cloud'. If this process is repeated for a whole range of phrases then your system becomes effectively useless, as eventually you will show an image that I have data for.
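To make the indexing attack concrete, here is a minimal sketch of the lookup-table idea. The function names (`fingerprint`, `build_index`, `solve`) are made up for illustration, and it assumes the captcha serves image files byte-for-byte identical to the ones I scraped; a real attacker would more likely use a perceptual hash that survives resizing and recompression.

```python
import hashlib


def fingerprint(image_bytes: bytes) -> str:
    # Assumption: exact-byte hashing. This only matches identical files.
    return hashlib.sha256(image_bytes).hexdigest()


def build_index(labelled_images):
    # labelled_images: iterable of (label, image_bytes) pairs, e.g. the
    # results scraped from an image search for each candidate label.
    index = {}
    for label, image_bytes in labelled_images:
        index[fingerprint(image_bytes)] = label
    return index


def solve(index, challenge_bytes, fallback_guess=None):
    # Known image: answer immediately. Unknown image: fall back to a guess
    # (for instance one derived from the dominant colours).
    return index.get(fingerprint(challenge_bytes), fallback_guess)
```

Every image the site reuses from a public source is one more entry in that dictionary, so the attack only gets better the longer it runs.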
This is why randomly generated images are so popular for captchas: there is no way to predict what will show up, so the only way to break one is to actually decode it. Relying on the chance that I don't already have a given image is not a good basis for a captcha.
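As a rough sketch of the difference, a randomly generated challenge is drawn fresh for every request, so there is nothing to index in advance. The names `ALPHABET`, `new_challenge` and `keyspace` are my own, the six-character length is an arbitrary assumption, and rendering the text into a distorted image is left out entirely.

```python
import secrets
import string

# Assumption: uppercase letters and digits, 36 symbols in total.
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length: int = 6) -> str:
    # secrets uses a cryptographically strong source, so the next
    # challenge cannot be predicted from earlier ones.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def keyspace(length: int = 6) -> int:
    # Number of distinct challenges an attacker would have to pre-compute.
    return len(ALPHABET) ** length
```

With these assumptions `keyspace()` is 36 ** 6, a little over two billion possible answers, compared with the few thousand images behind each label in the image-based scheme.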