The technology itself is developed by Google, but offered to software developers as an API (application programming interface) so that the intelligence can be embedded into other apps as they are built.
Is that a plate of sushi or a lion?
The Google Cloud Vision API itself is capable of ingesting an image, ‘seeing’ it and then classifying it into one (or several) of thousands of categories. The technology here can scan an image’s typical components and constituent parts to decide whether the computer is looking at a plate of sushi or a picture of a lion, for want of two random examples. This intelligence can also detect individual objects and faces within images and read printed words contained within them.
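In practice, developers send the API an image and a list of requested ‘features’. The sketch below builds the JSON body for a label-classification call to the Vision API’s `images:annotate` REST endpoint; the image bytes used in the example are invented, and real calls additionally need a Google Cloud API key or credentials, which are not shown here.

```python
import base64
import json

# The Vision API's batch annotation endpoint (credentials not shown).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_label_request(image_bytes: bytes, max_results: int = 10) -> dict:
    """Build the JSON body for a LABEL_DETECTION request.

    The API expects the raw image bytes base64-encoded inside the
    request, and returns up to `max_results` label categories for
    the image, each with a confidence score.
    """
    return {
        "requests": [
            {
                "image": {
                    "content": base64.b64encode(image_bytes).decode("ascii")
                },
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results}
                ],
            }
        ]
    }

# Example with fake image bytes, just to show the request shape.
body = build_label_request(b"\x89PNG-fake-image-bytes", max_results=5)
print(json.dumps(body, indent=2))
```

The same body shape carries other feature types (face detection, OCR and so on) by swapping the `type` field, which is why the API can serve several of the use cases listed below from one endpoint.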
Why bother? Well, lots of reasons… number one being our drive to create workable machine intelligence functions within the computer systems that we use day to day. But there are other good reasons for building computers that can see:
- Computers with vision of this kind can detect human emotions, so ‘image sentiment analysis’ allows our machines to know whether we are happy or sad when using a particular application or service.
- Computers with vision can detect inappropriate content and, in theory, help build a cleaner Internet; used in live image analysis, they can also help detect violence and other dangerous scenarios.
- Computers with vision can extract text from inside images, translate it (if necessary) and help to build up our increasingly digitized view of the world.
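The first bullet, emotion detection, maps onto the Vision API’s face-detection results, which report per-face likelihoods for emotions such as joy, sorrow and anger as enum strings (`VERY_UNLIKELY` through `VERY_LIKELY`). The sketch below picks the dominant emotion from one face annotation; the sample face data is invented for illustration, though its field names follow the API’s `faceAnnotations` shape.

```python
# Map the Vision API's likelihood enum strings onto a simple ordinal
# scale so the emotions can be compared against each other.
LIKELIHOOD_SCORE = {
    "VERY_UNLIKELY": 0,
    "UNLIKELY": 1,
    "POSSIBLE": 2,
    "LIKELY": 3,
    "VERY_LIKELY": 4,
    "UNKNOWN": 0,
}

def dominant_emotion(face: dict) -> str:
    """Return the strongest of the emotion likelihoods on one face."""
    emotions = {
        "joy": face.get("joyLikelihood", "UNKNOWN"),
        "sorrow": face.get("sorrowLikelihood", "UNKNOWN"),
        "anger": face.get("angerLikelihood", "UNKNOWN"),
        "surprise": face.get("surpriseLikelihood", "UNKNOWN"),
    }
    return max(emotions, key=lambda e: LIKELIHOOD_SCORE[emotions[e]])

# Invented sample: one clearly happy face.
sample_face = {
    "joyLikelihood": "VERY_LIKELY",
    "sorrowLikelihood": "VERY_UNLIKELY",
    "angerLikelihood": "UNLIKELY",
    "surpriseLikelihood": "POSSIBLE",
}
print(dominant_emotion(sample_face))  # → joy
```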
Is Google ‘capturing’ the planet?
These are some of the core rationales around which Google has developed this technology. There will be obvious safety and privacy concerns and worries about how far Google is attempting to ‘capture the entire planet for search’ (which it ultimately monetizes), but this debate has already been well aired.
What we do know is that Google is currently building out this service and bringing new functions forward. We also know that, as Forbes writer Janakiram MSV has explained, the heavy computer processing here goes on in the cloud. This means that even low-powered mobile devices’ applications can take advantage of these services through the APIs.
What Google Cloud Vision did next
In addition to announcing pricing for this API (no, it’s not free, silly), Google is now adding new capabilities, such as identifying the dominant colors of an image. Developers can now apply Label Detection to an image for $2 per 1,000 images, or Optical Character Recognition (OCR) for $0.60 per 1,000 images.
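At those per-1,000-image rates, the arithmetic for a workload is straightforward, as this back-of-envelope sketch shows; note that real Google Cloud pricing involves tiers and free quotas that are not modelled here.

```python
# Per-1,000-image prices in USD, as quoted above. Real pricing is
# tiered and includes free quotas, which this sketch ignores.
PRICE_PER_1000 = {
    "LABEL_DETECTION": 2.00,
    "TEXT_DETECTION": 0.60,  # OCR
}

def feature_cost(feature: str, images: int) -> float:
    """Cost in USD of running one feature over `images` images."""
    return PRICE_PER_1000[feature] * images / 1000

# E.g. labelling 50,000 images costs $100; OCR on the same set is cheaper.
print(feature_cost("LABEL_DETECTION", 50_000))
print(feature_cost("TEXT_DETECTION", 50_000))
```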
According to Google, for text extraction, “Optical Character Recognition (OCR) enables you to detect text within your images, along with automatic language identification across a broad set of languages. For sentiment analysis: Cloud Vision API can analyze emotional attributes of people in your images, like joy, sorrow and anger, along with detecting popular product logos.”
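For the OCR case, the API returns a `textAnnotations` list in which the first entry carries the full extracted text plus a `locale` code for the automatically identified language. The sketch below pulls both out of a response; the sample response dict is invented, but follows that documented shape.

```python
def extract_text(response: dict) -> tuple[str, str]:
    """Return (full extracted text, detected language code).

    The Vision API's first textAnnotation aggregates the whole
    detected text block; later entries are individual words.
    """
    annotations = response.get("textAnnotations", [])
    if not annotations:
        return "", ""
    first = annotations[0]
    return first.get("description", ""), first.get("locale", "")

# Invented sample response: a French sign photographed in the street.
sample_response = {
    "textAnnotations": [
        {"locale": "fr", "description": "Défense d'afficher"},
        {"description": "Défense"},
        {"description": "d'afficher"},
    ]
}

text, lang = extract_text(sample_response)
print(lang, text)  # → fr Défense d'afficher
```

The detected `locale` is what would feed a downstream translation step in the ‘extract, translate, digitize’ pipeline described earlier.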
Marsal Gavaldà, director of engineering for machine intelligence at Yik Yak, a location-based social network, ran over a million images through the Cloud Vision API.
Google’s PR machine arguably gushes somewhat, noting that, “The company (Yik Yak) was impressed with the accuracy of its feature detectors and content analyzers and the precision and recall of the text extraction in multiple languages. The number of objects that can be identified with the Cloud Vision API is an order of magnitude greater than comparable services from other cloud providers.”
Not mission critical
This is still an emerging, very much beta-stage technology, and Google openly states that the Cloud Vision API is not intended for real-time, mission-critical applications.
Should we fight this kind of innovation? You can try, but this is technical progress, so no, you can’t stop this kind of progression from happening. It is better to know about it now and think about the implications it will have for our applications and for our own behaviour.