
Everything You Need To Know About Google Cloud Vision API

Nisha Gopinath Menon
June 23, 2017

Google Photos was among the first products to bring image recognition to a mass audience. It relies on pattern-matching algorithms and image classification, the technology that lets us search for photos containing a particular landmark or object. By exposing this capability through Cloud Vision, Google lets developers who are not in a position to build their own technology use Google’s underlying image recognition to build applications that see and understand the content of images. Cloud Vision API enables your developers to build image recognition and classification features into your application through easy-to-use REST APIs.

For the non-technical crowd, an application programming interface (API) is a messenger of sorts. An API takes your request, tells the system what you want it to do, and brings the response back to you, much like a waiter in a restaurant who takes your order to the kitchen and brings the food back to your table. With Cloud Vision API, the request is an image and the response is what the service found in it: categories, faces, objects, logos and text. This opens up a lot of avenues, like building metadata for an image catalog or identifying offensive content. Companies have always encouraged their users to annotate and tag images accurately; even so, the metadata is often poor or missing altogether. Cloud Vision uses artificial intelligence to add metadata to images automatically when they are uploaded to your system.

Phrases like “game-changer” and “revolutionary technology” have been thrown around to describe Google’s Cloud Vision API. Skeptics will find it hard to look past the marketing smokescreen, but the technology brings enough real developments to make the fine print worth examining.

The logic

Cloud Vision is exposed as a RESTful API: your application sends an image over HTTP, and the service returns a structured description of what it found. Your developers can then build rich metadata around the images to support custom searches. Google gave good thought to the three main parameters involved in developing this technology.

  1. How will you get your source of data and tag it? Google has access to a huge set of data.  
  2. Need large-scale data processing capability? Google is the best there is.  
  3. Need domain expertise? Google is product-agnostic; in other words, it does not zoom in on any specific vertical.
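
To make the REST interface described in this section concrete, here is a minimal sketch of the request side in Python. The endpoint and field names follow the public v1 REST API as I understand it, so check them against the current documentation; the helper name `build_annotate_request` and the placeholder image bytes are made up for illustration.

```python
import base64
import json

# Public REST endpoint for Cloud Vision v1 image annotation.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes, feature_types, max_results=10):
    """Build the JSON body for one image and a list of feature types.

    Hypothetical helper, not part of any Google library.
    """
    return {
        "requests": [
            {
                # Image bytes travel as base64 text inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }


# Placeholder bytes stand in for a real image file read from disk.
body = build_annotate_request(
    b"fake-image-bytes", ["LABEL_DETECTION", "LOGO_DETECTION"]
)
payload = json.dumps(body)
```

In a real application you would POST `payload` to `ANNOTATE_URL` with your project's credentials. Authentication and the HTTP call are left out here so the sketch runs without a network connection.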

Claims made by Google’s Cloud Vision API

  • Label/Entity Detection identifies the dominant objects within an image. You can use the API to build metadata on your image catalog, allowing new scenarios like image-based searches or recommendations.
  • Optical Character Recognition extracts the text in an image. Cloud Vision API supports a broad range of languages and identifies the language automatically.
  • Safe Search Detection flags inappropriate content in your images. Powered by Google SafeSearch, the feature enables you to moderate crowd-sourced content.
  • Facial Detection detects faces in an image, along with the positions of facial features such as the nose, eyes and mouth, and the likelihood of over eight attributes such as joy and sorrow.
  • Landmark Detection identifies well-known landmarks, along with the related latitude and longitude.
  • Logo Detection finds and recognizes product logos within an image. Cloud Vision API identifies the product brand logo, with the associated bounding polygon.
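
As a rough illustration of how the features listed above turn into catalog metadata, the sketch below flattens one annotation response into a simple dictionary. The response field names (`labelAnnotations`, `landmarkAnnotations`, `logoAnnotations`, `textAnnotations`) are taken from the v1 REST API as I understand it. The function `summarize_response` is hypothetical, and the sample response is hand-written for illustration, not real API output.

```python
def summarize_response(response):
    """Flatten one annotation response into simple catalog metadata.

    Hypothetical helper; missing sections simply yield empty results.
    """
    metadata = {
        "labels": [a["description"] for a in response.get("labelAnnotations", [])],
        "logos": [a["description"] for a in response.get("logoAnnotations", [])],
        "landmarks": [],
    }
    # Each landmark may carry one or more latitude/longitude pairs.
    for landmark in response.get("landmarkAnnotations", []):
        for location in landmark.get("locations", []):
            lat_lng = location["latLng"]
            metadata["landmarks"].append(
                (landmark["description"], lat_lng["latitude"], lat_lng["longitude"])
            )
    # The first text annotation carries the detected language code, if any.
    text = response.get("textAnnotations", [])
    metadata["language"] = text[0].get("locale") if text else None
    return metadata


# Hand-written sample fragment, for illustration only.
sample = {
    "labelAnnotations": [{"description": "tower", "score": 0.97}],
    "landmarkAnnotations": [
        {
            "description": "Eiffel Tower",
            "locations": [{"latLng": {"latitude": 48.858, "longitude": 2.294}}],
        }
    ],
    "textAnnotations": [{"locale": "fr", "description": "Tour Eiffel"}],
}
catalog_entry = summarize_response(sample)
```

Storing a record like `catalog_entry` next to each uploaded image is one simple way to support the image-based searches and recommendations mentioned above.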


Today’s users are overwhelmed by the sheer number of photos they store on their devices and in the cloud. Solving their challenges will influence their photo taking, sharing and tagging behavior.

  • Insight: Cloud Vision detects faces, logos and objects in your image. For each face it returns the positions of the eyes, nose and mouth, and estimates the associated emotions. According to Google, the service improves over time as new concepts are introduced and accuracy is refined. Cloud Vision doesn’t touch privacy-sensitive face recognition functionality, meaning it will not tell you whose face it is.
  • Entity Detection: One of the attractive features of Cloud Vision API is entity detection, meaning it detects and labels a wide range of objects in an image, using Google Image Search.
  • Moderate Content: You can moderate content using Cloud Vision. Safe Search Detection flags content ranging from adult to violent within your images, while Optical Character Recognition (OCR, one of the key technologies powering Google Translate) reads any text in them and detects the language it is written in.
  • Image Attributes: Cloud Vision also reports attributes such as the dominant colors in your image, image size, landmark hints and crop hints.
  • Multiple Feature Application: You can apply several features to one image in a single request. Take an image of a car, for instance: Cloud Vision can recognize the brand, the color, any text, and the happy faces inside the car.
  • Discoverability: It’s no longer about merely detecting when and where your logos appear on social media. Cloud Vision will detect your logo whether it appears on a random glass bottle, on a wall, or in different locations across the globe.
  • The Google Advantage: Google is known for designing sophisticated systems that scale well, handle huge sets of data and iterate fast. It also has the resources to invest in developing the technology.
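
Emotion and moderation results come back as likelihood ratings, not yes/no answers, so an application has to decide where to draw the line. Below is a minimal sketch of that decision, assuming the v1 likelihood values and the `adult`, `violence` and `joyLikelihood` field names as I understand them. The helper names, the default thresholds and the sample data are all hypothetical.

```python
# Likelihood ratings ordered from least to most likely.
LIKELIHOODS = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]


def at_least(value, threshold):
    """Return True if a likelihood rating meets or exceeds the threshold."""
    return LIKELIHOODS.index(value) >= LIKELIHOODS.index(threshold)


def needs_review(safe_search, threshold="LIKELY", categories=("adult", "violence")):
    """Flag an image when any checked category reaches the threshold."""
    return any(
        at_least(safe_search.get(category, "UNKNOWN"), threshold)
        for category in categories
    )


def looks_happy(face, threshold="LIKELY"):
    """Treat a face as happy when its joy rating reaches the threshold."""
    return at_least(face.get("joyLikelihood", "UNKNOWN"), threshold)


# Hand-written sample annotations, for illustration only.
sample_safe_search = {"adult": "VERY_UNLIKELY", "violence": "POSSIBLE"}
sample_face = {"joyLikelihood": "VERY_LIKELY"}

flagged_default = needs_review(sample_safe_search)
flagged_strict = needs_review(sample_safe_search, threshold="POSSIBLE")
happy = looks_happy(sample_face)
```

Lowering the threshold to `POSSIBLE` catches more borderline images at the cost of more false alarms, which is a product decision and not something the API settles for you.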

By shifting the heavy lifting to the cloud, low-powered devices can take advantage of these services through the APIs. Even app developers who have homegrown image recognition technology could benefit from adding a subset of the Google Cloud Vision API functionality to complement their own. The possibilities are immense. Door keys might die out if image recognition is coupled with the Internet of Things to open doors through facial recognition. It can even be used to describe images to visually impaired people.


Image recognition describes an image in words: it identifies objects, facial expressions, landmarks, logos and so on. Visual search is about finding visually similar images, or objects visually similar to those identified in your image. Visual search is more of a challenge to develop, because it relies heavily on domain expertise to ensure the results are relevant and not merely technically correct. While the ink is not yet dry on Cloud Vision’s upcoming features and future scope, the days of offering general image recognition solutions are over. The market will only allow vendors that focus on areas requiring specialized image recognition services to thrive.

