Is it possible to get the location of objects from either Microsoft's Computer Vision API or Google's Cloud Vision API?

I am trying to develop an application that needs to know the location of tagged objects in an image. Knowing that there is a "piano" in an image is not enough, I need to know where that piano is in the image.

Both Microsoft's Computer Vision API and Google's Cloud Vision API provide some form of cropping suggestion or smart thumbnail generation service, which leads me to think that the location of certain objects is being detected. However, is there a way to get that information (like a bounding box around each detected object) from either Microsoft's Computer Vision API or Google's Cloud Vision API?

EDIT: I understand that both APIs can return the location of faces detected in an image; however, I am looking for the locations and sizes of every object in an image: cars, pianos, trees, people... anything.

The Microsoft Vision API offers no pixel coordinates for the detected objects (see the list of returned features).

However, if you want to detect people, the Microsoft API can return the coordinates of the face rectangles.

I don't know of any API that returns object coordinates at this time. What I recommend is YOLO, which gives you the coordinates of each detected object. You can use either pre-trained models or train your own.

However, it is not an API, so you have to write a bit of backend code to run it remotely.
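
If you go the YOLO route, that backend mostly has to turn the detector's raw boxes into pixel coordinates and drop duplicates. YOLO's label files describe each box by its center x, center y, width and height, normalized to [0, 1]; the sketch below assumes detections arrive in that same normalized format, converts them to pixel corners, and applies a simple greedy non-maximum suppression (NMS). The function names are hypothetical, and real YOLO distributions usually ship their own post-processing, so treat this as an illustration of the idea rather than a drop-in implementation.

```python
from typing import List, Tuple

# A box in pixel corner format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def yolo_to_corners(cx: float, cy: float, w: float, h: float,
                    img_w: int, img_h: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel corners.

    Assumes all four inputs are fractions of the image size in [0, 1].
    """
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, `yolo_to_corners(0.5, 0.5, 0.25, 0.5, 640, 480)` gives `(240.0, 120.0, 400.0, 360.0)`. Greedy NMS is the simplest variant; the 0.5 IoU threshold is an arbitrary default you would tune for your data.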

Hope this helps.

URL (POST): https://{yourvisionapp}
Headers:
  Content-Type: application/json
  Ocp-Apim-Subscription-Key: {yourSubscriptionKey}
Body: {"url":"yoururl"}
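
The request above can be assembled with Python's standard library. This is a sketch: the full endpoint path is elided in the answer, so `endpoint_url` is left as a parameter you fill in from your own Azure resource, and the helper names `build_analyze_request` and `send` are hypothetical. The request is only built by the first function; `send` performs the actual network call.

```python
import json
import urllib.request


def build_analyze_request(endpoint_url: str, subscription_key: str,
                          image_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described above.

    endpoint_url is the full URL of your Computer Vision resource's
    analyze endpoint; it is an assumption left to the caller.
    """
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Note that `urllib` normalizes header names with `str.capitalize()`, so the key is stored as `Ocp-apim-subscription-key`; HTTP header names are case-insensitive, so the service should accept it either way.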

Sample response:

    {
        "objects": [
            {
                "rectangle": {
                    "x": 460,
                    "y": 79,
                    "w": 141,
                    "h": 258
                },
                "object": "window",
                "confidence": 0.508
            },
            {
                "rectangle": {
                    "x": 180,
                    "y": 240,
                    "w": 299,
                    "h": 182
                },
                "object": "Billiard table",
                "confidence": 0.635,
                "parent": {
                    "object": "table",
                    "confidence": 0.676
                }
            },
            {
                "rectangle": {
                    "x": 8,
                    "y": 11,
                    "w": 497,
                    "h": 416
                },
                "object": "room",
                "confidence": 0.547
            }
        ],
        "requestId": "f8aafd95-d17d-4088-a34b-ad616f9cde4a",
        "metadata": {
            "width": 640,
            "height": 427,
            "format": "Jpeg"
        }
    }
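
Each `rectangle` gives the top-left corner (`x`, `y`) plus the width `w` and height `h` in pixels of the original image; the "room" box, for instance, ends at y = 11 + 416 = 427, exactly the image height in `metadata`. A small sketch that flattens the sample response into corner coordinates, and walks the optional `parent` chain so that "Billiard table" can also be reported as a "table", could look like this. The function name `extract_boxes` is hypothetical, and the sample is trimmed to the fields the function uses.

```python
import json
from typing import Dict, List


def extract_boxes(response: dict) -> List[Dict]:
    """Flatten the "objects" array into pixel corner boxes.

    Each result holds the object name, its confidence, the ancestor
    names from the optional "parent" chain, and (x_min, y_min, x_max, y_max).
    """
    results = []
    for obj in response.get("objects", []):
        r = obj["rectangle"]
        ancestors = []
        parent = obj.get("parent")
        while parent is not None:
            ancestors.append(parent["object"])
            parent = parent.get("parent")  # a parent may have its own parent
        results.append({
            "object": obj["object"],
            "confidence": obj["confidence"],
            "ancestors": ancestors,
            "box": (r["x"], r["y"], r["x"] + r["w"], r["y"] + r["h"]),
        })
    return results


SAMPLE = json.loads("""
{"objects": [
  {"rectangle": {"x": 460, "y": 79, "w": 141, "h": 258},
   "object": "window", "confidence": 0.508},
  {"rectangle": {"x": 180, "y": 240, "w": 299, "h": 182},
   "object": "Billiard table", "confidence": 0.635,
   "parent": {"object": "table", "confidence": 0.676}},
  {"rectangle": {"x": 8, "y": 11, "w": 497, "h": 416},
   "object": "room", "confidence": 0.547}],
 "metadata": {"width": 640, "height": 427, "format": "Jpeg"}}
""")

for item in extract_boxes(SAMPLE):
    print(item["object"], item["ancestors"], item["box"], item["confidence"])
```

For the billiard table this prints `Billiard table ['table'] (180, 240, 479, 422) 0.635`, which is exactly the "where is the piano" information the question asks for.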

2020 UPDATE:

This question is a few years old, but the Microsoft Azure Computer Vision API can now return bounding boxes around the objects it detects in an image. Here is a sample in Python. Other languages are available as well.

Computer Vision documentation:

Computer Vision SDK:

Computer Vision API:
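
A minimal sketch of the same call through the Python SDK, assuming the `azure-cognitiveservices-vision-computervision` and `msrest` packages are installed. It assumes `ComputerVisionClient.detect_objects` returns a result whose `objects` each carry a `rectangle` (`x`, `y`, `w`, `h`), an `object_property` name and a `confidence`; please check these names against the SDK version you install. The helper names `make_client` and `detect`, and the 0.5 confidence cut-off, are hypothetical. The SDK imports live inside `make_client` so that `detect` works with any object exposing a compatible `detect_objects` method.

```python
from typing import List, Tuple

# (name, x, y, w, h, confidence) for one detected object
Detection = Tuple[str, int, int, int, int, float]


def make_client(endpoint: str, key: str):
    """Create a Computer Vision client (requires the Azure SDK packages)."""
    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from msrest.authentication import CognitiveServicesCredentials
    return ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))


def detect(client, image_url: str, min_confidence: float = 0.5) -> List[Detection]:
    """Run object detection on a remote image and keep confident hits."""
    result = client.detect_objects(image_url)
    detections: List[Detection] = []
    for obj in result.objects:
        if obj.confidence < min_confidence:
            continue  # skip low-confidence guesses
        r = obj.rectangle
        detections.append((obj.object_property, r.x, r.y, r.w, r.h, obj.confidence))
    return detections
```

In practice you would call `detect(make_client(endpoint, key), image_url)` with your own resource endpoint and key; each tuple then gives the object's top-left corner and size in pixels.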

  • Possible duplicate of How to get a position of custom object on image using vision recognition api
  • See my edit - I'm looking for more than just face locations, but I understand that these APIs may not be what I am looking for.
  • In that case, the Microsoft API is not suitable.
  • Any idea about the Google API or any other APIs?
  • Have you tried using the OpenCV package in Python (tutorial: )? Unfortunately, I have no clue about Google's API. Good luck.
  • I think OpenCV has to be trained before it can classify a ton of objects. I am looking for a solution that can already recognize thousands of everyday objects and items.