
Google AI Releases Objectron Dataset For Advanced 3D Object Understanding

The cutting edge in machine learning (ML) has achieved remarkable accuracy on many computer vision tasks by training models solely on 2D photos. Building on these successes, training models on 3D objects could advance 3D object understanding and power a wide range of applications, such as augmented reality, robotics, autonomy, and image retrieval.

However, understanding objects in 3D remains a difficult task due to the lack of large real-world datasets compared to 2D tasks (e.g., ImageNet, COCO, and Open Images). To empower the research community to keep advancing 3D object understanding, there is a strong need for object-centric video datasets, which capture more of an object's 3D structure while matching the data format used for many common vision tasks.


Now, Google has released the Objectron dataset, an assortment of short, object-centric video clips that capture an enormous set of common objects from different angles. Along with the dataset, the research also details a new 3D object detection solution.

 

Objectron Dataset 

 

The Objectron dataset consists of video clips and images in the following categories: bikes, books, cameras, bottles, chairs, cups, shoes, and laptops. It comprises 15,000 annotated video clips and 4 million images gathered from a geo-diverse sample covering ten countries across five continents. The data contains:

  • The video sequences
  • Manually annotated bounding boxes describing the position, orientation, and dimensions of each object
  • Augmented-reality metadata, such as camera poses and sparse point clouds, for each video clip
  • A SequenceExample format for videos, and a shuffled version of the annotated frames (the processed dataset) for images
  • Scripts to run evaluation
  • Scripts that help load the data into deep learning libraries such as TensorFlow, PyTorch, and Jax and visualise the dataset
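Each annotated bounding box above is described by a position (center), an orientation (rotation), and per-axis dimensions, from which the eight box vertices can be recovered. The sketch below illustrates that reconstruction in plain Python; the helper name and argument layout are our own, not the dataset's actual schema or tooling.

```python
from itertools import product

def box_corners(center, rotation, scale):
    """Return the 8 vertices of an oriented 3D box.

    center:   (x, y, z) box centroid
    rotation: 3x3 rotation matrix as nested lists (rows)
    scale:    (width, height, depth) full extents along each box axis
    """
    corners = []
    # Unit-box corners are every combination of +/-0.5 on each axis.
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        local = (sx * scale[0], sy * scale[1], sz * scale[2])
        # world = center + rotation @ local
        world = tuple(
            center[i] + sum(rotation[i][j] * local[j] for j in range(3))
            for i in range(3)
        )
        corners.append(world)
    return corners

# Example: an axis-aligned 2x2x2 box centered at the origin.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pts = box_corners((0, 0, 0), identity, (2, 2, 2))
```

With an identity rotation the vertices are simply the center offset by half the extents along each axis; a real annotation would supply a non-trivial rotation matrix.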

 

3D Object Detection Solution

 

Google additionally released a 3D object detection solution for four classes of objects — shoes, chairs, mugs, and cameras. These models are trained using the Objectron dataset and are released in MediaPipe, an open-source, cross-platform framework that provides customizable machine learning solutions for live and streaming media; it finds applications in human pose detection and tracking, hand tracking, iris tracking, 3D object detection, and face detection.

In the earlier proposed single-stage model, the pose and physical size of an object were computed from a single RGB image. Some features of this model include:

  • An encoder-decoder architecture built on MobileNetv2
  • Joint prediction of the object's shape through detection and regression
  • A well-established pose estimation algorithm to obtain the final 3D coordinates of the bounding box
  • A lightweight design that can run in real time on mobile devices

The new 3D object detection model, however, uses a two-stage architecture, a marked improvement over its single-stage predecessor described above. The first stage uses the TensorFlow Object Detection model to find the 2D crop of the object. The second stage uses this 2D crop to determine the 3D bounding box. To avoid running the object detector on every frame, the model also simultaneously predicts the 2D crop for the next frame.
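The crop-reuse logic described above — run the heavy 2D detector only when no crop is being tracked, and otherwise let the second stage hand back the next frame's crop — can be sketched as a simple control loop. The detector and box-regressor here are stand-in callables, not the actual MediaPipe or TensorFlow models.

```python
def track_objects(frames, detect_2d, regress_3d):
    """Two-stage 3D detection loop with crop reuse.

    frames:     iterable of video frames
    detect_2d:  frame -> 2D crop region, or None (the heavy detector)
    regress_3d: (frame, crop) -> (box_3d, next_crop); lifts the crop to a
                3D bounding box and predicts the crop for the next frame
    """
    crop = None
    boxes = []
    for frame in frames:
        if crop is None:
            # Only run the expensive 2D detector when tracking is lost.
            crop = detect_2d(frame)
            if crop is None:
                boxes.append(None)
                continue
        box_3d, crop = regress_3d(frame, crop)
        boxes.append(box_3d)
    return boxes

# Stand-in models: the detector fires once, then the regressor keeps
# supplying the next crop so the detector is never called again.
detector_calls = []

def fake_detect(frame):
    detector_calls.append(frame)
    return (0, 0, 10, 10)

def fake_regress(frame, crop):
    return (f"box@{frame}", crop)

boxes = track_objects(range(5), fake_detect, fake_regress)
```

In the real pipeline the regressor would also signal tracking loss (returning no crop), at which point the loop falls back to the detector on the next frame.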

 

3D Object Detection Solution Architecture

 

To assess the performance of the 3D detection models, a 3D intersection over union (IoU) similarity metric was used. Also called the Jaccard index, IoU measures the similarity and diversity of sample sets; in computer vision it is commonly used to measure how close predicted bounding boxes are to the ground truth.

Google has proposed an algorithm for computing accurate 3D IoU values for oriented bounding boxes. Using the Sutherland-Hodgman polygon clipping algorithm, the team first computes the intersection points between the faces of the two boxes and then determines the volume of this intersection. Finally, the intersection volume and the volume of the union of the two boxes are used to compute the IoU.
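For the special case of axis-aligned boxes, the intersection volume that feeds into 3D IoU reduces to a product of per-axis interval overlaps, so no polygon clipping is needed; Google's algorithm handles the general oriented-box case, where the faces must be clipped against each other. The simplified axis-aligned sketch below (function names are ours) shows the final IoU computation:

```python
def interval_overlap(a_min, a_max, b_min, b_max):
    """Length of the overlap between two 1D intervals (0 if disjoint)."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))

def iou_3d_axis_aligned(box_a, box_b):
    """3D IoU of two axis-aligned boxes given as (min_xyz, max_xyz) pairs."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    inter = vol_a = vol_b = 1.0
    for i in range(3):
        inter *= interval_overlap(a_min[i], a_max[i], b_min[i], b_max[i])
        vol_a *= a_max[i] - a_min[i]
        vol_b *= b_max[i] - b_min[i]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0

# Two unit cubes shifted by 0.5 along x: intersection 0.5, union 1.5.
iou = iou_3d_axis_aligned(
    ((0, 0, 0), (1, 1, 1)),
    ((0.5, 0, 0), (1.5, 1, 1)),
)
```

For oriented boxes, `interval_overlap` is replaced by Sutherland-Hodgman clipping of each face polygon, but the closing step — intersection volume divided by union volume — is the same.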

 

The figure above shows how the 3D intersection over union is computed using the polygon clipping algorithm.

By open-sourcing the dataset and presenting the two-stage object detection model, Google aims to enable wider exploration in the fields of view synthesis, unsupervised learning, and improved 3D representation.
