Top 6 Classic Open-Source Computer Vision Projects
Computer vision is one of the most compelling subfields of AI. It studies how computers can be made to understand the visual world: with the help of deep learning models, a machine learns to identify the objects in images and video and to act on what it sees.
One of the best ways to learn computer vision is by working on projects. So, in this article, we share our top computer vision project ideas.
1. Image Classification
Image classification is a fundamental task in computer vision: the objective is to classify an image by assigning a single label to it.
Here are two of the most notable open-source datasets for image classification:
The CIFAR-10 dataset is a collection of images widely used to train machine learning and computer vision algorithms, and one of the best-known datasets in the field. It contains 60,000 32×32 colour images in 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.
The ImageNet dataset is an enormous visual database built for computer vision research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured, bounding boxes are provided for at least one million of them, and the dataset spans more than 20,000 categories.
As a beginner, you can train a neural network from scratch using Keras or PyTorch. For better results and to take your learning further, use pre-trained models such as VGG-16, ResNet-50, or GoogLeNet.
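Before reaching for a framework, it helps to see the bare structure of a classifier. Below is a minimal numpy-only sketch of the forward pass of a linear softmax classifier on CIFAR-10-shaped data; the random images and weights are stand-ins, and a real project would load CIFAR-10 through Keras or PyTorch and train the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for CIFAR-10: 100 random 32x32 RGB images, 10 classes.
images = rng.random((100, 32, 32, 3)).astype(np.float32)

# Flatten each image into a feature vector, as a linear classifier sees it.
X = images.reshape(100, -1)                 # shape: (100, 3072)
W = rng.normal(0, 0.01, (X.shape[1], 10))   # one weight column per class

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Forward pass: class scores -> class probabilities -> predicted label.
probs = softmax(X @ W)
preds = probs.argmax(axis=1)

print(preds.shape)  # (100,)
```

Training would adjust `W` by gradient descent on the cross-entropy loss; pre-trained models like VGG-16 replace the flattened-pixel features with learned convolutional features.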
2. Face Recognition
Face recognition is used in security, surveillance, and unlocking your devices. Its main goal is to match the faces in an image or video against a pre-existing database.
It is a multi-stage process with the following steps:
- Face Detection: the initial step; locate one or more faces in the input image or video.
- Face Alignment: normalize the detected faces so they are geometrically consistent with the database.
- Feature Extraction: extract the features that will be used in the recognition task.
- Feature Matching: match the extracted features against the database.
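The four stages above can be sketched as a simple pipeline. Every function here is an illustrative placeholder, not a real library API: a production system would use a real detector (e.g. MTCNN or a Haar cascade), landmark-based alignment, and a deep embedding network.

```python
import numpy as np

rng = np.random.default_rng(1)

def detect_faces(frame):
    # Placeholder for face detection; returns the whole frame as one "face".
    return [frame]

def align_face(face):
    # Placeholder for geometric normalization (landmarks -> canonical pose).
    return face

def extract_features(face):
    # Placeholder for a deep embedding network: a crude 128-d unit vector.
    flat = face.reshape(-1).astype(np.float32)[:128]
    return flat / (np.linalg.norm(flat) + 1e-8)

def recognize(embedding, database):
    # Match by nearest embedding in the database.
    names = list(database)
    dists = [np.linalg.norm(embedding - database[n]) for n in names]
    return names[int(np.argmin(dists))]

frame = rng.random((160, 160, 3))
db = {"alice": extract_features(rng.random((160, 160, 3))),
      "bob": extract_features(frame)}  # "bob" was enrolled from this exact frame

for face in detect_faces(frame):
    emb = extract_features(align_face(face))
    print(recognize(emb, db))  # bob
```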
Below are open-source datasets that will give you good hands-on experience with face recognition:
MegaFace is a large-scale public face recognition training dataset used for commercial face recognition problems. It contains 4,753,320 faces of 672,057 identities.
Labeled Faces in the Wild:
Labeled Faces in the Wild (LFW) is a database of face photographs intended for studying the unconstrained face recognition problem. It includes 13,233 images of 5,749 people detected and collected from the web; 1,680 of the people pictured have two or more distinct photos in the dataset.
You can use pre-trained models such as FaceNet, a deep learning model that produces unified embeddings for face recognition, verification, and clustering tasks. The network maps each face image to a point in Euclidean space such that the distance between images of the same person is small.
Pre-trained FaceNet models are available for Keras and PyTorch, so you can build your own face recognition system on top of them.
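The Euclidean-distance idea behind FaceNet-style verification can be shown with plain numpy. The 128-d vectors below are synthetic stand-ins for real network outputs, and the threshold of 1.0 is an illustrative assumption; real deployments tune it on a validation set.

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v)

rng = np.random.default_rng(2)

# Stand-ins for 128-d FaceNet-style embeddings (real ones come from the network).
anchor = l2_normalize(rng.normal(size=128))
same_person = l2_normalize(anchor + rng.normal(scale=0.05, size=128))  # near anchor
other_person = l2_normalize(rng.normal(size=128))                      # unrelated

def is_same(a, b, threshold=1.0):
    # Verification: same identity if the embeddings are closer than the threshold.
    return np.linalg.norm(a - b) < threshold

print(is_same(anchor, same_person))   # True: a small perturbation stays close
print(is_same(anchor, other_person))  # False: random directions are far apart
```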
3. Scene Text Detection
Here the goal is to read text captured by a camera in an outdoor environment, such as the number plates of cars on roads or billboards by the roadside.
Text in scene images varies in shape, font, color, and position, and recognition is further complicated by non-uniform illumination and blur.
Below are popular datasets that will help you build your skills in scene text detection:
The Street View House Numbers (SVHN) dataset is one of the most popular open-source datasets available. Google has used it in neural networks that read house numbers and match them to their geolocations. It is a great benchmark to play with, learn from, and train models that precisely identify street numbers, and it consists of 600k labeled real-world images of house numbers taken from Google Street View.
Scene Text Dataset:
The scene text dataset includes 3,000 images captured in various environments, including outdoor and indoor scenes under different lighting conditions. Images were taken either with a high-resolution digital camera or a low-resolution mobile phone camera, and all have been resized to 640×480.
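A first preprocessing step for such datasets is normalizing every capture to one size, as the scene text dataset above does with 640×480. Here is a minimal sketch using nearest-neighbour resizing and a grayscale conversion in plain numpy; real pipelines would typically use OpenCV or Pillow instead.

```python
import numpy as np

def resize_nearest(img, height, width):
    """Nearest-neighbour resize: map each target pixel back to a source pixel."""
    h, w = img.shape[:2]
    rows = np.arange(height) * h // height
    cols = np.arange(width) * w // width
    return img[rows[:, None], cols]

def to_grayscale(img):
    # Standard luminance weights for RGB channels.
    return img @ np.array([0.299, 0.587, 0.114])

rng = np.random.default_rng(3)
photo = rng.random((1200, 1600, 3))       # a "high-resolution" capture
small = resize_nearest(photo, 480, 640)   # normalize to 640x480 as in the dataset
gray = to_grayscale(small)                # many text detectors work on grayscale

print(small.shape)  # (480, 640, 3)
print(gray.shape)   # (480, 640)
```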
4. Object Detection with DETR
The main objective of object detection is to predict, for each object of interest in the image, a bounding box together with the proper class label.
A few months back, Facebook publicly released its object detection framework, DEtection TRansformer (DETR). DETR is an innovative and efficient approach to object detection: by viewing detection as a direct set prediction problem, it streamlines the training pipeline, and it adopts a transformer-based encoder-decoder architecture.
The following are open-source datasets for object detection:
The Open Images dataset consists of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives. The dataset is split into a training set (9,011,219 images), a validation set (41,620 images), and a test set (125,436 images).
MS-COCO is a large-scale dataset used for object detection problems. It consists of 330K images covering 80 object categories, with 5 captions per image and key-point annotations for 250,000 people.
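Predicted bounding boxes are usually scored against ground truth by intersection-over-union (IoU), the overlap criterion behind COCO-style evaluation. A minimal sketch for axis-aligned boxes given as `(x1, y1, x2, y2)` corners:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (empty if the boxes are disjoint).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

# A prediction overlapping a ground-truth box by half its width:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # 50 / 150 = 0.333...
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0: no overlap
```

COCO evaluation averages detection precision over IoU thresholds from 0.5 to 0.95; a common single-threshold convention counts a detection as correct when IoU ≥ 0.5.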
5. Semantic Segmentation
Semantic segmentation comes into the picture when we need complete scene understanding in computer vision. It is the task of classifying every pixel in an image into the relevant object class.
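Because every pixel gets a label, segmentation quality is typically measured per class with intersection-over-union over the label masks (mean IoU averages these scores). A small numpy sketch on a toy 4×4 "image":

```python
import numpy as np

def per_class_iou(pred, truth, num_classes):
    """IoU of predicted vs ground-truth label masks, one score per class."""
    scores = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, truth == c).sum()
        union = np.logical_or(pred == c, truth == c).sum()
        scores.append(inter / union if union else float("nan"))
    return scores

# Tiny 4x4 label masks: class 0 is background, class 1 is an object.
truth = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 0, 1],   # one object pixel mislabeled as background
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])

scores = per_class_iou(pred, truth, 2)
print(scores[1])  # class 1: intersection 3, union 4 -> 0.75
```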
Following is the list of open-source datasets for this topic:
This database is one of the first semantically segmented datasets to be released and is frequently used in semantic segmentation research. It contains:
- 367 training pairs
- 101 validation pairs
- 233 test pairs
This is one of the most popular datasets available for semantic segmentation tasks. It includes 2,975 training images and 500 validation images, each 256×512 pixels.
6. Image Captioning
Image captioning is a combined task of computer vision and natural language processing (NLP): generating a textual description for an image.
Computer vision techniques help understand and extract features from the input image; NLP then generates the description with the words in the correct order.
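The two-stage structure above can be illustrated with a toy greedy decoder. Everything here is made up for illustration: the vocabulary and the next-word table stand in for a trained language model, which would score the whole vocabulary with an RNN or transformer conditioned on CNN image features.

```python
# Toy "language model": the most likely next word given the previous one.
NEXT = {"<start>": "a", "a": "dog", "dog": "runs",
        "runs": "on", "on": "grass", "grass": "<end>"}

def greedy_caption(image_features, max_len=10):
    """Greedy decoding: repeatedly take the single most probable next word."""
    words, prev = [], "<start>"
    for _ in range(max_len):
        word = NEXT[prev]   # a trained model would condition on image_features here
        if word == "<end>":
            break
        words.append(word)
        prev = word
    return " ".join(words)

print(greedy_caption(image_features=None))  # a dog runs on grass
```

Real captioners improve on greedy decoding with beam search, keeping several candidate captions at each step instead of only the single best word.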
The following are a few useful datasets for image captioning:
COCO is an object detection, segmentation, and captioning dataset. It consists of 330K images, of which more than 200K are labeled, with 1.5 million object instances across 80 object categories and 5 captions per image.
Flickr 30k dataset:
This image caption corpus comprises 158,915 crowd-sourced captions describing 31,783 images. The newer images and captions focus on people involved in everyday activities and events.