Annotation: Is it a hurdle in implementing Computer Vision technology?

Jonathan Swift, the author of Gulliver’s Travels, once said, “Vision is the art of seeing what is invisible to others.” His words still hold true today and apply to computer vision technology. Computer vision systems are built on neural networks, which allow us to build AI object recognition systems that can outperform humans on some recognition tasks. Effectively, machines can ‘see’ images and objects that would otherwise be lost in large amounts of data.
The range of applications for modern image recognition systems is extremely wide. Image recognition systems can be used for factory automation. For instance, computer vision algorithms employed in a cookie factory can help adjust oven temperature based on the appearance of the baked cookies. These systems can also be used for monitoring activity points such as toll booths or mall entrances. In heavy industries such as oil and gas, computer vision is used to obtain real-time information for safe and efficient drilling.
In the last five years, due to its speed and accuracy, computer vision has become a popular technology. Algorithms are maturing, and today we have several options such as YOLO, U-Net, Mask R-CNN, Faster R-CNN, and DeepLab. Some algorithms, such as Mask R-CNN, help us perform complex tasks such as identifying a specific pattern in a complex image. For example, Mask R-CNN could distinguish people from cars in a photo of a busy street. Others, such as YOLO (You Only Look Once), are fast enough to detect objects in real time.
To make a neural network work reliably, one needs to train it with a large number of annotated images. Consider an example: at present, people have to watch CCTV footage to spot smokers in non-smoking areas. Computer vision can detect individuals smoking where smoking is prohibited. Such algorithms take video as input and output statistics or real-time alarms. To get there, however, the training images must first be annotated, and this process is manual and labor-intensive. It can take about one hundred man-hours to label the data collected in one hour. Furthermore, the quality of annotation is of utmost importance. In short, annotation is a crucial step that ultimately determines the accuracy of a neural network.
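To put the labeling estimate in concrete terms, here is a rough back-of-the-envelope sketch. It takes the figure of about one hundred man-hours per hour of collected data from the text. The function names, the team size and the eight-hour working day are assumptions made purely for illustration.

```python
def annotation_effort_hours(footage_hours, hours_per_footage_hour=100.0):
    """Estimate manual labeling effort in person-hours.

    The default of 100 follows the rough figure quoted in the article;
    real projects may differ widely.
    """
    if footage_hours < 0 or hours_per_footage_hour < 0:
        raise ValueError("inputs must be non-negative")
    return footage_hours * hours_per_footage_hour


def working_days(person_hours, annotators, hours_per_day=8.0):
    """Convert person-hours into working days for a team (assumed 8-hour days)."""
    if annotators <= 0 or hours_per_day <= 0:
        raise ValueError("annotators and hours_per_day must be positive")
    return person_hours / (annotators * hours_per_day)


# Illustrative numbers only: 8 hours of footage, a team of 5 annotators.
effort = annotation_effort_hours(8)
days = working_days(effort, annotators=5)
print(f"{effort:.0f} person-hours, about {days:.0f} working days for 5 annotators")
```

Under these assumptions, a single eight-hour recording takes about 800 person-hours to label, or roughly 20 working days for a team of five.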

Companies can leverage computer vision right now. To use the technology, one needs to “attach” a trained neural network module to an existing CCTV or other camera and get reports with the required details (e.g., how many SUVs entered through a toll booth). IT departments often want to outsource these routine tasks. In such scenarios, external AI consulting agencies can offer solutions.
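As an illustration of the reporting step, the sketch below turns a stream of detections into a simple count per vehicle type. It is not any particular product's interface: the Detection record, its track_id field (assumed to come from an object tracker that follows each vehicle across frames) and the 0.5 confidence threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Detection(NamedTuple):
    """One detection in one video frame (hypothetical record layout)."""
    track_id: int      # assumed: ID assigned by an object tracker
    label: str         # class name predicted by the neural network
    confidence: float  # prediction score between 0 and 1


def count_vehicles(detections: Iterable[Detection],
                   min_confidence: float = 0.5) -> Counter:
    """Count each tracked vehicle once, under its first confident label."""
    seen_tracks = set()
    counts = Counter()
    for det in detections:
        if det.confidence < min_confidence:
            continue  # skip uncertain predictions
        if det.track_id in seen_tracks:
            continue  # same vehicle seen again in a later frame
        seen_tracks.add(det.track_id)
        counts[det.label] += 1
    return counts


# Made-up sample data for a toll booth camera.
sample = [
    Detection(1, "suv", 0.90),
    Detection(1, "suv", 0.80),    # vehicle 1 again, next frame
    Detection(2, "sedan", 0.70),
    Detection(3, "suv", 0.40),    # below the threshold, ignored
    Detection(4, "suv", 0.95),
]
report = count_vehicles(sample)
print(f"SUVs: {report['suv']}, sedans: {report['sedan']}")
```

Counting by track rather than by frame avoids counting the same vehicle once for every frame in which it appears.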
Business leaders focused on excellence know that investing in computer vision is the need of the hour. Using computer vision, we can now automate processes that are tedious for human eyes. If trained well, computers can recognize a wide range of objects in images or video streams. However, some believe that the process of training computers is time-consuming and expensive, and as a result some companies are unwilling to invest in computer vision.
The process of training machines with annotated images need not be a hurdle to advancement. BitRefine group offers annotation services for developing computer vision suited to your organization. We have skilled professionals and proper tools to prepare annotations for clients. Although BitRefine annotates manually, it also uses modern tools to speed up the process. These tools suggest the shape of objects automatically so that the operator can quickly add corrections and proper tags. This helps client organizations complete projects within their allocated time and budget.

October 02, 2018