Two Tips to Give a Performance Boost to Real-Time Vision-Based AI Applications

Recent advances have made Computer Vision one of the most important fields in current Artificial Intelligence research, and good runtime performance has become a necessity for any application that leverages vision-based AI. Tasks such as Face Detection, Object Classification, Facial Recognition and Emotion Recognition apply several data science techniques to every pixel within their Region of Interest (ROI), which can slow the application down when it has to run in real time. That calls for programmatic solutions that keep the program as light as possible at runtime. Here are two vision-specific programming techniques that can help address this issue:

Vision-based tasks such as Facial Recognition or Emotion Recognition can run slowly in real time, hence the need for programmatic techniques that boost performance.

1. Skipping Video Frames for Analysis

Because a video feed delivers multiple frames per second, it is feasible to skip a few frames each cycle and lighten the processing load in real time.

Most standard cameras capture no more than 15-30 frames per second (FPS) in real time, even when their listed theoretical limit is higher. To a casual viewer, the difference between 15 FPS and 30 FPS is often hard to notice, and consecutive frames in a live feed usually differ very little. For practical purposes, then, it is unnecessary to repeat the same analysis on every one of the 15 or more frames in a one-second window. Instead, set a number of frames to skip between analyses, and run the data science techniques only on the frames that remain. The per-frame processing cost is then paid on only a fraction of the frames, which reduces latency and makes the application appear smoother and faster for tasks such as Facial Recognition and Object Detection.
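As a rough illustration of frame skipping, here is a minimal sketch. The names analyze_every_nth, run_on_webcam, skip and camera_index are hypothetical and chosen only for this example, and the analyze argument stands in for whatever per-frame model the application uses. Skipped frames simply reuse the most recent result, which is one possible policy among several.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def analyze_every_nth(
    frames: Iterable[Frame],
    analyze: Callable[[Frame], Result],
    skip: int = 2,
) -> Iterator[Tuple[Frame, Optional[Result]]]:
    """Yield (frame, latest_result), analyzing one frame out of every skip + 1."""
    if skip < 0:
        raise ValueError("skip must be non-negative")
    latest: Optional[Result] = None
    for index, frame in enumerate(frames):
        if index % (skip + 1) == 0:
            latest = analyze(frame)  # expensive step, run only on sampled frames
        # Skipped frames reuse the most recent result.
        yield frame, latest


def run_on_webcam(analyze, skip: int = 2, camera_index: int = 0) -> None:
    """Hypothetical driver: read webcam frames with OpenCV and preview them."""
    import cv2  # imported here so the helper above has no OpenCV dependency

    capture = cv2.VideoCapture(camera_index)

    def read_frames():
        while True:
            ok, frame = capture.read()
            if not ok:  # camera unavailable or stream ended
                break
            yield frame

    try:
        for frame, result in analyze_every_nth(read_frames(), analyze, skip):
            # `result` holds the latest analysis; drawing it is left to the caller.
            cv2.imshow("preview", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

With skip=2, the analysis runs on frames 0, 3, 6 and so on, so roughly a third of the frames pay the processing cost. A suitable value depends on how quickly the scene changes and how stale a result the application can tolerate.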

2. Resizing (Diminishing) Frames for Analysis

Reducing image size decreases the pixel count and can therefore help improve the performance of a real-time vision-based AI application.

Another important factor affecting the performance of vision-based applications is the pixel dimensions of the frame (image) being analyzed. Because the analyses work pixel by pixel, the larger each frame is, the slower the application runs and the lower its effective FPS. So, as long as image quality remains adequate for the task, it makes sense to shrink each frame before analysis begins, which shortens the time needed to analyze every frame in the video feed.

This resizing step takes a single line of Python to shrink the image before it is analyzed, yet it can make a substantial difference to the application’s performance.

# Resize the frame to a quarter of its original width and height
# (the pixel count drops to 1/16 of the original)
import cv2
small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
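To make the effect of the scale factor concrete, the sketch below works through the arithmetic for a 1280x720 frame. It is only an illustration under stated assumptions: scaled_size and scale_box_to_original are hypothetical helpers, not OpenCV functions, and the (x, y, width, height) box format is assumed for the example. Any coordinates found on the small frame live in the small frame's space, so they generally need to be scaled back before being drawn on the original.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, width, height) format


def scaled_size(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Approximate frame size after scaling both dimensions by `scale`."""
    return round(width * scale), round(height * scale)


def scale_box_to_original(box: Box, scale: float) -> Box:
    """Map a box found on the resized frame back to original-frame pixels."""
    x, y, w, h = box
    return (round(x / scale), round(y / scale), round(w / scale), round(h / scale))


scale = 0.25
full_w, full_h = 1280, 720
small_w, small_h = scaled_size(full_w, full_h, scale)

# Scaling each side by 0.25 leaves 0.25 * 0.25 = 1/16 of the pixels.
pixel_ratio = (full_w * full_h) / (small_w * small_h)

# A hypothetical detection on the small frame, mapped back for display.
box_on_small = (80, 45, 40, 40)
box_on_full = scale_box_to_original(box_on_small, scale)
```

Here the small frame is 320x180, so each analysis touches 57,600 pixels instead of 921,600. How far the frame can be shrunk before accuracy suffers depends on the model and on how large the objects of interest appear in the frame.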

These two techniques alone can go a long way towards improving the performance of a computer vision application that needs to run in real time. Faster processing gives quicker feedback loops for checking and improving the application’s results, and makes for a better demonstration of the data science algorithm’s capabilities on vision-based tasks.