YOLO review for beginners, from a beginner

August 13, 2020

Yolo is one of the most popular object detection models. The first Yolo paper appeared on arXiv in June 2015 and was presented at CVPR 2016. As of this writing, Yolo v5 is already on GitHub, so the first version may look plain and outdated. But to understand the newer versions of Yolo, you need to know the first one! Let us briefly review the Yolo paper and talk about its core features.

Introduction

Yolo takes a different approach to detecting objects. It frames object detection as a regression problem, rather than the classification problem that prior algorithms worked with. The paper uses R-CNN as an example: because R-CNN relies on a region proposal method followed by separate classification and post-processing steps, its pipeline is complex, which makes it slow and hard to optimize. Yolo instead reframes object detection as a single regression problem, going straight from image pixels to bounding box coordinates and class probabilities, so it only looks once at an image to make predictions. In other words, it is a one-stage detector.

Yolo is fast. That is the main point of the paper. Another attractive property, compared with the earlier algorithms, is that Yolo can use contextual information. A sliding window approach sees only individual windows, and R-CNN sees only the proposed regions. Yolo looks at the whole image at once, so it can take the context of the image into account. The paper reports that this gives Yolo fewer than half as many background errors as Fast R-CNN, which suggests the approach has room to grow.

Grid cell

Yolo divides the input image into an (S, S) grid. If you choose S = 3, the image gets a (3, 3) grid with 9 cells (the paper itself uses S = 7). If the center of an object falls into a grid cell, that grid cell is responsible for detecting the object. Each cell predicts B bounding boxes and a confidence score for each box. The confidence score reflects how confident the model is that the box contains an object, and how accurate it thinks the box is. The paper defines it as Pr(Object) × IOU between the predicted box and the ground truth box, so it should be zero when the cell contains no object and equal to the IOU otherwise.
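To make the confidence definition concrete, here is a minimal sketch of IOU and the confidence target in plain Python. The function names and the (xmin, ymin, xmax, ymax) box layout are my own assumptions for illustration; they do not come from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def target_confidence(has_object, pred_box, truth_box):
    """Confidence target Pr(Object) * IOU: zero when the cell has no object."""
    return iou(pred_box, truth_box) if has_object else 0.0
```

Two boxes that overlap perfectly give a target of 1, and a cell without any object always gets a target of 0, no matter what box it predicts.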

https://heartbeat.fritz.ai/gentle-guide-on-how-yolo-object-localization-works-with-keras-part-2-65fe59ac12d

In this picture, there is a (3, 3) grid and two objects with red borders. The green dot marks the center of the white truck and the orange dot marks the center of the black car. As you can see, the green dot falls into the center cell of the grid, so the center cell is responsible for detecting the white truck. The orange dot falls into the cell to the right of the center, so that cell is responsible for detecting the black car. OK then, what does it mean that a cell is responsible? It means the cell should be confident that the center of the object lies inside it. In other words, the cell should have a high confidence score for that object.
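Finding the responsible cell is just integer arithmetic on the center point. The sketch below assumes pixel coordinates with the origin at the top-left corner; the function name, the 300 × 300 image size and the two center points are made up for illustration and are not measured from the picture.

```python
def responsible_cell(x_center, y_center, img_w, img_h, S=3):
    """Return (row, col) of the grid cell that contains the box center."""
    # Scale the center to grid units and truncate to get the cell index.
    # min(...) keeps a center lying exactly on the right or bottom edge
    # inside the last cell.
    col = min(int(x_center / img_w * S), S - 1)
    row = min(int(y_center / img_h * S), S - 1)
    return row, col


# Hypothetical centers on a 300 x 300 image with a (3, 3) grid.
truck_cell = responsible_cell(150, 150, 300, 300)  # middle cell: (1, 1)
car_cell = responsible_cell(250, 160, 300, 300)    # right of the middle: (1, 2)
```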

Yolo shape

https://www.kaggle.com/mattbast/object-detection-tensorflow-end-to-end

Each dataset stores bounding boxes in a different form, as the image above shows. Yolo also has its own way to represent bounding boxes. Since the key concept is whether the center of an object falls into a cell, Yolo needs the coordinates of the center point in the image. So when you want to use Yolo with the VOC dataset, which stores boxes as corner coordinates, you have to convert the annotations into the Yolo form.
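The conversion can be sketched as below, assuming VOC boxes arrive as pixel corners (xmin, ymin, xmax, ymax) and the target is a center point plus width and height, all divided by the image size. The function names are hypothetical. Note that in the paper the center is further expressed relative to the bounds of its grid cell, while width and height stay relative to the whole image; the second function shows that extra step.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert corner coordinates in pixels to a normalized (x, y, w, h) box."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def cell_offset(x_center, y_center, S=7):
    """Express a normalized center as (row, col, x offset, y offset) in its cell."""
    col = min(int(x_center * S), S - 1)
    row = min(int(y_center * S), S - 1)
    return row, col, x_center * S - col, y_center * S - row
```

For example, a box with corners (100, 50) and (300, 250) in a 400 × 500 image becomes a center of (0.5, 0.3) with a width of 0.5 and a height of 0.4.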

Loss function

We are almost there. It is critical to understand how the loss function works, because it is the indicator that tells the model how far the prediction is from the ground truth. Using the loss value, the model learns to predict better answers.

https://pylessons.com/YOLOv3-TF2-mnist/

Yolo uses sum-squared error for the loss function because it is easy to optimize, even though it does not align perfectly with the goal of maximizing average precision. As you can see, the equation contains some symbols that make it look complicated. So let us first find out what they mean.
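For reference, here is the loss written out in LaTeX, transcribed from the paper with its own symbols. S is the grid size, B is the number of boxes per cell, C is the confidence score, p(c) is the class probability, and hatted symbols are the predictions.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two lines are the box location and size errors, the next two are the confidence errors for boxes with and without an object, and the last line is the classification error.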

https://www.researchgate.net/figure/Flowchart-of-YOLO-object-detection-To-improve-the-YOLO-prediction-accuracy-Redmon-et_fig1_329409946

If you look at the images in the dataset, there are not many objects in each image. Nevertheless, the model makes many predictions per image, and all of them go into the loss for backpropagation. The thing is, as you can see in the picture above, far more predictions point at background than at objects. If we weighted all those errors equally, the loss from cells with no object would dominate. It would push the confidence scores toward zero and overpower the gradient from the cells that do contain objects. To prevent this, Yolo uses λcoord and λnoobj to restore the balance: λcoord increases the loss from bounding box coordinates, and λnoobj decreases the loss from confidence predictions for boxes that contain no object. In the paper, λcoord is 5 and λnoobj is 0.5.

The symbol that looks like a 1 with a double stroke is an indicator function: it is 1 when its condition holds and 0 otherwise. Let me call it responsible, because that is the word the paper uses. OK, let's talk about it. responsible_ij denotes that the jth bounding box predictor in cell i is responsible for detecting an object. Among the predictors in a cell, the responsible one is the predictor whose box has the highest IOU with the ground truth. Let us look again at the image above with the white truck and the black car. As mentioned before, the middle cell and the cell to its right contain the center dots of the objects. We can say these cells are responsible for detecting the objects. This is what responsible_ij tells you in the equation. Then, why do we need it? The loss function only penalizes classification error if an object is present in that grid cell. It also only penalizes bounding box coordinate error if that predictor is responsible for the ground truth box. not_responsible_ij is the opposite of responsible_ij. It denotes that the jth bounding box predictor in cell i is not responsible for any object. It is used to penalize the confidence score of such predictors, so the model learns to give low confidence where there is nothing to detect.
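Putting the pieces together, the loss for one image can be sketched with NumPy as follows. This is only an illustration under my own assumptions: the array layout, the function name yolo_v1_loss and the idea of passing the responsible mask in as an input are not from the paper, and predicted widths and heights are assumed to be non-negative so the square root is defined.

```python
import numpy as np

LAMBDA_COORD = 5.0  # weight for coordinate errors, value from the paper
LAMBDA_NOOBJ = 0.5  # weight for no-object confidence errors, value from the paper


def yolo_v1_loss(pred_boxes, true_boxes, pred_cls, true_cls, responsible):
    """Sum-squared loss over one image.

    pred_boxes, true_boxes: (cells, B, 5) arrays of [x, y, w, h, confidence]
    pred_cls, true_cls:     (cells, num_classes) class probabilities
    responsible:            (cells, B) array, 1.0 where predictor j in cell i
                            is responsible for an object, else 0.0
    """
    not_responsible = 1.0 - responsible
    # A cell contains an object if any of its predictors is responsible.
    cell_has_object = responsible.max(axis=1)

    xy_err = ((pred_boxes[..., 0:2] - true_boxes[..., 0:2]) ** 2).sum(axis=-1)
    wh_err = ((np.sqrt(pred_boxes[..., 2:4])
               - np.sqrt(true_boxes[..., 2:4])) ** 2).sum(axis=-1)
    conf_err = (pred_boxes[..., 4] - true_boxes[..., 4]) ** 2
    cls_err = ((pred_cls - true_cls) ** 2).sum(axis=-1)

    return (
        LAMBDA_COORD * (responsible * xy_err).sum()
        + LAMBDA_COORD * (responsible * wh_err).sum()
        + (responsible * conf_err).sum()
        + LAMBDA_NOOBJ * (not_responsible * conf_err).sum()
        + (cell_has_object * cls_err).sum()
    )
```

The two masks do exactly what the text describes: coordinate and classification errors count only where an object is present, while a predictor that is not responsible is penalized only on its confidence, and with the smaller weight.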

That covers everything except the square roots, and those are simple too. Let me give an easy example. Is a 10-inch difference in height significant for someone who wants to be a basketball player? Of course it is. Does a 10-inch difference matter when choosing the way home? Not at all. In the same way, the paper says that a small deviation should matter more for a small box than for a large one when computing the width and height error. Since every value is normalized to the range 0 to 1, taking the square root stretches small values more than large ones, so the same absolute difference produces a larger error for a small box than for a large box.
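A few lines of Python show the effect. The helper name wh_error and the sample widths are made up for illustration; both pairs of values differ by the same 0.1.

```python
import math


def wh_error(pred, truth, use_sqrt=True):
    """Squared error on a width or height, with or without the square root."""
    if use_sqrt:
        return (math.sqrt(pred) - math.sqrt(truth)) ** 2
    return (pred - truth) ** 2


# Without the square root, a small box and a large box are penalized
# the same for a 0.1 deviation (about 0.01 each).
plain_small = wh_error(0.2, 0.1, use_sqrt=False)
plain_large = wh_error(0.9, 0.8, use_sqrt=False)

# With the square root, the small box is penalized much more.
sqrt_small = wh_error(0.2, 0.1)  # about 0.0172
sqrt_large = wh_error(0.9, 0.8)  # about 0.0029
```

So the square root turns the same 0.1 deviation into an error roughly six times larger for the small box than for the large one.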

Conclusion

Yolo keeps getting more powerful and faster. I reviewed the original Yolo paper here, but there are already newer versions. I believe that studying the first version will help when you learn the others. The paper contains much more that I didn't address in this post, but I hope my review helps you read it.
