Summary of the YOLOv8 Paper

Introduced in 2023 by Ultralytics, YOLOv8 is a family of state-of-the-art You Only Look Once (YOLO) object detection models. The paper presents a real-time flying-object detection model that can be used for transfer learning and further research. The model was first trained on 40 different classes of flying objects, which forces it to learn more abstract features. Transfer learning was then applied on another dataset that reflects real-world conditions such as rotated and small objects.

Detecting flying objects remains challenging because of variations in object size, the need for fast inference, and objects that change location quickly. Solving these issues yields a state-of-the-art (SOTA) object detection model, which researchers and engineers can fine-tune on their own use cases to detect objects accurately despite these variations.

The model was first trained on a dataset of 15,064 images containing various flying objects, such as drones, birds, p-airline and c-helicopter. In the second stage, transfer learning was applied to refine the model on a different dataset in which the flying objects are at noticeably greater distances than in the first one.

YOLOv8 follows the same overall architecture as its predecessors but introduces numerous improvements, such as a new neural network that combines a Feature Pyramid Network (FPN) with a Path Aggregation Network (PAN), and auto-labeling tools.

The FPN works by gradually reducing the spatial resolution of the input image while increasing the number of feature channels.
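
To make this shape progression concrete, here is a minimal NumPy sketch of a pyramid in which every level halves the height and width and doubles the channel count. It is an illustration under stated assumptions, not YOLOv8's implementation: `downsample_2x` and `build_pyramid` are hypothetical helpers, 2x2 average pooling plus a random 1x1 projection stand in for learned strided convolutions, and the starting width of 16 channels is arbitrary.

```python
import numpy as np

def downsample_2x(x: np.ndarray) -> np.ndarray:
    """Halve the height and width of an (H, W, C) array with 2x2 average pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def build_pyramid(image: np.ndarray, num_levels: int = 5, base_channels: int = 16, seed: int = 0):
    """Return feature maps whose resolution halves and channel count doubles at each level."""
    rng = np.random.default_rng(seed)
    levels, x, channels = [], image, base_channels
    for _ in range(num_levels):
        x = downsample_2x(x)
        # Random 1x1 projection (C_in -> channels), a stand-in for a learned convolution.
        weights = rng.standard_normal((x.shape[-1], channels)) / np.sqrt(x.shape[-1])
        x = np.maximum(x @ weights, 0.0)  # ReLU non-linearity
        levels.append(x)
        channels *= 2
    return levels

image = np.random.default_rng(1).random((256, 256, 3))
for i, fmap in enumerate(build_pyramid(image), start=1):
    print(f"level {i}: stride {2 ** i:>2}, shape {fmap.shape}")
```

Running it prints shapes going from (128, 128, 16) at stride 2 down to (8, 8, 256) at stride 32. Detection heads in YOLO-style models typically read the coarser levels (strides 8, 16 and 32), where each cell summarizes a larger region of the image.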

The PAN, on the other hand, combines the features from the different levels produced by the FPN. With this architecture, the model is able to capture features at multiple scales (see the figure below).

PAN and FPN architecture
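
The fusion of levels described above can be sketched as an FPN-style top-down pass followed by a PAN-style bottom-up pass over three feature maps. This is a toy version built on assumptions: the helper names are hypothetical, random 1x1 projections replace learned convolutions, and levels are merged by element-wise addition (as in the original FPN and PANet papers), whereas YOLO necks typically concatenate features and then apply convolutions.

```python
import numpy as np

def upsample_2x(x):
    """Nearest-neighbour upsampling: (H, W, C) -> (2H, 2W, C)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample_2x(x):
    """2x2 average pooling: (H, W, C) -> (H/2, W/2, C)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def lateral(x, out_channels, rng):
    """Random 1x1 projection to a common width, standing in for a learned convolution."""
    w = rng.standard_normal((x.shape[-1], out_channels)) / np.sqrt(x.shape[-1])
    return x @ w

def fpn_pan(c3, c4, c5, width=64, seed=0):
    """Fuse three backbone levels (fine -> coarse) with a top-down then a bottom-up pass."""
    rng = np.random.default_rng(seed)
    l3, l4, l5 = (lateral(c, width, rng) for c in (c3, c4, c5))
    # FPN top-down path: carry coarse, semantically strong features to finer levels.
    p5 = l5
    p4 = l4 + upsample_2x(p5)
    p3 = l3 + upsample_2x(p4)
    # PAN bottom-up path: carry fine, well-localized features back to coarser levels.
    n3 = p3
    n4 = p4 + downsample_2x(n3)
    n5 = p5 + downsample_2x(n4)
    return n3, n4, n5

rng = np.random.default_rng(1)
c3, c4, c5 = rng.random((32, 32, 64)), rng.random((16, 16, 128)), rng.random((8, 8, 256))
for name, out in zip(("N3", "N4", "N5"), fpn_pan(c3, c4, c5)):
    print(name, out.shape)  # every output level now has 64 channels
```

After the two passes, every output level mixes information from all three inputs: the coarsest output N5 changes whenever the finest input C3 changes, which is what lets the model capture features at multiple scales.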

The YOLOv8 architecture builds on the one introduced in "You Only Look Once: Unified, Real-Time Object Detection", which I summarized in a previous article.

To refine the detection results, YOLOv8 uses Soft-NMS, a variant of Non-Maximum Suppression (NMS).
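
Standard NMS deletes every box whose overlap (IoU) with a higher-scoring box exceeds a threshold. Soft-NMS instead lowers that box's score according to the overlap, so nearby objects are less likely to be suppressed. The sketch below uses the Gaussian decay from Bodla et al. (2017), score * exp(-IoU^2 / sigma). The function names, `sigma=0.5` and `score_threshold` are assumptions for illustration and are not taken from Ultralytics' code.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS. Returns kept indices (highest score first) and their decayed scores."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep, keep_scores = [], []
    while remaining.size > 0:
        # Select the highest-scoring remaining box and keep it.
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        keep_scores.append(float(scores[best]))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Decay (rather than delete) the scores of boxes that overlap the selected one.
        overlap = iou(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(overlap ** 2) / sigma)
        # Drop boxes whose score has become negligible.
        remaining = remaining[scores[remaining] > score_threshold]
    return keep, keep_scores

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
keep, kept_scores = soft_nms(boxes, scores)
print(keep)  # [0, 2, 1]
print([round(s, 2) for s in kept_scores])
```

With hard NMS at an IoU threshold of 0.5, box 1 (IoU of about 0.68 with box 0) would be discarded; Soft-NMS keeps it with a reduced score of about 0.32, while the non-overlapping box 2 keeps its original 0.7.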

For data augmentation, Ultralytics applies mosaic augmentation to the training data to improve the model's ability to recognize objects in various contexts. Mosaic augmentation takes four random images from the training dataset and combines them into a single mosaic image.
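
A minimal NumPy sketch of the four-image mosaic: pick a random center point, split the output canvas into four tiles around it, fill each tile with a random crop from one of the images, and shift that image's bounding boxes by the same offset. Assumptions: all four images are already resized to out_size x out_size, boxes are (x1, y1, x2, y2) in pixels, and `mosaic` and `shift_boxes` are hypothetical helpers. Ultralytics' own implementation differs in its details.

```python
import numpy as np

def mosaic(images, out_size=640, seed=None):
    """Combine four (out_size, out_size, C) images into one mosaic of the same size.

    Returns the mosaic and, per image, (dx, dy, region) where a source pixel (x, y)
    lands at (x + dx, y + dy) and region = (y1, y2, x1, x2) is that image's tile.
    """
    assert len(images) == 4, "mosaic needs exactly four images"
    rng = np.random.default_rng(seed)
    s = out_size
    # Random mosaic center, kept away from the borders so every tile is non-empty.
    cx = int(rng.integers(s // 4, 3 * s // 4))
    cy = int(rng.integers(s // 4, 3 * s // 4))
    canvas = np.zeros((s, s, images[0].shape[2]), dtype=images[0].dtype)
    regions = [(0, cy, 0, cx), (0, cy, cx, s), (cy, s, 0, cx), (cy, s, cx, s)]
    tiles = []
    for img, (y1, y2, x1, x2) in zip(images, regions):
        h, w = y2 - y1, x2 - x1
        # Random crop of the tile's size from the source image.
        oy = int(rng.integers(0, img.shape[0] - h + 1))
        ox = int(rng.integers(0, img.shape[1] - w + 1))
        canvas[y1:y2, x1:x2] = img[oy:oy + h, ox:ox + w]
        tiles.append((x1 - ox, y1 - oy, (y1, y2, x1, x2)))
    return canvas, tiles

def shift_boxes(boxes, dx, dy, region):
    """Move boxes into mosaic coordinates, clip them to their tile, drop near-empty ones."""
    y1, y2, x1, x2 = region
    b = np.asarray(boxes, dtype=float).reshape(-1, 4) + [dx, dy, dx, dy]
    b[:, [0, 2]] = b[:, [0, 2]].clip(x1, x2)
    b[:, [1, 3]] = b[:, [1, 3]].clip(y1, y2)
    # Hypothetical rule: keep boxes that are still wider and taller than 1 pixel.
    keep = (b[:, 2] - b[:, 0] > 1) & (b[:, 3] - b[:, 1] > 1)
    return b[keep]

rng = np.random.default_rng(0)
images = [rng.integers(0, 256, (640, 640, 3), dtype=np.uint8) for _ in range(4)]
canvas, tiles = mosaic(images, out_size=640, seed=0)
dx, dy, region = tiles[0]
print(canvas.shape, shift_boxes([[100, 100, 300, 300]], dx, dy, region))
```

Because each tile shows only part of its source image, boxes that fall outside the crop are clipped or dropped, so objects often appear partially visible and at unusual positions, which is the variety of contexts the augmentation aims for.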

To learn more about how YOLO works behind the scenes, see my earlier explanation here: https://medium.com/@otmanheddouchai/summary-of-you-only-look-once-unified-real-time-object-detection-yolov1-70fb0fafaea1?source=user_profile_page---------0-------------e7d361ca183e----------------------
