Research on Object Detection Based on Multi-Scale Branch Structure Feature Fusion (基于多尺度分支结构特征融合的目标检测研究)
孙元辉
Subtype: Master's
2019-05-23
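Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Optics and Electronics, Chinese Academy of Sciences
Degree Name: Master of Engineering
Degree Discipline: Electronics and Communication Engineering
Keyword: object detection, feature fusion, sample imbalance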
Abstract

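Object detection is one of the most fundamental tasks in computer vision and plays an important role in image analysis and understanding. This thesis builds on convolutional neural networks and uses deep learning to extract object features, ultimately locating and classifying specific objects in an image, in support of machine scene understanding and situational awareness.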

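The thesis first analyzes and summarizes traditional classical object detection algorithms and compares them with deep learning methods based on convolutional neural networks. The analysis shows that classical algorithms work well for a single object class in simple scenes but often fail on multiple classes in complex scenes. Convolutional networks, by contrast, resemble the way humans recognize objects and learn to extract the semantic features of objects. The learned features are general and widely applicable, which gives a large advantage in recognizing multiple object classes in complex scenes. Object detection with convolutional neural networks nevertheless still has many problems. First, deepening the network helps extract higher-level semantic features, but it also blurs object location information, which lowers localization accuracy. Second, deeper convolutional and fully connected networks require a huge amount of computation, so raising detection speed while maintaining detection accuracy is equally critical to algorithm performance.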

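To address these problems, the thesis studies deep-learning-based object detection methods. It analyzes the strengths and weaknesses of various schemes in terms of localization and classification methods, and adopts as its base detection network an end-to-end approach that regresses object coordinates and classes directly from the convolutional layers. End-to-end detection networks are fast, but they suffer from insufficient feature extraction and an imbalance between positive and negative samples. The thesis carries out the following research on these problems.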

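First, to address insufficient feature extraction, the thesis uses multi-scale branch-structure feature fusion to implement a high-speed feature extraction module in which features from multi-scale receptive fields are fused by concatenation. It introduces rectangular convolution kernels to strengthen the network's feature representation, and it adjusts the structure of the convolutional network to fuse features from multiple layers once more. Multi-scale receptive fields, rectangular convolution kernels, and multi-layer feature fusion effectively alleviate insufficient feature extraction, and because the feature fusion relies heavily on parallel convolution structures, detection accuracy improves while the algorithm remains real-time.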

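To address the imbalance between positive and negative samples, the thesis starts from the basic principles of convolutional networks and works on three aspects: different candidate-region settings, adjustment of the loss function, and improvement of the non-maximum suppression algorithm used at prediction time. It analyzes and verifies the change in network performance after each improvement and selects suitable parameter settings and a suitable loss function. The three improvements alleviate the sample imbalance of the end-to-end network and raise its recall.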

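Training also strongly affects the final performance of a convolutional network. The thesis tries several methods to improve training, including pre-training and transfer learning, augmentation of the training set (flipping, scale transformation, grayscale transformation, image mixing), and adjustment of training parameters for multi-GPU training.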

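The thesis trains and tests on public datasets and compares the results with related deep-learning-based object detection algorithms. The results show that the algorithm has a certain advantage in the combined performance of detection accuracy and detection speed. The thesis also analyzes the algorithm's shortcomings and directions for further improvement.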

Other Abstract

In the field of computer vision, object detection is one of the most basic tasks and plays an important role in image analysis and understanding. Based on convolutional neural networks, the thesis uses deep learning to extract object features and then locates and classifies specific objects in the image, which supports machine scene understanding and situational awareness.

The thesis first analyzes and summarizes the traditional classical object detection algorithms and compares them with deep learning methods based on CNNs. The analysis finds that classical algorithms perform well on a single object class in a simple scene, but often cannot handle multiple object classes in complex scenes. A convolutional network resembles the way humans recognize objects, and it extracts the semantic features of objects by learning. The learned features are general and widely applicable, and they have great advantages for recognizing multiple object classes in complex scenes. There are still many problems with using convolutional neural networks for object detection. First, adding network layers helps extract higher-level semantic features, but a deeper convolutional network blurs the location information of the object, which reduces localization accuracy. Second, deeper convolutional networks and fully connected networks require a huge amount of computation, so improving detection speed while maintaining detection accuracy is also crucial to the performance of the algorithm.

In view of the above problems, the thesis studies object detection methods based on deep learning. It analyzes the advantages and disadvantages of various schemes with respect to the localization method and the classification method. It then adopts an end-to-end detection method, which regresses object coordinates and categories directly from the convolutional layers, as the basic form of the detection network. The end-to-end detection network is fast, but it has problems such as insufficient feature extraction and unbalanced positive and negative samples. The thesis conducts the following research on these issues.

First, to address insufficient feature extraction, the thesis uses feature fusion in a multi-scale branch structure to build a high-speed feature extraction module, and fuses features from multi-scale receptive fields inside the module by concatenation. The feature representation of the network is enhanced by introducing rectangular convolution kernels. By adjusting the structure of the convolutional network, the features of multiple layers are fused again. The multi-scale receptive fields, rectangular convolution kernels, and multi-layer feature fusion effectively alleviate insufficient feature extraction. The feature fusion makes heavy use of parallel convolution structures, which improves detection accuracy while keeping the algorithm real-time.
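The abstract does not give the branch layout, so the following is only a rough sketch of the bookkeeping behind such a module. It assumes that "connection" means channel-wise concatenation, and the kernel sizes, channel widths, and 38x38 feature map are hypothetical values chosen for illustration. It shows that parallel branches with "same" padding can be concatenated, and that a 1x3 layer followed by a 3x1 layer needs fewer parameters than one 3x3 layer.

```python
# Illustrative arithmetic only: branch layout and sizes are assumptions,
# not values taken from the thesis.

def conv_params(c_in, c_out, kh, kw):
    """Weights plus biases of one convolution layer with a kh x kw kernel."""
    return c_in * c_out * kh * kw + c_out

def out_size(h, w, kh, kw, stride=1):
    """Output height and width with 'same' padding for odd kernel sizes."""
    ph, pw = (kh - 1) // 2, (kw - 1) // 2
    return (h + 2 * ph - kh) // stride + 1, (w + 2 * pw - kw) // stride + 1

def fuse_branches(c_in, h, w, branches):
    """Each branch is a list of (kh, kw, c_out) layers applied in sequence.
    Branch outputs are concatenated along the channel axis."""
    total_channels, total_params = 0, 0
    for layers in branches:
        c, bh, bw = c_in, h, w
        for kh, kw, c_out in layers:
            total_params += conv_params(c, c_out, kh, kw)
            bh, bw = out_size(bh, bw, kh, kw)
            c = c_out
        # Concatenation requires every branch to keep the same spatial size.
        assert (bh, bw) == (h, w)
        total_channels += c
    return (total_channels, h, w), total_params

# Hypothetical module: three parallel branches on a 64-channel 38x38 map.
branches = [
    [(1, 1, 32)],                          # 1x1 receptive field
    [(1, 1, 32), (3, 3, 32)],              # 3x3 receptive field, square kernel
    [(1, 1, 32), (1, 3, 32), (3, 1, 32)],  # 3x3 receptive field, rectangular kernels
]
shape, params = fuse_branches(64, 38, 38, branches)
rect_pair = conv_params(32, 32, 1, 3) + conv_params(32, 32, 3, 1)
square = conv_params(32, 32, 3, 3)
print(shape, params, rect_pair, square)
```

Because the branches are independent until the concatenation, they can run in parallel, which fits the abstract's remark that parallel convolution structures help keep the detector real-time.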

To address the imbalance between positive and negative samples, the thesis considers the basic principles of convolutional networks and works on three aspects: different candidate region settings, adjustment of the loss function, and improvement of the non-maximum suppression algorithm in the prediction process. It analyzes and verifies the change in network performance after each improvement and selects appropriate parameter settings and an appropriate loss function. The three improvements alleviate the imbalance between positive and negative samples in the end-to-end network and improve the network's recall.
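The abstract names neither the adjusted loss function nor the modified suppression rule. As a hedged illustration of the two components involved, the sketch below uses focal loss, one well-known loss that down-weights easy negatives and is an assumption here rather than the thesis's confirmed choice, and the standard greedy non-maximum suppression that such improvements usually start from. The `alpha`, `gamma`, and `iou_threshold` defaults are common illustrative values, not settings from the thesis.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.
    p is the predicted probability of the positive class, y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Baseline greedy NMS: visit boxes by descending score and keep a box
    only if it does not overlap an already kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
easy_negative = focal_loss(0.01, 0)   # confident, correct background prediction
hard_positive = focal_loss(0.10, 1)   # object predicted with low confidence
print(kept, easy_negative, hard_positive)
```

In a one-stage detector most candidate regions are background, so a loss of this kind keeps the many easy negatives from dominating the gradient, while the suppression step decides which overlapping predictions survive at inference time.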

The training of the convolutional network also has a large impact on the final performance of the algorithm. The thesis tries a variety of methods to improve training, including pre-training and transfer learning for the network, augmentation of the training data set (flipping, scale transformation, grayscale transformation, image mixing), and adjustment of training parameters for multi-GPU training.
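Two of the listed augmentations can be sketched in a few lines. This is a minimal illustration on plain nested lists, not the thesis's pipeline: it assumes the flip is horizontal, that boxes are `(x1, y1, x2, y2)` in pixel coordinates, and that "image mixing" is a mixup-style pixel blend with weight `lam`. All of these are assumptions, since the abstract gives no details.

```python
def hflip(image, boxes):
    """Mirror an image left to right and move its boxes accordingly.
    image is a list of pixel rows; boxes are (x1, y1, x2, y2)."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # x coordinates are mirrored, and x1/x2 swap so that x1 <= x2 still holds.
    new_boxes = [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, new_boxes

def mix_images(img_a, boxes_a, img_b, boxes_b, lam=0.5):
    """Blend two equally sized images pixel by pixel and keep both box lists."""
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    return mixed, boxes_a + boxes_b

image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
flipped, flipped_boxes = hflip(image, [(0, 0, 1, 2)])
mixed, mixed_boxes = mix_images([[0, 2]], [(0, 0, 1, 1)], [[4, 6]], [(1, 0, 2, 1)])
print(flipped, flipped_boxes, mixed, mixed_boxes)
```

For detection, unlike classification, every geometric augmentation has to transform the box annotations together with the pixels, which is why both helpers return boxes alongside the image.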

The algorithm is trained and tested on public datasets and compared with related deep-learning-based object detection algorithms. The results show that it has certain advantages in the combined performance of detection accuracy and detection speed. The thesis also analyzes the shortcomings of the algorithm and directions for further improvement.

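Subject Area: Image Processing
MOST Discipline Catalogue: Engineering
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ioe.ac.cn/handle/181551/9078
Collection: Doctoral and Master's Theses of the Institute of Optics and Electronics (光电技术研究所博硕士论文)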
Recommended Citation
GB/T 7714
孙元辉. 基于多尺度分支结构特征融合的目标检测研究[D]. 中国科学院光电技术研究所. 中国科学院大学,2019.
Files in This Item:
File Name/Size: 基于多尺度分支结构特征融合的目标检测研究 (3979KB)
DocType: Thesis
Access: Open access
License: CC BY-NC-SA