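光电成像目标识别与检测关键技术研究 (Research on Key Technologies of Object Recognition and Detection in Optoelectronic Imaging)
Author: 钟剑丹 (affiliations 1, 2)
Degree type: Doctoral
Advisor: 吴钦章
2018-05-28
Degree-granting institution: 电子科技大学 (University of Electronic Science and Technology of China)
Place of conferral: Chengdu
Keywords: object detection; support vector reduction; feature compression; region proposal; convolutional neural network
Abstract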

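Object recognition and detection have long been foundational tasks in computer vision and multimedia applications. Object recognition determines whether a given image contains an object of a certain class; object detection must additionally determine where that object is. In optoelectronic imaging observation tasks, recognition and detection play a particularly important role: fast, accurate recognition and detection both support tracking and give the system a sound basis for decision-making. Although the field has made great progress, practical scenarios still pose many challenges. Under blur, deformation, partial occlusion, illumination change, background clutter, and similar factors, it is hard to achieve ideal accuracy. The usual way to improve recognition and detection performance is to extract high-dimensional features and design complex models, which in turn lowers speed. With the explosive growth of image data and the rising demand for intelligent systems, recognition and detection algorithms that are accurate, real-time, and adaptable have become an active research topic.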

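This dissertation addresses practical problems of object recognition and detection in different scenarios, drawing on image processing, computer vision, machine learning, and deep learning. Specifically, it studies the efficiency of object recognition from the standpoint of classification model optimization, and it studies the detection of partially occluded objects, objects that vary in scale and orientation, and small objects from the standpoints of feature extraction and representation, efficient feature coding and pooling, and improved localization accuracy. The main contributions are as follows: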

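(1) A model optimization algorithm is proposed for Kernel SVM, a classification model commonly used in object recognition frameworks. Kernel SVM generalizes well, but its decision cost grows with the number of support vectors. The dissertation therefore proposes a support vector reduction algorithm that iteratively reduces the support vector set of the original Kernel SVM and reconstructs a compact subset. Experiments show that an SVM built from the compact subset achieves generalization close to that of the original Kernel SVM at a lower decision cost.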

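(2) For detecting partially occluded objects, the dissertation builds on the idea of Hough voting and works on two fronts: extracting a robust feature representation and speeding up detection. For feature extraction, it proposes a local feature based on spatial information and uses it to build a "feature dictionary" that captures rich appearance information about the object. The local feature can be viewed as a pair <pf, lf> that contains both the local appearance information pf and the corresponding spatial information lf, and it performs well when estimating the object center. To reduce redundancy in the feature representation, a fast compression algorithm based on compressive sensing theory is proposed; it reduces feature dimensionality using only a single large random matrix. Finally, the local compressed features are combined with the ensemble learning algorithm AdaBoost to build the classification model. Experiments on several datasets show that the local compressed features give high detection accuracy on partially occluded objects, and that the fast compression algorithm performs well in both compression speed and accuracy.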

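(3) For detecting objects that vary in scale and orientation, classical frameworks usually match across multiple scale spaces with a sliding window, which works well but is slow. The dissertation proposes a fast detection framework based on region proposals, with two stages. The first stage is coarse detection: a region proposal algorithm quickly locates about 700 candidate regions in the image, which shrinks the search space relative to a sliding-window model. The second stage is fine detection, that is, recognizing the objects inside the candidate regions, for which a fast feature coding and pooling method combining random forests with a partitioned Bag of Words model is proposed. The framework's advantage is that both stages are accelerated: the search space is smaller and feature coding is faster. Experiments show good performance in both detection speed and detection accuracy.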

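(4) For small objects (for example, small objects in aerial imagery), where detection accuracy is low and localization is difficult, the dissertation proposes a detection framework based on cascaded convolutional neural networks (CNNs). Because small objects occupy few pixels and carry little appearance information, a classical CNN alone struggles to localize them precisely, and traditional hand-crafted features also fail to yield an effective representation. The dissertation first designs a CNN-based localization network that builds a set of hierarchical feature maps at different scales from deep feature maps at multiple levels; traversing this set allows small-scale objects to be localized precisely. Next, a separate CNN is trained for feature extraction and object recognition. Finally, the two networks are cascaded to form the detection framework. Comparative experiments on two public datasets show that the cascaded model has a clear advantage in small object detection.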
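
To make the cost argument in contribution (1) concrete, the following minimal sketch evaluates a kernel SVM decision function, which needs one kernel evaluation per support vector. The abstract does not describe the reduction algorithm itself, so `prune_small_coeffs` is a hypothetical stand-in (it simply drops the terms with the smallest coefficients) and not the dissertation's method. The RBF kernel, the `gamma` value, and all numbers are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(x, support_vectors, coeffs, bias, gamma=0.5):
    """Kernel SVM decision value f(x) = sum_i coeffs[i] * K(sv_i, x) + bias.

    coeffs[i] stands for alpha_i * y_i. Each query needs one kernel
    evaluation per support vector, so the cost is linear in their number.
    """
    return bias + sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coeffs)
    )


def prune_small_coeffs(support_vectors, coeffs, keep):
    """Hypothetical stand-in for reduction, NOT the thesis algorithm:
    keep only the `keep` terms with the largest absolute coefficient."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = sorted(ranked[:keep])  # restore the original ordering
    return [support_vectors[i] for i in kept], [coeffs[i] for i in kept]


# Toy model with made-up numbers: four support vectors in 2-D.
svs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
coeffs = [1.2, -0.05, 0.9, -1.1]
bias = 0.1
query = [0.2, 0.8]

full = decision_value(query, svs, coeffs, bias)
small_svs, small_coeffs = prune_small_coeffs(svs, coeffs, keep=3)
reduced = decision_value(query, small_svs, small_coeffs, bias)
print(f"full:    {full:.4f} using {len(svs)} kernel evaluations")
print(f"reduced: {reduced:.4f} using {len(small_svs)} kernel evaluations")
```

The reduced model skips one kernel evaluation per query, and its decision value differs from the full one only by the dropped term. A real reduction method would also have to control how that approximation error affects generalization, which this sketch does not attempt.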

Other Abstract

Object recognition and detection are fundamental tasks in computer vision and multimedia applications. Object recognition determines whether a certain type of object is present in a given image; object detection must determine not only whether the image contains the object but also where it is. In optoelectronic imaging observation, object recognition and detection play a particularly important role: fast and accurate recognition and detection not only provide effective support for object tracking but also give the system a solid basis for decision-making. Although the field has advanced considerably, it still faces many challenges in practical applications. For example, it is difficult to obtain ideal recognition and detection precision under blur, deformation, partial occlusion, illumination change, and background clutter. The common way to improve recognition and detection performance is to extract high-dimensional features and design complex classification models, which reduces recognition and detection speed. With the growth of image data and the rising demand for intelligent systems, developing high-precision, real-time, and adaptable approaches has become a hot topic.

This dissertation focuses on practical problems of object recognition and detection in different scenarios and draws on methods from image processing, computer vision, machine learning, and deep learning. The main contents are: studying the efficiency of object recognition from the aspect of classification model optimization; and studying the detection of partially occluded objects, multi-scale objects, multi-orientation objects, and small objects from the aspects of feature extraction and representation, efficient feature vector coding and pooling, and accurate positioning. The main contributions are summarized as follows:

(1) A model optimization algorithm is proposed for Kernel SVM, a commonly used classification model in object recognition frameworks. Although Kernel SVM has good generalization capability, its decision-making cost increases with the number of support vectors. Therefore, we propose a support vector reduction method that reconstructs a simplified subset of the original support vectors through cyclic iterations. The experimental results show that the simplified SVM achieves generalization capability close to that of the original Kernel SVM at a reduced decision-making cost.

(2) A method that follows the idea of Hough voting is proposed to detect partially occluded objects. The problem is approached from two aspects: robust feature extraction and acceleration of the detection algorithm. For feature extraction, we propose a local feature based on spatial information and use it to build a "feature dictionary" that contains rich appearance information about the object. The local feature is a pair <pf, lf>, where pf is an appearance pattern of the object and lf is the corresponding location of pf. This local feature is effective for estimating the centroid of the object. Moreover, to reduce the information redundancy of the feature vectors, a fast compression approach based on CS (Compressive Sensing) theory is proposed, which uses only a single large random matrix to compress the features. Finally, the local compressed features are combined with AdaBoost to build the classification model. Experimental results on several datasets show that the proposed local compressed features achieve high detection performance on partially occluded objects. In addition, the proposed fast compression algorithm performs well in both compression speed and precision.

(3) A method is proposed to detect objects with multiple orientations and scales. Traditional detection frameworks usually search a multi-scale image space with a sliding window; they show favorable detection precision but are slow. We propose a fast detection framework based on region proposals, whose detection process has two stages. The first stage is coarse detection, which uses a fast region-proposal method to select about 700 candidate regions from the test image. Compared with the sliding-window approach, region proposal greatly reduces the search space. The second stage is fine detection, which recognizes the object in each candidate region. A fast feature vector coding and pooling approach that combines random forests with a partitioned BoW (Bag of Words) model is proposed to build the feature representation of the candidate regions. The advantage of the proposed framework is that it accelerates both stages: it reduces the search space and speeds up feature vector coding and pooling. The experimental results show that the proposed method performs favorably in both detection speed and detection precision.

(4) A method based on cascaded CNNs (Convolutional Neural Networks) is proposed to address the problems of small object detection (e.g., objects in aerial images), such as low detection precision and inaccurate positioning. Because of the small size and the lack of appearance information, it is difficult to achieve accurate positioning with classical CNN models. Moreover, traditional hand-crafted features describe small objects poorly. In this dissertation, we first design a CNN-based object positioning network that builds a set of hierarchical feature maps from multiple layers and scales. Traversing these hierarchical feature maps yields better positioning accuracy for small objects. Second, another CNN model is trained for feature extraction and object recognition. Finally, we cascade the two CNNs to build a detection framework. Experimental results on two public datasets show the advantages of the proposed framework in small object detection.
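
The feature compression step in contribution (2), which reduces dimensionality with a single large random matrix, can be sketched as a plain random projection y = R x. The abstract does not say how the matrix is constructed, so the dense Gaussian matrix below, the dimensions `d` and `k`, and the helper names `make_projection` and `compress` are all assumptions for illustration rather than the dissertation's algorithm.

```python
import math
import random


def make_projection(k, d, seed=0):
    """Return a k x d matrix with i.i.d. Gaussian entries of variance 1/k.

    The dense Gaussian choice is an assumption for illustration; the
    abstract only says that a large random matrix is used.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def compress(matrix, x):
    """Compressed feature y = R x: one dot product per output dimension."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Made-up sizes: compress 2000-dimensional features to 100 dimensions.
d, k = 2000, 100
R = make_projection(k, d, seed=42)

rng = random.Random(7)
x1 = [rng.random() for _ in range(d)]
x2 = [rng.random() for _ in range(d)]
y1, y2 = compress(R, x1), compress(R, x2)

print(len(x1), "->", len(y1))
print("distance before:", round(distance(x1, x2), 3))
print("distance after: ", round(distance(y1, y2), 3))
```

With this scaling, pairwise distances are approximately preserved with high probability, which is why a classifier such as AdaBoost can still separate the compressed features. The matrix is generated once and reused for every feature, so compression costs a single matrix-vector product per feature.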

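Subject area: Signal detection; Image processing; Other disciplines of information processing technology
Document type: Dissertation
Item identifier: http://ir.ioe.ac.cn/handle/181551/8589
Collection: 光电技术研究所博硕士论文 (Doctoral and Master's Theses of the Institute of Optics and Electronics)
Author affiliations:
1. 电子科技大学 (University of Electronic Science and Technology of China)
2. 中国科学院光电技术研究所 (Institute of Optics and Electronics, Chinese Academy of Sciences)
Recommended citation
GB/T 7714
钟剑丹. 光电成像目标识别与检测关键技术研究[D]. 成都. 电子科技大学,2018.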
Files in this item
File name/size: 光电成像目标识别与检测关键技术研究v22 (4796KB)
Document type: Dissertation
Access type: Open access
License: CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.