Knowledge Management System Of Institute of optics and electronics, CAS
|Place of Conferral||Chengdu|
|Keyword||Object detection; support vector reduction; feature compression; region proposal; convolutional neural network|
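(1) A model optimization algorithm is proposed for the Kernel SVM, a classification model commonly used in object recognition frameworks. Although the Kernel SVM generalizes well, its decision cost grows with the number of support vectors. This dissertation therefore proposes a support vector reduction algorithm that iteratively reduces the support vector set of the original Kernel SVM and reconstructs a compact subset of it. Experimental results show that an SVM built from the compact subset reaches a generalization capability close to that of the original Kernel SVM at a lower decision cost.
(2) For the detection of partially occluded objects, this dissertation builds on the idea of Hough voting and studies two aspects: extracting a robust feature representation of the object and increasing detection speed. For feature extraction, a local feature based on spatial information is proposed and used to build a "feature dictionary" that contains rich appearance information about the object. The local feature can be viewed as a two-element tuple <pf, lf>: it contains both the local appearance information pf of the object and the corresponding spatial information lf, and it performs well when estimating the object center. In addition, to reduce the information redundancy in the feature representation, a fast compression algorithm based on Compressive Sensing theory is proposed; it achieves feature dimensionality reduction using only a large-scale random matrix. Finally, the local compressed features are combined with the ensemble learning algorithm AdaBoost to build the classification model. Experiments on several datasets show that the proposed local compressed features achieve high detection precision on partially occluded objects, and that the proposed fast compression algorithm performs well in both compression speed and precision.
(3) For the detection of objects under scale and rotation changes, classical detection frameworks usually match across multiple scale spaces with a sliding window, which works well but is slow. This dissertation proposes a fast detection framework based on region proposals, with a two-stage detection pipeline. The first stage is coarse detection: a region proposal algorithm quickly locates about 700 candidate regions in the image, which reduces the search space compared with the sliding-window model. The second stage is fine detection, that is, recognizing the objects inside the candidate regions; for this stage a fast feature coding and pooling method is proposed that combines random forests with a partitioned Bag of Words model. The advantage of the framework is that both stages are accelerated: the search space is reduced and feature coding is sped up. Experimental results show good performance in both detection speed and detection precision.
(4) To address the low detection precision and difficult localization encountered in small object detection (for example, small objects in aerial images), a detection framework based on cascaded Convolutional Neural Networks (CNNs) is proposed. Because small objects occupy few pixels and carry little appearance information, a classical CNN alone can hardly localize them precisely, and traditional hand-crafted features also struggle to yield an effective feature representation. The dissertation first designs a CNN-based object localization network that uses deep feature maps from several levels to build a set of hierarchical feature maps at different scales; traversing this set gives precise localization of small-scale objects. Next, a second CNN is trained for feature extraction and object recognition. Finally, the two networks are cascaded to form the detection framework. Comparative experiments on two public datasets show that the proposed cascaded model has a clear advantage in small object detection.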
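As a rough illustration of the cost argument in contribution (1): a kernel SVM evaluates one kernel term per support vector, so shrinking the support vector set shrinks the per-sample decision cost proportionally. The sketch below is not the dissertation's iterative algorithm, which the abstract does not specify. It assumes an RBF kernel and uses a hypothetical one-shot reduction (`reduce_support_vectors`) that keeps the vectors with the largest coefficients and refits them by least squares; all function names and parameters are illustrative.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    # Pairwise RBF kernel between rows of a (n, d) and rows of b (m, d).
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def decision(x, sv, coef, bias, gamma):
    # f(x) = sum_i coef_i * K(sv_i, x) + bias.
    # One kernel evaluation per support vector, so cost grows with len(sv).
    return rbf_kernel(x, sv, gamma) @ coef + bias

def reduce_support_vectors(sv, coef, gamma, keep, ref):
    # Hypothetical reduction (an assumption, not the thesis method):
    # keep the `keep` vectors with the largest |coef|, then refit their
    # coefficients so the reduced expansion matches the original one
    # on the reference points `ref` in the least-squares sense.
    idx = np.argsort(-np.abs(coef))[:keep]
    target = rbf_kernel(ref, sv, gamma) @ coef
    basis = rbf_kernel(ref, sv[idx], gamma)
    new_coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return sv[idx], new_coef

# Synthetic stand-in for a trained model: 200 support vectors in 5 dimensions.
rng = np.random.default_rng(0)
sv = rng.normal(size=(200, 5))
coef = rng.normal(size=200)
small_sv, small_coef = reduce_support_vectors(sv, coef, gamma=0.5, keep=50, ref=sv)
```

With `keep=50`, each call to `decision` evaluates a quarter of the kernel terms of the full model. How closely the reduced model tracks the original depends on the data and on the reduction strategy, which is where an iterative scheme such as the one the abstract describes would come in.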
Object recognition and detection are fundamental tasks in computer vision and multimedia applications. Object recognition determines whether a certain type of object exists in a given image; object detection must additionally determine the object's location. In optoelectronic imaging observation missions, object recognition and detection play a particularly important role: fast and accurate recognition and detection not only provide effective support for object tracking, but also give the system a sound basis for decision-making. Although the field has made great progress, it still faces many challenges in practical applications. For example, it is difficult to obtain ideal recognition and detection precision under blur, deformation, partial occlusion, illumination change, background clutter, and similar conditions. To improve recognition and detection performance, the common approach is to extract high-dimensional features and design complex classification models, which reduces recognition and detection speed. With the growth of image data and rising demands on system intelligence, developing high-precision, real-time, and adaptable approaches has become a hot topic.
This dissertation focuses on practical problems of object recognition and detection in different situations and draws on methods from image processing, computer vision, machine learning, and deep learning. The main contents are: studying the efficiency of object recognition from the perspective of classification model optimization; and studying the detection of partially occluded, multi-scale, multi-orientation, and small objects from the perspectives of feature extraction and representation, efficient feature vector coding and pooling, and accurate positioning. The main contributions are summarized as follows:
(1) A model optimization algorithm is proposed for the Kernel SVM, a classification model commonly used in object recognition frameworks. Although the Kernel SVM has good generalization capability, its decision-making cost increases with the number of support vectors. We therefore propose a support vector reduction method that reconstructs a simplified subset of the original support vectors through cyclic iteration. The experimental results show that the simplified SVM achieves a generalization capability close to that of the original Kernel SVM at a reduced decision-making cost.
(2) A method that follows the idea of Hough voting is proposed for detecting partially occluded objects. The problem is considered from two aspects: robust feature extraction and acceleration of the detection algorithm. For feature extraction, we propose a local feature based on spatial information and use it to build a "feature dictionary" that contains rich appearance information about the object. The local feature is a two-element tuple <pf, lf>, where pf is the appearance pattern of the object and lf is the corresponding location of pf. This local feature is effective for estimating the centroid of the object. Moreover, to reduce the information redundancy of the feature vectors, a fast compression approach based on CS (Compressive Sensing) theory is proposed, which uses only a large random matrix to compress the features. Finally, the local compressed features are combined with AdaBoost to build the classification model. In experiments on several datasets, the proposed local compressed features show high detection performance on partially occluded objects. In addition, the proposed fast compression algorithm shows high performance in both compression speed and precision.
(3) A method is proposed for detecting multi-orientation and multi-scale objects. Traditional detection frameworks usually handle this problem by searching a multi-scale image space with a sliding window; they achieve favorable detection precision but are slow. We propose a fast detection framework based on region proposals, whose detection process has two stages. The first stage is coarse detection, which uses a fast region-proposal method to select about 700 candidate regions from the test image. Compared with the sliding-window approach, region proposals greatly reduce the search space. The second stage is fine detection, which recognizes the object in each candidate region. A fast feature vector coding and pooling approach that combines random forests with a partitioned BoW (Bag of Words) model is proposed to build the feature representation of the candidate regions. The advantage of the proposed framework is that it accelerates both stages: it reduces the search space and speeds up feature vector coding and pooling. The experimental results show that the proposed method performs favorably in both detection speed and detection precision.
(4) A method based on cascaded CNNs (Convolutional Neural Networks) is proposed to address the problems of small object detection (e.g., objects in aerial images), such as low detection precision and inaccurate positioning. Because small objects occupy few pixels and carry little appearance information, classical CNN models have difficulty positioning them accurately. Moreover, traditional hand-crafted features describe small objects poorly. In this dissertation, we first design a CNN-based object positioning network that builds a set of hierarchical feature maps at different scales from the feature maps of multiple layers. Traversing these hierarchical feature maps yields better positioning accuracy for small objects. Second, another CNN model is trained for feature extraction and object recognition. Finally, we cascade the two CNNs to build the detection framework. In experiments on two public datasets, the proposed detection framework shows clear advantages in small object detection.
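The feature compression step in contribution (2) can be sketched as a single matrix multiplication. The abstract only states that a large random matrix is used; the dense Gaussian matrix, the dimensions, and the names `make_projection` and `compress` below are assumptions made for illustration, not the dissertation's actual construction.

```python
import numpy as np

def make_projection(in_dim, out_dim, seed=0):
    # Assumed measurement matrix: i.i.d. Gaussian entries scaled by
    # 1/sqrt(out_dim) so squared distances are preserved in expectation.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(out_dim, in_dim))

def compress(features, projection):
    # Map each row of `features` from in_dim to out_dim dimensions.
    # The matrix is data-independent, so no training step is needed.
    return features @ projection.T

# Three hypothetical 4096-dimensional local features compressed to 256 dimensions.
features = np.random.default_rng(1).normal(size=(3, 4096))
projection = make_projection(in_dim=4096, out_dim=256)
compressed = compress(features, projection)
```

Because the projection is fixed and independent of the data, compression costs one matrix product per batch of features, and distances between compressed features stay close to the original distances with high probability. A sparse random matrix would reduce the cost further; whether the dissertation uses one is not stated in the abstract.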
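|Subject Area||Signal detection; Image processing; Other disciplines of information processing technology|
|Zhong Jiandan (钟剑丹). Research on Key Technologies of Object Recognition and Detection in Optoelectronic Imaging (光电成像目标识别与检测关键技术研究) [D]. Chengdu: University of Electronic Science and Technology of China (电子科技大学), 2018.|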
|Files in This Item:|
|Research on Key Technologies of Object Recognition and Detection in Optoelectronic Imaging v22 (4796KB)||Dissertation||Open Access||CC BY-NC-SA||Application Full Text|
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.