
Paper Notes 1.3: Joint Face Detection and Alignment using Multitask Cascaded Convolutional Networks


III. EXPERIMENTS

In this section, we first evaluate the effectiveness of the proposed hard sample mining strategy. Then we compare our face detector and alignment against the state-of-the-art methods on the Face Detection Data Set and Benchmark (FDDB) [25], WIDER FACE [24], and the Annotated Facial Landmarks in the Wild (AFLW) benchmark [8]. The FDDB dataset contains the annotations for 5,171 faces in a set of 2,845 images. The WIDER FACE dataset consists of 393,703 labeled face bounding boxes in 32,203 images, where 50% of them are used for testing (divided into three subsets according to the difficulty of the images), 40% for training, and the remaining for validation. AFLW contains the facial landmark annotations for 24,386 faces, and we use the same test subset as [22]. Finally, we evaluate the computational efficiency of our face detector.


In this part, we first evaluate the effectiveness of the proposed hard sample mining strategy. We then compare our face detector and landmark alignment against the current best methods on the Face Detection Data Set and Benchmark (FDDB) [25], WIDER FACE [24], and the Annotated Facial Landmarks in the Wild (AFLW) benchmark [8]. The FDDB dataset contains annotations for 5,171 faces in 2,845 images. The WIDER FACE dataset contains 393,703 labeled face bounding boxes in 32,203 images; 50% of them are split into three difficulty groups for testing, 40% are used for training, and the rest are held out for validation. AFLW contains facial landmark annotations for 24,386 faces, and we use the same test subset as [22]. Finally, we evaluate the computational efficiency of our face detector.


A. Training Data
Since we jointly perform face detection and alignment, here we use four different kinds of data annotation in our training process: (i) Negatives: regions whose Intersection-over-Union (IoU) ratio is less than 0.3 to any ground-truth face; (ii) Positives: IoU above 0.65 to a ground-truth face; (iii) Part faces: IoU between 0.4 and 0.65 to a ground-truth face; and (iv) Landmark faces: faces with the positions of 5 landmarks labeled. Negatives and positives are used for the face classification task, positives and part faces are used for bounding box regression, and landmark faces are used for facial landmark localization. The training data for each network is described as follows:
Since we want to handle face detection and landmark alignment jointly, we use four different kinds of data annotation in the training process: (i) Negatives: regions whose IoU ratio with any ground-truth face is below 0.3. The IoU (Intersection-over-Union) is the overlap ratio between the detection window produced by the model and the annotated window; concretely, it is the area of the intersection of the detection result and the ground truth divided by the area of their union. (ii) Positives: IoU above 0.65. (iii) Part faces: IoU between 0.4 and 0.65. (iv) Landmark faces: faces with five landmark positions labeled. Negatives and positives are used to train the face classification task, positives and part faces to train bounding box regression, and landmark faces to train facial landmark localization.
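This definition is easy to pin down in code. A minimal IoU helper, assuming boxes are given as corner coordinates `(x1, y1, x2, y2)` (the paper does not fix a box format, so that layout is an assumption):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0, disjoint boxes 0.0, and everything else falls in between, which is what makes the 0.3/0.4/0.65 thresholds meaningful.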
The training data for each network is described as follows:
1) P-Net: We randomly crop several patches from WIDER FACE [24] to collect positives, negatives and part faces. Then, we crop faces from CelebA [23] as landmark faces.
We randomly crop patches from the WIDER FACE images to collect the positive, negative, and part-face data, and then use faces cropped from CelebA [23] as the fourth group, the landmark faces.
2) R-Net: We use the first stage of our framework to detect faces from WIDER FACE [24] to collect positives, negatives and part faces, while landmark faces are detected from CelebA [23].
We use the first stage of the framework to collect the positive, negative, and part-face data from WIDER FACE, while the fourth group is obtained from CelebA as above.
3) O-Net: Similar to R-Net to collect data but we use first two
stages of our framework to detect faces.
Similar to R-Net, except that we use the first two stages of the framework to detect faces.
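Whichever stage produces the candidate windows, each crop is then assigned to one of the four annotation types by its best IoU against the ground-truth faces. A hypothetical labeling helper using the paper's thresholds (the function name and the scalar best-IoU input are illustrative choices, not from the paper):

```python
def label_crop(best_iou):
    """Map a crop's best IoU against any ground-truth face to an annotation type.

    Thresholds follow the paper: below 0.3 is a negative, above 0.65 a
    positive, and 0.4-0.65 a part face. Crops whose IoU falls in the
    gap [0.3, 0.4) match none of the definitions and are not used.
    """
    if best_iou < 0.3:
        return "negative"
    if best_iou > 0.65:
        return "positive"
    if best_iou >= 0.4:
        return "part"
    return None  # ambiguous overlap, discarded from training
```

Landmark faces are the fourth type, but they come from CelebA's annotated faces rather than from IoU thresholding, so they are outside this helper.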
B. The effectiveness of online hard sample mining
To evaluate the contribution of the proposed online hard sample mining strategy, we train two O-Nets (with and without online hard sample mining) and compare their loss curves. To make the comparison more direct, we only train the O-Nets for the face classification task. All training parameters, including the network initialization, are the same in these two O-Nets. To make them easier to compare, we use a fixed learning rate. Fig. 3(a) shows the loss curves of the two different training schemes. It is clear that hard sample mining is beneficial to performance.
To see how well the proposed online mining strategy works, we trained two O-Nets (with and without online hard sample mining) and compared their loss curves. To make the comparison more intuitive, we trained the O-Nets only for the face classification task. All training parameters, including the network initialization, were identical in the two runs, and a fixed learning rate was used to ease the comparison. Fig. 3(a) shows the loss curves of the two training schemes; clearly, online mining of hard samples helps improve the results.
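The mining step itself is simple: in each mini-batch, sort the per-sample classification losses and backpropagate only the hardest samples. A NumPy sketch; the 70% keep-ratio follows the original paper's description of the strategy, and this function only computes the reduced loss (restricting the gradient to the kept samples is left to the training framework):

```python
import numpy as np

def hard_sample_loss(losses, keep_ratio=0.7):
    """Average only the top keep_ratio fraction of per-sample losses.

    Samples with small losses (easy samples) are dropped from this
    mini-batch's update; only the hard samples contribute.
    """
    losses = np.asarray(losses, dtype=np.float64)
    n_keep = max(1, int(round(len(losses) * keep_ratio)))
    hardest = np.sort(losses)[::-1][:n_keep]  # largest losses first
    return hardest.mean()
```

Because the selection is done per mini-batch during the forward pass, it adapts to the training process (hence "online") with no extra data preparation.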

 


C. The effectiveness of joint detection and alignment

To evaluate the contribution of joint detection and alignment, we evaluate the performance of two different O-Nets (one trained jointly with the facial landmark regression task and one trained without it) on FDDB (with the same P-Net and R-Net for a fair comparison). We also compare the performance of bounding box regression in these two O-Nets. Fig. 3(b) suggests that joint landmark localization task learning is beneficial for both the face classification and bounding box regression tasks.


We evaluated two different O-Nets on FDDB (using the same P-Net and R-Net) and also compared their bounding box regression performance. Fig. 3(b) shows that learning the two tasks jointly benefits both face classification and bounding box regression.
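In joint training, each sample contributes only the losses that are valid for its annotation type, combined with per-task weights. A sketch of the weighted objective; the weight values (1, 0.5, 0.5) follow the original paper's choice for P-Net and R-Net and are an assumption here, and the individual task losses are placeholders:

```python
def multitask_loss(loss_det, loss_box, loss_lm, sample_type,
                   w_det=1.0, w_box=0.5, w_lm=0.5):
    """Weighted sum of task losses for one training sample.

    Only the tasks valid for the sample's annotation type contribute:
    negatives and positives train classification, positives and part
    faces add bounding-box regression, and landmark faces add the
    landmark localization term.
    """
    total = 0.0
    if sample_type in ("positive", "negative"):
        total += w_det * loss_det
    if sample_type in ("positive", "part"):
        total += w_box * loss_box
    if sample_type == "landmark":
        total += w_lm * loss_lm
    return total
```

Turning `w_lm` off recovers the "No JA" O-Net compared in Fig. 3(b).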






D. Evaluation on face detection
To evaluate the performance of our face detection method,
we compare our method against the state-of-the-art methods [1,
5, 6, 11, 18, 19, 26, 27, 28, 29] in FDDB, and the
state-of-the-art methods [1, 24, 11] in WIDER FACE. Fig. 4
(a)-(d) shows that our method consistently outperforms all the
previous approaches by a large margin in both the benchmarks.
We also evaluate our approach on some challenging photos.
We compared our method against the state-of-the-art methods [1, 5, 6, 11, 18, 19, 26, 27, 28, 29] on FDDB, and against the best-performing methods [1, 24, 11] on WIDER FACE. As Fig. 4(a)-(d) shows, our algorithm consistently outperforms all previous approaches by a large margin on both benchmarks. We also evaluated it on some very challenging photos (examples are shown at http://kpzhang93.github.io/SPL/index.html).
E. Evaluation on face alignment
In this part, we compare the face alignment performance of
our method against the following methods: RCPR [12], TSPM
[7], Luxand face SDK [17], ESR [13], CDM [15], SDM [21],
and TCDCN [22]. In the testing phase, there are 13 images in which
our method fails to detect a face. So we crop the central region of
these 13 images and treat them as the input for O-Net. The
mean error is measured by the distances between the estimated
landmarks and the ground truths, and normalized with respect
to the inter-ocular distance. Fig. 4 (e) shows that our method
outperforms all the state-of-the-art methods by a margin.

In this part, we compare the face alignment performance of our method against the following methods: RCPR [12], TSPM [7], Luxand face SDK [17], ESR [13], CDM [15], SDM [21], and TCDCN [22]. In the testing phase, there are 13 images in which our method fails to detect a face, so we crop the central region of each of these 13 images and use it as the input to O-Net. The mean error is computed from the distances between the estimated landmark positions and the ground-truth annotations, normalized by the inter-ocular distance. Fig. 4(e) shows that our method outperforms the current best methods by a margin.
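The reported mean error can be reproduced from predicted and ground-truth landmark arrays. A NumPy sketch, assuming shape `(n_landmarks, 2)` and that the first two ground-truth points are the two eye centers (that landmark ordering is an assumption, not stated in the paper):

```python
import numpy as np

def normalized_mean_error(pred, gt):
    """Mean landmark error normalized by the inter-ocular distance.

    pred, gt: (n_landmarks, 2) arrays of (x, y) coordinates;
    gt[0] and gt[1] are assumed to be the two eye centers.
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    per_point = np.linalg.norm(pred - gt, axis=1)   # Euclidean error per landmark
    inter_ocular = np.linalg.norm(gt[0] - gt[1])    # eye-to-eye distance
    return per_point.mean() / inter_ocular
```

Normalizing by the inter-ocular distance makes the error scale-invariant, so faces of different sizes are comparable.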
F. Runtime efficiency
Given the cascade structure, our method can achieve very fast
speed in joint face detection and alignment. It takes 16fps on a
2.60GHz CPU and 99fps on GPU (Nvidia Titan Black). Our
implementation is currently based on un-optimized MATLAB
code.
Thanks to the cascade structure, our method achieves very fast speed for joint face detection and alignment: 16 fps on a 2.60 GHz CPU and 99 fps on an Nvidia Titan Black GPU, with an implementation that is currently based on unoptimized MATLAB code.
IV. CONCLUSION
In this paper, we have proposed a multi-task cascaded CNNs based framework for joint face detection and alignment. Experimental results demonstrate that our methods consistently outperform the state-of-the-art methods across several challenging benchmarks (including the FDDB and WIDER FACE benchmarks for face detection, and the AFLW benchmark for face alignment) while keeping real-time performance. In the future, we will exploit the inherent correlation between face detection and other face analysis tasks to further improve the performance.
In this paper, we proposed a multi-task cascaded CNN framework that couples face detection with landmark alignment. Experimental results show that our method consistently outperforms the current state-of-the-art methods while maintaining real-time speed. In the future, we will combine face detection with other face analysis tasks to further improve performance.
Fig. 3. (a) Validation loss of O-Net with and without hard sample mining. (b) "JA" denotes joint face alignment learning, while "No JA" denotes training without it. "No JA in BBR" denotes training the CNN for bounding box regression without joint alignment.
Fig. 4. (a) Evaluation on FDDB. (b-d) Evaluation on three subsets of WIDER FACE. The number following each method indicates its average accuracy. (e) Evaluation on AFLW for face alignment.

REFERENCES
[1] B. Yang, J. Yan, Z. Lei, and S. Z. Li, "Aggregate channel features for multi-view face detection," in IEEE International Joint Conference on Biometrics, 2014, pp. 1-8.
[2] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[3] M. T. Pham, Y. Gao, V. D. D. Hoang, and T. J. Cham, "Fast polygonal integration and its application in extending haar-like features to improve object detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 942-949.
[4] Q. Zhu, M. C. Yeh, K. T. Cheng, and S. Avidan, "Fast human detection using a cascade of histograms of oriented gradients," in IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 1491-1498.
[5] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool, "Face detection without bells and whistles," in European Conference on Computer Vision, 2014, pp. 720-735.
[6] J. Yan, Z. Lei, L. Wen, and S. Li, "The fastest deformable part model for object detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2497-2504.
[7] X. Zhu and D. Ramanan, "Face detection, pose estimation, and landmark localization in the wild," in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2879-2886.
[8] M. Köstinger, P. Wohlhart, P. M. Roth, and H. Bischof, "Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2011, pp. 2144-2151.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[10] Y. Sun, Y. Chen, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," in Advances in Neural Information Processing Systems, 2014, pp. 1988-1996.
[11] S. Yang, P. Luo, C. C. Loy, and X. Tang, "From facial parts responses to face detection: A deep learning approach," in IEEE International Conference on Computer Vision, 2015, pp. 3676-3684.
[12] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, "Robust face landmark estimation under occlusion," in IEEE International Conference on Computer Vision, 2013, pp. 1513-1520.
[13] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," International Journal of Computer Vision, vol. 107, no. 2, pp. 177-190, 2012.
[14] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, 2001.
[15] X. Yu, J. Huang, S. Zhang, W. Yan, and D. Metaxas, "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model," in IEEE International Conference on Computer Vision, 2013, pp. 1944-1951.
[16] J. Zhang, S. Shan, M. Kan, and X. Chen, "Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment," in European Conference on Computer Vision, 2014, pp. 1-16.
[17] Luxand Incorporated: Luxand face SDK, http://www.luxand.com/
[18] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun, "Joint cascade face detection and alignment," in European Conference on Computer Vision, 2014, pp. 109-122.
[19] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, "A convolutional neural network cascade for face detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5325-5334.
[20] C. Zhang and Z. Zhang, "Improving multiview face detection with multi-task deep convolutional neural networks," in IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 1036-1041.
[21] X. Xiong and F. Torre, "Supervised descent method and its applications to face alignment," in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 532-539.
[22] Z. Zhang, P. Luo, C. C. Loy, and X. Tang, "Facial landmark detection by deep multi-task learning," in European Conference on Computer Vision, 2014, pp. 94-108.
[23] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in IEEE International Conference on Computer Vision, 2015, pp. 3730-3738.
[24] S. Yang, P. Luo, C. C. Loy, and X. Tang, "WIDER FACE: A face detection benchmark," arXiv preprint arXiv:1511.06523.
[25] V. Jain and E. G. Learned-Miller, "FDDB: A benchmark for face detection in unconstrained settings," Technical Report UMCS-2010-009, University of Massachusetts, Amherst, 2010.
[26] B. Yang, J. Yan, Z. Lei, and S. Z. Li, "Convolutional channel features," in IEEE International Conference on Computer Vision, 2015, pp. 82-90.
[27] R. Ranjan, V. M. Patel, and R. Chellappa, "A deep pyramid deformable part model for face detection," in IEEE International Conference on Biometrics Theory, Applications and Systems, 2015, pp. 1-8.
[28] G. Ghiasi and C. C. Fowlkes, "Occlusion coherence: Detecting and localizing occluded faces," arXiv preprint arXiv:1506.08347.
[29] S. S. Farfade, M. J. Saberian, and L. J. Li, "Multi-view face detection using deep convolutional neural networks," in ACM International Conference on Multimedia Retrieval, 2015, pp. 643-650.




