
Paper Notes 1.3: Joint Face Detection and Alignment using Multitask Cascaded Convolutional Networks


III. EXPERIMENTS

In this section, we first evaluate the effectiveness of the proposed hard sample mining strategy. Then we compare our face detector and alignment against the state-of-the-art methods on the Face Detection Data Set and Benchmark (FDDB) [25], WIDER FACE [24], and the Annotated Facial Landmarks in the Wild (AFLW) benchmark [8]. The FDDB dataset contains the annotations for 5,171 faces in a set of 2,845 images. The WIDER FACE dataset consists of 393,703 labeled face bounding boxes in 32,203 images, where 50% of them are used for testing (divided into three subsets according to the difficulty of the images), 40% for training, and the remaining for validation. AFLW contains the facial landmark annotations for 24,386 faces, and we use the same test subset as [22]. Finally, we evaluate the computational efficiency of our face detector.


In this part, we first evaluate the effectiveness of the proposed hard sample mining strategy. We then compare our face detector and landmark alignment against the current best methods on the Face Detection Data Set and Benchmark (FDDB) [25], WIDER FACE [24], and the Annotated Facial Landmarks in the Wild (AFLW) benchmark [8]. The FDDB dataset contains annotations for 5,171 faces in 2,845 images. The WIDER FACE dataset contains 393,703 labeled face bounding boxes in 32,203 images; 50% of them are split into three difficulty groups for testing, 40% are used for training, and the rest are held out for validation. AFLW contains facial landmark annotations for 24,386 faces, and we use the same test subset as [22]. Finally, we evaluate the computational efficiency of our face detector.


A. Training Data
Since we jointly perform face detection and alignment, here we use four different kinds of data annotation in our training process: (i) Negatives: regions whose Intersection-over-Union (IoU) ratio is less than 0.3 to any ground-truth face; (ii) Positives: IoU above 0.65 to a ground-truth face; (iii) Part faces: IoU between 0.4 and 0.65 to a ground-truth face; and (iv) Landmark faces: faces with the positions of 5 landmarks labeled. Negatives and positives are used for the face classification task, positives and part faces are used for bounding box regression, and landmark faces are used for facial landmark localization. The training data for each network is described as follows:
Since we want to handle face detection and landmark alignment jointly, we use four different kinds of data annotation in the training process: (i) Negatives: regions whose IoU ratio with any ground-truth face is below 0.3. The IoU (Intersection-over-Union) is the overlap ratio between the detection window produced by the model and the annotated window; concretely, it is the area of the intersection of the detection result and the ground truth divided by the area of their union. (ii) Positives: IoU above 0.65. (iii) Part faces: IoU between 0.4 and 0.65. (iv) Landmark faces: faces with five landmark positions labeled. Negatives and positives are used to train the face classification task, positives and part faces to train bounding box regression, and landmark faces to train facial landmark localization.
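This definition is easy to pin down in code. A minimal IoU helper, assuming boxes are given as corner coordinates `(x1, y1, x2, y2)` (the paper does not fix a box format, so that layout is an assumption):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0, disjoint boxes 0.0, and everything else falls in between, which is what makes the 0.3/0.4/0.65 thresholds meaningful.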
The training data for each network is described as follows:
1) P-Net: We randomly crop several patches from WIDER FACE [24] to collect positives, negatives and part faces. Then, we crop faces from CelebA [23] as landmark faces.
We randomly crop patches from the WIDER FACE images to collect the positive, negative, and part-face data, and then use faces cropped from CelebA [23] as the fourth group, the landmark faces.
2) R-Net: We use the first stage of our framework to detect faces from WIDER FACE [24] to collect positives, negatives and part faces, while landmark faces are detected from CelebA [23].
We use the first stage of the framework to collect the positive, negative, and part-face data from WIDER FACE, while the fourth group is obtained from CelebA as above.
3) O-Net: Similar to R-Net to collect data but we use first two
stages of our framework to detect faces.
Similar to R-Net, except that we use the first two stages of the framework to detect faces.
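Whichever stage produces the candidate windows, each crop is then assigned to one of the four annotation types by its best IoU against the ground-truth faces. A hypothetical labeling helper using the paper's thresholds (the function name and the scalar best-IoU input are illustrative choices, not from the paper):

```python
def label_crop(best_iou):
    """Map a crop's best IoU against any ground-truth face to an annotation type.

    Thresholds follow the paper: below 0.3 is a negative, above 0.65 a
    positive, and 0.4-0.65 a part face. Crops whose IoU falls in the
    gap [0.3, 0.4) match none of the definitions and are not used.
    """
    if best_iou < 0.3:
        return "negative"
    if best_iou > 0.65:
        return "positive"
    if best_iou >= 0.4:
        return "part"
    return None  # ambiguous overlap, discarded from training
```

Landmark faces are the fourth type, but they come from CelebA's annotated faces rather than from IoU thresholding, so they are outside this helper.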
B. The effectiveness of online hard sample mining
To evaluate the contribution of the proposed online hard sample mining strategy, we train two O-Nets (with and without online hard sample mining) and compare their loss curves. To make the comparison more direct, we only train the O-Nets for the face classification task. All training parameters, including the network initialization, are the same in these two O-Nets. To make them easier to compare, we use a fixed learning rate. Fig. 3(a) shows the loss curves of the two different training schemes. It is clear that hard sample mining is beneficial to performance.
To see how well the proposed online mining strategy works, we trained two O-Nets (with and without online hard sample mining) and compared their loss curves. To make the comparison more intuitive, we trained the O-Nets only for the face classification task. All training parameters, including the network initialization, were identical in the two runs, and a fixed learning rate was used to ease the comparison. Fig. 3(a) shows the loss curves of the two training schemes; clearly, online mining of hard samples helps improve the results.
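The mining step itself is simple: in each mini-batch, sort the per-sample classification losses and backpropagate only the hardest samples. A NumPy sketch; the 70% keep-ratio follows the original paper's description of the strategy, and this function only computes the reduced loss (restricting the gradient to the kept samples is left to the training framework):

```python
import numpy as np

def hard_sample_loss(losses, keep_ratio=0.7):
    """Average only the top keep_ratio fraction of per-sample losses.

    Samples with small losses (easy samples) are dropped from this
    mini-batch's update; only the hard samples contribute.
    """
    losses = np.asarray(losses, dtype=np.float64)
    n_keep = max(1, int(round(len(losses) * keep_ratio)))
    hardest = np.sort(losses)[::-1][:n_keep]  # largest losses first
    return hardest.mean()
```

Because the selection is done per mini-batch during the forward pass, it adapts to the training process (hence "online") with no extra data preparation.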

 


C. The effectiveness of joint detection and alignment

To evaluate the contribution of joint detection and alignment, we evaluate the performance of two different O-Nets (one trained jointly with the facial landmark regression task and one trained without it) on FDDB (with the same P-Net and R-Net for a fair comparison). We also compare the performance of bounding box regression in these two O-Nets. Fig. 3(b) suggests that joint landmark localization task learning is beneficial for both the face classification and bounding box regression tasks.


We evaluated two different O-Nets on FDDB (using the same P-Net and R-Net) and also compared their bounding box regression performance. Fig. 3(b) shows that learning the two tasks jointly benefits both face classification and bounding box regression.
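In joint training, each sample contributes only the losses that are valid for its annotation type, combined with per-task weights. A sketch of the weighted objective; the weight values (1, 0.5, 0.5) follow the original paper's choice for P-Net and R-Net and are an assumption here, and the individual task losses are placeholders:

```python
def multitask_loss(loss_det, loss_box, loss_lm, sample_type,
                   w_det=1.0, w_box=0.5, w_lm=0.5):
    """Weighted sum of task losses for one training sample.

    Only the tasks valid for the sample's annotation type contribute:
    negatives and positives train classification, positives and part
    faces add bounding-box regression, and landmark faces add the
    landmark localization term.
    """
    total = 0.0
    if sample_type in ("positive", "negative"):
        total += w_det * loss_det
    if sample_type in ("positive", "part"):
        total += w_box * loss_box
    if sample_type == "landmark":
        total += w_lm * loss_lm
    return total
```

Turning `w_lm` off recovers the "No JA" O-Net compared in Fig. 3(b).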






D. Evaluation on face detection
To evaluate the performance of our face detection method,
we compare our method against the state-of-the-art methods [1,
5, 6, 11, 18, 19, 26, 27, 28, 29] in FDDB, and the
state-of-the-art methods [1, 24, 11] in WIDER FACE. Fig. 4
(a)-(d) shows that our method consistently outperforms all the
previous approaches by a large margin in both the benchmarks.
We also evaluate our approach on some challenging photos.
We compared our method against the state-of-the-art methods [1, 5, 6, 11, 18, 19, 26, 27, 28, 29] on FDDB, and against the best-performing methods [1, 24, 11] on WIDER FACE. As Fig. 4(a)-(d) shows, our algorithm consistently outperforms all previous approaches by a large margin on both benchmarks. We also evaluated it on some very challenging photos (examples are shown at http://kpzhang93.github.io/SPL/index.html).
E. Evaluation on face alignment
In this part, we compare the face alignment performance of
our method against the following methods: RCPR [12], TSPM
[7], Luxand face SDK [17], ESR [13], CDM [15], SDM [21],
and TCDCN [22]. In the testing phase, there are 13 images in which
our method fails to detect a face. So we crop the central region of
these 13 images and treat them as the input for O-Net. The
mean error is measured by the distances between the estimated
landmarks and the ground truths, and normalized with respect
to the inter-ocular distance. Fig. 4 (e) shows that our method
outperforms all the state-of-the-art methods by a margin.

In this part, we compare the face alignment performance of our method against the following methods: RCPR [12], TSPM [7], Luxand face SDK [17], ESR [13], CDM [15], SDM [21], and TCDCN [22]. In the testing phase, there are 13 images in which our method fails to detect a face, so we crop the central region of each of these 13 images and use it as the input to O-Net. The mean error is computed from the distances between the estimated landmark positions and the ground-truth annotations, normalized by the inter-ocular distance. Fig. 4(e) shows that our method outperforms the current best methods by a margin.
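The reported mean error can be reproduced from predicted and ground-truth landmark arrays. A NumPy sketch, assuming shape `(n_landmarks, 2)` and that the first two ground-truth points are the two eye centers (that landmark ordering is an assumption, not stated in the paper):

```python
import numpy as np

def normalized_mean_error(pred, gt):
    """Mean landmark error normalized by the inter-ocular distance.

    pred, gt: (n_landmarks, 2) arrays of (x, y) coordinates;
    gt[0] and gt[1] are assumed to be the two eye centers.
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    per_point = np.linalg.norm(pred - gt, axis=1)   # Euclidean error per landmark
    inter_ocular = np.linalg.norm(gt[0] - gt[1])    # eye-to-eye distance
    return per_point.mean() / inter_ocular
```

Normalizing by the inter-ocular distance makes the error scale-invariant, so faces of different sizes are comparable.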
F. Runtime efficiency
Given the cascade structure, our method can achieve very fast
speed in joint face detection and alignment. It takes 16fps on a
2.60GHz CPU and 99fps on GPU (Nvidia Titan Black). Our
implementation is currently based on un-optimized MATLAB
code.
Thanks to the cascade structure, our method achieves very fast speed for joint face detection and alignment: 16 fps on a 2.60 GHz CPU and 99 fps on an Nvidia Titan Black GPU, with an implementation that is currently based on unoptimized MATLAB code.
IV. CONCLUSION
In this paper, we have proposed a multi-task cascaded CNNs based framework for joint face detection and alignment. Experimental results demonstrate that our methods consistently outperform the state-of-the-art methods across several challenging benchmarks (including the FDDB and WIDER FACE benchmarks for face detection, and the AFLW benchmark for face alignment) while keeping real-time performance. In the future, we will exploit the inherent correlation between face detection and other face analysis tasks to further improve the performance.
In this paper, we proposed a multi-task cascaded CNN framework that couples face detection with landmark alignment. Experimental results show that our method consistently outperforms the current state-of-the-art methods while maintaining real-time speed. In the future, we will combine face detection with other face analysis tasks to further improve performance.
Fig. 3. (a) Validation loss of O-Net with and without hard sample mining. (b) "JA" denotes joint face alignment learning, while "No JA" denotes training without it. "No JA in BBR" denotes training the CNN for bounding box regression without joint alignment.
Fig. 4. (a) Evaluation on FDDB. (b-d) Evaluation on three subsets of WIDER FACE. The number following each method indicates its average accuracy. (e) Evaluation on AFLW for face alignment.

REFERENCES
[1] B. Yang, J. Yan, Z. Lei, and S. Z. Li, "Aggregate channel features for multi-view face detection," in IEEE International Joint Conference on Biometrics, 2014, pp. 1-8.
[2] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[3] M. T. Pham, Y. Gao, V. D. D. Hoang, and T. J. Cham, "Fast polygonal integration and its application in extending haar-like features to improve object detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 942-949.
[4] Q. Zhu, M. C. Yeh, K. T. Cheng, and S. Avidan, "Fast human detection using a cascade of histograms of oriented gradients," in IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 1491-1498.
[5] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool, "Face detection without bells and whistles," in European Conference on Computer Vision, 2014, pp. 720-735.
[6] J. Yan, Z. Lei, L. Wen, and S. Li, "The fastest deformable part model for object detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2497-2504.
[7] X. Zhu and D. Ramanan, "Face detection, pose estimation, and landmark localization in the wild," in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2879-2886.
[8] M. Köstinger, P. Wohlhart, P. M. Roth, and H. Bischof, "Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2011, pp. 2144-2151.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[10] Y. Sun, Y. Chen, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," in Advances in Neural Information Processing Systems, 2014, pp. 1988-1996.
[11] S. Yang, P. Luo, C. C. Loy, and X. Tang, "From facial parts responses to face detection: A deep learning approach," in IEEE International Conference on Computer Vision, 2015, pp. 3676-3684.
[12] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, "Robust face landmark estimation under occlusion," in IEEE International Conference on Computer Vision, 2013, pp. 1513-1520.
[13] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," International Journal of Computer Vision, vol. 107, no. 2, pp. 177-190, 2012.
[14] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, 2001.
[15] X. Yu, J. Huang, S. Zhang, W. Yan, and D. Metaxas, "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model," in IEEE International Conference on Computer Vision, 2013, pp. 1944-1951.
[16] J. Zhang, S. Shan, M. Kan, and X. Chen, "Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment," in European Conference on Computer Vision, 2014, pp. 1-16.
[17] Luxand Incorporated: Luxand face SDK, http://www.luxand.com/
[18] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun, "Joint cascade face detection and alignment," in European Conference on Computer Vision, 2014, pp. 109-122.
[19] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, "A convolutional neural network cascade for face detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5325-5334.
[20] C. Zhang and Z. Zhang, "Improving multiview face detection with multi-task deep convolutional neural networks," in IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 1036-1041.
[21] X. Xiong and F. Torre, "Supervised descent method and its applications to face alignment," in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 532-539.
[22] Z. Zhang, P. Luo, C. C. Loy, and X. Tang, "Facial landmark detection by deep multi-task learning," in European Conference on Computer Vision, 2014, pp. 94-108.
[23] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in IEEE International Conference on Computer Vision, 2015, pp. 3730-3738.
[24] S. Yang, P. Luo, C. C. Loy, and X. Tang, "WIDER FACE: A face detection benchmark," arXiv preprint arXiv:1511.06523.
[25] V. Jain and E. G. Learned-Miller, "FDDB: A benchmark for face detection in unconstrained settings," Technical Report UMCS-2010-009, University of Massachusetts, Amherst, 2010.
[26] B. Yang, J. Yan, Z. Lei, and S. Z. Li, "Convolutional channel features," in IEEE International Conference on Computer Vision, 2015, pp. 82-90.
[27] R. Ranjan, V. M. Patel, and R. Chellappa, "A deep pyramid deformable part model for face detection," in IEEE International Conference on Biometrics Theory, Applications and Systems, 2015, pp. 1-8.
[28] G. Ghiasi and C. C. Fowlkes, "Occlusion coherence: Detecting and localizing occluded faces," arXiv preprint arXiv:1506.08347.
[29] S. S. Farfade, M. J. Saberian, and L. J. Li, "Multi-view face detection using deep convolutional neural networks," in ACM International Conference on Multimedia Retrieval, 2015, pp. 643-650.




