
Software Engineering Application and Practice (6): PaddleOCR Text Recognizer, Strategy 4

Author: 莪乜子12 | Source: Internet | 2023-06-17 15:11


2021SC@SDUSC

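I. Recap

1. PP-OCR text recognition strategies

2. This post's strategy: a brief introduction to learning rate decay

Introduction to learning rate decay

Common learning rate decay parameters

Several fixed learning rate decay strategies

The warm-up strategy for learning rate decay

Learning rate decay strategies in PaddleOCR

II. Learning rate decay strategies and code analysis

1. PP-OCR's learning rate decay strategy

2. Code analysis

Summary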

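I. Recap

1. PP-OCR text recognition strategies

Strategies are chosen mainly to strengthen the model and to reduce its size. The PP-OCR text recognizer uses the following nine strategies: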

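Light backbone: MobileNetV3 large x0.5 is used to balance accuracy and efficiency.
Data augmentation: BDA (Base Data Augmentation) and TIA (Luo et al. 2020).
Cosine learning rate decay: effectively improves the model's text recognition ability.
Feature map resolution: to support multilingual recognition, the stride of the down-sampling feature map is modified.
Regularization parameters: weight decay is used to avoid overfitting.
Learning rate warm-up: also effective.
Light head: a fully connected layer encodes the sequence features into predicted characters, which reduces model size.
Pre-trained model: a model trained on a large dataset such as ImageNet gives faster convergence and better accuracy.
PACT quantization: applied with the LSTM layers skipped.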

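2. This post's strategy: a brief introduction to learning rate decay

Cosine learning rate decay: effectively improves the model's text recognition ability.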

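Introduction to learning rate decay

Parameter learning in deep neural networks mainly relies on gradient descent to find a set of parameters that minimizes the structural risk. The learning rate is critical in gradient descent: if it is too large, training may fail to converge; if it is too small, convergence is too slow.

A common strategy is therefore to use a large learning rate at the start so that training makes fast progress, and a smaller one near the optimum to avoid oscillating back and forth around it. The simplest, most direct way to adjust the learning rate like this is learning rate decay.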
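To make this intuition concrete, here is a small sketch that is not part of the original post: gradient descent on f(x) = (x - 3)^2 in which every gradient is perturbed by a deterministic +/-1 term standing in for mini-batch noise. The helper names (`run_gd`, `constant_lr`, `step_decay_lr`) and all numbers are illustrative assumptions. With a constant learning rate the iterate keeps bouncing around the optimum; dividing the learning rate by 10 every 50 steps lets it settle much closer.

```python
def run_gd(lr_at, steps=150, x0=0.0):
    """Gradient descent on f(x) = (x - 3)^2 with an alternating +/-1 gradient perturbation.

    lr_at: hypothetical helper mapping the step index to a learning rate.
    Returns the final distance from the optimum x* = 3.
    """
    x = x0
    for t in range(steps):
        noise = 1.0 if t % 2 == 0 else -1.0  # deterministic stand-in for mini-batch noise
        grad = 2.0 * (x - 3.0) + noise
        x -= lr_at(t) * grad
    return abs(x - 3.0)


def constant_lr(t):
    return 0.4


def step_decay_lr(t):
    # Divide the learning rate by 10 every 50 steps: 0.4, 0.04, 0.004
    return 0.4 * 0.1 ** (t // 50)


print(f"constant lr: final error = {run_gd(constant_lr):.4f}")
print(f"decayed lr:  final error = {run_gd(step_decay_lr):.4f}")
```

With the constant learning rate the iterate ends up oscillating about 1/3 away from the optimum, while the decayed run finishes within a few hundredths of it.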

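Learning rate decay strategies fall into two groups: fixed-schedule decay and adaptive learning rates. Fixed schedules include piecewise decay, inverse time decay, exponential decay, and so on; adaptive methods include AdaGrad, RMSprop, and AdaDelta. In practice the two are often combined.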

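Common learning rate decay parameters

Parameter	Description
learning_rate	Initial learning rate
global_step	Global step used in the decay computation; must be non-negative
decay_steps	Number of decay steps; must be positive; determines the decay period
decay_rate	Decay rate
end_learning_rate	Minimum final learning rate
cycle	Whether the learning rate rises again after it has decayed
alpha	Minimum learning rate (in cosine decay, expressed as a fraction of learning_rate)
num_periods	Number of periods in the cosine part of the decay
initial_variance	Initial variance of the noise
variance_decay	Decay applied to the variance of the noise
boundaries	Step boundaries at which the learning rate changes
values	Learning rate used in each stage
staircase	Whether to decay the learning rate at discrete intervals
power	Power (exponent) of the polynomial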

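Several fixed learning rate decay strategies

Basic learning rate decay

piecewise decay (piecewise constant decay): a different learning rate is set for each stage of training, which allows finer-grained tuning. Object detection models such as Faster R-CNN and SSD are trained with piecewise constant decay. The effect is shown below:
exponential decay: the learning rate decays exponentially, with decay_rate as the base and global_step / decay_steps as the exponent, i.e. lr = learning_rate * decay_rate ^ (global_step / decay_steps). The effect is shown below (Figure 1: continuous; Figure 2: discrete):
natural exponential decay: the learning rate decays with base e and exponent -decay_rate * global_step / decay_steps, i.e. lr = learning_rate * exp(-decay_rate * global_step / decay_steps). For typical decay_rate values close to 1, it decays faster than exponential decay. The effect is shown below (Figure 1: continuous; Figure 2: discrete):
polynomial decay: the learning rate follows a polynomial trajectory, lr = (learning_rate - end_learning_rate) * (1 - global_step / decay_steps) ^ power + end_learning_rate, where (1 - global_step / decay_steps) is the base and power is the exponent that controls the shape of the curve. The effect is shown below (power = 0.5, 1.0, 2.0):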
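cosine decay: the learning rate follows a cosine curve. In the common (TensorFlow-style) formulation, lr = learning_rate * ((1 - alpha) * 0.5 * (1 + cos(pi * s / decay_steps)) + alpha) with s = min(global_step, decay_steps), so the cosine has a period of 2 * decay_steps and its argument is s; over decay_steps steps the learning rate falls from learning_rate to alpha * learning_rate.

The effect is shown below: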
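linear cosine decay: the cosine term is multiplied by a linearly decaying envelope. In the common (TensorFlow-style) formulation, lr = learning_rate * ((alpha + (decay_steps - s) / decay_steps) * 0.5 * (1 + cos(2 * pi * num_periods * s / decay_steps)) + beta) with s = min(global_step, decay_steps) and beta a small constant offset, so the cosine has a period of decay_steps / num_periods and its argument is s. Linear cosine decay is more aggressive than cosine decay, so a larger initial learning rate can usually be used. The effect is shown below: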
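The schedules described above can be written as plain Python functions. This is a minimal sketch that follows the TensorFlow-style definitions implied by the parameter table (learning_rate, global_step, decay_steps, decay_rate, end_learning_rate, alpha, num_periods, staircase, power); the function names are illustrative rather than a library API, `beta` in the linear cosine variant is an assumption taken from that formulation, and the noise-related parameters are left out.

```python
import math


def piecewise_decay(step, boundaries, values):
    """values[i] is used while step < boundaries[i]; the last value applies afterwards."""
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]


def exponential_decay(lr, step, decay_steps, decay_rate, staircase=False):
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # discrete (staircase) variant
    return lr * decay_rate ** exponent


def natural_exp_decay(lr, step, decay_steps, decay_rate, staircase=False):
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return lr * math.exp(-decay_rate * exponent)


def polynomial_decay(lr, step, decay_steps, end_lr=0.0001, power=1.0):
    step = min(step, decay_steps)  # no cycling: hold end_lr after decay_steps
    return (lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


def cosine_decay(lr, step, decay_steps, alpha=0.0):
    step = min(step, decay_steps)
    cosine = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
    return lr * ((1 - alpha) * cosine + alpha)


def linear_cosine_decay(lr, step, decay_steps, num_periods=0.5, alpha=0.0, beta=0.001):
    step = min(step, decay_steps)
    linear = (decay_steps - step) / decay_steps  # linearly shrinking envelope
    cosine = 0.5 * (1 + math.cos(2 * math.pi * num_periods * step / decay_steps))
    return lr * ((alpha + linear) * cosine + beta)


for step in (0, 500, 1000):
    print(step,
          round(exponential_decay(0.1, step, 1000, 0.5), 5),
          round(polynomial_decay(0.1, step, 1000, end_lr=0.0), 5),
          round(cosine_decay(0.1, step, 1000), 5))
```

Setting staircase=True gives the discrete curves of the exponential figures, and polynomial_decay with power=1.0 is a straight line from lr down to end_lr.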

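The warm-up strategy for learning rate decay

Warm-up has two phases. In the first phase, the learning rate rises from a very small value (the warm-up learning rate) to the base learning rate; this is called the warm-up phase. In the second phase, learning rate decay starts from the base learning rate.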
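The two phases can be sketched in plain Python as follows. This is an illustrative model rather than Paddle's implementation: the helper names, the linear shape of the warm-up, and the choice of cosine for the second phase are assumptions (PaddleOCR's own schedules also wrap their decay in a linear warm-up, as its code later in this post shows).

```python
import math


def warmup_then_decay(step, base_lr, warmup_steps, total_decay_steps,
                      warmup_start_lr=0.0, decay=None):
    """Learning rate at `step` for a linear warm-up followed by a decay schedule.

    Phase 1 (step < warmup_steps): rise linearly from warmup_start_lr to base_lr.
    Phase 2: call decay(base_lr, t, total_decay_steps), with t counted from the end of warm-up.
    """
    if step < warmup_steps:
        return warmup_start_lr + (base_lr - warmup_start_lr) * step / warmup_steps
    if decay is None:
        return base_lr
    return decay(base_lr, step - warmup_steps, total_decay_steps)


def cosine(base_lr, t, total):
    t = min(t, total)
    return 0.5 * base_lr * (1 + math.cos(math.pi * t / total))


schedule = [warmup_then_decay(s, 0.1, 5, 20, decay=cosine) for s in range(26)]
print([round(v, 4) for v in schedule])
```

Here the learning rate climbs 0.0, 0.02, ..., 0.1 over the first 5 steps, peaks at the base learning rate, and then follows the cosine curve down to 0 over the next 20 steps.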

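The motivation for warm-up is as follows:

First phase: at the start of training the model's weights are randomly initialized, so the model knows nothing about the data distribution. Early on, every input is new to the model, and the weights are corrected according to that data. If the learning rate is already large at this point, the model is very likely to "overfit" the first data it sees, and it then takes many epochs to pull it back, which wastes time. After some training (a few epochs), the model has some knowledge of the data distribution, or some correct prior with respect to the current batch, so a larger learning rate is less likely to send it off course and the learning rate can be raised. This process is warm-up.
Second phase: after a certain amount of training (e.g. ten-odd epochs), the model's distribution is relatively fixed and the model gradually stabilizes. Decaying the learning rate at this point helps the model converge faster. Overall, warm-up helps reduce premature overfitting to mini-batches in the initial phase and keeps the distribution stable.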

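Learning rate decay strategies in PaddleOCR

Linear learning rate decay with warm-up (Linear).
Cosine learning rate decay with warm-up (Cosine): lr = 0.05 * (math.cos(epoch * (math.pi / epochs)) + 1), which is the formula for an initial learning rate of 0.1 (in general lr = 0.5 * base_lr * (1 + cos(pi * epoch / epochs))). The effect is shown below (Figure 1: without warm-up; Figure 2: with warm-up):
Piecewise learning rate decay with warm-up (Piecewise). The effect is shown below (Figure 1: without warm-up; Figure 2: with warm-up):
Cyclical cosine learning rate decay with warm-up (CyclicalCosine).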
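As a quick check of the cosine formula quoted above: with an initial learning rate of 0.1 and no minimum learning rate, the general cosine annealing expression lr = eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * epoch / epochs)) reduces to 0.05 * (cos(epoch * pi / epochs) + 1). The sketch below compares the two numerically; the helper names and epochs = 100 are illustrative assumptions.

```python
import math


def paddleocr_cosine_example(epoch, epochs):
    # The formula quoted in the text; 0.05 = 0.1 / 2, i.e. an initial lr of 0.1
    return 0.05 * (math.cos(epoch * (math.pi / epochs)) + 1)


def cosine_annealing(epoch, epochs, base_lr, eta_min=0.0):
    # General cosine annealing from base_lr down to eta_min over `epochs`
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / epochs))


epochs = 100
for epoch in (0, 25, 50, 75, 100):
    a = paddleocr_cosine_example(epoch, epochs)
    b = cosine_annealing(epoch, epochs, base_lr=0.1)
    print(epoch, round(a, 5), round(b, 5))
```

Both columns start at 0.1, pass 0.05 at the midpoint, and reach 0 at the last epoch.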

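II. Learning rate decay strategies and code analysis

1. PP-OCR's learning rate decay strategy

The learning rate is the hyperparameter that controls how fast the model learns. The lower the learning rate, the more slowly the loss changes. A low learning rate makes it less likely to overshoot a minimum, but it also means slower convergence. Early in training the weights are randomly initialized, so a relatively large learning rate can be used to converge faster; later, the weights are close to their optimal values, so a smaller learning rate should be used. Cosine learning rate decay has become the preferred schedule for improving model accuracy. Over the whole training run it keeps the learning rate relatively large, so it converges more slowly, but the final accuracy is better. The figure below compares different learning rate decay schedules: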
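The claim that cosine decay keeps the learning rate relatively large for most of training can be checked numerically. The sketch below compares the mean learning rate over 100 epochs for cosine decay and for a hypothetical step schedule that divides the learning rate by 10 every 30 epochs; both schedules and all numbers are illustrative assumptions, not PP-OCR's actual settings.

```python
import math


def cosine_lr(epoch, epochs=100, base_lr=0.1):
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / epochs))


def step_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    # Hypothetical comparison schedule: divide by 10 every 30 epochs
    return base_lr * gamma ** (epoch // step_size)


epochs = 100
mean_cos = sum(cosine_lr(e, epochs) for e in range(epochs)) / epochs
mean_step = sum(step_lr(e) for e in range(epochs)) / epochs
late_cos = cosine_lr(60, epochs)
late_step = step_lr(60)
print(f"mean lr: cosine={mean_cos:.4f} step={mean_step:.4f}")
print(f"lr at epoch 60: cosine={late_cos:.4f} step={late_step:.4f}")
```

With these settings the cosine schedule averages about 0.0505 versus about 0.0333 for the step schedule, and at epoch 60 it is still around 0.035 while the step schedule has already dropped to 0.001.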

2. Code analysis

Code location:

Main code:

lr_scheduler.py: cyclical cosine learning rate decay

import math

from paddle.optimizer.lr import LRScheduler


class CyclicalCosineDecay(LRScheduler):
    def __init__(self,
                 learning_rate,
                 T_max,
                 cycle=1,
                 last_epoch=-1,
                 eta_min=0.0,
                 verbose=False):
        """
        Cyclical cosine learning rate decay
        A learning rate which can be referred in https://arxiv.org/pdf/2012.12645.pdf
        Args:
            learning_rate (float): initial learning rate
            T_max (int): maximum epoch num (accepted but not used by get_lr)
            cycle (int): period of the cosine decay
            last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
            eta_min (float): minimum learning rate during training
            verbose (bool): whether to print the learning rate for each epoch
        """
        super(CyclicalCosineDecay, self).__init__(learning_rate, last_epoch,
                                                  verbose)
        self.cycle = cycle
        self.eta_min = eta_min

    def get_lr(self):
        if self.last_epoch == 0:
            return self.base_lr
        # Position inside the current cycle; the schedule restarts every `cycle` steps
        relative_epoch = self.last_epoch % self.cycle
        lr = self.eta_min + 0.5 * (self.base_lr - self.eta_min) * \
            (1 + math.cos(math.pi * relative_epoch / self.cycle))
        return lr
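Because the class above needs Paddle to run, here is a Paddle-free sketch that reproduces its get_lr arithmetic so the restart behaviour is easy to see. The function name and the values base_lr = 0.001 and cycle = 4 are illustrative assumptions; in PaddleOCR's CyclicalCosine wrapper, cycle is given in epochs and converted to steps before this scheduler is built.

```python
import math


def cyclical_cosine_lr(last_epoch, base_lr, cycle, eta_min=0.0):
    """Pure-Python mirror of CyclicalCosineDecay.get_lr (no Paddle needed)."""
    if last_epoch == 0:
        return base_lr
    relative_epoch = last_epoch % cycle  # position inside the current cycle
    return eta_min + 0.5 * (base_lr - eta_min) * (
        1 + math.cos(math.pi * relative_epoch / cycle))


lrs = [cyclical_cosine_lr(step, base_lr=0.001, cycle=4) for step in range(9)]
print([round(v, 6) for v in lrs])
```

The learning rate falls within each 4-step cycle and jumps back to base_lr at steps 4 and 8. Since relative_epoch never reaches cycle, the lowest value in a cycle (at its last step) stays somewhat above eta_min.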
learning_rate.py: the other learning rate decay schedules

from paddle.optimizer import lr

from .lr_scheduler import CyclicalCosineDecay


class Linear(object):
    """
    Linear learning rate decay
    Args:
        learning_rate (float): The initial learning rate. It is a python float number.
        epochs (int): The decay step size. It determines the decay cycle.
        step_each_epoch (int): steps each epoch
        end_lr (float, optional): The minimum final learning rate. Default: 0.0.
        power (float, optional): Power of polynomial. Default: 1.0.
        warmup_epoch (int, optional): number of linear warm-up epochs. Default: 0.
        last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
    """

    def __init__(self, learning_rate, epochs, step_each_epoch, end_lr=0.0,
                 power=1.0, warmup_epoch=0, last_epoch=-1, **kwargs):
        super(Linear, self).__init__()
        self.learning_rate = learning_rate
        # Epoch-based settings are converted to optimizer steps
        self.epochs = epochs * step_each_epoch
        self.end_lr = end_lr
        self.power = power
        self.last_epoch = last_epoch
        self.warmup_epoch = round(warmup_epoch * step_each_epoch)

    def __call__(self):
        # PolynomialDecay with power=1.0 is a straight line from learning_rate to end_lr
        learning_rate = lr.PolynomialDecay(
            learning_rate=self.learning_rate,
            decay_steps=self.epochs,
            end_lr=self.end_lr,
            power=self.power,
            last_epoch=self.last_epoch)
        if self.warmup_epoch > 0:
            # Ramp linearly from 0 to learning_rate, then hand over to the decay schedule
            learning_rate = lr.LinearWarmup(
                learning_rate=learning_rate,
                warmup_steps=self.warmup_epoch,
                start_lr=0.0,
                end_lr=self.learning_rate,
                last_epoch=self.last_epoch)
        return learning_rate


class Cosine(object):
    """
    Cosine learning rate decay
    lr = 0.05 * (math.cos(epoch * (math.pi / epochs)) + 1)  (shown for an initial learning rate of 0.1)
    Args:
        learning_rate (float): initial learning rate
        step_each_epoch (int): steps each epoch
        epochs (int): total training epochs
        warmup_epoch (int, optional): number of linear warm-up epochs. Default: 0.
        last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
    """

    def __init__(self, learning_rate, step_each_epoch, epochs, warmup_epoch=0,
                 last_epoch=-1, **kwargs):
        super(Cosine, self).__init__()
        self.learning_rate = learning_rate
        self.T_max = step_each_epoch * epochs
        self.last_epoch = last_epoch
        self.warmup_epoch = round(warmup_epoch * step_each_epoch)

    def __call__(self):
        learning_rate = lr.CosineAnnealingDecay(
            learning_rate=self.learning_rate,
            T_max=self.T_max,
            last_epoch=self.last_epoch)
        if self.warmup_epoch > 0:
            learning_rate = lr.LinearWarmup(
                learning_rate=learning_rate,
                warmup_steps=self.warmup_epoch,
                start_lr=0.0,
                end_lr=self.learning_rate,
                last_epoch=self.last_epoch)
        return learning_rate


class Step(object):
    """
    Step learning rate decay: multiply the learning rate by gamma every step_size epochs
    Args:
        learning_rate (float): The initial learning rate. It is a python float number.
        step_size (int): the interval (in epochs) between updates.
        step_each_epoch (int): steps each epoch
        gamma (float): The ratio by which the learning rate is reduced, ``new_lr = origin_lr * gamma``. It should be less than 1.0.
        warmup_epoch (int, optional): number of linear warm-up epochs. Default: 0.
        last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
    """

    def __init__(self, learning_rate, step_size, step_each_epoch, gamma,
                 warmup_epoch=0, last_epoch=-1, **kwargs):
        super(Step, self).__init__()
        self.step_size = step_each_epoch * step_size
        self.learning_rate = learning_rate
        self.gamma = gamma
        self.last_epoch = last_epoch
        self.warmup_epoch = round(warmup_epoch * step_each_epoch)

    def __call__(self):
        learning_rate = lr.StepDecay(
            learning_rate=self.learning_rate,
            step_size=self.step_size,
            gamma=self.gamma,
            last_epoch=self.last_epoch)
        if self.warmup_epoch > 0:
            learning_rate = lr.LinearWarmup(
                learning_rate=learning_rate,
                warmup_steps=self.warmup_epoch,
                start_lr=0.0,
                end_lr=self.learning_rate,
                last_epoch=self.last_epoch)
        return learning_rate


class Piecewise(object):
    """
    Piecewise learning rate decay
    Args:
        step_each_epoch (int): steps each epoch
        decay_epochs (list): epochs at which the learning rate changes; converted to step boundaries.
        values (list): learning rates used between the boundaries; needs one more entry than decay_epochs.
        warmup_epoch (int, optional): number of linear warm-up epochs. Default: 0.
        last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
    """

    def __init__(self, step_each_epoch, decay_epochs, values, warmup_epoch=0,
                 last_epoch=-1, **kwargs):
        super(Piecewise, self).__init__()
        self.boundaries = [step_each_epoch * e for e in decay_epochs]
        self.values = values
        self.last_epoch = last_epoch
        self.warmup_epoch = round(warmup_epoch * step_each_epoch)

    def __call__(self):
        learning_rate = lr.PiecewiseDecay(
            boundaries=self.boundaries,
            values=self.values,
            last_epoch=self.last_epoch)
        if self.warmup_epoch > 0:
            # Warm up towards the first piecewise value
            learning_rate = lr.LinearWarmup(
                learning_rate=learning_rate,
                warmup_steps=self.warmup_epoch,
                start_lr=0.0,
                end_lr=self.values[0],
                last_epoch=self.last_epoch)
        return learning_rate


class CyclicalCosine(object):
    """
    Cyclical cosine learning rate decay
    Args:
        learning_rate (float): initial learning rate
        step_each_epoch (int): steps each epoch
        epochs (int): total training epochs
        cycle (int): period of the cosine learning rate, in epochs
        warmup_epoch (int, optional): number of linear warm-up epochs. Default: 0.
        last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
    """

    def __init__(self, learning_rate, step_each_epoch, epochs, cycle,
                 warmup_epoch=0, last_epoch=-1, **kwargs):
        super(CyclicalCosine, self).__init__()
        self.learning_rate = learning_rate
        self.T_max = step_each_epoch * epochs
        self.last_epoch = last_epoch
        self.warmup_epoch = round(warmup_epoch * step_each_epoch)
        self.cycle = round(cycle * step_each_epoch)

    def __call__(self):
        learning_rate = CyclicalCosineDecay(
            learning_rate=self.learning_rate,
            T_max=self.T_max,
            cycle=self.cycle,
            last_epoch=self.last_epoch)
        if self.warmup_epoch > 0:
            learning_rate = lr.LinearWarmup(
                learning_rate=learning_rate,
                warmup_steps=self.warmup_epoch,
                start_lr=0.0,
                end_lr=self.learning_rate,
                last_epoch=self.last_epoch)
        return learning_rate
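The wrappers above all express their settings in epochs and multiply by step_each_epoch before building a Paddle scheduler. The Paddle-free sketch below mimics that conversion for Piecewise. The helper name and the numbers are hypothetical, the length check on values is an addition the wrapper itself does not perform, and the boundary rule (a step equal to a boundary already uses the next value) is my reading of how PiecewiseDecay compares the current step with its boundaries.

```python
import bisect


def piecewise_steps(step_each_epoch, decay_epochs, values):
    """Mimic how the Piecewise wrapper turns epoch settings into step boundaries."""
    boundaries = [step_each_epoch * e for e in decay_epochs]
    if len(values) != len(boundaries) + 1:
        raise ValueError("values needs exactly one more entry than decay_epochs")

    def lr_at(step):
        # values[i] applies while step < boundaries[i]; the last value applies afterwards
        return values[bisect.bisect_right(boundaries, step)]

    return boundaries, lr_at


boundaries, lr_at = piecewise_steps(step_each_epoch=100, decay_epochs=[70, 90],
                                    values=[0.001, 0.0001, 0.00001])
print(boundaries)
print([lr_at(s) for s in (0, 6999, 7000, 8999, 9000, 12000)])
```

With 100 steps per epoch, decay epochs 70 and 90 become step boundaries 7000 and 9000. If warm-up is enabled, the wrapper ramps towards values[0], which is why the first value doubles as the warm-up target.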

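Summary

That concludes this introduction to the learning rate decay strategy of the PP-OCR text recognition model. Later posts will cover the model's other strategies.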
