
Mask-RCNN Source Code Reading Notes

Author: 双豆儿_668 | Source: Internet | 2024-09-25 12:12

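I read this blog post: https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1

That post walks through several of the .ipynb notebooks, but it does not cover the other Python files (including the COCO source code).

I spent the past few days studying that source code. There are bound to be mistakes here; corrections and criticism from experts are welcome.

This sat in my drafts folder for a long time without being posted.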

######################### separator ###########################################

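Notes on reading coco.py:

COCO provides images, and each image may have one or more annotations, abbreviated as ann in the COCO source code.

Each image corresponds to one img_id;

one img_id can have multiple annotations, abbreviated as anns in the code;

each ann has its own category;

multiple categories are called cats in the source code.

So one image, that is, one img_id, corresponds to multiple anns and multiple cats; one image therefore carries masks of several categories.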
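The image / anns / cats relationship above can be sketched with plain dictionaries. This is a minimal illustration, not the pycocotools implementation: the tiny `dataset` dict and the helper `build_index` are hypothetical names made up for this example, shaped like a COCO annotation file.

```python
from collections import defaultdict

# Hypothetical miniature annotation file, shaped like the COCO JSON layout.
dataset = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 18, "name": "dog"}, {"id": 1, "name": "person"}],
    "annotations": [
        {"id": 100, "image_id": 1, "category_id": 18},
        {"id": 101, "image_id": 1, "category_id": 1},
        {"id": 102, "image_id": 2, "category_id": 1},
    ],
}


def build_index(dataset):
    """Group annotations by image id and index categories by category id."""
    img_to_anns = defaultdict(list)
    for ann in dataset["annotations"]:
        img_to_anns[ann["image_id"]].append(ann)
    cats = {cat["id"]: cat for cat in dataset["categories"]}
    return img_to_anns, cats


img_to_anns, cats = build_index(dataset)

# One img_id -> several anns -> several cats.
names_for_image_1 = [cats[ann["category_id"]]["name"] for ann in img_to_anns[1]]
print(names_for_image_1)  # ['dog', 'person']
```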

mask: [height, width, instance_number]

bbox: [instance_number, (y1, x1, y2, x2)], one box per instance mask

anchors: [anchor_count, (y1,x1,y2,x2)]

So the generated result is:

(Figure taken from the blog post https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1)

RPN_ANCHOR_SCALES = (32,64,128,256,512)

1. Create model in training mode:

Creating the model just sets up a skeleton; no actual data has been passed into it yet.

model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

Analysis of class MaskRCNN():

1.a Inputs

(1) Use keras.layers.Input() to get input_image and input_image_meta, creating the skeleton of the input layers.

(2) RPN GT

Use keras.layers.Input() to get input_rpn_match [None, 1] and input_rpn_bbox [None, 4].

(3) Detection GT (class IDs, bounding boxes, and masks)

Use keras.layers.Input() to get:

# 1. GT Class IDs (zero padded) : input_gt_class_ids [None],

# 2. GT Boxes in pixels (zero padded)
# [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates : input_gt_boxes

# Normalize coordinates

# 3. GT Masks (zero padded), also obtained with keras.layers.Input()

# [batch, height, width, MAX_GT_INSTANCES]

1.b Build the shared convolutional layers

The FPN structure finally yields

rpn_feature_maps = [P2, P3, P4, P5, P6]

mrcnn_feature_maps = [P2, P3, P4, P5], each with depth 256

# Generate Anchors

"""Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """

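Anchors are generated by iterating over each scale in the configured RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512) # Length of square anchor side in pixels

The result is the full set of anchors for every pixel of every feature map (3 anchors per pixel). This ordering matches the RPN outputs, so the indices selected from the RPN scores in ProposalLayer can be used to pick the anchors at the same indices (this becomes clearer together with the sections below). That is how the anchors are filtered.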
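The anchor generation described above can be sketched in NumPy as follows. This is a simplified sketch modeled on the docstring quoted above, not a verbatim copy of the repository code. The ratios [0.5, 1, 2], the strides [4, 8, 16, 32, 64], and the 128x128 image size are assumptions chosen so that each pixel gets 3 anchors and the example stays small.

```python
import numpy as np


def generate_anchors(scale, ratios, feature_shape, feature_stride, anchor_stride=1):
    """Return [N, (y1, x1, y2, x2)] anchors for one pyramid level."""
    # Every (scale, ratio) pair gives one anchor shape with area scale**2.
    scales, ratios = np.meshgrid(np.array([scale]), np.array(ratios))
    scales, ratios = scales.flatten(), ratios.flatten()
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)

    # Anchor centers in image pixels: one per feature-map cell.
    shifts_y = np.arange(0, feature_shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, feature_shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)

    # Combine every center with every anchor shape.
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape(-1, 2)
    sizes = np.stack([box_heights, box_widths], axis=2).reshape(-1, 2)

    # Convert (center, size) to corner coordinates (y1, x1, y2, x2).
    return np.concatenate([centers - 0.5 * sizes, centers + 0.5 * sizes], axis=1)


def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides):
    """One scale per pyramid level; all ratios are used at every level."""
    levels = [
        generate_anchors(scale, ratios, shape, stride)
        for scale, shape, stride in zip(scales, feature_shapes, feature_strides)
    ]
    return np.concatenate(levels, axis=0)


# Assumed setup: a 128x128 image and five pyramid levels.
scales = (32, 64, 128, 256, 512)
ratios = [0.5, 1, 2]
strides = [4, 8, 16, 32, 64]
feature_shapes = [(128 // s, 128 // s) for s in strides]

anchors = generate_pyramid_anchors(scales, ratios, feature_shapes, strides)
print(anchors.shape)  # (4092, 4): 3 anchors for each of the 1364 feature-map cells
```

Because the anchors of scale[0] come first, then scale[1], and so on, the array lines up row by row with RPN outputs that are concatenated level by level in the same order.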

1.c RPN Model

rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), 256)

layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))

"""Builds a Keras model of the Region Proposal Network.
    It wraps the RPN graph so it can be used multiple times with shared
    weights.

    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).
    depth: Depth of the backbone feature map.

    Returns a Keras Model object. The model outputs, when called, are:
    rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax)
    rpn_probs: [batch, H, W, 2] Anchor classifier probabilities. (rpn_probs is the score of each anchor; the 2 stands for foreground and background.)
    rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be
                applied to anchors.

"""

depth is the depth of the feature maps obtained in 1.b, which is 256 for all of them. The rpn_probs (anchor scores) that are finally returned actually have shape [batch, anchors, 2], obtained by reshaping [batch, height, width, anchors per location * 2].

rpn_bbox (bounding box refinement): [batch, H, W, anchors per location, depth]

# where depth is [x, y, log(w), log(h)]; reshaped to [batch, anchors, 4]

This now matches the shape of the anchors.

Each feature map from 1.b is fed into the RPN model, giving rpn_logits, rpn_probs, and rpn_bbox for each level; the rpn_logits of all levels are then concatenated, and likewise for the other two.
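The reshape-then-concatenate step can be illustrated with NumPy arrays standing in for the Keras tensors. This is only a shape-level sketch: the helper `flatten_rpn_output`, the random inputs, and the three small feature-map sizes are hypothetical, and the real model does this inside the graph.

```python
import numpy as np


def flatten_rpn_output(x, last_dim):
    """Reshape [batch, H, W, anchors_per_location * last_dim] to [batch, anchors, last_dim]."""
    return x.reshape(x.shape[0], -1, last_dim)


batch, anchors_per_location = 2, 3
level_shapes = [(8, 8), (4, 4), (2, 2)]  # hypothetical feature-map sizes

rng = np.random.default_rng(0)
probs_per_level, bbox_per_level = [], []
for h, w in level_shapes:
    raw_probs = rng.random((batch, h, w, anchors_per_location * 2))
    raw_bbox = rng.random((batch, h, w, anchors_per_location * 4))
    probs_per_level.append(flatten_rpn_output(raw_probs, 2))
    bbox_per_level.append(flatten_rpn_output(raw_bbox, 4))

# Concatenate along the anchor axis so that all levels form one long list.
rpn_probs = np.concatenate(probs_per_level, axis=1)
rpn_bbox = np.concatenate(bbox_per_level, axis=1)

print(rpn_probs.shape)  # (2, 252, 2)
print(rpn_bbox.shape)   # (2, 252, 4)
```

The 252 here is 3 * (64 + 16 + 4), the same count an anchor generator would produce for these three levels, which is why the outputs and the anchors can be indexed together.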

1.d Generate proposals
# Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
# and zero padded.

Call:
rpn_rois = ProposalLayer(proposal_count=proposal_count,
                         nms_threshold=config.RPN_NMS_THRESHOLD,
                         name="ROI",
                         anchors=self.anchors,
                         config=config)([rpn_class, rpn_bbox])

ProposalLayer

"""Receives anchor scores and selects a subset to pass as proposals
    to the second stage. Filtering is done based on anchor scores and
    non-max suppression to remove overlaps. It also applies bounding
    box refinement deltas to anchors.

    Inputs:
        rpn_probs: [batch, anchors, (bg prob, fg prob)]
        rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]

    Returns:
        Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
    """

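anchors here means all the anchors across the five feature-map levels (stages) of the network.

1.d.1 The call function of the ProposalLayer class

(1) call receives inputs, namely the rpn_probs and rpn_bbox above. It takes the foreground probability in rpn_probs as the scores, finds the indices of the anchors with the top K scores, and uses these indices to select the corresponding scores, deltas, and anchors.

(2) # Apply deltas to anchors to get refined anchors. The anchors are adjusted according to the deltas (dy, dx, log(dh), log(dw)).

# Returns the adjusted boxes: [batch, N, (y1, x1, y2, x2)]. These are effectively new anchors; they are called boxes because they are closer to the GT (ground truth).

(3) # Clip to image boundaries. [batch, N, (y1, x1, y2, x2)] This constrains the box edges to lie within the image boundaries.

(4) Filter out small boxes

4.a Normalize dimensions to range of 0 to 1. ---> normalized_boxes

4.b Non-max suppression: pass normalized_boxes and scores to NMS to get indices, then use the indices to select the corresponding normalized_boxes, giving the proposals (rpn_rois). # Pad if needed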
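The steps of call can be sketched for a single image in NumPy. This is a simplified sketch under stated assumptions, not the layer itself: the real layer works on batched TensorFlow tensors, while `propose`, `nms`, and `iou_one_to_many` are hypothetical helper names, and the small-box filter of step (4) and any scaling of the deltas are left out.

```python
import numpy as np


def apply_box_deltas(boxes, deltas):
    """Shift and resize [N, (y1, x1, y2, x2)] boxes by (dy, dx, log(dh), log(dw))."""
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    cy = boxes[:, 0] + 0.5 * h + deltas[:, 0] * h
    cx = boxes[:, 1] + 0.5 * w + deltas[:, 1] * w
    h = h * np.exp(deltas[:, 2])
    w = w * np.exp(deltas[:, 3])
    return np.stack([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w], axis=1)


def iou_one_to_many(box, boxes):
    """IoU of one box against an [N, 4] array of boxes."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, threshold, max_keep):
    """Greedy non-max suppression; returns the indices of the kept boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0 and len(keep) < max_keep:
        best = order[0]
        keep.append(best)
        ious = iou_one_to_many(boxes[best], boxes[order[1:]])
        order = order[1:][ious <= threshold]
    return np.array(keep, dtype=np.int64)


def propose(fg_scores, deltas, anchors, image_shape, pre_nms_limit,
            proposal_count, nms_threshold):
    # (1) Keep the top-K anchors by foreground score.
    top = np.argsort(fg_scores)[::-1][:pre_nms_limit]
    scores, deltas, anchors = fg_scores[top], deltas[top], anchors[top]
    # (2) Refine the anchors with the predicted deltas.
    boxes = apply_box_deltas(anchors, deltas)
    # (3) Clip to the image boundaries.
    h, w = image_shape
    boxes = np.clip(boxes, 0, [h, w, h, w])
    # (4.a) Normalize to the range 0..1.
    normalized = boxes / np.array([h, w, h, w], dtype=np.float64)
    # (4.b) NMS, then zero-pad up to proposal_count rows.
    keep = nms(normalized, scores, nms_threshold, proposal_count)
    selected = normalized[keep]
    padding = proposal_count - selected.shape[0]
    return np.pad(selected, [(0, padding), (0, 0)], mode="constant")


anchors = np.array([[0, 0, 10, 10], [1, 1, 11, 11],
                    [20, 20, 40, 40], [50, 50, 60, 60]], dtype=np.float64)
fg_scores = np.array([0.9, 0.8, 0.7, 0.1])
deltas = np.zeros((4, 4))

proposals = propose(fg_scores, deltas, anchors, image_shape=(64, 64),
                    pre_nms_limit=3, proposal_count=4, nms_threshold=0.5)
print(proposals)
```

In this toy input the fourth anchor is dropped by the top-K step, the second is suppressed by NMS because it overlaps the first with IoU of about 0.68, and the last two rows of the output are zero padding.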

1.e Generate detection targets

# Subsamples proposals and generates target outputs for training
# Note that proposal class IDs, gt_boxes, and gt_masks are zero
# padded. Equally, returned rois and targets are zero padded.

class DetectionTargetLayer(KE.Layer):

"""Subsamples proposals and generates target box refinement, class_ids,
    and masks for each.

    Inputs:
    proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might
               be zero padded if there are not enough proposals.
    gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs.
    gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized
              coordinates.
    gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean type

    Returns: Target ROIs and corresponding class IDs, bounding box shifts,
    and masks.
    rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized
          coordinates
    target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
    target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, NUM_CLASSES,
                    (dy, dx, log(dh), log(dw), class_id)]
                   Class-specific bbox refinements.
    target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width]
                 Masks cropped to bbox boundaries and resized to neural
                 network output size.

    Note: Returned arrays might be zero padded if not enough target ROIs.

"""

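# Compute overlaps matrix [proposals, gt_boxes]
overlaps = overlaps_graph(proposals, gt_boxes) gives the IoU of each proposal with every one of the (up to MAX_GT_INSTANCES) GT boxes

tf.reduce_max(overlaps, axis=1) computes the maximum for each proposal (the maximum of each row)

The returned rois (and the other outputs) contain the positive proposals, the negative proposals, and zero padding.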
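The overlaps matrix and the row-wise maximum can be sketched in NumPy. This is an illustration only: the real layer uses TensorFlow ops, `compute_overlaps` is a hypothetical stand-in for overlaps_graph, and the 0.5 IoU cut-off between positive and negative proposals is an assumption that does not appear in these notes.

```python
import numpy as np


def compute_overlaps(boxes1, boxes2):
    """IoU matrix [len(boxes1), len(boxes2)] for (y1, x1, y2, x2) boxes."""
    b1 = boxes1[:, None, :]  # [N, 1, 4]
    b2 = boxes2[None, :, :]  # [1, M, 4]
    y1 = np.maximum(b1[..., 0], b2[..., 0])
    x1 = np.maximum(b1[..., 1], b2[..., 1])
    y2 = np.minimum(b1[..., 2], b2[..., 2])
    x2 = np.minimum(b1[..., 3], b2[..., 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])
    area2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])
    return inter / (area1 + area2 - inter)


# Hypothetical proposals and one GT box, in normalized coordinates.
proposals = np.array([[0.0, 0.0, 0.5, 0.5],
                      [0.0, 0.0, 0.5, 0.4],
                      [0.6, 0.6, 1.0, 1.0]])
gt_boxes = np.array([[0.0, 0.0, 0.5, 0.5]])

overlaps = compute_overlaps(proposals, gt_boxes)  # [proposals, gt_boxes]
roi_iou_max = overlaps.max(axis=1)                # best IoU per proposal (per row)

positive = np.where(roi_iou_max >= 0.5)[0]  # assumed threshold
negative = np.where(roi_iou_max < 0.5)[0]
print(positive, negative)  # [0 1] [2]
```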

1.f Network Heads

1.f.1 fpn_classifier_graph()

Performs classification and bbox regression.

"""Builds the computation graph of the feature pyramid network classifier
and regressor heads.

rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
coordinates.
feature_maps: List of feature maps from different layers of the pyramid,
[P2, P3, P4, P5]. Each has a different resolution.
image_shape: [height, width, depth]
pool_size: The width of the square feature map generated from ROI Pooling.
num_classes: number of classes, which determines the depth of the results
train_bn: Boolean. Train or freeze Batch Norm layers

Returns:
logits: [N, NUM_CLASSES] classifier logits (before softmax)
probs: [N, NUM_CLASSES] classifier probabilities
bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to
proposal boxes

"""

1.f.1.1 PyramidROIAlign

"""Implements ROI Pooling on multiple levels of the feature pyramid.

Params:
- pool_shape: [height, width] of the output pooled regions. Usually [7, 7]
- image_shape: [height, width, channels]. Shape of input image in pixels

Inputs:
- boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalized
coordinates. Possibly padded with zeros if not enough
boxes to fill the array.
- Feature maps: List of feature maps from different levels of the pyramid.
Each is [batch, height, width, channels]

Output:
Pooled regions in the shape: [batch, num_boxes, height, width, channels].
The width and height are those specific in the pool_shape in the layer
constructor.
"""

How ROIAlign works: http://blog.leanote.com/post/afanti.deng@gmail.com/b5f4f526490b

1.f.2 build_fpn_mask_graph()

"""Builds the computation graph of the mask head of Feature Pyramid Network.

rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
coordinates.
feature_maps: List of feature maps from different layers of the pyramid,
[P2, P3, P4, P5]. Each has a different resolution.
image_shape: [height, width, depth]
pool_size: The width of the square feature map generated from ROI Pooling.
num_classes: number of classes, which determines the depth of the results
train_bn: Boolean. Train or freeze Batch Norm layers

Returns: Masks [batch, roi_count, height, width, num_classes]
"""

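Finally, the model is returned.

model.train is then called to train the model (this is where the dataset is passed in).

In utils.py:

np.meshgrid() turns one-dimensional arrays into two-dimensional matrices https://www.cnblogs.com/sunshinewang/p/6897966.html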
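A tiny example of what np.meshgrid() returns, with made-up input values, may help when reading the anchor code:

```python
import numpy as np

xs = np.array([0, 1, 2])
ys = np.array([10, 20])

# With the default indexing, both outputs have shape (len(ys), len(xs)).
grid_x, grid_y = np.meshgrid(xs, ys)

print(grid_x)
# [[0 1 2]
#  [0 1 2]]
print(grid_y)
# [[10 10 10]
#  [20 20 20]]
```

Reading the two grids together at the same position gives every (x, y) combination, which is how one array of values can be paired with every element of another.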

In config.py:

BACKBONE_SHAPES are the feature map shapes: [256 256] [128 128] [64 64] [32 32] [16 16]...
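These shapes are consistent with dividing the input image size by the stride of each pyramid level. A minimal sketch, assuming a 1024x1024 input and strides of [4, 8, 16, 32, 64] (neither value is stated in these notes), with a hypothetical helper name:

```python
import math

import numpy as np


def compute_backbone_shapes(image_shape, strides):
    """Feature-map (height, width) at each pyramid level for a given image size."""
    return np.array([
        [int(math.ceil(image_shape[0] / stride)),
         int(math.ceil(image_shape[1] / stride))]
        for stride in strides
    ])


backbone_shapes = compute_backbone_shapes((1024, 1024), [4, 8, 16, 32, 64])
print(backbone_shapes.tolist())
# [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
```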

Select the top K anchors by score, refine the anchors that remain, clip the boxes to the image boundaries, normalize, and then run NMS to obtain the final proposals.

The whole pipeline:

1. RPN

1.a RPN Targets

The RPN targets are the training values for the RPN. To generate the targets, we start with a grid of anchors that cover the full image at different scales, and then we compute the IoU of the anchors with ground truth object. Positive anchors are those that have an IoU >= 0.7 with any ground truth object, and negative anchors are those that don't cover any object by more than 0.3 IoU. Anchors in between (i.e. cover an object by IoU >= 0.3 but <0.7) are considered neutral and excluded from training.

To train the RPN regressor, we also compute the shift and resizing needed to make the anchor cover the ground truth object completely.
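The matching rule in the paragraph above can be sketched as a small function on a precomputed IoU matrix. This is a minimal illustration of the thresholds only: `match_anchors` and the example IoU values are hypothetical, and the real build_rpn_targets also does more (for example, balancing the number of positive and negative anchors), which is omitted here.

```python
import numpy as np


def match_anchors(overlaps, pos_threshold=0.7, neg_threshold=0.3):
    """overlaps: [num_anchors, num_gt] IoU matrix. Returns 1 / -1 / 0 per anchor."""
    anchor_iou_max = overlaps.max(axis=1)
    rpn_match = np.zeros(overlaps.shape[0], dtype=np.int32)  # 0 = neutral
    rpn_match[anchor_iou_max < neg_threshold] = -1           # negative anchors
    rpn_match[anchor_iou_max >= pos_threshold] = 1           # positive anchors
    return rpn_match


# Hypothetical IoU values for 4 anchors against 2 ground-truth boxes.
overlaps = np.array([[0.85, 0.10],
                     [0.20, 0.05],
                     [0.45, 0.30],
                     [0.00, 0.75]])

rpn_match = match_anchors(overlaps)
print(rpn_match)  # [ 1 -1  0  1]
```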

# Generate RPN training targets
# target_rpn_match is 1 for positive anchors, -1 for negative anchors
# and 0 for neutral anchors.
target_rpn_match, target_rpn_bbox = modellib.build_rpn_targets(
    image.shape, model.anchors, gt_class_id, gt_bbox, model.config)

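#### The maximum size of target_rpn_bbox is set in the config; the number of rows that actually hold data equals the number of positive anchors, and each row stores (dy, dx, log(dh), log(dw)), computed from a positive anchor and its matched GT bbox.

utils.apply_box_deltas() is then called to refine the positive anchors by (dy, dx, log(dh), log(dw)).

(Figure: solid lines are the refined anchors, dashed lines are the anchors before refinement.)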
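How the (dy, dx, log(dh), log(dw)) targets could be computed from a positive anchor and its GT box is sketched below. The function name `box_refinement` and the example boxes are assumptions for illustration, and any scaling of the deltas that the config may apply is left out.

```python
import numpy as np


def box_refinement(boxes, gt_boxes):
    """Deltas (dy, dx, log(dh), log(dw)) that move each box onto its GT box.

    boxes, gt_boxes: [N, (y1, x1, y2, x2)]
    """
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    cy = boxes[:, 0] + 0.5 * h
    cx = boxes[:, 1] + 0.5 * w

    gt_h = gt_boxes[:, 2] - gt_boxes[:, 0]
    gt_w = gt_boxes[:, 3] - gt_boxes[:, 1]
    gt_cy = gt_boxes[:, 0] + 0.5 * gt_h
    gt_cx = gt_boxes[:, 1] + 0.5 * gt_w

    dy = (gt_cy - cy) / h      # center shift, relative to the anchor height
    dx = (gt_cx - cx) / w      # center shift, relative to the anchor width
    dh = np.log(gt_h / h)      # log of the height ratio
    dw = np.log(gt_w / w)      # log of the width ratio
    return np.stack([dy, dx, dh, dw], axis=1)


positive_anchors = np.array([[0.0, 0.0, 10.0, 10.0]])
matched_gt = np.array([[2.0, 2.0, 14.0, 12.0]])

target_deltas = box_refinement(positive_anchors, matched_gt)
print(target_deltas)  # dy = 0.3, dx = 0.2, log(dh) = log(1.2), log(dw) = 0
```

Applying these deltas to the anchor reverses the computation: the center moves by (dy * h, dx * w) and the size is multiplied by exp of the last two values, which lands the box exactly on the GT box.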

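A quick review of Faster-RCNN:

https://blog.csdn.net/u013832707/article/details/53641055/

A window (proposal) is described by its center point and its width and height, P = (px, py, pw, ph); strictly speaking, what is fed to the regressor is the CNN feature of this window.

Bounding box regression: (1) first translate; (2) then scale. So four functions need to be learned: dx(P), dy(P), dw(P), dh(P).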
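Written out, the translation and scaling take the following standard form from the R-CNN family of papers, where the hat denotes the predicted box and G the ground-truth box (both symbols are introduced here for illustration and are not in the notes above):

```latex
\begin{aligned}
\hat{G}_x &= P_w \, d_x(P) + P_x, &\qquad \hat{G}_y &= P_h \, d_y(P) + P_y, \\
\hat{G}_w &= P_w \exp\bigl(d_w(P)\bigr), &\qquad \hat{G}_h &= P_h \exp\bigl(d_h(P)\bigr).
\end{aligned}
```

The regression targets that the four functions are trained to match follow by inverting these equations:

```latex
\begin{aligned}
t_x &= \frac{G_x - P_x}{P_w}, &\qquad t_y &= \frac{G_y - P_y}{P_h}, \\
t_w &= \log\frac{G_w}{P_w}, &\qquad t_h &= \log\frac{G_h}{P_h}.
\end{aligned}
```

This is the same parameterization as the (dy, dx, log(dh), log(dw)) deltas used throughout the Mask-RCNN code, only with x and y listed in the opposite order.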
