
Mask R-CNN Source Code Reading Notes


I read this blog post: https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1

That post walks through several of the .ipynb notebooks, but offers no analysis of the other Python files (including the coco source);

Over the past few days I worked through that source code; if there are mistakes, corrections and criticism are welcome.

(This sat unfinished in my drafts folder for a while before I published it...)

######################### Separator ###########################################

Notes on coco.py:

COCO provides images, and for each image a set of possible (multiple) annotations; the coco source abbreviates annotation as ann.

One image corresponds to one img_id;

One image_id can have multiple annotations, abbreviated in the code as anns;

Each ann has its own category;

Multiple categories are called cats in the source;

So one image (one img_id) corresponds to multiple anns and multiple cats; in other words, one image has masks of several categories.
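These mappings can be sketched with toy dictionaries that mimic the COCO annotation format (all ids and category numbers below are made up; the real pycocotools COCO class builds similar indexes internally):

```python
from collections import defaultdict

# Toy data mimicking the COCO annotation format (hypothetical values).
anns = [
    {"id": 1, "image_id": 100, "category_id": 18},  # e.g. a dog
    {"id": 2, "image_id": 100, "category_id": 1},   # e.g. a person
    {"id": 3, "image_id": 200, "category_id": 18},
]

# One image_id maps to several anns; each ann carries one category.
img_to_anns = defaultdict(list)
for ann in anns:
    img_to_anns[ann["image_id"]].append(ann)

# One image therefore maps to multiple cats (and multiple masks).
cats_of_img = {img_id: {a["category_id"] for a in a_list}
               for img_id, a_list in img_to_anns.items()}

print(len(img_to_anns[100]))   # 2 annotations for image 100
print(cats_of_img[100])        # {1, 18} -> two categories, hence two masks
```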

bbox: [instance_count, (y1, x1, y2, x2)]   (the mask itself is [height, width, instance_count])

anchors: [anchor_count, (y1,x1,y2,x2)]

The generated result (figure omitted; screenshot in the blog post https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1):

RPN_ANCHOR_SCALES = (32,64,128,256,512)

1.Create model in training mode:

Creating the model only builds the skeleton; no actual data is fed into it at this point.

model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

Walking through class MaskRCNN():

1.a Inputs

(1) Use keras.layers.Input() to create input_image and input_image_meta, i.e. the skeleton of the input layers.

(2) RPN GT

    Use keras.layers.Input() to create input_rpn_match [None, 1] and input_rpn_bbox [None, 4]

(3)Detection GT (class IDs, bounding boxes, and masks)

    Use keras.layers.Input() to create:

   # 1. GT Class IDs (zero padded):  input_gt_class_ids [None],

    # 2. GT Boxes in pixels (zero padded)
    # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates: input_gt_boxes

    # Normalize coordinates

    # 3. GT Masks (zero padded), also created with keras.layers.Input()

    # [batch, height, width, MAX_GT_INSTANCES]

1.b Build the shared convolutional layers

The FPN structure finally yields:

rpn_feature_maps = [P2, P3, P4, P5, P6]

mrcnn_feature_maps = [P2, P3, P4, P5]   (all with depth 256)

# Generate Anchors 

"""Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """

   The anchors are generated by iterating over every scale in the configured RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)  # Length of square anchor side in pixels

This ultimately yields all anchors for every pixel (3 anchors per pixel, one per ratio). These correspond one-to-one with the later RPN outputs, so the indices computed from the RPN scores inside ProposalLayer can be used to pick out the matching anchors (this becomes clearer together with the later sections); that is how the anchors are filtered.
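The per-scale generation can be sketched as a condensed version of utils.generate_anchors (the 4×4 feature map, stride 16, and ratios are toy values):

```python
import numpy as np

def generate_anchors(scale, ratios, shape, feature_stride, anchor_stride=1):
    """Sketch of utils.generate_anchors: one scale, all ratios, one feature map."""
    heights = scale / np.sqrt(ratios)
    widths = scale * np.sqrt(ratios)
    # Anchor centers at every feature-map position, mapped back to image pixels.
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    # Every (center, size) combination.
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
    # Convert (cy, cx, h, w) -> (y1, x1, y2, x2)
    return np.concatenate([box_centers - 0.5 * box_sizes,
                           box_centers + 0.5 * box_sizes], axis=1)

# 4x4 feature map, stride 16, 3 ratios -> 4 * 4 * 3 = 48 anchors for this scale
a = generate_anchors(32, np.array([0.5, 1, 2]), (4, 4), 16)
print(a.shape)  # (48, 4)
```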

1.c RPN Model

rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), 256)

layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))

"""Builds a Keras model of the Region Proposal Network.
    It wraps the RPN graph so it can be used multiple times with shared
    weights.

    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).
    depth: Depth of the backbone feature map.

    Returns a Keras Model object. The model outputs, when called, are:
    rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax)
    rpn_probs: [batch, H, W, 2] Anchor classifier probabilities.               (rpn_probs is each anchor's score; the 2 channels are background and foreground)
    rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be
                applied to anchors.

    """

depth is the depth of the feature maps from 1.b, i.e. 256. The returned rpn_probs (anchor scores) actually have shape [batch, anchors, 2], obtained by reshaping [batch, height, width, anchors per location * 2].

rpn_bbox (bounding box refinement) starts as [batch, H, W, anchors per location * 4],

    # where the 4 values are (dy, dx, log(dh), log(dw)), and is reshaped to [batch, anchors, 4].

This now lines up with the shape of the anchors array.

Each feature map from 1.b is fed through the RPN model, producing that level's rpn_logits, rpn_probs, and rpn_bbox; the rpn_logits from all levels are then concatenated, and likewise for the other two outputs.
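The reshape that lines the per-level RPN output up with the flat anchors array can be illustrated in NumPy (toy shapes; the model does the equivalent with tf.reshape inside a Lambda layer):

```python
import numpy as np

batch, H, W, anchors_per_loc = 1, 8, 8, 3
# Raw conv output: 2 scores (bg, fg) for each of the 3 anchors at every position.
raw = np.random.rand(batch, H, W, anchors_per_loc * 2)
# The same reshape the RPN graph performs:
rpn_probs = raw.reshape(batch, -1, 2)
print(rpn_probs.shape)  # (1, 192, 2): 8 * 8 * 3 = 192 anchors, matching the anchor array
```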

1.d Generate proposals
       # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
        # and zero padded.

The call:

          rpn_rois = ProposalLayer(proposal_count=proposal_count,
                                 nms_threshold=config.RPN_NMS_THRESHOLD,
                                 name="ROI",
                                 anchors=self.anchors,
                                 config=config)([rpn_class, rpn_bbox])

ProposalLayer

"""Receives anchor scores and selects a subset to pass as proposals
    to the second stage. Filtering is done based on anchor scores and
    non-max suppression to remove overlaps. It also applies bounding
    box refinement deltas to anchors.

    Inputs:
        rpn_probs: [batch, anchors, (bg prob, fg prob)]
        rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]

    Returns:
        Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
    """

Here anchors means all the anchors from the last five stages of the network;

1.d.1 The call() function of the ProposalLayer class

(1) call receives inputs, i.e. the rpn_probs and rpn_bbox above. The foreground column of rpn_probs is used as scores; the indices of the top-K scoring anchors are taken, and those indices are used to select the corresponding scores, deltas, and anchors.

(2) # Apply deltas to anchors to get refined anchors. The anchors are adjusted by the deltas (dy, dx, log(dh), log(dw)).

        # The adjusted boxes are returned: [batch, N, (y1, x1, y2, x2)]. These are effectively new anchors; they are called boxes because they are now closer to the GT (ground truth).

(3) # Clip to image boundaries. [batch, N, (y1, x1, y2, x2)]. This clamps the boxes so they lie within the image bounds.

(4) Filter out small boxes

    4.a Normalize dimensions to range of 0 to 1.   ---> normalized_boxes

    4.b Non-max suppression: pass normalized_boxes and scores through NMS to get indices, then use those indices to select the corresponding normalized_boxes, which gives the proposals (rpn_rois).     # Pad if needed
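Steps (1)-(3) can be sketched in NumPy; apply_box_deltas below mirrors the logic of utils.apply_box_deltas, and the anchors, scores, deltas, and the clip bound are made-up toy values:

```python
import numpy as np

def apply_box_deltas(boxes, deltas):
    """Refine (y1, x1, y2, x2) boxes by (dy, dx, log(dh), log(dw)) deltas."""
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    cy = boxes[:, 0] + 0.5 * h
    cx = boxes[:, 1] + 0.5 * w
    cy += deltas[:, 0] * h            # translate...
    cx += deltas[:, 1] * w
    h *= np.exp(deltas[:, 2])         # ...then scale
    w *= np.exp(deltas[:, 3])
    return np.stack([cy - 0.5 * h, cx - 0.5 * w,
                     cy + 0.5 * h, cx + 0.5 * w], axis=1)

scores = np.array([0.9, 0.2, 0.7])            # fg probability per anchor
anchors = np.array([[0., 0., 10., 10.],
                    [5., 5., 15., 15.],
                    [2., 2., 12., 12.]])
deltas = np.zeros((3, 4))                     # zero deltas: boxes stay put

# Step (1): keep the top-K anchors by foreground score.
k = 2
ix = np.argsort(scores)[::-1][:k]             # indices of the top-k anchors
boxes = apply_box_deltas(anchors[ix], deltas[ix])  # step (2): refine
boxes = np.clip(boxes, 0, 14)                 # step (3): clip to a toy image bound
print(boxes)
```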

1.e Generate detection targets

# Subsamples proposals and generates target outputs for training
# Note that proposal class IDs, gt_boxes, and gt_masks are zero

# padded. Equally, returned rois and targets are zero padded.

class DetectionTargetLayer(KE.Layer):

"""Subsamples proposals and generates target box refinement, class_ids,
    and masks for each.

    Inputs:
    proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might
               be zero padded if there are not enough proposals.
    gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs.
    gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized
              coordinates.
    gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean type

    Returns: Target ROIs and corresponding class IDs, bounding box shifts,
    and masks.
    rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized
          coordinates
    target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
    target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, NUM_CLASSES,
                    (dy, dx, log(dh), log(dw), class_id)]
                   Class-specific bbox refinements.
    target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width]
                 Masks cropped to bbox boundaries and resized to neural
                 network output size.

    Note: Returned arrays might be zero padded if not enough target ROIs.

    """

# Compute overlaps matrix [proposals, gt_boxes]
    overlaps = overlaps_graph(proposals, gt_boxes)    # IoU of every proposal against all MAX_GT_INSTANCES GT boxes

    tf.reduce_max(overlaps, axis=1)    # each proposal's best IoU (the max of each row)

The returned rois etc. contain both positive and negative proposals, plus zero padding.
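A small NumPy version of the overlaps computation and the row-wise max (toy boxes; the real overlaps_graph does the same with vectorized TF ops):

```python
import numpy as np

def overlaps(boxes1, boxes2):
    """IoU matrix [len(boxes1), len(boxes2)]; boxes are (y1, x1, y2, x2)."""
    ious = np.zeros((boxes1.shape[0], boxes2.shape[0]))
    for i, b1 in enumerate(boxes1):
        for j, b2 in enumerate(boxes2):
            y1, x1 = max(b1[0], b2[0]), max(b1[1], b2[1])
            y2, x2 = min(b1[2], b2[2]), min(b1[3], b2[3])
            inter = max(y2 - y1, 0) * max(x2 - x1, 0)
            union = ((b1[2] - b1[0]) * (b1[3] - b1[1])
                     + (b2[2] - b2[0]) * (b2[3] - b2[1]) - inter)
            ious[i, j] = inter / union
    return ious

proposals = np.array([[0., 0., 10., 10.], [20., 20., 30., 30.]])
gt_boxes = np.array([[0., 0., 10., 10.]])
m = overlaps(proposals, gt_boxes)     # shape [proposals, gt_boxes]
best = m.max(axis=1)                  # the tf.reduce_max(overlaps, axis=1) step
print(best)                           # best IoU per proposal: 1.0 and 0.0
```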

1.f Network Heads

1.f.1 fpn_classifier_graph()

This performs classification and bbox regression.

"""Builds the computation graph of the feature pyramid network classifier
    and regressor heads.


    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_shape: [height, width, depth]
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers


    Returns:
        logits: [N, NUM_CLASSES] classifier logits (before softmax)
        probs: [N, NUM_CLASSES] classifier probabilities
        bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to
                     proposal boxes

    """

1.f.1.1 PyramidROIAlign

"""Implements ROI Pooling on multiple levels of the feature pyramid.

    Params:
    - pool_shape: [height, width] of the output pooled regions. Usually [7, 7]
    - image_shape: [height, width, channels]. Shape of input image in pixels


    Inputs:
    - boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalized
             coordinates. Possibly padded with zeros if not enough
             boxes to fill the array.
    - Feature maps: List of feature maps from different levels of the pyramid.
                    Each is [batch, height, width, channels]


    Output:
    Pooled regions in the shape: [batch, num_boxes, height, width, channels].
    The width and height are those specified in the pool_shape in the layer
    constructor.
    """

The idea behind ROIAlign: http://blog.leanote.com/post/afanti.deng@gmail.com/b5f4f526490b

1.f.2 build_fpn_mask_graph()

"""Builds the computation graph of the mask head of Feature Pyramid Network.

    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_shape: [height, width, depth]
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers


    Returns: Masks [batch, roi_count, height, width, num_classes]
    """
Finally the model is returned.

model.train is then called to train the model (this is when the dataset is passed in).


In utils.py:

    np.meshgrid() expands 1-D arrays into 2-D coordinate matrices: https://www.cnblogs.com/sunshinewang/p/6897966.html
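A quick example of what np.meshgrid produces:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20])
xx, yy = np.meshgrid(x, y)
print(xx)  # [[1 2 3]
           #  [1 2 3]]
print(yy)  # [[10 10 10]
           #  [20 20 20]]
# Paired elementwise, xx and yy enumerate every (x, y) grid point -- exactly
# how anchor centers are enumerated over a feature map.
```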

In config.py:

    BACKBONE_SHAPES are simply the feature map shapes: [256 256] [128 128] [64 64] [32 32] [16 16]...
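Those shapes follow directly from the image size and the backbone strides; a quick check, assuming a 1024×1024 input and the default BACKBONE_STRIDES:

```python
# BACKBONE_SHAPES are the image size divided by each FPN level's stride.
image_size = 1024                      # assumed 1024x1024 input
strides = [4, 8, 16, 32, 64]           # BACKBONE_STRIDES for P2..P6
shapes = [[image_size // s] * 2 for s in strides]
print(shapes)  # [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
```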


    Select the top-K anchors by score, refine those remaining anchors, clip the boxes to the image boundary, normalize, then apply NMS to obtain the final proposals.
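The final NMS step can be sketched as a plain greedy NMS in NumPy (toy boxes and scores; the model itself calls tf.image.non_max_suppression):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-max suppression; returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1]   # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU of the current top box against all remaining boxes.
        y1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        x1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        y2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        x2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]  # drop boxes overlapping too much
    return keep

boxes = np.array([[0., 0., 10., 10.],
                  [0.5, 0.5, 10.5, 10.5],   # heavily overlaps the first box
                  [20., 20., 30., 30.]])
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 is suppressed by box 0
```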


The overall flow:

1.RPN

1.a RPN Targets

The RPN targets are the training values for the RPN. To generate the targets, we start with a grid of anchors that cover the full image at different scales, and then we compute the IoU of the anchors with ground truth object. Positive anchors are those that have an IoU >= 0.7 with any ground truth object, and negative anchors are those that don't cover any object by more than 0.3 IoU. Anchors in between (i.e. cover an object by IoU >= 0.3 but <0.7) are considered neutral and excluded from training.

To train the RPN regressor, we also compute the shift and resizing needed to make the anchor cover the ground truth object completely.
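A minimal NumPy sketch of that positive/negative/neutral split (the IoU values are made up; note the real build_rpn_targets additionally forces the best-matching anchor of each GT box to be positive even when its IoU is below 0.7):

```python
import numpy as np

# Toy IoU matrix: rows = anchors, columns = GT boxes (hypothetical values).
iou = np.array([[0.80, 0.10],    # anchor 0: strong match with GT 0
                [0.10, 0.05],    # anchor 1: covers nothing
                [0.50, 0.20]])   # anchor 2: in between -> neutral

anchor_iou_max = iou.max(axis=1)           # best IoU each anchor reaches
rpn_match = np.zeros(len(iou), dtype=np.int32)
rpn_match[anchor_iou_max < 0.3] = -1       # negative anchors
rpn_match[anchor_iou_max >= 0.7] = 1       # positive anchors
# anything left at 0 is neutral and excluded from the loss
print(rpn_match.tolist())  # [1, -1, 0]
```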

# Generate RPN training targets
# target_rpn_match is 1 for positive anchors, -1 for negative anchors
# and 0 for neutral anchors.
target_rpn_match, target_rpn_bbox = modellib.build_rpn_targets(
    image.shape, model.anchors, gt_class_id, gt_bbox, model.config)

#### target_rpn_bbox has at most the size specified in config; the number of entries actually filled equals the number of positive anchors, and what is stored is (dy, dx, log(dh), log(dw)) [computed from the positive anchors and their corresponding GT bboxes].

utils.apply_box_deltas() is then called to refine the positive anchors by (dy, dx, log(dh), log(dw)).


(Figure omitted: solid boxes are the anchors after refinement, dashed boxes before.)



I also reviewed Faster R-CNN:

https://blog.csdn.net/u013832707/article/details/53641055/

A window (proposal) is represented by its center point and size, P = (px, py, pw, ph); strictly speaking, the P actually used for regression is the CNN feature of that window.

Bounding-box regression: (1) translate first; (2) then scale; so four quantities must be learned: dx(P), dy(P), dw(P), dh(P);
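A worked example of this parameterization (all the proposal and GT numbers are made up):

```python
import numpy as np

# Proposal P and ground truth G as (cx, cy, w, h) -- hypothetical values.
px, py, pw, ph = 10., 10., 20., 20.
gx, gy, gw, gh = 12., 11., 40., 10.

# Translation first, normalized by the proposal's size...
dx = (gx - px) / pw          # 0.1
dy = (gy - py) / ph          # 0.05
# ...then scaling, in log space so the regression target is unconstrained:
dw = np.log(gw / pw)         # log(2)  ~  0.693
dh = np.log(gh / ph)         # log(0.5) ~ -0.693

# Applying the deltas back recovers G:
assert np.isclose(px + dx * pw, gx) and np.isclose(py + dy * ph, gy)
assert np.isclose(pw * np.exp(dw), gw) and np.isclose(ph * np.exp(dh), gh)
print(dx, dy, round(dw, 3), round(dh, 3))
```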


