
Mask-RCNN Source Code Reading Notes


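I read the blog post https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1.

That post walks through several of the .ipynb notebooks, but it does not analyze the other Python files (including the COCO code).

Over the past few days I studied that source code. There are probably mistakes, so corrections and criticism from more experienced readers are welcome.

(These notes sat unpublished in my drafts folder for a long time...)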

######################### separator ###########################################

Notes on reading coco.py:

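COCO provides images. Each image may have (multiple) annotations, which the COCO source code abbreviates as ann.

Each image corresponds to one img_id;

One image_id can have multiple annotations, abbreviated as anns in the code;

Each ann has its own category;

Multiple categories are called cats in the source code;

So one image, i.e. one img_id, corresponds to multiple anns and multiple cats, and therefore to masks of several categories.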
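The following sketch makes the img_id → anns → cats relationship concrete. It uses a toy, hypothetical dict in the COCO annotation-file layout (images, annotations, categories). build_index and cats_of_image are made-up helper names. pycocotools' COCO class keeps a similar image-to-annotations index internally and exposes it through methods such as getAnnIds(imgIds=...) and loadAnns(...).

```python
from collections import defaultdict

# Toy, hypothetical data in the COCO annotation-file layout.
coco_json = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 18},
        {"id": 12, "image_id": 1, "category_id": 1},
        {"id": 13, "image_id": 2, "category_id": 18},
    ],
}


def build_index(dataset):
    """Group annotations by image: img_id -> list of anns."""
    img_to_anns = defaultdict(list)
    for ann in dataset["annotations"]:
        img_to_anns[ann["image_id"]].append(ann)
    return img_to_anns


def cats_of_image(img_to_anns, img_id):
    """Distinct category ids (cats) that occur in one image."""
    return sorted({ann["category_id"] for ann in img_to_anns[img_id]})


index = build_index(coco_json)
anns = index[1]                  # three anns for img_id 1
cats = cats_of_image(index, 1)   # [1, 18]
```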

masks: [height, width, instance_number]; the boxes derived from them: [instance_number, (y1, x1, y2, x2)]

anchors: [anchor_count, (y1,x1,y2,x2)]

So the generated result is:

[Figure: screenshot taken from the blog https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1]
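Each instance's (y1, x1, y2, x2) box can be read directly off its mask. The numpy sketch below assumes masks of shape [height, width, instance_count] and treats y2/x2 as exclusive (one past the last mask pixel). As far as I recall, matterport's utils.extract_bboxes follows the same idea, but this is my own version.

```python
import numpy as np


def extract_bboxes(mask):
    """mask: bool [height, width, instance_count]
    -> int32 [instance_count, (y1, x1, y2, x2)], with y2/x2 exclusive."""
    boxes = np.zeros((mask.shape[-1], 4), dtype=np.int32)
    for i in range(mask.shape[-1]):
        rows = np.where(np.any(mask[:, :, i], axis=1))[0]   # rows containing the object
        cols = np.where(np.any(mask[:, :, i], axis=0))[0]   # columns containing the object
        if rows.size:                                       # an empty mask keeps an all-zero box
            boxes[i] = [rows[0], cols[0], rows[-1] + 1, cols[-1] + 1]
    return boxes


mask = np.zeros((8, 8, 2), dtype=bool)
mask[2:5, 1:4, 0] = True      # instance 0 covers rows 2-4, cols 1-3
mask[6:8, 5:7, 1] = True      # instance 1 covers rows 6-7, cols 5-6
boxes = extract_bboxes(mask)  # [[2, 1, 5, 4], [6, 5, 8, 7]]
```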

RPN_ANCHOR_SCALES = (32,64,128,256,512)

1. Create model in training mode:

Creating the model is like setting up a skeleton (the computation graph). No actual data is fed into it at this point.

model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

Analysis of class MaskRCNN():

1.a Inputs

(1) Use keras.layers.Input() to create input_image and input_image_meta, i.e. the skeleton of the input layer.

(2) RPN GT

    Use keras.layers.Input() to create input_rpn_match [None, 1] and input_rpn_bbox [None, 4].

(3) Detection GT (class IDs, bounding boxes, and masks)

    Use keras.layers.Input() to create:

    # 1. GT Class IDs (zero padded): input_gt_class_ids [None]

    # 2. GT Boxes in pixels (zero padded)
    # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates: input_gt_boxes

    # Normalize coordinates

    # 3. GT Masks (zero padded), also created with keras.layers.Input()

    # [batch, height, width, MAX_GT_INSTANCES]
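The "# Normalize coordinates" step converts pixel boxes to the 0-1 range. Below is a numpy sketch of one such convention. Treat the shift/scale constants as an assumption: y2/x2 exclusive, with h - 1 and w - 1 as the scale. This is my understanding of norm_boxes/denorm_boxes in newer versions of the repo; older versions simply divided by the image height and width.

```python
import numpy as np


def norm_boxes(boxes, shape):
    """Pixel boxes [N, (y1, x1, y2, x2)] with exclusive y2/x2 -> 0..1 coordinates."""
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1], dtype=np.float32)
    shift = np.array([0, 0, 1, 1], dtype=np.float32)
    return (np.asarray(boxes, dtype=np.float32) - shift) / scale


def denorm_boxes(boxes, shape):
    """Inverse of norm_boxes, rounded back to integer pixel coordinates."""
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1], dtype=np.float32)
    shift = np.array([0, 0, 1, 1], dtype=np.float32)
    return np.around(np.asarray(boxes, dtype=np.float32) * scale + shift).astype(np.int32)


image_shape = (101, 201)                        # (height, width), arbitrary
gt_boxes = np.array([[0, 0, 101, 201], [10, 20, 51, 101]])
normalized = norm_boxes(gt_boxes, image_shape)  # [[0, 0, 1, 1], [0.1, 0.1, 0.5, 0.5]]
```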

1.b Build the shared convolutional layers

The FPN structure finally yields:

rpn_feature_maps = [P2, P3, P4, P5, P6]

mrcnn_feature_maps = [P2, P3, P4, P5]   (all with depth 256)
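This shape-only numpy sketch shows how the FPN top-down pathway produces the depth-256 levels. Several simplifications apply:

- Random matrices stand in for the learned 1x1 lateral convolutions.
- The 3x3 convolutions applied to each P level are omitted.
- The C2 to C5 inputs are hypothetical ResNet outputs for a 256 × 256 image.
- P6 is taken as a stride-2 subsample of P5. This is my reading of the matterport code, which uses a 1x1 max-pool with stride 2.

```python
import numpy as np

rng = np.random.default_rng(0)
DEPTH = 256  # every P level has depth 256


def lateral_conv1x1(x, out_channels=DEPTH):
    """Stand-in for a learned 1x1 convolution: a random per-pixel linear map."""
    weights = rng.standard_normal((x.shape[-1], out_channels)).astype(np.float32) * 0.01
    return x @ weights


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an [H, W, C] array."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


# Hypothetical ResNet stage outputs for a 256 x 256 image (strides 4, 8, 16, 32).
C2 = rng.standard_normal((64, 64, 256)).astype(np.float32)
C3 = rng.standard_normal((32, 32, 512)).astype(np.float32)
C4 = rng.standard_normal((16, 16, 1024)).astype(np.float32)
C5 = rng.standard_normal((8, 8, 2048)).astype(np.float32)

# Top-down pathway: upsample the coarser level and add the lateral 1x1 conv.
P5 = lateral_conv1x1(C5)
P4 = upsample2x(P5) + lateral_conv1x1(C4)
P3 = upsample2x(P4) + lateral_conv1x1(C3)
P2 = upsample2x(P3) + lateral_conv1x1(C2)
P6 = P5[::2, ::2]   # stride-2 subsample of P5, used only by the RPN

rpn_feature_maps = [P2, P3, P4, P5, P6]
mrcnn_feature_maps = [P2, P3, P4, P5]
```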

# Generate Anchors 

"""Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """

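The anchors are generated by iterating over each scale in the configured RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512) (the side length of the square anchor in pixels), one scale per pyramid level.

The result contains every anchor for every feature-map pixel: 3 anchors per pixel, one per entry of RPN_ANCHOR_RATIOS. Their order matches the order of the RPN outputs. As a result, the indices that ProposalLayer selects from the RPN scores can be used directly to pick the corresponding anchors (this becomes clear together with the sections below). That is how the anchors get filtered.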
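Here is a numpy sketch of the anchor generation just described. It uses one scale per pyramid level and every ratio at every level. Anchors are ordered pixel by pixel in row-major order, with len(ratios) anchors per pixel, and the levels are concatenated in the order of the scales. The ratios (0.5, 1, 2), the strides (4, 8, 16, 32, 64) and the small 256-pixel image are assumptions for the demo. The repo's generate_anchors is written with several np.meshgrid calls instead, but to my understanding it produces the same ordering.

```python
import numpy as np


def generate_anchors(scale, ratios, shape, feature_stride, anchor_stride=1):
    """Anchors of one pyramid level as [N, (y1, x1, y2, x2)] in pixels,
    ordered pixel by pixel (row-major), len(ratios) anchors per pixel."""
    ratios = np.asarray(ratios, dtype=np.float64)
    heights = scale / np.sqrt(ratios)            # ratio = width / height
    widths = scale * np.sqrt(ratios)
    ys = np.arange(0, shape[0], anchor_stride) * feature_stride
    xs = np.arange(0, shape[1], anchor_stride) * feature_stride
    cx, cy = np.meshgrid(xs, ys)                 # anchor centers on the image
    cy = cy.reshape(-1, 1)                       # [pixels, 1]
    cx = cx.reshape(-1, 1)
    # broadcasting [pixels, 1] against [ratios] gives [pixels, ratios]
    boxes = np.stack([cy - 0.5 * heights, cx - 0.5 * widths,
                      cy + 0.5 * heights, cx + 0.5 * widths], axis=-1)
    return boxes.reshape(-1, 4)


def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides):
    """One scale per level; levels concatenated in the order of the scales."""
    return np.concatenate([generate_anchors(s, ratios, shp, st)
                           for s, shp, st in zip(scales, feature_shapes, feature_strides)],
                          axis=0)


scales = (32, 64, 128, 256, 512)     # RPN_ANCHOR_SCALES
ratios = (0.5, 1, 2)                 # assumed RPN_ANCHOR_RATIOS
strides = (4, 8, 16, 32, 64)         # assumed feature strides of P2..P6
feature_shapes = [(256 // s, 256 // s) for s in strides]   # 256 x 256 image
anchors = generate_pyramid_anchors(scales, ratios, feature_shapes, strides)
```

Each pixel contributes its anchors consecutively, so anchor i lines up with row i of the reshaped RPN outputs. This alignment is what lets ProposalLayer reuse the score indices on the anchor array.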

1.c RPN Model

rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), 256)

layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))

"""Builds a Keras model of the Region Proposal Network.
    It wraps the RPN graph so it can be used multiple times with shared
    weights.

    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).
    depth: Depth of the backbone feature map.

    Returns a Keras Model object. The model outputs, when called, are:
    rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax)
    rpn_probs: [batch, H, W, 2] Anchor classifier probabilities. (rpn_probs is the score of each anchor; the 2 stands for background and foreground.)
    rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be
                applied to anchors.

    """

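depth is the depth of the feature maps obtained in 1.b, which is 256 for all of them. The rpn_probs (anchor scores) that are finally returned have shape [batch, anchors, 2]. They are reshaped from [batch, height, width, anchors per location * 2].

rpn_bbox (bounding box refinement): [batch, H, W, anchors per location * depth]

    # where depth is the 4 deltas (dy, dx, log(dh), log(dw)); reshape to [batch, anchors, 4]

This now lines up with the shape of anchors, [anchor_count, (y1, x1, y2, x2)].

Each feature map from 1.b is fed into the RPN model, giving rpn_logits, rpn_probs and rpn_bbox for that level. The rpn_logits of all levels are then concatenated, and the same is done for the other two outputs.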
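The reshape and the per-level concatenation can be checked with plain numpy. Random arrays stand in for the RPN convolution outputs of P2 to P6. The level sizes assume a 256 × 256 input, and 3 anchors per location are assumed. Reshaping [batch, H, W, A * 2] to [batch, -1, 2] keeps the pixel-major, anchor-minor order that the anchors use.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
batch = 2
anchors_per_location = 3          # len(RPN_ANCHOR_RATIOS), assumed 3
level_sizes = [64, 32, 16, 8, 4]  # H = W of P2..P6 for a 256 x 256 input

all_logits, all_bbox = [], []
for size in level_sizes:
    # stand-ins for the two conv outputs of the RPN on one pyramid level
    logits_map = rng.standard_normal((batch, size, size, anchors_per_location * 2))
    bbox_map = rng.standard_normal((batch, size, size, anchors_per_location * 4))
    # [batch, H, W, A*2] -> [batch, H*W*A, 2] and [batch, H, W, A*4] -> [batch, H*W*A, 4]
    all_logits.append(logits_map.reshape(batch, -1, 2))
    all_bbox.append(bbox_map.reshape(batch, -1, 4))

rpn_class_logits = np.concatenate(all_logits, axis=1)   # all levels along the anchor axis
rpn_probs = softmax(rpn_class_logits)                   # [batch, anchors, (bg, fg)]
rpn_bbox = np.concatenate(all_bbox, axis=1)             # [batch, anchors, 4]
```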

1.d Generate proposals
       # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
        # and zero padded.

Function call:

rpn_rois = ProposalLayer(proposal_count=proposal_count,
                         nms_threshold=config.RPN_NMS_THRESHOLD,
                         name="ROI",
                         anchors=self.anchors,
                         config=config)([rpn_class, rpn_bbox])

ProposalLayer

"""Receives anchor scores and selects a subset to pass as proposals
    to the second stage. Filtering is done based on anchor scores and
    non-max suppression to remove overlaps. It also applies bounding
    box refinement deltas to anchors.

    Inputs:
        rpn_probs: [batch, anchors, (bg prob, fg prob)]
        rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]

    Returns:
        Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
    """

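anchors here are all the anchors of the last 5 stages of the network (pyramid levels P2 to P6).

1.d.1 The call() function of the ProposalLayer class

(1) call() receives the inputs, i.e. rpn_probs and rpn_bbox above. It takes the foreground column of rpn_probs as the scores and gets the indices of the top-K anchors by score. It then uses these indices to select the corresponding scores, deltas and anchors.

(2) # Apply deltas to anchors to get refined anchors: adjust the anchors by the deltas (dy, dx, log(dh), log(dw)).

        # Returns the adjusted boxes [batch, N, (y1, x1, y2, x2)]. These are effectively new anchors; they are called boxes because they are now closer to the GT (ground truth).

(3) # Clip to image boundaries. [batch, N, (y1, x1, y2, x2)]: keeps the box edges inside the image.

(4) Filter out small boxes

    4.a Normalize dimensions to range of 0 to 1 -> normalized_boxes

    4.b Non-max suppression: pass normalized_boxes and scores to NMS to get the kept indices, then use them to select the corresponding normalized_boxes. These are the proposals (rpn_rois). # Pad if needed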
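The four steps can be sketched for a single image in numpy, under these assumptions:

- There is no batch dimension.
- Anchors are already in normalized coordinates, so clipping is to [0, 1] and step 4.a is folded in. Newer versions of the repo work this way; the older code described above normalizes after clipping.
- The deltas are not scaled by RPN_BBOX_STD_DEV, which the real layer does.
- A simple greedy NMS replaces tf.image.non_max_suppression.

```python
import numpy as np


def apply_box_deltas(boxes, deltas):
    """Refine boxes [N, (y1, x1, y2, x2)] with deltas [N, (dy, dx, log(dh), log(dw))]."""
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    cy = boxes[:, 0] + 0.5 * h + deltas[:, 0] * h   # shift the center
    cx = boxes[:, 1] + 0.5 * w + deltas[:, 1] * w
    h = h * np.exp(deltas[:, 2])                     # then rescale
    w = w * np.exp(deltas[:, 3])
    return np.stack([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w], axis=1)


def iou_one_to_many(box, boxes):
    """IoU between one box and each row of boxes."""
    inter_h = np.maximum(np.minimum(box[2], boxes[:, 2]) - np.maximum(box[0], boxes[:, 0]), 0)
    inter_w = np.maximum(np.minimum(box[3], boxes[:, 3]) - np.maximum(box[1], boxes[:, 1]), 0)
    inter = inter_h * inter_w
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / np.maximum(area + areas - inter, 1e-12)


def nms(boxes, scores, threshold, max_output):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0 and len(keep) < max_output:
        best, rest = order[0], order[1:]
        keep.append(best)
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= threshold]
    return np.array(keep, dtype=np.int64)


def proposal_layer(probs, deltas, anchors, pre_nms_limit, proposal_count, nms_threshold):
    """One image. probs [A, (bg, fg)], deltas [A, 4], anchors [A, 4] (normalized)."""
    scores = probs[:, 1]                                  # (1) foreground scores
    ix = np.argsort(-scores)[:pre_nms_limit]              #     indices of the top-K anchors
    scores, deltas, anchors = scores[ix], deltas[ix], anchors[ix]
    boxes = apply_box_deltas(anchors, deltas)             # (2) refined anchors -> boxes
    boxes = np.clip(boxes, 0.0, 1.0)                      # (3) clip to the normalized image
    keep = nms(boxes, scores, nms_threshold, proposal_count)  # (4.b) NMS
    rois = boxes[keep]
    pad = proposal_count - rois.shape[0]                  # zero-pad to a fixed size
    return np.pad(rois, [(0, pad), (0, 0)])


anchors = np.array([[0.1, 0.1, 0.3, 0.3], [0.11, 0.11, 0.31, 0.31], [0.6, 0.6, 0.9, 0.9]])
probs = np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
deltas = np.zeros((3, 4))   # no refinement, to keep the example easy to check
rois = proposal_layer(probs, deltas, anchors, pre_nms_limit=3,
                      proposal_count=4, nms_threshold=0.7)
# anchor 1 overlaps anchor 0 with IoU of about 0.82 and is suppressed
```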

1.e Generate detection targets

# Subsamples proposals and generates target outputs for training
# Note that proposal class IDs, gt_boxes, and gt_masks are zero
# padded. Equally, returned rois and targets are zero padded.

class DetectionTargetLayer(KE.Layer):

"""Subsamples proposals and generates target box refinement, class_ids,
    and masks for each.

    Inputs:
    proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might
               be zero padded if there are not enough proposals.
    gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs.
    gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized
              coordinates.
    gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean type

    Returns: Target ROIs and corresponding class IDs, bounding box shifts,
    and masks.
    rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized
          coordinates
    target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
    target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, NUM_CLASSES,
                    (dy, dx, log(dh), log(dw), class_id)]
                   Class-specific bbox refinements.
    target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width]
                 Masks cropped to bbox boundaries and resized to neural
                 network output size.

    Note: Returned arrays might be zero padded if not enough target ROIs.

    """

# Compute overlaps matrix [proposals, gt_boxes]
    overlaps = overlaps_graph(proposals, gt_boxes)    IoU of each proposal with each of the (up to MAX_GT_INSTANCES) GT boxes

    tf.reduce_max(overlaps, axis=1)    maximum IoU of each proposal (the max of each row)

The returned rois (and the other outputs) contain the positive proposals, then the negative proposals, then zero padding.
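This single-image numpy sketch shows the sampling built around reduce_max. It computes the IoU matrix and calls a proposal positive when its row maximum is ≥ 0.5, negative otherwise. It then keeps roughly one third positives, assigns class ids, and zero-pads. The 0.5 threshold and the 0.33 positive ratio are my recollection of the repo's defaults (ROI_POSITIVE_RATIO), and rois_per_image=8 is just a small demo value. The real layer also handles crowd boxes and produces target deltas and masks; these are left out here.

```python
import numpy as np


def overlaps(boxes1, boxes2):
    """IoU matrix [len(boxes1), len(boxes2)] for (y1, x1, y2, x2) boxes."""
    b1, b2 = boxes1[:, None, :], boxes2[None, :, :]
    inter_h = np.clip(np.minimum(b1[..., 2], b2[..., 2]) - np.maximum(b1[..., 0], b2[..., 0]), 0, None)
    inter_w = np.clip(np.minimum(b1[..., 3], b2[..., 3]) - np.maximum(b1[..., 1], b2[..., 1]), 0, None)
    inter = inter_h * inter_w
    area1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])
    area2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])
    return inter / (area1 + area2 - inter)


def sample_rois(proposals, gt_boxes, gt_class_ids, rois_per_image=8,
                positive_ratio=0.33, iou_threshold=0.5, seed=0):
    """Pick positive/negative proposals and their target class ids (one image)."""
    rng = np.random.default_rng(seed)
    iou = overlaps(proposals, gt_boxes)            # [proposals, gt_boxes]
    roi_iou_max = iou.max(axis=1)                  # best IoU per proposal (row max)
    positive = np.where(roi_iou_max >= iou_threshold)[0]
    negative = np.where(roi_iou_max < iou_threshold)[0]
    positive = rng.permutation(positive)[:int(rois_per_image * positive_ratio)]
    # enough negatives that positives make up about positive_ratio of the rois
    negative_count = int(positive.size / positive_ratio) - positive.size
    negative = rng.permutation(negative)[:negative_count]
    rois = np.concatenate([proposals[positive], proposals[negative]])
    # positives take the class of their best-matching GT box, negatives get 0
    class_ids = np.concatenate([gt_class_ids[iou[positive].argmax(axis=1)],
                                np.zeros(negative.size, dtype=gt_class_ids.dtype)])
    pad = rois_per_image - rois.shape[0]           # zero padding, as in the docstring
    return np.pad(rois, [(0, pad), (0, 0)]), np.pad(class_ids, (0, pad))


proposals = np.array([[0.1, 0.1, 0.4, 0.4], [0.12, 0.12, 0.4, 0.4], [0.1, 0.1, 0.38, 0.4],
                      [0.6, 0.6, 0.7, 0.7], [0.6, 0.7, 0.7, 0.8], [0.7, 0.6, 0.8, 0.7],
                      [0.7, 0.7, 0.8, 0.8], [0.8, 0.8, 0.9, 0.9]])
gt_boxes = np.array([[0.1, 0.1, 0.4, 0.4]])
gt_class_ids = np.array([7])
rois, class_ids = sample_rois(proposals, gt_boxes, gt_class_ids)
```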

1.f Network Heads

1.f.1 fpn_classifier_graph()

Performs classification and bbox regression.

"""Builds the computation graph of the feature pyramid network classifier
    and regressor heads.


    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_shape: [height, width, depth]
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers


    Returns:
        logits: [N, NUM_CLASSES] classifier logits (before softmax)
        probs: [N, NUM_CLASSES] classifier probabilities
        bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to
                     proposal boxes

    """

1.f.1.1 PyramidROIAlign

"""Implements ROI Pooling on multiple levels of the feature pyramid.

    Params:
    - pool_shape: [height, width] of the output pooled regions. Usually [7, 7]
    - image_shape: [height, width, channels]. Shape of input image in pixels


    Inputs:
    - boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalized
             coordinates. Possibly padded with zeros if not enough
             boxes to fill the array.
    - Feature maps: List of feature maps from different levels of the pyramid.
                    Each is [batch, height, width, channels]


    Output:
    Pooled regions in the shape: [batch, num_boxes, height, width, channels].
    The width and height are those specified in the pool_shape in the layer
    constructor.
    """

The principle of ROIAlign: http://blog.leanote.com/post/afanti.deng@gmail.com/b5f4f526490b
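PyramidROIAlign does two things. It first picks a pyramid level for each box. It then crops a fixed-size grid from that level by bilinear interpolation, without rounding the box to the feature-map grid; matterport delegates this second part to tf.image.crop_and_resize. The numpy sketch below reproduces both steps under two assumptions:

- The level rule is my reading of the source and of the FPN paper: a canonical size of 224 maps to level 4, clipped to P2 to P5.
- The sampling points follow the tf.image.crop_and_resize convention, for a single box and a single sample per output cell.

```python
import numpy as np


def roi_level(boxes, image_shape):
    """Pyramid level (2..5) for each normalized box [N, (y1, x1, y2, x2)]."""
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    image_area = image_shape[0] * image_shape[1]
    # a box whose side is 224 pixels maps to level 4; each factor of 2 moves one level
    level = np.log2(np.sqrt(h * w) / (224.0 / np.sqrt(image_area)))
    return np.clip(4 + np.round(level).astype(int), 2, 5)


def crop_and_resize(feature_map, box, pool_size=7):
    """Bilinearly sample a pool_size x pool_size grid inside one normalized box
    from an [H, W, C] feature map (no rounding of the box coordinates)."""
    height, width, _ = feature_map.shape
    by1, bx1, by2, bx2 = box
    steps = np.arange(pool_size) / (pool_size - 1)
    ys = (by1 + (by2 - by1) * steps) * (height - 1)   # sample rows (float)
    xs = (bx1 + (bx2 - bx1) * steps) * (width - 1)    # sample cols (float)
    y_lo = np.floor(ys).astype(int)
    y_hi = np.minimum(y_lo + 1, height - 1)
    x_lo = np.floor(xs).astype(int)
    x_hi = np.minimum(x_lo + 1, width - 1)
    wy = (ys - y_lo)[:, None, None]
    wx = (xs - x_lo)[None, :, None]
    top = feature_map[y_lo][:, x_lo] * (1 - wx) + feature_map[y_lo][:, x_hi] * wx
    bottom = feature_map[y_hi][:, x_lo] * (1 - wx) + feature_map[y_hi][:, x_hi] * wx
    return top * (1 - wy) + bottom * wy


image_shape = (1024, 1024)
sides = np.array([224, 112, 56, 28, 1024]) / 1024.0
boxes = np.stack([np.zeros(5), np.zeros(5), sides, sides], axis=1)
levels = roi_level(boxes, image_shape)   # [4, 3, 2, 2, 5]

feature_map = np.zeros((8, 8, 1))
gy, gx = np.mgrid[0:8, 0:8]
feature_map[..., 0] = gy + 10 * gx        # a linear ramp, so bilinear sampling is exact
pooled = crop_and_resize(feature_map, (0.25, 0.25, 0.75, 0.75))   # [7, 7, 1]
```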

1.f.2 build_fpn_mask_graph()

"""Builds the computation graph of the mask head of Feature Pyramid Network.

    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_shape: [height, width, depth]
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers


    Returns: Masks [batch, roi_count, height, width, num_classes]
    """
Finally, the model is returned.
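The mask head outputs one mask per class for every ROI, but only the channel of the ROI's own class is used. Below is a numpy sketch of that selection, plus a binary cross-entropy over positive ROIs. This is how I understand the repo's mask loss (mrcnn_mask_loss_graph) to work. The shapes drop the batch dimension, and the function names are hypothetical.

```python
import numpy as np


def select_class_masks(pred_masks, class_ids):
    """pred_masks [num_rois, H, W, num_classes], class_ids [num_rois]
    -> [num_rois, H, W]: for each ROI only the channel of its class."""
    per_class = np.moveaxis(pred_masks, -1, 1)          # [num_rois, num_classes, H, W]
    return per_class[np.arange(len(class_ids)), class_ids]


def mask_loss(pred_masks, target_masks, class_ids):
    """Mean binary cross-entropy over positive ROIs (class_id > 0) only."""
    positive = class_ids > 0
    if not positive.any():
        return 0.0
    p = select_class_masks(pred_masks[positive], class_ids[positive])
    p = np.clip(p, 1e-7, 1 - 1e-7)                      # avoid log(0)
    t = target_masks[positive]
    return float(np.mean(-(t * np.log(p) + (1 - t) * np.log(1 - p))))


pred = np.full((3, 2, 2, 4), 0.2)     # 3 ROIs, 2x2 masks, 4 classes (toy sizes)
pred[0, :, :, 2] = 0.9999             # ROI 0, class 2: confident and correct
pred[2, :, :, 1] = 0.5                # ROI 2, class 1: undecided
class_ids = np.array([2, 0, 1])       # ROI 1 is background and is ignored
targets = np.ones((3, 2, 2))
loss = mask_loss(pred, targets, class_ids)
```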

Call model.train() to train the model (this is when the dataset is passed in).


In utils.py:

    np.meshgrid() builds two-dimensional coordinate matrices from one-dimensional coordinate arrays: https://www.cnblogs.com/sunshinewang/p/6897966.html
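Here is a quick illustration of np.meshgrid with the default indexing="xy", the way it can be used to lay anchor centers over a feature-map grid. The coordinate values are arbitrary.

```python
import numpy as np

xs = np.array([0, 4, 8])      # x coordinates (e.g. shifts along the width)
ys = np.array([0, 4])         # y coordinates
X, Y = np.meshgrid(xs, ys)    # default indexing="xy": both have shape [len(ys), len(xs)]
# X = [[0 4 8]
#      [0 4 8]]
# Y = [[0 0 0]
#      [4 4 4]]
centers = np.stack([Y.ravel(), X.ravel()], axis=1)   # (y, x) of every grid point, row-major
```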

In config.py:

    BACKBONE_SHAPES is the feature map shape of each pyramid level: [256 256] [128 128] [64 64] [32 32] [16 16]... (for a 1024 × 1024 input)
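These shapes follow from dividing the image size by each level's stride and rounding up. The sketch assumes strides (4, 8, 16, 32, 64), which is the BACKBONE_STRIDES default as far as I know. The repo has a similar helper, compute_backbone_shapes, but this version is standalone.

```python
import math


def compute_backbone_shapes(image_shape, strides=(4, 8, 16, 32, 64)):
    """[height, width] of each pyramid level's feature map (P2..P6)."""
    return [[int(math.ceil(image_shape[0] / s)), int(math.ceil(image_shape[1] / s))]
            for s in strides]


shapes = compute_backbone_shapes((1024, 1024))
# [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
```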


    ProposalLayer in short: pick the top-K anchors by score, refine the remaining anchors, clip the boxes to the image boundary, normalize, then run NMS to get the final proposals.


The overall flow:

1. RPN

1.a RPN Targets

The RPN targets are the training values for the RPN. To generate the targets, we start with a grid of anchors that cover the full image at different scales, and then we compute the IoU of the anchors with ground truth object. Positive anchors are those that have an IoU >= 0.7 with any ground truth object, and negative anchors are those that don't cover any object by more than 0.3 IoU. Anchors in between (i.e. cover an object by IoU >= 0.3 but <0.7) are considered neutral and excluded from training.
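The labeling rule in the paragraph above can be sketched in numpy. The 0.7/0.3 thresholds come from the text. The extra rule, that each GT box's best anchor is positive even below 0.7, follows Faster R-CNN and, as far as I know, the repo's build_rpn_targets. Crowd boxes and the subsampling to a fixed number of training anchors are omitted.

```python
import numpy as np


def iou_matrix(a, b):
    """IoU between every box of a [N, 4] and every box of b [M, 4], (y1, x1, y2, x2)."""
    inter_h = np.clip(np.minimum(a[:, None, 2], b[None, :, 2]) - np.maximum(a[:, None, 0], b[None, :, 0]), 0, None)
    inter_w = np.clip(np.minimum(a[:, None, 3], b[None, :, 3]) - np.maximum(a[:, None, 1], b[None, :, 1]), 0, None)
    inter = inter_h * inter_w
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def rpn_match(anchors, gt_boxes, pos_iou=0.7, neg_iou=0.3):
    """1 = positive, -1 = negative, 0 = neutral (excluded from training)."""
    iou = iou_matrix(anchors, gt_boxes)      # [anchors, gt_boxes]
    best = iou.max(axis=1)                   # best IoU of each anchor with any GT
    match = np.zeros(len(anchors), dtype=np.int32)
    match[best < neg_iou] = -1
    match[iou.argmax(axis=0)] = 1            # each GT's best anchor is always positive
    match[best >= pos_iou] = 1
    return match


gt = np.array([[0., 0., 10., 10.]])
anchors = np.array([[0., 0., 10., 10.],     # IoU 1.0 -> positive
                    [0., 0., 10., 8.],      # IoU 0.8 -> positive
                    [0., 0., 10., 5.],      # IoU 0.5 -> neutral
                    [20., 20., 30., 30.]])  # IoU 0.0 -> negative
match = rpn_match(anchors, gt)              # [1, 1, 0, -1]
```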

To train the RPN regressor, we also compute the shift and resizing needed to make the anchor cover the ground truth object completely.

# Generate RPN training targets
# target_rpn_match is 1 for positive anchors, -1 for negative anchors
# and 0 for neutral anchors.
target_rpn_match, target_rpn_bbox = modellib.build_rpn_targets(
    image.shape, model.anchors, gt_class_id, gt_bbox, model.config)

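The length of target_rpn_bbox is capped by a value set in the config (RPN_TRAIN_ANCHORS_PER_IMAGE). The number of rows that actually hold data equals the number of positive anchors. Each row stores (dy, dx, log(dh), log(dw)), computed from a positive anchor and its matching GT box.

Then utils.apply_box_deltas() is called to refine the positive anchors using (dy, dx, log(dh), log(dw)).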


[Figure: solid lines are the refined anchors, dashed lines are the anchors before refinement]
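The deltas stored in target_rpn_bbox and the refinement done by apply_box_deltas are inverses of each other. Below is a numpy sketch of both; the anchor and GT values are made up. The repo's build_rpn_targets additionally divides the deltas by config.RPN_BBOX_STD_DEV, which is skipped here.

```python
import numpy as np


def box_refinement(box, gt_box):
    """Deltas (dy, dx, log(dh), log(dw)) that move each box onto its gt_box."""
    h = box[:, 2] - box[:, 0]
    w = box[:, 3] - box[:, 1]
    cy = box[:, 0] + 0.5 * h
    cx = box[:, 1] + 0.5 * w
    gt_h = gt_box[:, 2] - gt_box[:, 0]
    gt_w = gt_box[:, 3] - gt_box[:, 1]
    gt_cy = gt_box[:, 0] + 0.5 * gt_h
    gt_cx = gt_box[:, 1] + 0.5 * gt_w
    return np.stack([(gt_cy - cy) / h, (gt_cx - cx) / w,
                     np.log(gt_h / h), np.log(gt_w / w)], axis=1)


def apply_box_deltas(boxes, deltas):
    """Inverse of box_refinement: shift the center first, then rescale h and w."""
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    cy = boxes[:, 0] + 0.5 * h + deltas[:, 0] * h
    cx = boxes[:, 1] + 0.5 * w + deltas[:, 1] * w
    h = h * np.exp(deltas[:, 2])
    w = w * np.exp(deltas[:, 3])
    return np.stack([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w], axis=1)


positive_anchors = np.array([[10., 10., 42., 42.], [0., 0., 64., 32.]])
matched_gt = np.array([[12., 8., 50., 40.], [4., 2., 60., 40.]])
deltas = box_refinement(positive_anchors, matched_gt)   # what target_rpn_bbox holds (before std-dev scaling)
refined = apply_box_deltas(positive_anchors, deltas)    # recovers matched_gt
```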



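A quick review of Faster-RCNN:

https://blog.csdn.net/u013832707/article/details/53641055/

A window (proposal) is represented by its center point and its width and height, P = (px, py, pw, ph). Strictly speaking, the actual P is the CNN feature corresponding to this window.

Bounding-box regression: (1) translate first, (2) then scale. Four functions therefore have to be learned: dx(P), dy(P), dw(P), dh(P).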
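Written out in the notation above, the standard R-CNN/Faster R-CNN parameterization is given below. G = (G_x, G_y, G_w, G_h) is the ground-truth box, \hat{G} is the predicted box, and t are the regression targets that d(P) is trained to match:

```latex
\begin{aligned}
\hat{G}_x &= P_w\, d_x(P) + P_x, & \hat{G}_y &= P_h\, d_y(P) + P_y,\\
\hat{G}_w &= P_w \exp\big(d_w(P)\big), & \hat{G}_h &= P_h \exp\big(d_h(P)\big),\\
t_x &= \frac{G_x - P_x}{P_w}, & t_y &= \frac{G_y - P_y}{P_h},\\
t_w &= \log\frac{G_w}{P_w}, & t_h &= \log\frac{G_h}{P_h}.
\end{aligned}
```

The center shift is measured in units of the proposal's width and height, and the scale change in log space. These are the same four quantities as the (dy, dx, log(dh), log(dw)) deltas used in Mask-RCNN.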



Author: 双豆儿_668