
Mask-RCNN source code reading notes


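I read this blog post: https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1

That post walks through several of the ipynb notebooks, but it does not analyse the other Python files (including the coco source).

Over the past few days I studied that source code. There may be mistakes here, and corrections are welcome.

This has been sitting in my drafts folder unpublished...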

######################### separator ###########################################

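Notes on coco.py:

coco provides images, and each image may have (several) annotations; the coco source abbreviates an annotation as ann

Each image has one img_id;

One img_id can have several annotations, abbreviated anns in the code;

Each ann has its category;

Multiple categories are called cats in the source;

So one image, i.e. one img_id, corresponds to several anns and several cats; one image therefore corresponds to masks of several categories;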
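The img_id / anns / cats relationship above can be sketched with plain dictionaries. This is only an illustration, not the coco.py implementation: the tiny `dataset` and the helper `build_index` are hypothetical, and only the field names (`image_id`, `category_id`) follow the COCO JSON layout.

```python
from collections import defaultdict

# Hypothetical miniature of a COCO-style annotation file.
dataset = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 18, "name": "dog"}, {"id": 1, "name": "person"}],
    "annotations": [
        {"id": 100, "image_id": 1, "category_id": 18},
        {"id": 101, "image_id": 1, "category_id": 1},
        {"id": 102, "image_id": 2, "category_id": 1},
    ],
}


def build_index(dataset):
    """Group annotations by image id and map category ids to category records."""
    img_to_anns = defaultdict(list)
    for ann in dataset["annotations"]:
        img_to_anns[ann["image_id"]].append(ann)
    cats = {cat["id"]: cat for cat in dataset["categories"]}
    return img_to_anns, cats


img_to_anns, cats = build_index(dataset)
anns = img_to_anns[1]  # all anns of img_id 1
cat_names = [cats[ann["category_id"]]["name"] for ann in anns]
print(len(anns), cat_names)
```

Image 1 ends up with two anns that belong to two different cats, which is the "one image, several anns, several cats" situation described above.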

bbox (one box per instance mask): [instance_number, (y1, x1, y2, x2)]

anchors: [anchor_count, (y1,x1,y2,x2)]

So the generated result is:

(Screenshot taken from the blog post https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1)

RPN_ANCHOR_SCALES = (32,64,128,256,512)

1. Create model in training mode:

Creating the model is like setting up a skeleton; at this point no actual data has been passed into it.

model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

Analysis of class MaskRCNN():

1.a Inputs

(1) Use keras.layers.Input() to create input_image and input_image_meta, i.e. the skeleton of the input layers

(2) RPN GT

    Use keras.layers.Input() to create input_rpn_match [None, 1] and input_rpn_bbox [None, 4]

(3) Detection GT (class IDs, bounding boxes, and masks)

    Use keras.layers.Input() to create:

    # 1. GT Class IDs (zero padded): input_gt_class_ids [None]

    # 2. GT Boxes in pixels (zero padded)
    # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates: input_gt_boxes

    # Normalize coordinates

    # 3. GT Masks (zero padded), also created with keras.layers.Input()

    # [batch, height, width, MAX_GT_INSTANCES]

1.b Build the shared convolutional layers

The FPN structure finally yields

rpn_feature_maps = [P2, P3, P4, P5, P6]

mrcnn_feature_maps = [P2, P3, P4, P5]   (all have depth 256)

# Generate Anchors 

"""Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """

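Anchors are generated by iterating over each scale in the configured RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)  # Length of square anchor side in pixels

The result is the full set of anchors for every pixel of the feature maps (3 anchors per pixel). They line up one-to-one with the RPN outputs described later, so the indices that ProposalLayer selects from the RPN scores can be used to pick out the corresponding anchors (this becomes clear together with the later sections). That is how the anchors are filtered.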
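The per-scale anchor generation described above can be sketched in NumPy as follows. This is a simplified illustration, not the repository's code. The ratios (0.5, 1, 2) and the feature strides [4, 8, 16, 32, 64] are assumptions chosen to give 3 anchors per pixel, and the feature shapes are the BACKBONE_SHAPES values quoted in the config.py notes.

```python
import numpy as np


def generate_anchors(scale, ratios, shape, feature_stride):
    """Return [N, (y1, x1, y2, x2)] anchors for one pyramid level."""
    ratios = np.asarray(ratios, dtype=np.float64)
    # One (height, width) pair per ratio; each pair has area scale**2.
    heights = scale / np.sqrt(ratios)
    widths = scale * np.sqrt(ratios)
    # Anchor centers in image pixels, one per feature-map cell.
    shifts_y = np.arange(shape[0]) * feature_stride
    shifts_x = np.arange(shape[1]) * feature_stride
    centers_x, centers_y = np.meshgrid(shifts_x, shifts_y)
    # Pair every center with every (height, width).
    cy = np.repeat(centers_y.reshape(-1, 1), len(ratios), axis=1)
    cx = np.repeat(centers_x.reshape(-1, 1), len(ratios), axis=1)
    h = np.tile(heights, (cy.shape[0], 1))
    w = np.tile(widths, (cx.shape[0], 1))
    boxes = np.stack([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2], axis=2)
    return boxes.reshape(-1, 4)


def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides):
    """Concatenate the anchors of all levels, scale[0] first."""
    levels = [generate_anchors(s, ratios, shp, st)
              for s, shp, st in zip(scales, feature_shapes, feature_strides)]
    return np.concatenate(levels, axis=0)


scales = (32, 64, 128, 256, 512)
ratios = (0.5, 1, 2)                       # assumed: 3 anchors per pixel
feature_shapes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
feature_strides = [4, 8, 16, 32, 64]       # assumed strides
anchors = generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides)
print(anchors.shape)
```

Because the anchors come out in a fixed order (level by level, pixel by pixel, ratio by ratio), an index into this array identifies the same anchor as the same index into the flattened RPN outputs.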

1.c RPN Model

rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), 256)

layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))

"""Builds a Keras model of the Region Proposal Network.
    It wraps the RPN graph so it can be used multiple times with shared
    weights.

    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).
    depth: Depth of the backbone feature map.

    Returns a Keras Model object. The model outputs, when called, are:
    rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax)
    rpn_probs: [batch, H, W, 2] Anchor classifier probabilities.   (rpn_probs is the score of each anchor; the 2 stands for foreground and background)
    rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be
                applied to anchors.

    """

depth is the depth of the feature maps from 1.b that are passed in, which is 256 for all of them. In fact the rpn_probs (anchor scores) finally returned have shape [batch, anchors, 2], obtained by reshaping [batch, height, width, anchors per location * 2].

rpn_bbox (bounding box refinement): [batch, H, W, anchors per location, depth]

    # where depth is [x, y, log(w), log(h)], reshaped to [batch, anchors, 4]

This matches the shape of anchors.

Each feature map from 1.b is fed into the RPN model, which gives rpn_logits, rpn_probs and rpn_bbox for that pyramid level. The rpn_logits of all levels are then concatenated, and the same is done for the other two outputs.
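The reshape and the per-level concatenation can be checked with a small NumPy sketch. Random arrays stand in for the RPN convolution outputs, and only two toy pyramid levels are used; these sizes are made up for illustration.

```python
import numpy as np

batch, anchors_per_location = 2, 3
feature_shapes = [(8, 8), (4, 4)]  # two toy levels instead of five

per_level_probs, per_level_bbox = [], []
for h, w in feature_shapes:
    # Stand-ins for the raw RPN outputs on one feature map.
    raw_probs = np.random.rand(batch, h, w, anchors_per_location * 2)
    raw_bbox = np.random.rand(batch, h, w, anchors_per_location * 4)
    # Flatten H, W and anchors-per-location into a single "anchors" axis.
    per_level_probs.append(raw_probs.reshape(batch, -1, 2))
    per_level_bbox.append(raw_bbox.reshape(batch, -1, 4))

# Join the levels along the anchors axis, in the same order as the anchors.
rpn_probs = np.concatenate(per_level_probs, axis=1)
rpn_bbox = np.concatenate(per_level_bbox, axis=1)
print(rpn_probs.shape, rpn_bbox.shape)
```

The anchors axis has (8*8 + 4*4) * 3 = 240 entries, one per anchor, which is why the RPN outputs and the anchor array can share indices.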

1.d Generate proposals
# Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
# and zero padded.

Function call:

rpn_rois = ProposalLayer(proposal_count=proposal_count,
                         nms_threshold=config.RPN_NMS_THRESHOLD,
                         name="ROI",
                         anchors=self.anchors,
                         config=config)([rpn_class, rpn_bbox])

ProposalLayer

"""Receives anchor scores and selects a subset to pass as proposals
    to the second stage. Filtering is done based on anchor scores and
    non-max suppression to remove overlaps. It also applies bounding
    box refinement deltas to anchors.

    Inputs:
        rpn_probs: [batch, anchors, (bg prob, fg prob)]
        rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]

    Returns:
        Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
    """

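anchors are all the anchors from the last 5 stages of the network;

1.d.1 The call function of the ProposalLayer class

(1) call receives inputs, namely the rpn_probs and rpn_bbox above. It takes the foreground probability in rpn_probs as scores, gets the indices of the top-K anchors by score, and uses these indices to select the corresponding scores, deltas and anchors.

(2) # Apply deltas to anchors to get refined anchors. The anchors are adjusted according to the deltas (dy, dx, log(dh), log(dw))

        # Returns the adjusted boxes: [batch, N, (y1, x1, y2, x2)]. These are effectively new anchors; they are called boxes because they are closer to the GT (ground truth)

(3) # Clip to image boundaries. [batch, N, (y1, x1, y2, x2)]  This limits the edges of the boxes to the image boundaries

(4) Filter out small boxes

    4.a Normalize dimensions to range of 0 to 1.   --->normalized_boxes

    4.b Non-max suppression: pass normalized_boxes and scores to NMS to get indices, then use the indices to select the corresponding normalized_boxes, which gives the proposals (rpn_rois).     # Pad if needed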
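Steps (1) to (4) can be sketched for a single image in NumPy. This is a simplified stand-in for ProposalLayer, not its actual TensorFlow code: `propose`, `nms` and the parameter `pre_nms_limit` are hypothetical names, and the greedy NMS here is a plain reference implementation.

```python
import numpy as np


def apply_box_deltas(boxes, deltas):
    """Shift and resize [N, (y1, x1, y2, x2)] by [N, (dy, dx, log(dh), log(dw))]."""
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height + deltas[:, 0] * height
    center_x = boxes[:, 1] + 0.5 * width + deltas[:, 1] * width
    height = height * np.exp(deltas[:, 2])
    width = width * np.exp(deltas[:, 3])
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    return np.stack([y1, x1, y1 + height, x1 + width], axis=1)


def nms(boxes, scores, threshold):
    """Greedy non-max suppression; returns kept indices, best score first."""
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(i)
        y1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        x1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        y2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        x2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
        iou = inter / (area[i] + area[rest] - inter)
        order = rest[iou <= threshold]  # drop boxes that overlap box i too much
    return np.array(keep, dtype=np.int64)


def propose(fg_scores, deltas, anchors, image_shape,
            pre_nms_limit, nms_threshold, proposal_count):
    # (1) keep the top-K anchors by foreground score
    ix = np.argsort(fg_scores)[::-1][:pre_nms_limit]
    scores, deltas, anchors = fg_scores[ix], deltas[ix], anchors[ix]
    # (2) refine the anchors with the predicted deltas
    boxes = apply_box_deltas(anchors, deltas)
    # (3) clip to the image boundaries
    h, w = image_shape
    boxes = np.clip(boxes, 0, [h, w, h, w])
    # (4.a) normalize to the range 0..1
    boxes = boxes / np.array([h, w, h, w], dtype=np.float64)
    # (4.b) NMS, then zero-pad up to proposal_count
    keep = nms(boxes, scores, nms_threshold)[:proposal_count]
    proposals = np.zeros((proposal_count, 4))
    proposals[:len(keep)] = boxes[keep]
    return proposals


anchors = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=np.float64)
deltas = np.zeros((3, 4))
fg_scores = np.array([0.9, 0.8, 0.7])
proposals = propose(fg_scores, deltas, anchors, (32, 32),
                    pre_nms_limit=3, nms_threshold=0.5, proposal_count=4)
print(proposals)
```

In this toy input the second anchor overlaps the first with IoU of about 0.68, so NMS drops it; the two surviving boxes are followed by two rows of zero padding.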

1.e Generate detection targets

# Subsamples proposals and generates target outputs for training
# Note that proposal class IDs, gt_boxes, and gt_masks are zero
# padded. Equally, returned rois and targets are zero padded.

class DetectionTargetLayer(KE.Layer):

"""Subsamples proposals and generates target box refinement, class_ids,
    and masks for each.

    Inputs:
    proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might
               be zero padded if there are not enough proposals.
    gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs.
    gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized
              coordinates.
    gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean type

    Returns: Target ROIs and corresponding class IDs, bounding box shifts,
    and masks.
    rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized
          coordinates
    target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
    target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, NUM_CLASSES,
                    (dy, dx, log(dh), log(dw), class_id)]
                   Class-specific bbox refinements.
    target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width]
                 Masks cropped to bbox boundaries and resized to neural
                 network output size.

    Note: Returned arrays might be zero padded if not enough target ROIs.

    """

# Compute overlaps matrix [proposals, gt_boxes]
    overlaps = overlaps_graph(proposals, gt_boxes)    # IoU of each proposal with every one of the MAX_GT_INSTANCES GT boxes

    tf.reduce_max(overlaps, axis=1)    # the largest IoU of each proposal (the max of each row)

The returned rois etc. contain the positive proposals, the negative proposals, and zero padding.
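The overlaps matrix and the per-row maximum can be reproduced in NumPy. This sketch only mirrors the idea of overlaps_graph. The 0.5 IoU cut-off used to split positive from negative proposals is an assumption for illustration, since the notes do not state the threshold.

```python
import numpy as np


def compute_overlaps(boxes1, boxes2):
    """IoU matrix [len(boxes1), len(boxes2)] for (y1, x1, y2, x2) boxes."""
    b1 = boxes1[:, None, :]  # [N, 1, 4]
    b2 = boxes2[None, :, :]  # [1, M, 4]
    y1 = np.maximum(b1[..., 0], b2[..., 0])
    x1 = np.maximum(b1[..., 1], b2[..., 1])
    y2 = np.minimum(b1[..., 2], b2[..., 2])
    x2 = np.minimum(b1[..., 3], b2[..., 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])
    area2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])
    return inter / (area1 + area2 - inter)


proposals = np.array([[0, 0, 10, 10], [0, 0, 10, 5], [50, 50, 60, 60]], dtype=np.float64)
gt_boxes = np.array([[0, 0, 10, 10], [40, 40, 60, 60]], dtype=np.float64)

overlaps = compute_overlaps(proposals, gt_boxes)  # one row per proposal
roi_iou_max = overlaps.max(axis=1)                # best IoU of each proposal
positive = roi_iou_max >= 0.5                     # assumed threshold
print(roi_iou_max, positive)
```

Each row of `overlaps` belongs to one proposal, so taking the maximum along axis 1 answers "how well does this proposal match its best GT box".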

1.f Network Heads

1.f.1 fpn_classifier_graph()

Performs classification and bbox regression

"""Builds the computation graph of the feature pyramid network classifier
    and regressor heads.


    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_shape: [height, width, depth]
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers


    Returns:
        logits: [N, NUM_CLASSES] classifier logits (before softmax)
        probs: [N, NUM_CLASSES] classifier probabilities
        bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to
                     proposal boxes

    """

1.f.1.1 PyramidROIAlign

"""Implements ROI Pooling on multiple levels of the feature pyramid.

    Params:
    - pool_shape: [height, width] of the output pooled regions. Usually [7, 7]
    - image_shape: [height, width, channels]. Shape of input image in pixels


    Inputs:
    - boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalized
             coordinates. Possibly padded with zeros if not enough
             boxes to fill the array.
    - Feature maps: List of feature maps from different levels of the pyramid.
                    Each is [batch, height, width, channels]


    Output:
    Pooled regions in the shape: [batch, num_boxes, height, width, channels].
    The width and height are those specific in the pool_shape in the layer
    constructor.
    """

How ROIAlign works: http://blog.leanote.com/post/afanti.deng@gmail.com/b5f4f526490b

1.f.2 build_fpn_mask_graph()

"""Builds the computation graph of the mask head of Feature Pyramid Network.

    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_shape: [height, width, depth]
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers


    Returns: Masks [batch, roi_count, height, width, num_classes]
    """
Finally the model is returned.

model.train is then called to train the model (this is the point where the dataset is passed in).


In utils.py:

    np.meshgrid() expands 1-D coordinate arrays into 2-D coordinate matrices https://www.cnblogs.com/sunshinewang/p/6897966.html
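A small example of what np.meshgrid() returns, with made-up coordinate values. The last line shows one way to turn the two grids into a list of (y, x) points, the kind of per-cell center list that anchor generation needs.

```python
import numpy as np

xs = np.array([0, 4, 8])  # x coordinates (columns)
ys = np.array([0, 16])    # y coordinates (rows)

# Both outputs have shape (len(ys), len(xs)).
grid_x, grid_y = np.meshgrid(xs, ys)
print(grid_x)
print(grid_y)

# Flatten both grids to list every (y, x) combination row by row.
points = np.stack([grid_y.ravel(), grid_x.ravel()], axis=1)
print(points)
```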

In config.py:

    BACKBONE_SHAPES holds the feature map shapes: [256 256] [128 128] [64 64] [32 32] [16 16]...
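One way to arrive at these shapes, assuming a 1024x1024 input image and backbone strides of 4, 8, 16, 32 and 64. Both values are assumptions for this sketch, and `backbone_shapes` is a hypothetical helper rather than the config.py code.

```python
import math


def backbone_shapes(image_shape, strides):
    """Feature map (height, width) at each stride, rounded up."""
    return [(math.ceil(image_shape[0] / s), math.ceil(image_shape[1] / s))
            for s in strides]


shapes = backbone_shapes((1024, 1024), [4, 8, 16, 32, 64])
print(shapes)
```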


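    Proposal generation in short: select the top-K anchors by score, refine the anchors that remain, clip the boxes to the image boundaries, normalize, and then run NMS to obtain the final proposals.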


Overall flow:

1.RPN

1.a RPN Targets

The RPN targets are the training values for the RPN. To generate the targets, we start with a grid of anchors that cover the full image at different scales, and then we compute the IoU of the anchors with ground truth object. Positive anchors are those that have an IoU >= 0.7 with any ground truth object, and negative anchors are those that don't cover any object by more than 0.3 IoU. Anchors in between (i.e. cover an object by IoU >= 0.3 but <0.7) are considered neutral and excluded from training.

To train the RPN regressor, we also compute the shift and resizing needed to make the anchor cover the ground truth object completely.

# Generate RPN training targets
# target_rpn_match is 1 for positive anchors, -1 for negative anchors
# and 0 for neutral anchors.
target_rpn_match, target_rpn_bbox = modellib.build_rpn_targets(
    image.shape, model.anchors, gt_class_id, gt_bbox, model.config)

#### The maximum size of target_rpn_bbox is set in config. The number of rows that actually hold data equals the number of positive anchors, and each such row holds (dy, dx, log(dh), log(dw)), computed from a positive anchor and its matching GT bbox.
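The matching rule (IoU >= 0.7 positive, IoU < 0.3 negative, in between neutral) and the (dy, dx, log(dh), log(dw)) targets can be sketched as below. This is not build_rpn_targets itself: `match_anchors` and `box_deltas` are hypothetical helpers, and the sketch leaves out any subsampling or delta scaling that the real function may apply.

```python
import numpy as np


def iou_matrix(boxes1, boxes2):
    """IoU matrix [len(boxes1), len(boxes2)] for (y1, x1, y2, x2) boxes."""
    b1, b2 = boxes1[:, None, :], boxes2[None, :, :]
    y1 = np.maximum(b1[..., 0], b2[..., 0])
    x1 = np.maximum(b1[..., 1], b2[..., 1])
    y2 = np.minimum(b1[..., 2], b2[..., 2])
    x2 = np.minimum(b1[..., 3], b2[..., 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])
    area2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])
    return inter / (area1 + area2 - inter)


def match_anchors(anchors, gt_boxes, pos_iou=0.7, neg_iou=0.3):
    """Return (rpn_match, best_gt): 1 positive, -1 negative, 0 neutral."""
    overlaps = iou_matrix(anchors, gt_boxes)
    best_iou = overlaps.max(axis=1)
    best_gt = overlaps.argmax(axis=1)
    rpn_match = np.zeros(len(anchors), dtype=np.int32)  # neutral by default
    rpn_match[best_iou < neg_iou] = -1
    rpn_match[best_iou >= pos_iou] = 1
    return rpn_match, best_gt


def box_deltas(anchor, gt):
    """(dy, dx, log(dh), log(dw)) that moves the anchor onto the GT box."""
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    g_h, g_w = gt[2] - gt[0], gt[3] - gt[1]
    a_cy, a_cx = anchor[0] + 0.5 * a_h, anchor[1] + 0.5 * a_w
    g_cy, g_cx = gt[0] + 0.5 * g_h, gt[1] + 0.5 * g_w
    return np.array([(g_cy - a_cy) / a_h, (g_cx - a_cx) / a_w,
                     np.log(g_h / a_h), np.log(g_w / a_w)])


anchors = np.array([[0, 0, 10, 10], [0, 0, 10, 20], [30, 30, 40, 40]], dtype=np.float64)
gt_boxes = np.array([[0, 0, 10, 12]], dtype=np.float64)

rpn_match, best_gt = match_anchors(anchors, gt_boxes)
deltas = box_deltas(anchors[0], gt_boxes[best_gt[0]])  # only positives get targets
print(rpn_match, deltas)
```

Here the three anchors have IoU of about 0.83, 0.6 and 0 with the GT box, so they come out as positive, neutral and negative respectively.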

Then utils.apply_box_deltas() is called to fine-tune the positive anchors using dy, dx, log(dh), log(dw)


                    Solid lines are the boxes after refinement, dashed lines are before



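A quick review of Faster-RCNN

https://blog.csdn.net/u013832707/article/details/53641055/

A window (proposal) is described by its center point and its width and height, P = (px, py, pw, ph); strictly speaking, the P used by the regressor is the CNN feature corresponding to this window

Bounding-box regression: (1) translate first; (2) then scale. So four quantities have to be learned: dx(P), dy(P), dw(P), dh(P);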
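A minimal sketch of how the four learned quantities are applied in the usual R-CNN style parameterization: the translation is scaled by the window size, and the scaling is applied through an exponential. The function name `regress_box` and the sample numbers are made up for illustration.

```python
import math


def regress_box(p, d):
    """Apply (dx, dy, dw, dh) to a window P = (px, py, pw, ph) in center form."""
    px, py, pw, ph = p
    dx, dy, dw, dh = d
    # (1) translation: shift the center by a fraction of the window size
    gx = pw * dx + px
    gy = ph * dy + py
    # (2) scaling: resize width and height in log space
    gw = pw * math.exp(dw)
    gh = ph * math.exp(dh)
    return gx, gy, gw, gh


refined = regress_box((50.0, 40.0, 20.0, 10.0), (0.1, -0.2, 0.0, math.log(2.0)))
print(refined)
```

In this example the center moves from (50, 40) to (52, 38), the width stays at 20, and the height doubles from 10 to 20. Note that Mask-RCNN's deltas use the (dy, dx, log(dh), log(dw)) order on corner-form boxes, while this review uses the center form.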


