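I read this blog post: https://blog.csdn.net/u011974639/article/details/78483779?locatiOnNum=9&fps=1
That post walks through several ipynb notebooks, but it does not analyze the other Python files (including the COCO source code).
Over the past few days I read through that source code. These notes may contain mistakes, and corrections from more experienced readers are welcome.
This post sat in my drafts folder for a long time before I published it.
######################### Separator ###########################################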
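Notes on reading coco.py:
COCO provides images, and each image can have one or more annotations; the COCO source code abbreviates an annotation as ann.
Each image has one img_id.
One img_id can have several annotations, which the code abbreviates as anns.
Each ann has a category.
Several categories are called cats in the source.
So one image (that is, one img_id) corresponds to several anns and several cats, and therefore to masks of several categories.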
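The image / anns / cats relationship above can be sketched with plain dictionaries. This is only an illustration: the tiny `dataset` below is made up (it mimics the general layout of a COCO annotation file), and `group_annotations` is a hypothetical helper, not a function from coco.py.

```python
from collections import defaultdict

# Hand-made data in the spirit of a COCO annotation file.
# All ids and names here are invented for illustration.
dataset = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 10, "name": "person"}, {"id": 20, "name": "dog"}],
    "annotations": [
        {"id": 100, "image_id": 1, "category_id": 10},
        {"id": 101, "image_id": 1, "category_id": 20},
        {"id": 102, "image_id": 2, "category_id": 10},
    ],
}


def group_annotations(dataset):
    """Map each image id to its anns and to the set of category ids it contains."""
    img_to_anns = defaultdict(list)
    img_to_cats = defaultdict(set)
    for ann in dataset["annotations"]:
        img_to_anns[ann["image_id"]].append(ann)
        img_to_cats[ann["image_id"]].add(ann["category_id"])
    return img_to_anns, img_to_cats


img_to_anns, img_to_cats = group_annotations(dataset)
# Image 1 carries two anns belonging to two different cats.
print(len(img_to_anns[1]), sorted(img_to_cats[1]))
```

Each ann would also carry its own segmentation, which is why one image ends up with one mask per instance.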
bbox (one box per instance mask): [instance_number, (y1, x1, y2, x2)]
anchors: [anchor_count, (y1,x1,y2,x2)]
So the generated result is:
(screenshot taken from the blog post https://blog.csdn.net/u011974639/article/details/78483779?locatiOnNum=9&fps=1)
RPN_ANCHOR_SCALES = (32,64,128,256,512)
1. Create model in training mode: creating the model only builds a skeleton; no real data is passed into it yet.
model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)
Analysis of class MaskRCNN():
(1) keras.layers.Input() is used to create input_image and input_image_meta, the skeleton of the input layers.
(2) RPN GT
keras.layers.Input() is used to create input_rpn_match [None, 1] and input_rpn_bbox [None, 4].
(3) Detection GT (class IDs, bounding boxes, and masks)
keras.layers.Input() is used to create:
# 1. GT Class IDs (zero padded) : input_gt_class_ids [None],
# 2. GT Boxes in pixels (zero padded)
# [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates : input_gt_boxes
# Normalize coordinates
# 3. GT Masks (zero padded), also created with keras.layers.Input()
# [batch, height, width, MAX_GT_INSTANCES]
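The FPN structure finally produces:
rpn_feature_maps = [P2, P3, P4, P5, P6]
mrcnn_feature_maps = [P2, P3, P4, P5]
All of these feature maps have depth 256.
The anchors are generated by iterating over every scale in the configured RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)  # Length of square anchor side in pixels
The result is the full set of anchors for every pixel of the feature maps (3 anchors per pixel, one per aspect ratio). This set lines up one-to-one with the RPN outputs, so the indices that ProposalLayer selects from the RPN scores can be used to pick the matching anchors (this becomes clear together with the ProposalLayer notes further down). That is how the anchors get filtered.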
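A minimal NumPy sketch of this anchor generation follows. It is not the repository's code: `generate_anchors` and `generate_pyramid_anchors` are written here for illustration, and the aspect ratios (0.5, 1, 2) and strides (4, 8, 16, 32, 64) are assumptions that do not appear in these notes. The feature map shapes are the BACKBONE_SHAPES listed in the config.py notes.

```python
import numpy as np


def generate_anchors(scale, ratios, feature_shape, feature_stride):
    """Anchors (y1, x1, y2, x2) in image pixels for one pyramid level."""
    ratios = np.asarray(ratios, dtype=np.float64)
    heights = scale / np.sqrt(ratios)  # one height per ratio
    widths = scale * np.sqrt(ratios)   # one width per ratio

    # Anchor centers: one per feature map pixel, mapped back to image pixels.
    ys = np.arange(feature_shape[0]) * feature_stride
    xs = np.arange(feature_shape[1]) * feature_stride
    cy, cx = np.meshgrid(ys, xs, indexing="ij")
    cy = cy.reshape(-1, 1)  # [H*W, 1]
    cx = cx.reshape(-1, 1)

    # Broadcasting gives [H*W, num_ratios, 4]; flatten to [anchors, 4].
    boxes = np.stack(
        [cy - heights / 2, cx - widths / 2, cy + heights / 2, cx + widths / 2],
        axis=-1,
    )
    return boxes.reshape(-1, 4)


def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides):
    """Concatenate the anchors of all levels, one scale per level."""
    per_level = [
        generate_anchors(s, ratios, shape, stride)
        for s, shape, stride in zip(scales, feature_shapes, feature_strides)
    ]
    return np.concatenate(per_level, axis=0)


anchors = generate_pyramid_anchors(
    scales=(32, 64, 128, 256, 512),
    ratios=(0.5, 1, 2),                  # assumed values
    feature_shapes=[(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)],
    feature_strides=(4, 8, 16, 32, 64),  # assumed values
)
print(anchors.shape)
```

Because the anchors are laid out level by level and pixel by pixel, an index into this array refers to the same anchor as the same index into the concatenated RPN outputs.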
rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), 256)
layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))
"""Builds a Keras model of the Region Proposal Network.
It wraps the RPN graph so it can be used multiple times with shared
weights.
anchors_per_location: number of anchors per pixel in the feature map
anchor_stride: Controls the density of anchors. Typically 1 (anchors for
every pixel in the feature map), or 2 (every other pixel).
depth: Depth of the backbone feature map.
Returns a Keras Model object. The model outputs, when called, are:
rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax)
rpn_probs: [batch, H, W, 2] Anchor classifier probabilities. rpn_probs is the score of each anchor (the 2 channels are background and foreground).
rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be
applied to anchors.
"""
depth is the depth of the feature maps obtained in step 1.b, which is 256 for all of them. The rpn_probs (anchor scores) that are finally returned actually have shape [batch, anchors, 2], obtained by reshaping [batch, height, width, anchors per location * 2].
rpn_bbox (bounding box refinement): [batch, H, W, anchors per location * depth]
# where depth is [x, y, log(w), log(h)], reshaped to [batch, anchors, 4]
This matches the shape of the anchors.
Each feature map from step 1.b is fed into the RPN model, giving rpn_logits, rpn_probs and rpn_bbox for that level. The rpn_logits of all levels are then concatenated, and the same is done for the other two outputs.
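The reshape and the per-level concatenation can be checked with a small NumPy sketch. `rpn_head_output` is a hypothetical stand-in that returns random numbers in place of the real RPN head, and the level sizes are toy values; only the shapes matter here.

```python
import numpy as np

batch = 2
anchors_per_location = 3
level_shapes = [(8, 8), (4, 4), (2, 2)]  # toy feature map sizes


def rpn_head_output(h, w):
    """Stand-in for the RPN head: random values with the documented shapes."""
    probs = np.random.rand(batch, h, w, anchors_per_location * 2)
    bbox = np.random.rand(batch, h, w, anchors_per_location * 4)
    # Fold H, W and anchors-per-location into a single "anchors" axis.
    return probs.reshape(batch, -1, 2), bbox.reshape(batch, -1, 4)


outputs = [rpn_head_output(h, w) for h, w in level_shapes]

# Concatenate the matching outputs of all levels along the anchors axis.
rpn_probs = np.concatenate([o[0] for o in outputs], axis=1)
rpn_bbox = np.concatenate([o[1] for o in outputs], axis=1)
print(rpn_probs.shape, rpn_bbox.shape)
```

The anchors axis has length 3 * (64 + 16 + 4) = 252 here, which is the same count an anchor generator would produce for these three levels.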
Function:
rpn_rois = ProposalLayer(proposal_count=proposal_count,
                         nms_threshold=config.RPN_NMS_THRESHOLD,
                         name="ROI",
                         anchors=self.anchors,
                         config=config)([rpn_class, rpn_bbox])
"""Receives anchor scores and selects a subset to pass as proposals
to the second stage. Filtering is done based on anchor scores and
non-max suppression to remove overlaps. It also applies bounding
box refinement deltas to anchors.
Inputs:
rpn_probs: [batch, anchors, (bg prob, fg prob)]
rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
Returns:
Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
"""
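anchors is the full set of anchors from the last five stages of the network (the five feature maps fed to the RPN).
(1) call() receives inputs, i.e. the rpn_probs and rpn_bbox above. It takes the foreground probability in rpn_probs as the score, finds the indices of the K highest-scoring anchors, and uses those indices to gather the corresponding scores, deltas and anchors.
(2) # Apply deltas to anchors to get refined anchors. The anchors are adjusted according to the deltas (dy, dx, log(dh), log(dw)).
# Returns the adjusted boxes: [batch, N, (y1, x1, y2, x2)]. These are effectively new anchors; they are called boxes because they are closer to the GT (ground truth).
(3) # Clip to image boundaries. [batch, N, (y1, x1, y2, x2)]. This restricts the boxes to the image boundaries.
(4) Filter out small boxes
4.a Normalize dimensions to range of 0 to 1. ---> normalized_boxes
4.b Non-max suppression: normalized_boxes and scores are passed to NMS, which returns indices; those indices select the corresponding normalized_boxes, which become the proposals (rpn_rois). # Pad if needed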
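Steps (1) to (4) can be sketched for a single image in NumPy. This is an illustration, not the layer's TensorFlow code: `propose`, `nms` and `pre_nms_limit` are names chosen here, the NMS is a plain greedy version, and any scaling of the deltas done by the real layer is left out.

```python
import numpy as np


def apply_box_deltas(boxes, deltas):
    """Shift and resize (y1, x1, y2, x2) boxes by (dy, dx, log(dh), log(dw))."""
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height + deltas[:, 0] * height
    center_x = boxes[:, 1] + 0.5 * width + deltas[:, 1] * width
    height = height * np.exp(deltas[:, 2])
    width = width * np.exp(deltas[:, 3])
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    return np.stack([y1, x1, y1 + height, x1 + width], axis=1)


def nms(boxes, scores, threshold):
    """Greedy non-max suppression; returns kept indices, best score first."""
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(i)
        y1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        x1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        y2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        x2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
        iou = inter / (area[i] + area[rest] - inter)
        order = rest[iou <= threshold]  # drop boxes overlapping box i too much
    return np.array(keep, dtype=np.int64)


def propose(rpn_probs, rpn_bbox, anchors, image_shape,
            pre_nms_limit, proposal_count, nms_threshold):
    scores = rpn_probs[:, 1]                         # (1) foreground score
    top = np.argsort(scores)[::-1][:pre_nms_limit]   #     top-K indices
    scores, deltas, anchors = scores[top], rpn_bbox[top], anchors[top]
    boxes = apply_box_deltas(anchors, deltas)        # (2) refine anchors
    h, w = image_shape
    boxes = np.clip(boxes, 0, [h, w, h, w])          # (3) clip to the image
    boxes = boxes / np.array([h, w, h, w], dtype=np.float64)  # 4.a normalize
    keep = nms(boxes, scores, nms_threshold)[:proposal_count]  # 4.b NMS
    proposals = boxes[keep]
    pad = proposal_count - proposals.shape[0]        # zero pad if needed
    return np.pad(proposals, [(0, pad), (0, 0)], mode="constant")


anchors = np.array([[0, 0, 10, 10], [1, 1, 11, 11],
                    [50, 50, 60, 60], [90, 90, 120, 120]], dtype=np.float64)
rpn_probs = np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.9, 0.1]])
rpn_bbox = np.zeros((4, 4))  # zero deltas leave the anchors unchanged
proposals = propose(rpn_probs, rpn_bbox, anchors, (100, 100),
                    pre_nms_limit=3, proposal_count=4, nms_threshold=0.5)
print(proposals)
```

In this toy run the second anchor overlaps the first too strongly and is suppressed, so two proposals survive and the remaining two rows are zero padding.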
# Subsamples proposals and generates target outputs for training
# Note that proposal class IDs, gt_boxes, and gt_masks are zero
# padded. Equally, returned rois and targets are zero padded.
class DetectionTargetLayer(KE.Layer):
"""Subsamples proposals and generates target box refinement, class_ids,
and masks for each.
Inputs:
proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might
be zero padded if there are not enough proposals.
gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs.
gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized
coordinates.
gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean type
Returns: Target ROIs and corresponding class IDs, bounding box shifts,
and masks.
rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized
coordinates
target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, NUM_CLASSES,
(dy, dx, log(dh), log(dw), class_id)]
Class-specific bbox refinements.
target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width]
Masks cropped to bbox boundaries and resized to neural
network output size.
Note: Returned arrays might be zero padded if not enough target ROIs.
"""
# Compute overlaps matrix [proposals, gt_boxes]
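overlaps = overlaps_graph(proposals, gt_boxes)  # IoU of every proposal with each of the MAX_GT_INSTANCES GT boxes
tf.reduce_max(overlaps, axis=1)  # the highest IoU of each proposal (the maximum of each row)
The returned rois (and the other outputs) contain positive proposals, negative proposals and zero padding.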
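The overlaps matrix and the row-wise maximum can be reproduced in NumPy as a rough sketch. `compute_overlaps` is written here for illustration (it is not `overlaps_graph`), and the 0.5 threshold used to split positive from negative proposals is an assumption that these notes do not state.

```python
import numpy as np


def compute_overlaps(boxes1, boxes2):
    """IoU matrix [len(boxes1), len(boxes2)] for (y1, x1, y2, x2) boxes."""
    b1 = boxes1[:, None, :]  # [N, 1, 4]
    b2 = boxes2[None, :, :]  # [1, M, 4]
    y1 = np.maximum(b1[..., 0], b2[..., 0])
    x1 = np.maximum(b1[..., 1], b2[..., 1])
    y2 = np.minimum(b1[..., 2], b2[..., 2])
    x2 = np.minimum(b1[..., 3], b2[..., 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])
    area2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])
    return inter / (area1 + area2 - inter)


proposals = np.array([[0.0, 0.0, 0.5, 0.5],
                      [0.5, 0.5, 1.0, 1.0],
                      [0.0, 0.4, 0.5, 0.9]])
gt_boxes = np.array([[0.0, 0.0, 0.5, 0.5],
                     [0.6, 0.6, 1.0, 1.0]])

overlaps = compute_overlaps(proposals, gt_boxes)  # [proposals, gt_boxes]
roi_iou_max = overlaps.max(axis=1)                # best IoU per proposal

# Assumed split: IoU >= 0.5 is positive, everything else is negative.
positive = np.where(roi_iou_max >= 0.5)[0]
negative = np.where(roi_iou_max < 0.5)[0]
print(positive, negative)
```

The argmax of each row (not shown) would give the GT box a positive proposal is matched to, which is where its target class id, deltas and mask would come from.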
"""Builds the computation graph of the feature pyramid network classifier
and regressor heads.
rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
coordinates.
feature_maps: List of feature maps from different layers of the pyramid,
[P2, P3, P4, P5]. Each has a different resolution.
image_shape: [height, width, depth]
pool_size: The width of the square feature map generated from ROI Pooling.
num_classes: number of classes, which determines the depth of the results
train_bn: Boolean. Train or freeze Batch Norm layers
Returns:
logits: [N, NUM_CLASSES] classifier logits (before softmax)
probs: [N, NUM_CLASSES] classifier probabilities
bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to
proposal boxes
"""
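model.train is called to train the model (this is where the dataset is passed in).
In utils.py:
np.meshgrid() turns one-dimensional arrays into two-dimensional coordinate matrices https://www.cnblogs.com/sunshinewang/p/6897966.html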
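A short example of what np.meshgrid() returns, using made-up coordinate values:

```python
import numpy as np

xs = np.array([0, 1, 2])  # 3 x-coordinates
ys = np.array([10, 20])   # 2 y-coordinates

# With the default indexing, both outputs have shape (len(ys), len(xs)).
grid_x, grid_y = np.meshgrid(xs, ys)

print(grid_x)  # each row repeats xs
print(grid_y)  # each column repeats ys
```

Pairing grid_x[i, j] with grid_y[i, j] enumerates every (x, y) combination, which is how a grid of anchor centers can be built from two 1-D ranges.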
In config.py:
BACKBONE_SHAPES are the feature map shapes: [256 256] [128 128] [64 64] [32 32] [16 16]...
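These shapes are consistent with dividing the image size by one stride per pyramid level. The sketch below assumes a 1024 x 1024 input and strides of 4, 8, 16, 32 and 64; neither value is stated in these notes, and `compute_backbone_shapes` is a helper written here for illustration.

```python
import math


def compute_backbone_shapes(image_shape, strides):
    """Feature map (height, width) for each stride, rounding up."""
    return [
        (math.ceil(image_shape[0] / s), math.ceil(image_shape[1] / s))
        for s in strides
    ]


# Assumed input size and strides.
shapes = compute_backbone_shapes((1024, 1024), [4, 8, 16, 32, 64])
print(shapes)
```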
Select the top K anchors by score, refine the anchors that remain, clip the boxes to the image boundaries, normalize, and then run NMS to get the final proposals.
The RPN targets are the training values for the RPN. To generate the targets, we start with a grid of anchors that cover the full image at different scales, and then we compute the IoU of the anchors with ground truth object. Positive anchors are those that have an IoU >= 0.7 with any ground truth object, and negative anchors are those that don't cover any object by more than 0.3 IoU. Anchors in between (i.e. cover an object by IoU >= 0.3 but <0.7) are considered neutral and excluded from training.
To train the RPN regressor, we also compute the shift and resizing needed to make the anchor cover the ground truth object completely.
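The thresholding rule in the paragraph above can be written as a small function. This sketch takes a precomputed IoU table as input; the 1 / -1 / 0 label encoding and the name `label_anchors` are assumptions made here, and any extra matching rules the real target builder applies are left out.

```python
def label_anchors(overlaps, pos_threshold=0.7, neg_threshold=0.3):
    """Label each anchor from its IoUs with all ground truth objects.

    overlaps: one row per anchor, one IoU value per ground truth object.
    Returns 1 for positive, -1 for negative, 0 for neutral anchors.
    """
    labels = []
    for row in overlaps:
        best = max(row)
        if best >= pos_threshold:
            labels.append(1)    # covers some object well enough
        elif best < neg_threshold:
            labels.append(-1)   # covers no object by 0.3 IoU or more
        else:
            labels.append(0)    # in between: excluded from training
    return labels


# Made-up IoU values for 4 anchors and 2 ground truth objects.
overlaps = [[0.8, 0.1], [0.5, 0.2], [0.1, 0.05], [0.3, 0.0]]
labels = label_anchors(overlaps)
print(labels)
```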
# Generate RPN training targets
target_rpn_match, target_rpn_bbox = modellib.build_rpn_targets(
    image.shape, model.anchors, gt_class_id, gt_bbox, model.config)
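#### The maximum size of target_rpn_bbox is set in config. The number of rows that actually hold data equals the number of positive_anchors, and each row stores (dy, dx, log(dh), log(dw)), computed from a positive anchor and its matching GT bbox.
utils.apply_box_deltas() is then called to refine the positive_anchors with these (dy, dx, log(dh), log(dw)) values.
In the figure, solid lines are the boxes after refinement and dashed lines are the boxes before refinement.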
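How (dy, dx, log(dh), log(dw)) is computed from a positive anchor and its GT box, and how applying it moves the anchor back onto the GT box, can be sketched in plain Python. `box_to_delta` and `apply_delta` are names chosen here, and any normalization of the deltas done by the real code is omitted.

```python
import math


def box_to_delta(anchor, gt):
    """(dy, dx, log(dh), log(dw)) that moves anchor onto gt; boxes are (y1, x1, y2, x2)."""
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    a_cy, a_cx = anchor[0] + 0.5 * a_h, anchor[1] + 0.5 * a_w
    g_h, g_w = gt[2] - gt[0], gt[3] - gt[1]
    g_cy, g_cx = gt[0] + 0.5 * g_h, gt[1] + 0.5 * g_w
    return (
        (g_cy - a_cy) / a_h,   # center shift, relative to anchor height
        (g_cx - a_cx) / a_w,   # center shift, relative to anchor width
        math.log(g_h / a_h),   # log of the height ratio
        math.log(g_w / a_w),   # log of the width ratio
    )


def apply_delta(anchor, delta):
    """Inverse operation: refine an anchor with a delta."""
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cy = anchor[0] + 0.5 * a_h + delta[0] * a_h
    cx = anchor[1] + 0.5 * a_w + delta[1] * a_w
    h = a_h * math.exp(delta[2])
    w = a_w * math.exp(delta[3])
    return (cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w)


anchor = (0.0, 0.0, 10.0, 10.0)   # made-up positive anchor
gt_box = (2.0, 2.0, 14.0, 12.0)   # made-up ground truth box
delta = box_to_delta(anchor, gt_box)
refined = apply_delta(anchor, delta)
print(delta, refined)
```

With the exact target delta the refined anchor coincides with the GT box; during training the RPN only predicts an approximation of this delta.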
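I also reviewed Faster R-CNN:
https://blog.csdn.net/u013832707/article/details/53641055/
A window (proposal) is described by its center point plus its width and height, P = (px, py, pw, ph); the P that actually goes into the regressor is the CNN feature of that window.
Bounding box regression: (1) first translate; (2) then scale. So four quantities have to be learned: dx(P), dy(P), dw(P), dh(P).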
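Written out, the translation and scaling take the usual R-CNN form shown below. The symbols G (ground truth box), \hat{G} (predicted box) and t (regression targets) do not appear in these notes and are introduced here as assumptions for the sake of the formulas.

```latex
\begin{aligned}
\hat{G}_x &= P_w\, d_x(P) + P_x, & \hat{G}_y &= P_h\, d_y(P) + P_y, \\
\hat{G}_w &= P_w \exp\big(d_w(P)\big), & \hat{G}_h &= P_h \exp\big(d_h(P)\big), \\
t_x &= (G_x - P_x) / P_w, & t_y &= (G_y - P_y) / P_h, \\
t_w &= \log(G_w / P_w), & t_h &= \log(G_h / P_h).
\end{aligned}
```

The first two rows are the translation and the scaling; the last two rows are the values that dx(P), dy(P), dw(P) and dh(P) are trained to match. They have the same form as the (dy, dx, log(dh), log(dw)) deltas used throughout the Mask R-CNN code, with the y and x order swapped.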