Overview of the High Efficiency Video Coding (HEVC) Standard, Part 4
H. Interpicture Prediction
1) PB Partitioning:
Compared to intrapicture-predicted CBs, HEVC supports more PB partition shapes for interpicture-predicted CBs. The partitioning modes PART_2N×2N, PART_2N×N, and PART_N×2N indicate, respectively, that the CB is not split, is split horizontally into two equal-size PBs, and is split vertically into two equal-size PBs. PART_N×N specifies that the CB is split into four equal-size PBs, but this mode is only supported when the CB size is equal to the smallest allowed CB size. In addition, there are four partitioning types that split the CB into two PBs of different sizes: PART_2N×nU, PART_2N×nD, PART_nL×2N, and PART_nR×2N. These types are known as asymmetric motion partitions.
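A small sketch can make the eight partitioning modes concrete. The function below returns the PB rectangles `(x, y, w, h)` for a square CB of side `size` (= 2N); the function name, the ASCII mode strings, and the `min_cb_size` parameter are illustrative assumptions. The asymmetric modes split the CB at one quarter of its size, so PART_2N×nU on a 32×32 CB yields a 32×8 PB above a 32×24 PB.

```python
def pb_rects(mode, size, min_cb_size=8):
    """Return PB rectangles (x, y, w, h) for a square CB of side `size` (= 2N).

    Hypothetical helper: `mode` uses ASCII names such as "PART_2NxnU", and
    `min_cb_size` stands in for the smallest allowed CB size.
    """
    if mode == "PART_NxN" and size != min_cb_size:
        raise ValueError("PART_NxN is only allowed for the smallest CB size")
    n, q = size // 2, size // 4
    layouts = {
        "PART_2Nx2N": [(0, 0, size, size)],                   # no split
        "PART_2NxN": [(0, 0, size, n), (0, n, size, n)],      # horizontal split
        "PART_Nx2N": [(0, 0, n, size), (n, 0, n, size)],      # vertical split
        "PART_NxN": [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        # Asymmetric motion partitions: split at 1/4 or 3/4 of the CB size.
        "PART_2NxnU": [(0, 0, size, q), (0, q, size, size - q)],
        "PART_2NxnD": [(0, 0, size, size - q), (0, size - q, size, q)],
        "PART_nLx2N": [(0, 0, q, size), (q, 0, size - q, size)],
        "PART_nRx2N": [(0, 0, size - q, size), (size - q, 0, q, size)],
    }
    return layouts[mode]


print(pb_rects("PART_2NxnU", 32))  # [(0, 0, 32, 8), (0, 8, 32, 24)]
```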
Fig. 7. Integer and fractional sample positions for luma interpolation.
2) Fractional Sample Interpolation:
The samples of the PB for an interpicture-predicted CB are obtained from those of the corresponding block region in the reference picture identified by a reference picture index, at a position displaced by the horizontal and vertical components of the motion vector.
Except when the motion vector has an integer value, fractional sample interpolation is used to generate the prediction samples for noninteger sampling positions. As in H.264/MPEG-4 AVC, HEVC supports motion vectors in units of one quarter of the distance between luma samples.
For chroma samples, the motion vector accuracy is determined by the chroma sampling format; for 4:2:0 sampling, this results in units of one eighth of the distance between chroma samples.
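The quarter-sample luma units and eighth-sample 4:2:0 chroma units can be illustrated by splitting one motion vector component into an integer offset and a fractional phase. This is a minimal sketch with hypothetical function names; it relies on Python's `>>` and `&` acting as an arithmetic shift and a two's-complement mask for negative values. A luma component of 9 means 2¼ luma samples, which for 4:2:0 is 1⅛ chroma samples, so the same value is read in both units.

```python
def split_luma_mv(mv):
    """Split a quarter-sample luma MV component into (integer part, phase 0..3)."""
    return mv >> 2, mv & 3


def split_chroma_mv_420(mv):
    """For 4:2:0, read the same value in 1/8 chroma-sample units (phase 0..7)."""
    return mv >> 3, mv & 7


print(split_luma_mv(9), split_chroma_mv_420(9))    # (2, 1) (1, 1)
print(split_luma_mv(-5), split_chroma_mv_420(-5))  # (-2, 3) (-1, 3)
```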
The fractional sample interpolation for luma samples in HEVC uses separable application of an eight-tap filter for the half-sample positions and a seven-tap filter for the quarter-sample positions. This is in contrast to the process used in H.264/MPEG-4 AVC, which applies a two-stage interpolation process: it first generates the values of one or two neighboring samples at half-sample positions using six-tap filtering and rounds the intermediate results, and then averages two values at integer or half-sample positions. HEVC instead uses a single, consistent separable interpolation process for generating all fractional positions without intermediate rounding operations, which improves precision and simplifies the architecture of the fractional sample interpolation. The interpolation precision is also improved in HEVC by using longer filters, i.e., seven-tap or eight-tap filtering rather than the six-tap filtering used in H.264/MPEG-4 AVC. Using only seven taps, rather than the eight used for half-sample positions, was sufficient for the quarter-sample positions: since these positions are relatively close to integer sample positions, the most distant sample in an eight-tap interpolator would effectively be farther away than in the half-sample case (where the relative distances of the integer-sample positions are symmetric). The actual filter tap values of the interpolation filtering kernel were partially derived from DCT basis function equations.
In Fig. 7, the positions labeled with upper-case letters, $A_{i,j}$, represent the available luma samples at integer sample locations, whereas the other positions, labeled with lower-case letters, represent samples at noninteger sample locations, which need to be generated by interpolation.
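The samples labeled $a_{0,j}$, $b_{0,j}$, $c_{0,j}$, $d_{0,0}$, $h_{0,0}$, and $n_{0,0}$ are derived from the samples $A_{i,j}$ by applying the eight-tap filter for half-sample positions and the seven-tap filter for quarter-sample positions as follows:
$$a_{0,j} = \Big( \sum_{i=-3}^{3} A_{i,j}\,\mathrm{qfilter}[i] \Big) \gg (B-8)$$
$$b_{0,j} = \Big( \sum_{i=-3}^{4} A_{i,j}\,\mathrm{hfilter}[i] \Big) \gg (B-8)$$
$$c_{0,j} = \Big( \sum_{i=-2}^{4} A_{i,j}\,\mathrm{qfilter}[1-i] \Big) \gg (B-8)$$
$$d_{0,0} = \Big( \sum_{j=-3}^{3} A_{0,j}\,\mathrm{qfilter}[j] \Big) \gg (B-8)$$
$$h_{0,0} = \Big( \sum_{j=-3}^{4} A_{0,j}\,\mathrm{hfilter}[j] \Big) \gg (B-8)$$
$$n_{0,0} = \Big( \sum_{j=-2}^{4} A_{0,j}\,\mathrm{qfilter}[1-j] \Big) \gg (B-8)$$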
where the constant B ≥ 8 is the bit depth of the reference samples (typically B = 8 for most applications), and the filter coefficient values are given in Table II. In these formulas, >> denotes an arithmetic right shift operation. The taps of each filter sum to 64, so for B = 8 these first-stage values are 64 times the sample scale.
TABLE II Filter Coefficients for Luma Fractional Sample Interpolation
Index i       -3   -2   -1    0    1    2    3    4
hfilter[i]    -1    4  -11   40   40  -11    4   -1
qfilter[i]    -1    4  -10   58   17   -5    1
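The horizontal equations for $a_{0,j}$, $b_{0,j}$, and $c_{0,j}$ can be checked with a short sketch. It uses the HEVC luma filter values hfilter = (−1, 4, −11, 40, 40, −11, 4, −1) for i = −3, …, 4 and qfilter = (−1, 4, −10, 58, 17, −5, 1) for i = −3, …, 3. The helper name and its row/index arguments are assumptions for illustration.

```python
# HEVC luma interpolation taps, keyed by tap index i (Table II).
HFILTER = {-3: -1, -2: 4, -1: -11, 0: 40, 1: 40, 2: -11, 3: 4, 4: -1}
QFILTER = {-3: -1, -2: 4, -1: -10, 0: 58, 1: 17, 2: -5, 3: 1}


def luma_first_stage(row, x, bit_depth=8):
    """Return (a, b, c) between row[x] and row[x + 1] (hypothetical helper).

    row: integer luma samples along one line; x needs 3 samples to its left
    and 4 to its right. The shift B - 8 follows the equations for a, b and c.
    """
    shift = bit_depth - 8
    a = sum(row[x + i] * QFILTER[i] for i in range(-3, 4)) >> shift
    b = sum(row[x + i] * HFILTER[i] for i in range(-3, 5)) >> shift
    c = sum(row[x + i] * QFILTER[1 - i] for i in range(-2, 5)) >> shift
    return a, b, c


print(luma_first_stage([100] * 10, 4))                      # (6400, 6400, 6400)
print(luma_first_stage([10 * k for k in range(10)], 4)[1])  # 2880
```

On a linear ramp with step 10, b at position 4.5 comes out as 2880 = 64 × 45, the exact midpoint value at the 64× scale.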
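The samples labeled $e_{0,0}$, $f_{0,0}$, $g_{0,0}$, $i_{0,0}$, $j_{0,0}$, $k_{0,0}$, $p_{0,0}$, $q_{0,0}$, and $r_{0,0}$ can be derived by applying the corresponding filters to the samples at the vertically adjacent $a_{0,j}$, $b_{0,j}$, and $c_{0,j}$ positions as follows:
$$e_{0,0} = \Big( \sum_{v=-3}^{3} a_{0,v}\,\mathrm{qfilter}[v] \Big) \gg 6$$
$$i_{0,0} = \Big( \sum_{v=-3}^{4} a_{0,v}\,\mathrm{hfilter}[v] \Big) \gg 6$$
$$p_{0,0} = \Big( \sum_{v=-2}^{4} a_{0,v}\,\mathrm{qfilter}[1-v] \Big) \gg 6$$
$$f_{0,0} = \Big( \sum_{v=-3}^{3} b_{0,v}\,\mathrm{qfilter}[v] \Big) \gg 6$$
$$j_{0,0} = \Big( \sum_{v=-3}^{4} b_{0,v}\,\mathrm{hfilter}[v] \Big) \gg 6$$
$$q_{0,0} = \Big( \sum_{v=-2}^{4} b_{0,v}\,\mathrm{qfilter}[1-v] \Big) \gg 6$$
$$g_{0,0} = \Big( \sum_{v=-3}^{3} c_{0,v}\,\mathrm{qfilter}[v] \Big) \gg 6$$
$$k_{0,0} = \Big( \sum_{v=-3}^{4} c_{0,v}\,\mathrm{hfilter}[v] \Big) \gg 6$$
$$r_{0,0} = \Big( \sum_{v=-2}^{4} c_{0,v}\,\mathrm{qfilter}[1-v] \Big) \gg 6$$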
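Combining both stages, a fractional luma sample at quarter-sample phase (fx, fy) relative to $A_{0,0}$ can be sketched as below: horizontal-only phases use the $a$/$b$/$c$ equations, vertical-only phases use $d$/$h$/$n$, and the remaining nine phases filter first-stage values vertically and shift by 6. `interp_luma` and the `ref[y][x]` layout are hypothetical; the result keeps a gain of 2^(14 − B), and picture-boundary padding is not handled.

```python
QTAPS = [-1, 4, -10, 58, 17, -5, 1]        # qfilter[i], i = -3..3
HTAPS = [-1, 4, -11, 40, 40, -11, 4, -1]   # hfilter[i], i = -3..4
# Quarter-sample phase -> list of (offset, coefficient); phase 3 uses qfilter[1 - i].
TAPS = {
    1: list(zip(range(-3, 4), QTAPS)),
    2: list(zip(range(-3, 5), HTAPS)),
    3: list(zip(range(-2, 5), QTAPS[::-1])),
}


def interp_luma(ref, x, y, fx, fy, bit_depth=8):
    """Fractional luma sample at phase (fx, fy), each 0..3, relative to ref[y][x].

    Returns the value at a gain of 2**(14 - bit_depth); no boundary padding.
    """
    if fx == 0 and fy == 0:
        raise ValueError("integer position: no interpolation needed")
    shift1 = bit_depth - 8

    def horizontal(row):              # a_{0,j}, b_{0,j}, c_{0,j}
        return sum(c * ref[row][x + o] for o, c in TAPS[fx]) >> shift1

    if fy == 0:
        return horizontal(y)
    if fx == 0:                       # d_{0,0}, h_{0,0}, n_{0,0}
        return sum(c * ref[y + o][x] for o, c in TAPS[fy]) >> shift1
    # e_{0,0} .. r_{0,0}: vertical filter over first-stage values, then >> 6
    return sum(c * horizontal(y + o) for o, c in TAPS[fy]) >> 6


ramp = [[10 * col for col in range(12)] for _ in range(12)]
print(interp_luma(ramp, 5, 5, 2, 2))  # 3520, i.e. 64 x 55 (the value at x = 5.5)
```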
The interpolation filtering is separable when B is equal to 8 (the first-stage shift B − 8 is then zero, so no intermediate rounding occurs), so in this case the same values could be computed by applying the vertical filtering before the horizontal filtering. When implemented appropriately, the motion compensation process of HEVC can be performed using only 16-bit storage elements (although care must be taken to do this correctly).
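The separability claim for B = 8 and the 16-bit remark can both be probed numerically. The sketch below (hypothetical helper names) evaluates the nine two-dimensional phases in both filter orders on random 8-bit data, and computes the worst-case range of a first-stage value for B-bit input. It only covers the first stage; the later stages are not analyzed here.

```python
import random

QTAPS = [-1, 4, -10, 58, 17, -5, 1]        # qfilter[i], i = -3..3
HTAPS = [-1, 4, -11, 40, 40, -11, 4, -1]   # hfilter[i], i = -3..4
TAPS = {1: list(zip(range(-3, 4), QTAPS)),
        2: list(zip(range(-3, 5), HTAPS)),
        3: list(zip(range(-2, 5), QTAPS[::-1]))}


def two_stage(ref, x, y, fx, fy, bit_depth, horizontal_first=True):
    """2-D phase (fx, fy both nonzero): first stage >> (B - 8), second >> 6."""
    s1 = bit_depth - 8

    def h_pass(row):
        return sum(c * ref[row][x + o] for o, c in TAPS[fx]) >> s1

    def v_pass(col):
        return sum(c * ref[y + o][col] for o, c in TAPS[fy]) >> s1

    if horizontal_first:
        return sum(c * h_pass(y + o) for o, c in TAPS[fy]) >> 6
    return sum(c * v_pass(x + o) for o, c in TAPS[fx]) >> 6


def first_stage_range(bit_depth):
    """Worst-case (min, max) first-stage value for samples in [0, 2**B - 1]."""
    top = (1 << bit_depth) - 1
    lo = min(top * sum(c for c in t if c < 0) for t in (QTAPS, HTAPS))
    hi = max(top * sum(c for c in t if c > 0) for t in (QTAPS, HTAPS))
    return lo >> (bit_depth - 8), hi >> (bit_depth - 8)


rng = random.Random(1)
ref = [[rng.randrange(256) for _ in range(12)] for _ in range(12)]
same = all(two_stage(ref, 5, 5, fx, fy, 8) == two_stage(ref, 5, 5, fx, fy, 8, False)
           for fx in (1, 2, 3) for fy in (1, 2, 3))
print(same, first_stage_range(8))  # True (-6120, 22440)
```

For B > 8 the first-stage shift floors each intermediate value, so the two orders can give different results.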
It is at this point in the process that weighted prediction is applied, when selected by the encoder. Whereas H.264/MPEG-4 AVC supported both temporally implicit and explicit weighted prediction, HEVC applies only explicit weighted prediction, scaling and offsetting the prediction with values sent explicitly by the encoder. The bit depth of the prediction is then adjusted to the original bit depth of the reference samples. In the case of uniprediction, the interpolated (and possibly weighted) prediction value is rounded, right-shifted, and clipped to the original bit depth. In the case of biprediction, the interpolated (and possibly weighted) prediction values from the two PBs are added first, and then rounded, right-shifted, and clipped.
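The uniprediction and biprediction paths (without weighting) can be sketched as follows. The sketch assumes the HEVC specification's convention that interpolated values carry a gain of 2^(14 − B), which is what the first-stage shift B − 8 and the second-stage shift 6 produce. The shifts 14 − B and 15 − B, the rounding offsets, and the function names are assumptions. Explicit weighted prediction, which would scale and offset the values before this step, is omitted.

```python
def clip_to_bit_depth(v, bit_depth):
    return max(0, min((1 << bit_depth) - 1, v))


def finalize_uni(pred, bit_depth=8):
    """Round, shift and clip one interpolated value (gain 2**(14 - B), B < 14)."""
    shift = 14 - bit_depth
    return clip_to_bit_depth((pred + (1 << (shift - 1))) >> shift, bit_depth)


def finalize_bi(pred0, pred1, bit_depth=8):
    """Add both predictions first, then round, shift and clip once."""
    shift = 15 - bit_depth
    return clip_to_bit_depth((pred0 + pred1 + (1 << (shift - 1))) >> shift, bit_depth)


print(finalize_uni(6400), finalize_bi(6400, 6464), finalize_uni(-500))  # 100 101 0
```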
In H.264/MPEG-4 AVC, up to three stages of rounding operations are required to obtain each prediction sample located at a quarter-sample position. If biprediction is used, the total number of rounding operations is seven in the worst case (three per prediction plus one for the final combination). In HEVC, at most two rounding operations are needed to obtain each sample located at a quarter-sample position, so five rounding operations suffice in the worst case when biprediction is used. Moreover, in the most common usage, where the bit depth B is 8 bits, the total number of rounding operations in the worst case is further reduced to three: the first-stage shift B − 8 is then zero, leaving one rounding per prediction plus one for the final combination. Due to the lower number of rounding operations, the accumulated rounding error is decreased, and the decoder gains greater flexibility in how it performs the necessary operations.
The fractional sample interpolation process for the chroma components is similar to the one for the luma component, except that the number of filter taps is 4 and the fractional accuracy is 1/8 for the usual 4:2:0 chroma format. HEVC defines a set of four-tap filters for eighth-sample positions, given in Table III for the 4:2:0 chroma format (in H.264/MPEG-4 AVC, only two-tap bilinear filtering was applied).
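TABLE III Filter Coefficients for Chroma Fractional Sample Interpolation
Index i       -1    0    1    2
filter1[i]    -2   58   10   -2
filter2[i]    -4   54   16   -2
filter3[i]    -6   46   28   -4
filter4[i]    -4   36   36   -4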
Filter coefficient values denoted as filter1[i], filter2[i], filter3[i], and filter4[i], with i = −1, …, 2, are used for interpolating the 1/8th, 2/8th, 3/8th, and 4/8th fractional positions of the chroma samples, respectively. Using symmetry, the mirrored values filter3[1 − i], filter2[1 − i], and filter1[1 − i], with i = −1, …, 2, are used for the 5/8th, 6/8th, and 7/8th fractional positions, respectively.
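The chroma filter set, including the mirrored 5/8 to 7/8 phases, can be built and sanity-checked in a few lines. The coefficients below are the HEVC 4:2:0 chroma values for i = −1, …, 2. The dictionary layout and `chroma_interp_1d` are illustrative assumptions. Taking filterK[1 − i] over i = −1, …, 2 amounts to reversing the four taps.

```python
# HEVC 4:2:0 chroma taps for i = -1..2 (Table III), keyed by eighth-sample phase.
CHROMA = {
    1: [-2, 58, 10, -2],   # filter1
    2: [-4, 54, 16, -2],   # filter2
    3: [-6, 46, 28, -4],   # filter3
    4: [-4, 36, 36, -4],   # filter4
}
# Phases 5/8, 6/8, 7/8 use filter3, filter2, filter1 mirrored: filterK[1 - i].
for phase in (5, 6, 7):
    CHROMA[phase] = CHROMA[8 - phase][::-1]


def chroma_interp_1d(row, x, phase, bit_depth=8):
    """One-dimensional chroma interpolation between row[x] and row[x + 1]."""
    taps = zip(range(-1, 3), CHROMA[phase])
    return sum(c * row[x + i] for i, c in taps) >> (bit_depth - 8)


print(CHROMA[5])                           # [-4, 28, 46, -6]
print(chroma_interp_1d([100] * 6, 2, 3))   # 6400
```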
3) Merge Mode:
Motion information typically consists of the horizontal and vertical motion vector displacement values, one or two reference picture indices, and, in the case of prediction regions in B slices, an identification of which reference picture list is associated with each index. HEVC includes a merge mode to derive the motion information from spatially or temporally neighboring blocks. It is called merge mode because it forms a merged region whose blocks share all motion information.
The merge mode is conceptually similar to the direct and skip modes in H.264/MPEG-4 AVC. However, there are two important differences. First, it transmits index information to select one out of several available candidates, in a manner sometimes referred to as a motion vector competition scheme. Second, it explicitly identifies the reference picture list and reference picture index, whereas the direct mode assumes that these have some predefined values.
Fig. 8. Positions of spatial candidates of motion information.
The set of possible candidates in the merge mode consists of spatial neighbor candidates, a temporal candidate, and generated candidates. Fig. 8 shows the positions of five spatial candidates. For each candidate position, availability is checked in the order {a1, b1, b0, a0, b2}. If the block at a position is intrapicture predicted, or the position is outside the current slice or tile, it is considered unavailable.
After the spatial candidates are validated, two kinds of redundancy are removed. If the candidate position for the current PU would refer to the first PU within the same CU, the position is excluded, since the same merge could be achieved by a CU without splitting it into prediction partitions. Furthermore, any redundant entries where candidates have exactly the same motion information are also excluded.
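The spatial part of the merge list can be sketched as a scan in the order {a1, b1, b0, a0, b2}. This is a simplified illustration with hypothetical names. Motion information is any comparable value, and `None` marks an unavailable position. Redundancy is checked against every earlier entry, whereas the standard compares only selected pairs of positions. The early stop before b2 when four candidates are already present reflects our understanding of the standard and is an assumption.

```python
ORDER = ("a1", "b1", "b0", "a0", "b2")


def spatial_merge_candidates(neighbors, excluded=frozenset()):
    """Scan the five spatial positions of Fig. 8 in the order {a1, b1, b0, a0, b2}.

    neighbors: position name -> motion info (any comparable value), or None
               when the block is intra coded or lies outside the current
               slice or tile (unavailable).
    excluded:  positions that would refer to the first PU of the same CU.
    """
    found = []
    for pos in ORDER:
        if pos == "b2" and len(found) == 4:
            break                      # assumed rule: b2 is only a fallback
        motion = neighbors.get(pos)
        if motion is None or pos in excluded:
            continue                   # unavailable or excluded position
        if motion in found:
            continue                   # exactly the same motion information
        found.append(motion)
    return found


n = {"a1": ((1, 0), 0), "b1": ((1, 0), 0), "b0": None,
     "a0": ((2, 2), 1), "b2": ((0, 4), 0)}
print(spatial_merge_candidates(n))  # [((1, 0), 0), ((2, 2), 1), ((0, 4), 0)]
```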
For the temporal candidate, the position just outside the bottom-right corner of the collocated PU in the reference picture is used if it is available; otherwise, the center position is used instead. The way the collocated PU is chosen is similar to that of prior standards, but HEVC allows more flexibility by transmitting an index that specifies which reference picture list is used for the collocated reference picture.
One issue with the use of the temporal candidate is the amount of memory needed to store the motion information of the reference picture. This is addressed by restricting the granularity for storing the temporal motion candidates to a 16×16 luma grid, even when smaller PB structures are used at the corresponding location in the reference picture. In addition, a PPS-level flag allows the encoder to disable the use of the temporal candidate, which is useful for applications with error-prone transmission.
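The memory saving from the 16×16 storage grid is easy to quantify. In the sketch below (hypothetical names), a position is mapped to the top-left corner of its 16×16 cell. The number of stored motion entries is then compared with a 4×4 grid, assumed here as the uncompressed baseline. For a 1920×1080 picture this gives 8160 entries instead of 129600, roughly a 16× reduction.

```python
def stored_motion_position(x, y):
    """Top-left luma position of the 16x16 cell whose stored motion covers (x, y)."""
    return (x >> 4) << 4, (y >> 4) << 4


def motion_storage_entries(width, height, grid=16):
    """Motion-field entries kept per reference picture (ceiling division)."""
    return -(-width // grid) * -(-height // grid)


print(stored_motion_position(37, 70))            # (32, 64)
print(motion_storage_entries(1920, 1080))        # 8160
print(motion_storage_entries(1920, 1080, 4))     # 129600
```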
The maximum number of merge candidates, C, is specified in the slice header. If the number of merge candidates found (including the temporal candidate) is larger than C, only the first C − 1 spatial candidates and the temporal candidate are retained. Otherwise, if fewer than C merge candidates are identified, additional candidates are generated until the number equals C. This simplifies parsing and makes it more robust, since the ability to parse the coded data does not depend on merge candidate availability.
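The retention rule for an overfull candidate list can be written directly. `trim_merge_candidates` is a hypothetical name, and candidates are treated as opaque values. Generating extra candidates when the list is short is left to the caller.

```python
def trim_merge_candidates(spatial, temporal, max_cands):
    """Apply the slice-level limit C = max_cands to the found candidates.

    spatial:  spatial candidates in scan order.
    temporal: the temporal candidate, or None if there is none.
    """
    found = spatial + ([temporal] if temporal is not None else [])
    if len(found) <= max_cands:
        return found                   # caller then generates up to C entries
    if temporal is None:
        return spatial[:max_cands]
    # Too many: keep the first C - 1 spatial candidates plus the temporal one.
    return spatial[:max_cands - 1] + [temporal]


print(trim_merge_candidates(["s1", "s2", "s3", "s4"], "t", 3))  # ['s1', 's2', 't']
```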
For B slices, additional merge candidates are generated by combining two existing candidates according to a predefined order, taking the reference picture list 0 motion from one candidate and the list 1 motion from the other. For example, the first generated candidate uses the first merge candidate for list 0 and the second merge candidate for list 1. HEVC specifies a total of 12 predefined index pairs into the already constructed merge candidate list, in the following order: (0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (0, 3), (3, 0), (1, 3), (3, 1), (2, 3), and (3, 2). Among them, up to five candidates can be included after removing redundant entries.
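Generation of combined bi-predictive candidates can be sketched by walking the 12 index pairs. Each candidate is assumed to be an `(l0, l1)` tuple whose entries are `(mv, ref_idx)` or `None`. Pair (i, j) takes the list 0 motion of candidate i and the list 1 motion of candidate j. The redundancy test here (skip a combination already in the list) is a simplification, not the exact check of the standard, and the function name is hypothetical.

```python
PAIRS = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1),
         (0, 3), (3, 0), (1, 3), (3, 1), (2, 3), (3, 2)]


def add_combined_bipred(cands, max_cands):
    """Append combined candidates until the list holds max_cands entries."""
    original = list(cands)             # pairs index the list as first built
    out = list(cands)
    for i, j in PAIRS:
        if len(out) >= max_cands:
            break
        if i >= len(original) or j >= len(original):
            continue
        l0, l1 = original[i][0], original[j][1]
        if l0 is None or l1 is None:
            continue                   # needs list-0 motion from i, list-1 from j
        combined = (l0, l1)
        if combined not in out:        # simplified redundancy check
            out.append(combined)
    return out


c0 = (((1, 0), 0), ((0, 1), 0))
c1 = (((2, 0), 1), None)
c2 = (None, ((3, 3), 1))
print(len(add_combined_bipred([c0, c1, c2], 5)))  # 5
```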
When the slice is a P slice, or the number of merge candidates is still less than C, zero motion vectors associated with reference indices from zero to the number of reference pictures minus one are used to fill any remaining entries in the merge candidate list.
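Zero-motion filling can be sketched as below, using a hypothetical `(l0, l1)` candidate layout where each entry is `(mv, ref_idx)` or `None`. Reference indices count up from 0. Once they run past the number of reference pictures, this sketch falls back to index 0, which is an assumption about how the remaining entries are filled. For simplicity, a single `num_ref_idx` is used for both lists.

```python
def add_zero_candidates(cands, max_cands, num_ref_idx, is_b_slice):
    """Fill the merge list up to max_cands with zero-MV candidates."""
    out = list(cands)
    zero_idx = 0
    while len(out) < max_cands:
        # Reference index 0, 1, ..., num_ref_idx - 1; then 0 again (assumption).
        ref = zero_idx if zero_idx < num_ref_idx else 0
        zero = ((0, 0), ref)
        out.append((zero, zero if is_b_slice else None))  # P slices: list 0 only
        zero_idx += 1
    return out


print(add_zero_candidates([], 3, 2, False))
# [(((0, 0), 0), None), (((0, 0), 1), None), (((0, 0), 0), None)]
```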
In HEVC, the skip mode is treated as a special case of the merge mode in which all coded block flags are equal to zero. In this case, only a skip flag and the corresponding merge index are transmitted to the decoder. The B-direct mode of H.264/MPEG-4 AVC is also replaced by the merge mode, since the merge mode allows all motion information to be derived from the spatial and temporal motion information of the neighboring blocks, with residual coding.
4) Motion Vector Prediction for Nonmerge Mode:
When an interpicture-predicted CB is not coded in the skip or merge mode, the motion vector is differentially coded using a motion vector predictor. Similar to the merge mode, HEVC allows the encoder to choose the motion vector predictor among multiple predictor candidates. The difference between the predictor and the actual motion vector, together with the index of the candidate, is transmitted to the decoder.
Only two spatial motion candidates are chosen, according to availability, among the five candidates in Fig. 8. The first spatial motion candidate is chosen from the set of left positions {a0, a1} and the second from the set of above positions {b0, b1, b2}, according to their availability, while keeping the search order indicated in each set.
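A simplified version of the spatial predictor selection takes the first available position from each group. The function name is hypothetical, motion vectors are `(x, y)` tuples, and `None` or a missing key marks an unavailable position. The sketch ignores the finer rules of the standard for choosing between unscaled and scaled neighbors.

```python
def amvp_spatial(neighbors):
    """Pick at most one MV from {a0, a1} and one from {b0, b1, b2}, in that order."""
    picks = []
    for group in (("a0", "a1"), ("b0", "b1", "b2")):
        for pos in group:
            if neighbors.get(pos) is not None:
                picks.append(neighbors[pos])
                break                  # first available position in the group wins
    return picks


print(amvp_spatial({"a1": (4, 0), "b2": (0, 8)}))  # [(4, 0), (0, 8)]
```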
HEVC allows only a much lower number of candidates in the motion vector prediction process for the nonmerge case, since the encoder can send a coded difference to change the motion vector. Furthermore, the encoder needs to perform motion estimation, which is one of the most computationally expensive operations in the encoder, and allowing only a small number of candidates reduces this complexity.
When the reference index of the neighboring PU is not equal to that of the current PU, a scaled version of the motion vector is used. The neighboring motion vector is scaled according to the temporal distances between the current picture and the reference pictures indicated by the reference indices of the neighboring PU and the current PU, respectively. When two spatial candidates have the same motion vector components, one redundant spatial candidate is excluded.
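The motion vector scaling can be illustrated with the fixed-point form used in the HEVC specification, as we recall it. The POC distances tb (current PU) and td (neighboring PU) are clipped to [−128, 127], 1/td is approximated with 14-bit precision, and the resulting factor is clipped to [−4096, 4095], with 256 representing 1.0. Treat these constants as assumptions to verify against the standard text. The function names are hypothetical.

```python
def _div_trunc(n, d):
    """Integer division rounding toward zero (the '/' of the spec text)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def _clip3(lo, hi, v):
    return max(lo, min(hi, v))


def scale_mv(mv, tb, td):
    """Scale one MV component by roughly tb / td in fixed point.

    tb: POC distance from the current picture to the current PU's reference.
    td: POC distance from the current picture to the neighbor's reference.
    """
    tb = _clip3(-128, 127, tb)
    td = _clip3(-128, 127, td)
    tx = _div_trunc(16384 + (abs(td) >> 1), td)          # ~ 2**14 / td
    factor = _clip3(-4096, 4095, (tb * tx + 32) >> 6)    # ~ 256 * tb / td
    prod = factor * mv
    sign = -1 if prod < 0 else 1
    return _clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))


print(scale_mv(5, 1, 2), scale_mv(8, 2, 1), scale_mv(4, 1, -1))  # 2 16 -4
```

With tb equal to td the factor is 256 and the vector passes through unchanged; a negative td (a neighbor reference on the other side in time) flips the sign.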
When the number of motion vector predictors is not equal to two and the use of temporal MV prediction is not explicitly disabled, the temporal MV prediction candidate is included. This means that the temporal candidate is not used at all when two spatial candidates are available. Finally, a zero motion vector is included repeatedly until the number of motion vector prediction candidates equals two, which guarantees that there are always exactly two motion vector predictors. Thus, only a coded flag is needed to identify which motion vector predictor is used in the nonmerge mode.
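The construction rules for the two-entry predictor list can be collected into one function. `build_amvp_list` is a hypothetical name. Motion vectors are `(x, y)` tuples, `temporal` is `None` when no collocated MV exists, and `tmvp_enabled` stands in for the flag that disables temporal MV prediction.

```python
def build_amvp_list(spatial, temporal, tmvp_enabled=True):
    """Return exactly two MV predictors for the nonmerge mode."""
    cands = list(spatial)
    if len(cands) == 2 and cands[0] == cands[1]:
        cands.pop()                    # drop the redundant spatial candidate
    if len(cands) != 2 and tmvp_enabled and temporal is not None:
        cands.append(temporal)         # temporal MV only fills a gap
    while len(cands) < 2:
        cands.append((0, 0))           # zero-MV padding guarantees two entries
    return cands[:2]


print(build_amvp_list([(1, 2), (1, 2)], (3, 4)))  # [(1, 2), (3, 4)]
```

Because the list always has exactly two entries, a single flag selects the predictor.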