HEVC Core Coding Techniques, Part 3: Interpicture Prediction

Author: rz白雪 | Source: Internet | 2023-08-10 09:59


Overview of the High Efficiency Video Coding (HEVC) Standard, Part 4

H. Interpicture Prediction

1) Prediction Block (PB) Partitioning:

Compared to intrapicture-predicted
CBs, HEVC supports more PB partition shapes for
interpicture-predicted CBs. The partitioning modes of
PART_2N×2N, PART_2N×N, and PART_N×2N indicate
the cases when the CB is not split, split into two equal-size
PBs horizontally, and split into two equal-size PBs vertically,
respectively. PART_N×N specifies that the CB is split into
four equal-size PBs, but this mode is only supported when the
CB size is equal to the smallest allowed CB size. In addition,
there are four partitioning types that support splitting the
CB into two PBs having different sizes: PART_2N×nU,
PART_2N×nD, PART_nL×2N, and PART_nR×2N. These
types are known as asymmetric motion partitions.
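In short, HEVC offers more PB partition shapes for interpicture-predicted CBs than for intrapicture-predicted ones.
The four symmetric modes partition the CB as follows:
PART_2N×2N: the CB is not split;
PART_2N×N: the CB is split into two equal-size 2N×N PBs, one above the other;
PART_N×2N: the CB is split into two equal-size N×2N PBs, side by side;
PART_N×N: the CB is split into four equal-size PBs,
but this mode is allowed only when the CB size equals the smallest allowed CB size.

Four further types split the CB into two PBs of different sizes:
PART_2N×nU,
PART_2N×nD,
PART_nL×2N,
PART_nR×2N.
These types are called asymmetric motion partitions.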
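
The eight partitioning modes can be made concrete with a small lookup. This is a minimal sketch, not taken from the source: the function name `pb_partitions` and the ASCII mode names (with `x` instead of `×`) are hypothetical, and the asymmetric modes are assumed to split the CB at one quarter of its size, as in the HEVC standard.

```python
def pb_partitions(mode, size):
    """Return the PBs of a size x size CB as (x, y, width, height) tuples."""
    n = size // 2   # N, half the CB size
    q = size // 4   # assumed split offset of the asymmetric modes
    table = {
        "PART_2Nx2N": [(0, 0, size, size)],
        "PART_2NxN":  [(0, 0, size, n), (0, n, size, n)],
        "PART_Nx2N":  [(0, 0, n, size), (n, 0, n, size)],
        "PART_NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        "PART_2NxnU": [(0, 0, size, q), (0, q, size, size - q)],
        "PART_2NxnD": [(0, 0, size, size - q), (0, size - q, size, q)],
        "PART_nLx2N": [(0, 0, q, size), (q, 0, size - q, size)],
        "PART_nRx2N": [(0, 0, size - q, size), (size - q, 0, q, size)],
    }
    return table[mode]
```

For a 32×32 CB, `pb_partitions("PART_2NxnU", 32)` gives a 32×8 PB on top of a 32×24 PB. The sketch does not enforce the restriction that PART_N×N is allowed only at the smallest CB size.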

Fig. 7. Integer and fractional sample positions for luma interpolation.

2) Fractional Sample Interpolation:

The samples of the PB for an interpicture-predicted CB are obtained from those of
corresponding block region in the reference picture identified
by a reference picture index, which is at a position displaced
by the horizontal and vertical components of the motion vector.
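That is, the PB samples of an interpicture-predicted CB come from the corresponding block region of the reference picture selected by the reference picture index;
that region is offset from the PB position by the horizontal and vertical motion vector components.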

Except for the case when the motion vector has an integer
value, fractional sample interpolation is used to generate the
prediction samples for noninteger sampling positions. As in
H.264/MPEG-4 AVC, HEVC supports motion vectors with
units of one quarter of the distance between luma samples.
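Unless the motion vector has an integer value, fractional sample interpolation generates the prediction samples at noninteger positions.
Like H.264/MPEG-4 AVC, HEVC supports motion vectors in units of one quarter of a luma sample.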

For chroma samples, the motion vector accuracy is determined
according to the chroma sampling format, which for 4:2:0
sampling results in units of one eighth of the distance between
chroma samples.
For chroma, the motion vector accuracy follows from the chroma sampling format;
for 4:2:0 it is one eighth of a chroma sample.
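
The quarter-sample and eighth-sample units can be illustrated by splitting one motion vector component into an integer displacement and a fractional phase. This is a minimal sketch; the function names are hypothetical, and it assumes the 4:2:0 case, where the same stored value is read in eighth-sample units for chroma because chroma has half the luma resolution.

```python
def split_luma_mv(mv):
    """Split an MV component in quarter luma samples into (integer, fraction)."""
    return mv >> 2, mv & 3      # fraction 0..3 selects the interpolation phase


def split_chroma_mv(mv):
    """Split the same MV component for 4:2:0 chroma, in eighth chroma samples."""
    return mv >> 3, mv & 7      # fraction 0..7 selects the interpolation phase
```

The arithmetic shift rounds toward minus infinity, so negative vectors still yield a nonnegative fraction: `split_luma_mv(-5)` gives `(-2, 3)`, that is, -2 + 3/4 = -1.25 luma samples.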

The fractional sample interpolation for luma samples in
HEVC uses separable application of an eight-tap filter for the
half-sample positions and a seven-tap filter for the quarter-sample
positions. This is in contrast to the process used in
H.264/MPEG-4 AVC, which applies a two-stage interpolation
process by first generating the values of one or two
neighboring samples at half-sample positions using six-tap
filtering, rounding the intermediate results, and then averaging
two values at integer or half-sample positions. HEVC instead
uses a single consistent separable interpolation process for
generating all fractional positions without intermediate rounding
operations, which improves precision and simplifies the
architecture of the fractional sample interpolation. The interpolation
precision is also improved in HEVC by using longer
filters, i.e., seven-tap or eight-tap filtering rather than the six-tap
filtering used in H.264/MPEG-4 AVC. Using only seven
taps rather than the eight used for half-sample positions was
sufficient for the quarter-sample interpolation positions since
the quarter-sample positions are relatively close to integer sample
positions, so the most distant sample in an eight-tap
interpolator would effectively be farther away than in the half-sample
case (where the relative distances of the integer-sample
positions are symmetric). The actual filter tap values of the
interpolation filtering kernel were partially derived from DCT
basis function equations.
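In short, HEVC luma interpolation applies two separable filters:
an eight-tap filter for half-sample positions;
a seven-tap filter for quarter-sample positions.
This differs from H.264/MPEG-4 AVC,
which interpolates in two stages:
it first computes one or two neighboring half-sample values with a six-tap filter and rounds these intermediate results,
then averages two values at integer or half-sample positions.
HEVC instead uses one consistent separable process for all fractional positions, with no intermediate rounding,
which improves precision and simplifies the interpolation architecture.
The longer filters (seven or eight taps) improve precision further
compared with the six-tap filter of H.264/MPEG-4 AVC.
Seven taps suffice for the quarter-sample positions, rather than the eight used for half-sample positions,
because a quarter-sample position lies close to an integer sample position,
so the most distant sample of an eight-tap interpolator would be farther away than in the half-sample case,
where the integer samples lie symmetrically around the interpolated position.
The filter tap values were derived in part from DCT basis function equations.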

In Fig. 7, the positions labeled with upper-case letters,
Ai,j , represent the available luma samples at integer sample
locations, whereas the other positions labeled with lower-case
letters represent samples at noninteger sample locations, which
need to be generated by interpolation.
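In Fig. 7, the upper-case labels Ai,j mark the luma samples available at integer sample positions;
the lower-case labels mark noninteger positions, whose samples must be generated by interpolation.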

The samples labeled a0,j, b0,j, c0,j, d0,0, h0,0, and n0,0
are derived from the samples Ai,j by applying the eight-tap
filter for half-sample positions and the seven-tap filter for the
quarter-sample positions as follows:
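The samples a0,j, b0,j, c0,j, d0,0, h0,0, and n0,0 are all computed from the integer samples Ai,j,
using the eight-tap filter at half-sample positions
and the seven-tap filter at quarter-sample positions, as follows: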

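$$
\begin{aligned}
a_{0,j} &= \Bigl(\sum_{i=-3}^{3} A_{i,j}\,\mathrm{qfilter}[i]\Bigr) \gg (B-8) \\
b_{0,j} &= \Bigl(\sum_{i=-3}^{4} A_{i,j}\,\mathrm{hfilter}[i]\Bigr) \gg (B-8) \\
c_{0,j} &= \Bigl(\sum_{i=-2}^{4} A_{i,j}\,\mathrm{qfilter}[1-i]\Bigr) \gg (B-8) \\
d_{0,0} &= \Bigl(\sum_{j=-3}^{3} A_{0,j}\,\mathrm{qfilter}[j]\Bigr) \gg (B-8) \\
h_{0,0} &= \Bigl(\sum_{j=-3}^{4} A_{0,j}\,\mathrm{hfilter}[j]\Bigr) \gg (B-8) \\
n_{0,0} &= \Bigl(\sum_{j=-2}^{4} A_{0,j}\,\mathrm{qfilter}[1-j]\Bigr) \gg (B-8)
\end{aligned}
$$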

where the constant B ≥ 8 is the bit depth of the reference
samples (and typically B = 8 for most applications) and the
filter coefficient values are given in Table II. In these formulas,
>> denotes an arithmetic right shift operation.
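Here B is the bit depth of the reference samples (B ≥ 8, typically 8);
the filter coefficients are listed in Table II;
>> denotes an arithmetic right shift.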

TABLE II
Filter Coefficients for Luma Fractional Sample Interpolation
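
The horizontal formulas for a0,j, b0,j, and c0,j can be sketched in a few lines. The coefficient values below are the luma taps as published in Table II of the HEVC overview paper (the table itself did not survive in this copy); the function name `interp_row` and the row-as-list layout are assumptions made for illustration only.

```python
QFILTER = [-1, 4, -10, 58, 17, -5, 1]        # qfilter[i] for i = -3..3
HFILTER = [-1, 4, -11, 40, 40, -11, 4, -1]   # hfilter[i] for i = -3..4


def interp_row(row, x, frac, bit_depth=8):
    """Interpolate to the right of integer position x in one sample row.

    frac is the fractional offset in quarter samples: 1 (a), 2 (b) or 3 (c).
    The result keeps 6 extra bits of precision, since the taps sum to 64.
    """
    if frac == 1:
        acc = sum(row[x + i] * QFILTER[i + 3] for i in range(-3, 4))
    elif frac == 2:
        acc = sum(row[x + i] * HFILTER[i + 3] for i in range(-3, 5))
    elif frac == 3:
        # Mirrored quarter-sample filter: qfilter[1 - i] for i = -2..4.
        acc = sum(row[x + i] * QFILTER[(1 - i) + 3] for i in range(-2, 5))
    else:
        raise ValueError("frac must be 1, 2 or 3")
    return acc >> (bit_depth - 8)
```

On a flat row of value 100 every phase returns 6400 (100 × 64). On the ramp 0, 10, 20, ... the half-sample value at x = 5 is 3520, exactly 55 × 64, which shows the half-sample filter reproducing the midpoint of a linear signal.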

The samples labeled e0,0, f0,0, g0,0, i0,0, j0,0, k0,0, p0,0, q0,0,
and r0,0 can be derived by applying the corresponding filters
to samples located at vertically adjacent a0,j, b0,j and c0,j
positions as follows:
The samples e0,0, f0,0, g0,0, i0,0, j0,0, k0,0, p0,0, q0,0, and r0,0 are obtained
by filtering the vertically adjacent a0,j, b0,j, and c0,j samples as follows:

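$$
\begin{aligned}
e_{0,0} &= \Bigl(\sum_{v=-3}^{3} a_{0,v}\,\mathrm{qfilter}[v]\Bigr) \gg 6 \\
f_{0,0} &= \Bigl(\sum_{v=-3}^{3} b_{0,v}\,\mathrm{qfilter}[v]\Bigr) \gg 6 \\
g_{0,0} &= \Bigl(\sum_{v=-3}^{3} c_{0,v}\,\mathrm{qfilter}[v]\Bigr) \gg 6 \\
i_{0,0} &= \Bigl(\sum_{v=-3}^{4} a_{0,v}\,\mathrm{hfilter}[v]\Bigr) \gg 6 \\
j_{0,0} &= \Bigl(\sum_{v=-3}^{4} b_{0,v}\,\mathrm{hfilter}[v]\Bigr) \gg 6 \\
k_{0,0} &= \Bigl(\sum_{v=-3}^{4} c_{0,v}\,\mathrm{hfilter}[v]\Bigr) \gg 6 \\
p_{0,0} &= \Bigl(\sum_{v=-2}^{4} a_{0,v}\,\mathrm{qfilter}[1-v]\Bigr) \gg 6 \\
q_{0,0} &= \Bigl(\sum_{v=-2}^{4} b_{0,v}\,\mathrm{qfilter}[1-v]\Bigr) \gg 6 \\
r_{0,0} &= \Bigl(\sum_{v=-2}^{4} c_{0,v}\,\mathrm{qfilter}[1-v]\Bigr) \gg 6
\end{aligned}
$$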

The interpolation filtering is separable when B is equal to
8, so the same values could be computed in this case by
applying the vertical filtering before the horizontal filtering.
When implemented appropriately, the motion compensation
process of HEVC can be performed using only 16-b storage
elements (although care must be taken to do this correctly).
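When B equals 8, the interpolation filtering is separable,
so applying the vertical filter before the horizontal one yields the same values.
With a careful implementation, HEVC motion compensation needs only 16-bit storage elements.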
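
The separability claim for B = 8 can be checked numerically. The sketch below is illustrative only: it assumes the Table II luma taps from the HEVC overview paper, a block stored as `A[y][x]`, and hypothetical helper names. For B = 8 the first stage shifts by B − 8 = 0, so both filtering orders accumulate the same exact integer sum before the final `>> 6`.

```python
QFILTER = [-1, 4, -10, 58, 17, -5, 1]        # qfilter[i] for i = -3..3
HFILTER = [-1, 4, -11, 40, 40, -11, 4, -1]   # hfilter[i] for i = -3..4


def taps(frac):
    """Map a fractional offset (1, 2 or 3 quarter samples) to {offset: tap}."""
    if frac == 1:
        return {i: QFILTER[i + 3] for i in range(-3, 4)}
    if frac == 2:
        return {i: HFILTER[i + 3] for i in range(-3, 5)}
    if frac == 3:
        return {i: QFILTER[(1 - i) + 3] for i in range(-2, 5)}
    raise ValueError("frac must be 1, 2 or 3")


def interp_2d(A, x, y, fx, fy, horizontal_first=True):
    """Two-stage interpolation at (x + fx/4, y + fy/4) for 8-bit samples."""
    hx, vy = taps(fx), taps(fy)
    if horizontal_first:
        # Stage 1: filter each needed row horizontally (shift B - 8 = 0).
        rows = {v: sum(A[y + v][x + i] * c for i, c in hx.items()) for v in vy}
        total = sum(rows[v] * c for v, c in vy.items())
    else:
        # Stage 1: filter each needed column vertically instead.
        cols = {i: sum(A[y + v][x + i] * c for v, c in vy.items()) for i in hx}
        total = sum(cols[i] * c for i, c in hx.items())
    return total >> 6
```

With `fx = 1, fy = 1` this computes e0,0 and with `fx = 2, fy = 2` it computes j0,0. The returned value still carries 6 extra bits of precision, so a flat block of 100 yields 6400.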

It is at this point in the process that weighted prediction
is applied when selected by the encoder. Whereas
H.264/MPEG-4 AVC supported both temporally implicit and
explicit weighted prediction, in HEVC only explicit weighted
prediction is applied, by scaling and offsetting the prediction
with values sent explicitly by the encoder. The bit depth of
the prediction is then adjusted to the original bit depth of
the reference samples. In the case of uniprediction, the interpolated
(and possibly weighted) prediction value is rounded,
right-shifted, and clipped to have the original bit depth. In the
case of biprediction, the interpolated (and possibly weighted)
prediction values from two PBs are added first, and then
rounded, right-shifted, and clipped.
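Weighted prediction, when the encoder selects it, is applied at this point.
H.264/MPEG-4 AVC supported both implicit (temporal) and explicit weighted prediction;
HEVC supports only the explicit form,
which scales and offsets the prediction using values the encoder sends explicitly.
The prediction is then brought back to the original bit depth of the reference samples.
For uniprediction, the interpolated (and possibly weighted) value is rounded, right-shifted, and clipped to the original bit depth.
For biprediction, the interpolated (and possibly weighted) values from the two PBs are added first, then rounded, right-shifted, and clipped.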
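
The final rounding, shifting, and clipping step can be sketched for the unweighted case. This sketch assumes that interpolated samples are held at 14-bit intermediate precision, which matches the 6 extra bits produced by the luma filters when B = 8; the function names are hypothetical, and explicit weighting (scale and offset) is deliberately left out.

```python
def clip_to_depth(value, bit_depth):
    """Clip a value to the valid sample range [0, 2**bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, value))


def uni_pred(sample, bit_depth=8):
    """Uniprediction: round, right-shift and clip one intermediate sample."""
    shift = 14 - bit_depth                 # assumed intermediate precision
    return clip_to_depth((sample + (1 << (shift - 1))) >> shift, bit_depth)


def bi_pred(sample0, sample1, bit_depth=8):
    """Biprediction: add both intermediate samples first, then round once."""
    shift = 15 - bit_depth                 # one extra bit for the averaging
    total = sample0 + sample1
    return clip_to_depth((total + (1 << (shift - 1))) >> shift, bit_depth)
```

Because the two predictions are added before the single rounding step, biprediction costs only one rounding operation at this stage, which is part of why the worst-case rounding count stays low.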

In H.264/MPEG-4 AVC, up to three stages of rounding
operations are required to obtain each prediction sample (for
samples located at quarter-sample positions). If biprediction is
used, the total number of rounding operations is then seven
in the worst case. In HEVC, at most two rounding operations
are needed to obtain each sample located at the quarter-sample
positions, thus five rounding operations are sufficient in the
worst case when biprediction is used. Moreover, in the most
common usage, where the bit depth B is 8 b, the total number
of rounding operations in the worst case is further reduced
to 3. Due to the lower number of rounding operations, the
accumulated rounding error is decreased and greater flexibility
is enabled in regard to the manner of performing the necessary
operations in the decoder.
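In H.264/MPEG-4 AVC, each prediction sample at a quarter-sample position
requires up to three stages of rounding,
and up to seven rounding operations in the worst case with biprediction.
In HEVC, at most two rounding operations produce each sample at a quarter-sample position,
so five rounding operations suffice in the worst case with biprediction.
In the most common case, bit depth B = 8, the worst-case total drops further
to three.
Fewer rounding operations mean less accumulated rounding error,
and they give the decoder more freedom in how it performs the necessary operations.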

The fractional sample interpolation process for the chroma
components is similar to the one for the luma component,
except that the number of filter taps is 4 and the fractional
accuracy is 1/8 for the usual 4:2:0 chroma format case. HEVC
defines a set of four-tap filters for eighth-sample positions, as
given in Table III for the case of 4:2:0 chroma format (where,
in H.264/MPEG-4 AVC, only two-tap bilinear filtering was applied).
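Chroma interpolation resembles luma interpolation,
except that the filters have four taps and, for the usual 4:2:0 format, the fractional accuracy is 1/8.
HEVC defines a set of four-tap filters for the eighth-sample positions,
listed in Table III for 4:2:0; H.264/MPEG-4 AVC used only two-tap bilinear filtering here.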

TABLE III
Filter Coefficients for Chroma Fractional Sample Interpolation

Filter coefficient values denoted as filter1[i], filter2[i], filter3[i],
and filter4[i] with i = −1, . . . , 2 are used for interpolating
the 1/8th, 2/8th, 3/8th, and 4/8th fractional positions
for the chroma samples, respectively. Using symmetry for the
5/8th, 6/8th, and 7/8th fractional positions, the mirrored values
of filter3[1−i], filter2[1−i], and filter1[1−i] with i = −1, . . . ,
2 are used, respectively.
That is, filter1[i], filter2[i], filter3[i], and filter4[i], with i = −1, . . . , 2,
hold the coefficients for interpolating the 1/8, 2/8, 3/8, and 4/8 fractional positions.
By symmetry, the 5/8, 6/8, and 7/8 positions
reuse filter3, filter2, and filter1 with the taps mirrored, i.e., indexed as [1−i].
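
The mirroring rule can be written out directly. The tap values below are the chroma coefficients as published in Table III of the HEVC overview paper (the table did not survive in this copy); the names `CHROMA_FILTERS`, `chroma_taps`, and `interp_chroma` are hypothetical.

```python
# filterK[i] for i = -1..2, K = 1..4 (fractional positions 1/8 .. 4/8).
CHROMA_FILTERS = {
    1: [-2, 58, 10, -2],
    2: [-4, 54, 16, -2],
    3: [-6, 46, 28, -4],
    4: [-4, 36, 36, -4],
}


def chroma_taps(frac):
    """Return the four taps (offsets -1..2) for fractional position frac/8."""
    if 1 <= frac <= 4:
        return list(CHROMA_FILTERS[frac])
    if 5 <= frac <= 7:
        # Indexing with [1 - i] for i = -1..2 reverses the tap order.
        return CHROMA_FILTERS[8 - frac][::-1]
    raise ValueError("frac must be in 1..7")


def interp_chroma(row, x, frac):
    """Interpolate frac/8 of a sample to the right of integer position x."""
    return sum(row[x + i] * c for i, c in zip(range(-1, 3), chroma_taps(frac)))
```

Every filter sums to 64, so a flat row of value 50 interpolates to 3200 at any phase, and `chroma_taps(5)` is `filter3` reversed.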

3) Merge Mode:

Motion information typically consists of
the horizontal and vertical motion vector displacement values,
one or two reference picture indices, and, in the case of prediction
regions in B slices, an identification of which reference
picture list is associated with each index. HEVC includes a
merge mode to derive the motion information from spatially
or temporally neighboring blocks. It is denoted as merge mode
since it forms a merged region sharing all motion information.
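Motion information typically comprises
the horizontal and vertical motion vector displacement values and
one or two reference picture indices, plus, for prediction regions in B slices, an indication of which reference picture list each index belongs to.
HEVC's merge mode derives this motion information from spatially or temporally neighboring blocks.
It is called merge mode because it forms a merged region that shares all motion information.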

The merge mode is conceptually similar to the direct and
skip modes in H.264/MPEG-4 AVC. However, there are two
important differences. First, it transmits index information to
select one out of several available candidates, in a manner
sometimes referred to as a motion vector competition scheme.
It also explicitly identifies the reference picture list and reference
picture index, whereas the direct mode assumes that
these have some predefined values.
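Merge mode is conceptually similar to the direct and skip modes of H.264/MPEG-4 AVC,
with two important differences:
First, it transmits an index that selects one of several available candidates, a scheme sometimes called motion vector competition;
Second, it explicitly identifies the reference picture list and reference picture index, whereas direct mode assumes predefined values for them.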

Fig. 8. Positions of spatial candidates of motion information.

The set of possible candidates in the merge mode consists
of spatial neighbor candidates, a temporal candidate, and
generated candidates. Fig. 8 shows the positions of five spatial
candidates. For each candidate position, the availability is
checked according to the order {a1, b1, b0, a0, b2}. If the
block located at the position is intrapicture predicted or the
position is outside of the current slice or tile, it is considered
as unavailable.
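The set of possible merge candidates consists of
spatial neighbor candidates,
a temporal candidate,
and generated candidates.
Fig. 8 shows the positions of the five spatial candidates;
availability is checked for each position in the order {a1, b1, b0, a0, b2};
a position is unavailable if its block is intrapicture predicted or if it lies outside the current slice or tile.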

After validating the spatial candidates, two kinds of redundancy
are removed. If the candidate position for the current
PU would refer to the first PU within the same CU, the
position is excluded, as the same merge could be achieved by
a CU without splitting into prediction partitions. Furthermore,
any redundant entries where candidates have exactly the same
motion information are also excluded.
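After the spatial candidates are validated, two kinds of redundancy are removed:
a candidate position that would refer to the first PU of the same CU is excluded for the current PU,
because the same merge could be achieved by a CU that is not split into prediction partitions;
candidates with exactly the same motion information are excluded as well.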
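
The availability check and the two redundancy rules can be combined into one pass. This is a simplified sketch: the neighbor records and their field names (`intra`, `in_slice_or_tile`, `first_pu_of_same_cu`, `motion`) are hypothetical, and every new candidate is compared against all earlier ones, whereas the standard's exact comparison pattern may be more restricted.

```python
CHECK_ORDER = ("a1", "b1", "b0", "a0", "b2")


def spatial_merge_candidates(neighbors):
    """Collect spatial merge candidates from a {position: record} mapping."""
    candidates = []
    for pos in CHECK_ORDER:
        nb = neighbors.get(pos)
        # Unavailable: missing, intra coded, or outside the slice or tile.
        if nb is None or nb["intra"] or not nb["in_slice_or_tile"]:
            continue
        # Redundancy 1: merging with the first PU of the same CU.
        if nb.get("first_pu_of_same_cu", False):
            continue
        # Redundancy 2: identical motion information already in the list.
        if nb["motion"] in candidates:
            continue
        candidates.append(nb["motion"])
    return candidates
```

Here a motion record can be any comparable value, for example a tuple of motion vector and reference index.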

For the temporal candidate, the right bottom position just
outside of the collocated PU of the reference picture is used if
it is available. Otherwise, the center position is used instead.
The way to choose the collocated PU is similar to that of prior
standards, but HEVC allows more flexibility by transmitting
an index to specify which reference picture list is used for the
collocated reference picture.
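For the temporal candidate, the position just outside the bottom-right corner of the collocated PU in the reference picture is used if it is available;
otherwise the center position is used instead.
The collocated PU is chosen much as in earlier standards,
but HEVC adds flexibility by transmitting an index that specifies which reference picture list supplies the collocated reference picture.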

One issue related to the use of the temporal candidate is
the amount of the memory to store the motion information
of the reference picture. This is addressed by restricting the
granularity for storing the temporal motion candidates to only
the resolution of a 16×16 luma grid, even when smaller
PB structures are used at the corresponding location in the
reference picture. In addition, a PPS-level flag allows the
encoder to disable the use of the temporal candidate, which is
useful for applications with error-prone transmission.
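One concern with the temporal candidate is the memory needed to store the reference picture's motion information.
HEVC limits this by restricting the granularity at which temporal motion candidates are stored
to a 16×16 luma grid, even where the reference picture uses smaller PB structures at that location.
In addition, a PPS-level flag lets the encoder disable the temporal candidate,
which is useful for applications with error-prone transmission.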
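
The bottom-right-then-center rule and the 16×16 storage grid can be sketched together. This is a minimal illustration under stated assumptions: each 16×16 cell is assumed to be represented by the motion at its top-left corner, `stored_motion` is a hypothetical mapping from grid positions to motion data, and a missing key stands for an unavailable position.

```python
GRID = 16  # temporal motion is kept on a 16x16 luma grid


def snap_to_grid(x, y):
    """Map a luma position to the grid position whose motion is stored."""
    return (x // GRID) * GRID, (y // GRID) * GRID


def temporal_candidate(pu_x, pu_y, pu_w, pu_h, stored_motion):
    """Try the position just outside the bottom-right corner, then the center."""
    bottom_right = snap_to_grid(pu_x + pu_w, pu_y + pu_h)
    center = snap_to_grid(pu_x + pu_w // 2, pu_y + pu_h // 2)
    motion = stored_motion.get(bottom_right)
    if motion is None:
        motion = stored_motion.get(center)
    return motion
```

For a 1920×1080 picture this grid needs about 120 × 68 motion entries, compared with 480 × 270 if motion were stored on a 4×4 grid.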

The maximum number of merge candidates C is specified
in the slice header. If the number of merge candidates found
(including the temporal candidate) is larger than C, only the
first C – 1 spatial candidates and the temporal candidate
are retained. Otherwise, if the number of merge candidates
identified is less than C, additional candidates are generated
until the number is equal to C. This simplifies the parsing and
makes it more robust, as the ability to parse the coded data is
not dependent on merge candidate availability.
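The maximum number of merge candidates, C, is specified in the slice header.
If more than C candidates are found (including the temporal candidate), only the first C − 1 spatial candidates and the temporal candidate are retained.
Otherwise, if fewer than C are found, additional candidates are generated until there are C.
This simplifies parsing and makes it more robust,
because the ability to parse the coded data does not depend on merge candidate availability.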
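
The fixed list length C can be sketched as a truncate-or-pad step. The function name and argument layout are hypothetical; `generated` stands for whatever additional candidates are produced, supplied here as a plain iterable.

```python
def finalize_merge_list(spatial, temporal, c, generated):
    """Return a merge candidate list with at most C entries.

    spatial   : spatial candidates, already validated and deduplicated
    temporal  : the temporal candidate, or None if unavailable
    generated : iterable of additional candidates used for padding
    """
    found = list(spatial) + ([temporal] if temporal is not None else [])
    if len(found) > c:
        if temporal is not None:
            # Keep the first C - 1 spatial candidates plus the temporal one.
            found = list(spatial[:c - 1]) + [temporal]
        else:
            found = list(spatial[:c])
    for cand in generated:
        if len(found) >= c:
            break
        found.append(cand)
    return found
```

As long as `generated` supplies enough entries, the list always has exactly C entries, so the decoder can parse the merge index without knowing which neighbors were available.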

For B slices, additional merge candidates are generated by
choosing two existing candidates according to a predefined
order for reference picture list 0 and list 1. For example, the
first generated candidate uses the first merge candidate for list
0 and the second merge candidate for list 1. HEVC specifies
a total of 12 predefined pairs of two in the following order in
the already constructed merge candidate list as (0, 1), (1, 0),
(0, 2), (2, 0), (1, 2), (2, 1), (0, 3), (3, 0), (1, 3), (3, 1), (2, 3),
and (3, 2). Among them, up to five candidates can be included
after removing redundant entries.
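For B slices, additional merge candidates are generated by pairing two existing candidates in a predefined order,
taking the list 0 motion from one and the list 1 motion from the other.
For example, the first generated candidate takes its list 0 motion from the first merge candidate
and its list 1 motion from the second merge candidate.
HEVC specifies 12 predefined pairs of indices into the already constructed merge candidate list, in this order:
(0, 1), (1, 0),
(0, 2), (2, 0),
(1, 2), (2, 1),
(0, 3), (3, 0),
(1, 3), (3, 1),
(2, 3), (3, 2).
Of these, up to five candidates can be included after redundant entries are removed.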
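
The pairing order can be sketched as follows. The candidate layout is an assumption for illustration: each candidate is a dict with `l0` and `l1` entries holding the motion for that list, or `None` when the list is unused. Any additional validity checks the standard applies to a pair are not modeled here.

```python
PAIRS = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1),
         (0, 3), (3, 0), (1, 3), (3, 1), (2, 3), (3, 2)]


def add_combined_candidates(cands, c):
    """Append combined bi-predictive candidates until the list holds C entries."""
    out = list(cands)
    n = len(cands)                      # only original candidates are paired
    for i, j in PAIRS:
        if len(out) >= c:
            break
        if i >= n or j >= n:
            continue
        l0, l1 = cands[i]["l0"], cands[j]["l1"]
        if l0 is None or l1 is None:
            continue                    # the pair cannot form a bi-prediction
        combined = {"l0": l0, "l1": l1}
        if combined not in out:         # drop redundant entries
            out.append(combined)
    return out
```

With one list 0 only candidate followed by one list 1 only candidate, the pair (0, 1) produces a single new bi-predictive entry and the pair (1, 0) produces nothing.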

When the slice is a P slice or the number of merge
candidates is still less than C, zero motion vectors associated
with reference indices from zero to the number of reference
pictures minus one are used to fill any remaining entries in
the merge candidate list.
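When the slice is a P slice, or when the number of merge candidates is still less than C,
the remaining entries of the merge candidate list are filled with zero motion vectors whose reference indices run from zero to the number of reference pictures minus one.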
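
The zero-vector padding can be sketched like this. The source does not say what happens once the reference indices are exhausted; falling back to reference index 0 is an assumption of this sketch, and the candidate layout is hypothetical.

```python
def fill_zero_candidates(cands, c, num_ref_pics):
    """Pad the merge list to C entries with zero-motion candidates."""
    out = list(cands)
    k = 0
    while len(out) < c:
        # Assumption: reuse reference index 0 after the last one is used.
        ref_idx = k if k < num_ref_pics else 0
        out.append({"mv": (0, 0), "ref_idx": ref_idx})
        k += 1
    return out
```

For example, padding an empty list to C = 5 with two reference pictures yields reference indices 0, 1, 0, 0, 0.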

In HEVC, the skip mode is treated as a special case of the
merge mode when all coded block flags are equal to zero.
In this specific case, only a skip flag and the corresponding
merge index are transmitted to the decoder. The B-direct mode
of H.264/MPEG-4 AVC is also replaced by the merge mode,
since the merge mode allows all motion information to be
derived from the spatial and temporal motion information of
the neighboring blocks with residual coding.
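In HEVC, skip mode is treated as a special case of merge mode in which all coded block flags equal zero.
In that case, only a skip flag and the corresponding merge index are transmitted to the decoder.
Merge mode also replaces the B-direct mode of H.264/MPEG-4 AVC,
because it lets all motion information be derived from the spatial and temporal motion information of neighboring blocks, with residual coding.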

4) Motion Vector Prediction for Nonmerge Mode:

When an interpicture-predicted CB is not coded in the skip or
merge modes, the motion vector is differentially coded using
a motion vector predictor. Similar to the merge mode, HEVC
allows the encoder to choose the motion vector predictor
among multiple predictor candidates. The difference between
the predictor and the actual motion vector and the index of
the candidate are transmitted to the decoder.
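When an interpicture-predicted CB is not coded in skip or merge mode, its motion vector is coded differentially against a motion vector predictor.
As in merge mode, the encoder may choose the predictor from multiple candidates.
The difference between the predictor and the actual motion vector is transmitted to the decoder together with the candidate index.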

Only two spatial motion candidates are chosen according
to the availability among five candidates in Fig. 8. The first
spatial motion candidate is chosen from the set of left positions
{a0, a1} and the second one from the set of above positions
{b0, b1, b2} according to their availabilities, while keeping the
searching order as indicated in the two sets.
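Of the five candidates in Fig. 8, only two spatial motion candidates are chosen,
according to availability:
the first is taken from the set of left positions {a0, a1};
the second is taken from the set of above positions {b0, b1, b2};
each set is searched in the order listed.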
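
Picking one candidate from each set amounts to a first-available search. In this sketch `neighbors` is a hypothetical mapping from position name to motion vector, with unavailable positions simply absent; motion vector scaling is ignored.

```python
LEFT_ORDER = ("a0", "a1")
ABOVE_ORDER = ("b0", "b1", "b2")


def first_available(neighbors, order):
    """Return the motion vector of the first available position, or None."""
    for pos in order:
        mv = neighbors.get(pos)
        if mv is not None:
            return mv
    return None


def spatial_mvp_candidates(neighbors):
    """Return up to two spatial predictors: one from the left, one from above."""
    left = first_available(neighbors, LEFT_ORDER)
    above = first_available(neighbors, ABOVE_ORDER)
    return [mv for mv in (left, above) if mv is not None]
```

If only a1, b1, and b2 are available, the result is the vector at a1 followed by the vector at b1.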

HEVC only allows a much lower number of candidates to be
used in the motion vector prediction process for the nonmerge
case, since the encoder can send a coded difference to change
the motion vector. Furthermore, the encoder needs to perform
motion estimation, which is one of the most computationally
expensive operations in the encoder, and complexity is reduced
by allowing a small number of candidates.
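For the nonmerge case, HEVC allows far fewer candidates in the motion vector prediction process,
because the encoder can send a coded difference to change the motion vector.
Moreover, the encoder has to perform motion estimation, one of its most computationally expensive operations,
and a small number of candidates reduces that complexity.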

When the reference index of the neighboring PU is not
equal to that of the current PU, a scaled version of the
motion vector is used. The neighboring motion vector is
scaled according to the temporal distances between the current
picture and the reference pictures indicated by the reference
indices of the neighboring PU and the current PU,
respectively. When two spatial candidates have the same
motion vector components, one redundant spatial candidate is
excluded.
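When the neighboring PU's reference index differs from the current PU's, a scaled version of the motion vector is used.
The neighboring motion vector is scaled according to the temporal distances between the current picture and the reference pictures
indicated by the reference indices of the neighboring PU and of the current PU, respectively.
When two spatial candidates have the same motion vector components, one redundant spatial candidate is excluded.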
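
The idea behind the scaling is a ratio of temporal distances. The sketch below is conceptual only: it measures time with picture order counts (a term not used in the text above), uses exact rational arithmetic, and does not reproduce the fixed-point approximation and clipping that a real codec applies.

```python
from fractions import Fraction


def scale_mv(mv, poc_cur, poc_ref_cur, poc_ref_nb):
    """Scale a neighbor's MV to the current PU's reference picture.

    poc_cur     : picture order count of the current picture
    poc_ref_cur : POC of the reference picture of the current PU
    poc_ref_nb  : POC of the reference picture of the neighboring PU
    """
    tb = poc_cur - poc_ref_cur      # distance the scaled vector should span
    td = poc_cur - poc_ref_nb       # distance the neighbor's vector spans
    if td == 0 or tb == td:
        return mv                   # nothing to scale
    return tuple(round(Fraction(c * tb, td)) for c in mv)
```

A neighbor vector (8, −4) that spans two pictures becomes (4, −2) when the current PU references a picture only one picture away.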

When the number of motion vector predictors is not equal to
two and the use of temporal MV prediction is not explicitly
disabled, the temporal MV prediction candidate is included.
This means that the temporal candidate is not used at all when
two spatial candidates are available. Finally, a zero motion
vector is included repeatedly until the number of motion vector
prediction candidates is equal to two, which guarantees that the
number of motion vector predictors is two. Thus, only a coded
flag is necessary to identify which motion vector prediction is
used in the case of nonmerge mode.
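When fewer than two motion vector predictors have been found and temporal MV prediction has not been explicitly disabled,
the temporal MV prediction candidate is included.
In other words, the temporal candidate is not used at all when two spatial candidates are available.
Finally, zero motion vectors are appended until there are two prediction candidates,
which guarantees that the number of motion vector predictors is always two.
As a result, a single coded flag suffices to identify the predictor used in nonmerge mode.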
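
The steps for building the two-entry predictor list can be sketched as follows. The function name and arguments are hypothetical, and the spatial candidates are assumed to be already scaled where necessary.

```python
def build_mvp_list(spatial, temporal=None, temporal_enabled=True):
    """Build the two-entry motion vector predictor list for nonmerge mode."""
    cands = []
    for mv in spatial:
        if mv not in cands:             # drop a redundant spatial candidate
            cands.append(mv)
    cands = cands[:2]
    # The temporal candidate is considered only if a slot is still free.
    if len(cands) < 2 and temporal_enabled and temporal is not None:
        cands.append(temporal)
    while len(cands) < 2:               # pad with zero motion vectors
        cands.append((0, 0))
    return cands
```

Since the list always has exactly two entries, the chosen predictor can be signaled with a single flag.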
