H. Interpicture Prediction

1) PB Partitioning:

Compared to intrapicture-predicted
CBs, HEVC supports more PB partition shapes for
interpicture-predicted CBs. The partitioning modes of
PART_2N×2N, PART_2N×N, and PART_N×2N indicate
the cases when the CB is not split, split into two equal-size
PBs horizontally, and split into two equal-size PBs vertically,
respectively. PART_N×N specifies that the CB is split into
four equal-size PBs, but this mode is only supported when the
CB size is equal to the smallest allowed CB size. In addition,
there are four partitioning types that support splitting the
CB into two PBs having different sizes: PART_2N×nU,
PART_2N×nD, PART_nL×2N, and PART_nR×2N. These
types are known as asymmetric motion partitions.
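In summary, HEVC offers more PB partition shapes for interpicture-predicted CBs than for intrapicture-predicted CBs:
PART_2N×2N,   the CB is not split;
PART_2N×N,    the CB is split horizontally into two equal-size PBs;
PART_N×2N,    the CB is split vertically into two equal-size PBs;
PART_N×N,     the CB is split into four equal-size PBs (only at the smallest allowed CB size);
PART_2N×nU, PART_2N×nD, PART_nL×2N, PART_nR×2N,   the CB is split into two PBs of different sizes (asymmetric motion partitions).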
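
The partition modes can be tabulated as PB sizes. The sketch below is illustrative only: the function name `pb_sizes` and the ASCII mode strings are made up here, and the 1/4 to 3/4 split used for the asymmetric modes is taken from the HEVC specification rather than from the text above.

```python
def pb_sizes(part_mode, cb_size):
    """Return the (width, height) of each PB for a square CB of cb_size luma samples."""
    n = cb_size // 2   # the "N" in the mode names
    q = cb_size // 4   # short side of an asymmetric partition (assumed 1/4 split)
    table = {
        "PART_2Nx2N": [(cb_size, cb_size)],
        "PART_2NxN":  [(cb_size, n)] * 2,
        "PART_Nx2N":  [(n, cb_size)] * 2,
        "PART_NxN":   [(n, n)] * 4,
        "PART_2NxnU": [(cb_size, q), (cb_size, cb_size - q)],
        "PART_2NxnD": [(cb_size, cb_size - q), (cb_size, q)],
        "PART_nLx2N": [(q, cb_size), (cb_size - q, cb_size)],
        "PART_nRx2N": [(cb_size - q, cb_size), (q, cb_size)],
    }
    return table[part_mode]
```

For a 32×32 CB, `pb_sizes("PART_2NxnU", 32)` gives a 32×8 PB above a 32×24 PB; in every mode the PB areas add up to the CB area.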


Fig. 7. Integer and fractional sample positions for luma interpolation.

2) Fractional Sample Interpolation:

The samples of the PB for an interpicture-predicted CB are obtained from those of
a corresponding block region in the reference picture identified
by a reference picture index, which is at a position displaced
by the horizontal and vertical components of the motion vector.

Except for the case when the motion vector has an integer
value, fractional sample interpolation is used to generate the
prediction samples for noninteger sampling positions. As in
H.264/MPEG-4 AVC, HEVC supports motion vectors with
units of one quarter of the distance between luma samples.
That is, HEVC keeps the quarter-luma-sample MV precision of H.264/MPEG-4 AVC.

For chroma samples, the motion vector accuracy is determined
according to the chroma sampling format, which for 4:2:0
sampling results in units of one eighth of the distance between
chroma samples.
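
A minimal sketch of how one motion vector component could be split into an integer displacement and a fractional phase, assuming the component is stored as an integer in quarter-luma-sample units. The function names are hypothetical. For 4:2:0 the same integer is read in eighth-sample chroma units, because the chroma planes have half the luma resolution.

```python
def split_luma_mv(mv_component):
    """Return (integer luma offset, fraction in quarters 0..3)."""
    return mv_component >> 2, mv_component & 3

def split_chroma_mv_420(mv_component):
    """Return (integer chroma offset, fraction in eighths 0..7) for 4:2:0."""
    return mv_component >> 3, mv_component & 7
```

Python's `>>` floors toward negative infinity, so a component of -5 splits into offset -2 and fraction 3 (that is, -2 + 3/4 = -1.25 luma samples), which keeps the fraction non-negative.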

The fractional sample interpolation for luma samples in
HEVC uses separable application of an eight-tap filter for the
half-sample positions and a seven-tap filter for the quarter-sample
positions. This is in contrast to the process used in
H.264/MPEG-4 AVC, which applies a two-stage interpolation
process by first generating the values of one or two
neighboring samples at half-sample positions using six-tap
filtering, rounding the intermediate results, and then averaging
two values at integer or half-sample positions. HEVC instead
uses a single consistent separable interpolation process for
generating all fractional positions without intermediate rounding
operations, which improves precision and simplifies the
architecture of the fractional sample interpolation. The interpolation
precision is also improved in HEVC by using longer
filters, i.e., seven-tap or eight-tap filtering rather than the six-tap
filtering used in H.264/MPEG-4 AVC. Using only seven
taps rather than the eight used for half-sample positions was
sufficient for the quarter-sample interpolation positions since
the quarter-sample positions are relatively close to integer sample
positions, so the most distant sample in an eight-tap
interpolator would effectively be farther away than in the half-sample
case (where the relative distances of the integer-sample
positions are symmetric). The actual filter tap values of the
interpolation filtering kernel were partially derived from DCT
basis function equations.
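Note the contrast with H.264/MPEG-4 AVC, which interpolates in two stages:
six-tap filtering with rounding first produces one or two neighboring half-sample values,
and two values at integer or half-sample positions are then averaged.
HEVC instead uses a single stage with seven-tap or eight-tap filters rather than the six-tap filter of H.264/MPEG-4 AVC.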

In Fig. 7, the positions labeled with upper-case letters,
Ai,j, represent the available luma samples at integer sample
locations, whereas the other positions labeled with lower-case
letters represent samples at noninteger sample locations, which
need to be generated by interpolation.
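In short, upper-case Ai,j marks luma samples available at integer positions in Fig. 7, and lower-case letters mark fractional positions that must be interpolated.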

The samples labeled a0,j, b0,j, c0,j, d0,0, h0,0, and n0,0
are derived from the samples Ai,j by applying the eight-tap
filter for half-sample positions and the seven-tap filter for the
quarter-sample positions as follows:
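
\[
\begin{aligned}
a_{0,j} &= \Big(\sum_{i=-3}^{3} A_{i,j}\,\mathrm{qfilter}[i]\Big) \gg (B-8) \\
b_{0,j} &= \Big(\sum_{i=-3}^{4} A_{i,j}\,\mathrm{hfilter}[i]\Big) \gg (B-8) \\
c_{0,j} &= \Big(\sum_{i=-2}^{4} A_{i,j}\,\mathrm{qfilter}[1-i]\Big) \gg (B-8) \\
d_{0,0} &= \Big(\sum_{j=-3}^{3} A_{0,j}\,\mathrm{qfilter}[j]\Big) \gg (B-8) \\
h_{0,0} &= \Big(\sum_{j=-3}^{4} A_{0,j}\,\mathrm{hfilter}[j]\Big) \gg (B-8) \\
n_{0,0} &= \Big(\sum_{j=-2}^{4} A_{0,j}\,\mathrm{qfilter}[1-j]\Big) \gg (B-8)
\end{aligned}
\]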

where the constant B ≥ 8 is the bit depth of the reference
samples (and typically B = 8 for most applications) and the
filter coefficient values are given in Table II. In these formulas,
>> denotes an arithmetic right shift operation.
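
The horizontal formulas for a, b, and c can be sketched in a few lines. This is an illustration under stated assumptions, not the normative process: the coefficient values are those of the HEVC specification (the body of Table II is not reproduced in this text), the function name `interp_row` is made up here, and the row is assumed to have enough samples on both sides of `x`.

```python
# Luma filter coefficients, index i from -3 to 4 (assumed from the HEVC spec).
HFILTER = {-3: -1, -2: 4, -1: -11, 0: 40, 1: 40, 2: -11, 3: 4, 4: -1}
QFILTER = {-3: -1, -2: 4, -1: -10, 0: 58, 1: 17, 2: -5, 3: 1, 4: 0}

def interp_row(A, x, bit_depth=8):
    """Return (a, b, c): the 1/4, 1/2, and 3/4 positions between A[x] and A[x + 1]."""
    shift = bit_depth - 8
    a = sum(A[x + i] * QFILTER[i] for i in range(-3, 4)) >> shift
    b = sum(A[x + i] * HFILTER[i] for i in range(-3, 5)) >> shift
    c = sum(A[x + i] * QFILTER[1 - i] for i in range(-2, 5)) >> shift
    return a, b, c
```

Each filter sums to 64, so the results are sample values scaled by 64 (for B = 8); a flat row of value 100 yields 6400 at all three positions. The final normalization happens later, when the prediction is rounded and shifted back to the original bit depth.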

TABLE II. Filter Coefficients for Luma Fractional Sample Interpolation

The samples labeled e0,0, f0,0, g0,0, i0,0, j0,0, k0,0, p0,0, q0,0,
and r0,0 can be derived by applying the corresponding filters
to samples located at vertically adjacent a0,j, b0,j and c0,j
positions as follows:
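
\[
\begin{aligned}
e_{0,0} &= \Big(\sum_{v=-3}^{3} a_{0,v}\,\mathrm{qfilter}[v]\Big) \gg 6 \\
f_{0,0} &= \Big(\sum_{v=-3}^{3} b_{0,v}\,\mathrm{qfilter}[v]\Big) \gg 6 \\
g_{0,0} &= \Big(\sum_{v=-3}^{3} c_{0,v}\,\mathrm{qfilter}[v]\Big) \gg 6 \\
i_{0,0} &= \Big(\sum_{v=-3}^{4} a_{0,v}\,\mathrm{hfilter}[v]\Big) \gg 6 \\
j_{0,0} &= \Big(\sum_{v=-3}^{4} b_{0,v}\,\mathrm{hfilter}[v]\Big) \gg 6 \\
k_{0,0} &= \Big(\sum_{v=-3}^{4} c_{0,v}\,\mathrm{hfilter}[v]\Big) \gg 6 \\
p_{0,0} &= \Big(\sum_{v=-2}^{4} a_{0,v}\,\mathrm{qfilter}[1-v]\Big) \gg 6 \\
q_{0,0} &= \Big(\sum_{v=-2}^{4} b_{0,v}\,\mathrm{qfilter}[1-v]\Big) \gg 6 \\
r_{0,0} &= \Big(\sum_{v=-2}^{4} c_{0,v}\,\mathrm{qfilter}[1-v]\Big) \gg 6
\end{aligned}
\]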

The interpolation filtering is separable when B is equal to
8, so the same values could be computed in this case by
applying the vertical filtering before the horizontal filtering.
When implemented appropriately, the motion compensation
process of HEVC can be performed using only 16-b storage
elements (although care must be taken to do this correctly).
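
The separability claim for B = 8 can be checked numerically. The sketch below assumes 8-bit samples stored as `A[y][x]`, uses the luma coefficients of the HEVC specification, and introduces the hypothetical helpers `taps` and `interp_2d`. Because the first stage applies no shift when B = 8, both filtering orders compute the same integer double sum before the single shift by 6.

```python
HFILTER = {-3: -1, -2: 4, -1: -11, 0: 40, 1: 40, 2: -11, 3: 4, 4: -1}
QFILTER = {-3: -1, -2: 4, -1: -10, 0: 58, 1: 17, 2: -5, 3: 1, 4: 0}

def taps(frac):
    """(offset, coefficient) pairs for a fraction given in quarters (1, 2, or 3)."""
    if frac == 1:
        return [(i, QFILTER[i]) for i in range(-3, 4)]
    if frac == 2:
        return [(i, HFILTER[i]) for i in range(-3, 5)]
    return [(i, QFILTER[1 - i]) for i in range(-2, 5)]

def interp_2d(A, x, y, fx, fy, horizontal_first=True):
    """Interpolate at (x + fx/4, y + fy/4) with fx, fy in 1..3, for B = 8."""
    if horizontal_first:
        # Stage 1: horizontal filtering of each needed row (shift B - 8 = 0).
        rows = {v: sum(A[y + v][x + i] * c for i, c in taps(fx)) for v, _ in taps(fy)}
        return sum(rows[v] * c for v, c in taps(fy)) >> 6
    # Alternative order: vertical filtering of each needed column first.
    cols = {i: sum(A[y + v][x + i] * c for v, c in taps(fy)) for i, _ in taps(fx)}
    return sum(cols[i] * c for i, c in taps(fx)) >> 6
```

For higher bit depths the first stage shifts by B - 8, which discards low-order bits, so the two orders are no longer guaranteed to agree.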

It is at this point in the process that weighted prediction
is applied when selected by the encoder. Whereas
H.264/MPEG-4 AVC supported both temporally implicit and
explicit weighted prediction, in HEVC only explicit weighted
prediction is applied, by scaling and offsetting the prediction
with values sent explicitly by the encoder. The bit depth of
the prediction is then adjusted to the original bit depth of
the reference samples. In the case of uniprediction, the interpolated
(and possibly weighted) prediction value is rounded,
right-shifted, and clipped to have the original bit depth. In the
case of biprediction, the interpolated (and possibly weighted)
prediction values from two PBs are added first, and then
rounded, right-shifted, and clipped.
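
The final rounding step can be sketched as follows for the case without weighted prediction. This assumes the interpolated values are held at 14-bit intermediate precision, as in the HEVC specification (for B = 8 that is the sample value scaled by 64); the function names are hypothetical and the sketch only covers bit depths below 14.

```python
def clip(value, bit_depth):
    """Clamp to the valid sample range [0, 2^bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, value))

def uni_pred(p, bit_depth=8):
    """Round, right-shift, and clip one intermediate prediction value."""
    shift = 14 - bit_depth              # assumed 14-bit intermediate precision
    return clip((p + (1 << (shift - 1))) >> shift, bit_depth)

def bi_pred(p0, p1, bit_depth=8):
    """Add two intermediate values first, then round, right-shift, and clip once."""
    shift = 15 - bit_depth              # one extra bit for the sum of two values
    return clip((p0 + p1 + (1 << (shift - 1))) >> shift, bit_depth)
```

For example, `uni_pred(6400)` returns 100, and `bi_pred(6400, 6464)` returns 101: the two inputs correspond to 100 and 101, and their average of 100.5 is rounded only once, after the addition.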
That is, H.264/MPEG-4 AVC supports both implicit and explicit weighted prediction, whereas HEVC applies only the explicit form.

In H.264/MPEG-4 AVC, up to three stages of rounding
operations are required to obtain each prediction sample (for
samples located at quarter-sample positions). If biprediction is
used, the total number of rounding operations is then seven
in the worst case. In HEVC, at most two rounding operations
are needed to obtain each sample located at the quarter-sample
positions, thus five rounding operations are sufficient in the
worst case when biprediction is used. Moreover, in the most
common usage, where the bit depth B is 8 b, the total number
of rounding operations in the worst case is further reduced
to 3. Due to the lower number of rounding operations, the
accumulated rounding error is decreased and greater flexibility
is enabled in regard to the manner of performing the necessary
operations in the decoder.
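In short, H.264/MPEG-4 AVC needs up to three rounding stages per prediction sample at a quarter-sample position (seven with biprediction), whereas HEVC needs at most two (five with biprediction, or three when B = 8).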

The fractional sample interpolation process for the chroma
components is similar to the one for the luma component,
except that the number of filter taps is 4 and the fractional
accuracy is 1/8 for the usual 4:2:0 chroma format case. HEVC
defines a set of four-tap filters for eighth-sample positions, as
given in Table III for the case of 4:2:0 chroma format (where,
in H.264/MPEG-4 AVC, only two-tap bilinear filtering was applied).

TABLE III. Filter Coefficients for Chroma Fractional Sample Interpolation

Filter coefficient values denoted as filter1[i], filter2[i], filter3[i],
and filter4[i] with i = −1, ..., 2 are used for interpolating
the 1/8th, 2/8th, 3/8th, and 4/8th fractional positions
for the chroma samples, respectively. Using symmetry for the
5/8th, 6/8th, and 7/8th fractional positions, the mirrored values
of filter3[1 − i], filter2[1 − i], and filter1[1 − i] with i = −1, ..., 2
are used, respectively.
That is, filter1 to filter4 cover the 1/8th to 4/8th positions directly,
and the 5/8th, 6/8th, and 7/8th positions reuse filter3, filter2, and filter1
with the tap order reversed.
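
The mirroring rule can be written as a small lookup. The coefficient values below are those of the HEVC specification for 4:2:0 (the body of Table III is not reproduced in this text), and the names `CHROMA_FILTERS` and `chroma_taps` are made up for this sketch.

```python
# Four-tap chroma filters for tap positions i = -1, 0, 1, 2 (assumed from the HEVC spec).
CHROMA_FILTERS = {
    1: (-2, 58, 10, -2),   # filter1, 1/8 position
    2: (-4, 54, 16, -2),   # filter2, 2/8 position
    3: (-6, 46, 28, -4),   # filter3, 3/8 position
    4: (-4, 36, 36, -4),   # filter4, 4/8 position
}

def chroma_taps(frac):
    """Return the four taps for the fractional position frac/8, frac in 1..7."""
    if frac <= 4:
        return CHROMA_FILTERS[frac]
    # Positions 5/8, 6/8, 7/8 use filter3, filter2, filter1 indexed by 1 - i,
    # which is the same tuple read in reverse order.
    return CHROMA_FILTERS[8 - frac][::-1]
```

Reversing the tuple is equivalent to indexing with 1 − i, since i = −1, 0, 1, 2 maps to 2, 1, 0, −1.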

3) Merge Mode:

Motion information typically consists of
the horizontal and vertical motion vector displacement values,
one or two reference picture indices, and, in the case of prediction
regions in B slices, an identification of which reference
picture list is associated with each index. HEVC includes a
merge mode to derive the motion information from spatially
or temporally neighboring blocks. It is denoted as merge mode
since it forms a merged region sharing all motion information.

The merge mode is conceptually similar to the direct and
skip modes in H.264/MPEG-4 AVC. However, there are two
important differences. First, it transmits index information to
select one out of several available candidates, in a manner
sometimes referred to as a motion vector competition scheme.
It also explicitly identifies the reference picture list and reference
picture index, whereas the direct mode assumes that
these have some predefined values.
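In short, the merge mode resembles the direct and skip modes of H.264/MPEG-4 AVC, but it signals a candidate index and explicitly identifies the reference picture list and index.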

Fig. 8. Positions of spatial candidates of motion information.

The set of possible candidates in the merge mode consists
of spatial neighbor candidates, a temporal candidate, and
generated candidates. Fig. 8 shows the positions of five spatial
candidates. For each candidate position, the availability is
checked according to the order {a1, b1, b0, a0, b2}. If the
block located at the position is intrapicture predicted or the
position is outside of the current slice or tile, it is considered
as unavailable.
That is, availability is checked for each candidate position in the order {a1, b1, b0, a0, b2}.

After validating the spatial candidates, two kinds of redundancy
are removed. If the candidate position for the current
PU would refer to the first PU within the same CU, the
position is excluded, as the same merge could be achieved by
a CU without splitting into prediction partitions. Furthermore,
any redundant entries where candidates have exactly the same
motion information are also excluded.
That is, a candidate position that falls in the first PU of the same CU is excluded for the current PU.
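
The availability check and the two redundancy rules can be sketched as a single pass over the candidate positions. This follows the prose description only and is not the normative derivation. The data layout is an assumption: `neighbors` maps a position name to a motion-information tuple such as `(mv_x, mv_y, ref_idx)`, or to `None` when the block is intrapicture predicted or lies outside the current slice or tile, and `excluded` names positions that fall in the first PU of the same CU.

```python
SPATIAL_ORDER = ["a1", "b1", "b0", "a0", "b2"]

def spatial_merge_candidates(neighbors, excluded=()):
    """Collect available, non-redundant spatial merge candidates in checking order."""
    result = []
    for pos in SPATIAL_ORDER:
        motion = neighbors.get(pos)
        if motion is None or pos in excluded:
            continue                      # unavailable, or same-CU first PU
        if motion in result:
            continue                      # identical motion information already listed
        result.append(motion)
    return result
```

With `a1` and `b1` carrying identical motion and `b0` unavailable, the result holds the motion of `a1`, `a0`, and `b2`, in that order.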

For the temporal candidate, the right bottom position just
outside of the collocated PU of the reference picture is used if
it is available. Otherwise, the center position is used instead.
The way to choose the collocated PU is similar to that of prior
standards, but HEVC allows more flexibility by transmitting
an index to specify which reference picture list is used for the
collocated reference picture.

One issue related to the use of the temporal candidate is
the amount of the memory to store the motion information
of the reference picture. This is addressed by restricting the
granularity for storing the temporal motion candidates to only
the resolution of a 16×16 luma grid, even when smaller
PB structures are used at the corresponding location in the
reference picture. In addition, a PPS-level flag allows the
encoder to disable the use of the temporal candidate, which is
useful for applications with error-prone transmission.
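
The effect of the 16×16 storage granularity can be illustrated with a small calculation. The sketch assumes that the motion stored for a 16×16 region is looked up at the region's top-left luma position; both function names are hypothetical.

```python
def stored_motion_position(x, y, grid=16):
    """Map a luma position to the grid position whose motion data is kept."""
    return (x // grid) * grid, (y // grid) * grid

def stored_entries(width, height, grid=16):
    """Number of motion entries kept per reference picture at the given granularity."""
    cols = (width + grid - 1) // grid
    rows = (height + grid - 1) // grid
    return cols * rows
```

For a 1920×1080 picture, `stored_entries(1920, 1080)` is 8160, whereas storing motion on a 4×4 grid would need 129600 entries, sixteen times as many per full 16×16 region.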

The maximum number of merge candidates C is specified
in the slice header. If the number of merge candidates found
(including the temporal candidate) is larger than C, only the
first C – 1 spatial candidates and the temporal candidate
are retained. Otherwise, if the number of merge candidates
identified is less than C, additional candidates are generated
until the number is equal to C. This simplifies the parsing and
makes it more robust, as the ability to parse the coded data is
not dependent on merge candidate availability.
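That is, if more than C merge candidates are found, only the first C – 1 spatial candidates and the temporal candidate are kept;
if fewer than C are found, additional candidates are generated until the list holds C entries.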
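
The truncation rule reads directly as code. This is a sketch of the paragraph as written, with hypothetical names; candidates are opaque values here, `temporal` is `None` when no temporal candidate exists, and padding up to C is left to a later step.

```python
def limit_merge_list(spatial, temporal, max_candidates):
    """Keep at most max_candidates entries, reserving the last slot for the temporal one."""
    found = spatial + ([temporal] if temporal is not None else [])
    if len(found) > max_candidates and temporal is not None:
        return spatial[:max_candidates - 1] + [temporal]
    return found[:max_candidates]
```

Because the list length is fixed at C regardless of which neighbors turn out to be available, the merge index can be parsed without first deriving the candidates.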

For B slices, additional merge candidates are generated by
choosing two existing candidates according to a predefined
order for reference picture list 0 and list 1. For example, the
first generated candidate uses the first merge candidate for list
0 and the second merge candidate for list 1. HEVC specifies
a total of 12 predefined pairs of two in the following order in
the already constructed merge candidate list as (0, 1), (1, 0),
(0, 2), (2, 0), (1, 2), (2, 1), (0, 3), (3, 0), (1, 3), (3, 1), (2, 3),
and (3, 2). Among them, up to five candidates can be included
after removing redundant entries.
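
A minimal sketch of how the predefined pairs could generate combined candidates in a B slice. The data layout is an assumption: each candidate is a dict with optional `"l0"` and `"l1"` entries holding `(mv_x, mv_y, ref_idx)`. The redundancy test used here (skip a combination that already appears in the list) is a simplification and may differ from the normative check.

```python
PAIR_ORDER = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1),
              (0, 3), (3, 0), (1, 3), (3, 1), (2, 3), (3, 2)]

def add_combined_candidates(candidates, max_candidates):
    """Append combined bi-predictive candidates until the list holds max_candidates."""
    out = list(candidates)
    n = len(candidates)
    for i0, i1 in PAIR_ORDER:
        if len(out) >= max_candidates:
            break
        if i0 >= n or i1 >= n:
            continue                       # pair refers to a candidate that does not exist
        l0 = candidates[i0].get("l0")      # list 0 motion from the first of the pair
        l1 = candidates[i1].get("l1")      # list 1 motion from the second of the pair
        if l0 is None or l1 is None:
            continue
        combined = {"l0": l0, "l1": l1}
        if combined not in out:
            out.append(combined)
    return out
```

With one list 0 candidate and one list 1 candidate, the pair (0, 1) produces a single bi-predictive candidate, and the pair (1, 0) is skipped because the required motion is missing.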

When the slice is a P slice or the number of merge
candidates is still less than C, zero motion vectors associated
with reference indices from zero to the number of reference
pictures minus one are used to fill any remaining entries in
the merge candidate list.
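
The zero-vector padding can be sketched as below. The candidate layout and function name are hypothetical, and the choice to fall back to reference index 0 once all reference indices have been used is an assumption that the text does not spell out.

```python
def fill_with_zero_candidates(candidates, max_candidates, num_ref_pictures):
    """Pad the merge list with zero-MV candidates, one reference index at a time."""
    out = list(candidates)
    zero_idx = 0
    while len(out) < max_candidates:
        # Assumption: reuse reference index 0 after the last index has been used.
        ref_idx = zero_idx if zero_idx < num_ref_pictures else 0
        out.append({"mv": (0, 0), "ref_idx": ref_idx})
        zero_idx += 1
    return out
```

With two reference pictures and three empty slots, the padded entries carry reference indices 0, 1, and 0.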

In HEVC, the skip mode is treated as a special case of the
merge mode when all coded block flags are equal to zero.
In this specific case, only a skip flag and the corresponding
merge index are transmitted to the decoder. The B-direct mode
of H.264/MPEG-4 AVC is also replaced by the merge mode,
since the merge mode allows all motion information to be
derived from the spatial and temporal motion information of
the neighboring blocks with residual coding.
That is, the merge mode also takes over the role of the B-direct mode of H.264/MPEG-4 AVC.

4) Motion Vector Prediction for Nonmerge Mode:

When an interpicture-predicted CB is not coded in the skip or
merge modes, the motion vector is differentially coded using
a motion vector predictor. Similar to the merge mode, HEVC
allows the encoder to choose the motion vector predictor
among multiple predictor candidates. The difference between
the predictor and the actual motion vector and the index of
the candidate are transmitted to the decoder.

Only two spatial motion candidates are chosen according
to the availability among five candidates in Fig. 8. The first
spatial motion candidate is chosen from the set of left positions
{a0, a1} and the second one from the set of above positions
{b0, b1, b2} according to their availabilities, while keeping the
searching order as indicated in the two sets.
That is, the first spatial motion candidate comes from the left set {a0, a1},
and the second comes from the above set {b0, b1, b2}.
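
The selection of the two spatial candidates amounts to taking the first available entry of each set. In this sketch, `neighbors` is an assumed mapping from position name to a motion vector tuple, or to `None` when the position is unavailable; the helper names are hypothetical.

```python
def first_available(neighbors, order):
    """Return the first available motion vector in the given search order, else None."""
    for pos in order:
        if neighbors.get(pos) is not None:
            return neighbors[pos]
    return None

def amvp_spatial_candidates(neighbors):
    """At most one candidate from the left set and one from the above set."""
    left = first_available(neighbors, ["a0", "a1"])
    above = first_available(neighbors, ["b0", "b1", "b2"])
    return [mv for mv in (left, above) if mv is not None]
```

If `a0` is unavailable, the left candidate falls back to `a1`; if every above position is unavailable, only the left candidate is returned.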

HEVC only allows a much lower number of candidates to be
used in the motion vector prediction process for the nonmerge
case, since the encoder can send a coded difference to change
the motion vector. Furthermore, the encoder needs to perform
motion estimation, which is one of the most computationally
expensive operations in the encoder, and complexity is reduced
by allowing a small number of candidates.
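In short, motion estimation is already computationally expensive for the encoder, so keeping the candidate count small reduces complexity.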

When the reference index of the neighboring PU is not
equal to that of the current PU, a scaled version of the
motion vector is used. The neighboring motion vector is
scaled according to the temporal distances between the current
picture and the reference pictures indicated by the reference
indices of the neighboring PU and the current PU,
respectively. When two spatial candidates have the same
motion vector components, one redundant spatial candidate is
excluded.
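
The scaling by temporal distance can be sketched with plain proportional arithmetic. This is a simplification: the normative derivation uses fixed-point arithmetic with clipping, which is not reproduced here. Measuring temporal distance as a difference of picture order counts (the `poc_*` parameters) is an assumption of this sketch, as is the function name.

```python
def scale_mv(mv, poc_cur, poc_ref_cur, poc_ref_nbr):
    """Scale a neighboring MV to the temporal distance of the current PU's reference."""
    tb = poc_cur - poc_ref_cur    # distance to the current PU's reference picture
    td = poc_cur - poc_ref_nbr    # distance to the neighboring PU's reference picture
    if td == 0 or td == tb:
        return mv                 # same distance (or degenerate case): no scaling
    return tuple(round(c * tb / td) for c in mv)
```

For example, a neighbor vector (8, -4) that points two pictures back becomes (16, -8) when the current PU refers to a picture four pictures back.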

When the number of motion vector predictors is not equal to
two and the use of temporal MV prediction is not explicitly
disabled, the temporal MV prediction candidate is included.
This means that the temporal candidate is not used at all when
two spatial candidates are available. Finally, a zero motion
vector is included repeatedly until the number of motion vector
prediction candidates is equal to two, which guarantees that the
number of motion vector predictors is two. Thus, only a coded
flag is necessary to identify which motion vector prediction is
used in the case of nonmerge mode.
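
The list construction for the nonmerge case, together with the differential coding of the motion vector, can be sketched as follows. All names are hypothetical, and the encoder's choice of the predictor with the smallest absolute difference is just one plausible heuristic, not something the text prescribes.

```python
def build_amvp_list(spatial, temporal=None, temporal_enabled=True):
    """Return exactly two motion vector predictors."""
    out = []
    for mv in spatial:
        if mv not in out:                 # drop a redundant spatial candidate
            out.append(mv)
    if len(out) < 2 and temporal_enabled and temporal is not None:
        out.append(temporal)              # temporal candidate only if a slot is free
    while len(out) < 2:
        out.append((0, 0))                # pad with zero motion vectors
    return out[:2]

def encode_mv(mv, predictors):
    """Pick a predictor (assumed heuristic) and return (flag, motion vector difference)."""
    costs = [abs(mv[0] - p[0]) + abs(mv[1] - p[1]) for p in predictors]
    idx = costs.index(min(costs))
    return idx, (mv[0] - predictors[idx][0], mv[1] - predictors[idx][1])

def decode_mv(idx, mvd, predictors):
    """Reconstruct the motion vector from the flag and the difference."""
    return predictors[idx][0] + mvd[0], predictors[idx][1] + mvd[1]
```

With two distinct spatial candidates the temporal candidate is never consulted, and since the list always has two entries, the predictor index fits in a single flag.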



