Technical Overview of AV1 Spec
====================================

Abstract
=========

AV1 (AOMedia Video Codec 1.0) evolved from VP9 (Google), Thor (Cisco) and Daala
(Mozilla) under the AOM (Alliance for Open Media). It includes a number of
enhancements and new tools added to improve coding efficiency. The new tools
added so far cover 4 main areas: prediction, transform, in-loop filtering and
entropy coding. This document provides a snapshot of the coding tools in the
finalized version (as of March 2018) of the AV1 spec.

Introduction
============

According to the AOM web page, AV1 is designed with the following features:

- Royalty free

- Scales to any modern device at any bandwidth

- For use in both commercial and non-commercial content, including
user-generated content

- Developed for the internet and related applications and services, from
browsers and streaming to videoconferencing services

- Designed with a low computational footprint and optimized for hardware

- Bringing features like 4k UHD, HDR, and WCG to real-time video

Profile & Levels
================

Profiles and levels specify restrictions on the capabilities needed to decode
the bitstreams. The profile specifies the bit depths and subsampling formats
supported, while the level defines resolution and performance characteristics.
At the time of writing, levels are still under discussion and no further
details are available.

AV1 supports the three named profiles listed in the table below.

| Profile | Bit depth | Monochrome support | Chroma subsampling | Name |
|---------|-----------|--------------------|---------------------|--------------|
| 0 | 8/10 | Yes | 4:2:0 | Main |
| 1 | 8/10 | No | 4:4:4 | High |
| 2 | 8/10 | Yes | 4:2:2 | Professional |
| 2 | 12 | Yes | 4:2:0, 4:2:2, 4:4:4 | Professional |

Table 1. AV1 Profile
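
As a quick illustration of Table 1, here is a minimal sketch (a hypothetical helper, not part of any AV1 API) that maps bit depth and chroma subsampling to the lowest profile able to carry them; monochrome handling is omitted for brevity:

```python
def choose_profile(bit_depth: int, subsampling: str) -> int:
    """Return the lowest AV1 profile (per Table 1) supporting the format."""
    if bit_depth == 12:
        return 2                      # Professional covers all 12-bit formats
    if bit_depth not in (8, 10):
        raise ValueError("AV1 supports only 8-, 10- or 12-bit depths")
    if subsampling == "4:2:0":
        return 0                      # Main
    if subsampling == "4:4:4":
        return 1                      # High
    if subsampling == "4:2:2":
        return 2                      # Professional
    raise ValueError(f"unsupported subsampling: {subsampling}")

print(choose_profile(10, "4:4:4"))    # -> 1 (High)
print(choose_profile(12, "4:2:0"))    # -> 2 (Professional)
```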

Block Structure
===============

Basic Coding block
------------------

AV1 supports a larger superblock size: superblocks of up to 128x128 are
allowed, and coding blocks range from 128x128 down to 4x4. Each 4x4 luma block
is allowed to independently select inter or intra mode, its reference mode, and
its interpolation filter type. For chroma, a 2x2 block size is allowed, but a
4x4 transform block size is still used.

Basic Prediction Block
----------------------

AV1 supports up to 10 partition types. The partition unit size is allowed down
to 4x4, and there are 24 block sizes in total.

| Partition index | Type of partition |
|-----------------|-------------------|
| 0 | PARTITION_NONE |
| 1 | PARTITION_HORZ |
| 2 | PARTITION_VERT |
| 3 | PARTITION_SPLIT |
| 4 | PARTITION_HORZ_A |
| 5 | PARTITION_HORZ_B |
| 6 | PARTITION_VERT_A |
| 7 | PARTITION_VERT_B |
| 8 | PARTITION_HORZ_4 |
| 9 | PARTITION_VERT_4 |

Table 2. Type of Block partition

| Index | Partition Block size | Index | Partition Block size |
|-------|----------------------|-------|----------------------|
| 0 | BLOCK_4X4 | 12 | BLOCK_64X64 |
| 1 | BLOCK_4X8 | 13 | BLOCK_64X128 |
| 2 | BLOCK_8X4 | 14 | BLOCK_128X64 |
| 3 | BLOCK_8X8 | 15 | BLOCK_128X128 |
| 4 | BLOCK_8X16 | 16 | BLOCK_4X16 |
| 5 | BLOCK_16X8 | 17 | BLOCK_16X4 |
| 6 | BLOCK_16X16 | 18 | BLOCK_8X32 |
| 7 | BLOCK_16X32 | 19 | BLOCK_32X8 |
| 8 | BLOCK_32X16 | 20 | BLOCK_16X64 |
| 9 | BLOCK_32X32 | 21 | BLOCK_64X16 |
| 10 | BLOCK_32X64 | 22 | BLOCK_32X128 |
| 11 | BLOCK_64X32 | 23 | BLOCK_128X32 |

Table 3. Size of Block Partition

Basic Transform Block
---------------------

Both square and rectangular transform block sizes are supported in AV1. There
are 19 transform block sizes in total.

| Index | TxSize | Index | TxSize |
|-------|----------|-------|----------|
| 0 | TX_4X4 | 10 | TX_32X16 |
| 1 | TX_8X8 | 11 | TX_32X64 |
| 2 | TX_16X16 | 12 | TX_64X32 |
| 3 | TX_32X32 | 13 | TX_4X16 |
| 4 | TX_64X64 | 14 | TX_16X4 |
| 5 | TX_4X8 | 15 | TX_8X32 |
| 6 | TX_8X4 | 16 | TX_32X8 |
| 7 | TX_8X16 | 17 | TX_16X64 |
| 8 | TX_16X8 | 18 | TX_64X16 |
| 9 | TX_16X32 | | |

Table 4. Size of Transform Block

Intra Prediction
================

Intra prediction in AV1 is expanded significantly compared to VP9. Here is a
snapshot of the intra modes.

| Index | Intra mode | AV1 | VP9 | Comments |
|-------|---------------------|-----|-----|-----------------------------------------------|
| 0 | DC_PRED | X | X | |
| 1 | V_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 2 | H_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 3 | D45_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 4 | D135_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 5 | D113_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 6 | D157_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 7 | D203_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 8 | D67_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 9 | SMOOTH_PRED | X | | |
| 10 | SMOOTH_V_PRED | X | | |
| 11 | SMOOTH_H_PRED | X | | |
| 12 | TM_PRED(PAETH_PRED) | X | X | AV1 replaces TM_PRED with PAETH_PRED |
| 13 | Palette Mode | X | | |

Table 5. Summary of Intra Mode between AV1 and VP9

Directional Intra Prediction Mode
---------------------------------

VP9 only supports 8 directional intra prediction modes: D45_PRED, D63_PRED,
V_PRED, D117_PRED, D135_PRED, D153_PRED, H_PRED and D207_PRED. These modes
correspond to prediction angles of 45, 63, 90, 117, 135, 153, 180, and 207
degrees, respectively.

To improve intra coding efficiency, more prediction angle options are added to
AV1. The prediction angle is calculated as the following:

Prediction angle = nominal_angle + (angle_delta \* angle_step),

| nominal_angle | angle_step | angle_delta | Total number of angles |
|-------------------------------------|------------|-------------|------------------------|
| 45, 63, 90, 117, 135, 153, 180, 207 | 3 | [-3, +3] | 8\*7=56 |

Table 6. Finer granularity of directional intra angles

- nominal_angle is determined by the prediction mode and is the same as in VP9.

- angle_delta lies in a predefined range and angle_step is a predefined value.
In the current configuration, angle_delta is in the range [-3, +3] and
angle_step is 3. These settings were selected experimentally.

- The total number of supported prediction angles is therefore increased from
8 to 8 \* 7 = 56, as the sketch below illustrates.
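
A tiny sketch that enumerates the 56 prediction angles implied by Table 6 (names and structure are illustrative only):

```python
NOMINAL_ANGLES = [45, 63, 90, 117, 135, 153, 180, 207]  # one per directional mode
ANGLE_STEP = 3
ANGLE_DELTA_RANGE = range(-3, 4)                         # angle_delta in [-3, +3]

def prediction_angles():
    """Enumerate every directional prediction angle supported by AV1."""
    return [nominal + delta * ANGLE_STEP
            for nominal in NOMINAL_ANGLES
            for delta in ANGLE_DELTA_RANGE]

angles = prediction_angles()
print(len(angles))   # 56 = 8 nominal angles * 7 deltas
```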

Smooth Mode
-----------

These are non-directional intra prediction modes. VP9 has 2 non-directional
intra prediction modes: DC_PRED and TM_PRED. AV1 expands on this by adding 3
new smooth prediction modes: SMOOTH_PRED, SMOOTH_V_PRED and SMOOTH_H_PRED. The
new modes work as follows:

|Mode|Comments|
|-|-|
| SMOOTH_PRED | Useful for predicting blocks with a smooth gradient. It works as follows: the pixels in the rightmost column are estimated with the value of the last pixel in the top row, and the pixels in the last row of the current block are estimated using the last pixel in the left column. The remaining pixels are then calculated as an average of quadratic interpolations in the vertical and horizontal directions, weighted by the distance of each pixel from the predicted pixels. |
| SMOOTH_V_PRED | Similar to SMOOTH_PRED, but uses quadratic interpolation only in the vertical direction |
| SMOOTH_H_PRED | Similar to SMOOTH_PRED, but uses quadratic interpolation only in the horizontal direction |

Table 7. Smooth mode of Intra mode
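
Below is a simplified sketch of SMOOTH_PRED following the description above; it uses plain linear distance weights instead of the spec's fixed-point weight tables, so it is illustrative rather than bit-exact:

```python
import numpy as np

def smooth_pred(top: np.ndarray, left: np.ndarray) -> np.ndarray:
    """Simplified SMOOTH_PRED: blend a vertical and a horizontal interpolation.

    The vertical part interpolates each column between the top row and the
    bottom-left pixel; the horizontal part interpolates each row between the
    left column and the top-right pixel.
    """
    h, w = len(left), len(top)
    below = left[-1]            # estimate for the (virtual) bottom row
    right = top[-1]             # estimate for the (virtual) right column

    wy = (np.arange(1, h + 1) / (h + 1))[:, None]   # distance-based weights, 0..1
    wx = (np.arange(1, w + 1) / (w + 1))[None, :]

    vert = (1 - wy) * top[None, :] + wy * below
    horz = (1 - wx) * left[:, None] + wx * right
    return (vert + horz) / 2

top = np.array([100, 110, 120, 130], dtype=float)
left = np.array([100, 90, 80, 70], dtype=float)
print(smooth_pred(top, left).round().astype(int))
```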

Paeth Mode
----------

This is a non-directional intra prediction mode. The new prediction mode
PAETH_PRED replaces the existing mode TM_PRED.

TM_PRED: Predictor(TM) = left + top - top_left

PAETH_PRED: Predictor(PAETH) = argmin \|x - Predictor(TM)\|, where x is one of
left, top and top_left

The idea is to pick whichever of left, top and top_left is closest in value to
Predictor(TM).
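
A small sketch of the Paeth selection rule; ties are resolved here in the order left, top, top_left:

```python
def paeth_pred(left: int, top: int, top_left: int) -> int:
    """PAETH_PRED: pick the neighbor closest to the TM predictor left + top - top_left."""
    base = left + top - top_left          # this is Predictor(TM)
    candidates = (left, top, top_left)    # tie-break order: left, top, top_left
    return min(candidates, key=lambda x: abs(x - base))

print(paeth_pred(120, 130, 110))  # base = 140, so the top neighbor (130) is chosen
```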

Palette Mode
------------

Sometimes a given intra block can be approximated by a block with a small
number of unique colors. This is especially true for artificial video such as
screen capture and games. For such cases, AV1 introduces a new intra coding
mode called palette mode. The predictor for a block is signaled by storing (i)
a color palette, with 2 to 8 colors, and (ii) color indices into the palette
for all pixels in the block. The residual pixel values of the block are
transformed and quantized as usual before being entropy coded.

Palette mode can be used in both intra-only and inter frames. The number of
base colors determines the trade-off between fidelity and compactness. The
color index of each pixel is obtained by the nearest-neighbor method, and the
color indices are encoded using neighborhood-based contexts to be as compact as
possible.
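
A toy sketch of the nearest-neighbor index assignment for a single sample plane; the palette values here are arbitrary, and the real encoder additionally signals the palette and entropy-codes the indices with neighborhood-based contexts:

```python
import numpy as np

def palette_indices(block: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Map every pixel of a block to the index of the nearest palette color."""
    diff = np.abs(block[..., None] - palette[None, None, :])  # |pixel - color|
    return diff.argmin(axis=-1)

block = np.array([[10, 12, 200], [11, 198, 201]])
palette = np.array([10, 200])                 # 2 base colors (AV1 allows 2..8)
print(palette_indices(block, palette))
# [[0 0 1]
#  [0 1 1]]
```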

Palette Mode is not new. We can see the Palette Mode and Intra block copy in the
HEVC SCC (Screen Content Coding) extension.

Filter Intra mode
-----------------

AV1 adopts a new mode that interpolates (filters) the reference samples before
prediction, which reduces the impact of quantization noise. The table below
specifies the types of intra filtering.

| Index | Filter intra type |
|-------|-------------------|
| 0 | INTRA_DC_PRED |
| 1 | INTRA_V_PRED |
| 2 | INTRA_H_PRED |
| 3 | INTRA_D153_PRED |
| 4 | INTRA_TM_PRED |

Table 8. Type of Intra Filter Mode

Intra Block Copy Mode
---------------------

This tool is very efficient for coding screen content video, in which repeated
patterns in text and graphics-rich content occur frequently within the same
picture. Having a previously reconstructed block with an equal or similar
pattern as a predictor can effectively reduce the prediction error and
therefore improve coding efficiency.

In AV1, intra block copy is only allowed in intra frames. It disables all loop
filtering, and only integer offsets are allowed in block copy mode.

Predict Chroma from Luma
------------------------

Chroma from luma (CfL) prediction is a new and promising chroma-only intra
predictor that models chroma pixels as a linear function of the coincident
reconstructed luma pixels.
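
A minimal sketch of the CfL idea under the description above, assuming a signalled scaling factor alpha and skipping the luma subsampling/averaging details:

```python
import numpy as np

def cfl_predict(recon_luma: np.ndarray, chroma_dc: float, alpha: float) -> np.ndarray:
    """Chroma-from-luma: model chroma as a linear function of coincident luma.

    The reconstructed luma block is mean-removed (its "AC" contribution),
    scaled by alpha, and added to the usual DC chroma prediction.
    """
    luma_ac = recon_luma - recon_luma.mean()
    return chroma_dc + alpha * luma_ac

luma = np.array([[100.0, 120.0], [140.0, 160.0]])
print(cfl_predict(luma, chroma_dc=128.0, alpha=0.5))
```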

Inter Prediction
================

Affine/Warped Motion Compensation
---------------------------------

Traditional modern codecs, including VP9, use block motion compensation where
motion vectors are translational only. This is not sufficient for real video
which often contains complex motion. For example, motion due to camera shake,
panning and zoom might require transformations that support shearing, scaling,
rotation and changes in aspect ratio. In AV1, we introduce warped motion
compensation implemented as similarity and affine transformations to better
capture the diversity of motion that exists in real video. There are two kinds
of affine/warped motion compensation, as listed in Table 9.

| Affine Motion Compensation | Comments |
|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Global | It is common for videos to contain a global camera motion which is pertinent to an entire inter frame. It is therefore beneficial to transmit a set of motion parameters at the frame level that is applicable to a large number of blocks in the frame. When a frame is encoded, a set of global motion parameters is computed and transmitted between that frame and each reference frame. These parameters may be either translational, similarity or affine motion model. Subsequently, any block in the frame can signal use of the global motion mode with a given reference to create a suitable predictor. |
| Local | Affine motion compensation is also useful to describe complex local object motion. Here, we estimate affine parameters for a single block using the translational motion vectors that are typically conveyed for all inter blocks. Specifically, we estimate an affine or similarity model using the motion vectors from the current block and its causal neighbors which share the same reference frame. |

Table 9. Affine Motion Compensation
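
To illustrate what a warped model adds over a purely translational MV, here is a small sketch that maps a pixel position through a generic affine model; the 2x3 parameterization and the example numbers are illustrative, not the spec's fixed-point encoding:

```python
import numpy as np

def warp_position(x: float, y: float, model: np.ndarray) -> tuple:
    """Map a pixel position through an affine motion model.

    `model` is a 2x3 matrix [[a, b, tx], [c, d, ty]]; a pure translation is
    the special case a = d = 1, b = c = 0, matching a conventional MV.
    """
    xp = model[0, 0] * x + model[0, 1] * y + model[0, 2]
    yp = model[1, 0] * x + model[1, 1] * y + model[1, 2]
    return xp, yp

# Small rotation + zoom around the origin plus a translation of (3, -1).
model = np.array([[1.02, -0.05, 3.0],
                  [0.05,  1.02, -1.0]])
print(warp_position(16, 16, model))
```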

OBMC (Overlapped Block Motion Compensation)
-------------------------------------------

Motions assigned to surrounding blocks will contribute to predicting a current
block, via a well-defined overlapping scheme appropriately designed for advanced
variable block-size partitioning frameworks.

OBMC blends multiple predictors from neighboring blocks. It is not a new
concept; it was proposed and implemented back in the era of H.263. OBMC was
shown to largely reduce prediction errors, but it was not adopted by recent
codecs due to the extra complexity in hybrid inter/intra variable-block-size
coding. In AV1, a practical overlapping mechanism based on two-stage 1-D
filtering is introduced for the advanced partitioning framework to implement
causal overlapped block prediction.
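
A heavily simplified sketch of the two-stage 1-D blending idea: first the prediction formed with the above neighbor's motion is blended into the top rows, then the left neighbor's prediction into the left columns. The linear weights below are invented for illustration; the real codec blends only a limited band of rows/columns with fixed-point mask tables:

```python
import numpy as np

def obmc_blend(cur: np.ndarray, above: np.ndarray, left: np.ndarray) -> np.ndarray:
    """Two-stage 1-D overlapped blending (simplified)."""
    h, w = cur.shape
    out = cur.astype(float).copy()

    wy = np.linspace(0.5, 0.0, h)[:, None]   # strongest near the top edge
    out = (1 - wy) * out + wy * above

    wx = np.linspace(0.5, 0.0, w)[None, :]   # strongest near the left edge
    out = (1 - wx) * out + wx * left
    return out

cur = np.full((4, 4), 100.0)
above = np.full((4, 4), 120.0)   # prediction using the above block's MV
left = np.full((4, 4), 80.0)     # prediction using the left block's MV
print(obmc_blend(cur, above, left).round())
```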

Sub-pixel Interpolation Filter
------------------------------

The motion vector used in modern video codecs is allowed to have a fractional
position for a better prediction quality. So, an interpolation filter module is
needed to generate the prediction block at a fractional position in the
reference frame. VP9 codec uses a separable interpolation filter to perform
inter prediction with ⅛ motion vector precision. Three filter types, SHARP,
REGULAR and SMOOTH, in descending order of cutoff frequencies, are provided to
deal with various types of noise/distortions that can occur in reference
frames/blocks. Given a filter type and a motion vector, the interpolation filter
is performed by two one-dimensional filters, one for horizontal direction and
one for vertical direction.

In the AV1 codec, a dual interpolation filter is introduced on top of the
interpolation module inherited from VP9. The dual filter allows each block or
frame to use different interpolation filter types in the horizontal and
vertical directions, so up to 9 filter-type combinations can be applied to a
block.

This idea is based on the observation that a reference frame/block’s horizontal
and vertical signals may have distinct frequency characteristics; therefore,
using different filter types may produce a better prediction. As before, both
the filter types are transmitted in the bitstream on a per block or per frame
basis.

At the same time, AV1 uses high intermediate precision between the horizontal
and vertical filtering stages, and the same high precision is kept before
averaging the predictors in compound mode.
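
A sketch of the dual-filter idea: two separable 1-D passes, each with its own filter type, with full intermediate precision kept between passes. The 4-tap kernels are illustrative stand-ins for the spec's 8-tap fixed-point filters:

```python
import numpy as np

# Illustrative 4-tap kernels for a half-pel position, one per filter type.
KERNELS = {
    "SHARP":   np.array([-0.125, 0.625, 0.625, -0.125]),
    "REGULAR": np.array([-0.0625, 0.5625, 0.5625, -0.0625]),
    "SMOOTH":  np.array([0.125, 0.375, 0.375, 0.125]),
}

def dual_filter_interp(ref: np.ndarray, h_type: str, v_type: str) -> np.ndarray:
    """Separable sub-pel interpolation with independent horizontal/vertical types."""
    hk, vk = KERNELS[h_type], KERNELS[v_type]
    # Horizontal pass first, kept at full floating-point precision.
    tmp = np.apply_along_axis(lambda r: np.convolve(r, hk, mode="valid"), 1, ref)
    # Vertical pass on the intermediate result.
    return np.apply_along_axis(lambda c: np.convolve(c, vk, mode="valid"), 0, tmp)

ref = np.arange(64, dtype=float).reshape(8, 8)
print(dual_filter_interp(ref, h_type="SHARP", v_type="SMOOTH").shape)  # (5, 5)
```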

Dynamic MV reference
--------------------

VP9 keeps two candidate MVs in the reference list and uses 4 MV modes
(NEARESTMV, NEARMV, NEWMV, and ZEROMV). AV1 supports 4 candidate MVs and more
modes.

For single reference mode, AV1 is the same as VP9.

For compound mode, VP9 restricts motion vectors for a compound predictor to
share one motion vector referencing mode, even though they may use different
reference frames. To add more flexibility, on top of existing four combinations
(NEAREST_NEARESTMV, NEAR_NEARMV, NEW_NEWMV, ZERO_ZEROMV) in VP9, AV1 supports
four more empirically selected combinations: NEAREST_NEWMV, NEW_NEARESTMV,
NEAR_NEWMV, and NEW_NEARMV.

| Index | Type | Ref Mode |
|-------|------------------------------|-----------------|
| 0 | NEARESTMV | single ref mode |
| 1 | NEARMV | single ref mode |
| 2 | GLOBALMV(ZEROMV) | single ref mode |
| 3 | NEWMV | single ref mode |
| 4 | NEAREST_NEARESTMV | compound mode |
| 5 | NEAR_NEARMV | compound mode |
| 6 | NEAREST_NEWMV | compound mode |
| 7 | NEW_NEARESTMV | compound mode |
| 8 | NEAR_NEWMV | compound mode |
| 9 | NEW_NEARMV | compound mode |
| 10 | GLOBAL_GLOBALMV(ZERO_ZEROMV) | compound mode |
| 11 | NEW_NEWMV | compound mode |

Table 10. MV modes

Extended Compound Modes
-----------------------

AV1 compound mode supports predictors coming from the same direction, while
VP9 only supports predictors from different directions (one forward and one
backward reference frame). VP9 only supports (1/2, 1/2) weights to blend the
two predictors, whereas AV1 supports more flexible weighted blending.

| Index | Compound type | Comments |
|-------|-------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| 0 | COMPOUND_WEDGE | Inter-Inter Wedge mode and Inter-Intra Wedge mode |
| 1 | COMPOUND_SEG | Inter-Inter Compound Segment mode |
| 2 | COMPOUND_AVERAGE | (1/2,1/2) weight will be applied to blend the predictors |
| 3 | COMPOUND_INTRA | Inter-Intra Gradual mode |
| 4 | COMPOUND_DISTANCE | This process computes weights to be used for blending predictions together based on the expected output times of the reference frames |

Table 11. Compound type

Here are more details about these compound modes:

- Inter-Inter Compound Segment mode

In many cases, regions in one predictor contain useful content that is not
present in the other; in general, the two inter predictors show a larger pixel
difference in such regions.

- Inter-Inter Wedge mode

Boundaries of moving objects in a video often separate two regions with distinct
motions. Coding these regions with separate motion vector reference combinations
should be beneficial; however, finding exact object boundaries is not only
difficult, but expensive to communicate in the bitstream. Our approach is to
design a codebook of masks with only a few possible partitioning combinations
and signaling the codebook index in the bitstream.

The AV1 wedge codebook contains partition orientations that are either
horizontal, vertical or oblique with slopes: 2, -2, 0.5 and -0.5. The wedge
prediction mode is used for all square and rectangular blocks, using the 16-ary
shape codebooks.

| Index | Wedge direction | Comments |
|-------|------------------|----------|
| 0 | WEDGE_HORIZONTAL | |
| 1 | WEDGE_VERTICAL | |
| 2 | WEDGE_OBLIQUE27 | |
| 3 | WEDGE_OBLIQUE63 | |
| 4 | WEDGE_OBLIQUE117 | |
| 5 | WEDGE_OBLIQUE153 | |

Table 12. Wedge direction

- Inter-Intra Gradual mode

The weight of the intra predictor decays gradually away from the prediction
boundary, and the weight of the inter predictor increases correspondingly. It
supports four modes: horizontal, vertical, DC_PRED, and SMOOTH_PRED.

- Inter-Intra Wedge mode

Blocks cannot always perfectly partition moving objects. For example, when
occlusion occurs in the middle of a block, it is better to apply different
prediction techniques to different contents. Content that is not occluded in
the reference frame will prefer inter prediction, while newly revealed content
benefits more from intra prediction using local references. A toy sketch of
wedge-mask blending is given below.
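
Following up on the wedge modes above, here is a toy sketch of blending two predictors through a soft mask; the mask construction is invented for illustration and does not reproduce AV1's 16-entry wedge codebooks:

```python
import numpy as np

def oblique_wedge_mask(h: int, w: int, slope: float, offset: float) -> np.ndarray:
    """Build a soft 0..1 mask split by the line y = slope * x + offset."""
    y, x = np.mgrid[0:h, 0:w]
    dist = y - (slope * x + offset)
    return np.clip(0.5 + dist / 4.0, 0.0, 1.0)   # soft transition ~4 pixels wide

def wedge_blend(pred0: np.ndarray, pred1: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend two predictors with a wedge mask (mask = 1 selects pred0)."""
    return mask * pred0 + (1.0 - mask) * pred1

p0 = np.full((8, 8), 50.0)     # e.g. background predictor
p1 = np.full((8, 8), 200.0)    # e.g. foreground predictor
m = oblique_wedge_mask(8, 8, slope=0.5, offset=2.0)
print(wedge_blend(p0, p1, m).round())
```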

Extended Reference frame Number
-------------------------------

Up to 7 of the 8 reference frames held in the frame buffer can be used in
inter mode. In AV1, the reference frames are allowed to come from the same side
or from different sides.

LAST3_FRAME, LAST2_FRAME and LAST_FRAME are forward references, with LAST_FRAME
being the nearest past frame. BWDREF_FRAME is a backward reference, similar to
ALTREF_FRAME.

The table below shows the reference frame types.

| Index | Ref frame Name |
|-------|----------------|
| 0 | INTRA_FRAME |
| 1 | LAST_FRAME |
| 2 | LAST2_FRAME |
| 3 | LAST3_FRAME |
| 4 | GOLDEN_FRAME |
| 5 | BWDREF_FRAME |
| 6 | ALTREF2_FRAME |
| 7 | ALTREF_FRAME |

Table 13. Reference frame type

In-loop Filter
==============

AV1 employs several in-loop filtering tools: de-blocking, CDEF and loop
restoration are applied in cascade.

De-blocking filter
------------------

AV1 supports 4 filter levels per frame, while VP9 only has one. Two levels are
for the luma component (horizontal and vertical), and the other two are for the
U and V components separately. In AV1, the filter level is also allowed to
change from superblock to superblock.

CDEF (Constrained Directional Enhancement Filter)
-------------------------------------------------

CDEF is the combination of the CLPF (Constrained Low Pass Filter) and a
deringing filter. The main goal of the in-loop CDEF is to filter out coding
artifacts and ringing while preserving image detail. It takes into account the
direction of edges and patterns in the image. It plays a role similar to SAO in
HEVC.

The CDEF is based on the following observation. The amount of ringing artifacts
in a coded image tends to be roughly proportional to the quantization step size.
The amount of detail is a property of the input image, but the smallest detail
actually retained in the quantized image tends to also be proportional to the
quantization step size. For a given quantization step size, the amplitude of the
ringing is generally less than the amplitude of the details.

CDEF works as the following steps:

- The frame is divided into filter blocks of 64x64 pixels. Some CDEF
parameters are signaled at the frame level, and some may be signaled at the
filter block level.

- Identify the direction of the edge or pattern in each filter block.

- Adaptively filter along the identified direction, and to a lesser degree
along directions rotated 45 degrees from it. The filter strengths are signaled
explicitly, which allows a high degree of control over the blurring.

The main reason for identifying the direction is to align the filter taps
along that direction, reducing ringing while preserving directional edges and
patterns. CDEF defines primary-tap and secondary-tap filters: the primary taps
follow the direction, while the secondary taps form a cross oriented 45 degrees
off the direction. Both the primary-tap and secondary-tap filters have 8
variants.
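
To illustrate only the intuition of the direction search, here is a toy sketch that picks, among a set of coarse candidate directions, the one along which pixel values change the least; the real CDEF search operates on 8x8 blocks with spec-defined directional line sums and fixed-point arithmetic:

```python
import numpy as np

# Candidate directions as (dy, dx) steps, a coarse stand-in for the 8
# directions CDEF distinguishes at 22.5-degree granularity.
DIRECTIONS = [(0, 1), (1, 2), (1, 1), (2, 1), (1, 0), (2, -1), (1, -1), (1, -2)]

def estimate_direction(block: np.ndarray) -> int:
    """Pick the direction along which pixel values change the least."""
    costs = []
    for dy, dx in DIRECTIONS:
        shifted = np.roll(np.roll(block, dy, axis=0), dx, axis=1)
        costs.append(np.sum((block.astype(float) - shifted) ** 2))
    return int(np.argmin(costs))

# A block with vertical stripes: values change least along the vertical direction.
block = np.tile(np.array([10, 10, 200, 200, 10, 10, 200, 200]), (8, 1))
print(estimate_direction(block))   # index of the (1, 0) "vertical" direction -> 4
```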

LR (In-loop Restoration) filter
-------------------------------

AV1 employs a set of in-loop restoration tools after de-blocking to generally
de-noise the frame and enhance edge quality. The in-loop restoration scheme has
two types of filter to remove blur artifacts caused by block processing: the
Wiener filter and the dual self-guided filter. These tools are integrated into
AV1 within a switchable framework, which triggers a different tool in different
regions of the image.

Multi-Symbol Entropy Coder
==========================

A multi-symbol adaptive arithmetic coding model is adopted in AV1. Both syntax
elements and coefficients are coded with this model.

Most recent video codecs encode information using binary arithmetic coding,
such as CABAC in AVC/HEVC, meaning that each symbol can only take two values.
The AV1 entropy coder comes from the Daala range coder and supports up to 16
values per symbol, making it possible to encode fewer symbols. This is
equivalent to coding up to four binary values in parallel and reduces serial
dependencies, allowing hardware implementations to use lower clock rates, and
thus less power.
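
The adaptation idea can be sketched as follows; this is a simplified per-symbol CDF update in the spirit of the multi-symbol model (the bit-exact libaom update rule and storage layout differ):

```python
PROB_TOTAL = 32768   # probabilities kept as 15-bit fixed point, as in libaom

def update_cdf(cdf: list, symbol: int, rate: int = 5) -> None:
    """Adapt an N-symbol CDF toward the symbol just coded.

    cdf[i] holds the cumulative probability of symbols 0..i; entries at or
    above `symbol` move up, entries below move down, so the coded symbol's
    probability grows.
    """
    for i in range(len(cdf) - 1):             # the last entry stays at PROB_TOTAL
        if i >= symbol:
            cdf[i] += (PROB_TOTAL - cdf[i]) >> rate
        else:
            cdf[i] -= cdf[i] >> rate

# A uniform 4-symbol model; coding symbol 2 repeatedly makes it more probable.
cdf = [8192, 16384, 24576, 32768]
for _ in range(10):
    update_cdf(cdf, symbol=2)
print(cdf)
```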

Transform
=========

Transform type
--------------

For AV1, there is a richer set of transforms for coding Inter and Intra
prediction residues. Inter prediction residues do not have a well-defined
structure as in the Intra case, but using a bank of transforms, each adapted to
a specific type of residue profile within the block, is generally helpful.

In AV1, four main transform types are used, applied separately in the
horizontal and vertical directions. In total, 16 different 2-D transform
combinations are available.

| Transform type | Comments |
|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| DCT | Inter and Intra modes continue to make use of DCT. |
| ADST | Asymmetric Discrete Sine Transform |
| Flip ADST | It applies ADST in reverse order |
| IDTX | Identity transform; particularly useful for coding residue with sharp lines and edges, as found in screen content |

Table 14. The main transform types in each direction

For each small coded block (4x4 or 8x8), it is possible to choose one of up to
16 different transforms as follows (details in the tables below):

{DCT, ADST, FlipADST, IDTX} horizontal x {DCT, ADST, FlipADST, IDTX} vertical
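
To make the separable combination concrete, here is a small numerical sketch; the DCT basis is the standard orthonormal DCT-II, while the sine basis merely stands in for AV1's ADST, so the output is illustrative rather than bit-exact:

```python
import numpy as np

def dct_mat(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix."""
    k, i = np.mgrid[0:n, 0:n]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    m[0] /= np.sqrt(2.0)
    return m

def sine_mat(n: int) -> np.ndarray:
    """Orthonormal DST-I basis, used here as a stand-in for AV1's ADST."""
    k, i = np.mgrid[0:n, 0:n]
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * (k + 1) * (i + 1) / (n + 1))

def bases(n: int) -> dict:
    return {
        "DCT": dct_mat(n),
        "ADST": sine_mat(n),
        "FlipADST": sine_mat(n)[:, ::-1],  # same transform applied to reversed input
        "IDTX": np.eye(n),
    }

def transform_2d(block: np.ndarray, h_type: str, v_type: str) -> np.ndarray:
    """Separable 2-D transform: vertical basis on columns, horizontal on rows."""
    b = bases(block.shape[0])              # square block assumed for brevity
    return b[v_type] @ block @ b[h_type].T

# Any of the 4 x 4 = 16 (horizontal, vertical) combinations may be selected.
block = np.outer(np.arange(4.0), np.ones(4))   # a simple vertical ramp
print(transform_2d(block, h_type="DCT", v_type="ADST").round(2))
```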

As block sizes get larger, some of these transforms begin to act similarly.
Thus, a reduced set of transforms is used for 16x16, 32x32 and 64x64 block
sizes. In the transform selection process for Inter and Intra modes, the encoder
does a search over the entire set of transforms and selects the one that
produces the best rate-distortion cost. Once a transform is selected, a
transform type symbol from the set of types available at that size is used to
indicate the actual transform used in the bitstream.

There are 6 transform sets in the AV1 spec, which specify the transform types
for intra and inter blocks. Each transform set determines the subset of
transform types that can be used, according to the following tables.

| Inter or not | Set Number | Transform set |
|--------------|------------|----------------|
| Don’t care | 0 | TX_SET_DCTONLY |
| 0 | 1 | TX_SET_INTRA_1 |
| 0 | 2 | TX_SET_INTRA_2 |
| 1 | 1 | TX_SET_INTER_1 |
| 1 | 2 | TX_SET_INTER_2 |
| 1 | 3 | TX_SET_INTER_3 |

Table 15. Transform sets in the AV1 spec

| Transform type | TX_SET_DCTONLY | TX_SET_INTRA_1 | TX_SET_INTRA_2 | TX_SET_INTER_1 | TX_SET_INTER_2 |
|-------------------|----------------|----------------|----------------|----------------|----------------|
| DCT_DCT | X | X | X | X | X |
| ADST_DCT | | X | X | X | X |
| DCT_ADST | | X | X | X | X |
| ADST_ADST | | X | X | X | X |
| FLIPADST_DCT | | | | X | X |
| DCT_FLIPADST | | | | X | X |
| FLIPADST_FLIPADST | | | | X | X |
| ADST_FLIPADST | | | | X | X |
| FLIPADST_ADST | | | | X | X |
| IDTX | | X | X | X | X |
| V_DCT | | X | | X | X |
| H_DCT | | X | | X | X |
| V_ADST | | | | X | |
| H_ADST | | | | X | |
| V_FLIPADST | | | | X | |
| H_FLIPADST | | | | X | |

Table 16. Detailed transform types supported in each transform set.

Transform Block Shape and Size
------------------------------

Both square and rectangular transform blocks are used in AV1. The transform
block size does not exceed the partition block size. Block sizes are very
flexible, from 64x64 down to 4x4; see the tables in the Block Structure section
for details.

Tiles
=====

AV1 supports flexible tiles, including uniform and non-uniform tile spacing.
The tile area is limited to a maximum of 4096x2304. Tiles can be grouped into
tile groups, and each group can be decoded independently to achieve error
resilience. The loop filter can be enabled or disabled across tile boundaries.

Segment
=======

As in VP9, AV1 provides a means of segmenting the image and then applying
various adjustments at the segment level. Up to 8 segments may be specified for
any given frame. For each of these segments it is possible to specify:

- A quantizer (absolute value or delta).

- A loop filter strength (absolute value or delta).

- A prediction reference frame.

- A block skip mode that implies both the use of a (0,0) motion vector and that
no residual will be coded.

SVC (Scalable Video Coding)
===========================

AV1 supports temporal and spatial layer coding, with up to 8 temporal layers
and up to 3 spatial layers.

| Index | Scalability mode | Index | Scalability mode |
|-------|------------------|--------|-------------------|
| 0 | SCALABILITY_L1T2 | 8 | SCALABILITY_L2T2h |
| 1 | SCALABILITY_L1T3 | 9 | SCALABILITY_L2T3h |
| 2 | SCALABILITY_L2T1 | 10 | SCALABILITY_S2T1h |
| 3 | SCALABILITY_L2T2 | 11 | SCALABILITY_S2T2h |
| 4 | SCALABILITY_L2T3 | 12 | SCALABILITY_S2T3h |
| 5 | SCALABILITY_S2T1 | 13 | SCALABILITY_SS |
| 6 | SCALABILITY_S2T2 | 14-255 | reserved |
| 7 | SCALABILITY_S2T3 | | |

Table 17. Temporal and Spatial Mode

| Scalability mode | Spatial Layers | Resolution Ratio | Temporal Layers | Inter-layer-dependency |
|-------------------|----------------|------------------|-----------------|------------------------|
| SCALABILITY_L1T2 | 1 | | 2 | |
| SCALABILITY_L1T3 | 1 | | 3 | |
| SCALABILITY_L2T1 | 2 | 2:1 | 1 | Yes |
| SCALABILITY_L2T2 | 2 | 2:1 | 2 | Yes |
| SCALABILITY_L2T3 | 2 | 2:1 | 3 | Yes |
| SCALABILITY_S2T1 | 2 | 2:1 | 1 | No |
| SCALABILITY_S2T2 | 2 | 2:1 | 2 | No |
| SCALABILITY_S2T3 | 2 | 2:1 | 3 | No |
| SCALABILITY_L2T2h | 2 | 1.5:1 | 2 | Yes |
| SCALABILITY_L2T3h | 2 | 1.5:1 | 3 | Yes |
| SCALABILITY_S2T1h | 2 | 1.5:1 | 1 | No |

Table 18. Details of the Temporal and Spatial Modes

Other Tools
===========

Quantization Matrices
---------------------

AV1 supports 15 sets of quantization matrices (QMs), which are based on
contrast-sensitivity functions. QMs are applied to a frame based on a
selectable scaling of its quantization level; higher quantization levels imply
flatter matrices. The matrices become flatter as the quantization index
increases (and the quality decreases). Inter matrices are slightly flatter than
intra matrices.

Superblock Delta-quantization
-----------------------------

AV1 allows per-superblock changes in the quantization parameter to support
sub-frame rate control. At the same time, it supports ROI-level rate control on
top of the segmentation-level parameters.

OBU (Open Bitstream Unit)
-------------------------

An AV1 bitstream consists of a number of OBUs that are normally held within a
container format alongside audio and timing information. The OBU is a new tool
introduced in AV1, similar to the NAL (Network Abstraction Layer) unit in the
AVC/HEVC specs.

The OBU header is similar to the NAL header. In general, 8 bits in total are
present. An extra 8-bit extension header is used if temporal and spatial layers
exist in the bitstream. obu_type is the most important syntax element and
describes the type of the OBU.

| Index | obu_type |
|-------|----------------------------|
| 0 | Reserved |
| 1 | OBU_SEQUENCE_HEADER |
| 2 | OBU_TD |
| 3 | OBU_FRAME_HEADER |
| 4 | OBU_TILE_GROUP |
| 5 | OBU_METADATA |
| 6 | OBU_FRAME |
| 7 | OBU_REDUNDANT_FRAME_HEADER |
| 8-14 | Reserved |
| 15 | OBU_PADDING |

Table 19. Type of OBU
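
As a small illustration of the header layout described above, the sketch below decodes the 8-bit OBU header and the optional extension byte; the type names follow the table above, and the helper itself is illustrative rather than a production parser:

```python
OBU_TYPES = {
    1: "OBU_SEQUENCE_HEADER", 2: "OBU_TD", 3: "OBU_FRAME_HEADER",
    4: "OBU_TILE_GROUP", 5: "OBU_METADATA", 6: "OBU_FRAME",
    7: "OBU_REDUNDANT_FRAME_HEADER", 15: "OBU_PADDING",
}

def parse_obu_header(data: bytes) -> dict:
    """Decode the fixed 8-bit OBU header (plus the optional extension byte).

    Bit layout (MSB first): forbidden bit, 4-bit obu_type, extension flag,
    has_size_field flag, reserved bit.  The extension byte carries the 3-bit
    temporal_id and 2-bit spatial_id when scalability layers are present.
    """
    b0 = data[0]
    hdr = {
        "obu_type": OBU_TYPES.get((b0 >> 3) & 0x0F, "Reserved"),
        "extension_flag": bool((b0 >> 2) & 1),
        "has_size_field": bool((b0 >> 1) & 1),
    }
    if hdr["extension_flag"]:
        b1 = data[1]
        hdr["temporal_id"] = (b1 >> 5) & 0x07
        hdr["spatial_id"] = (b1 >> 3) & 0x03
    return hdr

# 0x12 = 0b00010010: obu_type = 2 (OBU_TD), no extension, has_size_field set.
print(parse_obu_header(bytes([0x12])))
```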
