Technical Overview of AV1 Spec
====================================

Abstract
=========

AV1 (AOMedia Video Codec 1.0) evolved from VP9 (Google), Thor (Cisco) and Daala
(Mozilla) under the AOM (Alliance for Open Media). It includes a number of
enhancements and new tools added to improve coding efficiency. The new tools
added so far cover 4 main areas: prediction, transform, in-loop filtering and
entropy coding. This document provides a snapshot of the coding tools in the
finalized version (as of March 2018) of the AV1 spec.

Introduction
============

According to the AOM web page, AV1 is designed with the following features:

- Royalty free

- Scales to any modern device at any bandwidth

- For use in both commercial and non-commercial content, including
user-generated content

- Developed for the internet and related applications and services, from
browsers and streaming to videoconferencing services

- Designed with a low computational footprint and optimized for hardware

- Bringing features like 4k UHD, HDR, and WCG to real-time video

Profile & Levels
================

Profiles and levels specify restrictions on the capabilities needed to decode
the bitstreams. The profile specifies the bit depths and subsampling formats
supported, while the level defines resolution and performance characteristics.
At the time of writing, levels are still under discussion and no further
details are available.

AV1 supports the three named profiles listed in the table below.

| Profile | Bit depth | Monochrome support | Chroma subsampling | Name |
|---------|-----------|--------------------|---------------------|--------------|
| 0 | 8/10 | Yes | 4:2:0 | Main |
| 1 | 8/10 | No | 4:4:4 | High |
| 2 | 8/10 | Yes | 4:2:2 | Professional |
| 2 | 12 | Yes | 4:2:0, 4:2:2, 4:4:4 | Professional |

Table 1. AV1 Profile
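
As a quick illustration of Table 1, here is a minimal sketch (a hypothetical helper, not part of any AV1 API) that maps bit depth and chroma subsampling to the lowest profile able to carry them; monochrome handling is omitted for brevity:

```python
def choose_profile(bit_depth: int, subsampling: str) -> int:
    """Return the lowest AV1 profile (per Table 1) supporting the format."""
    if bit_depth == 12:
        return 2                      # Professional covers all 12-bit formats
    if bit_depth not in (8, 10):
        raise ValueError("AV1 supports only 8-, 10- or 12-bit depths")
    if subsampling == "4:2:0":
        return 0                      # Main
    if subsampling == "4:4:4":
        return 1                      # High
    if subsampling == "4:2:2":
        return 2                      # Professional
    raise ValueError(f"unsupported subsampling: {subsampling}")

print(choose_profile(10, "4:4:4"))    # -> 1 (High)
print(choose_profile(12, "4:2:0"))    # -> 2 (Professional)
```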

Block Structure
===============

Basic Coding block
------------------

AV1 supports a larger superblock size: superblocks of up to 128x128 are
allowed, and coding blocks range from 128x128 down to 4x4. Each 4x4 luma block
is allowed to independently select inter or intra mode, its reference mode, and
its interpolation filter type. For chroma, a 2x2 block size is allowed, but a
4x4 transform block size is still used.

Basic Prediction Block
----------------------

AV1 supports up to 10 partition types. The partition unit size is allowed down
to 4x4, and there are 24 block sizes in total.

| Partition index | Type of partition |
|-----------------|-------------------|
| 0 | PARTITION_NONE |
| 1 | PARTITION_HORZ |
| 2 | PARTITION_VERT |
| 3 | PARTITION_SPLIT |
| 4 | PARTITION_HORZ_A |
| 5 | PARTITION_HORZ_B |
| 6 | PARTITION_VERT_A |
| 7 | PARTITION_VERT_B |
| 8 | PARTITION_HORZ_4 |
| 9 | PARTITION_VERT_4 |

Table 2. Type of Block partition

| Index | Partition Block size | Index | Partition Block size |
|-------|----------------------|-------|----------------------|
| 0 | BLOCK_4X4 | 12 | BLOCK_64X64 |
| 1 | BLOCK_4X8 | 13 | BLOCK_64X128 |
| 2 | BLOCK_8X4 | 14 | BLOCK_128X64 |
| 3 | BLOCK_8X8 | 15 | BLOCK_128X128 |
| 4 | BLOCK_8X16 | 16 | BLOCK_4X16 |
| 5 | BLOCK_16X8 | 17 | BLOCK_16X4 |
| 6 | BLOCK_16X16 | 18 | BLOCK_8X32 |
| 7 | BLOCK_16X32 | 19 | BLOCK_32X8 |
| 8 | BLOCK_32X16 | 20 | BLOCK_16X64 |
| 9 | BLOCK_32X32 | 21 | BLOCK_64X16 |
| 10 | BLOCK_32X64 | 22 | BLOCK_32X128 |
| 11 | BLOCK_64X32 | 23 | BLOCK_128X32 |

Table 3. Size of Block Partition

Basic Transform Block
---------------------

Both square and rectangular transform block sizes are supported in AV1. There
are 19 transform block sizes in total.

| Index | TxSize | Index | TxSize |
|-------|----------|-------|----------|
| 0 | TX_4X4 | 10 | TX_32X16 |
| 1 | TX_8X8 | 11 | TX_32X64 |
| 2 | TX_16X16 | 12 | TX_64X32 |
| 3 | TX_32X32 | 13 | TX_4X16 |
| 4 | TX_64X64 | 14 | TX_16X4 |
| 5 | TX_4X8 | 15 | TX_8X32 |
| 6 | TX_8X4 | 16 | TX_32X8 |
| 7 | TX_8X16 | 17 | TX_16X64 |
| 8 | TX_16X8 | 18 | TX_64X16 |
| 9 | TX_16X32 | | |

Table 4. Size of Transform Block

Intra Prediction
================

Intra prediction in AV1 is expanded significantly compared to VP9. Here is a
snapshot of the intra modes.

| Index | Intra mode | AV1 | VP9 | Comments |
|-------|---------------------|-----|-----|-----------------------------------------------|
| 0 | DC_PRED | X | X | |
| 1 | V_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 2 | H_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 3 | D45_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 4 | D135_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 5 | D113_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 6 | D157_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 7 | D203_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 8 | D67_PRED | X | X | AV1 supports 7 angles based on this nominal mode |
| 9 | SMOOTH_PRED | X | | |
| 10 | SMOOTH_V_PRED | X | | |
| 11 | SMOOTH_H_PRED | X | | |
| 12 | TM_PRED(PAETH_PRED) | X | X | AV1 replaces TM_PRED with PAETH_PRED |
| 13 | Palette Mode | X | | |

Table 5. Summary of Intra Mode between AV1 and VP9

Directional Intra Prediction Mode
---------------------------------

VP9 only supports 8 directional intra prediction modes: D45_PRED, D63_PRED,
V_PRED, D117_PRED, D135_PRED, D153_PRED, H_PRED and D207_PRED. These modes
correspond to prediction angles of 45, 63, 90, 117, 135, 153, 180, and 207
degrees, respectively.

To improve intra coding efficiency, more prediction angle options are added to
AV1. The prediction angle is calculated as the following:

Prediction angle = nominal_angle + (angle_delta \* angle_step),

| nominal_angle | angle_step | angle_delta | Total number of angles |
|-------------------------------------|------------|-------------|------------------------|
| 45, 63, 90, 117, 135, 153, 180, 207 | 3 | [-3, +3] | 8\*7=56 |

Table 6. Finer granularity of directional intra angles

- nominal_angle is determined by the prediction mode and is the same as in VP9.

- angle_delta lies in a predefined range and angle_step is a predefined value.
In the current configuration, angle_delta is in the range [-3, +3] and
angle_step is 3. These settings were selected experimentally.

- The total number of supported prediction angles is therefore increased from
8 to 8 \* 7 = 56, as the sketch below illustrates.
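
A tiny sketch that enumerates the 56 prediction angles implied by Table 6 (names and structure are illustrative only):

```python
NOMINAL_ANGLES = [45, 63, 90, 117, 135, 153, 180, 207]  # one per directional mode
ANGLE_STEP = 3
ANGLE_DELTA_RANGE = range(-3, 4)                         # angle_delta in [-3, +3]

def prediction_angles():
    """Enumerate every directional prediction angle supported by AV1."""
    return [nominal + delta * ANGLE_STEP
            for nominal in NOMINAL_ANGLES
            for delta in ANGLE_DELTA_RANGE]

angles = prediction_angles()
print(len(angles))   # 56 = 8 nominal angles * 7 deltas
```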

Smooth Mode
-----------

These are non-directional intra prediction modes. VP9 has 2 non-directional
intra prediction modes: DC_PRED and TM_PRED. AV1 expands on this by adding 3
new smooth prediction modes: SMOOTH_PRED, SMOOTH_V_PRED and SMOOTH_H_PRED. The
new modes work as follows:

|Mode|Comments|
|-|-|
| SMOOTH_PRED | Useful for predicting blocks with a smooth gradient. It works as follows: the pixels in the rightmost column are estimated with the value of the last pixel in the top row, and the pixels in the last row of the current block are estimated using the last pixel in the left column. The remaining pixels are then calculated as an average of quadratic interpolations in the vertical and horizontal directions, weighted by the distance of each pixel from the predicted pixels. |
| SMOOTH_V_PRED | Similar to SMOOTH_PRED, but uses quadratic interpolation only in the vertical direction |
| SMOOTH_H_PRED | Similar to SMOOTH_PRED, but uses quadratic interpolation only in the horizontal direction |

Table 7. Smooth mode of Intra mode
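
Below is a simplified sketch of SMOOTH_PRED following the description above; it uses plain linear distance weights instead of the spec's fixed-point weight tables, so it is illustrative rather than bit-exact:

```python
import numpy as np

def smooth_pred(top: np.ndarray, left: np.ndarray) -> np.ndarray:
    """Simplified SMOOTH_PRED: blend a vertical and a horizontal interpolation.

    The vertical part interpolates each column between the top row and the
    bottom-left pixel; the horizontal part interpolates each row between the
    left column and the top-right pixel.
    """
    h, w = len(left), len(top)
    below = left[-1]            # estimate for the (virtual) bottom row
    right = top[-1]             # estimate for the (virtual) right column

    wy = (np.arange(1, h + 1) / (h + 1))[:, None]   # distance-based weights, 0..1
    wx = (np.arange(1, w + 1) / (w + 1))[None, :]

    vert = (1 - wy) * top[None, :] + wy * below
    horz = (1 - wx) * left[:, None] + wx * right
    return (vert + horz) / 2

top = np.array([100, 110, 120, 130], dtype=float)
left = np.array([100, 90, 80, 70], dtype=float)
print(smooth_pred(top, left).round().astype(int))
```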

Paeth Mode
----------

This is a non-directional intra prediction mode. The new prediction mode
PAETH_PRED replaces the existing mode TM_PRED.

TM_PRED: Predictor(TM) = left + top - top_left

PAETH_PRED: Predictor(PAETH) = argmin \|x - Predictor(TM)\|, where x is one of
left, top and top_left

The idea is to pick whichever of left, top and top_left is closest in value to
Predictor(TM).
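
A small sketch of the Paeth selection rule; ties are resolved here in the order left, top, top_left:

```python
def paeth_pred(left: int, top: int, top_left: int) -> int:
    """PAETH_PRED: pick the neighbor closest to the TM predictor left + top - top_left."""
    base = left + top - top_left          # this is Predictor(TM)
    candidates = (left, top, top_left)    # tie-break order: left, top, top_left
    return min(candidates, key=lambda x: abs(x - base))

print(paeth_pred(120, 130, 110))  # base = 140, so the top neighbor (130) is chosen
```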

Palette Mode
------------

Sometimes a given intra block can be approximated by a block with a small
number of unique colors. This is especially true for artificial video such as
screen capture and games. For such cases, AV1 introduces a new intra coding
mode called palette mode. The predictor for a block is signaled by storing (i)
a color palette, with 2 to 8 colors, and (ii) color indices into the palette
for all pixels in the block. The residual pixel values of the block are
transformed and quantized as usual before being entropy coded.

Palette mode can be used in both intra-only and inter frames. The number of
base colors determines the trade-off between fidelity and compactness. The
color index of each pixel is obtained by the nearest-neighbor method, and the
color indices are encoded using neighborhood-based contexts to be as compact as
possible.
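
A toy sketch of the nearest-neighbor index assignment for a single sample plane; the palette values here are arbitrary, and the real encoder additionally signals the palette and entropy-codes the indices with neighborhood-based contexts:

```python
import numpy as np

def palette_indices(block: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Map every pixel of a block to the index of the nearest palette color."""
    diff = np.abs(block[..., None] - palette[None, None, :])  # |pixel - color|
    return diff.argmin(axis=-1)

block = np.array([[10, 12, 200], [11, 198, 201]])
palette = np.array([10, 200])                 # 2 base colors (AV1 allows 2..8)
print(palette_indices(block, palette))
# [[0 0 1]
#  [0 1 1]]
```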

Palette Mode is not new. We can see the Palette Mode and Intra block copy in the
HEVC SCC (Screen Content Coding) extension.

Filter Intra mode
-----------------

AV1 adopts a new mode that interpolates (filters) the reference samples before
prediction, which reduces the impact of quantization noise. The table below
specifies the types of intra filtering.

| Index | Filter intra type |
|-------|-------------------|
| 0 | INTRA_DC_PRED |
| 1 | INTRA_V_PRED |
| 2 | INTRA_H_PRED |
| 3 | INTRA_D153_PRED |
| 4 | INTRA_TM_PRED |

Table 8. Type of Intra Filter Mode

Intra Block Copy Mode
---------------------

This tool is very efficient for coding screen content video, in which repeated
patterns in text and graphics-rich content occur frequently within the same
picture. Having a previously reconstructed block with an equal or similar
pattern as a predictor can effectively reduce the prediction error and
therefore improve coding efficiency.

In AV1, intra block copy is only allowed in intra frames. It disables all loop
filtering, and only integer offsets are allowed in block copy mode.

Predict Chroma from Luma
------------------------

Chroma from luma (CfL) prediction is a new and promising chroma-only intra
predictor that models chroma pixels as a linear function of the coincident
reconstructed luma pixels.
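
A minimal sketch of the CfL idea under the description above, assuming a signalled scaling factor alpha and skipping the luma subsampling/averaging details:

```python
import numpy as np

def cfl_predict(recon_luma: np.ndarray, chroma_dc: float, alpha: float) -> np.ndarray:
    """Chroma-from-luma: model chroma as a linear function of coincident luma.

    The reconstructed luma block is mean-removed (its "AC" contribution),
    scaled by alpha, and added to the usual DC chroma prediction.
    """
    luma_ac = recon_luma - recon_luma.mean()
    return chroma_dc + alpha * luma_ac

luma = np.array([[100.0, 120.0], [140.0, 160.0]])
print(cfl_predict(luma, chroma_dc=128.0, alpha=0.5))
```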

Inter Prediction
================

Affine/Warped Motion Compensation
---------------------------------

Traditional modern codecs, including VP9, use block motion compensation where
motion vectors are translational only. This is not sufficient for real video
which often contains complex motion. For example, motion due to camera shake,
panning and zoom might require transformations that support shearing, scaling,
rotation and changes in aspect ratio. In AV1, we introduce warped motion
compensation implemented as similarity and affine transformations to better
capture the diversity of motion that exists in real video. There are two kinds
of affine/warped motion compensation, as listed in Table 9.

| Affine Motion Compensation | Comments |
|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Global | It is common for videos to contain a global camera motion which is pertinent to an entire inter frame. It is therefore beneficial to transmit a set of motion parameters at the frame level that is applicable to a large number of blocks in the frame. When a frame is encoded, a set of global motion parameters is computed and transmitted between that frame and each reference frame. These parameters may be either translational, similarity or affine motion model. Subsequently, any block in the frame can signal use of the global motion mode with a given reference to create a suitable predictor. |
| Local | Affine motion compensation is also useful to describe complex local object motion. Here, we estimate affine parameters for a single block using the translational motion vectors that are typically conveyed for all inter blocks. Specifically, we estimate an affine or similarity model using the motion vectors from the current block and its causal neighbors which share the same reference frame. |

Table 9. Affine Motion Compensation
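
To illustrate what a warped model adds over a purely translational MV, here is a small sketch that maps a pixel position through a generic affine model; the 2x3 parameterization and the example numbers are illustrative, not the spec's fixed-point encoding:

```python
import numpy as np

def warp_position(x: float, y: float, model: np.ndarray) -> tuple:
    """Map a pixel position through an affine motion model.

    `model` is a 2x3 matrix [[a, b, tx], [c, d, ty]]; a pure translation is
    the special case a = d = 1, b = c = 0, matching a conventional MV.
    """
    xp = model[0, 0] * x + model[0, 1] * y + model[0, 2]
    yp = model[1, 0] * x + model[1, 1] * y + model[1, 2]
    return xp, yp

# Small rotation + zoom around the origin plus a translation of (3, -1).
model = np.array([[1.02, -0.05, 3.0],
                  [0.05,  1.02, -1.0]])
print(warp_position(16, 16, model))
```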

OBMC (Overlapped Block Motion Compensation)
-------------------------------------------

Motions assigned to surrounding blocks will contribute to predicting a current
block, via a well-defined overlapping scheme appropriately designed for advanced
variable block-size partitioning frameworks.

OBMC blends multiple predictors from neighboring blocks. It is not a new
concept; it was proposed and implemented back in the era of H.263. OBMC was
shown to largely reduce prediction errors, but it was not adopted by recent
codecs due to the extra complexity in hybrid inter/intra variable-block-size
coding. In AV1, a practical overlapping mechanism based on two-stage 1-D
filtering is introduced for the advanced partitioning framework to implement
causal overlapped block prediction.
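
A heavily simplified sketch of the two-stage 1-D blending idea: first the prediction formed with the above neighbor's motion is blended into the top rows, then the left neighbor's prediction into the left columns. The linear weights below are invented for illustration; the real codec blends only a limited band of rows/columns with fixed-point mask tables:

```python
import numpy as np

def obmc_blend(cur: np.ndarray, above: np.ndarray, left: np.ndarray) -> np.ndarray:
    """Two-stage 1-D overlapped blending (simplified)."""
    h, w = cur.shape
    out = cur.astype(float).copy()

    wy = np.linspace(0.5, 0.0, h)[:, None]   # strongest near the top edge
    out = (1 - wy) * out + wy * above

    wx = np.linspace(0.5, 0.0, w)[None, :]   # strongest near the left edge
    out = (1 - wx) * out + wx * left
    return out

cur = np.full((4, 4), 100.0)
above = np.full((4, 4), 120.0)   # prediction using the above block's MV
left = np.full((4, 4), 80.0)     # prediction using the left block's MV
print(obmc_blend(cur, above, left).round())
```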

Sub-pixel Interpolation Filter
------------------------------

The motion vector used in modern video codecs is allowed to have a fractional
position for a better prediction quality. So, an interpolation filter module is
needed to generate the prediction block at a fractional position in the
reference frame. VP9 codec uses a separable interpolation filter to perform
inter prediction with ⅛ motion vector precision. Three filter types, SHARP,
REGULAR and SMOOTH, in descending order of cutoff frequencies, are provided to
deal with various types of noise/distortions that can occur in reference
frames/blocks. Given a filter type and a motion vector, the interpolation filter
is performed by two one-dimensional filters, one for horizontal direction and
one for vertical direction.

In the AV1 codec, a dual interpolation filter is introduced on top of the
interpolation module inherited from VP9. The dual filter allows each block or
frame to use different interpolation filter types in the horizontal and
vertical directions, so up to 9 filter-type combinations can be applied to a
block.

This idea is based on the observation that a reference frame/block’s horizontal
and vertical signals may have distinct frequency characteristics; therefore,
using different filter types may produce a better prediction. As before, both
the filter types are transmitted in the bitstream on a per block or per frame
basis.

At the same time, AV1 uses high intermediate precision between the horizontal
and vertical filtering stages, and the same high precision is kept before
averaging the predictors in compound mode.
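
A sketch of the dual-filter idea: two separable 1-D passes, each with its own filter type, with full intermediate precision kept between passes. The 4-tap kernels are illustrative stand-ins for the spec's 8-tap fixed-point filters:

```python
import numpy as np

# Illustrative 4-tap kernels for a half-pel position, one per filter type.
KERNELS = {
    "SHARP":   np.array([-0.125, 0.625, 0.625, -0.125]),
    "REGULAR": np.array([-0.0625, 0.5625, 0.5625, -0.0625]),
    "SMOOTH":  np.array([0.125, 0.375, 0.375, 0.125]),
}

def dual_filter_interp(ref: np.ndarray, h_type: str, v_type: str) -> np.ndarray:
    """Separable sub-pel interpolation with independent horizontal/vertical types."""
    hk, vk = KERNELS[h_type], KERNELS[v_type]
    # Horizontal pass first, kept at full floating-point precision.
    tmp = np.apply_along_axis(lambda r: np.convolve(r, hk, mode="valid"), 1, ref)
    # Vertical pass on the intermediate result.
    return np.apply_along_axis(lambda c: np.convolve(c, vk, mode="valid"), 0, tmp)

ref = np.arange(64, dtype=float).reshape(8, 8)
print(dual_filter_interp(ref, h_type="SHARP", v_type="SMOOTH").shape)  # (5, 5)
```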

Dynamic MV reference
--------------------

VP9 keeps two candidate MVs in the reference list and uses 4 MV modes
(NEARESTMV, NEARMV, NEWMV, and ZEROMV). AV1 supports 4 candidate MVs and more
modes.

For single reference mode, AV1 is the same as VP9.

For compound mode, VP9 restricts motion vectors for a compound predictor to
share one motion vector referencing mode, even though they may use different
reference frames. To add more flexibility, on top of existing four combinations
(NEAREST_NEARESTMV, NEAR_NEARMV, NEW_NEWMV, ZERO_ZEROMV) in VP9, AV1 supports
four more empirically selected combinations: NEAREST_NEWMV, NEW_NEARESTMV,
NEAR_NEWMV, and NEW_NEARMV.

| Index | Type | Ref Mode |
|-------|------------------------------|-----------------|
| 0 | NEARESTMV | single ref mode |
| 1 | NEARMV | single ref mode |
| 2 | GLOBALMV(ZEROMV) | single ref mode |
| 3 | NEWMV | single ref mode |
| 4 | NEAREST_NEARESTMV | compound mode |
| 5 | NEAR_NEARMV | compound mode |
| 6 | NEAREST_NEWMV | compound mode |
| 7 | NEW_NEARESTMV | compound mode |
| 8 | NEAR_NEWMV | compound mode |
| 9 | NEW_NEARMV | compound mode |
| 10 | GLOBAL_GLOBALMV(ZERO_ZEROMV) | compound mode |
| 11 | NEW_NEWMV | compound mode |

Table 10. MV modes

Extended Compound Modes
-----------------------

AV1 compound mode supports predictors coming from the same direction, while
VP9 only supports predictors from different directions (one forward and one
backward reference frame). VP9 only supports (1/2, 1/2) weights to blend the
two predictors, whereas AV1 supports more flexible weighted blending.

| Index | Compound type | Comments |
|-------|-------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| 0 | COMPOUND_WEDGE | Inter-Inter Wedge mode and Inter-Intra Wedge mode |
| 1 | COMPOUND_SEG | Inter-Inter Compound Segment mode |
| 2 | COMPOUND_AVERAGE | (1/2,1/2) weight will be applied to blend the predictors |
| 3 | COMPOUND_INTRA | Inter-Intra Gradual mode |
| 4 | COMPOUND_DISTANCE | This process computes weights to be used for blending predictions together based on the expected output times of the reference frames |

Table 11. Compound type

Here are more details about these compound modes:

- Inter-Inter Compound Segment mode

In many cases, regions in one predictor contain useful content that is not
present in the other; in general, the two inter predictors show a larger pixel
difference in such regions.

- Inter-Inter Wedge mode

Boundaries of moving objects in a video often separate two regions with distinct
motions. Coding these regions with separate motion vector reference combinations
should be beneficial; however, finding exact object boundaries is not only
difficult, but expensive to communicate in the bitstream. Our approach is to
design a codebook of masks with only a few possible partitioning combinations
and signaling the codebook index in the bitstream.

The AV1 wedge codebook contains partition orientations that are either
horizontal, vertical or oblique with slopes: 2, -2, 0.5 and -0.5. The wedge
prediction mode is used for all square and rectangular blocks, using the 16-ary
shape codebooks.

| Index | Wedge direction | Comments |
|-------|------------------|----------|
| 0 | WEDGE_HORIZONTAL | |
| 1 | WEDGE_VERTICAL | |
| 2 | WEDGE_OBLIQUE27 | |
| 3 | WEDGE_OBLIQUE63 | |
| 4 | WEDGE_OBLIQUE117 | |
| 5 | WEDGE_OBLIQUE153 | |

Table 12. Wedge direction

- Inter-Intra Gradual mode

The weight of the intra predictor decays gradually away from the prediction
boundary, and the weight of the inter predictor increases correspondingly. It
supports four modes: horizontal, vertical, DC_PRED, and SMOOTH_PRED.

- Inter-Intra Wedge mode

Blocks cannot always perfectly partition moving objects. For example, when
occlusion occurs in the middle of a block, it is better to apply different
prediction techniques to different contents. Content that is not occluded in
the reference frame will prefer inter prediction, while newly revealed content
benefits more from intra prediction using local references. A toy sketch of
wedge-mask blending is given below.
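
Following up on the wedge modes above, here is a toy sketch of blending two predictors through a soft mask; the mask construction is invented for illustration and does not reproduce AV1's 16-entry wedge codebooks:

```python
import numpy as np

def oblique_wedge_mask(h: int, w: int, slope: float, offset: float) -> np.ndarray:
    """Build a soft 0..1 mask split by the line y = slope * x + offset."""
    y, x = np.mgrid[0:h, 0:w]
    dist = y - (slope * x + offset)
    return np.clip(0.5 + dist / 4.0, 0.0, 1.0)   # soft transition ~4 pixels wide

def wedge_blend(pred0: np.ndarray, pred1: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend two predictors with a wedge mask (mask = 1 selects pred0)."""
    return mask * pred0 + (1.0 - mask) * pred1

p0 = np.full((8, 8), 50.0)     # e.g. background predictor
p1 = np.full((8, 8), 200.0)    # e.g. foreground predictor
m = oblique_wedge_mask(8, 8, slope=0.5, offset=2.0)
print(wedge_blend(p0, p1, m).round())
```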

Extended Reference frame Number
-------------------------------

Up to 7 of the 8 reference frames held in the frame buffer can be used in
inter mode. In AV1, the reference frames are allowed to come from the same side
or from different sides.

LAST3_FRAME, LAST2_FRAME and LAST_FRAME are forward references, with LAST_FRAME
being the nearest past frame. BWDREF_FRAME is a backward reference, similar to
ALTREF_FRAME.

The table below shows the reference frame types.

| Index | Ref frame Name |
|-------|----------------|
| 0 | INTRA_FRAME |
| 1 | LAST_FRAME |
| 2 | LAST2_FRAME |
| 3 | LAST3_FRAME |
| 4 | GOLDEN_FRAME |
| 5 | BWDREF_FRAME |
| 6 | ALTREF2_FRAME |
| 7 | ALTREF_FRAME |

Table 13. Reference frame type

In-loop Filter
==============

AV1 employs several in-loop filtering tools: de-blocking, CDEF and loop
restoration are applied in cascade.

De-blocking filter
------------------

AV1 supports 4 filter levels per frame, while VP9 only has one. Two levels are
for the luma component (horizontal and vertical), and the other two are for the
U and V components separately. In AV1, the filter level is also allowed to
change from superblock to superblock.

CDEF (Constrained Directional Enhancement Filter)
-------------------------------------------------

CDEF is the combination of the CLPF (Constrained Low Pass Filter) and a
deringing filter. The main goal of the in-loop CDEF is to filter out coding
artifacts and ringing while preserving image detail. It takes into account the
direction of edges and patterns in the image. It plays a role similar to SAO in
HEVC.

The CDEF is based on the following observation. The amount of ringing artifacts
in a coded image tends to be roughly proportional to the quantization step size.
The amount of detail is a property of the input image, but the smallest detail
actually retained in the quantized image tends to also be proportional to the
quantization step size. For a given quantization step size, the amplitude of the
ringing is generally less than the amplitude of the details.

CDEF works as the following steps:

- The frame is divided into filter blocks of 64x64 pixels. Some CDEF
parameters are signaled at the frame level, and some may be signaled at the
filter block level.

- Identify the direction of the edge or pattern in each filter block.

- Adaptively filter along the identified direction, and to a lesser degree
along directions rotated 45 degrees from it. The filter strengths are signaled
explicitly, which allows a high degree of control over the blurring.

The main reason for identifying the direction is to align the filter taps
along that direction, reducing ringing while preserving directional edges and
patterns. CDEF defines primary-tap and secondary-tap filters: the primary taps
follow the direction, while the secondary taps form a cross oriented 45 degrees
off the direction. Both the primary-tap and secondary-tap filters have 8
variants.
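
To illustrate only the intuition of the direction search, here is a toy sketch that picks, among a set of coarse candidate directions, the one along which pixel values change the least; the real CDEF search operates on 8x8 blocks with spec-defined directional line sums and fixed-point arithmetic:

```python
import numpy as np

# Candidate directions as (dy, dx) steps, a coarse stand-in for the 8
# directions CDEF distinguishes at 22.5-degree granularity.
DIRECTIONS = [(0, 1), (1, 2), (1, 1), (2, 1), (1, 0), (2, -1), (1, -1), (1, -2)]

def estimate_direction(block: np.ndarray) -> int:
    """Pick the direction along which pixel values change the least."""
    costs = []
    for dy, dx in DIRECTIONS:
        shifted = np.roll(np.roll(block, dy, axis=0), dx, axis=1)
        costs.append(np.sum((block.astype(float) - shifted) ** 2))
    return int(np.argmin(costs))

# A block with vertical stripes: values change least along the vertical direction.
block = np.tile(np.array([10, 10, 200, 200, 10, 10, 200, 200]), (8, 1))
print(estimate_direction(block))   # index of the (1, 0) "vertical" direction -> 4
```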

LR (In-loop Restoration) filter
-------------------------------

AV1 employs a set of in-loop restoration tools after de-blocking to generally
de-noise the frame and enhance edge quality. The in-loop restoration scheme has
two types of filter to remove blur artifacts caused by block processing: the
Wiener filter and the dual self-guided filter. These tools are integrated into
AV1 within a switchable framework, which triggers a different tool in different
regions of the image.

Multi-Symbol Entropy Coder
==========================

A multi-symbol adaptive arithmetic coding model is adopted in AV1. Both syntax
elements and coefficients are coded with this model.

Most recent video codecs encode information using binary arithmetic coding,
such as CABAC in AVC/HEVC, meaning that each symbol can only take two values.
The AV1 entropy coder comes from the Daala range coder and supports up to 16
values per symbol, making it possible to encode fewer symbols. This is
equivalent to coding up to four binary values in parallel and reduces serial
dependencies, allowing hardware implementations to use lower clock rates, and
thus less power.
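
The adaptation idea can be sketched as follows; this is a simplified per-symbol CDF update in the spirit of the multi-symbol model (the bit-exact libaom update rule and storage layout differ):

```python
PROB_TOTAL = 32768   # probabilities kept as 15-bit fixed point, as in libaom

def update_cdf(cdf: list, symbol: int, rate: int = 5) -> None:
    """Adapt an N-symbol CDF toward the symbol just coded.

    cdf[i] holds the cumulative probability of symbols 0..i; entries at or
    above `symbol` move up, entries below move down, so the coded symbol's
    probability grows.
    """
    for i in range(len(cdf) - 1):             # the last entry stays at PROB_TOTAL
        if i >= symbol:
            cdf[i] += (PROB_TOTAL - cdf[i]) >> rate
        else:
            cdf[i] -= cdf[i] >> rate

# A uniform 4-symbol model; coding symbol 2 repeatedly makes it more probable.
cdf = [8192, 16384, 24576, 32768]
for _ in range(10):
    update_cdf(cdf, symbol=2)
print(cdf)
```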

Transform
=========

Transform type
--------------

For AV1, there is a richer set of transforms for coding Inter and Intra
prediction residues. Inter prediction residues do not have a well-defined
structure as in the Intra case, but using a bank of transforms, each adapted to
a specific type of residue profile within the block, is generally helpful.

In AV1, four main transform types are used, applied separately in the
horizontal and vertical directions. In total, 16 different 2-D transform
combinations are available.

| Transform type | Comments |
|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| DCT | Inter and Intra modes continue to make use of DCT. |
| ADST | Asymmetric Discrete Sine Transform |
| Flip ADST | It applies ADST in reverse order |
| IDTX | Identity transform; particularly useful for coding residue with sharp lines and edges, as found in screen content |

Table 14. The main transform types in each direction

For each small coded block (4x4 or 8x8), it is possible to choose one of up to
16 different transforms as follows (details in the tables below):

{DCT, ADST, FlipADST, IDTX} horizontal x {DCT, ADST, FlipADST, IDTX} vertical
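
To make the separable combination concrete, here is a small numerical sketch; the DCT basis is the standard orthonormal DCT-II, while the sine basis merely stands in for AV1's ADST, so the output is illustrative rather than bit-exact:

```python
import numpy as np

def dct_mat(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix."""
    k, i = np.mgrid[0:n, 0:n]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    m[0] /= np.sqrt(2.0)
    return m

def sine_mat(n: int) -> np.ndarray:
    """Orthonormal DST-I basis, used here as a stand-in for AV1's ADST."""
    k, i = np.mgrid[0:n, 0:n]
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * (k + 1) * (i + 1) / (n + 1))

def bases(n: int) -> dict:
    return {
        "DCT": dct_mat(n),
        "ADST": sine_mat(n),
        "FlipADST": sine_mat(n)[:, ::-1],  # same transform applied to reversed input
        "IDTX": np.eye(n),
    }

def transform_2d(block: np.ndarray, h_type: str, v_type: str) -> np.ndarray:
    """Separable 2-D transform: vertical basis on columns, horizontal on rows."""
    b = bases(block.shape[0])              # square block assumed for brevity
    return b[v_type] @ block @ b[h_type].T

# Any of the 4 x 4 = 16 (horizontal, vertical) combinations may be selected.
block = np.outer(np.arange(4.0), np.ones(4))   # a simple vertical ramp
print(transform_2d(block, h_type="DCT", v_type="ADST").round(2))
```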

As block sizes get larger, some of these transforms begin to act similarly.
Thus, a reduced set of transforms is used for 16x16, 32x32 and 64x64 block
sizes. In the transform selection process for Inter and Intra modes, the encoder
does a search over the entire set of transforms and selects the one that
produces the best rate-distortion cost. Once a transform is selected, a
transform type symbol from the set of types available at that size is used to
indicate the actual transform used in the bitstream.

There are 6 transform sets in the AV1 spec, which specify the transform types
for intra and inter blocks. Each transform set determines the subset of
transform types that can be used, according to the following tables.

| Inter or not | Set Number | Transform set |
|--------------|------------|----------------|
| Don’t care | 0 | TX_SET_DCTONLY |
| 0 | 1 | TX_SET_INTRA_1 |
| 0 | 2 | TX_SET_INTRA_2 |
| 1 | 1 | TX_SET_INTER_1 |
| 1 | 2 | TX_SET_INTER_2 |
| 1 | 3 | TX_SET_INTER_3 |

Table 15. Transform sets in the AV1 spec

| Transform type | TX_SET_DCTONLY | TX_SET_INTRA_1 | TX_SET_INTRA_2 | TX_SET_INTER_1 | TX_SET_INTER_2 |
|-------------------|----------------|----------------|----------------|----------------|----------------|
| DCT_DCT | X | X | X | X | X |
| ADST_DCT | | X | X | X | X |
| DCT_ADST | | X | X | X | X |
| ADST_ADST | | X | X | X | X |
| FLIPADST_DCT | | | | X | X |
| DCT_FLIPADST | | | | X | X |
| FLIPADST_FLIPADST | | | | X | X |
| ADST_FLIPADST | | | | X | X |
| FLIPADST_ADST | | | | X | X |
| IDTX | | X | X | X | X |
| V_DCT | | X | | X | X |
| H_DCT | | X | | X | X |
| V_ADST | | | | X | |
| H_ADST | | | | X | |
| V_FLIPADST | | | | X | |
| H_FLIPADST | | | | X | |

Table 16. Detailed transform types supported in each transform set.

Transform Block Shape and Size
------------------------------

Both square and rectangular transform blocks are used in AV1. The transform
block size does not exceed the partition block size. Block sizes are very
flexible, from 64x64 down to 4x4; see the tables in the Block Structure section
for details.

Tiles
=====

AV1 supports flexible tiles, including uniform and non-uniform tile spacing.
The tile area is limited to a maximum of 4096x2304. Tiles can be grouped into
tile groups, and each group can be decoded independently to achieve error
resilience. The loop filter can be enabled or disabled across tile boundaries.

Segment
=======

As in VP9, AV1 provides a means of segmenting the image and then applying
various adjustments at the segment level. Up to 8 segments may be specified for
any given frame. For each of these segments it is possible to specify:

- A quantizer (absolute value or delta).

- A loop filter strength (absolute value or delta).

- A prediction reference frame.

- A block skip mode that implies both the use of a (0,0) motion vector and that
no residual will be coded.

SVC (Scalable Video Coding)
===========================

AV1 supports temporal and spatial layer coding, with up to 8 temporal layers
and up to 3 spatial layers.

| Index | Scalability mode | Index | Scalability mode |
|-------|------------------|--------|-------------------|
| 0 | SCALABILITY_L1T2 | 8 | SCALABILITY_L2T2h |
| 1 | SCALABILITY_L1T3 | 9 | SCALABILITY_L2T3h |
| 2 | SCALABILITY_L2T1 | 10 | SCALABILITY_S2T1h |
| 3 | SCALABILITY_L2T2 | 11 | SCALABILITY_S2T2h |
| 4 | SCALABILITY_L2T3 | 12 | SCALABILITY_S2T3h |
| 5 | SCALABILITY_S2T1 | 13 | SCALABILITY_SS |
| 6 | SCALABILITY_S2T2 | 14-255 | reserved |
| 7 | SCALABILITY_S2T3 | | |

Table 17. Temporal and Spatial Mode

| Scalability mode | Spatial Layers | Resolution Ratio | Temporal Layers | Inter-layer-dependency |
|-------------------|----------------|------------------|-----------------|------------------------|
| SCALABILITY_L1T2 | 1 | | 2 | |
| SCALABILITY_L1T3 | 1 | | 3 | |
| SCALABILITY_L2T1 | 2 | 2:1 | 1 | Yes |
| SCALABILITY_L2T2 | 2 | 2:1 | 2 | Yes |
| SCALABILITY_L2T3 | 2 | 2:1 | 3 | Yes |
| SCALABILITY_S2T1 | 2 | 2:1 | 1 | No |
| SCALABILITY_S2T2 | 2 | 2:1 | 2 | No |
| SCALABILITY_S2T3 | 2 | 2:1 | 3 | No |
| SCALABILITY_L2T2h | 2 | 1.5:1 | 2 | Yes |
| SCALABILITY_L2T3h | 2 | 1.5:1 | 3 | Yes |
| SCALABILITY_S2T1h | 2 | 1.5:1 | 1 | No |

Table 18. Details of the Temporal and Spatial Modes

Other Tools
===========

Quantization Matrices
---------------------

AV1 supports 15 sets of quantization matrices (QMs), which are based on
contrast-sensitivity functions. QMs are applied to a frame based on a
selectable scaling of its quantization level; higher quantization levels imply
flatter matrices. The matrices become flatter as the quantization index
increases (and the quality decreases). Inter matrices are slightly flatter than
intra matrices.

Superblock Delta-quantization
-----------------------------

AV1 allows per-superblock changes in the quantization parameter to support
sub-frame rate control. At the same time, it supports ROI-level rate control on
top of the segmentation-level parameters.

OBU (Open Bitstream Unit)
-------------------------

An AV1 bitstream consists of a number of OBUs that are normally held within a
container format alongside audio and timing information. The OBU is a new tool
introduced in AV1, similar to the NAL (Network Abstraction Layer) unit in the
AVC/HEVC specs.

The OBU header is similar to the NAL header. In general, 8 bits in total are
present. An extra 8-bit extension header is used if temporal and spatial layers
exist in the bitstream. obu_type is the most important syntax element and
describes the type of the OBU.

| Index | obu_type |
|-------|----------------------------|
| 0 | Reserved |
| 1 | OBU_SEQUENCE_HEADER |
| 2 | OBU_TD |
| 3 | OBU_FRAME_HEADER |
| 4 | OBU_TILE_GROUP |
| 5 | OBU_METADATA |
| 6 | OBU_FRAME |
| 7 | OBU_REDUNDANT_FRAME_HEADER |
| 8-14 | Reserved |
| 15 | OBU_PADDING |

Table 19. Type of OBU
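
As a small illustration of the header layout described above, the sketch below decodes the 8-bit OBU header and the optional extension byte; the type names follow the table above, and the helper itself is illustrative rather than a production parser:

```python
OBU_TYPES = {
    1: "OBU_SEQUENCE_HEADER", 2: "OBU_TD", 3: "OBU_FRAME_HEADER",
    4: "OBU_TILE_GROUP", 5: "OBU_METADATA", 6: "OBU_FRAME",
    7: "OBU_REDUNDANT_FRAME_HEADER", 15: "OBU_PADDING",
}

def parse_obu_header(data: bytes) -> dict:
    """Decode the fixed 8-bit OBU header (plus the optional extension byte).

    Bit layout (MSB first): forbidden bit, 4-bit obu_type, extension flag,
    has_size_field flag, reserved bit.  The extension byte carries the 3-bit
    temporal_id and 2-bit spatial_id when scalability layers are present.
    """
    b0 = data[0]
    hdr = {
        "obu_type": OBU_TYPES.get((b0 >> 3) & 0x0F, "Reserved"),
        "extension_flag": bool((b0 >> 2) & 1),
        "has_size_field": bool((b0 >> 1) & 1),
    }
    if hdr["extension_flag"]:
        b1 = data[1]
        hdr["temporal_id"] = (b1 >> 5) & 0x07
        hdr["spatial_id"] = (b1 >> 3) & 0x03
    return hdr

# 0x12 = 0b00010010: obu_type = 2 (OBU_TD), no extension, has_size_field set.
print(parse_obu_header(bytes([0x12])))
```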
