How HEVC/H.265 works, technical details & diagrams [Archive] - Page 2

ekaveera

30th January 2014, 11:02

Hi, I have seen that the rounding offset for intra mode is (2^r)/3 and for inter mode it is (2^r)/6. Can someone explain the reason behind these values?

pieter3d

3rd February 2014, 19:17

I believe those values are mostly empirically derived.
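A small experiment makes the effect of the two offsets visible. The sketch below is only an illustration, not HM code: `quantize`, `first_nonzero` and the `mode` argument are made-up names, and the division by the step size is modelled as a plain right shift by `r` bits.

```python
def quantize(coef, r, mode):
    """Dead-zone quantizer: add a rounding offset to |coef|, then shift by r bits."""
    step = 1 << r
    # Offsets from the question: step/3 for intra, step/6 for inter.
    offset = step // 3 if mode == "intra" else step // 6
    level = (abs(coef) + offset) >> r
    return -level if coef < 0 else level


def first_nonzero(r, mode):
    """Smallest positive coefficient that does not quantize to zero."""
    coef = 0
    while quantize(coef, r, mode) == 0:
        coef += 1
    return coef


print(first_nonzero(6, "intra"), first_nonzero(6, "inter"))
```

With r = 6 (step size 64), ordinary round-to-nearest would use an offset of 32 and keep every coefficient from 32 upwards. The smaller offsets widen the dead zone around zero, so more small coefficients are dropped; inter residuals get the wider dead zone. That trades a little distortion for fewer bits, which fits the answer that the exact fractions are tuning choices.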

sharlune

6th February 2014, 16:56

First of all, thanks a lot, Pieter3d, for your nice explanation.
I want to use HEVC to compress a set of OpenEXR files I have. OpenEXR is a high dynamic range (HDR) image format. These files are 32-bit with 30 channels, i.e. each image has 30 different spectral (color) channels and a pixel in each channel is stored in 32 bits.
I treat these images as frames of a 'video' and want to encode this video with HEVC so that I can benefit from its high compression ratio. Next, I want random access to this OpenEXR-HEVC-coded “video”, so that I can quickly and easily read any pixel of any image in my original dataset.
Obviously, the pixel bit depth and color space of my images differ from what HEVC supports by default.
As I have no experience in this field, I can't yet tell whether there is a theoretical barrier to what I want to do, or whether it is possible to extend, say, the x265 implementation so that it can read and encode my 'video'. Preferably I want to do this as simply as possible: add support for reading my input format plus some tweaks and changed values here and there, without changing the whole encoder completely.
Now I'm asking you, the experts, whether you see such a barrier. I would be grateful for any comments.

pieter3d

6th February 2014, 17:47

HEVC is probably not well-suited to compressing this format.

sharlune

6th February 2014, 20:35

HEVC is probably not well-suited to compressing this format.
Can you please tell me why, at as technical a level as your time permits? Or at least point me in the right direction to find out the reason?

pieter3d

6th February 2014, 20:52

Well, many of the coding tools in HEVC are designed for 3-component YUV (1 luma channel and 2 chroma channels). Also, the quantization operations, transforms, and motion filters are not at all designed for crazy bitdepths like 32. The current draft range extension to HEVC goes up to 12 bits per channel, and still the same 3 channels, although it does include 4:4:4, which means no chroma sub-sampling.

Your data of 30 channels with 32-bits will need special purpose-built tools to effectively compress.

ricci

12th February 2014, 04:40

Hi, sorry if this is the wrong place to ask, but here goes. I have a question about the code which might be stupid, though I'm completely stuck. Say I have a 32x32 CU block: how can I find the pixels of that block in the code? I'm finding the block using:

TComPic* pcPicTex = pcCU->getSlice()->getTexturePic();
TComDataCU* pcColTexCU = pcPicTex->getCU( pcCU->getAddr() );

This is taken from TEncSearch.cpp of 3D-HEVC, though my question stands for any CU block. Any help would be greatly appreciated. Thanks

pieter3d

12th February 2014, 06:27

Is this for encoder or decoder? What are you trying to accomplish in the end?

ricci

12th February 2014, 06:39

Is this for encoder or decoder? What are you trying to accomplish in the end?

Hi, this is for the encoder. I want to try to implement a simple 1-D filter at each side of the block. Thus, I need the values of the pixels in that block so I can apply this filter to the top, bottom, left and right edges of the block. Thanks

pieter3d

12th February 2014, 06:47

I am assuming you are trying to modify the way a CU is predicted? In that case you will want to augment the functions in TComPrediction.cpp

ricci

12th February 2014, 06:57

I am assuming you are trying to modify the way a CU is predicted? In that case you will want to augment the functions in TComPrediction.cpp

Not really, no. I should have explained better, sorry. I'm working on 3D-HEVC, specifically with the depth maps. I'm trying to follow a paper (i.e. http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6611943&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6611943)

This paper uses the collocated texture luma block to simplify one mode of wedgelet prediction. With the two lines of code I pasted before, (I think) I am finding this collocated texture luma block. Once this block is found, the paper applies a 1-D filter on each of its sides. To do this, I assume the author must extract the pixels of this block, and that is where I am having difficulties. What I mainly want to do is extract the pixels of this block (and possibly save them in a new temporary block), apply the 1-D filter to each of its sides, and continue from there.
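The "extract the block, then filter its four sides" step can be sketched independently of HM. Everything below is hypothetical: the picture is assumed to be a flat, row-major luma buffer with a known stride, the helper names are invented, and the [1, 2, 1] / 4 kernel is only a placeholder for whatever filter the paper specifies. In HM-derived code the equivalent inputs would be a sample pointer and a stride taken from the picture's YUV buffer; the exact accessors depend on the HM/HTM version.

```python
def extract_block(plane, stride, x0, y0, size):
    """Copy a size x size block out of a flat, row-major sample buffer."""
    rows = []
    for y in range(size):
        start = (y0 + y) * stride + x0
        rows.append(list(plane[start:start + size]))
    return rows


def block_edges(block):
    """Return the four border lines of a square block."""
    return {
        "top": list(block[0]),
        "bottom": list(block[-1]),
        "left": [row[0] for row in block],
        "right": [row[-1] for row in block],
    }


def smooth_1d(samples):
    """[1, 2, 1] / 4 low-pass with the end samples repeated (placeholder kernel)."""
    padded = [samples[0]] + list(samples) + [samples[-1]]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2] + 2) >> 2
            for i in range(len(samples))]


# Toy 8x8 picture whose sample value equals its buffer index.
picture = list(range(64))
edges = block_edges(extract_block(picture, 8, 2, 1, 4))
filtered = {side: smooth_1d(line) for side, line in edges.items()}
```

Working on a copy, as here, leaves the reconstructed picture untouched, which matters in an encoder where the same samples are later used for prediction.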

pieter3d

12th February 2014, 07:00

PM me your gtalk name, we can discuss tomorrow morning (Pacific time). Doesn't look like this will be as easy as you hope.

mzso

25th February 2014, 17:09

So, if I have a Core 2 E6750 it's rather unlikely that I'll be able to play FullHD HEVC videos, right?
Tried one sample, which was really choppy.

benwaggoner

27th February 2014, 01:56

So, if I have a Core 2 E6750 it's rather unlikely that I'll be able to play FullHD HEVC videos, right?
Tried one sample, which was really choppy.
If encoded with Wavefront Parallel Processing, maybe. But you've only got two cores, I think without hyperthreading.

asif

28th February 2014, 10:11

Hi, what is the significance of quantization scaling matrices in HEVC? Can anyone explain in detail or provide any good references?

foxyshadis

1st March 2014, 00:25

Hi, what is the significance of quantization scaling matrices in HEVC? Can anyone explain in detail or provide any good references?

It's basically identical to H.264/AVC's quantization, scaled up, which is very similar to MPEG-1,2, and 4's, which are nearly the same as JPEG's quantization. Wikipedia has a very basic overview (https://en.wikipedia.org/wiki/Quantization_%28image_processing%29), then you can see page 8 of this H.264 overview (pdf) (http://www.fastvdo.com/spie04/spie04-h264OverviewPaper.pdf) for more specific information about it, and the mathematics behind it (http://www.h265.net/2009/06/quantization-techniques-in-jmkta-part-2.html). There were some great visual explanations from akupenguin and Dark Shikari here years ago, but I can't find them now.

asif

3rd March 2014, 14:07

Thanks, but the HM code uses some default matrices, for example all 16s. When I change some numbers I still get the same bit rate and quality. What might be the reason?

foxyshadis

7th March 2014, 00:38

Thanks, but the HM code uses some default matrices, for example all 16s. When I change some numbers I still get the same bit rate and quality. What might be the reason?

Do you mean that you changed g_quantTSDefault4x4? Did you set ScalingList to 1? Unless you specify a custom scaling list, the encoder always scales by a scalar (a flat "matrix") no matter what you change g_quantTSDefault4x4 and g_quantIntraDefault8x8 to. Those lists are built-in custom quant matrices (even if they are flat), not the standard scalar. In RExt, it's a separate value, g_quantScales; in HM, it was a bit-shift, and I'm not sure whether it uses g_quantScales now too.
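Why a flat matrix of 16s behaves exactly like having no matrix at all can be seen in a floating-point toy model. This is a sketch under assumptions, not the HM integer code: `quantize_block` is an invented helper, and the only property carried over from HEVC is that a matrix entry of 16 is the neutral value, so the effective step for a coefficient is `qstep * m / 16`.

```python
def quantize_block(coefs, qstep, matrix=None):
    """Quantize a 2-D block; a matrix entry of 16 leaves the step size unchanged."""
    out = []
    for y, row in enumerate(coefs):
        out_row = []
        for x, c in enumerate(row):
            m = 16 if matrix is None else matrix[y][x]
            step = qstep * m / 16.0           # larger entry -> coarser quantization
            level = int(abs(c) / step + 0.5)  # round to nearest
            out_row.append(-level if c < 0 else level)
        out.append(out_row)
    return out


coefs = [[100, -40], [40, 10]]
flat = [[16, 16], [16, 16]]
custom = [[16, 32], [32, 64]]   # quantize higher frequencies more coarsely

print(quantize_block(coefs, 10))          # scalar quantization
print(quantize_block(coefs, 10, flat))    # identical to the scalar case
print(quantize_block(coefs, 10, custom))  # high-frequency levels shrink
```

So editing a table that is never selected, or replacing one flat table with another flat table of 16s, cannot change bit rate or quality.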

asif

9th March 2014, 03:31

Yes, thanks. I have set ScalingList to 1 and now I get different values. Basically I need some existing research papers on using user-defined scaling matrices to get better compression than the default matrices. I have no idea how to manipulate the values to get better results. Also, the same matrices are used for chroma and luma. One more query: g_quantTSDefault4x4 and g_quantIntraDefault8x8 are default matrices, but g_quantScales is an array with 6 values. Can you tell me what this array is used for?
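On the six-entry array: the quantization step size doubles every 6 QP values, so only the six steps within one "octave" need a table, indexed by QP % 6, while QP / 6 becomes a bit shift. The sketch below illustrates that relationship. The table values are quoted from memory of HM's TComRom.cpp and the inverse (dequantization) scales in the HEVC specification, so verify them against your copy of the source; `qstep` is a made-up helper.

```python
# Forward scales (encoder side) and inverse scales (decoder side), indexed by QP % 6.
# Values quoted from memory; check them against your HM version.
QUANT_SCALES = [26214, 23302, 20560, 18396, 16384, 14564]
INV_QUANT_SCALES = [40, 45, 51, 57, 64, 72]


def qstep(qp):
    """Approximate quantization step size: doubles every 6 QP, equals 1.0 at QP 4."""
    return (INV_QUANT_SCALES[qp % 6] << (qp // 6)) / 64.0


for k in range(6):
    # Each forward/inverse pair multiplies to roughly 2**20, i.e. they are
    # fixed-point reciprocals of one another.
    print(k, QUANT_SCALES[k] * INV_QUANT_SCALES[k], 1 << 20)

print(qstep(4), qstep(10), qstep(22))
```

In other words, the array is the fixed-point multiplier that replaces a division by the step size; it is separate from the scaling lists, which modulate the step per coefficient position.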

mandarinka

9th March 2014, 20:37

Related question: does H.265 support custom quantization matrices at all, like H.264 does in high profile?
/In the currently available Main and Main 10 profiles.../

x265_Project

9th March 2014, 23:29

Related question: does H.265 support custom quantization matrices at all, like H.264 does in high profile?
/In the currently available Main and Main 10 profiles.../

In the HM, you can customize your configuration in this section...
#=========== Quantization Matrix =================
ScalingList : 0 # ScalingList 0 : off, 1 : default, 2 : file read
ScalingListFile : scaling_list.txt # Scaling List file name. If file is not exist, use Default Matrix.

ricci

8th April 2014, 19:31

Hi, could someone confirm if the Group of Pictures Structure for the Random Access Configuration in HM is Hierarchical B Prediction (like the one displayed in: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.932&rep=rep1&type=pdf)

The GoP is as follows:

# Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1: B 8 1 0.442 0 0 0 4 4 -8 -10 -12 -16 0
Frame2: B 4 2 0.3536 0 0 0 2 3 -4 -6 4 1 4 5 1 1 0 0 1
Frame3: B 2 3 0.3536 0 0 0 2 4 -2 -4 2 6 1 2 4 1 1 1 1
Frame4: B 1 4 0.68 0 0 0 2 4 -1 1 3 7 1 1 5 1 0 1 1 1
Frame5: B 3 4 0.68 0 0 0 2 4 -1 -3 1 5 1 -2 5 1 1 1 1 0
Frame6: B 6 3 0.3536 0 0 0 2 4 -2 -4 -6 2 1 -3 5 1 1 1 1 0
Frame7: B 5 4 0.68 0 0 0 2 4 -1 -5 1 3 1 1 5 1 0 1 1 1
Frame8: B 7 4 0.68 0 0 0 2 4 -1 -3 -7 1 1 -2 5 1 1 1 1 0

Also, if it is indeed hierarchical B, does it make a difference that all frames are of type B, whilst POC 8 in the link is an I/P frame?

Thanks.
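The table itself can be turned into absolute reference POCs to check the structure. The sketch below assumes the column layout given in the header comment (type, POC, QPoffset, QPfactor, tcOffsetDiv2, betaOffsetDiv2, temporal_id, #ref_pics_active, #ref_pics, then the reference deltas); `parse_gop` is an invented helper, and POCs are taken relative to a GOP that starts at POC 0.

```python
GOP = """
Frame1: B 8 1 0.442 0 0 0 4 4 -8 -10 -12 -16 0
Frame2: B 4 2 0.3536 0 0 0 2 3 -4 -6 4 1 4 5 1 1 0 0 1
Frame3: B 2 3 0.3536 0 0 0 2 4 -2 -4 2 6 1 2 4 1 1 1 1
Frame4: B 1 4 0.68 0 0 0 2 4 -1 1 3 7 1 1 5 1 0 1 1 1
Frame5: B 3 4 0.68 0 0 0 2 4 -1 -3 1 5 1 -2 5 1 1 1 1 0
Frame6: B 6 3 0.3536 0 0 0 2 4 -2 -4 -6 2 1 -3 5 1 1 1 1 0
Frame7: B 5 4 0.68 0 0 0 2 4 -1 -5 1 3 1 1 5 1 0 1 1 1
Frame8: B 7 4 0.68 0 0 0 2 4 -1 -3 -7 1 1 -2 5 1 1 1 1 0
"""


def parse_gop(text):
    """Return coding-order entries with absolute reference POCs."""
    entries = []
    for line in text.strip().splitlines():
        fields = line.split(":", 1)[1].split()
        poc = int(fields[1])
        num_refs = int(fields[8])
        deltas = [int(v) for v in fields[9:9 + num_refs]]
        entries.append({"poc": poc,
                        "qp_offset": int(fields[2]),
                        "refs": [poc + d for d in deltas]})
    return entries


for entry in parse_gop(GOP):
    print(entry["poc"], entry["qp_offset"], entry["refs"])
```

The coding order 8, 4, 2, 1, 3, 6, 5, 7 with QP offsets rising from 1 to 4 is the dyadic pyramid of a hierarchical-B structure. POC 8 is typed B, but every one of its references lies earlier in display order (0, -2, -4, -8, the negative values being pictures of the previous GOP), so in this table it is predicted only from the past, as the P picture in the linked figure would be.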

puffpio

19th April 2014, 21:41

Is wavefront conceptually like pipelining? Also, your description makes it sound like there can only be 2 stages / threads in play for wavefront. Will it scale to many-core?

pieter3d

19th April 2014, 21:44

It's very similar to pipelining. Each CTB row can be decoded in its own thread, so it scales to as many cores as there are CTB rows.
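How far that scaling goes for a given picture size can be estimated with a few lines of arithmetic. The sketch assumes a CTB size of 64, equal processing time for every CTB, and a start lag of two CTBs between consecutive rows (the HEVC entropy-sync rule); `wpp_parallelism` is a made-up helper. Under those assumptions the number of rows that are busy at the same time is also limited by the picture width.

```python
import math


def wpp_parallelism(width, height, ctb_size=64, lag=2):
    """Return (ctb_rows, ctb_cols, peak number of rows active at once)."""
    cols = math.ceil(width / ctb_size)
    rows = math.ceil(height / ctb_size)
    # Row r can start `lag` CTBs after row r-1, so at most ceil(cols / lag)
    # rows overlap in time before the top one has finished.
    peak = min(rows, math.ceil(cols / lag))
    return rows, cols, peak


print(wpp_parallelism(1920, 1080))
print(wpp_parallelism(3840, 2160))
```

For 1080p this gives 17 CTB rows but a peak of 15 concurrently active rows, so the row count is an upper bound rather than a guaranteed speed-up.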

Parabola

20th April 2014, 07:34

Is wavefront conceptually like pipelining? Also, your description makes it sound like there can only be 2 stages / threads in play for wavefront. Will it scale to many-core?

Hi puffpio, it sounds like you might be interested in our 4-thread slow-motion wavefront visualisation: http://www.parabolaresearch.com/blog/2013-12-01-hevc-wavefront-animation.html

Nox Metus

8th May 2014, 01:21

I'm trying to understand how DPB management and reference list construction work in H.265. The logic of the recommendation is hard to grasp.

Why can short-term reference pictures only be identified in the DPB by PicOrderCntVal for the purpose of the RPS, while long-term ones can be identified by either PicOrderCntVal or slice_pic_order_cnt_lsb?

slice_pic_order_cnt_lsb is not unique within a GOP, so there can be two pictures in the DPB with the same slice_pic_order_cnt_lsb. How is a picture identified as a long-term reference then?

What is the reason for this complication for long-term reference pictures at all? Wouldn't it be easier to always use PicOrderCntVal?
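The ambiguity described here follows directly from how the full POC is rebuilt from the LSBs carried in the slice header. The sketch below follows the wrap-around rule of the POC decoding process as commonly described for H.264 and HEVC, with one simplification that is an assumption of this example: every picture updates the "previous" LSB/MSB pair, whereas the specification only uses the previous picture with TemporalId 0. `derive_poc` and `pocs_for` are invented names.

```python
def derive_poc(lsb, prev_lsb, prev_msb, max_lsb):
    """Rebuild the full POC from its LSBs; returns (poc, msb)."""
    if lsb < prev_lsb and prev_lsb - lsb >= max_lsb // 2:
        msb = prev_msb + max_lsb      # LSB counter wrapped forwards
    elif lsb > prev_lsb and lsb - prev_lsb > max_lsb // 2:
        msb = prev_msb - max_lsb      # LSB counter wrapped backwards
    else:
        msb = prev_msb
    return msb + lsb, msb


def pocs_for(lsbs, max_lsb):
    """Full POC values for a sequence of signalled LSBs, in decoding order."""
    pocs, prev_lsb, prev_msb = [], 0, 0
    for lsb in lsbs:
        poc, prev_msb = derive_poc(lsb, prev_lsb, prev_msb, max_lsb)
        prev_lsb = lsb
        pocs.append(poc)
    return pocs


print(pocs_for([0, 4, 8, 12, 0, 4], max_lsb=16))
```

With 4 LSB bits, the pictures with POC 0 and POC 16 both carry LSB 0. Short-term references are signalled as small POC deltas, so the full value is cheap to express; a long-term picture can be arbitrarily far away, so the syntax sends its LSBs and, as far as the slice header syntax goes, adds the MSB cycle (delta_poc_msb_present_flag and delta_poc_msb_cycle_lt) for the case where the LSBs alone would match more than one picture in the DPB.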

mas_np

22nd May 2014, 13:37

Hi,
For my thesis I am implementing intra prediction for an image in MATLAB.
I am almost stuck with it.
First of all, I need a clear description of how to traverse the image in z-order scan (how z-scan order really works).
Second, I haven't found a very clear description of the intra-prediction algorithm so far.
Could someone help me in this regard?
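On the z-scan part of the question: z-order (Morton order) visits the four quadrants of a square as top-left, top-right, bottom-left, bottom-right, and applies the same rule recursively inside each quadrant. Equivalently, the scan index is obtained by interleaving the bits of the x and y block coordinates. The sketch below is a generic illustration with invented helper names; coordinates are in units of the smallest block, not pixels, and it is written in Python rather than MATLAB.

```python
def z_index(x, y, bits=4):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)
        idx |= ((y >> b) & 1) << (2 * b + 1)
    return idx


def z_scan(size):
    """All (x, y) block coordinates of a size x size grid in z-scan order."""
    bits = max(1, (size - 1).bit_length())
    coords = [(x, y) for y in range(size) for x in range(size)]
    return sorted(coords, key=lambda p: z_index(p[0], p[1], bits))


for x, y in z_scan(4):
    print(x, y)
```

For a 4x4 grid the order starts (0,0), (1,0), (0,1), (1,1), then moves to the top-right quadrant with (2,0), (3,0), (2,1), (3,1), and so on.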

xkfz007

26th May 2014, 09:48

How about reference picture management in HEVC? Is there a detailed explanation of it?

benwaggoner

26th May 2014, 22:46

Hi,
For my thesis I am implementing intra prediction for an image in MATLAB.
I am almost stuck with it.
First of all, I need a clear description of how to traverse the image in z-order scan (how z-scan order really works).
Second, I haven't found a very clear description of the intra-prediction algorithm so far.
Could someone help me in this regard?
When is your thesis due :)? I fear you're a long way from the hard parts.

The first post of this thread is a good place to start.

mas_np

31st May 2014, 03:40

When is your thesis due :)? I fear you're a long way from the hard parts.

The first post of this thread is a good place to start.

Actually, I couldn't find any documentation of this standard that clearly explains the following matters:
1- If I am right, the mode used for predicting a PU is the one with the least RD cost according to the function C = D_Had + λ · R_mode.
How should I calculate the RD cost?
There is no specific statement of what λ is and how I should obtain or calculate it. D_Had is described only as the "absolute sum of Hadamard transformed residual signal for a PU"; I cannot understand what exactly it is or how to obtain it.
What is R_mode and how should I calculate or obtain it?

Is it possible to use SAE, which was used in H.264, instead of the RD cost, or something simpler than the RD cost?
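The cost function in item 1 can be made concrete for a 4x4 block. In the sketch below, D_Had is computed as the sum of absolute values of the Hadamard-transformed residual (often called SATD), R_mode is simply a bit count supplied by the caller, and λ is passed in as a parameter. These are assumptions of the example: real encoders derive λ from QP, estimate the mode bits from the entropy coder, and may scale the SATD differently. All function names are invented.

```python
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def residual(orig, pred):
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]


def satd4x4(orig, pred):
    """D_Had: sum of |H * residual * H| (H4 is symmetric, so H equals its transpose)."""
    t = matmul(matmul(H4, residual(orig, pred)), H4)
    return sum(abs(v) for row in t for v in row)


def sad4x4(orig, pred):
    """Sum of absolute differences (SAE), the simpler alternative."""
    return sum(abs(v) for row in residual(orig, pred) for v in row)


def mode_cost(orig, pred, mode_bits, lam):
    """C = D_Had + lambda * R_mode for one candidate mode."""
    return satd4x4(orig, pred) + lam * mode_bits


orig = [[10] * 4 for _ in range(4)]
pred = [[9] * 4 for _ in range(4)]
print(satd4x4(orig, pred), sad4x4(orig, pred), mode_cost(orig, pred, 3, 2.0))
```

Mode decision is an encoder choice and not part of the decoding process, so a thesis implementation can use SAD/SAE as the distortion term, or drop the rate term entirely, and still produce valid predictions; the result is only a less efficient choice of modes.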

2- According to Fig.1 of this pdf:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.352.3008&rep=rep1&type=pdf

In HEVC there is one row and one column of buffered samples used for predicting a PU.
Do I always need both the extension of the above reference row (R(N+1,0) .. R(2N,0)) and the extension of the left reference column (R(0,N+1) .. R(0,2N)) at the same time, or only one of these extensions per block, depending on the angle selected for the block prediction?

3- For some angles I should do linear interpolation and for others extrapolation. I couldn't find any method or function for generating these reference samples.

4- How should I implement the prediction for the angular modes other than DC, vertical and horizontal?

in this patent document:
http://www.google.com/patents/US20130016777

in the paragraph that contains the line "FIG. 4 shows an embodiment of an intra prediction scheme in a vertical mode"
(please find this line in the document)
and in the next paragraph, there is a clear implementation of the vertical and 45-degree modes. However, no clear description of how the other angular modes should be implemented can be found in this document or in the PDF mentioned earlier in this post.
Moreover, the description of the vertical and 45-degree modes in this patent document is somewhat different from interpolation or extrapolation, isn't it?
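For items 3 and 4, the angular modes between pure vertical (26) and the 45-degree diagonal (34) can all be written as one loop with a two-tap linear interpolation at 1/32-sample accuracy. The sketch below follows the structure of the angular prediction process in the HEVC specification as best recalled, restricted to those nine modes; the angle table should be checked against the standard. It leaves out reference sample smoothing, the boundary filter applied to mode 26, and the negative-angle modes, which additionally project left-column samples onto the reference row. `predict_angular_vertical` is an invented name.

```python
# Displacement per row in 1/32 sample units for modes 26..34 (verify against the spec).
INTRA_PRED_ANGLE = {26: 0, 27: 2, 28: 5, 29: 9, 30: 13, 31: 17, 32: 21, 33: 26, 34: 32}


def predict_angular_vertical(ref, n, mode):
    """Predict an n x n block.

    ref[0] is the top-left corner sample and ref[1..2n] is the row above the
    block followed by its above-right extension (2n + 1 samples in total).
    """
    angle = INTRA_PRED_ANGLE[mode]
    pred = []
    for y in range(n):
        idx = ((y + 1) * angle) >> 5     # whole-sample displacement for this row
        fact = ((y + 1) * angle) & 31    # fractional part in 1/32 units
        row = []
        for x in range(n):
            a = ref[x + idx + 1]
            if fact == 0:
                row.append(a)            # lands exactly on a reference sample
            else:
                b = ref[x + idx + 2]
                row.append(((32 - fact) * a + fact * b + 16) >> 5)
        pred.append(row)
    return pred


ref = [0, 10, 20, 30, 40, 50, 60, 70, 80]   # corner + 8 samples above a 4x4 block
print(predict_angular_vertical(ref, 4, 26))  # vertical: each column copies the sample above
print(predict_angular_vertical(ref, 4, 34))  # diagonal: shifts one sample per row
print(predict_angular_vertical(ref, 4, 30))  # in between: interpolated samples
```

This also bears on item 2: these modes read only the above row and its above-right extension, while the mirrored horizontal modes read only the left column and its extension. A straightforward implementation can still build both extended reference arrays once per block and let the mode decide which one is read.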

Thank you in advance for any further help...

STaRGaZeR

31st May 2014, 19:22

Hi,
For my thesis I am implementing intra prediction for an image in MATLAB.
I am almost stuck with it.
First of all, I need a clear description of how to traverse the image in z-order scan (how z-scan order really works).
Second, I haven't found a very clear description of the intra-prediction algorithm so far.
Could someone help me in this regard?

When you say "intra prediction", what kind do you need to implement? It's a very generic term. HEVC intra prediction is a lot more complex (and effective) than for example MPEG-2's. I did have to do intra prediction with MATLAB too, but it was very simple, only using DC mode, and I didn't have any problems implementing it. Don't ask me about it, I forgot everything about that horrible language :D

LigH

6th September 2016, 08:55

Could anyone please provide diagrams that illustrate the motion search methods {dia|hex|umh|star|full}? The question came up which is more elaborate and comprehensive, but to explain why, one may need to see the vector distributions plotted on a coordinate system, I believe... I tried Google image search but could not find matching results. But I think I remember seeing at least a diamond and a hexagonal motion search range once.
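In the absence of pictures, the two simplest patterns can be drawn as text. The offsets below are the small-diamond and hexagon point sets as commonly described for x264-style searches; they are assumptions of this sketch and the exact patterns and refinement steps used by a particular encoder may differ. `plot` and `pattern_search` are invented helpers, and the cost function is a stand-in for a block-matching cost such as SAD.

```python
DIA = [(0, -1), (-1, 0), (1, 0), (0, 1)]
HEX = [(-2, 0), (-1, -2), (1, -2), (2, 0), (1, 2), (-1, 2)]


def plot(offsets, radius):
    """ASCII picture: 'o' is the current best vector, 'x' a tested candidate."""
    rows = []
    for y in range(-radius, radius + 1):
        row = ""
        for x in range(-radius, radius + 1):
            if (x, y) == (0, 0):
                row += "o"
            elif (x, y) in offsets:
                row += "x"
            else:
                row += "."
        rows.append(row)
    return "\n".join(rows)


def pattern_search(cost, start, offsets, max_iter=64):
    """Move the pattern to its best candidate until no candidate improves."""
    best = start
    for _ in range(max_iter):
        candidates = [(best[0] + dx, best[1] + dy) for dx, dy in offsets]
        challenger = min(candidates, key=cost)
        if cost(challenger) >= cost(best):
            break
        best = challenger
    return best


print(plot(DIA, 1))
print()
print(plot(HEX, 2))
```

Both methods repeat the same step: test the pattern points around the current best vector, re-centre on the cheapest one, stop when the centre wins. The hexagon covers more ground per step with six tests, while the diamond moves one sample at a time with four; an exhaustive ("full") search instead tests every vector inside the search range.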

sdancer75

27th November 2017, 19:21

Very nice article ! Thanks

sdancer75

23rd April 2018, 16:02

>>HEVC has two tools that are specifically designed to enable a multi-threaded decoder to decode a single picture with threads: Tiles and Wavefront.

Are the multi-threading tools ONLY for the decoding part? As far as I can see in the source code of the HM Test Software v16.x, there is prediction for the encoding part as well. Am I wrong?

pieter3d

23rd April 2018, 16:50

Are the multi-threading tools ONLY for the decoding part? As far as I can see in the source code of the HM Test Software v16.x, there is prediction for the encoding part as well. Am I wrong?

You're right, the encoder can take advantage of this tool too. But encoders could already do this in a way. For example in principle you could design your encoder with a separate motion search thread at every block.

sdancer75

23rd April 2018, 18:18

You're right, the encoder can take advantage of this tool too. But encoders could already do this in a way. For example in principle you could design your encoder with a separate motion search thread at every block.

@Pieter3d Thanks for the quick response. The real question for me as a researcher is this: the HM Test Software has a wavefront sync flag, which means that it already takes this type of encoding into account.

So, in this case, if I need to implement parallelization code, I need to know whether the CTU above in the previous row has been encoded before starting to encode the CTU in the next row. Is that implemented inside the HM software? If yes, where specifically can I find it?

I already checked the functions TEncSlice::encodeSlice(...) and TEncCu::xEncodeCU(), and they seem to be the right candidates for a parallel implementation. What do you think?

pieter3d

23rd April 2018, 18:25

The HM reference software is not multi threaded, at least not when I last looked at it. So it doesn't specifically take advantage of it, but it does produce a stream that a decoder can decode with multiple threads.
The HM reference software is meant as just that, a reference. It makes no claim about being suitable for any kind of production setting. For example a production software encoder will almost certainly want to take advantage of this feature.

sdancer75

23rd April 2018, 18:46

The HM reference software is not multi threaded, at least not when I last looked at it. So it doesn't specifically take advantage of it, but it does produce a stream that a decoder can decode with multiple threads.
The HM reference software is meant as just that, a reference. It makes no claim about being suitable for any kind of production setting. For example a production software encoder will almost certainly want to take advantage of this feature.

Hi, thank you for your comment. I know that the reference software is not for production settings, but what I really want to know is this: if I want to use wavefront encoding (not in parallel), do I have to implement it myself or does it already exist inside the code? Do you know?

pieter3d

23rd April 2018, 19:02

Hi, thank you for your comment. I know that the reference software is not for production settings, but what I really want to know is this: if I want to use wavefront encoding (not in parallel), do I have to implement it myself or does it already exist inside the code? Do you know?

You can enable it in the HM encoder. It will insert the correct syntax and coding structure so that it matches the spec. It just doesn't actually run in a multi-threaded way.

The CABAC state update you can see here:
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibEncoder/TEncSlice.cpp#L1058

You can see the slice entry points inserted into the stream here:
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibEncoder/TEncGOP.cpp#L1755

LigH

23rd April 2018, 19:05

WPP is the default parallelism mode in x265, and it runs multi-threaded.

sdancer75

23rd April 2018, 22:04

WPP is the default parallelism mode in x265, and it runs multi-threaded.

Not in HM Test Software...

LigH

23rd April 2018, 22:10

So is there any reason why you restrict yourself to the minimum reference implementation and avoid the practically usable and optimized implementation?

pieter3d

23rd April 2018, 22:14

So is there any reason why you restrict yourself to the minimum reference implementation and avoid the practically usable and optimized implementation?

It is not the JCT-VC group's goal to develop an encoder that is targeted to some production purpose. There are many ways that various companies and people want to use HEVC and they all have different design constraints. A highly optimized multi-threaded encoder for example is not appropriate for research or for hardware accelerator development.

The HM software is there as reference, an implementation that works and can demonstrate (nearly) all the features in the specification.

sdancer75

23rd April 2018, 22:16

You can enable it in the HM encoder. It will insert the correct syntax and coding structure so that it matches the spec. It just doesn't actually run in a multi-threaded way.

The CABAC state update you can see here:
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibEncoder/TEncSlice.cpp#L1058

You can see the slice entry points inserted into the stream here:
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibEncoder/TEncGOP.cpp#L1755

Hi

When wavefront is disabled, does the encoding follow the left-to-right, top-to-bottom (zig zag) scheme? After a slice is compressed and encoded, is the data sent to CABAC for each one of them, one after the other? When the wavefront scheme is enabled (without multi-threading), is the encoding order the same zig zag (since no lines are encoded in parallel at the same time)?

pieter3d

23rd April 2018, 22:22

Hi

When wavefront is disabled, does the encoding follow the left-to-right, top-to-bottom (zig zag) scheme? After a slice is compressed and encoded, is the data sent to CABAC for each one of them, one after the other? When the wavefront scheme is enabled (without multi-threading), is the encoding order the same zig zag?

The order of CTUs (typically 64x64 blocks) is the same in either case: left to right, top to bottom, also known as raster order (same as reading order).
The difference is in how the CABAC state is managed.
WPP off: CABAC state is reset at the start, and is simply updated as the encoder proceeds in raster order.
WPP on: The CABAC state is reset at the start of row 0 (same as before), but at the start of every other row, the CABAC state is copied from the state saved after the second CTU of the row above.

This means you can have a thread performing encode for each CTU row, as long as it starts after the row above has finished its second CTU.
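The resulting timing can be tabulated with a short dependency calculation. The sketch assumes one time step per CTU and the HEVC entropy-sync rule, where a row inherits the CABAC state saved after the second CTU of the row above, so a CTU waits for its left neighbour and for the CTU above and to the right; `lag` is a parameter of this example (set `lag=1` to model a one-CTU dependency instead), and `wavefront_start_times` is an invented name.

```python
def wavefront_start_times(rows, cols, lag=2):
    """Earliest start step of every CTU when each row has its own thread."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            earliest = 0
            if c > 0:
                earliest = start[r][c - 1] + 1             # left neighbour finished
            if r > 0:
                above = min(c + lag - 1, cols - 1)         # clamp at the picture edge
                earliest = max(earliest, start[r - 1][above] + 1)
            start[r][c] = earliest
    return start


for row in wavefront_start_times(3, 4):
    print(row)
```

For a 3x4 picture the rows start at steps 0, 2 and 4 and the last CTU starts at step 7, so the picture takes 8 steps instead of the 12 needed in plain raster order. A single-threaded encoder that walks the CTUs in raster order never violates these dependencies, which is why it can emit a WPP-compliant stream without any threading.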

sdancer75

24th April 2018, 12:14

The order of CTUs (typically 64x64 blocks) is the same in either case: left to right, top to bottom, also known as raster order (same as reading order).
The difference is in how the CABAC state is managed.
WPP off: CABAC state is reset at the start, and is simply updated as the encoder proceeds in raster order.
WPP on: The CABAC state is reset at the start of row 0 (same as before), but at the start of every other row, the CABAC state is copied from the state saved after the second CTU of the row above.

This means you can have a thread performing encode for each CTU row, as long as it starts after the row above has finished its second CTU.

Thank you for your response.

So, if I want to use parallel wavefront encoding, do I have to touch the CABAC code as it is written in HM? My conclusion is that this is not needed, since the CABAC code already takes wavefront encoding into account and the only step needed is appropriate synchronization of the shared data between threads.

pieter3d

24th April 2018, 16:05

Are you writing your own encoder? Or using HM?

sdancer75

25th April 2018, 09:12

Are you writing your own encoder? Or using HM?

Hi,

No, I am using the HM code and don't care about a production encoder; I just want to do my own research on WPP. I want to adapt the implemented single-threaded wavefront to be multi-threaded.
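The synchronization needed for such an adaptation can be prototyped outside HM first. The sketch below is a toy model, not HM code: one thread per CTU row, a shared per-row progress counter, and a condition variable that makes CTU (r, c) wait until the row above has finished its above-right neighbour (two CTUs ahead, assuming the HEVC entropy-sync rule). The actual encoding is replaced by a comment, and all names are invented.

```python
import threading


def encode_wavefront(rows, cols, lag=2):
    """Run one worker per CTU row; return the order in which CTUs completed."""
    done = [0] * rows                 # number of finished CTUs per row
    cond = threading.Condition()
    order = []

    def worker(r):
        for c in range(cols):
            if r > 0:
                need = min(c + lag, cols)   # CTUs the row above must have finished
                with cond:
                    cond.wait_for(lambda: done[r - 1] >= need)
            # ... encode CTU (r, c) here, using state inherited from the row above ...
            with cond:
                order.append((r, c))
                done[r] = c + 1
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(rows)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


print(encode_wavefront(3, 4))
```

The completion order varies from run to run, but every CTU always finishes after the CTU above and to its right. In a real encoder each row would also need its own entropy-coder instance and output buffer, since the substreams are written separately and located through the slice entry points.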

sdancer75

1st May 2018, 18:36

@pieter3d Please clarify this to me.

In your very first post you say "HEVC supports four transform sizes: 4x4, 8x8, 16x16 and 32x32.", but inside "JCT-VC High Efficiency Video Coding (HEVC) Test Model 16 (HM 16) Improved Encoder Description" at Paragraph 4.2.5 “Transform unit (TU) and transform tree structure” it says "The transform unit (TU) is a square region of size 8x8, 16x16 or 32x32 luma samples/pixels defined by a quadtree partitioning of a leaf CU.".

From the specification I understand that there is no 4x4 TU size. Is that correct?

Another question: do Transform Units co-exist with Prediction Units inside a CU? For example, is the graph below correct? Do the TUs and PUs each hold luma and two chroma components?

https://thumb.ibb.co/i4xMin/CU_example.jpg (https://ibb.co/i4xMin)

pieter3d

1st May 2018, 19:07

@pieter3d Please clarify this to me.

In your very first post you say "HEVC supports four transform sizes: 4x4, 8x8, 16x16 and 32x32.", but inside "JCT-VC High Efficiency Video Coding (HEVC) Test Model 16 (HM 16) Improved Encoder Description" at Paragraph 4.2.5 “Transform unit (TU) and transform tree structure” it says "The transform unit (TU) is a square region of size 8x8, 16x16 or 32x32 luma samples/pixels defined by a quadtree partitioning of a leaf CU.".

From the specification I understand that there is no 4x4 TU size. Is that correct?

Another question: do Transform Units co-exist with Prediction Units inside a CU? For example, is the graph below correct? Do the TUs and PUs each hold luma and two chroma components?

https://thumb.ibb.co/i4xMin/CU_example.jpg (https://ibb.co/i4xMin)

Because the smallest CU is 8x8, there are four 4x4 transform units in a 2x2 arrangement when TX size is set to 4x4. There is never a single 4x4 transform by itself in a CU.

TUs and PUs only match size in intra blocks. For example inter blocks can have non-square PUs with various different sized TUs.
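The 2x2 arrangement of 4x4 transforms inside an 8x8 CU is simply one level of the transform quadtree. The sketch below enumerates the luma transform blocks for a uniformly split tree; `split_tus` is an invented helper, and a real residual quadtree decides the split per node rather than using a single depth for the whole CU.

```python
def split_tus(x, y, size, depth, min_size=4):
    """List (x, y, size) luma transform blocks of a CU split `depth` times."""
    if depth == 0 or size <= min_size:
        return [(x, y, size)]
    half = size // 2
    tus = []
    for dy in (0, half):              # z-order: top-left, top-right, bottom-left, bottom-right
        for dx in (0, half):
            tus += split_tus(x + dx, y + dy, half, depth - 1, min_size)
    return tus


print(split_tus(0, 0, 8, 1))    # smallest CU, split once: four 4x4 transforms
print(split_tus(0, 0, 32, 0))   # 32x32 CU, no split: a single 32x32 transform
```

An 8x8 CU therefore carries either one 8x8 transform or four 4x4 transforms, never a lone 4x4, which is consistent with both the first post and the wording of the encoder description.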