How HEVC/H.265 works, technical details & diagrams [Archive] - Page 3

sdancer75

1st May 2018, 19:26

Because the smallest CU is 8x8, there are four 4x4 transform units in a 2x2 arrangement when TX size is set to 4x4. There is never a single 4x4 transform by itself in a CU.

TUs and PUs only match size in intra blocks. For example, inter blocks can have non-square PUs with TUs of various sizes.

Thank you !

I am a little confused with all this stuff.

1) Can you please make a simple text diagram of the 4x4 TUs in a 2x2 arrangement?
2) Do CUs consist ONLY of TUs and PUs?
3) Do TUs and PUs contain both luma and chroma data?

pieter3d

1st May 2018, 19:42

Thank you !

I am a little confused with all this stuff.

1) Can you please make a simple text diagram of the 4x4 TUs in a 2x2 arrangement?
2) Do CUs consist ONLY of TUs and PUs?
3) Do TUs and PUs contain both luma and chroma data?

1) The forum here doesn't let me get creative with ascii art,
but think of it similar to the way a 16x16 JPEG/MPEG-2 macroblock has four 8x8 DCT blocks (in luma):
http://slideplayer.com/slide/4759570/15/images/8/Macroblocks+Macroblock+is+basic+unit+for+compression.jpg

2) A CU always contains one or more PUs and one or more TUs.

3) Luma and chroma are grouped together when talking about TUs and PUs.

sdancer75

1st May 2018, 19:48

1) The forum here doesn't let me get creative with ascii art,
but think of it similar to the way a 16x16 JPEG/MPEG-2 macroblock has four 8x8 DCT blocks (in luma):
http://slideplayer.com/slide/4759570/15/images/8/Macroblocks+Macroblock+is+basic+unit+for+compression.jpg

2) A CU always contains one or more PUs and one or more TUs.

3) Luma and chroma are grouped together when talking about TUs and PUs.

Thank you

So for question (2), the only data units a CU can contain are TUs and PUs, and for question (3), luma and chroma are grouped together when we are talking about TBs and PBs (I suppose not in the case of blocks, correct?)

Regards,

pieter3d

1st May 2018, 20:16

"Block" is pretty generic, kind of depends on context. It's not an official term in the spec.

foxyshadis

2nd May 2018, 03:02

@pieter3d Please clarify this to me.

In your very first post you say "HEVC supports four transform sizes: 4x4, 8x8, 16x16 and 32x32.", but inside "JCT-VC High Efficiency Video Coding (HEVC) Test Model 16 (HM 16) Improved Encoder Description" at Paragraph 4.2.5 “Transform unit (TU) and transform tree structure” it says "The transform unit (TU) is a square region of size 8x8, 16x16 or 32x32 luma samples/pixels defined by a quadtree partitioning of a leaf CU.".

From the specification I understand that there is no 4x4 TU size. Is that correct?

That's not the spec, although later in that same paragraph the 4x4 transform blocks are mentioned. The spec is very clear:

transform block: A rectangular MxN block of samples on which the same transform is applied.
....
transform unit: A transform block of luma samples of size 8x8, 16x16, or 32x32 or four transform blocks of luma samples of size 4x4, two corresponding transform blocks of chroma samples of a picture in 4:2:0 colour format;

And following that, a bunch of special cases for 4:2:2 and 4:4:4. It's important to note the difference between TUs (which are either one large or 4 4x4 blocks, in luma) and the TBs; additionally, 4:2:2 chroma TUs always consist of two square TBs. (The "rectangular" wording is leftover from when non-square transforms were one of the proposals.)
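
To make the TU/TB distinction concrete, here is a minimal sketch of how one TU breaks down into TBs in 4:2:0, following the definition quoted above. The struct and function names (`TransformBlock`, `TransformUnit`, `makeTu420`) are hypothetical and are not taken from HM or the spec; the 4:2:2 and 4:4:4 special cases are deliberately left out.

```cpp
#include <cassert>
#include <vector>

// One transform block (TB): position inside the TU and its size, in samples.
struct TransformBlock {
    int x, y, size;
};

// All TBs of one transform unit (TU) in 4:2:0.
struct TransformUnit {
    std::vector<TransformBlock> luma;  // one large TB, or four 4x4 TBs
    TransformBlock cb;                 // single Cb TB
    TransformBlock cr;                 // single Cr TB
};

// Hypothetical helper: lumaTbSize is the luma transform size (4, 8, 16 or 32).
TransformUnit makeTu420(int lumaTbSize) {
    assert(lumaTbSize == 4 || lumaTbSize == 8 ||
           lumaTbSize == 16 || lumaTbSize == 32);
    TransformUnit tu;
    if (lumaTbSize == 4) {
        // Four 4x4 luma TBs in a 2x2 arrangement covering an 8x8 luma area.
        for (int y = 0; y < 8; y += 4)
            for (int x = 0; x < 8; x += 4)
                tu.luma.push_back({x, y, 4});
        // Chroma is not split further: one 4x4 TB per chroma component.
        tu.cb = {0, 0, 4};
        tu.cr = {0, 0, 4};
    } else {
        tu.luma.push_back({0, 0, lumaTbSize});
        // 4:2:0 halves the chroma resolution in both directions.
        tu.cb = {0, 0, lumaTbSize / 2};
        tu.cr = {0, 0, lumaTbSize / 2};
    }
    return tu;
}
```

Calling `makeTu420(4)` gives luma TBs at (0,0), (4,0), (0,4) and (4,4), which is the 2x2 arrangement asked about earlier in the thread; `makeTu420(16)` gives a single 16x16 luma TB with two 8x8 chroma TBs.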

sdancer75

2nd May 2018, 09:48

"Block" is pretty generic, kind of depends on context. It's not an official term in the spec.

Thank you for clarifying this. I just read the text below at
https://codesequoia.wordpress.com/2012/10/28/hevc-ctu-cu-ctb-cb-pb-and-tb/
and when I read "logical unit" it sounded a little generic to me, like logical units in the Win32 API that take a physical form somewhere in the implementation.

We need to understand an important naming convention here. In HEVC standard, if something is called xxxUnit, it indicates a coding logical unit which is in turn encoded into an HEVC bit stream. On the other hand, if something is called xxxBlock, it indicates a portion of video frame buffer where a process is target to.

sdancer75

6th May 2018, 14:50

2) A CU always contains one or more PUs and one or more TUs.

pieter3d

Do the PUs and TUs live side by side (in parallel) inside a CU, or are TUs always inside the PUs, as in the image below? I mean, what is the hierarchical block structure inside the CTU?

https://imgur.com/a/VWMbWha

https://i.imgur.com/SrXvMzF.jpg

pieter3d

6th May 2018, 18:38

They are parallel. It is possible to have a TU larger than a PU in inter CUs.

foxyshadis

12th May 2018, 00:26

Looking inside `Void TEncSlice::encodeSlice` in the official HM test software, I found that the encoder loops over every CTU inside a slice segment, i.e. 512 CTUs in a single slice in my script example.

`for( UInt ctuTsAddr = startCtuTsAddr; ctuTsAddr < boundingCtuTsAddr; ++ctuTsAddr )`

If I would like to encode the frame line by line (i.e. to implement wavefront encoding), should I modify the slice segment to be equal to the frame width, or is that the wrong approach?

WaveFront is already implemented in HM, and you can't encode line by line, only CTU by CTU. HM's implementation is not amazingly efficient, since it saves and reloads the context with every CTU row in a single thread, but actual thread synchronization is very difficult to get right. For actual threading, you would encode a row at a time, with a pool that gets released after the first CTU's context is released each row, but you'd need to rearchitect a lot more than just changing the for loop.

foxyshadis

13th May 2018, 19:06

Hi

Yes I mean CTU lines or rows and not pixel lines sorry..

You said that HM saves and reloads the context with every CTU row. Can you point me to the place inside HM where this code exists?

Sure. Lines 744-759 of the same file, TEncSlice.cpp, load the context:
744     else if ( ctuXPosInCtus == tileXPosInCtus && m_pcCfg->getWaveFrontsynchro())
745     {
746       // reset and then update contexts to the state at the end of the top-right CTU (if within current slice and tile).
747       m_pppcRDSbacCoder[0][CI_CURR_BEST]->resetEntropy();
748       // Sync if the Top-Right is available.
749       TComDataCU *pCtuUp = pCtu->getCtuAbove();
750       if ( pCtuUp && ((ctuRsAddr%frameWidthInCtus+1) < frameWidthInCtus) )
751       {
752         TComDataCU *pCtuTR = pcPic->getCtu( ctuRsAddr - frameWidthInCtus + 1 );
753         if ( pCtu->CUIsFromSameSliceAndTile(pCtuTR) )
754         {
755           // Top-Right is available, we use it.
756           m_pppcRDSbacCoder[0][CI_CURR_BEST]->loadContexts( &m_entropyCodingSyncContextState );
757         }
758       }
759     }
and lines 861-864 save it:
861     if ( ctuXPosInCtus == tileXPosInCtus+1 && m_pcCfg->getWaveFrontsynchro())
862     {
863       m_entropyCodingSyncContextState.loadContexts(m_pppcRDSbacCoder[0][CI_CURR_BEST]);
864     }

Is it only the wavefront synchro that is implemented, or the wavefront algorithm itself? If the algorithm is actually implemented, what's the point when it runs in a single thread?

Wavefront synchro is HM's name for the Wavefront algorithm; they are one and the same. Why? Proof that it can work, and is decodable based on the spec, is all that's necessary for a proof-of-concept encoder. In theory, correct multithreading of the Wavefront code should produce identical output, but this does it without the complexity of threaded code. Only minimal effort was ever put into making HM high-performance, even less than the speed overhauls that JM eventually had.

You can look at early versions of x265 if you want to see HM code with actual multithreaded Wavefront processing, before the HM code was ripped out and entirely reimplemented.
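
As a rough illustration of the dependency that the load/save code above implies, the following sketch computes the earliest step at which each CTU could start if rows were processed in parallel. It is not HM or x265 code: `wavefrontSchedule` is a made-up name, and the assumption that every CTU takes exactly one time step is purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Earliest time step at which each CTU could start under wavefront rules,
// assuming (hypothetically) that every CTU takes exactly one step.
// A CTU waits for its left neighbour and for the CTU above-right of it
// (or the CTU directly above when it sits in the last column).
std::vector<std::vector<int>> wavefrontSchedule(int widthInCtus, int heightInCtus) {
    assert(widthInCtus > 0 && heightInCtus > 0);
    std::vector<std::vector<int>> start(heightInCtus,
                                        std::vector<int>(widthInCtus, 0));
    for (int y = 0; y < heightInCtus; ++y) {
        for (int x = 0; x < widthInCtus; ++x) {
            int ready = 0;
            if (x > 0)
                ready = std::max(ready, start[y][x - 1] + 1);   // left CTU done
            if (y > 0) {
                int xAbove = std::min(x + 1, widthInCtus - 1);  // above-right, clamped
                ready = std::max(ready, start[y - 1][xAbove] + 1);
            }
            start[y][x] = ready;
        }
    }
    return start;
}
```

For a picture at least two CTUs wide this works out to step x + 2*y for the CTU at column x, row y, so each row trails the row above by two CTUs. That two-CTU lag is why the context is saved after the second CTU of a row and loaded at the start of the next one.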

LigH

14th May 2018, 10:48

https://bitbucket.org/multicoreware/x265/wiki/Home

Clone an early revision using Mercurial (hg).

LigH

14th May 2018, 11:03

Mercurial doesn't care much about "version" tags. Use revision numbers or commit hashes. Revision 0 has (brief) commit hash 09fe40627f03 (https://bitbucket.org/multicoreware/x265/commits/09fe40627f03a0f9c3e6ac78b22ac93da23f9fdf).

sdancer75

17th September 2018, 19:52

hi there,

Is there any way to create two different sets of TEncSbac classes just before compressSlice(pcPic), i.e. one for the 1st half of the picture and the other for the 2nd half (I have already done this), and finally join them just after the end of that function?

TEncSbac is a class, and I wonder where the actual encoded data lives! I need this to create two independent compression processes.

Regards,

sdancer75

30th October 2018, 20:16

Hi,

I noticed a change from HM reference software v10 to the latest v16 in how CUs are compressed and encoded. Inside compressSlice, the older versions call compressCU/encodeCU, while the newer versions call compressCtu/encodeCtu.

So, since both compressCU and compressCtu in turn call xCompressCU, and encodeCU and encodeCtu in turn call xEncodeCU, is there any real difference in the compression and encoding procedure?

sdancer75

20th December 2018, 17:54

How are CUs encoded at a tile boundary, since there is no information available from the neighboring samples (not encoded yet)?

pieter3d

20th December 2018, 17:56

It's the same process as on frame boundaries. There is also a flag that lets you optionally enable use of information from other tiles if those tiles were encoded previously (i.e. left or above tiles).

sdancer75

22nd December 2018, 18:16

It's the same process as on frame boundaries. There is also a flag that lets you optionally enable use of information from other tiles if those tiles were encoded previously (i.e. left or above tiles).

Thank you for your answer. Could you point me to this procedure inside the HEVC HM test model?

Regards,

pieter3d

23rd December 2018, 05:55

Check these two:
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibCommon/TComPicSym.cpp#L482
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibCommon/TComDataCU.cpp#L1009

That should give you a starting point to dig in to.

sdancer75

23rd December 2018, 12:22

Check these two:
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibCommon/TComPicSym.cpp#L482
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibCommon/TComDataCU.cpp#L1009

That should give you a starting point to dig in to.

thanks

sdancer75

19th January 2019, 20:45

Check these two:
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibCommon/TComPicSym.cpp#L482
https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibCommon/TComDataCU.cpp#L1009

That should give you a starting point to dig in to.

pieter3d,

I am seeing a very strange situation.

If I replace the code at

https://hevc.hhi.fraunhofer.de/trac/hevc/browser/trunk/source/Lib/TLibCommon/TComDataCU.cpp#L499

with

if ((m_ctuRsAddr < 30) || (m_ctuRsAddr > 30)) {
    if (m_ctuRsAddr / frameWidthInCtus)
    {
        m_pCtuAbove = pcPic->getCtu(m_ctuRsAddr - frameWidthInCtus);
    }
}

essentially preventing ONLY the 1st CTU of the second row from using the CTU above, I get distortion in all CTUs. How do you explain that?


KarthikTdk

27th May 2024, 06:07

How does entropy coding work in HEVC? The input is the output of quantization, so how is this data converted into bins, how does that vary with each value, and is the output constant?

LigH

27th May 2024, 07:22

The entropy coding in HEVC is CABAC = Context-adaptive binary arithmetic coding (https://en.wikipedia.org/wiki/Context-adaptive_binary_arithmetic_coding), a specific variant of the general Arithmetic coding (https://en.wikipedia.org/wiki/Arithmetic_coding).

For more details, you may need to be able to read the source code.
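
To give a feel for the "values into bins" step: CABAC first binarizes each syntax element into a string of bins, then codes each bin with either an adaptive context model or in bypass mode. The sketch below shows only two of the binarization schemes the HEVC spec defines, truncated unary and k-th order Exp-Golomb. The function names are made up for illustration, bins are returned as a text string for readability, and the arithmetic coder and context selection are not shown. Which syntax element uses which scheme is defined in the spec and is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Truncated unary: 'value' ones followed by a terminating zero,
// except that the zero is dropped when value == cMax.
std::string truncatedUnary(unsigned value, unsigned cMax) {
    assert(value <= cMax);
    std::string bins(value, '1');
    if (value < cMax) bins += '0';
    return bins;
}

// k-th order Exp-Golomb (EGk): a unary prefix that grows k,
// a separating zero, then a k-bit suffix.
std::string expGolombK(unsigned value, unsigned k) {
    // Limits chosen only to keep the shifts in this sketch well defined.
    assert(value < (1u << 16) && k < 8);
    std::string bins;
    while (value >= (1u << k)) {
        bins += '1';
        value -= (1u << k);
        ++k;
    }
    bins += '0';
    while (k--) bins += ((value >> k) & 1u) ? '1' : '0';
    return bins;
}
```

For example, `truncatedUnary(2, 3)` yields `110` and `expGolombK(3, 0)` yields `11000`. So the number of bins depends on the value: small values get short bin strings, large values longer ones, and the arithmetic coder then compresses the bins further based on their estimated probabilities.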

KarthikTdk

31st May 2024, 05:20

Basically we have 3 frame types in HEVC: I, P and B frames.
Initially it takes the 1st frame as an I frame, and the next ones as P or B.
Is this right?
And please tell me the order of frame execution, and whether B frames are supported.
And where can I get the source code of HEVC?
Thank you 🙏.

LigH

31st May 2024, 11:33

Yes, B frames are supported.
I frames exist in two variants: IDR (Instantaneous Decoder Refresh) frames, which start a GOP, and intermediate intra frames, used for single frames whose content differs greatly from the surrounding frames, so that the decoding of P and B frames may skip them (they are still displayed, just not referenced by other frames).
Reference sources: https://vcgit.hhi.fraunhofer.de/jvet/HM
Reference documents: https://hevc.hhi.fraunhofer.de/
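
On the order question: a B frame can reference a frame that comes later in display order, so that later frame has to be coded first. The sketch below reorders a display-order pattern into one possible coding order under a simplifying assumption, namely a classic non-hierarchical structure where each run of B frames is coded right after the next I or P frame. `codingOrder` is a hypothetical helper, and real encoders choose their own structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Takes frame types in display order (e.g. "IBBPBBP") and returns the
// display-order indices rearranged into a possible coding order.
std::vector<int> codingOrder(const std::string& displayTypes) {
    std::vector<int> order;
    std::vector<int> pendingB;  // B frames waiting for their later reference
    for (int i = 0; i < static_cast<int>(displayTypes.size()); ++i) {
        char t = displayTypes[i];
        assert(t == 'I' || t == 'P' || t == 'B');
        if (t == 'B') {
            pendingB.push_back(i);
        } else {
            order.push_back(i);  // code the I/P frame first ...
            order.insert(order.end(), pendingB.begin(), pendingB.end());
            pendingB.clear();    // ... then the B frames displayed before it
        }
    }
    // Trailing B frames with no later I/P frame are simply appended here.
    order.insert(order.end(), pendingB.begin(), pendingB.end());
    return order;
}
```

With the display pattern `IBBPBBP` this gives the coding order 0, 3, 1, 2, 6, 4, 5.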

benwaggoner

1st June 2024, 00:53

Basically we have 3 frame types in HEVC: I, P and B frames.
Initially it takes the 1st frame as an I frame, and the next ones as P or B.
Is this right?
And please tell me the order of frame execution, and whether B frames are supported.
And where can I get the source code of HEVC?
Thank you 🙏.

We also have Reference B-frames and non-reference b-frames.

And the new x265 version supports multiple hierarchies of B-frames.

KarthikTdk

3rd June 2024, 05:23

Thank you for replying, @benwaggoner and LigH.
Can I get the source code in Verilog or C?
Thank you!

KarthikTdk

3rd June 2024, 05:36

After the I frame is done, we continue with P and B frames. After that, will an I frame come again?
If it does, then when, i.e. under which conditions?
Thank you!

rwill

3rd June 2024, 06:33

@KarthikTdk: I think you are way in over your head.

lvqcl

3rd June 2024, 10:10

Basically I am working on this protocol

What protocol?

KarthikTdk

3rd June 2024, 10:13

what protocol?

h265/hevc

LigH

3rd June 2024, 15:11

Can i get source code in verilog Or c languages

I already posted the link (https://forum.doom9.org/showthread.php?p=2002432#post2002432) to the "Reference sources" at Fraunhofer; they are in C/C++.

And highly optimized and enhanced sources for the x265 encoder in C/C++ and Assembler are available from Multicoreware's Bitbucket (https://bitbucket.org/multicoreware/x265_git/); more info at https://www.x265.org/

benwaggoner

3rd June 2024, 18:45

And highly optimized and enhanced sources for the x265 encoder in C/C++ and Assembler are available from Multicoreware's Bitbucket (https://bitbucket.org/multicoreware/x265_git/); more info at https://www.x265.org/

And for practical performance, you'll want to be using a version optimized for a given architecture with lots of SIMD usage. ARM and x86 performance are nearly equivalent these days, and there's some decent POWER support as well.

The reference encoder is, charitably, glacially slow.

KarthikTdk

17th June 2024, 10:34

Hi there,
How can I identify the 1st frame and the last frame in H.265?
Thank you in advance.

Emulgator

17th June 2024, 13:30

Run an indexer on the file, like LWLibavVideoSource.
A .lwi file is generated. It tells the offsets of all frames.
An example:
Index=0,POS=0,PTS=-9223372036854775808,DTS=-9223372036854775808,EDI=0
Key=1,Pic=1,POC=0,Repeat=1,Field=0
Index=0,POS=49499,PTS=-9223372036854775808,DTS=-9223372036854775808,EDI=0
Key=0,Pic=2,POC=4,Repeat=1,Field=0
until
Index=0,POS=23035292,PTS=-9223372036854775808,DTS=-9223372036854775808,EDI=0
Key=0,Pic=3,POC=1731,Repeat=1,Field=0

The last frame: You want to look for ....POS=23035292

Example file size was 21,9 MB (23.048.192 Bytes)
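
If you want to read such an index programmatically, a minimal sketch could look like this. It assumes only the comma-separated `name=value` layout visible in the example lines above; `parseIndexLine` is a made-up helper, not part of L-SMASH Works, and any other lines an index file may contain would have to be skipped by the caller.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Splits one index line such as "Key=1,Pic=1,POC=0,Repeat=1,Field=0"
// into name -> integer value pairs.
std::map<std::string, long long> parseIndexLine(const std::string& line) {
    std::map<std::string, long long> fields;
    std::istringstream in(line);
    std::string item;
    while (std::getline(in, item, ',')) {
        std::size_t eq = item.find('=');
        assert(eq != std::string::npos);  // every item must be name=value
        fields[item.substr(0, eq)] = std::stoll(item.substr(eq + 1));
    }
    return fields;
}
```

Feeding it the line pairs in file order, the first pair describes the first frame (POS=0, Key=1 in the example) and the last pair the last frame (POS=23035292).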

KarthikTdk

18th June 2024, 05:13

Run an indexer on the file, like LWLibavVideoSource.
A .lwi file is generated. It tells the offsets of all frames.
An example:
Index=0,POS=0,PTS=-9223372036854775808,DTS=-9223372036854775808,EDI=0
Key=1,Pic=1,POC=0,Repeat=1,Field=0
Index=0,POS=49499,PTS=-9223372036854775808,DTS=-9223372036854775808,EDI=0
Key=0,Pic=2,POC=4,Repeat=1,Field=0
until
Index=0,POS=23035292,PTS=-9223372036854775808,DTS=-9223372036854775808,EDI=0
Key=0,Pic=3,POC=1731,Repeat=1,Field=0

The last frame: You want to look for ....POS=23035292

Example file size was 21,9 MB (23.048.192 Bytes)

Hi, thanks for explaining.
Can you explain it again? I didn't get it.

LigH

18th June 2024, 10:51

You do not analyse the video stream yourself. You let a smart indexer containing decades of experience do that and then parse the index file it created.

Indexing a media file with MPEG HEVC video is quite complex. It depends at first on whether the video stream is contained in a container, so you would have to demultiplex that container first to get the raw video stream. And raw HEVC video streams would have to be parsed sequentially, from the beginning to the end, byte by byte, if you do not already know some smarter approach (e.g. using GOP index chunks in containers which do have some, like ISO Media, e.g. MP4).
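
For the raw-stream case, the "byte by byte" parsing starts with finding NAL units. Here is a minimal sketch that assumes an Annex B byte stream (NAL units separated by 00 00 01 start codes) already loaded into memory. The function names are hypothetical, and this is only the first step of what a real indexer does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scans a raw (Annex B) HEVC byte stream for 00 00 01 start codes and
// returns the nal_unit_type of every NAL unit found, in stream order.
std::vector<int> listNalUnitTypes(const std::vector<std::uint8_t>& data) {
    std::vector<int> types;
    for (std::size_t i = 0; i + 3 < data.size(); ++i) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            // First header byte: forbidden_zero_bit (1 bit),
            // nal_unit_type (6 bits), top bit of nuh_layer_id (1 bit).
            types.push_back((data[i + 3] >> 1) & 0x3F);
            i += 2;  // skip the rest of the start code
        }
    }
    return types;
}

// nal_unit_type 19 (IDR_W_RADL) and 20 (IDR_N_LP) mark IDR pictures.
bool isIdr(int nalUnitType) {
    assert(nalUnitType >= 0 && nalUnitType <= 63);
    return nalUnitType == 19 || nalUnitType == 20;
}
```

Types 0 to 31 are the VCL (slice) NAL units; higher values carry parameter sets, SEI and similar. Since one picture can consist of several slice NAL units, a real indexer also has to read first_slice_segment_in_pic_flag at the start of the slice header to tell where a new picture begins, which is part of why the job is best left to an existing indexer.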

KarthikTdk

20th June 2024, 10:55

Thanks for your answers
I have another question, regarding
intra mode decision.
What formula is used, and does anyone know how the mode gradient value is calculated?

Thank you in advance.