rudyb
10th November 2014, 08:42
Hi
Can someone please confirm whether my understanding below is correct?
Based on reading about CABAC and the tight dependency of each bin on the previous one, and based on analyzing how CABAC is implemented in the HM reference model, it seems that within one CTU I can only process one bin per clock cycle. Everything is serial inside one CTU, and parallelizing within a CTU is not trivial. For example, if I am parsing a 32x32 TU block, this will literally take 32x32 = 1024 clock cycles to process the bins. Even then, I cannot start processing the second 32x32 TU block within the same CTU until I have completely finished parsing the first TU block.
In other words, parallelization within one CTU cannot happen, and the only way I can get some parallelism is by using Tiles or WPP. In the case of WPP, this implies that I must finish processing two CTUs of the first row before I can start processing the first CTU of the second row.
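To make the WPP dependency concrete, here is a minimal sketch of how I picture the schedule. It is not taken from HM. It assumes, as my own simplification, that every CTU takes exactly one time slot and that each CTU row has its own thread. The function name `wpp_earliest_start` and the `lag` parameter are made up for illustration.

```python
def wpp_earliest_start(rows, cols, lag=2):
    """Earliest time slot in which each CTU can start under WPP.

    Simplifying assumptions (hypothetical model, not HM behavior):
    - every CTU takes exactly one time slot,
    - every CTU row has its own thread.
    CTU (r, c) waits for CTU (r, c-1) in its own row and for
    CTU (r-1, c+lag-1) in the row above, clamped to the last column.
    With lag=2 a row starts once two CTUs of the row above are done.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                # Left neighbour in the same row must be finished.
                ready = start[r][c - 1] + 1
            if r > 0:
                # Above-right CTU in the previous row must be finished.
                dep = min(c + lag - 1, cols - 1)
                ready = max(ready, start[r - 1][dep] + 1)
            start[r][c] = ready
    return start


schedule = wpp_earliest_start(rows=3, cols=5)
for row in schedule:
    print(row)
# [0, 1, 2, 3, 4]
# [2, 3, 4, 5, 6]
# [4, 5, 6, 7, 8]
```

Under these assumptions a 3x5 CTU picture finishes in 9 slots instead of 15, but within each slot the CTU itself is still parsed serially.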
So parallelism cannot happen inside one CTU (or at least that is how HM does it, by processing the bins inside each TU serially). And even when I use WPP or Tiles, TU parsing is still a serial process inside each CTU.
Could you please verify whether my understanding above is correct?
Thanks,
--Rudy