How difficult is it to decode high profile?


Chengbin
16th June 2009, 03:39
How much more CPU is required to decode something with high profile?

I find it strange that no PMP supports high profile (Archos, PSP, Cowon to name a few). I thought high profile is just 8x8dct.

Multiple b-frame, ref-frame, b-pyramid, deblock, CABAC, are supported, and they consume a huge amount of decoding power. I'm pretty sure 8x8dct can't come anywhere near the amount of CPU they need to be decoded.

Or is it something about the nature of integrated chipsets that doesn't support hardware acceleration of 8x8dct?

Just curious: is it possible to get different levels of decoding performance on the same chipset with different software? In other words, is it possible to optimize the decoding efficiency of hardware acceleration?

neuron2
16th June 2009, 03:58
I thought high profile is just 8x8dct.
You thought wrong. Read the spec.

Dark Shikari
16th June 2009, 04:12
High profile is 8x8dct, CQMs, and a few other minor things (vs main profile). None of these add significantly to CPU cost.

Chengbin
16th June 2009, 04:20
You thought wrong. Read the spec.

Where can I find it?

Dark Shikari
16th June 2009, 05:00
Google is your friend (http://www.itu.int/rec/T-REC-H.264/en)

CruNcher
16th June 2009, 13:56
Fidelity Range Extensions (FRExt, the official engineering name behind High Profile) came very late into play for H.264, mainly to keep Hollywood engineers happy so that H.264 could become a next-generation HD DVD/Blu-ray format alongside VC-1 (which, back then, was able to beat H.264 without FRExt for movies in Hollywood's subjective high-bitrate tests). Now it seems that, to cut costs down to the last penny of development and to have another consumer/commercial differentiator, the industry has decided that Main Profile is enough for the average consumer and that High Profile should stay a Blu-ray exclusive for a while (at least that is how it seems to me).
So I guess we are going to have High Profile HD mobile PMPs maybe in the middle of next year, maybe at the end. They won't be more expensive, though, since the Main Profile HD PMPs that come out now will have fallen in price by then (when the next-generation devices reach the market). You can be sure High Profile will then be sold as a new "exciting feature", maybe even with the words "Blu-ray-like experience" on the go. Marketing will be happy with this "new feature", and new exciting stickers and names to promote it to the average Joe will come up for sure :D
Best of all, most won't even have to develop a new device or buy a new chip at all; they will just update the firmware, make a new housing, and sell it as something brand new (that is efficiency) ;)

Hope Ben won't forget this chapter (the introduction of FRExt by Panasonic Research) in his book; it's a very exciting thing to read :)

Manao
16th June 2009, 19:59
the industry decided that Main Profile is enough for the Average Consumer and High Profile should stay a Blu-Ray exclusive thing for a while now (at least it seems to me like that)
Define average consumer. Because as far as I can see, a device supports either a partial main profile implementation (think iPod/QuickTime/iPhone or PSP), or it supports everything up to high profile. And everything that gets broadcast seems to be high profile, and so is every software decoder.

benwaggoner
17th June 2009, 04:11
Define average consumer. Because as far as I can see, a device supports either a partial main profile implementation (think iPod/QuickTime/iPhone or PSP), or it supports everything up to high profile. And everything that gets broadcast seems to be high profile, and so is every software decoder.
I concur; it seems like we're converging on Baseline and High 4:2:0 8-bit as the mainstream implementations.

Certainly for web video, we've got Silverlight, QuickTime, and Flash all doing High.

I'm not seeing much that does full Main but not full High 8-bit 4:2:0, nor much that does High past that. Baseline + B-frames seems to be increasingly common.

Archimedes
19th June 2009, 13:50
I recently ran some benchmark tests with CorePlayer on my HP iPAQ 214 to see how x264 parameters affect decoding load. Starting from a reference command line (from MeGUI), I changed only one parameter at a time and ran the benchmark (twice, to be sure). The values show how fast CorePlayer can play the content; 100 % means real time. The resolution was 640x352. The video was a one-minute (deinterlaced) DV clip. Audio was encoded with Nero AAC at 128 kbps.

program --pass 2 --bitrate 426 --stats ".stats" --ref 5 --mixed-refs --bframes 3 --b-adapt 2 --b-pyramid --weightb --direct auto --subme 7 --trellis 2 --partitions p8x8,b8x8,i4x4,i8x8 --8x8dct --me umh --threads auto --thread-input --progress --no-psnr --no-ssim --output "output" "input" --fullrange on

Reference: 111,53 % and 111,13 %

--subme 6: 104,51 % and 103,40 %
--subme 8: 111,06 % and 109,68 %
--subme 9: 110,96 % and 109,70 %

--no-cabac: 127,97 % and 126,29 %
--ref 3: 110,60 % and 110,07 %
--ref 1: 114,60 % and 114,94 %
Without --mixed-refs: 113,89 % and 112,48 %

--bframes 9: 109,72 % and 110,46 %
--bframes 5: 111,56 % and 110,80 %
--bframes 2: 110,82 % and 110,46 %
--bframes 1: 121,58 % and 120,26 %
Without --bframes: 133,45 % and 131,83 %
Without b-pyramid: 109,92 % and 108,93 %
Without weightb: 116,29 % and 114,53 %

Without --8x8dct: 117,89 % and 116,91 %


Note: AVC deblocking was disabled in CorePlayer in all cases.
With AVC deblocking enabled, I only get benchmarks of 84,43 % and 84,53 %.
It may not be 100 % comparable to other decoding environments, but it gives you an idea of how the options affect decoding load.
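
To read these figures as decoding cost rather than playback speed, note that decode time is roughly inversely proportional to the benchmark percentage. The Python sketch below is only an illustration under that assumption. It takes a subset of the numbers posted above (deblocking off), averages the two runs, and ranks the options by the change in decode time relative to the reference. The function and variable names are made up for this example.

```python
# Benchmark speeds from the post above (CorePlayer, deblocking off), two runs each.
# 100 % means real-time playback; only a subset of the rows is included.
REFERENCE = (111.53, 111.13)
RUNS = {
    "--no-cabac": (127.97, 126.29),
    "--ref 1": (114.60, 114.94),
    "--bframes 1": (121.58, 120.26),
    "without --bframes": (133.45, 131.83),
    "without weightb": (116.29, 114.53),
    "without --8x8dct": (117.89, 116.91),
}


def mean(values):
    """Arithmetic mean of the repeated runs."""
    return sum(values) / len(values)


def decode_time_change(reference, option):
    """Relative change in decode time versus the reference.

    Assumes decode time is proportional to 1 / speed, so a negative
    result means the option makes decoding cheaper.
    """
    return mean(reference) / mean(option) - 1.0


def ranked_savings(reference, runs):
    """Return (option, change) pairs, biggest saving first."""
    changes = {name: decode_time_change(reference, r) for name, r in runs.items()}
    return sorted(changes.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, change in ranked_savings(REFERENCE, RUNS):
        print(f"{name:20s} {change * 100:+.1f} % decode time")
```

All files were encoded at the same bitrate, so these figures describe decoding cost only and say nothing about the quality each option buys.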

benwaggoner
19th June 2009, 17:03
Note: AVC deblocking was disabled in all cases in CorePlayer.
With AVC deblocking i only get benchmarks 84,43 % and 84,53 %.
Do you mean you only see performance differences when deblocking is turned off? Since Deblocking will and should be on for playback that's really the main scenario that should be tested.

Beyond just the CPU load of decoding, it's also interesting to look at the impact of settings on startup time and random-access latency, which complex reference structures could potentially change.

Archimedes
19th June 2009, 18:31
Do you mean you only see performance differences when deblocking is turned off? Since Deblocking will and should be on for playback that's really the main scenario that should be tested.
No, with deblocking on I have no chance of playing any H.264 content on my iPAQ at reasonable bitrates and resolutions. I have to disable nearly all x264 features to make playback possible, and the results are not pretty. But with deblocking off, I'm able to play H.264 content at 640x352 and 25 fps with a bitrate of about 384 kbps (benchmark nearly 120 %). The results look good. On the small display I don't miss the deblocking filter so far.

A resolution of 640x480 is only usable with XviD. Here I can use bitrates up to 3000 kbps without any problems. So the best playback quality can only be achieved with XviD.

akupenguin
19th June 2009, 22:51
Problem with your test methodology: bits cost time too. The various options that improve quality-per-bitrate cost some time, but their corresponding reduction in bitrate saves some time. And not just in the entropy decoder; also fewer nonzero coefs to idct and fewer partitions to mc or predict. So it's better to compare options at equal quality, and then (if you care) separately measure the cost of increasing quality.

And even without that, when I encode with your settings but decode with lavc on conroe/x86_64, 8x8dct is faster than no-8x8dct. 1.9% faster at equal bitrate, or 5.5% faster if I reduce bitrate by 9% to get equal psnr.

I don't know how much of it is coreavc and how much of it is your arm cpu, but this gives an idea of how incomparable results may be.
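
One way to apply the equal-quality advice is to encode each setting at several bitrates, measure quality and decode speed for every encode, and then compare speeds at a common quality level. The sketch below covers the interpolation step only. It is a minimal illustration: the function name is made up, linear interpolation between measured points is an assumption, and the sample values are placeholders, not measurements from this thread.

```python
def speed_at_quality(points, target_psnr):
    """Estimate decode speed at a given PSNR by linear interpolation.

    points: iterable of (psnr_db, decode_speed_percent) pairs, one per
            encode of the same setting at a different bitrate.
    Raises ValueError if target_psnr lies outside the measured range.
    """
    ordered = sorted(points)
    for (q0, s0), (q1, s1) in zip(ordered, ordered[1:]):
        if q0 <= target_psnr <= q1:
            if q1 == q0:
                return s0
            t = (target_psnr - q0) / (q1 - q0)
            return s0 + t * (s1 - s0)
    raise ValueError("target PSNR is outside the measured range")


if __name__ == "__main__":
    # PLACEHOLDER numbers for illustration only, not real measurements.
    with_8x8dct = [(36.0, 120.0), (38.0, 110.0)]
    without_8x8dct = [(35.5, 121.0), (37.5, 111.0)]
    target = 37.0
    print("with 8x8dct:   ", speed_at_quality(with_8x8dct, target))
    print("without 8x8dct:", speed_at_quality(without_8x8dct, target))
```

Comparing the two interpolated speeds at the same PSNR separates the cost of the feature from the time saved by the bits it removes.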

Dark Shikari
19th June 2009, 22:55
I don't know how much of it is coreavc and how much of it is your arm cpu, but this gives an idea of how incomparable results may be.
CoreAVC has no WiMMX, ARMv6, or NEON idct8, while libavcodec does have a NEON idct8.

Also, CoreAVC's x86 idct8 is rather crappy as well.

honai
19th June 2009, 23:33
CoreAVC has no WiMMX, ARMv6, or NEON idct8, while libavcodec does have a NEON idct8.

Also, CoreAVC's x86 idct8 is rather crappy as well.

That's interesting. So are you saying that CoreAVC's x86 idct8 is defective, or not according to specs, or both?

Dark Shikari
20th June 2009, 00:08
That's interesting. So are you saying that CoreAVC's x86 idct8 is defective, or not according to specs, or both?
No, it's just slow.

Shevach
25th June 2009, 12:00
H.264 encoding/decoding performance depends strongly on the specific processor used, especially on how its cache is implemented (one-way, two-way or four-way associative).

From an ASIC development perspective, the most time-consuming H.264 block is CABAC. Encoding or decoding video in CABAC mode can cause severe performance peaks. Smoothing these peaks requires large buffers on both the encoder and decoder sides. Consequently, encoding-decoding latency increases significantly; otherwise the decoder has to drop pictures to sustain real-time playback.

By nature, CABAC encoding/decoding is strongly serial. The problem is that CABAC is not pipelined.
CABAC has three stages: binarization, context modelling and arithmetic coding. Context modelling and arithmetic coding cannot be pipelined because the context models of mb_type depend on the values of previous bins. This strong feedback prevents CABAC pipelining.
Consequently, CABAC can't be optimized and significant performance peaks are expected.
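
The feedback loop described here can be seen in a toy context-adaptive binary arithmetic coder. The Python sketch below is not H.264 CABAC: it has no binarization stage, uses simple bit counts instead of the standard's probability state tables, picks the context from the previous bit only, and works with exact fractions instead of an integer range with renormalization. All names are made up. It only shows that each decoded bin needs the model state and the context left behind by the bin before it.

```python
from fractions import Fraction


def _new_model():
    # One [count_of_0, count_of_1] pair per context; context = previous bit.
    return {0: [1, 1], 1: [1, 1]}


def _p_zero(model, ctx):
    zeros, ones = model[ctx]
    return Fraction(zeros, zeros + ones)


def encode(bits):
    """Encode a bit list into one exact fraction inside the final interval."""
    model, ctx = _new_model(), 0
    low, width = Fraction(0), Fraction(1)
    for bit in bits:
        split = width * _p_zero(model, ctx)
        if bit == 0:
            width = split
        else:
            low, width = low + split, width - split
        model[ctx][bit] += 1  # adapt the model for this context
        ctx = bit             # next context depends on this bin
    return low + width / 2


def decode(value, count):
    """Decode `count` bits; each step depends on the previous decoded bit."""
    model, ctx = _new_model(), 0
    low, width = Fraction(0), Fraction(1)
    bits = []
    for _ in range(count):
        split = width * _p_zero(model, ctx)  # needs the up-to-date model
        bit = 0 if value < low + split else 1
        if bit == 0:
            width = split
        else:
            low, width = low + split, width - split
        model[ctx][bit] += 1
        ctx = bit  # bin n+1 cannot start before bin n is known
        bits.append(bit)
    return bits


if __name__ == "__main__":
    message = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
    code = encode(message)
    print(decode(code, len(message)) == message)
```

The decoder loop cannot compute `split` for the next bin until the current bin has updated both `model` and `ctx`, which is the serial dependency within a single stream.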

Dark Shikari
25th June 2009, 18:26
By nature CABAC encoding/decoding is strongly serial. The problem is that CABAC is not pipelined.
Indeed CABAC has three stages: binarization, context modelling and arithmetic coding. Context modeling and arithmetic coding can not be pipelined because context models of mb_type depend on the value of previous bins. This strong feedback disables CABAC pipelining.
Consequently CABAC can't be optimized and significant performance peaks are expected.
You can still run each frame's CABAC decoding separately in parallel.

Archimedes
25th June 2009, 18:29
Sure, there are too many dependencies. However, I ran the benchmark again (with the same files as above), this time with AVC deblocking enabled in CorePlayer. All files are nearly the same size.

program --pass 2 --bitrate 426 --stats ".stats" --ref 5 --mixed-refs --bframes 3 --b-adapt 2 --b-pyramid --weightb --direct auto --subme 7 --trellis 2 --partitions p8x8,b8x8,i4x4,i8x8 --8x8dct --me umh --threads auto --thread-input --progress --no-psnr --no-ssim --output "output" "input" --fullrange on

Reference: 86.87 % and 87.05 %

--subme 6: 84.24 % and 83.40 %
--subme 8: 86.34 % and 85.87 %
--subme 9: 86.41 % and 85.88 %

--no-cabac: 99.19 % and 98.45 %
--ref 3: 86.26 % and 85.31 %
--ref 1: 89.17 % and 88.55 %
Without --mixed-refs: 87.21 % and 87.50 %

--bframes 9: 87.13 % and 86.24 %
--bframes 5: 86.83 % and 85.76 %
--bframes 2: 87.40 % and 86.82 %
--bframes 1: 93.12 % and 92.49 %
Without --bframes: 103.48 % and 102.61 %
Without b-pyramid: 86.23 % and 85.53 %
Without weightb: 89.92 % and 88.74 %

Without --8x8dct: 89.81 % and 89.22 %
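
Setting these rows against the earlier run with deblocking off gives a rough estimate of what the deblocking filter costs on this device. The sketch assumes that decode time is proportional to 1 / speed and that both runs used identical files, as stated above. The helper names are made up, and only two rows are included.

```python
# Matching rows from the two benchmark posts (two runs each, 100 % = real time).
SPEED_DEBLOCK_OFF = {"reference": (111.53, 111.13), "--no-cabac": (127.97, 126.29)}
SPEED_DEBLOCK_ON = {"reference": (86.87, 87.05), "--no-cabac": (99.19, 98.45)}


def mean(values):
    """Arithmetic mean of the repeated runs."""
    return sum(values) / len(values)


def deblock_share(runs_off, runs_on):
    """Fraction of total decode time (deblocking on) spent in the filter.

    Assumes decode time per second of video is 100 / speed, so the filter
    accounts for the difference between the two configurations.
    """
    time_off = 100.0 / mean(runs_off)
    time_on = 100.0 / mean(runs_on)
    return (time_on - time_off) / time_on


if __name__ == "__main__":
    for name in SPEED_DEBLOCK_OFF:
        share = deblock_share(SPEED_DEBLOCK_OFF[name], SPEED_DEBLOCK_ON[name])
        print(f"{name:12s} deblocking is about {share * 100:.0f} % of decode time")
```

Under these assumptions the filter accounts for a bit more than a fifth of total decode time in both rows.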

Without AVC deblocking in CorePlayer, I have to lower the bitrate to 384 kbps (640x352), but I can use all x264 features. This results in a benchmark of about 120 %, which is a good value.

With AVC deblocking in CorePlayer, what can I do? I can disable B-frames and 8x8dct, but this is not enough for a bitrate of about 384 kbps. What next? CABAC? The more features I deactivate, the more bitrate I must spend. A vicious circle.

In my very special case (hardware limitations), the best playback quality can only be achieved with XviD (the iPAQ is able to play XviD videos at 3000 kbps and 640x480 without any problems).

Sorry for going a little bit off topic.

Cyber-Mav
1st July 2009, 02:38
Is it not possible to encode your videos with deblocking disabled in the encoder, so that you don't need deblocking enabled on your iPAQ? Then the videos would run smoothly and wouldn't have artifacts in them.

Not sure if it is possible to disable deblocking when encoding with x264; I only see an option for deblocking strength in MeGUI. Also, how much more bitrate would an encode without deblocking need to maintain quality?

Dark Shikari
1st July 2009, 02:42
is it not possible to encode your videos with deblocking disabled in the encoder so that you dont need deblocking enabled on your ipaq hence videos will run smooth and wont have artifacts in them.

not sure if it is possible to disable deblocking when encoding using x264.
--no-deblock

Cyber-Mav
1st July 2009, 02:56
Is it true that disabling deblocking reduces x264's efficiency to the same level as XviD's?

Dark Shikari
1st July 2009, 02:56
is it true that disabling deblocking reduces x264's efficiency to the same level as xvids efficiency?
No, there are plenty of other things that make x264 good other than just deblocking.

Cyber-Mav
1st July 2009, 02:59
What sort of percentage of encoding efficiency is lost in x264 when deblocking is disabled?

Also, I have noticed that when the decoder has deblocking disabled there is a massive speedup in decoding on my old 1.4 GHz Athlon Thunderbird CPU, more so even than from disabling CABAC.

Archimedes
1st July 2009, 17:59
is it not possible to encode your videos with deblocking disabled in the encoder so that you dont need deblocking enabled on your ipaq hence videos will run smooth and wont have artifacts in them.
Indeed, I encountered some problems with this combination (deblocking enabled during encoding and disabled during playback). From time to time there was something like a fog over some frames, and I was wondering where it came from. Disabling deblocking at encoding time helps. Thank you.

also how much difference would there be in a encode that does not use deblocking on the required bitrate to maintain quality?
Normally, disabling deblocking at such low bitrates is a very bad idea. The results look terrible when viewed on a computer monitor. But on the small display of the iPAQ you won't see much of the blocking artifacts. Zooming to 50 % (or maybe more) when viewing on a computer monitor gives you an idea of that.

Another option is to enable deblocking and disable CABAC, B-frames and 8x8dct. This gives me nearly the same benchmark results as disabling deblocking: disabling deblocking gives a benchmark of about 114 %, while disabling CABAC, B-frames and 8x8dct gives about 117 % (tested with a 3.5-minute TV capture). Both videos have the same bitrate (384 kbps).

Disabling deblocking results in a very blocky video. Disabling CABAC, B-frames and 8x8dct results in a very smooth video. Both videos look terrible when viewed on a computer monitor. On the monitor I would prefer the smooth video, because the blocking artifacts of the other one are too heavy. On the iPAQ I would prefer the blocky video, because there I can't see much of the blocking artifacts. However, it's a matter of personal taste. The bitrate is too low for such sources.

Due to the hardware limitations, I'm using XviD for my encodes. XviD at 1000 kbps and 640x352 can't be beaten by x264 here: you need at least 60 % (or maybe more) of the XviD bitrate to get a video of comparable quality (tested with TV captures), and there is no chance of playing such content (e.g. 600 kbps with all features enabled) on the iPAQ.

Sagekilla
1st July 2009, 22:55
The only thing is, 8x8dct on its own helps decoding rather than hurting it. You can use a lower bitrate for the same quality, and the lower complexity gained from a smaller bitstream offsets the cost of turning on 8x8dct. If you encode at the same bitrate, though, I think it does slow things down slightly.

The only problem with your testing is you've tested CABAC, B frames AND 8x8dct off. What about just CABAC and B frames off?

Archimedes
2nd July 2009, 00:52
The only problem with your testing is you've tested CABAC, B frames AND 8x8dct off. What about just CABAC and B frames off?
For the record: with CABAC and B-frames disabled I get a benchmark of about 114 % (tested with the same 3.5-minute TV capture as above).

Disabling deblocking gives the same benchmark as disabling CABAC and B-frames (at a bitrate of 384 kbps).

nurbs
2nd July 2009, 01:08
The time spent on cabac decoding scales with the bitrate, so with higher bitrates it will take more time than b-frames and deblocking.