12th February 2023, 15:49   #7
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,074
Quote:
Originally Posted by benwaggoner View Post
And H.264 itself just doesn't have the same opportunities for really big SIMD functions to speed things up. There are only 4x4 and 8x8 blocks,
The whole idea of SIMD is to process more data with a single stream of instructions.
Splitting a 16x16 macroblock into 4x4 partitions means x264 has to process 16 very small blocks (a 4x4 array) of 4x4 samples each. As far as I can see, current versions handle those 16 blocks in 16 loop passes, scanning the 4x4 array of blocks one by one.

So the way to make this more SIMD-friendly is to process several blocks of one macroblock in a single SIMD function, for example 2, 4 or 8 blocks of 4x4 per SIMD loop pass. That would reduce the total CPU cycles needed for all 16 4x4 blocks and improve performance.
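To illustrate the batching idea above, here is a minimal portable C sketch. It is not x264 code: the function names are hypothetical, and SAD (sum of absolute differences) is only my assumed example of a per-block operation. The point is the loop shape: one pass reads whole 16-sample rows and so covers four adjacent 4x4 blocks at once, which is the layout a 128-bit (or wider) SIMD load would use. A real implementation would be written with intrinsics or assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MB_SIZE 16 /* a macroblock is 16x16 samples */
#define BLK 4      /* a partition is 4x4 samples */

/* Baseline: SAD of a single 4x4 block; stride is the row pitch in samples. */
unsigned sad_4x4(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            sum += (unsigned)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sum;
}

/* 16 passes: visit the 4x4 array of blocks one by one. */
void mb_sad_one_by_one(const uint8_t *cur, const uint8_t *ref, int stride,
                       unsigned out[16])
{
    for (int by = 0; by < 4; by++)
        for (int bx = 0; bx < 4; bx++) {
            int off = by * BLK * stride + bx * BLK;
            out[by * 4 + bx] = sad_4x4(cur + off, ref + off, stride);
        }
}

/* One pass over four horizontally adjacent 4x4 blocks. Each sample row is
 * 16 contiguous bytes, i.e. one 128-bit load per row in a SIMD version. */
void sad_4x4_x4(const uint8_t *cur, const uint8_t *ref, int stride,
                unsigned out[4])
{
    assert(stride >= MB_SIZE);
    out[0] = out[1] = out[2] = out[3] = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < MB_SIZE; x++)
            out[x / BLK] += (unsigned)abs(cur[y * stride + x] - ref[y * stride + x]);
}

/* 4 passes instead of 16: one batched call per row of blocks. */
void mb_sad_batched(const uint8_t *cur, const uint8_t *ref, int stride,
                    unsigned out[16])
{
    for (int by = 0; by < 4; by++)
        sad_4x4_x4(cur + by * BLK * stride, ref + by * BLK * stride, stride,
                   out + by * 4);
}
```

Both variants fill the same 16 results in the same order, so the batched one can be checked against the one-by-one version before any intrinsics are introduced.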

The situation at the beginning of 2023, as this great tech civilization is dying, is really as follows:
1. The simpler the program or algorithm, the easier it is to add really complex 'internal parallelism' or 'wide vectorization' for the ever wider SIMD units in today's CPUs and the future ones still being promised. Since the complexity of MPEG encoders typically decreases with each older version, the best candidate for AVX512 would be something like MPEG-1. Programmers typically like simple programs: they are easier to design, debug and support.

2. As the civilization is dying from degradation, the available programming resources keep shrinking. We are already past the phase when it was popular among the general public to have a home PC, to understand it and to be able to write programs for it. The education of the current young generation is also getting poorer and poorer. I think zero or very few schools teach all children how to optimize computer programs for today's AVX2/AVX512 chips. This civilization will possibly never again have as many freeware/open-source programmers, well educated and still with a good enough genome, as we had in the 199x..200x years. Unfortunately, the hardware manufacturers of those decades could only provide SIMD at about the MMX/SSE(2) level. Most of those programmers are gone nowadays (at least as far as I can see from open-source projects; maybe they went to very well-paid jobs).

3. The quality of MPEG-1 is poor enough that, when choosing which MPEG encoder version to port to current CPUs, we definitely need to pick something of higher quality.

4. As already noted in the https://forum.doom9.org/showthread.p...35#post1982735 thread, there is a clearly visible 'saturation effect' in the quality of MPEG versions for everyday media viewing by the general public at about MPEG-4 ASP (or maybe somewhere between ASP and AVC).

5. As noted in this thread, MPEG HEVC and later are algorithmically much more complex, and so harder to understand, to design for the AVX2/AVX512 architecture, to implement, to debug and to support.

So it follows from 1..5 that H.264 is about the most probable candidate (next to MPEG-4 ASP, i.e. roughly the freeware and open-source (?) xvid project) for spending some of the remaining programming resources on making it run better on current hardware (a sort of 'Make x264 Great Again'). And if that really shows notable improvements, maybe the x265 developers will take these implementations over to x265 and more complex projects.

MPEG-2 is even simpler but provides too low quality for today's general public use, so it loses the competition for the remaining programming resources. Developers therefore need a balance between the minimum MPEG quality acceptable to the general public and the maximum encoder complexity they can implement. Overly complex MPEG encoders with possibly slightly better compression may remain stuck as C reference implementations with very low encoding performance. Or the work unit of their SIMD parts will use less and less of the available SIMD processing capability, so the performance gain from adding more AVX code to such a small work unit will be close to invisible (as we see today when enabling AVX512 in x26*).
A fast MPEG encoder for AVX2/AVX512 and possible future chips should have somewhat limited complexity (which defines the maximum possible compression). That improves its chances of staying alive in the future, when many experimental and still slow codecs will be left in the past. The example of xvid on today's internet shows this process. Being fast and delivering acceptable quality is the key to staying alive.

So the relative simplicity of x264's algorithms really helps it attract some programmer attention to make it a better everyday workhorse for media compression compared with x265 and later. For example, there is an already old idea of adding support for accepting motion estimation from hardware accelerators via the current Microsoft DX12 API (which is independent of each hardware vendor's API). That may also save some time in IPB compression modes.

Most MPEGs, from MPEG-1 and maybe up to H.266 and beyond, are still based on small blocks internally, and the number of blocks in a frame to encode is much higher than the 'nominal work unit' of AVX512 and the next SIMD units, so any of these MPEG encoders can be ported to AVX512 with full load on the dispatch ports.
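A rough back-of-the-envelope check of that ratio, as a small C sketch. The helper names are hypothetical, and the assumptions are mine: 8-bit luma samples only, frame dimensions rounded up to whole 16x16 macroblocks, and a register simply counted as "holding" as many whole 4x4 blocks as fit in its width.

```c
#include <stdint.h>

/* Number of 16x16 macroblocks covering a frame (dimensions rounded up). */
unsigned macroblocks_in_frame(unsigned width, unsigned height)
{
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Number of 4x4 luma blocks in a frame: 16 per macroblock. */
unsigned blocks4x4_in_frame(unsigned width, unsigned height)
{
    return macroblocks_in_frame(width, height) * 16;
}

/* How many whole 4x4 blocks fit in one SIMD register of reg_bits,
 * with bits_per_sample bits per sample (16 samples per block). */
unsigned blocks4x4_per_register(unsigned reg_bits, unsigned bits_per_sample)
{
    return reg_bits / (16 * bits_per_sample);
}
```

Under these assumptions a 1920x1080 frame has 120 x 68 = 8160 macroblocks, i.e. 130560 luma blocks of 4x4, while one 512-bit register holds only 4 such blocks of 8-bit samples (2 for 256-bit, 1 for 128-bit). So there are tens of thousands of full-width register loads of block data per frame, as long as the encoder's data dependencies allow blocks to be grouped this way.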

Last edited by DTL; 12th February 2023 at 22:20.