Doom9's Forum - View Single Post

benwaggoner · 1st December 2022, 01:27

Quote:

Originally Posted by Easy

This question may be naiv , but still I would like a professional answer to it if possible

Question is: Why not optimize x264 code with x265 code.
So here is the naiv part: x265 is e.g. known for using more and bigger partitions than x264. For example 32x32 and 64x64 , but why not use something like 'group of partitions' in x264 ; for example use four 8x8 to get 32x32.
Using more fake CTU parts in Macroblocks is only an example, may not be useful in the real world but atleast something that can be tested. Also there may be DCT optimizations that could be ported.
Also should it technical be possible to use CABAC and CAVLC simultaneous to reduce decoding spikes.
I know that one can not touch the decoder side of things as it should stay compatible with existing decoders. So not to many 'features' can be ported without changing also how to decode them.

An answer from especially people how looked at the source code would be very helpfull. I don't know c++ or asm, but I know how a function looks and some rough porting theory. So maybe this could be my weekends hobby for the next year or so.

Interesting ideas. I'm not either if your examples would provide practical benefit, but you're thinking about the problem in a good, creative way.

MultiCoreWare licensed all of x264, and x265 started out largely in mixing x264 performance and psychovisual optimization stuff with the HEVC reference encoder. After a few years, they started going into all-new directions not based on either code base. But the deal was they would contribute everything back to x264 so relevant features could be added. Not much actually got backported, though. Probably because x265 development really got into gear as x264 was sliding into a more maintenance mode.

There's lower-hanging fruit in adding features to x264 that I know MCW contributed in x264-compatible ways, like x265's very awesome csv logging. I don't know how easy it would be to find old MCW contributions that never made it into the main branch, but if you can, I bet there's a lot of fun stuff that could be made functional with a couple weeks' work.

Assembly stuff would be a whole lot harder, as it's a lot harder to work with in general. And H.264 itself just doesn't have the same opportunities for really big SIMD functions to speed things up. There are only 4x4 and 8x8 blocks, while HEVC can go from 4x4 to 32x32, including 4x16 and many other power-of-two variations. x265 gets bigger chunks of data to work with at once, and has enormously more mode decisions to make. In a 64x64 block of pixels, x265 has many dozens (hundreds) of different ways it can break down texture units, especially if --amp and --rect are used. For x264, it only needs to figure out whether to use 4x4 or 8x8 for each 16x16 macroblock. And each of those TU's can use one of 33 (IIRC) predictions modes instead of H.264's eight, including whole new mode types like lossless and transform skip.

A big share of x265 assembly optimization is around mode selection, and most of that simply isn't applicable to x264.

I imagine there are some cases that are algorithmically identical to x264 that x265 simply has better optimized implementations of that could be backported. Or x264 subsets of algorithms that could have the x265-only parts of stripped out.