27th May 2022, 11:16   #105
DTL
AVX1024 is expected in the mid-2020s, if this civilization does not die too fast. I hope it will come with an even larger register file, so that more operations can be performed in the fastest memory available on the chip. AMD chips with at least AVX512 support are also good.

From https://clickthis.blog/en/sluhi-o-pr...e-k-2025-godu/


Currently, the largest performance boost expected from a strategic redesign of MVtools around AVX2/AVX512 operations is internal sub-sample shift and scale for both MAnalyse (on-CPU processing) and MDegrain (on-CPU processing) for pel 2 and 4. It would greatly reduce the memory required for the pre-calculated super clip, which looks very important for new massively multicore chips and the current AVS cache design. Also, as cores become faster and faster relative to main host memory, internal scale/shift processing should eventually become faster than reading pre-calculated upscaled planes for pel 2 and 4.
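To put rough numbers on the super clip memory point, here is a minimal sketch. It assumes that the finest level of the super clip stores pel*pel sub-sample-shifted copies of each plane, and it ignores padding and the coarser hierarchy levels. The function name `finest_level_bytes` is hypothetical and not part of MVtools.

```python
def finest_level_bytes(width, height, pel, bytes_per_sample=1):
    """Approximate size of the finest super-clip level for one plane.

    Assumption (not taken from the MVtools source): the finest level
    holds pel*pel sub-sample-shifted copies of the plane. Padding and
    the coarser hierarchy levels are ignored.
    """
    return width * height * bytes_per_sample * pel * pel


if __name__ == "__main__":
    # Compare the footprint of one 8-bit 1920x1080 luma plane per pel value.
    for pel in (1, 2, 4):
        mib = finest_level_bytes(1920, 1080, pel) / 2**20
        print(f"pel={pel}: {mib:.1f} MiB per 8-bit 1920x1080 luma plane")
```

Under this assumption pel 4 needs 16 times the memory of pel 1 for the finest level, and that is the traffic an internal shift/scale would avoid by computing sub-sample positions on the fly.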

I am also interested in when Intel hardware will provide the DX12-ME API. It looks like Intel UHD Graphics 750 still lacks either the drivers or the required hardware features to expose this interface from the MPEG encoder.

"I didn't think about this but it makes perfect sense."

The same applies to the HFR feature of new video systems: the maximum possible frame accumulation time with HFR is shorter, so the number of photons accumulated per frame per object's view is also lower.

So MDegrainN works as a 'second-level video camera', extending the photon flux accumulation time beyond the frame exposure time of the 'first-level time-sampling video camera'. This applies only to real optical video cameras (not to digitally synthesized images). The more image data carriers (photons) are accumulated per object's view, the higher the precision: the object's view is less distorted (noised) by natural photon shot noise. So in the best case the usable tr value is unlimited. In practice it may be limited by the typical duration of a scene between cuts, which is several seconds at minimum, and even at the old 24/25 fps that means many tens or hundreds of frames.
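The photon accumulation argument can be illustrated with a small sketch. It assumes pure Poisson shot noise, ideal motion compensation and equal weighting of all 2*tr+1 frames, so the signal-to-noise ratio grows as the square root of the number of frames averaged. Real MDegrainN weights references by block match quality, so treat this as an upper bound. The function names are hypothetical.

```python
import math


def frames_used(tr):
    """Frames averaged for temporal radius tr: tr back, tr forward, plus current."""
    return 2 * tr + 1


def snr_gain(tr):
    """Ideal SNR gain over a single frame under pure shot noise.

    Assumes equal weights and perfect motion compensation, so the
    accumulated photon count scales linearly with the frame count and
    the SNR scales with its square root.
    """
    return math.sqrt(frames_used(tr))


def frames_in_scene(duration_s, fps):
    """Number of frames in a scene, a practical ceiling for frames_used(tr)."""
    return round(duration_s * fps)


if __name__ == "__main__":
    for tr in (1, 4, 12, 50):
        print(f"tr={tr}: {frames_used(tr)} frames, ideal SNR gain {snr_gain(tr):.2f}x")
    print("4 s scene at 25 fps:", frames_in_scene(4, 25), "frames")
```

A scene of a few seconds at 24/25 fps already holds on the order of a hundred frames, which is where the "tens or hundreds" figure comes from.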

So the host RAM-based solution allows much higher tr values compared with full on-accelerator processing (there, the max tr multiplied by the number of threads is severely limited by the accelerator's memory size, which must hold all required frames). Only current+ref pairs are sent to the accelerator for ME processing, as it works now. Actually, it now uploads the current frame to the accelerator once and only sends new refs for each output frame of MDegrainN.
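A back-of-the-envelope sketch of that memory limit follows. It assumes that full on-accelerator processing would need all 2*tr+1 frames resident per thread, with no sharing of frames between threads and no other buffers. The numbers and the function name `max_tr_on_accelerator` are illustrative only and do not describe any actual implementation.

```python
def max_tr_on_accelerator(mem_bytes, threads, frame_bytes):
    """Largest tr whose 2*tr+1 frames per thread fit in accelerator memory.

    Assumption: each thread keeps its own full set of frames resident
    and nothing else occupies the memory.
    """
    frames_per_thread = mem_bytes // (threads * frame_bytes)
    if frames_per_thread < 1:
        raise ValueError("not even one frame per thread fits")
    return (frames_per_thread - 1) // 2


if __name__ == "__main__":
    frame = 1920 * 1080      # one 8-bit luma plane, in bytes
    mem = 4 * 2**30          # hypothetical 4 GiB accelerator
    for threads in (1, 8, 32):
        print(f"threads={threads}: max tr = {max_tr_on_accelerator(mem, threads, frame)}")
```

With the current+ref pair scheme the accelerator only needs to hold two frames per thread at a time, so tr is bounded by host RAM instead.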

Last edited by DTL; 27th May 2022 at 13:09.