10th December 2021, 08:51 | #1 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,823
|
New DirectX 12 Video APIs
Today DirectX 12 provides APIs to support GPU acceleration for several video applications, such as video decoding, video processing and motion estimation, as detailed in the Direct3D 12 Video Overview.
Announcing new DirectX 12 feature – Video Encoding! Direct3D video motion estimation
I hope it can be useful for some new plugins, or to modernize current ones.
__________________
@turment on Telegram |
10th December 2021, 10:24 | #2 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,209
|
Good news. But how can one know which video cards or embedded video accelerators support motion estimation, before buying one and testing it with special software? Hardware is also required even for development and debugging, because there is no software emulation?
|
10th December 2021, 18:17 | #3 | Link | |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,823
|
Quote:
Besides that, my last programming experience predates Microsoft even inventing APIs: it was on a Bull mainframe with a Fortran77 compiler. I think that by digging into the APIs you can find everything.
__________________
@turment on Telegram Last edited by tormento; 10th December 2021 at 18:21. |
|
11th December 2021, 00:09 | #4 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,209
|
"NVIDIA GeForce GTX 10xx "
We have a GTX 1060 6 GB accessible at work. I can try to make simple test software to check whether the motion estimation API is available. The Windows 10 system resource monitor shows a Video Encoding performance graph.

"HEVC has some sort of motion estimation during encoding."

All MPEG codecs are based on motion estimation. It is just that not every hardware encoder has an open API for using it from external software. And the motion estimation hardware may not be accessible via external data input and output: it may be a closed MPEG encoder engine, with only uncompressed frames as input and an MPEG elementary stream as output. It took some work by Microsoft's DirectX developers with hardware manufacturers to create and standardize an API for motion estimation hardware processing (data format, etc.) so that it is accessible via the DirectX API.

Last edited by DTL; 11th December 2021 at 00:11. |
11th December 2021, 11:14 | #5 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,209
|
The Motion Estimator interface looks like it works on the GTX 1060. The maximum frame size is 4096x4096 and the minimum is 32x32. The block sizes look to be only 8x8 and 16x16.
The testing tool is https://drive.google.com/file/d/1-lK...ew?usp=sharing . It is based on Microsoft's D3D12 color triangle 'hello world' sample; it tries to init a motion estimator at startup and emits message boxes with the queried data. A 0 result means S_OK. It works on Win10 build 19043, although ID3D12VideoDevice1::CreateVideoMotionEstimator is documented as requiring minimum Windows 10 Build 20348 (client and server): https://docs.microsoft.com/en-us/win...otionestimator . I do not know how that corresponds.

But the output is only the vector field, without SAD data. So to feed MDegrainN, SAD calculation will be needed. That again requires reading the src and ref frames from memory on the CPU. Still, getting the motion vector field already saves about half of MAnalyse's current time (at pel=1, and much more at qpel), so the possible speedup from using the hardware motion estimator is not very great. Alternatively, one could develop the (shader-based?) SAD computation over the given vector field inside the GPU, output a separate 'texture' as a field of SADs, and combine it into the format for MDegrainN. So a sort of add-on to mvtools is possible. Is a developer experienced in shader creation needed?

Unfortunately I cannot even set up a development environment on the system with the GTX 1060 for debugging, so the possible development time is undefined. Maybe I need to find the cheapest possible second-hand PCI-E video card with this support and buy it for my home system. The GTX 1060 is too expensive. Which might be the cheapest?

Last edited by DTL; 11th December 2021 at 11:29. |
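Since the estimator returns only motion vectors, the SAD for each block would have to be recomputed on the CPU (or in a shader). A minimal CPU-side sketch of a per-block SAD over 8-bit planes might look like the following; the function name and the fixed 8x8 block are illustrative, not mvtools' actual code:

```cpp
#include <cstdint>
#include <cstdlib>

// Sum of absolute differences between an 8x8 src block and the
// motion-compensated ref block. Pitches are in bytes; (mvx, mvy)
// is a full-pel motion vector. Illustrative sketch only.
static int BlockSAD8x8(const uint8_t* src, int srcPitch,
                       const uint8_t* ref, int refPitch,
                       int mvx, int mvy)
{
    const uint8_t* r = ref + mvy * refPitch + mvx;
    int sad = 0;
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x)
            sad += std::abs(int(src[x]) - int(r[x]));
        src += srcPitch;
        r += refPitch;
    }
    return sad;
}
```

A real implementation would of course use SIMD (as mvtools does), but the memory-traffic cost described above is the same either way: both planes must be read again by the CPU.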
11th December 2021, 12:51 | #7 | Link | |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,023
|
Is a Microsoft API the way to go? It kind of cuts out Linux a bit.
An Introduction to the NVIDIA Optical Flow SDK: https://developer.nvidia.com/blog/an...ical-flow-sdk/ . It says it is implemented for Turing+ cards [whatever that means]. Also, apparently implemented in OpenCV [see Resources]: https://developer.nvidia.com/opticalflow-sdk

EDIT: It seems Turing is a lot more recent than my 1070: https://en.wikipedia.org/wiki/Turing_(microarchitecture)
Quote:
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? Last edited by StainlessS; 11th December 2021 at 13:04. |
|
11th December 2021, 13:16 | #9 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,209
|
List of NVIDIA cards with encoder support: https://developer.nvidia.com/video-e...ort-matrix-new . I still do not know which minimum generation of NVENC will have the Motion Estimation API.
Maybe the slowest GTX 750 will work too? But used it costs about $66 here, which is too much for experiments. Maybe something like $10..15?

"I don't know where you are"

I am from a big, poor, cold country, and the price of its 'wood money' is low now. For the price of a used GTX 750 I can buy firewood to heat my living construction for about half a winter with 2 wood stoves. And now we have an excessively cold beginning of December, the coldest since the beginning of the century: nights as low as -26C twice already. Typically it is about 0 Celsius until the beginning of January.

"Turing+ cards [whatever that means]"

That looks like the FAMILY column in that table. But does Turing+ mean Ampere only? Here is what is enough for mvtools: "NVIDIA GPUs from Maxwell, Pascal, and Volta generations include one or more video encoder (NVENC) engines which provided a mode called Motion-Estimation-only mode. This mode allowed users to run only motion estimation on NVENC and retrieve the resulting motion vectors (MVs).". So even a Maxwell 1st Gen FAMILY card with 4th Gen NVENC should support the Motion Estimation API? That is the GTX 750, the minimum card listed in that table.

"it kind of cuts out linux a bit."

I think if NVIDIA cards have a driver for Linux, it has to expose the same Motion Estimation API if the hardware supports it.

"my 1070"

It should support Motion Estimation and has 2 NVENC chips (maybe they run in parallel and have 2x the performance of the 1060?). You can run that test software: it should display a 0 return on getting the resources (!=0 means an HRESULT error code) and non-zero max/min width/height of the frame to process. The executable part for the MotionEstimation check is Code:
ComPtr<IDXGIAdapter1> hardwareAdapter;
GetHardwareAdapter(factory.Get(), &hardwareAdapter);

ThrowIfFailed(D3D12CreateDevice(
    hardwareAdapter.Get(),
    D3D_FEATURE_LEVEL_11_0,
    IID_PPV_ARGS(&m_device)
));

ComPtr<ID3D12VideoDevice> vid_dev;
HRESULT query_device1 = m_device->QueryInterface(IID_PPV_ARGS(&vid_dev));

char str[2048];
sprintf_s(str, "Query ID3D12VideoDevice return %d\n", query_device1);
MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

D3D12_FEATURE_DATA_VIDEO_MOTION_ESTIMATOR MotionEstimatorSupport = { 0u, DXGI_FORMAT_NV12 };
HRESULT feature_support = vid_dev->CheckFeatureSupport(D3D12_FEATURE_VIDEO_MOTION_ESTIMATOR,
    &MotionEstimatorSupport, sizeof(MotionEstimatorSupport));
sprintf_s(str, "CheckFeatureSupport return: %d\n", feature_support);
MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

sprintf_s(str, "MEstimator Feature support: DXGI_FORMAT InputFormat %d\n", MotionEstimatorSupport.InputFormat);
MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

sprintf_s(str, "MEstimator Feature support: D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_FLAGS BlockSizeFlags %d",
    MotionEstimatorSupport.BlockSizeFlags);
MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

sprintf_s(str, "MEstimator Feature support: D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_FLAGS PrecisionFlags %d",
    MotionEstimatorSupport.PrecisionFlags);
MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

sprintf_s(str, "MEstimator Feature support: D3D12_VIDEO_SIZE_RANGE SizeRange \n MaxW %d MaxH %d MinW %d MinH %d",
    MotionEstimatorSupport.SizeRange.MaxWidth, MotionEstimatorSupport.SizeRange.MaxHeight,
    MotionEstimatorSupport.SizeRange.MinWidth, MotionEstimatorSupport.SizeRange.MinHeight);
MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

ComPtr<ID3D12VideoDevice1> vid_dev1;
HRESULT query_vid_device1 = m_device->QueryInterface(IID_PPV_ARGS(&vid_dev1));
sprintf_s(str, "QueryInterface ID3D12VideoDevice1 return: %d\n", query_vid_device1);
MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

D3D12_VIDEO_MOTION_ESTIMATOR_DESC motionEstimatorDesc = {
    0,                  // NodeIndex
    DXGI_FORMAT_NV12,
    D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_8X8,
    D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_QUARTER_PEL,
    {1920, 1080, 1280, 720} // D3D12_VIDEO_SIZE_RANGE
};
ComPtr<ID3D12VideoMotionEstimator> spVideoMotionEstimator;
HRESULT vid_est = vid_dev1->CreateVideoMotionEstimator(
    &motionEstimatorDesc,
    nullptr,
    IID_PPV_ARGS(&spVideoMotionEstimator));
sprintf_s(str, "ID3D12VideoMotionEstimator return: %d\n", vid_est);
MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

4096x4096 looks like the natural limit of a 16-bit signed output MV format with qpel precision: it is 16384/4. So for pel=1 processing the data needs to be divided by 4. Having hardware-accelerated half-pel and quarter-pel is very valuable, because with CPU search they are very, extremely slow.

Last edited by DTL; 11th December 2021 at 13:52. |
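The 16-bit signed MV format with quarter-pel precision indeed gives a range of 16384/4 = 4096 full pels. Converting the returned quarter-pel vector components to the precision MAnalyse would use at a given pel setting is simple integer arithmetic; a hedged sketch (the function name is illustrative):

```cpp
#include <cstdint>

// Convert a quarter-pel motion vector component (as the DX12 motion
// estimator is documented to return) to a target pel precision:
// pel=1 -> full-pel, pel=2 -> half-pel, pel=4 -> quarter-pel.
// Truncates toward zero. Illustrative sketch only.
static int16_t QpelToPel(int16_t qpelComponent, int pel)
{
    return static_cast<int16_t>(qpelComponent * pel / 4);
}
```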
11th December 2021, 13:28 | #10 | Link |
Registered User
Join Date: Nov 2009
Posts: 2,367
|
Reading around, it seems the GTX 1650 SUPER is the "cheapest" with 7th Gen NVENC. There are some non-SUPER cards with TU116/TU106 chips, but that is hard to verify when buying.
__________________
i7-4790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread |
11th December 2021, 14:27 | #11 | Link | |||
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,023
|
Quote:
Quote:
On my 1070 Founders edition, with that Testing Tool, I get Code:
Test HW Motion estimation
Query ID3D12VideoDevice return 0
CheckFeatureSupport return 0
MEstimator Feature support: DXGI_FORMAT InputFormat 103
MEstimator Feature support: D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_FLAGS BlockSizeFlags 3
MEstimator Feature support: D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_FLAGS PrecisionFlags 1
MEstimator Feature support: D3D12_VIDEO_SIZE_RANGE SizeRange MaxW 4096 MaxH 4096 MinW 32 MinH 32
QueryInterface ID3D12VideoDevice1 return 0
ID3D12VideoMotionEstimator return 0
Quote:
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? Last edited by StainlessS; 11th December 2021 at 16:02. |
|||
11th December 2021, 14:56 | #12 | Link | |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,823
|
Quote:
DTL, perhaps a CLI version would make it easier to paste results.

About cheaper cards: the question is not whether they exist but whether they are supported by drivers. I think the lowest supported driver level should be WDDM 2.9: WDDM 2.9 in Windows 10 Insider Preview "Iron" will bring support for GPU hardware acceleration to the Windows Subsystem for Linux 2 (WSL 2), and support for feature level 12_2 and HLSL Shader Model 6.6.3.x.
__________________
@turment on Telegram Last edited by tormento; 11th December 2021 at 15:01. |
|
11th December 2021, 15:00 | #13 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,209
|
"MEstimator Feature support: D3D12_VIDEO_SIZE_RANGE SizeRange MaxW 4096 MaxH 4096 MinW 32 MinH 32
Queryinterface ID3D12VideoDevice1 return 0 ID3D12VideoMotionEstimator return 0"

It works OK, as expected. We need a hero to try to make a version of MAnalyse with this API. It would also be good to find some 'simplest' Maxwell (1st Gen) card and check it too.

The announcement of HW encoding at https://devblogs.microsoft.com/direc...ideo-encoding/ looks to be about high-quality HEVC encoding. But the Motion Estimation API has been documented at the Microsoft site since the middle of 2021 (or maybe the beginning: the "Direct3D video motion estimation" article is dated 02/05/2021). Also, on reddit there are comments that the very old NVENCs were worse in MPEG encoding quality compared with the newer ones (Turing and Turing+), but as documented they should support Motion Estimation and hopefully are not (significantly) worse than the newer chips.

"perhaps with a CLI version could be easier to paste results."

Unfortunately the printf() function does not output any text from that program, so using MessageBox() was the fastest way to create a portable tool. Otherwise I would need to sit and search for why printf() does not output to the CLI; maybe something like 'console support' is not connected/disabled/etc. The base of the program was taken from the 'hello-world d3d12' sample at https://github.com/microsoft/DirectX...HelloWorld/src

Last edited by DTL; 11th December 2021 at 15:13. |
11th December 2021, 16:01 | #14 | Link | |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,023
|
Code:
Query ID3D12VideoDevice return 0

EDIT:
Quote:
and would have major problems just understanding mvtools code.
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? Last edited by StainlessS; 11th December 2021 at 16:10. |
|
11th December 2021, 16:33 | #15 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,209
|
"just understanding mvtools code."
The simplest version is easy enough: just take the 2 planes (ref and src) in MAnalyse's GetFrame, convert the plane to the required DXGI_FORMAT_NV12 (or maybe try to feed MAnalyse with this format if Avisynth can do the conversion, or maybe use some CUDA/DX conversion after loading YV12, or...), upload to GPU memory, and load and execute that ME program. Then download (read back to host memory) the resulting MV buffer, set it as the level-1 predictor result, and perform 'SAD check'-only SearchMVs processing in PlaneOfBlocks once for the level-0 plane to get the SADs of the MVs; do not use any more predictors or refining (close or equal to PredictorType=3 already in the new MAnalyse). Then return the result to MDegrain().

The 'second generation' is faster but more advanced: try to perform the SAD check inside the DirectX GPU, load back to host memory 2 arrays (MVs and SADs), and interleave them to feed back to MDegrainN.

Third generation: perform MDegrainN inside the GPU (possibly no sense, because typical onboard GPU memory is not enough for a large enough tr, though it may work for MDegrain1,2,3 (6?) and not-too-large frame sizes).

In the 'scalar' SAD function version, the PT=3 PseudoEPZ_search is very simple: Code:
template<typename pixel_t>
void PlaneOfBlocks::PseudoEPZSearch_no_refine(WorkingArea& workarea)
// no refine - only predictor check
{
  typedef typename std::conditional<sizeof(pixel_t) == 1, sad_t, bigsad_t>::type safe_sad_t;

  if (smallestPlane)
  // never get here - normal use is a sequence of params with a 'real' search,
  // like optPredictorsType="3,x" where x < 3.
  {
    workarea.bestMV = zeroMV;
    workarea.nMinCost = verybigSAD + 1;
  }
  else
  {
    workarea.bestMV = workarea.predictor; // already ClipMV() processed in search_mv_slice

    // only recalculate sad for the interpolated predictor, to be compatible
    // with the old/typical thSAD setting in MDegrain
    sad_t sad = LumaSAD<pixel_t>(workarea, GetRefBlock(workarea, workarea.bestMV.x, workarea.bestMV.y));
    sad_t saduv = (chroma) ? ScaleSadChroma(
        SADCHROMA(workarea.pSrc[1], nSrcPitch[1],
                  GetRefBlockU(workarea, workarea.bestMV.x, workarea.bestMV.y), nRefPitch[1])
      + SADCHROMA(workarea.pSrc[2], nSrcPitch[2],
                  GetRefBlockV(workarea, workarea.bestMV.x, workarea.bestMV.y), nRefPitch[2]),
        effective_chromaSADscale) : 0;
    workarea.bestMV.sad = sad + saduv;
  }

  // we store the result
  vectors[workarea.blkIdx].x = workarea.bestMV.x;
  vectors[workarea.blkIdx].y = workarea.bestMV.y;
  vectors[workarea.blkIdx].sad = workarea.bestMV.sad;

  workarea.planeSAD += workarea.bestMV.sad; // for debug; fixme: outer planeSAD is not used
}

Last edited by DTL; 11th December 2021 at 18:57. |
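The NV12 conversion step mentioned above is mostly just interleaving YV12's separate U and V planes into NV12's single packed UV plane; the Y plane is copied unchanged. A minimal sketch, with hypothetical names and simple pitch handling (not mvtools or Avisynth code):

```cpp
#include <cstdint>
#include <cstring>

// Interleave planar U and V (YV12-style, each width/2 x height/2) into
// the packed UV plane of NV12; copy the Y plane as-is.
// Pitches are in bytes. Illustrative sketch only.
static void YV12ToNV12(const uint8_t* srcY, int srcPitchY,
                       const uint8_t* srcU, const uint8_t* srcV, int srcPitchUV,
                       uint8_t* dstY, int dstPitchY,
                       uint8_t* dstUV, int dstPitchUV,
                       int width, int height)
{
    for (int y = 0; y < height; ++y)
        std::memcpy(dstY + y * dstPitchY, srcY + y * srcPitchY, width);

    for (int y = 0; y < height / 2; ++y) {
        const uint8_t* u = srcU + y * srcPitchUV;
        const uint8_t* v = srcV + y * srcPitchUV;
        uint8_t* uv = dstUV + y * dstPitchUV;
        for (int x = 0; x < width / 2; ++x) {
            uv[2 * x]     = u[x]; // NV12 stores U then V per chroma sample
            uv[2 * x + 1] = v[x];
        }
    }
}
```

Whether doing this on the CPU is cheaper than a GPU-side conversion after upload would need measuring; for motion estimation alone, the UV plane may not even matter if the estimator only reads luma.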
11th December 2021, 19:45 | #16 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,823
|
When resizable BAR is widespread enough, CPU vs. GPU memory placement will have little or no impact at all, given DDR5 bandwidth and lower latencies on transfers between the two kinds.
__________________
@turment on Telegram |
11th December 2021, 23:32 | #17 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,209
|
Well, the best option for speed is the 'gen 1.5' way: do not calculate SAD in MAnalyse, and output only MVs to a rewritten version of MDegrainN. It is faster to check SAD in MDegrainN at the time of weight calculation: it will reduce read traffic from memory to about 1/2 and increase speed about 2x.
But a special 'compatibility' mode in MAnalyse with the standard MV output (mv + sad) may still be required for other functions using MAnalyse output.

The cheapest possible Maxwell-based card looks to be the GTX 745: a low-profile OEM card with slower memory and lower performance compared with the GTX 750. I have started bargaining for one second-hand from about $33 here, and maybe I will test it for ME feature support next week; one seller provides a testing capability before purchasing.

Though there is also the idea of testing a 'remote debugging mode': if it is not too dangerous to install a 'remote debugger' on the system with the GTX 1060, it would be very helpful. I still have no experience with remote debugging of applications over IP.

Last edited by DTL; 11th December 2021 at 23:46. |
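In the 'gen 1.5' scheme, MDegrainN would compute the SAD itself and map it directly to a blending weight. A simplified weight function in the spirit of MDegrain's thSAD-based weighting (a hedged sketch with a hypothetical name, not the exact mvtools formula):

```cpp
#include <cstdint>

// Map a block SAD to a 0..256 blending weight: the weight falls to 0
// as the SAD approaches the thSAD threshold, so poorly matched blocks
// contribute nothing to the degrained output. Simplified curve in the
// spirit of MDegrain's weighting; not the exact mvtools formula.
static int DegrainWeightSketch(int thSAD, int blockSAD)
{
    if (blockSAD >= thSAD)
        return 0;
    // Quadratic falloff: close matches get a weight near 256.
    const int64_t num = int64_t(thSAD - blockSAD) * (thSAD + blockSAD);
    const int64_t den = int64_t(thSAD) * thSAD + int64_t(blockSAD) * blockSAD;
    return int(256 * num / den);
}
```

Computing this inline while the src and ref blocks are already in cache for blending is what saves the second pass over memory that a separate MAnalyse-side SAD check would cost.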
12th December 2021, 09:28 | #18 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,823
|
If memory serves, it has CUDA too. I have heard that the CUDA toolkit converts almost flawlessly from C++ to CUDA instructions. Perhaps you could take a look at that too.
__________________
@turment on Telegram |
13th December 2021, 19:28 | #19 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,330
|
Quote:
https://github.com/pinterf/AviSynthC...r/KTGMC/MV.cpp And the code actually in CUDA https://github.com/pinterf/AviSynthC...MC/MVKernel.cu with MDegrainN https://github.com/pinterf/AviSynthC...ernel.cu#L2600 |
|
13th December 2021, 19:37 | #20 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,209
|
I see remote debugging is simple enough and does not require setting up awful software like Visual Studio on the remote system. So I will try to start adding hardware DX12-ME to MAnalyse this week at work.
Anyway, even with an infinitely fast ME engine, MDegrainN needs to be rewritten away from scattering the received MVs array into the supplemental Fake* structures, because that takes almost the same time as the MDegrainN processing itself on a Core2 Duo E7500. Though the first version will most probably use the standard old MDegrainN.

It may be an inheritance from old times, when MDegrain was interconnected with the motion search in one filter and later developers simply fed the received MVs via the AVS file into the Fake structure to keep MDegrain working without a rewrite. With the typically very slow MAnalyse, the speed penalty was not very visible. But nowadays, with fast MAnalyse modes and a hardware-accelerated MAnalyse, it is a really visible speed limit (in MVClip::Update()) and needs to be removed: the MDegrainN processing should access the received MVs array directly.

Last edited by DTL; 13th December 2021 at 19:41. |