Old 10th December 2021, 08:51   #1  |  Link
tormento
New DirectX 12 Video APIs

Today DirectX 12 provides APIs to support GPU acceleration for several video applications such as Video Decoding, Video Processing and Motion Estimation, as detailed in the Direct3D 12 Video Overview.

Announcing new DirectX 12 feature – Video Encoding!

Direct3D video motion estimation

I hope it can be useful for some new plugins or to modernize current ones.
Old 10th December 2021, 10:24   #2  |  Link
DTL
Good news. But how do we know which video cards or embedded video accelerators support motion estimation, before buying one and testing it with special software? And even for developing and debugging, the hardware is required, because there is no software emulation?
Old 10th December 2021, 18:17   #3  |  Link
tormento
Quote:
Originally Posted by DTL View Post
Good news. But how do we know which video cards or embedded video accelerators support motion estimation?
I think that they are the same listed for video encoding API support:
  • AMD Radeon RX 5000 series or greater, Ryzen 2xxxx series or greater
  • Intel Tiger Lake, Ice Lake, Alder Lake (from early 2022)
  • NVIDIA GeForce GTX 10xx and above, GeForce RTX 20xx and above, Quadro RTX, NVIDIA RTX
AFAIK HEVC has some sort of motion estimation during encoding.

Besides that, my last programming experience dates from before Microsoft even invented APIs: a Bull mainframe and a Fortran 77 compiler.

I think that by digging into the APIs you can find everything.
Old 11th December 2021, 00:09   #4  |  Link
DTL
"NVIDIA GeForce GTX 10xx "

We have a GTX 1060 6 GB accessible at work. I can try to make a simple test program to check whether the motion estimation API is available. The Windows 10 resource monitor shows a Video Encoding performance graph.

"HEVC has some sort of motion estimation during encoding."

All MPEG codecs are based on motion estimation. But not every hardware encoder has an open API for using it from external software, and the motion estimation hardware may not be accessible for external data input and output - it may be a closed MPEG encoder engine, with only uncompressed frames as input and an MPEG elementary stream as output. The Microsoft DirectX developers did some work with the hardware manufacturers on creating and standardizing an API for motion estimation hardware processing (data formats, etc.) so that it is accessible via DirectX.

Old 11th December 2021, 11:14   #5  |  Link
DTL
The Motion Estimator interface looks like it works on the GTX 1060. Frame size is 4096x4096 max and 32x32 min. Block size looks like only 8x8 and 16x16.

The testing tool is https://drive.google.com/file/d/1-lK...ew?usp=sharing . It is based on Microsoft's D3D12 color triangle 'hello world' sample; it tries to init the motion estimator at startup and emits message boxes with the queried data. A 0 result means S_OK. It works on Win10 build 19043, although the documentation for ID3D12VideoDevice1::CreateVideoMotionEstimator says
Minimum supported client: Windows 10 Build 20348
Minimum supported server: Windows 10 Build 20348
https://docs.microsoft.com/en-us/win...otionestimator
I do not know how that corresponds.

But the output is only the vector field, without SAD data. So to feed MDegrainN, the SAD calculation will still be needed, which again means reading the src and ref frames from memory on the CPU. Getting the motion vector field already saves about half of the current MAnalyse time (at pel=1, and much more at qpel), so the possible speedup from the hardware motion estimator alone is not very great. Otherwise the SAD computation over the given vector field would need to be developed (shader-based?) inside the GPU, outputting a separate 'texture' as a field of SADs, combined into the format for MDegrainN.
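For illustration, a minimal CPU-side sketch of that missing SAD step, assuming the vector field has already been read back from the resolved texture into a plain array (all names here are hypothetical, this is not mvtools code):
Code:
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical layout: one signed 16-bit x/y pair per 8x8 block,
// in quarter-pel units, as read back from the resolved MV texture.
struct MV16 { int16_t x, y; };

// Compute an 8x8 luma SAD for every block of the vector field
// (pel=1, vectors truncated to full-pel, no bounds clipping for brevity).
static std::vector<uint32_t> SadForField(
    const uint8_t* src, const uint8_t* ref, int pitch,
    const MV16* mvs, int blkXCount, int blkYCount)
{
  std::vector<uint32_t> sads(static_cast<std::size_t>(blkXCount) * blkYCount);
  for (int by = 0; by < blkYCount; ++by)
    for (int bx = 0; bx < blkXCount; ++bx)
    {
      const MV16 mv = mvs[by * blkXCount + bx];
      const uint8_t* s = src + (by * 8) * pitch + bx * 8;
      const uint8_t* r = ref + (by * 8 + mv.y / 4) * pitch + bx * 8 + mv.x / 4;
      uint32_t sad = 0;
      for (int y = 0; y < 8; ++y, s += pitch, r += pitch)
        for (int x = 0; x < 8; ++x)
          sad += std::abs(int(s[x]) - int(r[x]));
      sads[by * blkXCount + bx] = sad;
    }
  return sads;
}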

So a sort of add-on to mvtools is possible. Do we need a developer experienced in shader creation?

Unfortunately I cannot even set up a development environment on the system with the GTX 1060 for debugging, so the possible development time is undefined. Maybe I need to find the cheapest possible second-hand PCI-E video card with this support and buy it for my home system. A GTX 1060 is too expensive. Which might be the cheapest?

Old 11th December 2021, 12:50   #6  |  Link
tormento
Quote:
Originally Posted by DTL View Post
Which might be the cheapest?
A used one. I don't know where you are from, but on eBay you can get one for ~100€.
Old 11th December 2021, 12:51   #7  |  Link
StainlessS
Is the Microsoft API the way to go? It kind of cuts out Linux a bit.

An Introduction to the NVIDIA Optical Flow SDK:- https://developer.nvidia.com/blog/an...ical-flow-sdk/
It says it is implemented for Turing+ cards [whatever that means].

Also, apparently implemented in OpenCV [see Resources]:- https://developer.nvidia.com/opticalflow-sdk

EDIT: Seems Turing is a lot more recent than my 1070:- https://en.wikipedia.org/wiki/Turing_(microarchitecture)
Quote:
Products using Turing

GeForce 16 series
GeForce GTX 1650
GeForce GTX 1650 (Mobile)
GeForce GTX 1650 Max-Q (Mobile)
GeForce GTX 1650 (GDDR6)
GeForce GTX 1650 Super
GeForce GTX 1650 Ti (Mobile)
GeForce GTX 1660
GeForce GTX 1660 (Mobile)
GeForce GTX 1660 Super
GeForce GTX 1660 Ti
GeForce GTX 1660 Ti (Mobile)
GeForce GTX 1660 Ti Max-Q (Mobile)
GeForce 20 series
GeForce RTX 2060
GeForce RTX 2060 12GB
GeForce RTX 2060 (Mobile)
GeForce RTX 2060 Max-Q (Mobile)
GeForce RTX 2060 Super
GeForce RTX 2060 Super (Mobile)
GeForce RTX 2070
GeForce RTX 2070 (Mobile)
GeForce RTX 2070 Max-Q (Mobile)
GeForce RTX 2070 Max-Q Refresh (Mobile)
GeForce RTX 2070 Super
GeForce RTX 2070 Super (Mobile)
GeForce RTX 2070 Super Max-Q (Mobile)
GeForce RTX 2080
GeForce RTX 2080 (Mobile)
GeForce RTX 2080 Max-Q (Mobile)
GeForce RTX 2080 Super
GeForce RTX 2080 Super (Mobile)
GeForce RTX 2080 Super Max-Q (Mobile)
GeForce RTX 2080 Ti
Titan RTX
Nvidia Quadro
Quadro RTX 3000 (Mobile)
Quadro RTX 4000
Quadro RTX 5000
Quadro RTX 6000
Quadro RTX 8000
Nvidia Tesla
Tesla T4
Old 11th December 2021, 12:55   #8  |  Link
kedautinh12
I think the GTX 10xx is very cheap now if you compare it with the latest gen of Nvidia GPUs. Just compare the power and the price between them.
Old 11th December 2021, 13:16   #9  |  Link
DTL
The list of NVIDIA cards with encoder support is https://developer.nvidia.com/video-e...ort-matrix-new . I still do not know which minimum generation of NVENC has the Motion Estimation API.
Maybe the slowest GTX 750 will work too? But used it also costs about $66 here, which is too much for experiments. Maybe something like $10..15?

"I don't know where you are "

I am from a big, poor, cold country, and right now the price of its 'wood money' is low. For the price of a used GTX 750 I can buy enough firewood to heat my living construction for about half the winter with 2 wood stoves. And we have the most excessively cold beginning of December since the beginning of the century - the night has dropped as low as -26C twice already. Typically it is about 0 Celsius until the beginning of January.

" Turing+ cards [whatever that means]"

It looks like the FAMILY column in that table. But does Turing+ mean Ampere only?

This is what is enough for mvtools: "NVIDIA GPUs from Maxwell, Pascal, and Volta generations include one or more video encoder (NVENC) engines which provided a mode called Motion-Estimation-only mode. This mode allowed users to run only motion estimation on NVENC and retrieve the resulting motion vectors (MVs)."

So even a Maxwell 1st Gen FAMILY card with 4th Gen NVENC should support the Motion Estimation API? That is the GTX 750, the lowest card listed in that table.

" it kind of cuts out linux a bit."

I think if NVIDIA cards have a driver for Linux, it has to expose the same Motion Estimation API if the hardware supports it.

"my 1070"

It should support Motion Estimation and has 2 NVENC chips (maybe they run in parallel and give 2x the performance of the 1060?). You can run that test software - it should display a 0 return on getting the resources (!=0 means an HRESULT error code) and non-zero max/min width/height of the frame to process.

The executable part for the MotionEstimation check is
Code:
        ComPtr<IDXGIAdapter1> hardwareAdapter;
        GetHardwareAdapter(factory.Get(), &hardwareAdapter);

        ThrowIfFailed(D3D12CreateDevice(
            hardwareAdapter.Get(),
            D3D_FEATURE_LEVEL_11_0,
            IID_PPV_ARGS(&m_device)
            ));
   

	ComPtr<ID3D12VideoDevice> vid_dev;

	HRESULT query_device1 = m_device->QueryInterface(IID_PPV_ARGS(&vid_dev));

	char str[2048];
	sprintf_s(str, "Query ID3D12VideoDevice return %d\n", query_device1);
	MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

	D3D12_FEATURE_DATA_VIDEO_MOTION_ESTIMATOR MotionEstimatorSupport = { 0u, DXGI_FORMAT_NV12 };
	HRESULT feature_support = vid_dev->CheckFeatureSupport(D3D12_FEATURE_VIDEO_MOTION_ESTIMATOR, &MotionEstimatorSupport, sizeof(MotionEstimatorSupport));

	sprintf_s(str, "CheckFeatureSupport return: %d\n", feature_support);
	MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

	sprintf_s(str, "MEstimator Feature support: DXGI_FORMAT InputFormat %d\n \n", MotionEstimatorSupport.InputFormat);
	MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

	sprintf_s(str, "MEstimator Feature support:  D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_FLAGS BlockSizeFlags %d", MotionEstimatorSupport.BlockSizeFlags);
	MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

	sprintf_s(str, "MEstimator Feature support:  D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_FLAGS PrecisionFlags %d", MotionEstimatorSupport.PrecisionFlags);
	MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

	sprintf_s(str, "MEstimator Feature support:  D3D12_VIDEO_SIZE_RANGE SizeRange \n MaxW %d  MaxH %d MinW %d MinH %d", MotionEstimatorSupport.SizeRange.MaxWidth, \
		MotionEstimatorSupport.SizeRange.MaxHeight, MotionEstimatorSupport.SizeRange.MinWidth, MotionEstimatorSupport.SizeRange.MinHeight);
	MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);


	ComPtr<ID3D12VideoDevice1> vid_dev1;

	HRESULT query_vid_device1 = m_device->QueryInterface(IID_PPV_ARGS(&vid_dev1));

	sprintf_s(str, "QueryInterface ID3D12VideoDevice1 return: %d\n", query_vid_device1);
	MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);


	D3D12_VIDEO_MOTION_ESTIMATOR_DESC motionEstimatorDesc = {
	0, //NodeIndex
	DXGI_FORMAT_NV12,
	D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_8X8,
	D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_QUARTER_PEL,
	{1920, 1080, 1280, 720} // D3D12_VIDEO_SIZE_RANGE
	};

	ComPtr<ID3D12VideoMotionEstimator> spVideoMotionEstimator;
	HRESULT vid_est = vid_dev1->CreateVideoMotionEstimator(
		&motionEstimatorDesc,
		nullptr,
		IID_PPV_ARGS(&spVideoMotionEstimator));

	sprintf_s(str, "ID3D12VideoMotionEstimator return: %d\n", vid_est);
	MessageBoxA(NULL, str, "Test HW motion estimation", MB_OK);

4096x4096 looks like the natural limit of the 16-bit signed output MV format with qpel precision - it is 16384/4. So for pel=1 processing the data need to be divided by 4. Having hardware-accelerated half-pel and quarter-pel is very valuable, because with a CPU search they are very, very slow.
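A small sketch of that scaling, assuming the read-back vectors really are signed 16-bit x/y pairs in quarter-pel units as the PrecisionFlags above suggest (the helper below is hypothetical):
Code:
#include <cstdint>

// One resolved motion vector: signed 16-bit x/y in quarter-pel units.
struct ResolvedMV { int16_t x, y; };

// Convert a hardware vector to the precision MAnalyse works at:
// pel=1 -> full-pel (divide by 4), pel=2 -> half-pel, pel=4 -> keep qpel.
inline void ScaleToPel(ResolvedMV& mv, int pel)
{
  mv.x = int16_t(mv.x * pel / 4);
  mv.y = int16_t(mv.y * pel / 4);
}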

Old 11th December 2021, 13:28   #10  |  Link
Dogway
Reading around, it seems the GTX 1650 Super is the "cheapest" one with 7th Gen NVENC. There are some non-SUPER ones with TU116/TU106 chips, but that is hard to know when buying.
Old 11th December 2021, 14:27   #11  |  Link
StainlessS
Quote:
It looks like the FAMILY column in that table. But does Turing+ mean Ampere only?
By Turing+, I meant Turing and later.

Quote:
I think if NVIDIA cards have a driver for Linux, it has to expose the same Motion Estimation API if the hardware supports it.
Lovely, thanks.

On my 1070 Founders edition, with that Testing Tool, I get

Code:
Test HW Motion estimation

Query ID3D12VideoDevice return 0
CheckFeatureSupport return 0
MEstimator Feature support: DXGI_FORMAT InputFormat 103
MEstimator Feature support: D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_FLAGS BlocksizeFlags 3
MEstimator Feature support: D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_FLAGS PrecisionFlags 1
MEstimator Feature support: D3D12_VIDEO_SIZE_RANGE SizeRange MaxW 4096 MaxH 4096 MinW 32 MinH 32
Queryinterface ID3D12VideoDevice1 return 0
ID3D12VideoMotionEstimator return 0
EDIT: Typo tilde removed from above.

Quote:
the night has dropped as low as -26C twice already.
WOW, cold enough to freeze the balls off a brass monkey.
Old 11th December 2021, 14:56   #12  |  Link
tormento
Quote:
Originally Posted by StainlessS View Post
Code:
Test HW Motion estimation

Query ID3D12~VideoDevice return 0
CheckFeatureSupport return 0
MEstimator Feature support: DXGI_FORMAT InputFormat 103
MEstimator Feature support: D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_FLAGS BlocksizeFlags 3
MEstimator Feature support: D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_FLAGS PrecisionFlags 1
MEstimator Feature support: D3D12_VIDEO_SIZE_RANGE SizeRange MaxW 4096 MaxH 4096 MinW 32 MinH 32
Queryinterface ID3D12VideoDevice1 return 0
ID3D12VideoMotionEstimator return 0
Same here on a 1060 3GB.

DTL, perhaps a CLI version would make it easier to paste results. About cheaper cards: the question is not whether they exist but whether they are supported by the drivers.

I think the lowest supported driver should have WDDM 2.9: WDDM 2.9 in the Windows 10 Insider Preview "Iron" will bring support for GPU hardware acceleration in the Windows Subsystem for Linux 2 (WSL 2), and support for feature level 12_2 and HLSL Shader Model 6.6.
Old 11th December 2021, 15:00   #13  |  Link
DTL
"MEstimator Feature support: D3D12_VIDEO_SIZE_RANGE SizeRange MaxW 4096 MaxH 4096 MinW 32 MinH 32
Queryinterface ID3D12VideoDevice1 return 0
ID3D12VideoMotionEstimator return 0"

It works OK, as expected. We need a hero to try to make a version of MAnalyse with this API. It would also be good to find some 'simplest' Maxwell (1st Gen) card and check it too.

The announcement of HW encoding at https://devblogs.microsoft.com/direc...ideo-encoding/ looks to be about high-quality HEVC encoding. But the Motion Estimation API has been documented at the Microsoft site since the middle of 2021 (or maybe the beginning - the Direct3D video motion estimation article is dated 02/05/2021). Also, in comments on reddit, the very old NVENC units were worse in MPEG encoding quality compared with the newer ones (Turing and Turing+), but as documented they should support Motion Estimation and hopefully are not (significantly) worse than the newer chips.

" perhaps with a CLI version could be easier to paste results."

Unfortunately the printf() function does not output any text from that program, so using MessageBox() was the fastest way to create a portable tool. Otherwise I would need to sit and search for why printf() does not output text to the CLI. Maybe something like 'console support' is not connected/is disabled/etc.
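A hedged guess at the printf() issue: the D3D12 samples build as GUI (WinMain) applications, which have no console attached, so stdout goes nowhere. Something like this at startup usually makes printf() visible again (standard Win32 calls, untested in this particular sample):
Code:
#include <windows.h>
#include <cstdio>

// Attach a console to a /SUBSYSTEM:WINDOWS app and rebind stdout/stderr to it,
// so printf() output becomes visible. Alternatively AttachConsole(ATTACH_PARENT_PROCESS)
// reuses the console the tool was launched from.
static void EnableConsoleOutput()
{
    if (AllocConsole())
    {
        FILE* f = nullptr;
        freopen_s(&f, "CONOUT$", "w", stdout);
        freopen_s(&f, "CONOUT$", "w", stderr);
    }
}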

The base of the program was taken from the 'hello world' D3D12 sample at https://github.com/microsoft/DirectX...HelloWorld/src

Old 11th December 2021, 16:01   #14  |  Link
StainlessS
Code:
Query ID3D12~VideoDevice return 0
Just a small note: that tilde [~] is a typo and should not be there.

EDIT:
Quote:
We need a hero to try to make a version of MAnalyse with this API
That would be way too heroic for me I'm afraid: I'm not much of a CPP programmer, have almost no Windows or DirectX experience, and would have major problems just understanding the mvtools code.
Old 11th December 2021, 16:33   #15  |  Link
DTL
"just understanding mvtools code."

The simplest version is easy enough: just take the 2 planes (ref and src) in MAnalyse's GetFrame, convert each plane to the required DXGI_FORMAT_NV12 (or maybe try to feed MAnalyse this format, if Avisynth can do the conversion, or maybe use some CUDA/DX conversion after loading YV12 or similar), upload to GPU memory, and load and execute that ME program. Then download (read back to host memory) the resulting MV buffer, set it as the level-1 predictor result, and perform the 'SAD check'-only SearchMVs processing in PlaneOfBlocks once for the level-0 plane to get the SADs of the MVs, without using any more predictors or refining (close or equal to PredictorType=3 already in the new MAnalyse). And return the result to MDegrain().
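As a sketch of the first conversion step only: DXGI_FORMAT_NV12 keeps the same Y plane as YV12 but interleaves the chroma, so the repacking before upload is just a copy plus a U/V interleave (illustrative code with hypothetical names, not mvtools internals):
Code:
#include <cstdint>
#include <cstring>

// Repack an 8-bit YV12 frame (planar Y, U, V) into NV12 (planar Y + interleaved UV),
// e.g. into the upload buffer for the motion estimator input texture.
static void YV12ToNV12(const uint8_t* srcY, int pitchY,
                       const uint8_t* srcU, const uint8_t* srcV, int pitchUV,
                       uint8_t* dstY, int dstPitchY,
                       uint8_t* dstUV, int dstPitchUV,
                       int width, int height)
{
  for (int y = 0; y < height; ++y)        // luma: straight copy
    std::memcpy(dstY + y * dstPitchY, srcY + y * pitchY, width);

  for (int y = 0; y < height / 2; ++y)    // chroma: interleave U and V
  {
    const uint8_t* u = srcU + y * pitchUV;
    const uint8_t* v = srcV + y * pitchUV;
    uint8_t* d = dstUV + y * dstPitchUV;
    for (int x = 0; x < width / 2; ++x)
    {
      d[2 * x + 0] = u[x];
      d[2 * x + 1] = v[x];
    }
  }
}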

The 'second generation' is faster but more advanced: try to perform the SAD check inside the GPU via DirectX, read back to host memory 2 arrays - MVs and SADs - and interleave them to feed back to MDegrainN.

Third generation: perform MDegrainN inside the GPU (possibly pointless, because typical onboard GPU memory is not enough for a large enough tr, though it may work for MDegrain1,2,3 (6?) and not-too-large frame sizes).

In the 'scalar' SAD function version the PT=3 PseudoEPZ search is very simple:
Code:
template<typename pixel_t>
void PlaneOfBlocks::PseudoEPZSearch_no_refine(WorkingArea& workarea) // no refine - only predictor check
{
  typedef typename std::conditional < sizeof(pixel_t) == 1, sad_t, bigsad_t >::type safe_sad_t;

  sad_t sad;

  if (smallestPlane) // never get here - normal use is sequence of params with 'real' search like optPredictorsType="3,x" where x < 3.
  {
    workarea.bestMV = zeroMV;
    workarea.nMinCost = verybigSAD + 1;
  }
  else
  {
    workarea.bestMV = workarea.predictor; // already ClipMV() processed in the search_mv_slice
    // only recalculate sad for interpolated predictor to be compatible with old/typical thSAD setting in MDegrain
    sad = LumaSAD<pixel_t>(workarea, GetRefBlock(workarea, workarea.bestMV.x, workarea.bestMV.y));
    sad_t saduv = (chroma) ? ScaleSadChroma(SADCHROMA(workarea.pSrc[1], nSrcPitch[1], GetRefBlockU(workarea, workarea.bestMV.x, workarea.bestMV.y), nRefPitch[1])
      + SADCHROMA(workarea.pSrc[2], nSrcPitch[2], GetRefBlockV(workarea, workarea.bestMV.x, workarea.bestMV.y), nRefPitch[2]), effective_chromaSADscale) : 0;
    workarea.bestMV.sad = sad + saduv;
  }


  // we store the result
  vectors[workarea.blkIdx].x = workarea.bestMV.x;
  vectors[workarea.blkIdx].y = workarea.bestMV.y;
  vectors[workarea.blkIdx].sad = workarea.bestMV.sad;

  workarea.planeSAD += workarea.bestMV.sad; // for debug, plus fixme outer planeSAD is not used
}
For 'multi-block' SIMD processing (faster) it is a bit more complex: first get the length of coherency in the vector stream and call the appropriate multi-block SAD function from the list available for the current CPU SIMD architecture, or pass a mask of coherent blocks for each vector offset value.

Old 11th December 2021, 19:45   #16  |  Link
tormento
Quote:
Originally Posted by DTL View Post
possibly pointless, because typical onboard GPU memory is not enough for a large enough tr, though it may work for MDegrain1,2,3 (6?) and not-too-large frame sizes)
When resizable BAR is widespread enough, CPU or GPU memory will have less or no impact at all, given DDR5 bandwidth and lower latencies on transfers between the two kinds.
Old 11th December 2021, 23:32   #17  |  Link
DTL
Well, the better way for speed is the 'gen 1.5' approach: do not calculate SAD in MAnalyse and output only MVs to a rewritten version of MDegrainN, because it is faster to check SAD in MDegrainN at the time of the weight calculation. That reduces the read traffic from memory to about half and increases speed to about 2x.
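A rough sketch of that 'gen 1.5' idea (purely illustrative - not the real MDegrainN code, and the linear falloff below is a stand-in for whatever weight formula MDegrainN actually uses): compute the SAD only when the block's weight is needed, so the src/ref blocks are read from memory once.
Code:
#include <cstdint>
#include <cstdlib>

// Illustrative only: per-block weight computed from a lazily evaluated SAD.
static int WeightFromLazySad(const uint8_t* srcBlk, const uint8_t* refBlk,
                             int pitch, int blkSize, int thSAD)
{
  int sad = 0;                        // SAD computed here, at weight time,
  for (int y = 0; y < blkSize; ++y)   // instead of in a separate MAnalyse pass
    for (int x = 0; x < blkSize; ++x)
      sad += std::abs(int(srcBlk[y * pitch + x]) - int(refBlk[y * pitch + x]));

  if (sad >= thSAD)
    return 0;                         // block too different: no contribution
  return 256 * (thSAD - sad) / thSAD; // simple linear falloff, 0..256
}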

But a special 'compatibility' mode in MAnalyse with the standard MV output (MV + SAD) may still be required for other functions that use MAnalyse output.

The cheapest possible Maxwell-based card looks to be the GTX 745 - a low-profile OEM card with slower memory and lower performance compared with the GTX 750. I have started bargaining for one second-hand from about $33 here, and maybe I will test it for ME feature support next week - one seller allows testing before purchasing.

Though there is also the idea of testing 'remote debugging mode': if it is not too dangerous to install a 'remote debugger' on the system with the GTX 1060, it would be very helpful. I still have no experience with remote debugging of applications over IP.

Old 12th December 2021, 09:28   #18  |  Link
tormento
Quote:
Originally Posted by DTL View Post
maybe I will test it for ME feature support next week - one seller allows testing before purchasing.
If memory serves, it has CUDA too. I have heard that the CUDA toolkit converts almost flawlessly from C++ to CUDA instructions. Perhaps you could take a look at that too.
Old 13th December 2021, 19:28   #19  |  Link
pinterf
Quote:
Originally Posted by tormento View Post
If memory serves, it has CUDA too. I have heard that the CUDA toolkit converts almost flawlessly from C++ to CUDA instructions. Perhaps you could take a look at that too.
Like here, in Nekopanda's CUDA mvtools light:
https://github.com/pinterf/AviSynthC...r/KTGMC/MV.cpp
And the code actually in CUDA:
https://github.com/pinterf/AviSynthC...MC/MVKernel.cu
with MDegrainN:
https://github.com/pinterf/AviSynthC...ernel.cu#L2600
Old 13th December 2021, 19:37   #20  |  Link
DTL
I see that remote debugging is simple enough and does not require setting up awful software like Visual Studio on the remote system. So I will try to start adding hardware DX12 ME to MAnalyse this week at work.

Anyway, even with an infinitely fast ME engine, MDegrainN needs to be rewritten away from scattering the received MVs array into the supplemental Fake* structure, because that takes almost as much time as the MDegrainN processing itself on a Core2 Duo E7500. Though the first version will most probably use the standard old MDegrainN.

It may be an inheritance from the old times when MDegrain was interconnected with the motion search in one filter, and later developers simply passed the received MVs via the AVS clip into the Fake structure to keep MDegrain working without a rewrite. With the typically very slow MAnalyse the speed penalty was not very visible, but nowadays, with fast MAnalyse modes and a hardware-accelerated MAnalyse, it is a really visible speed limit (in MVClip::Update()) and needs to be removed. MDegrainN processing should access the received MVs array directly.
