L-SMASH Source
Sparktank
19th September 2019, 02:39
Sure, let's do both. But we should all use the same stream.
Ooh, this sounds fun.
new thread? get everyone involved.
GTX 1060 3GB checking in.
Sparktank
19th September 2019, 02:39
Bonus x265
https://i.imgsafe.org/28/286ce86c97.png
Also, how do you make these graphs?
MeteorRain
19th September 2019, 03:59
Which downscaler should I use? I can run tests. I prefer spline64 for 4K to 1080p, but maybe there is something better?
Has anyone tried GPU resizer from DGSource?
DJATOM
19th September 2019, 11:04
I've tried it for downscaling a movie and it was fine. I haven't tested it on anime.
Atak_Snajpera
19th September 2019, 14:04
Ah, data. Sweet.
If you make available that 10-minute source clip I'd like to try this with a high-end nVidia GPU (2080 Ti) and DGSource() under both Avisynth+ and Vapoursynth (native). Can you please do that for us? Thank you. If you need access to my FTP just send me a PM.
DJATOM says he can't duplicate your effect so it would be nice to get some other data points.
Uploading in progress... It will probably take a few hours.
New test with the very useful MDegrain2 added (it can reduce file size in Constant Quality mode by a factor of 1.7x). I did only 4 tests because it takes way too much time!
ScriptSW.avs
LoadPlugin("C:\Program Files (x86)\RipBot264\Tools\AviSynth plugins\lsmash\LSMASHSource.dll")
video=LWLibavVideoSource("C:\Temp\Video.mkv",cachefile="C:\Temp\Video.mkv.lwi",prefer_hw=0)
Loadplugin("C:\Program Files (x86)\RipBot264\Tools\AviSynth plugins\mvtools\mvtools2.dll")
super=MSuper(video,pel=2)
fv1=MAnalyse(super,isb=false,delta=1,overlap=4)
bv1=MAnalyse(super,isb=true,delta=1,overlap=4)
fv2=MAnalyse(super,isb=false,delta=2,overlap=4)
bv2=MAnalyse(super,isb=true,delta=2,overlap=4)
video=MDegrain2(video,super,bv1,fv1,bv2,fv2,thSAD=400)
video=Prefetch(video,4)
return video
ScriptHW.avs
LoadPlugin("C:\Program Files (x86)\RipBot264\Tools\AviSynth plugins\lsmash\LSMASHSource.dll")
video=LWLibavVideoSource("C:\Temp\Video.mkv",cachefile="C:\Temp\Video.mkv.lwi",prefer_hw=1) # prefer_hw=1 is the only change from ScriptSW.avs
Loadplugin("C:\Program Files (x86)\RipBot264\Tools\AviSynth plugins\mvtools\mvtools2.dll")
super=MSuper(video,pel=2)
fv1=MAnalyse(super,isb=false,delta=1,overlap=4)
bv1=MAnalyse(super,isb=true,delta=1,overlap=4)
fv2=MAnalyse(super,isb=false,delta=2,overlap=4)
bv2=MAnalyse(super,isb=true,delta=2,overlap=4)
video=MDegrain2(video,super,bv1,fv1,bv2,fv2,thSAD=400)
video=Prefetch(video,4)
return video
https://i.imgsafe.org/37/37bfdbdafa.png
videoh
19th September 2019, 14:07
Atak,
Any chance you can share your stream as I requested?
Atak_Snajpera
19th September 2019, 14:13
Also, how do you make these graphs?
LibreOffice Calc
Atak_Snajpera
19th September 2019, 14:14
Atak,
Any chance you can share your stream as I requested?
It is being uploaded to MediaFire right now... it will take a few hours.
poisondeathray
19th September 2019, 15:04
What was the "proper" way to MT GPU filters?
I remember some issue in the KNLMeans discussion, or videoh mentioning something, but I can't find the posts.
If you test just the source filter alone with cuvid, it seems abnormally slow, regardless of the filter MT mode or a Prefetch value > 1. Thrashing or something. But it seems OK in VapourSynth with native threading.
Groucho2004
19th September 2019, 15:27
What was the "proper" way to MT GPU filters?
There is none. There is no speed benefit and it just eats more graphics memory. Modern cards have their own "multi-threading".
Atak_Snajpera
19th September 2019, 15:38
What was the "proper" way to MT GPU filters?
I remember some issue in the KNLMeans discussion, or videoh mentioning something, but I can't find the posts.
If you test just the source filter alone with cuvid, it seems abnormally slow, regardless of the filter MT mode or a Prefetch value > 1. Thrashing or something. But it seems OK in VapourSynth with native threading.
I recommend putting GPU filters after Prefetch.
Atak_Snajpera
19th September 2019, 20:50
Atak,
Any chance you can share your stream as I requested?
http://www.mediafire.com/file/dh86soca2m66n6b/video.mkv/file
videoh
20th September 2019, 04:16
Got it. Thank you, Atak.
hydra3333
20th September 2019, 07:45
GTX 1060 3GB checking in.
Well, I have a 3900X/cheapie-2060-Super/vapoursynth-portable/DG-tools/Win10x64, if someone wants to provide a script and a couple of lines outlining how to measure the time. The 3900X is a tad crippled though, with only 2666 RAM awaiting some 3600 to arrive (hence the 3900X infinity fabric is running at 3/4 of "normal").
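A couple of lines for measuring the time could look like the sketch below. It is only an illustration and does not come from the posts: `time_command` is a hypothetical helper, and the command it runs (for example an encoder or script-piping command line) is left to the tester. The demo command is a placeholder that does nothing.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return the wall-clock seconds it took."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error,
    # so a crashed run is not mistaken for a fast one.
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command: replace with the real decode or encode command line.
    demo_cmd = [sys.executable, "-c", "pass"]
    print(f"elapsed: {time_command(demo_cmd):.3f} s")
```

Running the same command once with the software script and once with the hardware script on the same machine gives the two numbers to compare.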
DJATOM
20th September 2019, 10:32
I also have a 3900X with an Asus ROG Crosshair Hero 7 mobo and 2x 16 GB RAM (OC to 3333 MHz) + RTX 2070 (Gigabyte Gaming OC, 3 fans). Unfortunately I don't have much time to tinker with scripts this week, but I can run some batches before I go to sleep.
Selur
24th September 2019, 14:27
Using 20190917, I have a problem with an MKV (interlaced AVC, WAV) using:
core.lsmas.LWLibavSource(source="Il Silenzio (Melissa Venema) [Live in Maastricht II] (N).mkv", format="YUV420P8", cache=0, prefer_hw=1)
only one frame is returned
using:
core.lsmas.LWLibavSource(source="Il Silenzio (Melissa Venema) [Live in Maastricht II] (N).mkv", format="YUV420P8", cache=0, prefer_hw=0)
vspipe and vsviewer simply close. :)
DGSource and FFVideoSource (ffms2 and ffms2k) both work.
Same happens with AvsPmod and avs2yuv when using:
LWLibavVideoSource("C:\Users\Selur\Desktop\Il Silenzio (Melissa Venema) [Live in Maastricht II] (N).mkv",cache=false,format="YUV420P8", prefer_hw=1)
(only one Frame)
and
LWLibavVideoSource("C:\Users\Selur\Desktop\Il Silenzio (Melissa Venema) [Live in Maastricht II] (N).mkv",cache=false,format="YUV420P8", prefer_hw=0)
program closes.
(as expected using prefer_hw=3 also returns one frame)
Uploaded the source to my GoogleDrive (https://drive.google.com/open?id=1D77BWzCD0UWs2S6188rLFbut5_4O7pgo).
(Using Ryzen 7 1800X, Windows 10, 32GB RAM, GeForce GTX 1070Ti in case it helps.)
Cu Selur
Atak_Snajpera
24th September 2019, 15:02
Indeed. Instant crash in my SeekTester tool
https://i.imgsafe.org/a2/a21dadf4db.png
Update: It looks like video has to be demuxed to raw .264 in order to be correctly decoded by LSMASH.
Update2: Demuxed .264 does not crash LSMASH but some frames are missing after seeking.
https://i.imgsafe.org/a2/a25398f5fb.png
videoh
24th September 2019, 22:11
Check this out! Can you please give me the x264 version you used and a link to get it? Thank you.
BTW, DGSource() on a 2080 Ti gives 500 fps for your stream compared to your 124 fps for SW, and 122 fps for your ancient GT 710.
Atak_Snajpera
24th September 2019, 23:16
Just download the latest. Ask Uncle Google for directions.
videoh
24th September 2019, 23:37
First test with your stream to x64:
DGSource + 2080 Ti: 4:30
That is to be compared to your 13:21 for SW.
Looks like a fine boost to me.
poisondeathray
25th September 2019, 00:11
First test with your stream to x64:
DGSource + 2080 Ti: 4:30
That is to be compared to your 13:21 for SW.
Looks like a fine boost to me.
What about compared to your SW?
Comparing two different CPU setups is less relevant if his is an old Q8200 and you have something probably 10 years newer and with more cores.
videoh
25th September 2019, 02:19
Oy, OK, I never wanted lsmash or any of that on my PC but I'll see what I can do for you. BTW, I have only a lowly 7700K @4.2GHz. Your request is fair and reasonable.
MeteorRain
25th September 2019, 02:56
Actually, this is the first time I noticed that it's a Q8200, which was launched 11 years ago.
This brings me to a question: what range of hardware specs should we expect in everyday benchmarks?
Should we even care how well x265 1080p encoding performs on an 11-year-old CPU?
Should we focus on more "modern" hardware?
IMHO, even E5 v1 and SNB desktop series are considered a bit outdated now.
poisondeathray
25th September 2019, 03:13
Actually, this is the first time I noticed that it's a Q8200, which was launched 11 years ago.
This brings me to a question: what range of hardware specs should we expect in everyday benchmarks?
Should we even care how well x265 1080p encoding performs on an 11-year-old CPU?
Should we focus on more "modern" hardware?
IMHO, even E5 v1 and SNB desktop series are considered a bit outdated now.
It really doesn't matter, as long as the relevant testing and background info is provided for context. Some people have hardware more than 10 years old, and some have older third or fourth computers lying around. It might be useful info for them. But given limited time, I would give higher priority to testing on newer hardware if possible.
But the question here was measuring the effect of offloading decoding to the GPU on actual encoding speed. So the delta on the same hardware needs to be tested, rather than comparing different CPU hardware. Obviously the Q8200 will be slower for encoding in every situation.
poisondeathray
25th September 2019, 03:18
Oy, OK, I never wanted lsmash or any of that on my PC but I'll see what I can do for you. BTW, I have only a lowly 7700K @4.2GHz.
haha, not even to check out the emerging "competition" ? :D
What if you have some non supported video format by Nvidia/CUVID ?
Thanks
Atak_Snajpera
25th September 2019, 09:35
First test with your stream to x64:
DGSource + 2080 Ti: 4:30
That is to be compared to your 13:21 for SW.
Looks like a fine boost to me.
Facepalm.jpg. You can't even do proper testing like I did.
DJATOM
25th September 2019, 09:58
Actually, this is the first time I noticed that it's a Q8200, which was launched 11 years ago.
This brings me to a question: what range of hardware specs should we expect in everyday benchmarks?
Should we even care how well x265 1080p encoding performs on an 11-year-old CPU?
Should we focus on more "modern" hardware?
IMHO, even E5 v1 and SNB desktop series are considered a bit outdated now.
I have a Sandy Bridge-based notebook and it feels much slower for browsing or working in programs (in comparison to the R9 3900X). Not to mention it's slow af for heavy encoding or filtering.
Atak_Snajpera
25th September 2019, 10:40
I have a Sandy Bridge-based notebook and it feels much slower for browsing or working in programs (in comparison to the R9 3900X). Not to mention it's slow af for heavy encoding or filtering.
I have a Sandy Bridge Xeon E5-2690 and it is as good as a Ryzen 1700. Comparing a low-clocked notebook CPU with a 105 W desktop CPU is just silly.
DJATOM
25th September 2019, 10:50
Indeed, it's just an i7-2670QM at 2.2-3.1 GHz (boost), but the user experience differs even if I set a power-saving plan on the Ryzen. The 3900X at 2.2 GHz feels much faster in the browser (that's my most common use for the notebook). Since both devices use SSDs, I think the difference is due to RAM frequency.
Atak_Snajpera
25th September 2019, 12:17
Indeed, it's just an i7-2670QM at 2.2-3.1 GHz (boost), but the user experience differs even if I set a power-saving plan on the Ryzen. The 3900X at 2.2 GHz feels much faster in the browser (that's my most common use for the notebook). Since both devices use SSDs, I think the difference is due to RAM frequency.
I'm also sure that you disabled those extra cores/threads...
videoh
25th September 2019, 14:33
Second test:
DGSource() 2080 Ti + x264 7700K: 4:28
LWLibavVideoSource() + x264 7700K: 4:57
Preset medium
That's about an 11% improvement on GPU versus doing everything on my fairly strong CPU.
I have always pointed out that the gain you can get will depend on how strong your processor is. And I have also pointed out that DG tools bring other things to the table, such as robust random access, GPU resizing, some CUDA filters, DGIndexNV as a useful analysis tool in its own right, etc. With over 15000 paid users, I'm not worried about justifying the existence of DG tools.
I'll do another test with UHD. It will be interesting to see how things scale with frame size.
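The 11% figure can be checked from the two run times above. The sketch below assumes the improvement is measured as a throughput gain (old run time divided by new run time); the helper names are made up for illustration.

```python
def to_seconds(mmss):
    """Convert an 'M:SS' string such as '4:28' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup_percent(baseline, improved):
    """Percent gain in throughput when run time drops from baseline to improved."""
    return (baseline / improved - 1.0) * 100.0


sw = to_seconds("4:57")  # LWLibavVideoSource() + x264 on the 7700K
hw = to_seconds("4:28")  # DGSource() on the 2080 Ti + x264 on the 7700K

print(f"throughput gain: {speedup_percent(sw, hw):.1f}%")  # 10.8%
print(f"run time saved:  {(sw - hw) / sw * 100.0:.1f}%")   # 9.8%
```

The two percentages differ because one divides by the faster run and the other by the slower one, so it is worth stating which definition a quoted "delta" uses.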
Atak_Snajpera
25th September 2019, 14:49
I have always pointed out that the gain you can get will depend on how strong your processor is.
Not really. On my weak PC the delta is ~9% with the default medium x264 preset.
https://i.imgsafe.org/28/286ce8b657.png
On your much more powerful CPU (7700K @ 4.2 GHz) and GPU (RTX 2080 Ti) the delta is ~11%. Your ultra-expensive ~$1200 GPU, with hardware decoding 4x faster than an ancient $40 Kepler GT 710, gives you only 2 extra percentage points in video encoding. That's just pathetic.
Atak_Snajpera
25th September 2019, 15:03
710 is garbage so your results are pointless. So sorry you're too poor to buy decent hardware.
Oh, finally you said it! That's what I wanted to hear from you! Good old Neuron2 (aka banned doom9 moderator) is back!
PS. HA! got ya! I was faster this time and I managed to quote your deleted comment! Not this time amigo!
videoh
25th September 2019, 15:07
I don't care what you quote, and your childish behavior doesn't move me at all. Grow up.
To be fair, I would encode on my snazzy 2080 Ti also. And the 2080 Ti is bringing me other things that are quite useful for me, such as tensor cores, massive number of CUDA cores for my physics simulations, etc. No matter how you blow smoke, I get an 11% improvement for your use case. It would be brain-dead for me not to choose DGSource() over LWLibavVideoSource().
You sound bitter because you can't afford decent hardware, although your post above suggests simple DG derangement syndrome.
DJATOM
25th September 2019, 15:16
Actually, the GT 710 is an almost deprecated card. You will lose driver updates soon (from April 2020). Not to mention it's only OK for decoding 8-bit AVC, while RTX cards (or GTX 1660/1660 Ti cards) have a bleeding-edge decoder, which is significantly faster and supports more formats (HEVC 8/10/12-bit, VP9 8- and 10-bit).
Atak_Snajpera
25th September 2019, 15:20
Actually, the GT 710 is an almost deprecated card. You will lose driver updates soon (from April 2020). Not to mention it's only OK for decoding 8-bit AVC, while RTX cards (or GTX 1660/1660 Ti cards) have a bleeding-edge decoder, which is significantly faster and supports more formats (HEVC 8/10/12-bit, VP9 8- and 10-bit).
You do realize that the ancient GT 710 is just for showing the desktop, right? It was not bought for games or 4K HEVC movies. Instead of changing the topic we should focus on facts. A 4x faster video engine gives you only 2 extra percentage points in real-life encoding scenarios versus something that costs $40.
PS. And yes, if you want you can also call me poor like videoh aka Neuron2 did in his post. I'm too old for this childish talk.
videoh
25th September 2019, 15:41
That's "neuron2", pal. Get it right!
poisondeathray
25th September 2019, 15:42
Thanks for the testing results,
@videoh -
How about LWLibavVideoSource(decoder="h264_cuvid") on the same hardware? I'd expect similar speeds as DGSource
DJATOM
25th September 2019, 15:51
You do realize that the ancient GT 710 is just for showing the desktop, right? It was not bought for games or 4K HEVC movies. Instead of changing the topic we should focus on facts. A 4x faster video engine gives you only 2 extra percentage points in real-life encoding scenarios versus something that costs $40.
While your ancient card is only good for showing the desktop, an RTX card (for example, my RTX 2070) can do OpenCL/CUDA computations, offloading a lot of work (for example, NNEDI3CL or EEDI3CL) from the CPU. I've tried GTX 750, 760, 1050, 2060 and 2070 cards with those filters: with a decent CPU, the GTX 750, 760 and 1050 were a bottleneck for my filtering chain (thus slowing down the encoder).
If you don't need heavy filtering, there is not much benefit from a better card or from GPU decoding at all (with slower presets), but when you have to do a lot of encodes, it definitely saves you some time. Even your tests show a benefit from using the GPU decoder.
MeteorRain
25th September 2019, 15:53
On your much more powerful CPU (7700K @ 4.2 GHz) and GPU (RTX 2080 Ti) the delta is ~11%. Your ultra-expensive ~$1200 GPU, with hardware decoding 4x faster than an ancient $40 Kepler GT 710, gives you only 2 extra percentage points in video encoding. That's just pathetic.
People don't buy a house because they want a slightly bigger closet than the one in your condo. People don't buy an RTX 2080 Ti to show how 1% of its processing power speeds up one portion of the work by 2%.
Your $1200 GPU argument makes no sense because it's just the card he happens to have. I have a GTX 950 worth $60 and I can get exactly the same decoding speed that he got, and thus the same result. Calling people pathetic for buying a million-dollar house by comparing the size of the closet is, IMHO, so funny that I literally laughed out loud when I saw your words.
If you really want to see how big a difference a graphics card can make, decode and scale down some UHD to 1080p or 720p and check how large a portion of the work the card can take over. Even worse, I've got an 8K 60 fps HEVC 10-bit TV broadcast stream that my freshly bought Ryzen 3600 can barely play back at 5 fps under full CPU load. You want to see how a $1200 GPU compares to your $40 card? Try those. It can decode, downscale, process the image and then output to AviSynth at > 60 fps, compared to, let's say, 3 fps on a 6-core CPU sold this year.
videoh
25th September 2019, 15:58
How about LWLibavVideoSource(decoder="h264_cuvid") on the same hardware? I'd expect similar speeds as DGSource
Yes, indeed:
4:30
So maybe I need to retract my point about being brain-dead not to choose DGSource() here, but the main point still applies: I'd be brain-dead not to choose GPU decoding.
To answer your earlier question...I haven't run into any files that DGSource() cannot open and that I needed to do anything with. Surely, if such files became important for me or my users I would first want to add support in DGDecNV but failing that, of course would revert to an appropriate SW decoder.
DJATOM
25th September 2019, 15:59
Thanks for the testing results,
@videoh -
How about LWLibavVideoSource(decoder="h264_cuvid") on the same hardware? I'd expect similar speeds as DGSource
I've measured some fps for certain sources (with vsedit's benchmark).
8bit AVC (.m2ts from BD)
Time elapsed: 1:05.696 - 547.85574001840120672568 FPS # DGSource
Time elapsed: 1:04.261 - 560.09428731059801975789 FPS # LWLibAvSource
10bit HEVC (.hevc ES)
Time elapsed: 1:17.304 - 465.58912373251257577067 FPS # DGNV
Time elapsed: 1:24.014 - 428.40612414457933709855 FPS # LWLibAvSource
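To put the fps figures above side by side, a small comparison can be sketched as follows. The numbers are copied from the benchmark output and rounded to two decimals; treating the "DGNV" label as DGSource is an assumption, and `relative_difference` is a hypothetical helper.

```python
# fps figures from the vsedit benchmark output above, rounded to two decimals.
# Assumption: the line labelled "DGNV" refers to DGSource.
results = {
    "8-bit AVC": {"DGSource": 547.86, "LWLibavSource": 560.09},
    "10-bit HEVC": {"DGSource": 465.59, "LWLibavSource": 428.41},
}


def relative_difference(a, b):
    """Return how much faster a is than b, in percent (negative if slower)."""
    return (a / b - 1.0) * 100.0


for source, fps in results.items():
    diff = relative_difference(fps["DGSource"], fps["LWLibavSource"])
    print(f"{source}: DGSource vs LWLibavSource {diff:+.1f}%")
```

With these numbers the two source filters are within about 2% of each other on the AVC clip, and DGSource is roughly 9% ahead on the HEVC clip.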
Atak_Snajpera
25th September 2019, 16:02
Calling people pathetic for buying a million-dollar house by comparing the size of the closet is, IMHO, so funny that I literally laughed out loud when I saw your words.
Where did I write that?! Show me! You are now just spreading fake news. Not cool bro!
MeteorRain
25th September 2019, 16:04
Where did I write that?! Show me! You are now just spreading fake news. Not cool bro!
You are so funny.
videoh
25th September 2019, 16:04
I've got an 8K 60 fps HEVC 10-bit TV broadcast stream that my freshly bought Ryzen 3600 can barely play back at 5 fps under full CPU load. You want to see how a $1200 GPU compares to your $40 card? Try those. It can decode, downscale, process the image and then output to AviSynth at > 60 fps, compared to, let's say, 3 fps on a 6-core CPU sold this year.
Amen, bro!
Atak_Snajpera
25th September 2019, 16:08
You are so funny.
Unfortunately I can't say the same about your lies. They are not funny at all.
videoh
25th September 2019, 16:13
Cool down Atak. You did use the epithet "pathetic". MeteorRain was merely analogizing: obsessing over a video decoder when the card brings so much more to the table is like obsessing over a closet in a house. I suppose you did understand that but calling someone a liar is so much cooler, right? From the guy that is too old to be childish!
To be honest, I'm having trouble understanding what your basic point is here. Is it that using GPU power is useless? Is it that 2080 Ti etc. are poor value? DG tools suck? DG sucks? What is your overall point?
DJATOM
25th September 2019, 16:17
People don't buy a house because they want a slightly bigger closet than the one in your condo. People don't buy an RTX 2080 Ti to show how 1% of its processing power speeds up one portion of the work by 2%.
Your $1200 GPU argument makes no sense because it's just the card he happens to have. I have a GTX 950 worth $60 and I can get exactly the same decoding speed that he got, and thus the same result. Calling people pathetic for buying a million-dollar house by comparing the size of the closet is, IMHO, so funny that I literally laughed out loud when I saw your words.
If you really want to see how big a difference a graphics card can make, decode and scale down some UHD to 1080p or 720p and check how large a portion of the work the card can take over. Even worse, I've got an 8K 60 fps HEVC 10-bit TV broadcast stream that my freshly bought Ryzen 3600 can barely play back at 5 fps under full CPU load. You want to see how a $1200 GPU compares to your $40 card? Try those. It can decode, downscale, process the image and then output to AviSynth at > 60 fps, compared to, let's say, 3 fps on a 6-core CPU sold this year.
Yeah, a good real-world scenario. I've tried to decode such a stream on my card and it plays smoothly without any stutter, while it stutters a lot with the SW decoder on the 3900X (40-60% utilization, so I believe it hits the maximum throughput of PCIe Gen3 x16).
MI from that video: https://pastebin.com/xB8CHFiS.
videoh
25th September 2019, 16:25
so I believe it hits PCIe Gen3 x16 max throughput capacity
That's a great point. If GPU processing has an Achilles' heel, that is it. We can look forward to greater bandwidth with future generations. Also, to mitigate this I have been experimenting with my CUDASynth framework, which allows many full-frame transfers over PCIe to be eliminated for a script with multiple filters. You can read about it at the DG forum.
MeteorRain
25th September 2019, 16:52
SW decoder and 3900X (40-60% utilization, so I believe it hits PCIe Gen3 x16 max throughput capacity).
That means the video renderer can't keep up because of the limited PCIe bandwidth? Very interesting.
Maybe try reducing the frame rate down to 30 on MKV, and pin the video player process to half of the cores, and see how that works? That should tell us whether it's due to the bandwidth or due to, like, CPU scheduler.
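As a rough sanity check on the bandwidth question, the size of the uncompressed frames can be estimated with the sketch below. Every input is an assumption rather than something stated in the thread: 7680x4320 frames, 4:2:0 chroma subsampling, 10-bit samples stored in 16-bit words, and exactly one uncompressed transfer per displayed frame. The PCIe figure is the theoretical one-direction peak for Gen3 x16.

```python
# Assumptions (not from the posts): 7680x4320, 4:2:0, 10-bit samples stored
# in 16-bit words, one uncompressed frame transfer per displayed frame.
WIDTH, HEIGHT, FPS = 7680, 4320, 60
SAMPLES_PER_PIXEL = 1.5  # 4:2:0: one luma plane plus two quarter-size chroma planes
BYTES_PER_SAMPLE = 2     # 10-bit samples padded to 16 bits

# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 16 lanes, one direction.
PCIE3_X16_BYTES_PER_S = 8e9 * (128 / 130) / 8 * 16


def frame_bytes(width, height):
    """Bytes in one uncompressed frame under the assumptions above."""
    return int(width * height * SAMPLES_PER_PIXEL * BYTES_PER_SAMPLE)


def stream_bytes_per_s(width, height, fps):
    """Bytes per second if every frame crosses the bus exactly once."""
    return frame_bytes(width, height) * fps


rate = stream_bytes_per_s(WIDTH, HEIGHT, FPS)
print(f"frame size: {frame_bytes(WIDTH, HEIGHT) / 1e6:.1f} MB")
print(f"one transfer per frame: {rate / 1e9:.2f} GB/s")
print(f"share of PCIe 3.0 x16 peak: {rate / PCIE3_X16_BYTES_PER_S:.0%}")
```

Under these assumptions a single upload per frame stays below the theoretical peak, so this estimate alone neither confirms nor rules out a PCIe bottleneck. A renderer that converts to a wider pixel format or moves each frame more than once would need proportionally more bandwidth, and achievable throughput is lower than the theoretical peak.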