Doom9's Forum > Video Encoding > MPEG-4 Encoder GUIs
19th January 2025, 14:51   #21041
Ryushin
Registered User
Join Date: Mar 2011
Posts: 470
Quote:
Originally Posted by rlev11
My only real question was whether the cache value that is now set by default to 8192 to fix 4K encoding on 16-core machines would be enough once you threw a bunch more cores into the mix. I'm assuming you did not test that, or you would have mentioned it?
Actually, I was looking at just the new EPYC CPU. I'm not doing any 4K content yet, just HD with my SMDegrain Medium script.

Looking at the 16-core CPU, the same problem was happening, as it had 16 AviSynth threads and 32 x265 threads. Raising the AviSynth threads to 20 seems to keep the threads fed instead of starving. I'll have to monitor the 16-core server a bit more, as I also have Topaz Video AI running on it, but that mostly uses the GPU.

When I get these jobs done in a week or so, I'll take the 4K Blade Runner clip I used for testing last time, run some benchmarks with it, and see what the results are.

The default 8192 sure seems to have fixed the problem of having to run multiple encoding servers to fully utilize the CPU, so I like that. Having to run only a single encoding server is nice.

Not sure I fully understand your question, though. I haven't tested any 4K yet, so I don't know if I can fully answer it.
19th January 2025, 15:12   #21042
Atak_Snajpera
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,915
Quote:
Originally Posted by Ryushin
Actually, I was looking at just the new EPYC CPU. I'm not doing any 4K content yet, just HD with my SMDegrain Medium script.

Looking at the 16-core CPU, the same problem was happening, as it had 16 AviSynth threads and 32 x265 threads. Raising the AviSynth threads to 20 seems to keep the threads fed instead of starving. I'll have to monitor the 16-core server a bit more, as I also have Topaz Video AI running on it, but that mostly uses the GPU.

When I get these jobs done in a week or so, I'll take the 4K Blade Runner clip I used for testing last time, run some benchmarks with it, and see what the results are.

The default 8192 sure seems to have fixed the problem of having to run multiple encoding servers to fully utilize the CPU, so I like that. Having to run only a single encoding server is nice.

Not sure I fully understand your question, though. I haven't tested any 4K yet, so I don't know if I can fully answer it.
If you had a 32-core CPU and 4K video, would you also have to raise the memory from 8192 to 16384 to avoid poor CPU utilization, as happened in the past with 4096?
19th January 2025, 17:07   #21043
rlev11
Registered User
Join Date: Aug 2020
Location: Pennsylvania
Posts: 172
Quote:
Originally Posted by Atak_Snajpera
If you had a 32-core CPU and 4K video, would you also have to raise the memory from 8192 to 16384 to avoid poor CPU utilization, as happened in the past with 4096?
Exactly what I am getting at: 8192 may not be enough once we start hitting "Ludicrous Core" counts when doing 4K. The only issue I see is that if we have to bump it up for one high-core-count machine, that cache value also appears to get sent to all the servers in the farm, so they all need enough memory to support the higher value. That is why it was good to find the lowest value before we hit diminishing returns, which is how we came up with 8192.

Perhaps in a future update it would be beneficial to add that cache value to the main RipBot settings: default it to 8192, but allow us to change it. A tooltip explaining what it does and why you would change it would also be helpful.
19th January 2025, 18:35   #21044
Atak_Snajpera
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,915
Or maybe it should scale automatically with the number of cores? That's why I'm asking.
19th January 2025, 21:35   #21045
Ryushin
Registered User
Join Date: Mar 2011
Posts: 470
Well, I spoke too soon. It looks like if I'm doing pure x265 with no filters, I only get about 40% processor usage. Running two encoding servers with 24 cores each gets it to 85% CPU. I tried both 8192 and 16384. I'm only doing 2K right now (2560x1440). The same thing was happening with plain x265 on the 16-core processor, and I had to move back to using two encoding servers.

I should be able to test my Blade Runner 4K clip with and without SMDegrain in a couple of days.
19th January 2025, 21:41   #21046
rlev11
Registered User
Join Date: Aug 2020
Location: Pennsylvania
Posts: 172
Quote:
Originally Posted by Atak_Snajpera
Or maybe it should scale automatically with the number of cores? That's why I'm asking.
Would it be possible to set the cache individually on the encoding server side for each server, or does the same cache setting need to go out to each distributed server?

My only concern: say you have a 32-core machine as your client and, just to pick a number, 16 GB is the optimum cache for it. If that 16 GB cache value is sent to a lower-powered server, say an 8-core with only 16 GB of total memory, what happens to the 8-core machine when its memory is pegged by an encoding cache of 16 GB? A machine falling back to the swap file, even with SSDs, would not be a good thing IMO.

Scaling automatically would be ideal (and we still need to wait and see whether the cache needs to scale up on machines with more than 16 cores when doing full-frame 4K). But I think the core count, and especially the total memory of each individual machine in a distributed encoding farm, would need to be taken into account, which might be difficult to accomplish.
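One way to handle the mixed-farm concern is to let each encoding server cap whatever cache value it receives against its own physical memory. The sketch below is a hypothetical policy, not something RipBot is known to do: the function name and the 50% ceiling are assumptions chosen only for illustration.

```python
def effective_cache_mb(requested_mb: int, total_ram_mb: int,
                       max_fraction: float = 0.5) -> int:
    """Cache size (MB) one encoding server should actually apply.

    Hypothetical policy: honour the value the client sends, but never let
    the AviSynth cache exceed max_fraction of this server's physical RAM,
    so a 16 GB machine is not pushed into the page file.
    """
    if requested_mb <= 0 or total_ram_mb <= 0:
        raise ValueError("sizes must be positive")
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    ceiling_mb = int(total_ram_mb * max_fraction)
    return min(requested_mb, ceiling_mb)


if __name__ == "__main__":
    # A 32-core client sends 16384 MB to a 64 GB server and a 16 GB server.
    for name, ram_mb in [("64 GB server", 65536), ("16 GB server", 16384)]:
        print(name, effective_cache_mb(16384, ram_mb))
```

Under these assumptions the 64 GB machine keeps 16384 MB while the 16 GB machine drops to 8192 MB. A capped server may still see some of the cache-starvation slowdown discussed in this thread, but it avoids the swapping scenario described above.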
19th January 2025, 23:15   #21047
slalom
Registered User
Join Date: Jan 2010
Posts: 480
I understood that the cache was set automatically according to the number of CPUs. What is the problem you're having?
__________________
E5 2697 v2 @ 3.0GHz on P9X79 Deluxe 24GB
Xeon E5-2680 v2 @ 3.1GHz 16GB
Sony Vaio VPC-F13Z1E/B
19th January 2025, 23:50   #21048
rlev11
Registered User
Join Date: Aug 2020
Location: Pennsylvania
Posts: 172
Quote:
Originally Posted by slalom
I understood that the cache was set automatically according to the number of CPUs. What is the problem you're having?
I believe, and I may be incorrect, that Atak's update which automatically adds "SetMemoryMax(8192)" to the AviSynth script (fixing the performance issue with 4K on 16-core machines) now applies to everything; it is not based on any CPU count calculation.

We are now discussing whether, with Ryushin running an EPYC system with a much larger core count, 8192 for the cache will be enough with all the added cores and avisynth-prefetch-threads. Before we figured out what was going on and made the cache update, anything above 12 avisynth-prefetch-threads just killed 4K performance on 16-core computers. It may or may not be enough; that depends on future testing. If it needs to be bumped up, I'm just throwing out ideas on how best to handle that in a distributed environment.
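For readers unfamiliar with the mechanism: SetMemoryMax() sets the maximum memory (in MB) AviSynth uses for its frame cache, and in AviSynth+ multithreading is enabled with Prefetch(). The sketch below shows one way a GUI could inject both into a generated script. It is an illustration under assumptions, not RipBot's implementation: build_script is a hypothetical helper and the source-filter line is only a placeholder.

```python
def build_script(body: str, memory_max_mb: int = 8192,
                 prefetch_threads: int = 16) -> str:
    """Wrap an AviSynth+ script body with a cache limit and MT prefetch.

    Hypothetical helper, not RipBot's code: SetMemoryMax() (in MB) goes on
    the first line, and Prefetch() is appended as the last call, which is
    where AviSynth+ scripts normally place it.
    """
    if memory_max_mb <= 0 or prefetch_threads <= 0:
        raise ValueError("memory_max_mb and prefetch_threads must be positive")
    lines = [f"SetMemoryMax({memory_max_mb})", body.strip(),
             f"Prefetch({prefetch_threads})"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_script('LWLibavVideoSource("clip.mkv")'), end="")
```

Keeping the cache value as a parameter rather than a constant is what would make a per-machine or core-scaled setting possible later.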
20th January 2025, 20:03   #21049
slalom
Registered User
Join Date: Jan 2010
Posts: 480
Quote:
Originally Posted by rlev11
Before we figured out what was going on and made the cache update, anything above 12 avisynth-prefetch-threads just killed 4K performance on 16-core computers. It may or may not be enough; that depends on future testing. If it needs to be bumped up, I'm just throwing out ideas on how best to handle that in a distributed environment.
So in his case, the program maxed the memory usage of the system?
20th January 2025, 20:33   #21050
rlev11
Registered User
Join Date: Aug 2020
Location: Pennsylvania
Posts: 172
Quote:
Originally Posted by slalom
So in his case, the program maxed the memory usage of the system?
Not really maxing out the system memory; it may max out the AviSynth cache when doing 4K encoding.

Setting the cache to 8192 works with avisynth-prefetch-threads of 16 when doing 4K, so we don't have to "cripple" the 16-core Ryzens, either by lowering prefetch threads to 12 or by using an affinity mask so RipBot only uses 12 cores. It is yet to be determined whether that will need to be a higher number with avisynth-prefetch-threads of 24 or 32 on a CPU with more cores.
21st January 2025, 19:43   #21051
slalom
Registered User
Join Date: Jan 2010
Posts: 480
So AviSynth can't handle the memory required above a certain number of processors? Even if there are 2 DE servers?
21st January 2025, 21:37   #21052
rlev11
Registered User
Join Date: Aug 2020
Location: Pennsylvania
Posts: 172
Quote:
Originally Posted by slalom
So AviSynth can't handle the memory required above a certain number of processors? Even if there are 2 DE servers?
Each DE server will use the same cache size that is sent to it.

The default AviSynth cache size (I don't know exactly what it is) is not large enough for all the extra work involved in 4K encoding once avisynth-prefetch-threads reaches 16 or more. Encoding performance drops off a cliff unless one of the following is done:
A: the cache is set to at least 8192, or
B: RipBot is told to use only 12 of the 16 cores, either by reducing avisynth-prefetch-threads or by disabling cores with an affinity mask. Either way, B turns your system into a 12-core machine as far as RipBot performance is concerned.
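For reference, the affinity-mask route in B amounts to a bitmask with one bit per logical processor. A minimal Python sketch, assuming SMT is on and each core's hardware threads are numbered consecutively (so 12 cores are logical processors 0 to 23); actual numbering depends on the system, so check it before relying on this. The hex form is what cmd's `start /affinity` option takes.

```python
def affinity_mask(cores: int, threads_per_core: int = 2) -> int:
    """Bitmask selecting the first `cores` physical cores.

    Assumes core k owns logical processors k*threads_per_core through
    k*threads_per_core + threads_per_core - 1, so the mask is just the
    lowest cores*threads_per_core bits set.
    """
    if cores <= 0 or threads_per_core <= 0:
        raise ValueError("cores and threads_per_core must be positive")
    return (1 << (cores * threads_per_core)) - 1


if __name__ == "__main__":
    # 12 of 16 cores with SMT on: 24 logical processors -> FFFFFF
    print(f"{affinity_mask(12):X}")
```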

Before we figured out that the cache was the issue, I had to do 4K encoding with my 5950X and 7950X CPUs effectively performing as a 5900X and 7900X. Without crippling, the 16-core Ryzens would run at about half the fps of a 12-core Ryzen (hence "falling off the cliff"). Of course, a crippled 16-core would not perform as it should for 1080p either, so I had to keep two different encoding server profiles, one for 1080p and below and a crippled one for 4K, and switch between them depending on what I was encoding. Now that RipBot sets the cache to 8192 automatically, I no longer have to worry about any of that: the 16-core Ryzens use all their cores and perform as they should regardless of resolution.
22nd January 2025, 20:25   #21053
hardkhora
Registered User
Join Date: Mar 2014
Posts: 11
Quote:
Originally Posted by Atak_Snajpera
\\192.168.0.1\Ripbot264temp instead of \\MY-PC\Ripbot264temp

Is there any disadvantage to using an IP address in the path? What are your thoughts?
This might actually help with some of the issues I've had on Windows.
I don't see it making an impact on Linux (I'm still having issues with Wine binding the port in DE mode, though).
22nd January 2025, 20:27   #21054
hardkhora
Registered User
Join Date: Mar 2014
Posts: 11
Quote:
Originally Posted by Ryushin
Since I pass through the cores from Linux to the RB virtual machine, I can mix and match.
Do you use any kind of PCIe or GPU passthrough for KNLMeansCL?
If so, which distro are you using?
23rd January 2025, 00:10   #21055
Ryushin
Registered User
Join Date: Mar 2011
Posts: 470
High Core Count

EPYC Turin 9355P 32-Core Processor
24 Cores and 48 Threads Passed to VM
Blade Runner 4K 15 Minute Clip - 3840x1600

Servers  SetMemoryMax (MB)  Prefetch  x265 threads  SMDegrain  Chunk size  CPU  Time
1        8192               24        48            None       20          70%  22m:29s
1        16384              24        48            None       20          70%  22m:49s
1        8192               28        48            None       20          70%  22m:46s
1        16384              28        48            None       20          70%  23m:03s
1        16384              28        48            None       1           70%  24m:50s
1        8192               24        48            Hard       20          75%  27m:23s
1        16384              24        48            Hard       20          88%  21m:56s
1        16384              27        48            Hard       20          90%  20m:10s
1        16384              28        48            Hard       20          93%  19m:45s
1        16384              29        48            Hard       20          98%  19m:51s
1        16384              32        48            Hard       20          98%  20m:22s
2        8192               12        24            None       1           85%  24m:16s
2        16384              12        24            None       1           85%  22m:34s
2        8192               12        24            Hard       1           83%  24m:38s
2        16384              12        24            Hard       1           85%  24m:30s
2        8192               14        24            None       1           88%  20m:46s
2        16384              14        24            None       1           88%  21m:16s
2        8192               14        24            Hard       1           95%  24m:08s
2        16384              14        24            Hard       1           95%  24m:25s
1        12288              28        48            Hard       20          90%  20m:40s
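To put a rough number on the memory effect, the sketch below takes the single-server SMDegrain Hard rows from the table (values copied by hand), converts the times to seconds, and picks the fastest. Only the data comes from the post; the script is just a convenience and the function names are illustrative.

```python
# Single-server, SMDegrain Hard runs copied from the table above:
# (SetMemoryMax in MB, prefetch threads, encode time)
HARD_RUNS = [
    (8192, 24, "27m:23s"),
    (16384, 24, "21m:56s"),
    (16384, 27, "20m:10s"),
    (16384, 28, "19m:45s"),
    (16384, 29, "19m:51s"),
    (16384, 32, "20m:22s"),
    (12288, 28, "20m:40s"),
]


def to_seconds(t: str) -> int:
    """Convert the table's 'MMm:SSs' time format to seconds."""
    minutes, seconds = t.rstrip("s").split("m:")
    return int(minutes) * 60 + int(seconds)


def fastest_run(runs):
    """Return the run with the shortest encode time."""
    return min(runs, key=lambda run: to_seconds(run[2]))


if __name__ == "__main__":
    best = fastest_run(HARD_RUNS)
    baseline = to_seconds(HARD_RUNS[0][2])  # 8192 MB, 24 prefetch threads
    saved = 1 - to_seconds(best[2]) / baseline
    print(f"fastest: {best}, {saved:.0%} less time than 8192 MB / 24 threads")
```

On these rows the fastest run is 16384 MB with 28 prefetch threads (19m:45s), roughly 28% less time than the 8192 MB / 24-thread run (27m:23s).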

Okay, I think I've finished my testing. In summary, setting SetMemoryMax to 16384 is a big help with 48 threads, and a single encoding server was faster than two when encoding 4K. So it would be beneficial to have an option to set SetMemoryMax per machine, or at a minimum a global option. Prefetch also needs to be increased from half the thread count to about 60% of the thread count for optimum performance.
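The 60% rule of thumb is easy to express; the sketch below compares it with the half-the-threads setting. Rounding to the nearest integer is an assumption, since the post only says "about 60%", and prefetch_threads is an illustrative name, not a RipBot function.

```python
def prefetch_threads(logical_threads: int, ratio: float = 0.6) -> int:
    """Suggested avisynth-prefetch-threads for a given logical thread count.

    ratio=0.5 gives the half-the-threads setting; 0.6 is the rule of thumb
    from the 48-thread EPYC tests. Rounds to the nearest whole thread, min 1.
    """
    if logical_threads <= 0:
        raise ValueError("logical_threads must be positive")
    return max(1, round(logical_threads * ratio))


if __name__ == "__main__":
    for threads in (24, 32, 48):
        print(threads, prefetch_threads(threads, 0.5), prefetch_threads(threads))
```

For 48 threads this gives 29 (the best runs in the table were at 28 and 29), and for 24 threads it gives 14, the value used in the faster two-server rows.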

I guess for me, I'll move to a single encoding server for 4K content and run dual encoding servers for HD content, though I still need to test HD. I know that for 2560x1440, two encoding servers for x265 were about 30-50% faster. Another test to do later.

Last edited by Ryushin; 23rd January 2025 at 15:22.
23rd January 2025, 01:28   #21056
rlev11
Registered User
Join Date: Aug 2020
Location: Pennsylvania
Posts: 172
So it looks to me like a cache estimate of 8 GB per 16 avisynth-prefetch-threads is reasonable as it scales up. I would imagine that when you had prefetch threads set to 24, lowering the cache to 12 GB would probably have been fine as well.
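That estimate is a linear rule: 8192 MB per 16 prefetch threads, or 512 MB per thread. A minimal sketch of it as one possible form of "scale automatically"; the rule is only the thread's working estimate, and rounding up to a 1024 MB step is an assumption.

```python
MB_PER_16_THREADS = 8192  # estimate from the thread: 8 GB per 16 prefetch threads


def cache_for_prefetch(prefetch_threads: int, step_mb: int = 1024) -> int:
    """SetMemoryMax value (MB) scaled linearly with prefetch threads.

    8192 MB / 16 threads = 512 MB per thread; the result is rounded up to
    a whole multiple of step_mb (the rounding step is an assumption).
    """
    if prefetch_threads <= 0 or step_mb <= 0:
        raise ValueError("prefetch_threads and step_mb must be positive")
    raw_mb = MB_PER_16_THREADS * prefetch_threads // 16
    return -(-raw_mb // step_mb) * step_mb  # integer ceiling to step_mb


if __name__ == "__main__":
    for threads in (12, 16, 24, 28, 32):
        print(threads, cache_for_prefetch(threads))
```

Under this rule, 28 prefetch threads would call for about 14336 MB, which fits the table's 12288 MB run at 28 threads being slower (20m:40s) than the 16384 MB run (19m:45s).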

I'll do some testing over the next couple of days, bumping the cache up to 16384 and even beyond, to see how the couple of servers with only 16 GB react. I have a 5950X and a 5900X with 16 GB of total system memory, so it will be interesting to see the results using prefetch threads of 16 and 12 respectively. Having a 16 GB cache sent down to them may work without any issues, or it may bog the system down.

Just curious: with SMDegrain Hard, what were the approximate best fps numbers you saw in the encoding server window?

Last edited by rlev11; 23rd January 2025 at 01:33.
23rd January 2025, 15:19   #21057
Ryushin
Registered User
Join Date: Mar 2011
Posts: 470
Quote:
Originally Posted by rlev11
So it looks to me like a cache estimate of 8 GB per 16 avisynth-prefetch-threads is reasonable as it scales up. I would imagine that when you had prefetch threads set to 24, lowering the cache to 12 GB would probably have been fine as well.

I'll do some testing over the next couple of days, bumping the cache up to 16384 and even beyond, to see how the couple of servers with only 16 GB react. I have a 5950X and a 5900X with 16 GB of total system memory, so it will be interesting to see the results using prefetch threads of 16 and 12 respectively. Having a 16 GB cache sent down to them may work without any issues, or it may bog the system down.

Just curious: with SMDegrain Hard, what were the approximate best fps numbers you saw in the encoding server window?
I wasn't really watching the fps very much. But the 15-minute source has 21416 frames (15*60*23.796), and with 19m:45s being the fastest encoding time, that works out to about 18.07 fps.
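The arithmetic as a quick check. It uses the 23.796 fps figure from the post; film-sourced video is usually 23.976 fps (24000/1001), so the second line shows what the same encode time would mean at that rate. The clip's actual frame rate isn't stated in the thread, and the 15-minute duration may itself be approximate.

```python
def average_fps(duration_min: float, source_fps: float, encode_time: str) -> float:
    """Average encoding speed: source frames divided by wall-clock seconds.

    encode_time uses the table's 'MMm:SSs' format, e.g. '19m:45s'.
    """
    minutes, seconds = encode_time.rstrip("s").split("m:")
    encode_seconds = int(minutes) * 60 + int(seconds)
    frames = int(duration_min * 60 * source_fps)  # whole frames only
    return frames / encode_seconds


if __name__ == "__main__":
    print(round(average_fps(15, 23.796, "19m:45s"), 2))       # rate used in the post
    print(round(average_fps(15, 24000 / 1001, "19m:45s"), 2))  # if the clip is 23.976 fps
```

At 23.796 fps this reproduces the 21416 frames and about 18.07 fps; at 23.976 fps the clip would have about 21578 frames and the average would be about 18.21 fps.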

8 GB per 16 threads is probably a good rule of thumb. I just did another run with 12288 MB, and the CPU graph was spikier as some threads were waiting (I added it to the previous post). So for 24 cores, 16384 MB is better. 32 cores might need another 8 GB. I suppose, for testing's sake, I can run two more tests giving the full 32 cores to the VM and see what that gives us.
23rd January 2025, 15:56   #21058
Ryushin
Registered User
Join Date: Mar 2011
Posts: 470
Encoding Server Crash with CPU Cores 26 or Higher

It looks like I've hit a limit in RB when trying to assign 32 cores to my VM. I narrowed it down: selecting 26 cores with two threads each (52 threads total) makes RB crash.

Starting RipBot gives me a Division by Zero error; clicking OK on the error then shows the RipBot window.

Starting the Encoding Server, I get an Invalid Pointer Operation error, followed by two more error windows:

---------------------------
Encodingserver
---------------------------
Access violation at address 00401EBA in module 'EncodingServer.exe'. Write of address 00000000.
---------------------------

and

---------------------------
Application Error
---------------------------
Exception EAccessViolation in module EncodingServer.exe at 00001EBA.
Access violation at address 00401EBA in module 'EncodingServer.exe'. Write of address 00000000.
---------------------------

It's almost as if Atak couldn't imagine, all those years ago, that we would have so many cores in one chip.

With the latest EPYC having 192 cores and 384 threads, might as well use a 16-bit integer now. LOL
23rd January 2025, 19:55   #21059
Atak_Snajpera
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,915
That sucks. Does EncodingClient.exe also crash in the same way?

If you disable SMT (26 cores/26 threads), will it crash as well?

Last edited by Atak_Snajpera; 23rd January 2025 at 20:04.
23rd January 2025, 23:04   #21060
Ryushin
Registered User
Join Date: Mar 2011
Posts: 470
Quote:
Originally Posted by Atak_Snajpera
That sucks. Does EncodingClient.exe also crash in the same way?

If you disable SMT (26 cores/26 threads), will it crash as well?
I disabled SMT and passed only cores to the VM. As soon as I enabled 51 cores, the problem occurred. I had one job in the queue, and it doesn't show up because RipBot hit the division by zero, so I can't test the Encoding Client.
All times are GMT +1.