HEVC CPU load with Threadripper - "threading" it right?!
blublub
13th March 2019, 19:43
Hi
It has been posted a couple of times that HEVC encoding doesn't fully utilize high-core-count CPUs at 1080p and even 2160p.
Today I did some testing and I was really surprised at the outcome:
x265 "defaults":
numa-pools=32
frame threads=6
CPU load: 60-75%
Encoding at FPS = 5,9
x265 settings I had used for the last months to improve quality:
numa-pools=32
frame threads=2
CPU load: 65%
Encoding at FPS = 6,2
Since the CPU load was far from 100%, I thought it was time to RTFM again. After reading it I set "numa-pools=1", since I only have a single-socket CPU.
x265 "numa 1":
numa-pools=1
frame threads=1
CPU load = 2-3%
Encoding at FPS = 0,3
x265 "numa 48":
numa-pools=48
frame threads=1
CPU load = 88%
Encoding at FPS = 7,3
x265 "numa 48 - II":
numa-pools=48
frame threads=2
CPU load = 100%
Encoding at FPS = 7,8
x265 "numa 48 - III":
numa-pools=48
frame threads=3
CPU load = 100%
Encoding at FPS = 8,2
So increasing numa-pools does help with CPU utilization and speed. With numa-pools=48 and frame-threads=2 the load was 100%.
Increasing numa-pools over 48 did not increase CPU load or speed in FPS any further for me.
Also, with the default numa-pools of 32, increasing frame-threads beyond 1 only seems to speed up the encode up to a value of 2: the encode with frame-threads=6 is a tad slower than the one with frame-threads=2.
But a frame-threads value of 2 or 3 shows a real benefit once numa-pools is raised to 48: the maximum was 8,2 FPS with frame-threads=3 and numa-pools=48.
Further increasing frame-threads to 5 did not result in higher FPS.
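To make the comparison easier to read, the runs above can be put into a small table. This is only a sketch: the FPS figures are copied from this post (decimal commas written as points), and the helper names `speedup_vs` and `best_config` are made up for illustration.

```python
# FPS figures copied from the test runs above.
# Keys are (numa_pools, frame_threads) pairs.
results = {
    (32, 6): 5.9,  # x265 defaults on this machine
    (32, 2): 6.2,
    (1, 1): 0.3,
    (48, 1): 7.3,
    (48, 2): 7.8,
    (48, 3): 8.2,
}


def speedup_vs(baseline, results):
    """Return each configuration's FPS relative to the baseline configuration."""
    base_fps = results[baseline]
    return {cfg: fps / base_fps for cfg, fps in results.items()}


def best_config(results):
    """Return the (numa_pools, frame_threads) pair with the highest FPS."""
    return max(results, key=results.get)


if __name__ == "__main__":
    relative = speedup_vs((32, 6), results)
    # Print the slowest run first and the fastest run last.
    for (pools, threads), fps in sorted(results.items(), key=lambda kv: kv[1]):
        ratio = relative[(pools, threads)]
        print(f"numa-pools={pools:<2} frame-threads={threads}  {fps:4.1f} fps  x{ratio:.2f}")
    print("fastest:", best_config(results))
```

By this measure the fastest run (numa-pools=48, frame-threads=3) is roughly 39% faster than the defaults. Each figure is a single run, so small differences may just be noise.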
So the question is: is there any disadvantage to using a high numa-pools value, the way there is quality degradation when using higher frame-threads?
cheers
sneaker_ger
13th March 2019, 20:22
Which Threadripper do you have? What OS?
Have you tried something like --numa-pools "12,12,12,12" ? --pmode? --pme? Reduced CTU max size?
I guess with that many threads it might be time to switch to running parallel instances, each working on a different part of the movie (like RipBot implements).
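A minimal sketch of the chunking idea, assuming the movie is simply cut into contiguous frame ranges of near-equal length. The function name `split_into_chunks` is hypothetical, and this is not a description of how RipBot does it; a real tool would probably cut at scene changes or keyframes instead.

```python
def split_into_chunks(total_frames, num_chunks):
    """Split [0, total_frames) into contiguous (start, end) ranges, end exclusive.

    Chunk lengths differ by at most one frame. Each range could then be handed
    to its own encoder instance and the outputs joined afterwards.
    """
    if total_frames <= 0 or num_chunks <= 0:
        raise ValueError("total_frames and num_chunks must be positive")
    num_chunks = min(num_chunks, total_frames)  # never create empty chunks
    base, extra = divmod(total_frames, num_chunks)
    chunks = []
    start = 0
    for index in range(num_chunks):
        # The first `extra` chunks get one additional frame each.
        length = base + (1 if index < extra else 0)
        chunks.append((start, start + length))
        start += length
    return chunks


if __name__ == "__main__":
    # Example: a 2-hour movie at 24 fps split across 4 encoder instances.
    for start, end in split_into_chunks(2 * 60 * 60 * 24, 4):
        print(f"frames {start}..{end - 1}")
```

Each instance then only needs enough threads to stay busy on its own chunk, which sidesteps the scaling limit of a single encode.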
blublub
13th March 2019, 20:43
Hi I have the 16c 2950x and use Win 10 Prof.
I already used CTU=32 and qg-size=16 in all my tests and all previous encodes.
I tried pme and pmode, but I observed neither more FPS nor a noticeably higher CPU load.
That's why I started fiddling with numa-pools.
What does --numa-pools "12,12,12,12" do?
EDIT: found it in the PDF. But I still have no idea which setting is a wise choice ;-)
EDIT2: --numa-pools "12,12,12,12" only results in a CPU load around 50%
Selur
13th March 2019, 21:36
@blublub: out of curiosity: What CPU usage do you get when not specifying numa-pools and frame threads? (not sure whether the 'defaults' referred to this or if you explicitly specified the values)
blublub
13th March 2019, 22:09
The defaults I posted in my 1st post are the x265 defaults for those two options ;-)
blublub
13th March 2019, 22:59
Ok after a little more testing:
numa-pools = 36 or 38 does max out the CPU with frame-threads=2. With frame-threads=1 it pretty much takes numa-pools=48.
When I lower numa-pools and enable "pme", the CPU load does go up by about 5-8%, but speed seems to drop by a lot, so pme appears to hurt encoding speed.
Is anyone using high core count Xeons, maybe a 3175x? I am interested in what other users do to max out the CPU. I just have a hard time believing that encoding multiple jobs at the same time is the only solution.
BLKMGK
27th April 2019, 23:12
Windows or Linux?
I have a dual 10-core V2 Xeon machine (20 threads), running Linux, on which I ran into this issue. Since I also have several other machines in my home with spare cores, also running Linux, I began researching how best to distribute the load. The end result has been a Docker Swarm and Redis Queue application with some custom code. I had the concept, I built a bash script PoC, and then I worked with a talented friend to try and build the rest on my hardware. It’s working (across multiple machines) but we’re still tweaking and bug stomping right now. We expect to release the code publicly fairly soon, as aside from RipBot I’ve seen nothing public like it and was pretty frustrated. Frankly we’re hoping others help us improve it and that this is a good headstart :D It’s not flashy.
RipBot works well with Windows and can use AVIsynth filters too which is very nice but it won’t interleave jobs. I use this on my Windows boxes and when I need filters, I expect I’ll be trying containers on them in the future to use the custom code described above. If your desire is no fuss no muss for Windows RipBot is pretty kickass with only a few annoyances IMO. It gives good status feedback and won’t kill user performance so it can be run across even somewhat heavily utilized machines (my headless surveillance machine runs it for instance). RipBot can also leverage video hardware if that’s of interest. It gets around the core utilization issue by running multiple encoders in parallel btw.
P.S. This machine and another will get the new Ryzen 16core CPUs when released, both will be leveraged for encoding.
FranceBB
28th April 2019, 00:34
Intel Xeon 28c/56th over here, without AVX512; anyway, I never managed to max it out. Please note that it's a dual-socket system and 28c/56th is the sum of the two CPUs. This is basically why I usually run multiple files at the same time. For instance, when we have to encode MPEG-2 files for broadcast usage, I run up to 56 encodes at the same time, because MPEG-2 encoders were meant to be used on single-core CPUs. Anyway, one of the reasons why I never managed to max it out with x265 is that I'm using Avisynth on 4K 12bit content which is brought up to 16bit for post-processing; these kinds of files are handled very slowly by Avisynth, and going through a pipe before reaching x265 doesn't help.
Anyway, if you want, and if I have spare time, I'll try to play with that a little bit more with your settings.
Forteen88
28th April 2019, 10:17
So the question is: is there any disadvantage to using a high numa-pools value, the way there is quality degradation when using higher frame-threads?
Since no one has answered that question, you should run a VMAF or SSIM test to see if it generates different values.
benwaggoner
30th April 2019, 05:14
There can be quality regressions when using frame threading. Although that can be turned off directly via -F 1.
Jamaika
30th April 2019, 07:44
With ffmpeg/x265 and VMAF, the number of threads defaults to zero.
{"n_threads", "Set number of threads to be used when computing vmaf.", OFFSET(n_threads), AV_OPT_TYPE_INT, {.i64=0}, 0, UINT_MAX, FLAGS},
For x265 you can change that by passing a thread count, but what for? In x265 this only produces JSON charts. The SVT codecs already have working built-in VMAF metrics.
double x265_calculate_vmaf_framelevelscore(x265_param *param, x265_vmaf_framedata *vmafframedata)
{
    double score;
    int (*read_frame)(float *reference_data, float *distorted_data, float *temp_data,
                      int stride, void *s);
    if (vmafframedata->internalBitDepth == 8)
    {
        read_frame = read_frame_8bit;
        if (vmafframedata->internalCsp == X265_CSP_I420)
            compute_vmaf(&score, vcd_yuv420p->format, vmafframedata->width, vmafframedata->height,
                         read_frame, vmafframedata, vcd_yuv420p->model_path, vcd_yuv420p->log_path,
                         vcd_yuv420p->log_fmt, vcd_yuv420p->disable_clip, vcd_yuv420p->disable_avx,
                         vcd_yuv420p->enable_transform, vcd_yuv420p->phone_model, vcd_yuv420p->psnr,
                         vcd_yuv420p->ssim, vcd_yuv420p->ms_ssim, vcd_yuv420p->pool,
                         param->frameNumThreads, vcd_yuv420p->subsample,
                         vcd_yuv420p->enable_conf_interval);
    }
    return score;
}
benwaggoner
1st May 2019, 18:51
Also, frame-threading artifacts are typically around the GOP boundary, and with a long GOP that can get buried in the overall VMAF. Comparing the minimum frame VMAF or the lowest 1% of frame VMAF values can find regressions much more effectively.
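A rough sketch of that comparison, assuming you already have a list of per-frame VMAF scores for each encode (for example exported from a VMAF log). The function name `worst_frame_stats` and the sample numbers are invented for illustration.

```python
import math


def worst_frame_stats(scores, fraction=0.01):
    """Summarize per-frame scores by mean, minimum, and mean of the lowest fraction.

    `scores` is a list of per-frame quality values (e.g. VMAF); `fraction`
    is the share of worst frames to average (0.01 = lowest 1%).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    # Always look at one frame or more, even for very short clips.
    count = max(1, math.ceil(len(ordered) * fraction))
    tail = ordered[:count]
    return {
        "mean": sum(scores) / len(scores),
        "min": ordered[0],
        "low_tail_mean": sum(tail) / count,
    }


if __name__ == "__main__":
    # Invented example: 99 good frames and one badly damaged frame.
    scores = [95.0] * 99 + [40.0]
    print(worst_frame_stats(scores))
```

In this made-up example the mean barely moves (94.45) while the minimum and the lowest-1% mean both drop to 40, which is exactly the kind of regression an overall average hides.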
Forteen88
1st May 2019, 20:20
There can be quality regressions when using frame threading. Although that can be turned off directly via -F 1.
Thanks, but I've already read that you've written that before here at Doom9. I was just wondering, like the first poster did, if there is any disadvantage (quality degradation) with using a high number of numa-pools.
@Jamaika. I'd rather use your VMAF-compile of x265 that you released not long ago :)
~ VEGETA ~
3rd May 2019, 03:22
Someone suggested getting these for encoding x264/x265, being a cheap and effective build:
https://www.aliexpress.com/item/Buy-discount-motherboard-HUANAN-ZHI-dual-X79-LGA2011-motherboard-with-M-2-slot-dual-Giga-LAN/32886314322.html
https://www.aliexpress.com/item/Intel-Xeon-Processor-E5-2680-V2-CPU-2-8-LGA-2011-SR1A6-Ten-Cores-Server-processor/32831123192.html
So it is a dual Xeon rig, what do you think?
After some digging on my own for fun, I found this:
https://www.techspot.com/review/1218-affordable-40-thread-xeon-monster-pc/page7.html
Looking forward to your opinions. I want to know what affordable build or setup one can use to encode. I am now using my dedicated server for this; it is a lousy ATOM 2750, which is not suitable for encoding.
RanmaCanada
3rd May 2019, 04:48
Not worth the money. I recently moved from an E5-2670 to a Ryzen 2700 and I literally almost doubled my frame rate while cutting my power usage by a good third (150 watts while encoding vs 230 watts). If I overclock the 2700 to 2700x speeds, I do double my performance from my old Xeon at the same power usage. The sheer lack of AVX 2.0 makes them horrible for x265. There is a thread by Sagitarre? that properly benches processors for x265. If you check the thread you'll see that all AVX processors are beaten by similarly spec'd AVX 2.0 processors. Even my old 8 core was beaten easily by a quad core with AVX 2.0.
I'd strongly suggest you wait till Zen2 comes out before you make your purchase. You will either be able to get a new system that will destroy the dual Xeon and sip power, or someone's old stuff because they wanted to upgrade. Should be anywhere from 1-2 months at this point.
Same issue here:
/ffmpeg -loglevel verbose -i 35543_1080p_25.mov -vf scale=960:540 -pix_fmt yuv420p10le -codec:v libx265 -x265-params keyint=100:min-keyint=200:no-open-gop=1:nal-hrd=VBR:force-cfr -level 5.1 -profile:v main10 -preset slow -crf 23 -maxrate 3M -bufsize 3M -force_key_frames "expr:eq(mod(n,100),0)" -c:a:0 aac -ac:a:0 2 -ab:a:0 128k -y foo_crf23_3.mp4
OS: RHEL 7.3 x64
CPU : Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz x 2
So 40 "cores"
ffmpeg 4.1.3 static
Can never get over 10% usage with 1 encode job.
Any tips? I've tried ffmpeg threads = 20... didn't help much.
Mov file is local
sneaker_ger
3rd May 2019, 08:34
keyint=100:min-keyint=200
:confused:
-maxrate 3M -bufsize 3M
That's quite low. Is this for streaming via slow Internet/wifi?
Can never get over 10% usage with 1 encode job..
Any tips? Ive tried ffmpeg threads = 20... didnt help much..
Mov file is local
How is CPU usage if you remove -vf scale? How fast is simple decoding of source?
ffmpeg -i 35543_1080p_25.mov -benchmark -f null -
I guess the low resolution makes it especially difficult.
excellentswordfight
3rd May 2019, 09:03
540p is rather low, so you might not get great utilization, but 10% sounds a bit low either way. With --ctu 32 --merange 26 I have no problem saturating 24 threads, and without them I can saturate around 16 threads for 1080p video.
IMO the multithreaded scaling of x265 is already very impressive. Not sure why people think there is an issue just because it doesn't scale infinitely; this is still a complex GOP-based codec, so unless you do chunked encoding there will never be infinite scaling.
@sneaker
25p x 4 sec gop = 100
Middle profile of a long stack/ladder
bench : 14.4x Realtime
This is just irritating ;)
40 "cores", with 2 ffmpeg instances,
https://imgur.com/a/SHlMrJ8
excellentswordfight
3rd May 2019, 10:48
Well, I think he understood that part, but why is the min-keyint higher than keyint? And why use level 5.1?
Anyway, did you try lowering ctu and merange?
edit:
"ffmpeg.exe" -i "1080p25_input" -vf "scale=960:540" -f yuv4mpegpipe -strict -1 - | "x265.exe" --y4m --input-depth 8 --output-depth 10 --preset slow --profile main10 --level-idc 31 --crf 23 --ctu 32 --merange 26 --keyint 100 --min-keyint 25 --rc-lookahead 100 --vbv-maxrate 3000 --vbv-bufsize 3000 --no-open-gop --colorprim bt709 --transfer bt709 --colormatrix bt709 --range limited - -o NUL
I guess this is somewhat similar to the video settings you are going for; I get 60% CPU usage on 24T with those.
edit2.
Wait what, 14x realtime!? Are you hitting 350fps? I'm hitting close to 2x realtime with the settings above, ~45fps.
~ VEGETA ~
3rd May 2019, 11:28
Not worth the money. I recently moved from an E5-2670 to a Ryzen 2700 and I literally almost doubled my frame rate while cutting my power usage by a good third (150 watts while encoding vs 230 watts). If I overclock the 2700 to 2700x speeds, I do double my performance from my old Xeon at the same power usage. The sheer lack of AVX 2.0 makes them horrible for x265. There is a thread by Sagitarre? that properly benches processors for x265. If you check the thread you'll see that all AVX processors are beaten by similarly spec'd AVX 2.0 processors. Even my old 8 core was beaten easily by a quad core with AVX 2.0.
I'd strongly suggest you wait till Zen2 comes out before you make your purchase. You will either be able to get a new system that will destroy the dual Xeon and sip power, or someone's old stuff because they wanted to upgrade. Should be anywhere from 1-2 months at this point.
It costs only about $220, which is half the price of two of those Xeon CPUs. Are you sure it gives the same processing power for x264 and x265? I encode mostly in x264 10-bit, and that is what matters most to me.
Also, I liked the Xeon build because it allows me to run engineering software (SolidWorks, rendering stuff, etc.) better than a normal CPU. I also do some light gaming (League of Legends), so although Xeons are not suitable for gaming, they are enough for that if I pair them with a good GPU (say a GTX 1080 Ti or something). That was the plan, and it won't happen for another 6 months or so.
So is this Ryzen 2700 any good for gaming and big, demanding software? How many fps do you get while encoding 1080p material (just a general estimate)? Rough figures are welcome.
edit: kindly give me the link to that benchmarking thread.
excellentswordfight
3rd May 2019, 12:14
Also, I liked the xeon build because it allows me to run engineering software (solidworks, rendering stuff,etc...) better than normal cpu.
There is no magic in the Xeon line for that stuff; it's basically the ECC support (which is not critical for those workloads) and the fact that in the past you almost had to use them to get the core count up. But in reality you could actually lose performance, because they lacked single-thread performance compared to the consumer line: most workstations are not used for rendering 24/7, so interactive workloads are faster on consumer chips a lot of the time. This has been somewhat fixed with the new Xeon-W line though.
So is this ryzen 2700 any good for gaming and big demanding software? how much fps do you get while encoding (just a general estimate) 1080p material? rough figures are welcome.
edit: kindly give me the link to that benchmarking thread.
The most promising part of the upcoming Ryzen 3 is that they should no longer have a big AVX disadvantage compared to Intel (the 9700K beats or is close to the 2700X in x264 and x265 because of it, for example). So they should give very nice value for both encoding and general workloads like gaming and professional software suites, because you get a high thread count (for a consumer line) and decent single-thread performance.
~ VEGETA ~
3rd May 2019, 15:05
I am planning to get a nice desktop computer later on, which is to be used for everything including encoding, engineering software (SolidWorks, AutoCAD Plant 3D, etc.), a little gaming (nothing too demanding) and so on.
Some people recommended the Ryzen 2700X to me as the best for an all-purpose PC, coupled with a good Nvidia GPU (1060 or better).
For encoding, I use VapourSynth with stuff like bm3d, f3kdb, line stuff, etc., and mostly encode 1080p material in x264-10bit. I have an online.net Atom 2750 dedicated server which is nowhere near good for encoding, but I get it done since I don't really care about speed on a dedicated server.
However, I really want some nice fps if I do it on my desktop, right? I'd say 10 fps is nice and 20+ fps is good for such a script. Of course, encoding to lossless first and then directly to 1080p and 720p seems faster.
Forteen88
3rd May 2019, 15:32
...and encode 1080p material mostly and in x264-10bit.
x264-10bit encodes are not GPU-decodable at all, so I would not recommend doing those encodes. x265-10bit encodes are fully GPU-decodable on Nvidia GF 960 graphics cards or better.
~ VEGETA ~
3rd May 2019, 19:51
I am not sure I understood what you meant.
I use x264-10bit, so I need a good CPU to do it, but it should also be good enough for gaming and stuff. Is the Ryzen 2700X the one?
Nico8583
3rd May 2019, 20:04
Perhaps you should wait for Ryzen 3?
~ VEGETA ~
3rd May 2019, 22:53
The price is king too xD.
Well, I think he understood that part, but why is the min-keyint higher than keyint? And why use level 5.1?
The first one is a trick to get a forced IDR at the start and dynamic I-frames on scenecut. It's the only syntax I've found that actually works.
5.1 was just some leftover code from UHD. I can lower it since it's not needed.
Anyway, did you try lowering ctu and merange?
Nope, I have zero clue what those two mean.
edit:
"ffmpeg.exe" -i "1080p25_input" -vf "scale=960:540" -f yuv4mpegpipe -strict -1 - | "x265.exe" --y4m --input-depth 8 --output-depth 10 --preset slow --profile main10 --level-idc 31 --crf 23 --ctu 32 --merange 26 --keyint 100 --min-keyint 25 --rc-lookahead 100 --vbv-maxrate 3000 --vbv-bufsize 3000 --no-open-gop --colorprim bt709 --transfer bt709 --colormatrix bt709 --range limited - -o NUL
I guess this is somewhat similar to the video settings you are going for; I get 60% CPU usage on 24T with those.
Strange that I'm not getting that.
edit2.
Wait what, 14x realtime!? Are you hitting 350fps? I'm hitting close to 2x realtime with the settings above, ~45fps.
Well, that's what the benchmark result said. The ProRes file is on an NFS share with a 10-gig interface.
excellentswordfight
4th May 2019, 00:04
The first one is a trick to get forced IDR at the start and dynamic Iframes on scenecut. Only syntax ive found that actually works..
So I guess you want the IDR frames to be fixed at 4s intervals and scenecut I frames not to be IDRs?
Nope, i have zero clue what those 2 mean.
From the docs:
"Maximum CU size (width and height). The larger the maximum CU size, the more efficiently x265 can encode flat areas of the picture, giving large reductions in bitrate. However this comes at a loss of parallelism with fewer rows of CUs that can be encoded in parallel,"
The benefit of the default value of 64 isn't that big at lower resolutions, so lowering it can be a very nice trade-off performance-wise if you are seeing low thread utilization. And with a lower CTU size, merange can also be lowered.
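A back-of-the-envelope sketch of why the CTU size matters for threading, assuming the number of CTU rows in a frame is a rough upper bound on how many rows can be worked on in parallel within that frame. The helper name `ctu_rows` is made up, and the real picture is more complicated, since frame threads and lookahead add parallel work too.

```python
def ctu_rows(height, ctu_size):
    """Number of CTU rows needed to cover a frame of the given height.

    Uses ceiling division because a partially filled last row still counts.
    """
    if height <= 0 or ctu_size <= 0:
        raise ValueError("height and ctu_size must be positive")
    return -(-height // ctu_size)


if __name__ == "__main__":
    for height in (540, 1080, 2160):
        for ctu in (64, 32, 16):
            print(f"{height}p with ctu={ctu}: {ctu_rows(height, ctu)} rows")
```

At 540p the default CTU size of 64 gives only 9 rows, while 32 gives 17, so a machine with 40 logical cores has far more threads than rows to hand out either way.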
Well thats what the benchmark result said. Prores file is on a NFS share with 10gig interface
Sorry, I missed that it was a decoder benchmark; I thought it was your encoder speed! In that case, the bottleneck shouldn't be the decoder.
mparade
18th June 2019, 21:03
I cannot saturate the cores of my Threadripper 1950X even by running four 2160p encodes in parallel. My machine runs out of memory (16GB) first.
I am using preset slow and 1 frame thread in my encodes. My Threadripper behaves completely differently from blublub's.
Any advice would be appreciated.
Atak_Snajpera
21st June 2019, 11:18
Running four x265 encoders with 2160p source is just insane! No wonder you are running out of memory. One instance should be enough for 32 threads. What GUI are you using? Show us cpu usage in Process Hacker for each process spawned by your application (ffmpeg.exe , x265.exe and so on).
Forteen88
21st June 2019, 12:13
@Atak_Snajpera. He's probably doing that for better image quality on the encodes.
Using many threads in x265/x264 equals worse image quality.
If the encode had a lower resolution than 2160p, I'd suggest lowering --me-range and --ctu.
Asmodian
21st June 2019, 19:29
I always use --frame-threads 1 myself but I only need two encodes for my 20 threads and I have 32GB of memory.
Two frame threads have very little impact on quality, so it may be worth it. Either that or more RAM. :)
Blue_MiSfit
21st June 2019, 20:18
There are occasional issues with more than 1 frame thread; I highly suggest that anyone using vbv use only 1 frame thread :)
There are also still edge cases where using vbv will cause a few frames of explosive blocking, mostly with HDR. I'm actually working right now on isolating the issue.
mparade
23rd June 2019, 21:35
Thanks for the answers.
I solved this "memory" issue by:
- switching memory mode from distributed to local;
- switching off SMT in AMD Ryzen Master;
- installing 4x8GB RAM instead of 1x16GB;
- using 8 threads per encode (4 encodes run in parallel to reach 100% processor utilization and ~60% memory usage);
- using 1 frame thread per encode with preset slower.
This way my 1950X system runs stably and encoding time is quite acceptable until my new PC arrives. :thanks:
Atak_Snajpera
24th June 2019, 09:43
You are losing a lot of performance by disabling SMT! NEVER DO THAT ON AMD's CPUS!
mparade
24th June 2019, 17:13
Thanks for the advice. As far as I remember, I experienced exactly the opposite when running several encodes in parallel with the pools feature and --frame-threads=1.
But I am going to stop my encodes to see if you are right. :)
...I couldn't show that SMT has any effect on fps on my system.
Maybe SMT is only useful when encoding with specific settings.
Atak_Snajpera
24th June 2019, 20:20
Just run this benchmark without SMT and then with SMT
http://forum.pclab.pl/topic/1184884-x265-FHD-Benchmark/
If you disable SMT then you lose ~15%
https://i.imgsafe.org/12/1246a18f41.png
mparade
24th June 2019, 22:07
I see, but this 15% does not show up in fps during encodes on my 1950X. No improvement actually materializes, just try it. If that is not the case for you, then your settings must differ from mine.
Atak_Snajpera
25th June 2019, 11:35
I use default settings. You must be doing something wrong. BTW, using a single 16GiB stick of RAM on a CPU with quad-channel memory was extremely weird! I hope that you at least used fast memory sticks, because Threadrippers really need high bandwidth.
mparade
25th June 2019, 12:26
Must be. 2666MHz, as proposed by the company that sold me the build. Default settings use frame threads > 1 most of the time.
Atak_Snajpera
25th June 2019, 12:33
Was that the same company which installed only a single stick and thus forced the 1950X to work in single-channel mode?
mparade
25th June 2019, 12:37
Yes. I didn't say that they, or I, are on top of the subject. They told me they were not so familiar with AMD.
I told them to consider 3200MHz as well, but they answered that I could be sure the rig would not be able to benefit from that.
Atak_Snajpera
25th June 2019, 13:38
LOL! What pros ;) Everybody knows that Infinity Fabric in Zen is tied to memory speed.
NikosD
26th June 2019, 12:01
A tech company that builds systems around Threadripper using 1 DIMM of RAM (single channel mode) and suggests 2666MHz instead of 3200MHz SHOULD STOP building Threadripper rigs and just sell Intels.
They must be paid by Intel even when they are selling Threadripper CPUs that way, I think.
mparade
26th June 2019, 20:25
Another problem is that they are not going to refund anything. Because of this issue I now have 2x16GB 2666MHz + 4x8GB 2666MHz sticks on my hands, if I really wanted to end up with some "proper" RAM configuration for my 1950X... it takes at least two incompetent people to cause trouble this big. :)
mparade
28th June 2019, 22:02
Is it recommended to overclock threadripper for encoding using x265?
Atak_Snajpera
29th June 2019, 12:23
NO. Energy consumption will go up much faster than fps.
http://www.nagorak.com/imagedump/R7-1700-Efficiency.jpg
mparade
29th June 2019, 17:35
Very interesting! Thank you very much.