20th January 2023, 01:29 | #19921 | Link | |
Formally known as .......
Join Date: Sep 2021
Location: Down Under.
Posts: 1,000
|
Quote:
If it turns out to be similar to what disabling all the "E" cores does on a 13900KF, then it's just SLOW!! We buy these CPUs for the power, not to "cut them in half"!! As rlev11, Ryushin and I have determined, the sweet spot for the 7950X is this (for about 95% of encodes):
Code:
/avisynth-prefetch-threads 12 /x264-threads 16 /x265-threads 16
I am going to do some tests with an older x265 build that is supposed to be optimised for different CPUs, so I will post some results soon.
|
20th January 2023, 05:43 | #19922 | Link |
Formally known as .......
Join Date: Sep 2021
Location: Down Under.
Posts: 1,000
|
A LOT of x265 testing....
These results are from Atak's x265 Benchmark app :-
With updated x265 builds.
with default included x265 - encoded 2500 frames in 20.75s (120.51 fps), 7025.74 kbps, Avg QP:37.21
x265 r74 - encoded 2500 frames in 19.89s (125.71 fps), 10890.80 kbps, Avg QP:35.84
x265 r83 - encoded 2500 frames in 19.83s (126.07 fps), 10890.80 kbps, Avg QP:35.84
x265 r85 - encoded 2500 frames in 19.72s (126.76 fps), 10890.80 kbps, Avg QP:35.84
x265 v3.5+67-aMod-gcc12.2.1 - encoded 2500 frames in 19.77s (126.46 fps), 10891.97 kbps, Avg QP:35.84
x265-x64-v3.5+67-aMod-gcc12.2.1 - failed
x265-3.5+84-5d8f209_vs2022 - encoded 2500 frames in 19.78s (126.39 fps), 10890.80 kbps, Avg QP:35.84
x265-3.5+84-5d8f209_vs2022-AVX2 - encoded 2500 frames in 19.84s (126.01 fps), 10890.80 kbps, Avg QP:35.84
x265-3.5+84-5d8f209_gcc122-AVX2 - encoded 2500 frames in 19.67s (127.07 fps), 10891.97 kbps, Avg QP:35.84
x265-3.5+84-5d8f209_gcc122 - encoded 2500 frames in 19.62s (127.40 fps), 10890.80 kbps, Avg QP:35.84 (fastest)
Patman builds failed this test
------------------------------------------
I have removed most of this post, as it was no longer relevant, after today's testing (21-01-23)
Last edited by TDS; 24th January 2023 at 07:20.
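For anyone who wants to compare builds without eyeballing the numbers, here is a rough Python sketch that pulls the figures out of x265 summary lines like the ones in this post and works out the percentage difference. The function names and the two-line sample are only illustrative and are not part of the benchmark app. Note that the "default included" run reports a different bitrate and Avg QP from the others, so it may not be a like-for-like comparison.

```python
import re

# Matches the x265 summary format quoted in this post, e.g.
# "encoded 2500 frames in 19.62s (127.40 fps), 10890.80 kbps, Avg QP:35.84"
SUMMARY = re.compile(
    r"encoded (\d+) frames in ([\d.]+)s \(([\d.]+) fps\), "
    r"([\d.]+) kbps, Avg QP:([\d.]+)"
)


def parse_summaries(text):
    """Return one dict per x265 summary line found anywhere in text."""
    results = []
    for match in SUMMARY.finditer(text):
        frames, seconds, fps, kbps, qp = match.groups()
        results.append({
            "frames": int(frames),
            "seconds": float(seconds),
            "fps": float(fps),
            "kbps": float(kbps),
            "qp": float(qp),
        })
    return results


def percent_faster(baseline_fps, candidate_fps):
    """How much faster the candidate is than the baseline, in percent."""
    return (candidate_fps / baseline_fps - 1.0) * 100.0


# Two of the lines from this post, used as sample input.
sample = """
default included x265 - encoded 2500 frames in 20.75s (120.51 fps), 7025.74 kbps, Avg QP:37.21
x265-3.5+84-5d8f209_gcc122 - encoded 2500 frames in 19.62s (127.40 fps), 10890.80 kbps, Avg QP:35.84
"""

runs = parse_summaries(sample)
gain = percent_faster(runs[0]["fps"], runs[1]["fps"])
print(f"{gain:.1f}% faster than the first run")
```

Because the regex searches the whole text, it works whether the results are pasted one per line or as a single run-on line.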
20th January 2023, 14:19 | #19923 | Link | |
Registered User
Join Date: Mar 2011
Posts: 433
|
Quote:
Power Tests:
Source: Puss In Boots 4K - 1h:30m:35s
AIO 420mm Cooler
CQ16
SMDegrain Hard: SMDegrain(video,tr=8,thSAD=800,thSADC=400,contrasharp=true,prefilter=2,refinemotion=true)
7950x Default Frequencies: 3h:26m:08s
7950x Underclocked 4.9GHz at 1.08v: 3h:29m:50s
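To put the two encode times in perspective, here is a small Python sketch of the arithmetic. It only uses the three durations quoted above; the helper name to_seconds is made up for the example.

```python
import re


def to_seconds(stamp):
    """Convert a duration written like '3h:26m:08s' into seconds."""
    match = re.fullmatch(r"(\d+)h:(\d+)m:(\d+)s", stamp)
    if match is None:
        raise ValueError(f"unexpected duration format: {stamp!r}")
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds


source = to_seconds("1h:30m:35s")        # length of the film
stock = to_seconds("3h:26m:08s")         # 7950x at default frequencies
underclocked = to_seconds("3h:29m:50s")  # 7950x at 4.9GHz, 1.08v

extra = underclocked - stock             # extra encode time in seconds
slowdown_pct = extra / stock * 100.0     # relative cost of the underclock
realtime_ratio = source / stock          # fraction of real-time speed at stock

print(f"underclock costs {extra} s ({slowdown_pct:.1f}% longer)")
print(f"stock encode runs at {realtime_ratio:.2f}x real time")
```

The underclocked run takes 222 seconds longer, which is a bit under 2% of the stock encode time.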
|
20th January 2023, 15:59 | #19924 | Link | |
Registered User
Join Date: Aug 2020
Location: Pennsylvania
Posts: 83
|
Quote:
It would be nice if we could all figure out exactly what the hangup is when doing 4k, which appears once you start going over 12 cores. There must be some program buffer that just gets overloaded with all the extra information contained in each frame |
|
20th January 2023, 16:19 | #19925 | Link | |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,815
|
Quote:
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper Last edited by Atak_Snajpera; 20th January 2023 at 16:23. |
|
21st January 2023, 01:25 | #19926 | Link | ||
Formally known as .......
Join Date: Sep 2021
Location: Down Under.
Posts: 1,000
|
Quote:
https://forum.doom9.org/showthread.p...81#post1980881 I think you will see a HUGE difference Code:
AMD Ryzen 9 7950X 16-Core @ 4.5GHz ( 8C / 16T )
y4m [info]: 1920x1080 fps 50/1 i420p8 sar 1:1 unknown frame count
raw [info]: output file: NUL
x265 [info]: HEVC encoder version 3.5+85-c2e8e8d13
x265 [info]: build info [Windows][GCC 12.2.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main Still Picture profile, Level-4.1 (Main tier)
x265 [info]: Thread pool created using 16 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 4 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias : 50 / 500 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip mode=1 signhide tmvp
x265 [info]: tools: b-intra strong-intra-smoothing lslices=6 deblock sao
encoded 2500 frames in 35.68s (70.07 fps), 10890.80 kbps, Avg QP:35.84
OK, have run a test (same as yesterday, above), and using this :-
Tested a 5 minute 4K clip, using 1 x 6 minute chunk size, CQ 16, no filters !!
Quote:
So the next step is doing this on the 13900KF... Last edited by TDS; 21st January 2023 at 02:29. |
||
21st January 2023, 17:57 | #19927 | Link |
Registered User
Join Date: Aug 2020
Location: Pennsylvania
Posts: 83
|
So I ran a whole slew of tests to try to answer Atak's question about turning the 7950x into a 7900x in the bios and see what happens when doing 4k encoding. I also did some runs doing a 1080p encode for comparisons to try to figure out if there was one setting that worked best for both.
For the 4k encode, I did Animal House starting 30 minutes in and just used the ending fps of the first chunk for each encoding server. Ran the same smdegrain heavy grain script. This way, for each run, the same encoding server did the same chunk each time for control purposes. As a reference, a 12 core 5900x averaged 5.28 fps for all the runs.

First up is running the 7950x and 5950x servers with no extra settings, so they were running full out max cores and threads.
7950x single server - 4.5 fps
5950x single server - 3.95 fps
5950x dual svr - 6.52 fps combined
7950x with CCD1 disabled in bios, so running as a 7700x - 4.97 fps, so in essence a 7700x is 10 percent faster than a 7950x

Next up was running the 7950x and 5950x with:
/avisynth-prefetch-threads 12 /x264-threads 16 /x265-threads 16
7950x Dual svr - 8.27 fps combined
7950x single server - 6.95 fps
5950x single server - 5.52 fps
5950x dual svr - 6.89 fps combined

Next up was running the 7950x and 5950x with /affinity 3FFC3FFC for all encoding servers, essentially running them as 12 core cpus
7950x Dual svr - 8.41 fps combined
7950x single server - 7.06 fps
5950x single server - 5.39 fps
5950x dual svr - 6.99 fps combined

Then I thought, let's run the 5950x and 7950x as 2 separate 8 core machines using an affinity mask of FFFF0000 on 1 server and 0000FFFF on the other. So a dual 7700x and a dual 5700x
7950x Dual svr - 9.73 fps combined
5950x dual svr - 7.14 fps combined

Finally, I tried running the 7950x as two 7700x like above, but with 2 servers on each affinity mask, so 4 servers running in total
7950x quad server - 9.16 fps combined

So this is great: running the 7950x in particular as 2 separate 7700x showed a significant improvement, but how does that work when doing 1080p stuff? So I re-ran the tests using Where Eagles Dare with the same heavy smdegrain script as above. The 5900x averaged out at 19.75 fps through all the runs.

First up is running the 7950x and 5950x servers with no extra settings, so they were running full out max cores and threads.
7950x dual svr - 35.71 fps
5950x single server - 19.43 fps
5950x dual svr - 29.24 fps combined

Then running the prefetch 12 setting
7950x dual svr - 35.48 fps
5950x dual svr - 29.11 fps combined

Then running both servers with /affinity 3FFC3FFC
7950x dual svr - 34.2 fps
5950x dual svr - 27.92 fps combined

Then using an affinity mask of FFFF0000 on 1 server and 0000FFFF on the other. So a dual 7700x and a dual 5700x. This was the fastest for 4k
7950x dual svr - 25.72 fps
5950x dual svr - 22.2 fps combined
Well this stinks, fastest for 4k, but slowest for lower res by a good bit

Finally running the 7950x with 4 servers as above
7950x quad server - 36.17 fps combined

So my take on this is basically what we already know. Unless you do something to knock down the core counts on the 16 core Ryzens, they are horrible at doing 4k encoding, but once you do, they scale back up quickly and fall in line with what you would expect, as they are still faster than their 12 core brothers. The fact that a 7950x cut in half is faster than stock by 10% is really telling. My best guess is that ffmpeg is the culprit here once the core count goes above 12 in 4k. When running the 16 cores at default max and watching the green and blue encoding graph at the bottom of the encoding window, it's the decoder that is having the most issues, and that cpu usage corresponds to ffmpeg in task manager. The green decoder graph is constantly bouncing up and down, and many times there is no blue encoding going on. When encoding is going as it should with reduced threads, the decoding and encoding graphs move across in a very steady state. It's the difference between the Plains and the Swiss Alps. The only time I see it wildly bouncing is when running all 16 cores maxed out on the 3950x, 5950x and 7950x.
Last edited by rlev11; 21st January 2023 at 17:59. |
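For anyone wondering what those /affinity values actually select, here is a short Python sketch that decodes a hex mask into logical CPU numbers. It assumes bit 0 is logical CPU 0 and that the two SMT threads of a core sit next to each other (0/1, 2/3, and so on), which is the usual layout but is an assumption here. The function names are made up for the example.

```python
def cpus_in_mask(mask_hex):
    """Return the logical CPU indices whose bits are set in a hex affinity mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def describe(mask_hex, threads_per_core=2):
    """Return (logical CPU count, physical core count) selected by the mask.

    Assumption: SMT siblings are adjacent logical CPUs, so dividing the
    logical index by threads_per_core gives the physical core index.
    """
    cpus = cpus_in_mask(mask_hex)
    cores = {cpu // threads_per_core for cpu in cpus}
    return len(cpus), len(cores)


for mask in ("3FFC3FFC", "FFFF0000", "0000FFFF"):
    threads, cores = describe(mask)
    print(f"{mask}: {threads} logical CPUs on {cores} cores")
# 3FFC3FFC: 24 logical CPUs on 12 cores
# FFFF0000: 16 logical CPUs on 8 cores
# 0000FFFF: 16 logical CPUs on 8 cores
```

Under those assumptions, 3FFC3FFC skips the first and last core in each 16-thread half, leaving 6 cores per half, while FFFF0000 and 0000FFFF each select one whole half of the chip.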
21st January 2023, 22:36 | #19928 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,815
|
How about setting ffmpeg.exe affinity to CCD0 and x265.exe to CCD1 manually in task manager?
22nd January 2023, 00:34 | #19929 | Link | |
Formally known as .......
Join Date: Sep 2021
Location: Down Under.
Posts: 1,000
|
Quote:
EDIT:- I just did a simple test with RipBot264.exe in Task Manager, and it indeed loses any settings applied, when app is closed. Is there somewhere else that FFmpeg could be changed ?? Some option within RB, maybe ?? As x265 could be changed in its command line/instructions. I did a bit of Googling earlier, and it definitely appears that this IS an issue with FFmpeg Last edited by TDS; 22nd January 2023 at 01:01. |
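Since Task Manager settings do not stick, one possible workaround is to apply the affinity from a script. The Python sketch below is only an illustration of the idea: it relies on the third-party psutil package, assumes logical CPUs 0-15 belong to CCD0 and 16-31 to CCD1 on a 7950X with SMT enabled, and uses the process names ffmpeg.exe and x265.exe mentioned above. It has not been tested against RipBot264, and the affinity it sets only lasts for the life of each process, so it would have to be re-run whenever new ffmpeg or x265 processes are started.

```python
def ccd_cpus(ccd, cores_per_ccd=8, threads_per_core=2):
    """Logical CPU indices for one CCD.

    Assumption: CCDs occupy contiguous blocks of logical CPUs, so on a
    16-core part with SMT, CCD0 is 0-15 and CCD1 is 16-31.
    """
    per_ccd = cores_per_ccd * threads_per_core
    return list(range(ccd * per_ccd, (ccd + 1) * per_ccd))


def pin_by_name(plan):
    """Pin every running process whose name appears in plan.

    plan maps an executable name to a list of logical CPU indices.
    Returns the number of processes that were pinned.
    """
    import psutil  # third-party; imported here so ccd_cpus() works without it

    wanted = {name.lower(): cpus for name, cpus in plan.items()}
    pinned = 0
    for proc in psutil.process_iter(["name"]):
        name = (proc.info["name"] or "").lower()
        if name in wanted:
            try:
                proc.cpu_affinity(wanted[name])
                pinned += 1
            except (psutil.AccessDenied, psutil.NoSuchProcess):
                pass  # process is protected or has already exited
    return pinned


# Decoder on CCD0, encoder on CCD1, as suggested above.
PLAN = {"ffmpeg.exe": ccd_cpus(0), "x265.exe": ccd_cpus(1)}

# Usage, while an encode is running:
#   print(pin_by_name(PLAN), "process(es) pinned")
```

Keeping the CPU list calculation separate from the psutil call makes it easy to try other splits, for example a smaller block for the decoder.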
|
22nd January 2023, 00:39 | #19930 | Link | |
Formally known as .......
Join Date: Sep 2021
Location: Down Under.
Posts: 1,000
|
Quote:
But in the big picture, "we're" only chasing small differences. |
|
22nd January 2023, 14:53 | #19931 | Link |
Registered User
Join Date: Mar 2011
Posts: 433
|
I should also point out that the threading issue is really related to high core counts and not just the 7950x. My Intel Xeon system passes 24 cores and 48 threads to my KVM VM. My tests in the past showed 14 avisynth threads with two encoding servers (24 x264/x265 threads) to be optimum for my setup on that system.
Atak: There is also a little bug in the Audio Description setting if I use the same audio track more than once. When using a 7.1 audio track, I've been setting the Audio1 Description to be something like "AAC 7.1" and using FFMPEG 7.1, then Audio2 I use xcopy stream along with a description like "Dolby TrueHD with Dolby Atmos 7.1". The bug is it will apply a single description to both Audio1 and Audio2. Output from mediainfo: Code:
Audio #1
ID : 2
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : A_AAC-2
Duration : 1 h 30 min
Bit rate : 641 kb/s
Channel(s) : 8 channels
Channel layout : C L R Ls Rs Lw Rw LFE
Sampling rate : 48.0 kHz
Frame rate : 46.875 FPS (1024 SPF)
Compression mode : Lossy
Stream size : 415 MiB (7%)
Title : DTS-HD (X11 X) 7.1
Language : English
Default : Yes
Forced : No
Audio #2
ID : 3
Format : DTS XLL X
Format/Info : Digital Theater Systems
Commercial name : DTS-HD Master Audio
Codec ID : A_DTS
Duration : 1 h 30 min
Bit rate mode : Variable
Bit rate : 4 445 kb/s
Channel(s) : 8 channels
Channel(s)_Original : Object Based
ChannelLayout_Original : Object Based
Sampling rate : 48.0 kHz
Frame rate : 93.750 FPS (512 SPF)
Bit depth : 24 bits
Stream size : 2.81 GiB (46%)
Title : DTS-HD (X11 X) 7.1
Language : English
Default : No
Forced : No
22nd January 2023, 16:32 | #19932 | Link | |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,815
|
Quote:
|
22nd January 2023, 19:45 | #19934 | Link | |
Registered User
Join Date: Aug 2020
Location: Pennsylvania
Posts: 83
|
Quote:
I guess my goal is just to find an encoding server setting that seems to work best for ALL encoding jobs. There are still too many variables in play, even with my testing, to say that a particular server profile is best for a given resolution. Even if a 7950x run as 2 7700x's is best for Animal House (1:1.85 AR), that doesn't mean that setting would be BEST for a 1:2.40 AR encode or a 1:1.78 AR encode. Different degraining scripts might also make a difference. I just want to find a setting for the 16 core Ryzens (I have 5 of them) that runs well with whatever I throw at it, even if it may not be the best setting for that particular encode. As of now it still seems to me that setting the avisynth prefetch to 12 is a happy medium. I mean, right now I am able to encode 4k with 15 servers active, with some aggressive smdegraining, at somewhere between 60 and 80 fps depending on the aspect ratio, so things aren't all that bad. I haven't tried it, but I am wondering if we are hard limited to 16 encoding servers in distributed mode. I am 1 more upgrade away from pushing past that. |
|
22nd January 2023, 23:50 | #19935 | Link | |
Formally known as .......
Join Date: Sep 2021
Location: Down Under.
Posts: 1,000
|
Quote:
Code:
/avisynth-prefetch-threads 14 /x264-threads 24 /x265-threads 24 |
|
23rd January 2023, 00:01 | #19936 | Link | |
Formally known as .......
Join Date: Sep 2021
Location: Down Under.
Posts: 1,000
|
Quote:
I would think that adding more DE servers would have to be up to Atak, as he is the ONLY one with the code. I'm sure that if asked nicely, he might consider adding more DE servers, maybe up to 20, or 24, or maybe an option for the user to add or subtract how many are available, and displayed in a "resizable window". I have more than enough machines to occupy more than 16 servers, but ttytt, most of them are too old & now underpowered to warrant their use, and power consumption, and if you have slow PC's "helping" they are just holding up the fast ones, and end up holding up the whole process. Having said that, I would be interested to know what you've got going on there, with your 15 servers? (maybe a PM) sent, thanks. Last edited by TDS; 23rd January 2023 at 07:37. |
|
23rd January 2023, 07:36 | #19937 | Link |
Formally known as .......
Join Date: Sep 2021
Location: Down Under.
Posts: 1,000
|
More x265 testing, only this time on the 13900KF
I have also removed ALL the info here, as it's also irrelevant, and besides, there's a "copy" of it on rlev11's post, below.
Last edited by TDS; 24th January 2023 at 07:22. |
23rd January 2023, 11:20 | #19938 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,733
|
When you figure out these slowdowns with high thread amounts, have you made sure that Avisynth is not running out of cache memory?
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon... |
23rd January 2023, 13:21 | #19939 | Link | |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,815
|
Quote:
https://www.mediafire.com/file/r2hmt...Source.7z/file
|
23rd January 2023, 13:46 | #19940 | Link | |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,815
|
Quote:
http://avisynth.nl/index.php/Interna...s#SetCacheMode
Guys, add this at the top of your script and see if it fixes that performance drop on the 7950x with 16 prefetch threads and MDegrain/SMDegrain active.
Code:
SetCacheMode(1)
|