Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.
#361 | Link
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,412
Quote:
Quote:
The lack of strong single-threaded perf would be the big bottleneck anyway. Although, I just recalled that WPP might allow some parallelization; nominally 1 thread per 64 pixels of height, although probably only 2x better given overhead. WPP certainly allows for decoder parallelization. Even so, an Atom core is many times slower for CABAC-like operations than a modern Xeon core, so that's already factored into comparisons.

Modern video encoding is stressful in pretty much every way, so Amdahl's Law prevents any big improvement in one area from helping all that much. As I've mentioned before, some years back Intel discovered that x265 pushed Xeon thermals hotter than the theoretical worst case of Intel's own internal thermal test tool. The flip side of this is that encoding benefits somewhat from most improvements; when a new processor says it's "X%-Y%" faster, encoding is always close to the higher Y% value.

We get to spend orders of magnitude more MIPS/pixel today than when I started doing compression. Circa 1996, it took about 80 minutes to encode 1 minute of 320x240p15 on my then rocket-fast PowerMac 8100/80 workstation. I was able to charge $80/minute for a tape-to-file conversion with a $20/min surcharge for VHS (mainly to encourage the client to find the Beta SP master).
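As a rough illustration of the "1 thread per 64 pixels" figure, the Python sketch below counts 64-pixel rows for a few common frame heights. The function name `wpp_rows` and the default 64-pixel row height are assumptions made for this example, and the row count is only a nominal upper bound; as noted above, the real gain is probably far smaller.

```python
import math

def wpp_rows(frame_height: int, row_height: int = 64) -> int:
    """Number of 64-pixel rows in a frame, i.e. the nominal WPP thread ceiling."""
    return math.ceil(frame_height / row_height)

if __name__ == "__main__":
    for height in (720, 1080, 2160):
        print(f"{height}p: {wpp_rows(height)} rows")
```

A partially filled last row still counts as a row, which is why the result is rounded up.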
#362 | Link
Registered User
Join Date: Oct 2001
Posts: 428
Yes, 16GB. But it's not a "normal" L3 cache; it's referred to as "remote L2 cache". Its bandwidth is higher than the 8 Lane DDR4 access, but it's not as fast as a modern L3 cache. It can be configured to act as a normal transparent cache (like an L3 cache), but it can also be accessed with a separate driver (or in a hybrid mode). Too bad there are no motherboards in Europe for these Xeons. I know that it's probably not really worth it, but for a small amount of money, I'd satisfy my curiosity and get one.
Quote:
On a side note: When I was tinkering with CPU feature sets yesterday on a 1950X, I found odd performance differences between runs; in some of them, turning AVX2 off seemed to speed things up... It seems there is some potential in individually tweaked binary compiles, tailored to a CPU (of course not worth it if one wants to distribute them publicly, but tweaking a personal encoding server this way would be fun), so I will probably have to learn to compile stuff like this properly after all... OK, back to topic...
Edit: Phoronix has some CPU info which might be interesting:
Code:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 87
model name : Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
stepping : 1
microcode : 0x1b0
cpu MHz : 1168.239
cache size : 1024 KB
physical id : 0
siblings : 256
core id : 0
cpu cores : 64
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2 mds msbds_only
bogomips : 2600.01
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
Code:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 4
Core(s) per socket: 64
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 87
Model name: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
Stepping: 1
CPU MHz: 1192.466
CPU max MHz: 1500.0000
CPU min MHz: 1000.0000
BogoMIPS: 2600.01
L1d cache: 2 MiB
L1i cache: 2 MiB
L2 cache: 32 MiB
NUMA node0 CPU(s): 0-255
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT mitigated
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts
Last edited by ReinerSchweinlin; 18th February 2023 at 14:37.
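For anyone comparing feature sets between machines, here is a minimal Python sketch that picks the AVX-family entries out of a flags line like the ones in the listings. The `FLAGS` string is an abbreviated excerpt of that line, and the helper name `simd_flags` is made up for illustration.

```python
# Abbreviated excerpt of the "flags" line from the cpuinfo listing in this post.
FLAGS = (
    "sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand "
    "bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd"
)

def simd_flags(flags_line: str, prefix: str = "avx") -> list[str]:
    """Return the flags that start with `prefix`, sorted alphabetically."""
    return sorted(flag for flag in flags_line.split() if flag.startswith(prefix))

if __name__ == "__main__":
    found = simd_flags(FLAGS)
    print("AVX-family flags:", ", ".join(found))
    # avx512bw does not appear anywhere in the posted listing.
    print("avx512bw present:", "avx512bw" in found)
```

On a Linux box the same helper could be fed the real flags line read from /proc/cpuinfo instead of the hard-coded excerpt.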
#363 | Link
Registered User
Join Date: Feb 2023
Posts: 5
Quote:
I typically encode in Slow, Slower or Very Slow. For all 4K encodes, the CPU runs at 90+% utilization and pmode causes encodes to take longer. For BD encodes at Slow or Slower, the encodes also take longer with pmode. For BD encodes at Very Slow, pmode does reduce encode time: in one example an encode took 13 hours at Very Slow and 10 hours at Very Slow with pmode. It also seems to increase CPU utilization by around 20 percentage points (from the mid 40% range to the mid 60% range).

I've only tested on 3 files, and while two showed a slightly smaller output file size, one showed a significant output size reduction (4247 MB without pmode and 3474 MB with pmode). Nothing in the documentation or in what I've read here led me to expect this result. Are there any thoughts or comments on it?
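To put those two observations on the same scale, a small Python sketch can express them as relative changes. The helper name `percent_change` is an assumption for this example; the input figures are the ones quoted in the post.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = reduction)."""
    return (after - before) / before * 100.0

if __name__ == "__main__":
    # Very Slow without pmode vs. Very Slow with pmode.
    print(f"encode time: {percent_change(13, 10):+.1f}%")
    print(f"output size: {percent_change(4247, 3474):+.1f}%")
```

That works out to roughly 23% less encode time and roughly 18% less output size for that one file.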
#364 | Link
Registered User
Join Date: Jan 2006
Location: Italy
Posts: 244
Quote:
My CPU (Ryzen 7950X) has 16C/32T. To perform x265 encoding of 4K HDR files, I disabled "pmode". Should I also remove "pme" from my script to avoid long encoding times, or how else could I improve my script? Thank you very much.

--crf 16 --preset slower --output-depth 10 --profile main10 --level-idc 5.1 --rd-refine --vbv-bufsize 100000 --vbv-maxrate 100000 --hme-search umh,umh,star --hme --min-keyint 1 --keyint 24 --no-open-gop --pme --master-display "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)" --colorprim bt2020 --colormatrix bt2020nc --transfer smpte2084 --range limited --max-cll "1000,400" --sar 1:1 --no-info --repeat-headers --aud --hrd --uhd-bd
__________________
my PC with Ryzen 7950X
Last edited by DMD; 19th February 2023 at 16:35.
#365 | Link
Formally known as .......
Join Date: Sep 2021
Location: On a need to know basis.
Posts: 584
Quote:
Like I said in the StaxRip thread, I use RipBot264, but the Pauly Dunne builds, which have so much more to offer than the standard one... but I digress.

Several "power" users have commented on how the 16-core Ryzens "fall off a cliff" when encoding certain x265 videos, but "we" have come up with a "fix" that is part of the encoder's commands and really gets them to do the job they're supposed to do, along with custom x265 commands.

I am VERY happy with the way my 3950X, 5950X & 7950X are performing, as well as the interloper, the 13900KF.

I must admit that my 5950X was being bested by the 5900X at almost everything, but I changed some basic BIOS settings and it's working better than ever before.
__________________
This can be SO "TeDiouS".. Long term RipBot264 user. #1 Ryzen 7950X #2 Intel i9-13900KF #3 Ryzen 5950X #4 Ryzen 5900X #5 Ryzen 3950X
#366 | Link
Registered User
Join Date: Jan 2006
Location: Italy
Posts: 244
Quote:
But I am also very happy that a solution has been found to make them work at their maximum performance. I don't know how to apply the "fix" and would be happy to learn how to do it. As for my BIOS (ASUS ROG Strix X670E-F Gaming WiFi), I only performed optimization for RAM and fast boot. Thank you very much.
__________________
my PC with Ryzen 7950X
Last edited by DMD; 22nd February 2023 at 08:31.
#367 | Link
Registered User
Join Date: Jul 2018
Posts: 758
Quote:
Last edited by DTL; 23rd February 2023 at 12:21.
#368 | Link
Registered User
Join Date: Jan 2006
Location: Italy
Posts: 244
Quote:
__________________
my PC with Ryzen 7950X
#369 | Link
Registered User
Join Date: Oct 2001
Posts: 428
Quote:
#370 | Link
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,584
It would be interesting to hear, since I have a 5950X and have zero issues getting the CPU to work at 80-100% usage when encoding with x265.
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon...
#371 | Link
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,412
Quote:
--pmode has a much bigger chance to be helpful; I'd say it's likely useful above 20 threads for 4K if using --frame-threads 1. The more modes being evaluated, the more parallelization for --pmode to take advantage of. Using only a single frame thread can improve quality, but limits parallelization a lot, and combining it with --pmode can get some of that perf back if you have enough cores.

Looking at the rest of your command line:

--rd-refine doesn't do anything in a single pass, which your encode is.

Is --no-open-gop still required for BD compatibility with x265? (Open GOPs are certainly supported by the BD format itself.) With 24-frame GOPs, open GOP can provide some real benefit. If you're stuck with --no-open-gop, you could try --radl 2 to get some of the same benefit.

I don't know that --hme has proven to be that helpful. You should try with it off to see if it provides any benefit with your content.

If there is much grain in the source, --rd 4 can improve both quality and throughput.
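One way to act on these suggestions is to encode the same short clip with a few variants and compare speed and size. The Python sketch below only builds the argument lists and does not run anything. The file names, the `BASE` subset of the posted options, and the variant labels are placeholders chosen for illustration, and none of it has been checked against a particular x265 build.

```python
import shlex

# Subset of the posted options; HDR metadata, VBV and --hme are left out for brevity.
BASE = "--crf 16 --preset slower --output-depth 10 --min-keyint 1 --keyint 24 --no-open-gop"

# Options from the discussion, one group per test encode. Labels are arbitrary.
VARIANTS = {
    "baseline": "",
    "hme": "--hme --hme-search umh,umh,star",
    "pmode_1ft": "--pmode --frame-threads 1",
    "radl": "--radl 2",
    "rd4": "--rd 4",
}

def build_command(variant: str, source: str = "clip.y4m") -> list[str]:
    """Return an x265 argument list for one test variant (hypothetical file names)."""
    extra = shlex.split(VARIANTS[variant])
    return ["x265", *shlex.split(BASE), *extra,
            "--input", source, "--output", f"{variant}.hevc"]

if __name__ == "__main__":
    for name in VARIANTS:
        print(shlex.join(build_command(name)))
```

Changing one group of options per encode keeps it clear which setting caused any difference in fps or file size.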
#372 | Link
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,584
It does work on CRF encodes, it doesn't need a stats file or anything. I've been trying to figure out what it actually does or what the use case is but I have no clue.
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon...
#373 | Link
Registered User
Join Date: Jan 2006
Location: Italy
Posts: 244
Quote:
Using StaxRip, I had a chance to do some tests with "number of parallel processes" and "Chunks", and I noticed that by setting the maximum value (16) for both parallel processes and chunks, I got a higher processing speed, but also missing video frames. In my personal configuration, a setting of 3-3 gave me a slight speed increase without any side effects, using the commands I included in the previous post.
__________________
my PC with Ryzen 7950X
#374 | Link
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,412
Quote:
Quote:
#375 | Link
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,584
Quote:
I ran this test on a 720p encode, with the normal setup and settings for my 1080p->720p encodes to the media library. I do use some uncommon parameters like --no-limit-modes and --rskip 0, which probably affect the results compared to standard presets. I seriously need to test the 4K encodes as well.

F 5 - 5718.31 kbps - 7.11 fps
F 4 - 5713.13 kbps - 6.93 fps
F 3 - 5708.73 kbps - 6.74 fps
F 2 - 5715.94 kbps - 6.23 fps (odd that the size went up..)
F 1 - 5695.93 kbps - 4.50 fps
F 1 + pmode - 5490.78 kbps - 5.88 fps
F 2 + pmode - 5521.68 kbps - 7.43 fps
F 3 + pmode - 5515.12 kbps - 7.72 fps
F 4 + pmode - 5519.36 kbps - 7.83 fps
F 5 + pmode - 5521.10 kbps - 8.01 fps
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon...
Last edited by Boulder; 18th March 2023 at 15:30. Reason: pmode for pme
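The pmode effect in these results is easier to see as relative changes. The Python sketch below pairs each "F n" row with its "F n + pmode" row; the numbers are copied from the post, the function name `pmode_effect` is made up for this example, and reading "F n" as the number of frame threads is an assumption.

```python
# Results copied from the post: label -> (bitrate in kbps, speed in fps).
RESULTS = {
    "F 1": (5695.93, 4.50),
    "F 2": (5715.94, 6.23),
    "F 3": (5708.73, 6.74),
    "F 4": (5713.13, 6.93),
    "F 5": (5718.31, 7.11),
    "F 1 + pmode": (5490.78, 5.88),
    "F 2 + pmode": (5521.68, 7.43),
    "F 3 + pmode": (5515.12, 7.72),
    "F 4 + pmode": (5519.36, 7.83),
    "F 5 + pmode": (5521.10, 8.01),
}

def pmode_effect(label: str) -> tuple[float, float]:
    """Percent change in (bitrate, fps) when pmode is added to the `label` run."""
    base_kbps, base_fps = RESULTS[label]
    pm_kbps, pm_fps = RESULTS[f"{label} + pmode"]
    return ((pm_kbps / base_kbps - 1) * 100, (pm_fps / base_fps - 1) * 100)

if __name__ == "__main__":
    for n in range(1, 6):
        d_kbps, d_fps = pmode_effect(f"F {n}")
        print(f"F {n}: bitrate {d_kbps:+.2f}%, speed {d_fps:+.2f}%")
```

In this data set pmode lowers the bitrate by about 3.4% to 3.6% in every pairing, while the speed gain shrinks from about 31% at F 1 to about 13% at F 5.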