Old 17th February 2023, 19:36   #361  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
Quote:
Originally Posted by ReinerSchweinlin View Post
Thanks for your estimate.
The Atom cores on these later Xeon Phi chips are indeed pretty weak in integer and floating point. They do have a 16 GB internal cache though, which is accessible by all the cores - maybe that helps.
16 GB? That can fit a lot of frames.

Quote:
But I just checked: the widely available Knights Landing chips only offer a very small subset of AVX-512 instructions; only the latest ones have a few more modern subsets on board. Is there any resource for x265 where the AVX-512 modes it uses are listed?
The source code would be the definitive resource. There may be a higher level doc somewhere, but I couldn't find one with a quick search. But "a very small subset" is likely not compatible.

The lack of strong single-threaded perf would be the big bottleneck anyway.

Although, I just recalled that WPP might allow some parallelization; nominally 1 thread per 64 pixels of height, although probably only 2x better given overhead. WPP certainly allows for decoder parallelization. Even so, an Atom core is many times slower for CABAC-like operations than a modern Xeon core, so that's already factored into comparisons.
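To put a number on "1 thread per 64 pixels high": the rough sketch below counts CTU rows per frame, which is the nominal ceiling on WPP threads. It assumes the default 64x64 CTU size; the function name and the list of resolutions are only illustrative, and real-world scaling will be lower because each row has to wait on the row above it.

```python
import math

CTU_SIZE = 64  # assumed: default CTU size, the "64 pixels high" mentioned above


def wpp_rows(height, ctu_size=CTU_SIZE):
    """Number of CTU rows in a frame, i.e. the nominal upper bound on WPP threads."""
    return math.ceil(height / ctu_size)


# Illustrative frame heights only.
for name, height in [("480p", 480), ("720p", 720), ("1080p", 1080), ("2160p", 2160)]:
    print(f"{name}: up to {wpp_rows(height)} WPP rows")
```

Even at 2160p this gives only a few dozen rows, far fewer than the 256 hardware threads of a Xeon Phi.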

Modern video encoding is stressful in pretty much every way, so Amdahl's Law prevents any big improvement in one area from helping all that much.
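A minimal sketch of the Amdahl's Law point: if only one part of the encoder gets faster, the overall gain is capped by everything else. The 30% share used below is purely hypothetical, not a measured x265 profile.

```python
def amdahl_speedup(fraction_improved, local_speedup):
    """Overall speedup when `fraction_improved` of the runtime becomes `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


# Hypothetical: suppose one stage accounts for 30% of total encode time.
for local in (2, 10, 1000):
    overall = amdahl_speedup(0.30, local)
    print(f"stage {local}x faster -> whole encode {overall:.2f}x faster")
```

However fast that one stage becomes, the whole encode in this example can never exceed 1/0.7, or about 1.43x.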

As I've mentioned before, some years back Intel discovered that x265 pushed Xeon thermals hotter than the theoretical worst case of Intel's own internal thermal test tool.

The flip side of this is that encoding benefits some from most improvements; when a new processor says it's "X%-Y%" faster, encoding is always close to the higher Y% value. We get to spend orders of magnitude more MIPS/pixel today than when I started doing compression.

Circa 1996, it took about 80 minutes to encode 1 minute of 320x240p15 on my then rocket-fast PowerMac 8100/80 workstation. I was able to charge $80/minute for a tape-to-file conversion with a $20/min surcharge for VHS (mainly to encourage the client to find the Beta SP master).
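For scale, the 1996 figures work out as follows. This is only the arithmetic from the numbers quoted above; the variable names are mine.

```python
# 1 minute of 320x240 video at 15 fps, encoded in about 80 minutes.
width, height, source_fps = 320, 240, 15
source_seconds = 60
encode_seconds = 80 * 60

frames = source_fps * source_seconds          # frames in the one-minute clip
encode_fps = frames / encode_seconds          # frames encoded per wall-clock second
pixels_per_second = width * height * encode_fps

print(f"{frames} frames in {encode_seconds} s = {encode_fps} fps")
print(f"{pixels_per_second:.0f} pixels encoded per second")
```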
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 18th February 2023, 13:37   #362  |  Link
ReinerSchweinlin
Registered User
 
Join Date: Oct 2001
Posts: 454
Quote:
Originally Posted by benwaggoner View Post
16 GB? That can fit a lot of frames.
Yes, 16 GB. But it's not a "normal" L3 cache; it's referred to as a "remote L2 cache". Its bandwidth is higher than the 8-lane DDR4 access, but not as fast as a modern L3 cache. It can be configured to act as a normal transparent cache (like an L3 cache), but can also be accessed with a separate driver (or in a hybrid mode). Too bad there are no motherboards in Europe for these Xeons. I know that it's probably not really worth it, but for a small amount of money, I'd satisfy my curiosity and get one.


Quote:
The source code would be the definitive resource. There may be a higher level doc somewhere, but I couldn't find one with a quick search. But "a very small subset" is likely not compatible.
Thanks for checking though. I am not deep enough into all this to simply look up the source code and get my answer.

On a side note: when I was tinkering with CPU feature sets yesterday on a 1950X, I found odd performance differences between runs; turning AVX2 off seemed to speed things up... It seems there is some potential in individually tweaked binary compiles, tailored to a CPU (of course not worth it if one wants to distribute them publicly, but tweaking a personal encoding server this way would be fun), so I will probably have to learn to compile stuff like this properly after all...
ok, back to topic...

Quote:
The lack of strong single-threaded perf would be the big bottleneck anyway.
I think so, too.... These atom cores really are weak... Even a core2duo has more ooompf per core

Quote:
Although, I just recalled that WPP might allow some parallelization; nominally 1 thread per 64 pixels of height, although probably only 2x better given overhead. WPP certainly allows for decoder parallelization. Even so, an Atom core is many times slower for CABAC-like operations than a modern Xeon core, so that's already factored into comparisons.
I remember quality penalties from too much parallelization - is it worth thinking about it or are we talking a few percent difference in efficiency here?

Quote:
Modern video encoding is stressful in pretty much every way, so Amdahl's Law prevents any big improvement in one area from helping all that much.
Maybe at this point it's worth mentioning that getting a Xeon Phi is of course purely for academic research and interest, tinkering with old stuff, etc.... For anyone reading along - simply getting a modern desktop CPU is a much better idea.

Quote:
As I've mentioned before, some years back Intel discovered that x265 pushed Xeon thermals hotter than the theoretical worst case of Intel's own internal thermal test tool.
Haha... Whenever something like this happened back in my days at university, people resorted to introducing the "correction factor"... simply multiply the whole equation by something that sounds reasonable (I hope Intel does better, and to be fair, this was one user group of... not so scientific members...).


Quote:
Circa 1996, it took about 80 minutes to encode 1 minute of 320x240p15 on my then rocket-fast PowerMac 8100/80 workstation. I was able to charge $80/minute for a tape-to-file conversion with a $20/min surcharge for VHS (mainly to encourage the client to find the Beta SP master).
AH, I remember these machines, I did some service on them back then... Good times... I still remember some encoding adventures - good old Abit BP6 with two P3 celeron Tualatin CPUs was able to do realtime MPEG2 for SVCD Encoding..

Edit:

Phoronix has some CPU-Infos which might be interesting:
Code:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 87
model name	: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
stepping	: 1
microcode	: 0x1b0
cpu MHz		: 1168.239
cache size	: 1024 KB
physical id	: 0
siblings	: 256
core id		: 0
cpu cores	: 64
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts
bugs		: cpu_meltdown spectre_v1 spectre_v2 mds msbds_only
bogomips	: 2600.01
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
Code:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          256
On-line CPU(s) list:             0-255
Thread(s) per core:              4
Core(s) per socket:              64
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           87
Model name:                      Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
Stepping:                        1
CPU MHz:                         1192.466
CPU max MHz:                     1500.0000
CPU min MHz:                     1000.0000
BogoMIPS:                        2600.01
L1d cache:                       2 MiB
L1i cache:                       2 MiB
L2 cache:                        32 MiB
NUMA node0 CPU(s):               0-255
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT mitigated
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts
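A quick way to read the dumps above is to pull out just the AVX-512 subsets the chip reports and compare them with what an encoder's AVX-512 path might need. The sketch below uses the tail of the flags line verbatim. The "required" set is an ASSUMPTION for illustration (the subsets found on mainstream AVX-512 CPUs); it has not been checked against the x265 source, which remains the definitive reference.

```python
# Tail of the "flags" line from the Xeon Phi 7210 dump above (copied verbatim).
KNL_FLAGS = (
    "fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx "
    "avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts"
)

# ASSUMPTION: illustrative set of subsets an AVX-512 code path might require.
# This is not taken from the x265 source; check the encoder's CPU detection code.
ASSUMED_REQUIRED = {"avx512f", "avx512dq", "avx512cd", "avx512bw", "avx512vl"}


def avx512_subsets(flags):
    """Return the set of avx512* feature flags present in a cpuinfo-style flags string."""
    return {flag for flag in flags.split() if flag.startswith("avx512")}


present = avx512_subsets(KNL_FLAGS)
missing = ASSUMED_REQUIRED - present

print("present:", sorted(present))
print("missing (under the assumption above):", sorted(missing))
```

Under that assumption, the chip reports F, CD, ER and PF but none of BW, DQ or VL, which fits the "very small subset" description.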

Last edited by ReinerSchweinlin; 18th February 2023 at 14:37.
Old 18th February 2023, 18:32   #363  |  Link
wyliec2
Registered User
 
Join Date: Feb 2023
Posts: 5
Quote:
Originally Posted by benwaggoner View Post
I consider it "fair" to use --pmode if it is only used when it increases throughput in a given configuration, and turned off when it doesn't. As --pmode doesn't decrease quality (and can theoretically increase it a bit).
I have been experimenting with pmode on my 5950X platform.

I typically encode in Slow, Slower or Very Slow.

For all 4K encodes, the CPU will run at 90+% utilization and pmode causes encodes to take longer.

For BD encodes at Slow or Slower, the encodes take longer with pmode.

For BD encodes at Very Slow, pmode does reduce encode time - in one example an encode took 13 hours at Very Slow and 10 hours at Very Slow with pmode. It also seems to increase CPU utilization around 20% (from mid 40% to mid 60%).

I've only tested on 3 files and while two showed slightly smaller output file size, one showed a significant output size reduction (4247 MB without pmode and 3474 MB with pmode).

Nothing in the documentation or in what I've read here led me to expect this result. Are there any thoughts or comments on it?
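Expressed as percentages, the Very Slow BD numbers above look like this. It is a small sketch using only the figures quoted in this post; the helper name is arbitrary.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent (negative means a reduction)."""
    return (new - old) / old * 100.0


# Very Slow BD encode, without pmode vs. with pmode (figures from this post).
time_change = pct_change(13, 10)        # encode time in hours
size_change = pct_change(4247, 3474)    # output size in MB

print(f"encode time: {time_change:+.1f}%")
print(f"output size: {size_change:+.1f}%")
```

That is roughly 23% less encode time and 18% smaller output for the one file with the large difference.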
Old 18th February 2023, 23:07   #364  |  Link
DMD
Registered User
 
Join Date: Jan 2006
Location: Italy
Posts: 259
Quote:
Originally Posted by benwaggoner View Post
Slower is the fastest preset where some of HEVC's more modern features kick in, like relatively deep TU recursion, weighted b-frame prediction, and B-intra encoding. It's the setting I start with by default, and iterate from. It has somewhat reduced parallelism (lookahead-slices 1 instead of 4), so might not be as optimal for benchmarking with many cores available.

Apples-to-apples comparisons can't rely on just presets, however. The number of frame threads can have a big impact on perf and a smaller impact on quality, and the default number of frame threads is based on how many cores are available. Thus comparing two processors with different core counts can see the processor with more cores running with more frame threads, improving encoding speed but potentially reducing quality. So not quite apples-to-apples.

Benchmarking is hard to do in a broadly applicable way, because there are so many encoding scenarios that can impact relative performance. Comparing at slow with default frame threads is certainly a scenario that will matter to plenty of people. For me, comparing with --preset slower --frame-threads 1 would have the most relevance. Benchmarking for realtime encoding would be very different, as predictable worst-case encoding time becomes essential. Plenty of benchmarks just compare with stock default settings.

I see you are comparing with --pmode (makes good sense if you have a lot of cores relative to frame size, but can slow things down if there aren't enough cores) and --pme (which is a net negative unless you have a whole lot of cores encoding sub-HD resolutions).

I consider it "fair" to use --pmode if it is only used when it increases throughput in a given configuration, and turned off when it doesn't. As --pmode doesn't decrease quality (and can theoretically increase it a bit).

The same can apply to using --pme selectively, although the cores needed to make it a net positive are a lot higher. But for 480p with 64 cores or something, it probably would help. I personally rarely test with more than 18/36 available for any given encoder instance. Although with all the ARM patches, Graviton2/3 with 64 cores deserves some benchmarking as well.
After your comprehensive answer, I apologize for this inexperienced question.
Taking into account that my CPU (Ryzen 7950) has 16C/32T, to perform x265 encoding of 4K HDR files, I disabled "pmode". Should I also disable "pme" in my script to avoid long encoding times, or how else could I improve my script?
Thank you very much

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--crf 16 --preset slower --output-depth 10 --profile main10 --level-idc 5.1 --rd-refine --vbv-bufsize 100000 --vbv-maxrate 100000 --hme-search umh,umh,star --hme --min-keyint 1 --keyint 24 --no-open-gop --pme --master-display "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)" --colorprim bt2020 --colormatrix bt2020nc --transfer smpte2084 --range limited --max-cll "1000,400" --sar 1:1 --no-info --repeat-headers --aud --hrd --uhd-bd
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
__________________
my PC with Ryzen 7950X

Last edited by DMD; 19th February 2023 at 16:35.
Old 22nd February 2023, 04:17   #365  |  Link
TDS
Formally known as .......
 
Join Date: Sep 2021
Location: Down Under.
Posts: 993
Quote:
Originally Posted by DMD View Post
After your comprehensive answer, I apologize for this inexperienced question.
Taking into account that my CPU (Ryzen 7950) has 16C/32T, to perform x265 encoding of 4K HDR files, I disabled "pmode". Should I also disable "pme" in my script to avoid long encoding times, or how else could I improve my script?
Thank you very much

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--crf 16 --preset slower --output-depth 10 --profile main10 --level-idc 5.1 --rd-refine --vbv-bufsize 100000 --vbv-maxrate 100000 --hme-search umh,umh,star --hme --min-keyint 1 --keyint 24 --no-open-gop --pme --master-display "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)" --colorprim bt2020 --colormatrix bt2020nc --transfer smpte2084 --range limited --max-cll "1000,400" --sar 1:1 --no-info --repeat-headers --aud --hrd --uhd-bd
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hello yet again, DMD,

Like I said in the Staxrip thread, I use RipBot264, but the Pauly Dunne builds, which have so much more to offer, than the standard one...but I digress.

Several "power" users have commented how the 16-core Ryzens "fall off a cliff" when encoding certain x265 videos, but "we" have come up with a "fix" that is part of the encoder's commands and really gets them to do the job they're supposed to do, along with custom x265 commands.

I am VERY happy with the way my 3950X, 5950X & the 7950X are performing, as well as the interloper, the 13900KF

I must admit that my 5950X was being bested by the 5900X at almost everything, but I changed some basic BIOS settings and it's working better than ever before.
__________________
Long term RipBot264 user.

RipBot264 modded builds..
Old 22nd February 2023, 08:29   #366  |  Link
DMD
Registered User
 
Join Date: Jan 2006
Location: Italy
Posts: 259
Quote:
Originally Posted by TDS View Post
Hello yet again, DMD,

Like I said in the Staxrip thread, I use RipBot264, but the Pauly Dunne builds, which have so much more to offer, than the standard one...but I digress.

Several "power" users have commented how the 16-core Ryzens "fall off a cliff" when encoding certain x265 videos, but "we" have come up with a "fix" that is part of the encoder's commands and really gets them to do the job they're supposed to do, along with custom x265 commands.

I am VERY happy with the way my 3950X, 5950X & the 7950X are performing, as well as the interloper, the 13900KF

I must admit that my 5950X was being bested by the 5900X at almost everything, but I changed some basic BIOS settings and it's working better than ever before.
I didn't know that, and I'm very surprised that the Ryzens have this problem with x265 encoding.
But I am also very happy that a solution has been found to make them work at their maximum performance.
I don't know how to apply the "fix" and would be happy to learn how to do it.
As for my BIOS (ASUS ROG Strix X670E-F Gaming WiFi), I only performed optimization for RAM and fast boot.
Thank you very much

Last edited by DMD; 22nd February 2023 at 08:31.
Old 23rd February 2023, 12:17   #367  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,058
Quote:
Originally Posted by DMD View Post
Taking into account that my CPU (Ryzen 7950) how could I improve my script?
With AMD 7xxx you may try to force usage of AVX-512 with --asm avx512. If it does not cause overheating and clock throttling (https://www.hwcooling.net/en/intel-a...-does-it-help/ ), you may get some performance benefit.

Last edited by DTL; 23rd February 2023 at 12:21.
Old 23rd February 2023, 18:50   #368  |  Link
DMD
Registered User
 
Join Date: Jan 2006
Location: Italy
Posts: 259
Quote:
Originally Posted by DTL View Post
With AMD 7xxx you may try to force usage of AVX-512 with --asm avx512. If it does not cause overheating and clock throttling (https://www.hwcooling.net/en/intel-a...-does-it-help/ ), you may get some performance benefit.
Many thanks for the suggestion.
Old 27th February 2023, 10:53   #369  |  Link
ReinerSchweinlin
Registered User
 
Join Date: Oct 2001
Posts: 454
Quote:
Originally Posted by DMD View Post
I didn't know that, and I'm very surprised that the Ryzens have this problem with x265 encoding.
But I am also very happy that a solution has been found to make them work at their maximum performance.
I don't know how to apply the "fix" and would be happy to learn how to do it.
As for my BIOS (ASUS ROG Strix X670E-F Gaming WiFi), I only performed optimization for RAM and fast boot.
Thank you very much
Since I also have a 1950X lying around, I'd be happy to read more about that "fix" - could someone point me in the right direction with a link please?
Old 27th February 2023, 11:50   #370  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
It would be interesting to hear, since I have a 5950X and have zero issues getting the CPU to work at an 80-100% usage level when encoding with x265.
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 4th March 2023, 23:16   #371  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
Quote:
Originally Posted by DMD View Post
After your comprehensive answer, I apologize for this inexperienced question.
Taking into account that my CPU (Ryzen 7950) has 16C/32T, to perform x265 encoding of 4K HDR files, I disabled "pmode". Should I also disable "pme" in my script to avoid long encoding times, or how else could I improve my script?
You definitely want to have --pme off; I've never seen it boost throughput with anything above 480p. You should get a >2x speed improvement turning it off.

--pmode has a much bigger chance to be helpful, I'd say it's likely useful above 20 threads for 4K if using --frame-threads 1. The more modes being evaluated, the more parallelization for --pmode to take advantage of.

Using only a single frame thread can improve quality, but limits parallelization a lot, and combining it with --pmode can get some of that perf back if you have enough cores.

Looking at the rest of your command line:
--rd-refine doesn't do anything in a single pass, which your encode is.
Is --no-open-gop still required for BD compatibility with x265? (Open GOPs are certainly supported by the BD format itself.) With 24 frame GOPs, open GOP can provide some real benefit. If you're stuck with --no-open-gop, you could try --radl 2 to get some of the same benefit.
I don't know that --hme has proven to be that helpful. You should try with it off to see if it provides any benefit with your content.
If there is much grain in the source --rd 4 can both improve quality and throughput.
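The suggestions above can be turned into test variants of the command line mechanically. The sketch below is only an illustration: it takes the options exactly as posted, drops --pme, and builds a second variant without --hme and --hme-search for an A/B comparison. The helper name is made up, and nothing here runs x265 itself.

```python
import shlex

# Option string exactly as posted in the thread.
ORIGINAL = (
    '--crf 16 --preset slower --output-depth 10 --profile main10 --level-idc 5.1 '
    '--rd-refine --vbv-bufsize 100000 --vbv-maxrate 100000 --hme-search umh,umh,star '
    '--hme --min-keyint 1 --keyint 24 --no-open-gop --pme '
    '--master-display "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)" '
    '--colorprim bt2020 --colormatrix bt2020nc --transfer smpte2084 --range limited '
    '--max-cll "1000,400" --sar 1:1 --no-info --repeat-headers --aud --hrd --uhd-bd'
)


def drop_flags(args, switches=(), with_value=()):
    """Return args without the given on/off switches and without the given flag+value pairs."""
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:              # this is the value of a flag we just dropped
            skip_next = False
            continue
        if arg in switches:
            continue
        if arg in with_value:
            skip_next = True
            continue
        kept.append(arg)
    return kept


args = shlex.split(ORIGINAL)

# Variant 1: --pme removed, as recommended for anything above 480p.
no_pme = drop_flags(args, switches={"--pme"})

# Variant 2: additionally without HME, to test whether it helps on this content.
no_hme = drop_flags(no_pme, switches={"--hme"}, with_value={"--hme-search"})

print(shlex.join(no_pme))
print(shlex.join(no_hme))
```

Encoding the same short clip with each variant and comparing fps and bitrate is the simplest way to see which options earn their keep.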

Quote:
--crf 16 --preset slower --output-depth 10 --profile main10 --level-idc 5.1 --rd-refine --vbv-bufsize 100000 --vbv-maxrate 100000 --hme-search umh,umh,star --hme --min-keyint 1 --keyint 24 --no-open-gop --pme --master-display "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)" --colorprim bt2020 --colormatrix bt2020nc --transfer smpte2084 --range limited --max-cll "1000,400" --sar 1:1 --no-info --repeat-headers --aud --hrd --uhd-bd
Old 5th March 2023, 13:47   #372  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
Quote:
Originally Posted by benwaggoner View Post
--rd-refine doesn't do anything in a single pass, which your encode is.
It does work on CRF encodes, it doesn't need a stats file or anything. I've been trying to figure out what it actually does or what the use case is but I have no clue.
Old 5th March 2023, 22:54   #373  |  Link
DMD
Registered User
 
Join Date: Jan 2006
Location: Italy
Posts: 259
Quote:
Originally Posted by benwaggoner View Post
You definitely want to have --pme off; I've never seen it boost throughput with anything above 480p. You should get a >2x speed improvement turning it off.

--pmode has a much bigger chance to be helpful, I'd say it's likely useful above 20 threads for 4K if using --frame-threads 1. The more modes being evaluated, the more parallelization for --pmode to take advantage of.

Using only a single frame thread can improve quality, but limits parallelization a lot, and combining it with --pmode can get some of that perf back if you have enough cores.

Looking at the rest of your command line:
--rd-refine doesn't do anything in a single pass, which your encode is.
Is --no-open-gop still required for BD compatibility with x265? (Open GOPs are certainly supported by the BD format itself.) With 24 frame GOPs, open GOP can provide some real benefit. If you're stuck with --no-open-gop, you could try --radl 2 to get some of the same benefit.
I don't know that --hme has proven to be that helpful. You should try with it off to see if it provides any benefit with your content.
If there is much grain in the source --rd 4 can both improve quality and throughput.
Thank you very much for the advice, I will do some tests for a better result.
Using StaxRip, I had a chance to do some tests with "number of parallel process" and "Chunks", and I noticed that by setting the maximum value (16) for both parallel processes and chunks, I got higher processing speed, but also missing video frames.
In my personal configuration with a setting of 3-3 I was able to get a slight speed increase without any side effects, using the commands I included in the previous post.
Old 6th March 2023, 06:22   #374  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
Quote:
Originally Posted by Boulder View Post
It does work on CRF encodes, it doesn't need a stats file or anything. I've been trying to figure out what it actually does or what the use case is but I have no clue.
You're right; was thinking of a different parameter.

Quote:
--rd-refine, --no-rd-refine
For each analysed CU, calculate R-D cost on the best partition mode for a range of QP values, to find the optimal rounding effect. Default disabled.

Only effective at RD levels 5 and 6
It should offer a slight overall compression efficiency improvement.
Old 18th March 2023, 15:07   #375  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
Quote:
Originally Posted by benwaggoner View Post
--pmode has a much bigger chance to be helpful, I'd say it's likely useful above 20 threads for 4K if using --frame-threads 1. The more modes being evaluated, the more parallelization for --pmode to take advantage of.

Using only a single frame thread can improve quality, but limits parallelization a lot, and combining it with --pmode can get some of that perf back if you have enough cores.
Just for fun, I tested frame-threads from 5 (the default for my CPU) to 1 and then with pmode on my 5950X (16C/32T). The effect of pmode on the compression efficiency is much bigger than I anticipated. The speed increase was weird because my CPU usage is already around 90-100% when encoding with the default frame-threads and no pmode.

I ran this test on a 720p encode, normal setup and settings for my 1080p->720p encodes to the media library. I do use some uncommon parameters like --no-limit-modes and --rskip 0 which probably affect the results compared to standard presets.

I seriously need to test the 4K encodes as well.

F 5 - 5718.31 kbps - 7.11 fps
F 4 - 5713.13 kbps - 6.93 fps
F 3 - 5708.73 kbps - 6.74 fps
F 2 - 5715.94 kbps - 6.23 fps (odd that the size went up..)
F 1 - 5695.93 kbps - 4.50 fps
F 1 + pmode - 5490.78 kbps - 5.88 fps
F 2 + pmode - 5521.68 kbps - 7.43 fps
F 3 + pmode - 5515.12 kbps - 7.72 fps
F 4 + pmode - 5519.36 kbps - 7.83 fps
F 5 + pmode - 5521.10 kbps - 8.01 fps
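To make the table easier to compare, here is a small sketch that expresses each run relative to the default (F 5, no pmode). The numbers are copied from the list above; the data layout and function name are just one way to do it.

```python
# (frame_threads, pmode) -> (bitrate in kbps, speed in fps), copied from the results above.
RESULTS = {
    (5, False): (5718.31, 7.11),
    (4, False): (5713.13, 6.93),
    (3, False): (5708.73, 6.74),
    (2, False): (5715.94, 6.23),
    (1, False): (5695.93, 4.50),
    (1, True): (5490.78, 5.88),
    (2, True): (5521.68, 7.43),
    (3, True): (5515.12, 7.72),
    (4, True): (5519.36, 7.83),
    (5, True): (5521.10, 8.01),
}

BASELINE = (5, False)  # default frame-threads on this CPU, no pmode


def relative_to_baseline(key):
    """Return (bitrate change %, speed change %) of a run versus the baseline run."""
    base_kbps, base_fps = RESULTS[BASELINE]
    kbps, fps = RESULTS[key]
    return ((kbps - base_kbps) / base_kbps * 100.0,
            (fps - base_fps) / base_fps * 100.0)


for (threads, pmode), _ in RESULTS.items():
    d_kbps, d_fps = relative_to_baseline((threads, pmode))
    label = f"F {threads}" + (" + pmode" if pmode else "")
    print(f"{label:<12} bitrate {d_kbps:+6.2f}%   speed {d_fps:+7.2f}%")
```

Seen this way, pmode saves roughly 3.4-4.0% bitrate at every frame-thread count, while dropping frame-threads alone changes bitrate by well under half a percent.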

Last edited by Boulder; 18th March 2023 at 15:30. Reason: pmode for pme