View Full Version : x265 HEVC Encoder
CruNcher
18th February 2017, 13:06
Especially it will be interesting to see how that plays out at 65/35W vs Kaby Lake, with both on 14nm from GlobalFoundries and Intel :)
It is the first time Intel has lost its shrinking advantage; 11nm products will still take a little while :)
And we have already seen how much better TSMC did, in part, vs GF between Polaris and Paxwell, and that was 16nm vs 14nm.
Though Polaris is overall slightly more complex, especially because of its ACEs.
mandarinka
18th February 2017, 13:19
Something tells me that Ryzen may not have great performance in AVX2(FMA).
That's known, but the 256-bit units and data paths in Intel archs cost a lot in active power usage and silicon die area. It is likely that Zen can clock its 128-bit infrastructure higher thanks to keeping it narrower. So the theoretical hit from not getting fast AVX2 might be mitigated by raised overall performance (and the higher number of cores the architecture allows).
I suspect that this effect might even be a net gain in x264, which only draws a limited boost from AVX2. For x265, there is going to be an overall loss, and Zen's performance relative to Skylake is going to be worse than it is in x264. But the hit might be smaller than expected, as said above.
Atak_Snajpera
18th February 2017, 13:24
The lack of efficient AVX2 may hurt a lot in x265
https://i.imgsafe.org/83ceca5cee.png
Intel Xeon E5-2690 @ 2.9GHz (8c / 16t) [NO AVX2] 16.4 fps -> 16.4 / 3.2GHz (Turbo boost) / 16t = 0.32
Intel Core i7-6700K @ 4.5GHz (4c / 8t) [AVX2] 22.6 fps -> 22.6 / 4.5GHz / 8t = 0.63
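The normalisation used in the two lines above (fps divided by clock speed, then by thread count) can be reproduced with a short sketch. The function name is made up for illustration and the inputs are simply the figures quoted in the post. Note that the two chips differ by more than AVX2 support, so the ratio is only a rough indicator.

```python
def fps_per_ghz_per_thread(fps, clock_ghz, threads):
    """Normalise encoder throughput by clock speed and thread count."""
    return fps / clock_ghz / threads

# Figures quoted in the post above.
xeon = fps_per_ghz_per_thread(16.4, 3.2, 16)    # Xeon E5-2690, no AVX2
skylake = fps_per_ghz_per_thread(22.6, 4.5, 8)  # Core i7-6700K, AVX2

print(f"Xeon E5-2690:  {xeon:.2f}")
print(f"Core i7-6700K: {skylake:.2f}")
print(f"ratio:         {skylake / xeon:.2f}x")
```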
nevcairiel
18th February 2017, 13:42
This thread should really stick to discussing the x265 encoder itself; there is a hardware section on the forums where you can discuss Ryzen and other CPUs. :)
NikosD
18th February 2017, 14:34
Are you talking to yourself?
Because you posted your opinion regarding RyZen and Intel's HEDT CPUs in the VP9 thread:
Ryzen doesn't have an iGPU.
Ryzen is at a point between Intel's consumer models and the HEDT platform. It offers 8 cores at a price somewhere in the HEDT range (i.e. the latest rumors put EU prices at ~600€ for the 1800X, which is slightly above an i7 6850K, or the 1700X for ~470€, which sits close to a 6800K), but on the other hand it doesn't have some of the HEDT features like extra PCIe lanes for multi-GPU.
So you can write about RyZen in the VP9 thread, but we can't write about RyZen in the x265 thread.
A little hypocritical, don't you think ?
CruNcher
18th February 2017, 14:47
It's normal moderator logic to break up the flow of off-topic discussion. It is a strange thing, because nothing is really off-topic if it connects to the subject in some way for a short time; only if it goes on for far too long can I personally understand interfering in the discussion (pointing it out and moving it). I am never going to accept the strict idea some moderators have of what is off-topic, but you are forced to obey that logic or you risk being punished with the harshest consequence.
But I'm especially against some moderators reacting with punishment instead of taking appropriate action in that case.
pingfr
18th February 2017, 20:12
Hey guys,
At the moment all I have on my hands is an i7-6700 (non K, locked at 3.7GHz) and while running a quick test encode, I am experiencing encoding speeds which I find "abysmal" to say the least.
I have two questions:
1) With the identical parameters: --crf 18 --aq-mode 3 --deblock -3:-3 --no-strong-intra-smoothing --no-sao while using the --preset slow, the encoder reports an average speed of 3.86 fps, encoding the clip in 647.09s.
However, when using the same parameters but switching from --preset slow to --preset veryslow, the speed drops to a measly 0.30 fps, and that is on an i7-6700, a 6th-generation Intel CPU. That can't be right, can it? Could anyone here tell me which of the parameters included in --preset veryslow, compared to --preset slow, is the "culprit" here? The idea is to retain most of the "quality improvements" from the veryslow preset while keeping the "merely acceptable" encoding speeds yielded by the slow preset.
2) In my testing, it appears that switching from the default aq-mode, which I believe is 1, to aq-mode 3 decreases the encoding speed by another 1 fps. Given that at best I get about 4 fps, using --aq-mode 3 drops it from a solid 4 fps to 3 fps, a relative 25% performance hit. Is this intended, and does anyone here have an explanation for it?
Thanks.
Sample console output for --preset slow is below for reference. I'm not running it with --preset slower or veryslow for now, though; that would take over an hour to encode this 2500-frame clip.
C:\x26x>avs4x265.exe --preset slow --crf 18 --aq-mode 3 --deblock -3:-3 --no-strong-intra-smoothing --no-sao x264\SVT_1080p50.mkv.avs -o slow.hevc
avs [info]: AviSynth 2.60, build:Mar 31 2015 [16:38:54]
avs [info]: Video colorspace: YV12
avs [info]: Video resolution: 1920x1080
avs [info]: Video framerate: 50/1
avs [info]: Video framecount: 2500
avs4x265 [info]: "x265" - --frames 2500 --fps 50/1 --input-res 1920x1080 --input-csp i420 --preset slow --crf 18 --aq-mode 3 --deblock -3:-3 --no-strong-intra-smoothing --no-sao -o slow.hevc
yuv [info]: 1920x1080 fps 50/1 i420p8 unknown frame count
raw [info]: output file: slow.hevc
x265 [info]: HEVC encoder version 2.3+7-c15f8bce9f4b
x265 [info]: build info [Windows][GCC 6.3.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
x265 [info]: Main profile, Level-4.1 (Main tier)
x265 [info]: Thread pool created using 8 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 3 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : star / 57 / 3 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 25 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 4 / on / on
x265 [info]: AQ: mode / str / qg-size / cu-tree : 3 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-18.0 / 0.60
x265 [info]: tools: rect limit-modes rd=4 psy-rd=2.00 rdoq=2 psy-rdoq=1.00
x265 [info]: tools: rskip signhide tmvp lslices=4 deblock(tC=-3:B=-3)
x265 [info]: frame I: 10, Avg QP:19.26 kb/s: 167732.56
x265 [info]: frame P: 593, Avg QP:20.62 kb/s: 109843.70
x265 [info]: frame B: 1897, Avg QP:26.18 kb/s: 23832.23
x265 [info]: Weighted P-Frames: Y:4.2% UV:3.0%
x265 [info]: consecutive B-frames: 0.7% 1.7% 5.1% 67.5% 25.0%
encoded 2500 frames in 647.09s (3.86 fps), 44809.76 kb/s, Avg QP:24.83
I also noticed some odd behavior: the encoded file is much larger than the source. What gives?
Source: 181.288.067 bytes.
Encode (slow preset): 280.072.779 bytes.
Wasn't the whole intent and philosophy behind x265 "smaller encoded file size due to better compression over x264 at the cost of more CPU cycles"? The result is 100 MB larger than its source counterpart...
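As a sanity check on those sizes, the sketch below converts between the bitrate in the x265 summary line and the file sizes quoted above. It assumes 1 kb = 1000 bits and that the source size covers video only (a container with audio would skew the estimate slightly); the helper names are made up for illustration. The point it makes is that CRF targets a quality level rather than a size, so an encode can land above the source bitrate.

```python
FRAMES = 2500
FPS = 50
ENCODE_KBPS = 44809.76        # from the "encoded 2500 frames" summary line
SOURCE_BYTES = 181_288_067    # sizes as reported in the post
ENCODE_BYTES = 280_072_779

duration_s = FRAMES / FPS     # clip length in seconds

def kbps_to_bytes(kbps, seconds):
    """Size in bytes of a stream at the given average bitrate (1 kb = 1000 bits)."""
    return kbps * 1000 / 8 * seconds

def bytes_to_kbps(size_bytes, seconds):
    """Average bitrate in kb/s of a file of the given size."""
    return size_bytes * 8 / 1000 / seconds

predicted_bytes = kbps_to_bytes(ENCODE_KBPS, duration_s)
source_kbps = bytes_to_kbps(SOURCE_BYTES, duration_s)

print(f"predicted encode size: {predicted_bytes:,.0f} bytes")
print(f"reported encode size:  {ENCODE_BYTES:,} bytes")
print(f"source average rate:   {source_kbps:,.0f} kb/s")
print(f"encode average rate:   {ENCODE_KBPS:,.0f} kb/s")
```

If the assumptions hold, the source averages roughly 29 Mb/s while --crf 18 with these options asked for roughly 45 Mb/s, which accounts for the larger file.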
WhatZit
19th February 2017, 00:56
2) In my testings, it appears switching from the default aq-mode which I believe is 1.0, to aq-mode 3.0 would decrease the encoding speeds by another 1fps, given at the best I would get a 4fps~ encoding speed, the use of an --aq-mode 3 would drop it from a solid 4fps to 3fps, that's a relative 25% performance hit right there, is this intended and does anyone here have an explanation in that regard?
Your settings strongly suggest that fine detail preservation is a priority for your encodes.
From my own testing, I've found that --aq-mode 3 is a poor substitute (quality-wise) for --tune grain. Whilst it is slightly faster, it somehow manages to retain less overall detail (according to my eyes) whilst using a larger bitrate than grain does.
Also noticed an odd behavior, the encoded result file is much larger than the source, what gives?
Firstly, is the source noisy or grainy, with high-contrast detail, or has it otherwise been artificially post-sharpened (excessive peak edging)?
These are the worst types of source for x265 to compress when you start weighting the encoder for detail preservation. It is entirely normal for x265's efficiency to barely exhibit any improvement over x264 in those cases, and you can easily go backwards depending on source ("sharpened" sources especially so).
Secondly, when you have x265 set to preserve detail, the slower speed settings will, indeed, preserve more detail, hence produce larger file sizes. The same settings (i.e. --tune grain) will produce a 25% larger file at --preset slow than they will at --preset fast.
Wasn't the whole intent and philosophy behind x265 "smaller encoded file size due to better compression over x264 at the cost of more CPU cycles"? The result is 100 MB larger than its source counterpart...
x265's lossy compression efficiency isn't magic. It is simply more aggressive (or clever, depending on your point of view) than x264 with what visual data it chooses to include in the encoded bitstream, and it can accordingly decode watchable scenes from much less data than x264. In other words, by default, it leaves out as much visual information as possible during an encode because it doesn't need as much during a decode.
When you use custom high-frequency settings to force the inclusion of visual information that would otherwise have been excluded, you are effectively changing the "philosophy" behind x265. That's a harsh way of putting it, but you asked the question.
x265 can follow x265's philosophy if you let it do its own thing, or it can operate like any other encoder if you want to manipulate the settings yourself.
Regardless, I've personally found that x265 always has some efficiency improvement over x264, even if it's only marginal due to bothersome sources, and that improvement definitely increases with higher resolutions.
Kotatsu
19th February 2017, 06:46
The number of I-frames does not increase once --scenecut goes above 40.
C:\x265\x265.exe --preset slow --crf 17 --scenecut 0 "C:\x265\testvideo.y4m" -o null
frame I: 3, Avg QP:18.35 kb/s: 22734.00
frame P: 288, Avg QP:18.40 kb/s: 15331.26
frame B: 429, Avg QP:22.63 kb/s: 4139.60
C:\x265\x265.exe --preset slow --crf 17 --scenecut 5 "C:\x265\testvideo.y4m" -o null
frame I: 19, Avg QP:15.76 kb/s: 32103.99
frame P: 283, Avg QP:18.48 kb/s: 14309.48
frame B: 418, Avg QP:22.67 kb/s: 4075.80
C:\x265\x265.exe --preset slow --crf 17 --scenecut 10 "C:\x265\testvideo.y4m" -o null
frame I: 20, Avg QP:15.77 kb/s: 32639.96
frame P: 282, Avg QP:18.47 kb/s: 14221.44
frame B: 418, Avg QP:22.67 kb/s: 4075.99
C:\x265\x265.exe --preset slow --crf 17 --scenecut 15 "C:\x265\testvideo.y4m" -o null
frame I: 20, Avg QP:15.77 kb/s: 32639.96
frame P: 282, Avg QP:18.47 kb/s: 14221.44
frame B: 418, Avg QP:22.67 kb/s: 4075.99
C:\x265\x265.exe --preset slow --crf 17 --scenecut 20 "C:\x265\testvideo.y4m" -o null
frame I: 21, Avg QP:15.78 kb/s: 32707.41
frame P: 281, Avg QP:18.48 kb/s: 14169.16
frame B: 418, Avg QP:22.68 kb/s: 4075.12
C:\x265\x265.exe --preset slow --crf 17 --scenecut 25 "C:\x265\testvideo.y4m" -o null
frame I: 21, Avg QP:15.78 kb/s: 32707.41
frame P: 281, Avg QP:18.48 kb/s: 14169.16
frame B: 418, Avg QP:22.68 kb/s: 4075.12
C:\x265\x265.exe --preset slow --crf 17 --scenecut 40 "C:\x265\testvideo.y4m" -o null
frame I: 22, Avg QP:15.81 kb/s: 32495.99
frame P: 281, Avg QP:18.47 kb/s: 14124.01
frame B: 417, Avg QP:22.68 kb/s: 4064.71
C:\x265\x265.exe --preset slow --crf 17 --scenecut 60 "C:\x265\testvideo.y4m" -o null
frame I: 22, Avg QP:15.81 kb/s: 32495.99
frame P: 281, Avg QP:18.47 kb/s: 14124.01
frame B: 417, Avg QP:22.68 kb/s: 4064.71
C:\x265\x265.exe --preset slow --crf 17 --scenecut 80 "C:\x265\testvideo.y4m" -o null
frame I: 22, Avg QP:15.81 kb/s: 32495.99
frame P: 281, Avg QP:18.47 kb/s: 14124.01
frame B: 417, Avg QP:22.68 kb/s: 4064.71
C:\x265\x265.exe --preset slow --crf 17 --scenecut 100 "C:\x265\testvideo.y4m" -o null
frame I: 22, Avg QP:15.81 kb/s: 32495.99
frame P: 281, Avg QP:18.47 kb/s: 14124.01
frame B: 417, Avg QP:22.68 kb/s: 4064.71
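The plateau in those runs can be read off programmatically. This is only a sketch: the I-frame counts are copied by hand from the output above, and plateau_start is a hypothetical helper, not part of x265.

```python
# I-frame counts from the runs above, keyed by the --scenecut value used.
i_frames = {0: 3, 5: 19, 10: 20, 15: 20, 20: 21, 25: 21,
            40: 22, 60: 22, 80: 22, 100: 22}

def plateau_start(counts):
    """Return the smallest tested setting from which the count no longer changes."""
    settings = sorted(counts)
    final = counts[settings[-1]]
    start = settings[-1]
    # Walk down from the largest setting while the count still equals the final value.
    for setting in reversed(settings):
        if counts[setting] != final:
            break
        start = setting
    return start

print("I-frame count stops changing at --scenecut", plateau_start(i_frames))
```

Since no values between 25 and 40 were tested, the real plateau on this clip could begin anywhere in that gap.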
pingfr
19th February 2017, 14:38
Could anyone here tell me which of the parameters included in the --preset veryslow compared to the --preset slow is the "culprit" dropping encoding speeds from 4fps to 0.30fps?
The idea is to retain most of the "quality improvements" from the veryslow preset while keeping the "merely acceptable" encoding speeds yielded by the slow preset. There has to be some kind of middle ground between both presets.
Thanks.
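To put the two reported speeds in perspective, here is a small sketch that turns fps into wall-clock time for the 2500-frame test clip. The fps figures are the ones reported earlier in the thread; the helper name is made up for illustration.

```python
FRAMES = 2500  # length of the test clip

# Average speeds reported for the same options with two different presets.
measured_fps = {"slow": 3.86, "veryslow": 0.30}

def encode_seconds(frames, fps):
    """Wall-clock encode time in seconds at a constant average speed."""
    return frames / fps

for preset, fps in measured_fps.items():
    seconds = encode_seconds(FRAMES, fps)
    print(f"{preset:9s} {fps:5.2f} fps -> {seconds:7.0f} s ({seconds / 3600:.2f} h)")

slowdown = measured_fps["slow"] / measured_fps["veryslow"]
print(f"veryslow is about {slowdown:.1f}x slower than slow on this clip")
```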
Leo 69
19th February 2017, 16:40
This is close to the high-quality middle ground you want:
x265.exe --preset veryslow --crf 18 --aq-mode 3 --ctu 32 --qg-size 8 --deblock -1:-1 --no-strong-intra-smoothing --no-sao --tu-intra-depth 4 --tu-inter-depth 2 --limit-tu 1 --no-b-intra --no-amp --bframes 6 --no-rskip
pingfr
19th February 2017, 17:31
This is close to the high-quality middle ground you want:
x265.exe --preset veryslow --crf 18 --aq-mode 3 --ctu 32 --qg-size 8 --deblock -1:-1 --no-strong-intra-smoothing --no-sao --tu-intra-depth 4 --tu-inter-depth 2 --limit-tu 1 --no-b-intra --no-amp --bframes 6 --no-rskip
Thank you for that.
Are these the optimum settings for 1080p Blu-ray source content?
And which settings would you recommend when resizing Blu-ray source content to 720p?
Thanks.
pingfr
19th February 2017, 18:03
@Leo 69: Unbearable 0.76 fps encoding speeds. No thanks. ;)
Leo 69
19th February 2017, 18:03
For me these are the optimum settings for 1080p encodes, yes. For 720p you might only want to add --merange 38. That's all.
Edit: add/change --tu-intra-depth 3 --tu-inter-depth 1 --no-weightb --max-merge 3 to the initial line I proposed.
pingfr
19th February 2017, 18:06
@Leo 69: What about the speeds you're experiencing during 1080p encoding?
Leo 69
19th February 2017, 18:15
@pingfr
This was the last setting I encoded a 1920x800 movie in:
x265.exe --crf 19.3 --preset placebo --output-depth 10 --ctu 32 --no-rect --no-amp --no-b-intra --aq-strength 0.5 --qcomp 0.7 --cbqpoffs -3 --ipratio 1.3 --pbratio 1.2 --subme 6 --merange 57 --bframes 6 --no-strong-intra-smoothing --deblock -1:-1 --psy-rd 3.5 --psy-rdoq 7 --no-sao --qg-size 8 --limit-tu 1 --crqpoffs -3 --no-rskip
This gave me ~1 fps on an old Ivy Bridge 3570K@4400MHz (no AVX2, of course). I don't touch noisy sources with x265; the Blu-ray source was crystal clear.
pingfr
19th February 2017, 18:15
avs4x265 [info]: "x265" - --frames 2500 --fps 50/1 --input-res 1920x1080 --input-csp i420 --preset veryslow --crf 18 --aq-mode 3 --ctu 32 --qg-size 8 --deblock -1:-1 --no-strong-intra-smoothing --no-sao --tu-intra-depth 3 --tu-inter-depth 1 --no-weightb --max-merge 3 --limit-tu 1 --no-b-intra --no-amp --bframes 6 --no-rskip -o out.hevc
Average speed 1.01 fps. That's just terrible.
pingfr
19th February 2017, 18:17
avs4x265 [info]: "x265" - --frames 2500 --fps 50/1 --input-res 1920x1080 --input-csp i420 --crf 19.3 --preset placebo --output-depth 10 --ctu 32 --no-rect --no-amp --no-b-intra --aq-strength 0.5 --qcomp 0.7 --cbqpoffs -3 --ipratio 1.3 --pbratio 1.2 --subme 6 --merange 57 --bframes 6 --no-strong-intra-smoothing --deblock -1:-1 --psy-rd 3.5 --psy-rdoq 7 --no-sao --qg-size 8 --limit-tu 1 --crqpoffs -3 --no-rskip -o test.hevc
That's 0.49 fps. Unacceptable.
pingfr
19th February 2017, 18:19
avs4x265 [info]: "x265" - --frames 2500 --fps 50/1 --input-res 1920x1080 --input-csp i420 --crf 19.3 --preset slow --output-depth 10 --ctu 32 --no-rect --no-amp --no-b-intra --aq-strength 0.5 --qcomp 0.7 --cbqpoffs -3 --ipratio 1.3 --pbratio 1.2 --subme 6 --merange 57 --bframes 6 --no-strong-intra-smoothing --deblock -1:-1 --psy-rd 3.5 --psy-rdoq 7 --no-sao --qg-size 8 --limit-tu 1 --crqpoffs -3 --no-rskip -o slow.hevc
2.98 fps.
Leo 69
19th February 2017, 18:20
I'm sorry, maybe someone else has a better idea to improve your speed without hitting quality too much.
pingfr
19th February 2017, 18:21
Well at least you've tried. Much kudos for that.
Boulder
19th February 2017, 18:26
--limit-refs 3 --limit-modes --limit-tu 3 --rskip are the ones to add. --rskip will boost the encoding speed considerably.
pingfr
19th February 2017, 18:52
@Boulder: So --limit-refs 3 --limit-modes --limit-tu 3 --rskip only with a given preset or with other args added as well?
Boulder
19th February 2017, 19:03
I use them with --preset slower --tune grain. They could be incorporated in the preset as well, but they are basically the best bang for the buck when it comes to quality versus encoding time.
pingfr
19th February 2017, 19:12
avs4x265 [info]: "x265" - --frames 2500 --fps 50/1 --input-res 1920x1080 --input-csp i420 --preset slower --tune grain --limit-refs 3 --limit-modes --limit-tu 3 --rskip -o boulder.hevc
1.50fps at the best, 1.30fps at the lowest. :/
Leo 69
19th February 2017, 19:18
Try this line. How many FPS do you get?
--preset veryslow --crf 18 --aq-mode 3 --ctu 32 --qg-size 8 --deblock -1:-1 --no-strong-intra-smoothing --no-sao --tu-intra-depth 3 --tu-inter-depth 1 --no-weightb --max-merge 3 --limit-tu 3 --limit-refs 3 --no-b-intra --no-amp --no-rect --bframes 6 --rskip
pingfr
19th February 2017, 19:27
1.58fps at the best, 1.54fps at the worst.
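For an overview, the sketch below collects the best speed reported for each test run in this thread and expresses it relative to the original --preset slow run. The labels are informal descriptions rather than exact command lines, and the runs do not produce the same quality, so this compares speed only.

```python
BASELINE = "slow, original options"

# Best reported fps for each 1080p50 test run in this thread.
results = {
    "slow, original options": 3.86,
    "veryslow, original options": 0.30,
    "Leo 69, first veryslow line": 0.76,
    "Leo 69, revised veryslow line": 1.01,
    "Leo 69, placebo line": 0.49,
    "Leo 69, placebo line with preset slow": 2.98,
    "Boulder, slower + tune grain + limits": 1.50,
    "Leo 69, veryslow line with limits and rskip": 1.58,
}

def relative_speed(fps_by_run, baseline):
    """Return each run's speed as a fraction of the baseline run's speed."""
    base = fps_by_run[baseline]
    return {name: fps / base for name, fps in fps_by_run.items()}

rel = relative_speed(results, BASELINE)
for name, fps in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{fps:5.2f} fps  {rel[name]:6.1%}  {name}")
```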
Boulder
19th February 2017, 19:31
I wouldn't say 1.5 fps at preset slower is bad for 1080p.. x265 is much slower than x264, there's no way around it due to the added complexity of things.
--no-rect and/or --no-amp can give a slight boost. I keep them enabled as they should help retain detail better.
Leo 69
19th February 2017, 19:43
I wouldn't say 1.5 fps at preset slower is bad for 1080p.. x265 is much slower than x264, there's no way around it due to the added complexity of things.
Absolutely.
--no-rect and/or --no-amp can give a slight boost. I keep them enabled as they should help retain detail better.
I never saw that switching on rectangular and asymmetric partitions would improve visual quality. I can't tell the difference whether they're on or off on any encodes.
aymanalz
19th February 2017, 22:30
I wouldn't say 1.5 fps at preset slower is bad for 1080p.. x265 is much slower than x264, there's no way around it due to the added complexity of things.
Yes, but with his i7-6700, I think he should definitely be getting faster encodes.
@pingfr : Are you sure there is nothing bottlenecking your system? Have you checked out your CPU usage during encoding? Is it always above 90%?
Reducing the maximum CU size to 32 can give a big speed boost. For resolutions less than 1080p, the difference in quality may not be significant. Same with merange.
Edit: I noticed that you are already using CTU 32. I'm pretty sure something is off, if you are only getting the speeds you say. Is hyperthreading turned on, are you using the appropriate values for threads/pools etc?
pingfr
19th February 2017, 22:52
Yes, but with his i7-6700, I think he should definitely be getting faster encodes.
@pingfr : Are you sure there is nothing bottlenecking your system? Have you checked out your CPU usage during encoding? Is it always above 90%?
Reducing the maximum CU size to 32 can give a big speed boost. For resolutions less than 1080p, the difference in quality may not be significant. Same with merange.
Edit: I noticed that you are already using CTU 32. I'm pretty sure something is off, if you are only getting the speeds you say. Is hyperthreading turned on, are you using the appropriate values for threads/pools etc?
Everything seems normal to me: http://imgur.com/a/fYSy4
Also note this is a regular non-K, non unlocked i7-6700.
According to ARK, its base frequency is 3.4GHz with an alleged Turbo Boost to 4.00GHz, but for some reason it throttles to 3.7GHz under 100% load. It can, however, reach 4.0GHz on a single core.
https://ark.intel.com/products/88196/Intel-Core-i7-6700-Processor-8M-Cache-up-to-4_00-GHz
Dclose
20th February 2017, 02:39
I never saw that switching on rectangular and asymmetric partitions would improve visual quality. I can't tell the difference whether they're on or off on any encodes.
I have. I never turn rectangle off after doing some tests of it. Sometimes I turn asymmetric off to increase speed, since asymmetric has a worse quality/speed ratio than rectangle, but when I'm going for better quality I turn it on since it does help tighten up the picture a little more.
I easily noticed the difference again the other day when doing some tests. I was doing quality comparisons by watching the video play from four feet away on a 40" screen. Some people may think that's too close, while some people around here do their comparisons by zooming in on screenshots and analyzing pixels.
The biggest speed boost is probably Early Skip. The quality difference is generally quite obvious to me, but the speed increase is even more obvious. I'd say the higher the resolution (and bitrate), the less Early Skip hurts quality.
The more bitrate used, the less the quality settings matter. I'm usually dropping bitrate below what most people would think is acceptable, so the different quality settings tend to show differences more, so I rarely use speed boosts like Early Skip, and things like Max Merge can have a very noticeable difference on video quality.
pradeeprama
20th February 2017, 04:45
Could anyone here tell me which of the parameters included in the --preset veryslow compared to the --preset slow is the "culprit" dropping encoding speeds from 4fps to 0.30fps?
The idea is to retain most of the "quality improvements" from the veryslow preset while keeping the "merely acceptable" encoding speeds yielded by the slow preset. There has to be some kind of middle ground between both presets.
Thanks.
I suspect the top culprits behind the large fps drop from slow to veryslow are the increased rd-level, increased subme, +1 reference frames, and more TU search (--tu-inter-depth 3 --tu-intra-depth 3 in veryslow instead of the --tu-inter-depth 1 --tu-intra-depth 1 that slow has).
I would recommend raising --limit-refs, enabling --limit-modes and --limit-tu 4, and seeing whether that helps contain the drop in fps. There is another feature called --dynamic-rd that dynamically increases the rd-level to 5 when you start with 4, but this works only when VBV parameters are used and they clip the quality due to excessive bits.
aymanalz
20th February 2017, 07:13
@pingfr : Maybe I'm missing it, but I cannot see a value for CPU load in the CPU-Z screenshot you provided. I only see info about the CPU itself. Open Task Manager (Ctrl+Shift+Esc) to see how much of your CPU is being used. If the x265 executable is not using more than 90% all the time, you can be sure that there is a bottleneck somewhere.
Your rd-level is 6; that is definitely a major factor in speed. Reducing it to 4 can improve speed considerably. (Whether the quality loss will be tolerable, I cannot say; you will have to test it out.) If I'm not mistaken, limiting refs by depth and CU only works for rd-levels below 5. Those options give a significant speed boost with little quality loss, but they are automatically disabled because your rd-level is too high.
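The differences named in the last two posts can be lined up with a small sketch. The values for slow are read from the encoder log posted earlier in the thread; the values for veryslow are only what posters stated here (subme is described as "increased" without a number), so treat them as unverified and check them against the x265 documentation for your build.

```python
# "slow" values come from the posted encoder log; "veryslow" values are
# taken from statements in this thread and are not independently verified.
slow = {"rd": 4, "ref": 4, "subme": 3,
        "tu-intra-depth": 1, "tu-inter-depth": 1}
veryslow = {"rd": 6, "ref": 5, "subme": None,  # None: value not given in the thread
            "tu-intra-depth": 3, "tu-inter-depth": 3}

def diff_settings(a, b):
    """Return {option: (a_value, b_value)} for every option whose value differs."""
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)}

changes = diff_settings(slow, veryslow)
for option, (old, new) in changes.items():
    shown = "higher (value not given)" if new is None else new
    print(f"--{option}: {old} -> {shown}")
```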
Midzuki
20th February 2017, 10:53
x265 2.3+8-cfaff341e350
SAO: avoid negative indexes in 'x265_lambda2_tab' table
http://www.mediafire.com/file/0857vp1pcuvgw80/x265_2.3+8-cfaff341e350.7z
ShogoXT
20th February 2017, 20:09
I think I am going to be getting one of the new 8 core Ryzen CPUs.
From the rumors I've been hearing, its main weakness will be AVX.
Would it be worthwhile to benchmark and compare its speed with different instruction sets against Intel CPUs, using x265's --asm option?
Thanks
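One way such a comparison could be set up is sketched below. It only builds the command lines, one per SIMD level, and does not run anything. The level spellings, file names and extra options are assumptions for illustration; check x265 --help for the values --asm accepts in your build.

```python
import shlex

def build_asm_commands(encoder, source, asm_levels, extra_args=()):
    """Return {asm_level: argv list} with one benchmark command per SIMD level."""
    commands = {}
    for level in asm_levels:
        output = f"bench_{level}.hevc"  # hypothetical output file name
        commands[level] = [encoder, "--asm", level, *extra_args,
                           source, "-o", output]
    return commands

commands = build_asm_commands(
    "x265", "testvideo.y4m",
    asm_levels=["sse4", "avx", "avx2"],            # assumed spellings
    extra_args=["--preset", "slow", "--crf", "18"],
)
for level, argv in commands.items():
    print(f"{level}: {shlex.join(argv)}")
```

Each argv list can be passed to subprocess.run and timed, and the fps on the encoder's final summary line compared across levels and CPUs.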
Midzuki
21st February 2017, 16:10
According to commit 820f4327ddac... ,
CLI: Remove redundant cli option 'capture-csp'
OK, but now, when is MCW finally going to correct the information shown below?
--bframes <integer> Maximum number of consecutive b-frames (now it only enables B GOP structure) Default 4
Ma
21st February 2017, 20:54
[...] when is MCW finally going to correct the information shown below?
--bframes <integer> Maximum number of consecutive b-frames (now it only enables B GOP structure) Default 4
Would this change be OK?
diff -r 820f4327ddac source/x265cli.h
--- a/source/x265cli.h Mon Feb 20 17:18:53 2017 +0530
+++ b/source/x265cli.h Tue Feb 21 20:48:09 2017 +0100
@@ -391,7 +391,7 @@
H0(" --rc-lookahead <integer> Number of frames for frame-type lookahead (determines encoder latency) Default %d\n", param->lookaheadDepth);
H1(" --lookahead-slices <0..16> Number of slices to use per lookahead cost estimate. Default %d\n", param->lookaheadSlices);
H0(" --lookahead-threads <integer> Number of threads to be dedicated to perform lookahead only. Default %d\n", param->lookaheadThreads);
- H0(" --bframes <integer> Maximum number of consecutive b-frames (now it only enables B GOP structure) Default %d\n", param->bframes);
+ H0("-b/--bframes <0..16> Maximum number of consecutive b-frames. Default %d\n", param->bframes);
H1(" --bframe-bias <integer> Bias towards B frame decisions. Default %d\n", param->bFrameBias);
H0(" --b-adapt <0..2> 0 - none, 1 - fast, 2 - full (trellis) adaptive B frame scheduling. Default %d\n", param->bFrameAdaptive);
H0(" --[no-]b-pyramid Use B-frames as references. Default %s\n", OPT(param->bBPyramid));
Midzuki
21st February 2017, 23:25
Would this change be OK?
diff -r 820f4327ddac source/x265cli.h
--- a/source/x265cli.h Mon Feb 20 17:18:53 2017 +0530
+++ b/source/x265cli.h Tue Feb 21 20:48:09 2017 +0100
@@ -391,7 +391,7 @@
H0(" --rc-lookahead <integer> Number of frames for frame-type lookahead (determines encoder latency) Default %d\n", param->lookaheadDepth);
H1(" --lookahead-slices <0..16> Number of slices to use per lookahead cost estimate. Default %d\n", param->lookaheadSlices);
H0(" --lookahead-threads <integer> Number of threads to be dedicated to perform lookahead only. Default %d\n", param->lookaheadThreads);
- H0(" --bframes <integer> Maximum number of consecutive b-frames (now it only enables B GOP structure) Default %d\n", param->bframes);
+ H0("-b/--bframes <0..16> Maximum number of consecutive b-frames. Default %d\n", param->bframes);
H1(" --bframe-bias <integer> Bias towards B frame decisions. Default %d\n", param->bFrameBias);
H0(" --b-adapt <0..2> 0 - none, 1 - fast, 2 - full (trellis) adaptive B frame scheduling. Default %d\n", param->bFrameAdaptive);
H0(" --[no-]b-pyramid Use B-frames as references. Default %s\n", OPT(param->bBPyramid));
Thanks for asking.
Well, that's almost good. THIS is the right way to do it.
H0("-b/--bframes <0..16> Maximum number of consecutive B-frames. Default %d\n", param->bframes);
H1(" --bframe-bias <integer> Bias towards B-frame decisions. Default %d\n", param->bFrameBias);
H0(" --b-adapt <0..2> 0 - none, 1 - fast, 2 - full (trellis) adaptive B-frame scheduling. Default %d\n", param->bFrameAdaptive);
H0(" --[no-]b-pyramid Use B-frames as references. Default %s\n", OPT(param->bBPyramid));
Because consistency matters.
youli
22nd February 2017, 07:01
Options tested: --aq-motion and --dynamic-rd.
Bitrate decreased by about 3.5%.
Encoded 178944 frames in 107297.17s (1.67 fps), 18252.68 kb/s, Avg QP:24.23
MediaInfo:
Writing library : x265 2.2+36-9b975fec584a:[Windows][GCC 6.2.0][64 bit] 10bit
Encoding settings : cpuid=1050111 / frame-threads=3 / numa-pools=8 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=1920x2160 / interlace=0 / total-frames=178944 / level-idc=50 / high-tier=1 / uhd-bd=0 / ref=1 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=23 / keyint=250 / bframes=4 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=40 / lookahead-slices=2 / scenecut=40 / no-intra-refresh / ctu=32 / min-cu-size=8 / no-rect / no-amp / max-tu-size=16 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=2 / dynamic-rd=4.00 / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / no-strong-intra-smoothing / max-merge=2 / limit-refs=0 / no-limit-modes / me=3 / subme=7 / merange=25 / temporal-mvp / weightp / weightb / no-analyze-src-pics / no-deblock / no-sao / no-sao-non-deblock / rd=3 / early-skip / no-rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=3.00 / no-rd-refine / analysis-mode=0 / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=24.0 / qcomp=0.80 / qpstep=1 / stats-write=0 / stats-read=0 / vbv-maxrate=100000 / vbv-bufsize=100000 / vbv-init=0.9 / crf-max=0.0 / crf-min=0.0 / ipratio=1.10 / pbratio=1.10 / aq-mode=3 / aq-strength=0.60 / no-cutree / zone-count=0 / no-strict-cbr / qg-size=16 / no-rc-grain / qpmax=51 / qpmin=0 / sar=16 / overscan=0 / videoformat=5 / range=0 / colorprim=1 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / opt-qp-pps / opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / aq-motion / no-hdr
Bitrate distribution:
http://img11.lostpic.net/2017/02/22/e39e13d4eba1f1e418e73b29631c6267.th.png (http://lostpic.net/image/Q8n6)
Screenshots comparison:
Source BD3D (left eye) and Blu-ray rip, top-bottom (left eye first), at the 4914-second mark, where the bitrate is 76592 kbps (the maximum for this video).
http://s018.radikal.ru/i525/1702/04/895a3cbbdc0bt.jpg (http://radikal.ru/fp/vvjsvne2qhtf2) http://s018.radikal.ru/i510/1702/09/760d6c8d0f7ft.jpg (http://radikal.ru/fp/64e29031dw6yq)
NikosD
23rd February 2017, 18:47
I was told by an Intel fanboy and RyZen 7 reviewer that you need a Core i7 7700K@4.8GHz to match the performance of a RyZen 7@4.0GHz in the x265 2nd pass, or you could see the same thing reversed.
We'll see...
pingfr
23rd February 2017, 19:09
I was told by an Intel fanboy and RyZen 7 reviewer that you need a Core i7 7700K@4.8GHz to catch the performance of RyZen 7@4.0GHz at x265 2nd pass or you could see the same thing reversed.
Will see...
Would love to see some substantial evidence. ;)
Atak_Snajpera
23rd February 2017, 19:57
I was told by an Intel fanboy and RyZen 7 reviewer that you need a Core i7 7700K@4.8GHz to catch the performance of RyZen 7@4.0GHz at x265 2nd pass or you could see the same thing reversed.
Will see...
Sounds reasonable. AVX2 is about 1.6x faster than AVX. Thanks to 2xAVX256+FMA, Intel can still compete against an 8c/16t CPU.
http://images.anandtech.com/reviews/cpu/intel/Haswell/Architecture/flops.jpg?_ga=1.88542875.1424348284.1464096708
http://i.imgsafe.org/afb986b279.png
pingfr
23rd February 2017, 20:00
@Atak_Snajpera: You've got the screenshots I've sent you earlier this week in PM? :p
NikosD
23rd February 2017, 20:12
Sound reasonable. AVX2 is about ~1.6x faster than AVX. Thanks to 2xAVX256+FMA Intel can still compete against 8c/16t CPU.
http://images.anandtech.com/reviews/cpu/intel/Haswell/Architecture/flops.jpg?_ga=1.88542875.1424348284.1464096708
http://i.imgsafe.org/afb986b279.png
FMA has nothing to do with x265, because AFAIK x265 uses integers and FMA is for floating-point numbers.
Integer AVX2 makes the difference; we'll see how much.
Atak_Snajpera
23rd February 2017, 20:36
x264 and x265 use FMA3. See encoder's output 'using cpu capabilities'
NikosD
23rd February 2017, 20:41
It doesn't matter if it lists CPU capabilities.
What really matters is exactly which instructions x265 can use.
It would be a huge surprise if it could use FMA3 to a large extent, or at all.
pingfr
23rd February 2017, 21:05
It doesn't matter if it lists CPU capabilities.
It really matters what exactly instructions x265 can use.
It would be a huge surprise if it could use FMA3 in a large extent or at all.
See encoder's output 'using cpu capabilities'
Using:
verb (used with object), used, using.
1.
to employ for some purpose; put into service; make use of:
to use a knife.
Source: http://www.dictionary.com/browse/using?s=t
NikosD
23rd February 2017, 21:08
Age ? 15 ?
pingfr
23rd February 2017, 21:14
Age ? 15 ?
*shrug*