View Full Version : x265 HEVC Encoder


Barough
11th February 2017, 19:26
Could be so..... ;)

I don't have a special schedule for running the 'media-autobuild_suite'.... it just happens when I feel like it, i.e. when I fire up the laptop I have the suite installed on.

birdie
12th February 2017, 11:38
I wonder why no one has posted comparisons lately, e.g. x264 vs x265 at various bitrates, or maybe various versions of x265.

LigH
12th February 2017, 11:43
Spending energy in repeating the same comparisons again doesn't sound useful to everyone. It may require a specific reason to do some new ones.

If you need one: Try to show us which impact dynamic-rd can have.

hajj_3
12th February 2017, 14:12
I wonder why no one has posted comparisons lately, e.g. x264 vs x265 at various bitrates, or maybe various versions of x265.

It has been quite some time since big improvements were made in x265 compression so there wouldn't be much difference to quality, speed has improved quite a bit though.

birdie
12th February 2017, 15:24
No significant changes between 1.7 and 2.2? How's that even possible? ;-)

Khun_Doug
12th February 2017, 21:39
If you need one: Try to show us which impact dynamic-rd can have.


I took a look at dynamic-rd. First I discovered that VBV buffer size and VBV fillrate need to be non-zero. My initial test used a BD clip with CRF 19, preset slow. The x265 file has a bitrate of 6900 kb/sec, file size 214.3 MB. To get a video clip with the same clarity and sharpness I ended up using VBV buffer 4000 and VBV fillrate 1000. I tested VBV buffer 2000 and VBV fillrate 500, but the output lacked the sharpness. With the VBV 4000 settings and dynamic-rd set to 0, the bitrate dropped to 3336 kb/sec, file size 103.4 MB. I then tested dynamic-rd with values of 2, 3 and 4. I didn't bother with testing 1. Results:

0: file size 103.4 MB, bitrate 3336 kb/sec
2: file size 103.6 MB, bitrate 3341 kb/sec
3: file size 103.5 MB, bitrate 3338 kb/sec
4: file size 105.6 MB, bitrate 3340 kb/sec
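
A quick sanity check on the numbers above: a file size and an average bitrate together imply a clip duration, and every encode of the same clip should imply roughly the same one. This is only a minimal sketch. It assumes MB means 10^6 bytes, kb means 1000 bits, and that the reported sizes are video-only; the helper `implied_duration_s` and the row labels are made up for illustration.

```python
# Reported (file size in MB, bitrate in kb/s) pairs from the post.
# Assumptions: 1 MB = 10**6 bytes, 1 kb = 1000 bits, sizes are video-only.
results = {
    "crf19-no-vbv": (214.3, 6900),
    "dynamic-rd 0": (103.4, 3336),
    "dynamic-rd 2": (103.6, 3341),
    "dynamic-rd 3": (103.5, 3338),
    "dynamic-rd 4": (105.6, 3340),
}


def implied_duration_s(size_mb: float, bitrate_kbps: float) -> float:
    """Duration in seconds implied by a file size and an average bitrate."""
    return size_mb * 8_000 / bitrate_kbps  # MB -> kilobits, then / (kb/s)


durations = {name: implied_duration_s(*pair) for name, pair in results.items()}
baseline = durations["dynamic-rd 0"]

for name, seconds in durations.items():
    print(f"{name:>13}: {seconds:6.1f} s ({seconds - baseline:+.1f} s vs dynamic-rd 0)")
```

Under these assumptions the first four rows agree to within about half a second, while the dynamic-rd 4 row comes out roughly five seconds longer, so either its size or its bitrate may have been mistyped.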

Visually, the clip looks the same across all VBV test cases. I would expect that more scrupulous eyes would see noticeable changes between the clip without any VBV and the various test cases. I saved the film clips if anyone wants to see sample frames.

All this brings up a question. What are reasonable VBV settings? Are there better options when using VBV than using a CRF value?

Magik Mark
13th February 2017, 00:18
I have 24 cores 28 threads CPU. What would be an optimum value for --slices?

Selur
13th February 2017, 00:30
Not sure what the optimum is but from some tests I did on my old dual Xeon E5640 system, corecount/2 seems to give the best speed.

pradeeprama
13th February 2017, 05:32
I have 24 cores 28 threads CPU. What would be an optimum value for --slices?

It really depends on what your goal is. Slice-based parallelism is a performance/efficiency trade-off. On the one hand, you may get better performance with fewer frame threads (-F option), while on the other, there is some loss in visual quality at the boundary of the slices. I would encourage you to play with a combination of the -F and --slices options.
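
One way to "play with" the two options is to sweep a small grid and time each run. The sketch below only builds the command lines and does not run anything. The candidate values (including Selur's corecount/2), the file names, the preset/CRF choice and the helper `build_commands` are illustrative assumptions, not recommendations; `--slices`, `--frame-threads` (`-F`), `--preset`, `--crf`, `--input` and `--output` are existing x265 CLI options.

```python
from itertools import product


def build_commands(source, cores, frame_threads=(1, 2, 4)):
    """Return x265 argv lists covering a small --slices x --frame-threads grid."""
    # Candidate slice counts: the default (1), cores/4, and Selur's cores/2.
    slice_counts = sorted({1, max(1, cores // 4), max(1, cores // 2)})
    commands = []
    for slices, threads in product(slice_counts, frame_threads):
        output = f"slices{slices}_F{threads}.hevc"  # hypothetical naming scheme
        commands.append([
            "x265", "--input", source,
            "--preset", "medium", "--crf", "22",
            "--slices", str(slices),
            "--frame-threads", str(threads),
            "--output", output,
        ])
    return commands


if __name__ == "__main__":
    for argv in build_commands("clip.y4m", cores=24):
        print(" ".join(argv))
```

Each line can then be timed (for example with `subprocess.run` and `time.perf_counter`) and the outputs compared visually, paying particular attention to the slice boundaries.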

Magik Mark
13th February 2017, 05:34
Not sure what the optimum is but from some tests I did on my old dual Xeon E5640 system, corecount/2 seems to give the best speed.

How did the quality compare between the default and corecount/2?

Selur
13th February 2017, 07:34
I didn't compare the quality seriously, that is why I wrote 'give the best speed'.
I just tested a few sources (HD, UHD and SD) with different '--slices' values; I also tested different pools and frame-thread values, but those didn't really help with the speed at all.
Once I found the slices value I simply looked at the output to see if I could spot any problems, since I didn't see a problem I stopped my testing.
There might still be problems quality wise with using slices that weren't present with my sources or I simply didn't spot at the time.

Cu Selur

qtwigg
13th February 2017, 23:08
Has anyone been experiencing very low bitrates at low CRF values? CRF 20 has usually given me 3.5 to 5 GB depending on the source, but lately I have needed to use 18, and even at 18 the bitrate barely reaches 3k to 4k; now I have one encode at CRF 14 that is only 1.7 GB. I know all the parameters and how every source is different, but this has been the average lately, and only lately. I usually never had to go below 20 or the file would be enormous, but not anymore. Any ideas?

Selur
14th February 2017, 07:24
Wild guess: material is darker than the previous content

nakTT
14th February 2017, 08:48
Barough applies UPX upon x265.exe, there is no sorcery involved :D
Thank you so much for the info. I thought there was a sorcery involved.:D

By the way, does anyone here have an idea, in general, of how many percent the Placebo preset gains over VerySlow in terms of quality at the same bitrate?

Thank you in advance.

Selur
14th February 2017, 08:52
Side note: Assuming the compression efficiency between 'very slow' and 'placebo' was anything worth speaking of wouldn't that make the developers morons to name the preset that way?

LigH
14th February 2017, 09:00
Life would be so easy if we could measure "quality" directly. Unfortunately, it is subjective, thus related to personal opinions of different people looking at the result. We can measure differences, and we can weight them with metrics resembling some psychovisual rules. Still, different people may have different opinions about how annoying these differences feel, and whether they notice them at all.

See preset "veryslow" as a sensible maximum, and "placebo" as a theoretical but nonsensical limit. Other projects named such a preset "insane" instead.

nakTT
15th February 2017, 02:48
Side note: Assuming the compression efficiency between 'very slow' and 'placebo' was anything worth speaking of wouldn't that make the developers morons to name the preset that way?

Life would be so easy if we could measure "quality" directly. Unfortunately, it is subjective, thus related to personal opinions of different people looking at the result. We can measure differences, and we can weight them with metrics resembling some psychovisual rules. Still, different people may have different opinions about how annoying these differences feel, and whether they notice them at all.

See preset "veryslow" as a sensible maximum, and "placebo" as a theoretical but nonsensical limit. Other projects named such a preset "insane" instead.
Thank you so much for the reply.

I do understand that it is not a straightforward kind of affair. The reason I ask is that last time I asked Dark Shikari the same question about x264, and she did give me an answer/estimation (less than 2%, IIRC).

Thus this time around I thought someone who is really knowledgeable in x265 could share some sort of answer/estimation.

Thanks again and sorry for any inconvenience caused.

:thanks:

LigH
15th February 2017, 08:35
less than 2%

Without testing, I would assume that the difference for x265 will usually be just as small. Mainly because the encoder already works very thoroughly in "veryslow" preset, and would only waste more time with most improbable tests in "placebo".

pradeeprama
15th February 2017, 08:53
Thank you so much for the reply.

I do understand that it is not a straightforward kind of affair. The reason I ask is that last time I asked Dark Shikari the same question about x264, and she did give me an answer/estimation (less than 2%, IIRC).

Thus this time around I thought someone who is really knowledgeable in x265 could share some sort of answer/estimation.

Thanks again and sorry for any inconvenience caused.

:thanks:

Since x265 is tuned for visual quality, I don't think quoting a number would be straightforward. If you can afford the time for the placebo encodes, they will definitely perform better (efficiency-wise), making the jump worth it.

pradeeprama
15th February 2017, 08:54
x265 version 2.3 has been released. This release contains new algorithms that improve visual quality, encoding efficiency, and performance.

The latest version can be downloaded from here (MD5 sum = 18716a7e0c6f6ebd2a1035b82cec30de). Full documentation is available at http://x265.readthedocs.io/en/stable/.

Release Notes for 2.3
================

Encoder enhancements
----------------------------------
1. New SSIM-based RD-cost computation for improved visual quality, and efficiency; use --ssim-rd to exercise.
2. Multi-pass encoding can now share analysis information from prior passes (in addition to rate-control information) to improve performance and quality of subsequent passes; to your multi-pass command-lines that use the --pass option, add --multi-pass-opt-distortion to share distortion information, and --multi-pass-opt-analysis to share other analysis information.
3. A dedicated thread pool for lookahead can now be specified with --lookahead-threads.
4. --dynamic-rd dynamically increases analysis in areas where the bitrate is being capped by VBV; works for both CRF and ABR encodes with VBV settings.
5. The number of bits used to signal the delta-QP can be optimized with the --opt-cu-delta-qp option; found to be useful in some scenarios for lower bitrate targets.
6. Experimental feature option: --aq-motion adds new QP offsets based on relative motion of a block with respect to the movement of the frame.

API changes
-------------------
1. Reconfigure API now supports signalling new scaling lists.
2. x265 application’s csv functionality now reports time (in milliseconds) taken to encode each frame.
3. --strict-cbr enables stricter bitrate adherence by adding filler bits when achieved bitrate is lower than the target; earlier, it was only reacting when the achieved rate was higher.
4. --hdr can be used to ensure that max-cll and max-fall values are always signaled (even if 0,0).

Bug fixes
--------------
1. Fixed incorrect HW thread counting on MacOS platform.
2. Fixed scaling lists support for 4:4:4 videos.
3. Inconsistent output fix for --opt-qp-pss by removing last slice’s QP from cost calculation.
4. VTune profiling (enabled using ENABLE_VTUNE CMake option) now also works with 2017 VTune builds.

Happy compressing!

Selur
15th February 2017, 08:57
Sadly 4:2:2 encoding causes x265 to crash and 4:4:4 encoding still produces artifacts when fed through ffmpeg (to be frank, this one is more ffmpeg's fault).

Boulder
15th February 2017, 09:04
x265 version 2.3 has been released. This release contains new algorithms that improve visual quality, encoding efficiency, and performance.

1. New SSIM-based RD-cost computation for improved visual quality, and efficiency; use --ssim-rd to exercise.


I was wondering, is this observation based on metrics or visual comparison? Also what about --tune grain, would this option have adverse effects there?

Selur
15th February 2017, 09:14
"SSIM-based RD-cost computation" -> sounds like metrics based ;)

Boulder
15th February 2017, 09:16
I was wondering about the "improved visual quality" -statement as it's generally considered a fact that visual comparison is needed in addition to any metrics :)

LigH
15th February 2017, 10:02
New milestone (stable merge build): x265 v2.3

x265 2.3+2-912dd749bdb5 (https://www.mediafire.com/file/45icnx8fa0nk3ba/x265_2.3%2B2-912dd749bdb5.7z) (mostly identical to v2.2+36)
_

Of course, SSIM-RD uses the SSIM metric to optimize rate distortion. The previously used metric will have been simpler than that, though, so chances are good that many people would agree with improved quality. But don't hesitate to organize a mass ABX test ;)

Sagittaire
15th February 2017, 23:05
@ dev team

Does x265 introduce specific optimisations for the new AMD Ryzen CPUs?

x265_Project
15th February 2017, 23:20
@ dev team

Does x265 introduce specific optimisations for the new AMD Ryzen CPUs?

We're still in the early phase of evaluating and optimizing performance on Ryzen. Nothing that I can report at this time. :)

Selur
16th February 2017, 14:23
What does 'analyze-src-pics' do?
Enable motion estimation with source frame pixels, in this mode, motion estimation can be computed independently.
That didn't really help me tell whether this is more of a testing feature, whether it should boost compressibility,...

Barough
16th February 2017, 15:24
x265 v2.3+6-db913efb1a59 (http://ge.tt/2OycKqi2) (MSYS/MinGW, GCC 6.3.0, 32 & 64bit 8/10/12bit multilib EXEs)

x265 [info]: HEVC encoder version 2.3+6-db913efb1a59
x265 [info]: build info [Windows][GCC 6.3.0][32 bit/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2

https://bitbucket.org/multicoreware/x265/commits/branch/default

mandarinka
16th February 2017, 16:13
We're still in the early phase of evaluating and optimizing performance on Ryzen. Nothing that I can report at this time. :)

In case you are adding AMD core detection for Ryzen, could you also add a detection for Excavator?
Currently that core suffers about a 4% performance hit due to using AVX2, which is available but slow on it. I think all you would need to do is add a condition to the CPU detection code that disables AVX2 use if both the AVX2 and AMD XOP instruction extensions are detected on the CPU. (Zen supposedly doesn't have XOP.) Or there might be other ways to check for Excavator more intelligently, I'm no expert by far :)
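
The proposed condition could be sketched roughly like this. It is not x265's actual detection code (that lives in the C/C++ cpuid logic); the function name `usable_simd` and the lowercase flag strings are hypothetical and only illustrate the rule.

```python
def usable_simd(detected):
    """Drop AVX2 when XOP is also present (the proposed Excavator heuristic)."""
    flags = set(detected)
    # Heuristic from the post: AVX2 and XOP together suggest Excavator,
    # where AVX2 is available but slow; Zen reportedly lacks XOP.
    if {"avx2", "xop"} <= flags:
        flags.discard("avx2")
    return flags


print(sorted(usable_simd(["sse4.2", "avx", "avx2", "xop"])))  # Excavator-like
print(sorted(usable_simd(["sse4.2", "avx", "avx2"])))         # AVX2 without XOP
```

A CPU that reports AVX2 without XOP keeps AVX2 untouched, so Intel parts and Zen would not be affected by such a rule.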

"SSIM-based RD-cost computation" -> sounds like metrics based ;)

I was wondering about the "improved visual quality" -statement as it's generally considered a fact that visual comparison is needed in addition to any metrics :)

Guys.... duh. Every encoder decision, including the current Psy RDO in x264 and x265, has to use some metric. Encoder being a computer program has no other way to do it. The difference is just that the current metrics were less complex, SSIM is probably replacing difference calculating tools like SAD/SATD.
To check the visual results and decide if you like them is your job, the program can't do it for you.

littlepox
16th February 2017, 16:35
If someone wishes to use a method that is not a metric of any kind, the following must happen:

1. the encoder encodes a block in several ways, then present the pictures to you;
2. you evaluate the quality of the pictures and assign them some marks
3. the encoder takes your evaluation and computes the rate/distortion trade-off
4. go back to 1 with another block to encode


This is then a true "Human-eyes tuned rd"
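
The loop above is ordinary Lagrangian rate-distortion optimisation with the distortion measure swapped out. A minimal sketch, assuming toy 4-pixel blocks and made-up bit counts; the function names are illustrative and this is not how x265 itself is written:

```python
def sad(source, recon):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(s - r) for s, r in zip(source, recon))


def sse(source, recon):
    """Sum of squared errors between two equally sized pixel blocks."""
    return sum((s - r) ** 2 for s, r in zip(source, recon))


def choose_mode(source, candidates, distortion, lam):
    """Pick the candidate minimising J = D + lambda * R.

    candidates: iterable of (name, reconstructed_block, bits) tuples.
    distortion: any callable(source, recon) -> number; this is "the metric".
    """
    return min(candidates, key=lambda c: distortion(source, c[1]) + lam * c[2])[0]


source = [10, 12, 14, 16]
candidates = [
    ("cheap", [11, 11, 15, 15], 4),    # few bits, small error on every pixel
    ("costly", [10, 12, 14, 16], 20),  # many bits, exact reconstruction
]

print(choose_mode(source, candidates, sad, lam=1.0))  # prints "cheap"
print(choose_mode(source, candidates, sad, lam=0.1))  # prints "costly"
```

Swapping `sad` for `sse`, an SSIM-style score, or a human rating changes only the `distortion` argument; the decision loop stays the same.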

Boulder
16th February 2017, 16:46
Guys.... duh. Every encoder decision, including the current Psy RDO in x264 and x265, has to use some metric. Encoder being a computer program has no other way to do it. The difference is just that the current metrics were less complex, SSIM is probably replacing difference calculating tools like SAD/SATD.
To check the visual results and decide if you like them is your job, the program can't do it for you.

I was more or less interested in whether the devs had already had some visual comparison done or if the comparisons are purely based on metrics. I'm more than happy to accept the opinion of the majority when it comes to visual quality, that's why I use presets :) But what that option actually does to --tune grain concerning keeping noise/grain is a bit unclear as I'm not that deep into the technical things.

x265_Project
16th February 2017, 16:55
What does 'analyze-src-pics' do?

That didn't really help me tell whether this is more of a testing feature, whether it should boost compressibility,...

This is a test feature. It allows us to run experiments. It will hurt compression efficiency, so it shouldn't be used for any production encoding.

Normally, when the encoder performs inter-prediction (also known as motion compensated prediction), it must wait until the reference frames that it wants to use for frame #x are finished encoding before it can start encoding frame #x (or, with Wavefront Parallel Processing, it must wait until the first rows of the reference frames are completed before it can start). This feature tells the encoder to use the original source frames as reference frames, breaking that dependency. This would allow an encoder to run faster, especially for file-based encoding, as it has the dependent data (the source frames), and so it can start encoding each frame right away. But the source frames are not what the decoder will have when it is decoding. The actual decoded reference frames will be different (except for a lossless encode). So the inter-prediction won't be nearly as accurate when you use source frames as reference frames.
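
The speed side of that dependency can be pictured with a toy timing model. This is a rough sketch, not a description of x265's scheduler: it assumes unlimited threads, a simple chain where each frame references the previous one, one time unit per row, and illustrative frame and row counts.

```python
def last_frame_finish(num_frames, rows_per_frame, wait_rows):
    """Finish time (in row-units) of the last frame in a simple P-frame chain.

    Toy model: unlimited threads, one row costs one time unit, and frame i
    may start once `wait_rows` rows of frame i-1 have been encoded.
    wait_rows = rows_per_frame -> wait for the whole reference frame
    wait_rows = 0              -> no waiting (source frames as references)
    """
    start = 0
    for _ in range(num_frames - 1):
        start += wait_rows  # each frame starts wait_rows after its reference
    return start + rows_per_frame


frames, rows = 8, 34  # illustrative numbers only
print("wait for full reference:", last_frame_finish(frames, rows, rows))
print("WPP, wait for 2 rows:   ", last_frame_finish(frames, rows, 2))
print("source frames as refs:  ", last_frame_finish(frames, rows, 0))
```

The model says nothing about quality: as explained above, the accuracy of inter-prediction is what is given up in exchange for removing the wait.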

Midzuki
16th February 2017, 19:36
So, perhaps XhmikosR will release a new MSYS1 package only when GCC 7 is out...

Not true, fortunately. :) His MSYS_MinGW-w64_GCC_630_x86-x64_Full.7z went out of the oven yesterday:

http://xhmikosr.1f0.de/tools/msys/

LigH
16th February 2017, 20:08
Hooray! Thank you, XhmikosR! I'll start updating.

Rinzler0x7BB
16th February 2017, 21:51
Hi,

Over the last few months I ran a lot of tests with x265, and now I want to share my experience with one issue I had with small moving objects.

From the description of "Jawed" some months ago it seems that he saw a similar issue:

I found that motion in live action tended to become jittery (sort of as though it were half-rate: "anime" in feel).

http://forum.doom9.org/showpost.php?p=1777880&postcount=4189

I actually found the same source suffered the same fate with x264 at crf 24 (using settings that are almost equivalent to preset very slow in speed).
But in x264 I historically did not use crf as high as 24 - principally because blocking/banding with 8-bit encodes becomes unbearable.

So the same source in both x264 and x265 (both 10-bit) produced the same problem,
which disappears with crf 20. In x264, setting --tune grain also solved the problem (doubled the bitrate for the water test sequence too...).

So when I started my x265 experiments I started to investigate crf, pushing far beyond 21 which was my limit with x264 8-bit. 10-bit, it transpires,
makes x264 blocking/banding a non-issue, which was a good reason to see what I thought of higher crf values. Along the way I stumbled into this problem at crf 24.

(In fact I found the problem back in April and was so dismayed that I just forgot about my x265 experiments and did other things -
not realising at the time that it was b-frames causing the problem.)

On the water sequence I reported 4 b-frames as a solution.
But in other testing I found that rapidly turning faces (small in the frame) in low contrast (think "a steamy room") the problem recurred (horrible jumps, like anime).
So 2 b-frames it is.

I decided not to evaluate CRFs between 20 and 24 to see how the problem arises.
It just seems that at crf 24, b-frames (with x264 or x265 both on very slow preset) are too unreliable.
I hadn't noticed in the past with x264 because I hadn't used such a high value for crf.

My description
This is still visible with the current builds of x265 with lower crf values.
I would describe it as a judder in front of or behind the object, in the direction of the movement.
At first I thought that it is only visible with crf values lower than 21. But with a closer look it is also visible with crf 20.

How to reproduce
I found a video on youtube with which it was easy to reproduce this behavior.
"Loop 610 & U.S. 290 Interchange - Houston Construction - March 6, 2015 - 4K HD"
https://www.youtube.com/watch?v=Ax90e-F0z_M

This video has a lot of small cars which makes it perfect to see the judder on small cars.
To make it easier to see them I downscaled to 720p.
Encode the video with --preset medium and --crf 22

My tests
I made a lot of tests to reduce the judder. At first I tried higher values for settings which influence the motion estimation, like --me star or higher --subme values.
But it helped only very little.

Then I found the setting which helped the most. --rd 5/6 did all the magic.
My problem with --rd 5/6 is the very low speed as I still wanted to have reasonable encoding times.
I also liked the better quality but together with --rdoq-level 2 and higher --psy-rdoq values which we need for high detail encodes it is too slow for me.

My solution
With another bunch of test encodes I found out that --rect helped nearly as much as --rd 5/6.
Then it is not necessary to use --rd 5/6 to have smooth motion on small moving objects.
--rect makes the encode also slower but there is a big difference to --rd 5/6.

My question
Is this a bug? I did not notice this behavior with x264.
Maybe there is a way to reduce this judder without the use of --rect or --rd 5 and have a faster encoding speed.

My low quality settings
I ended up with these settings, which I use for my low quality encodes:
--profile main10 --output-depth 10 --preset medium --crf 22 --ctu 32 --aq-mode 3 --aq-strength 0.9 --aq-motion
--rc-lookahead 25 --no-sao --level-idc 4.1 --high-tier --rd 4 --psy-rd 2.1 --psy-rdoq 2.5 --rdoq-level 2
--bframes 6 --no-strong-intra-smoothing --b-intra --rect --limit-modes

My high quality settings
Finally these are my high quality settings I currently use:
--profile main10 --output-depth 10 --preset medium --crf 20 --ctu 32 --pbratio 1.22 --subme 3 --aq-mode 3 --aq-strength 0.9
--qcomp 0.64 --rc-lookahead 35 --no-sao --level-idc 4.1 --high-tier --rd 4 --psy-rd 2.5 --psy-rdoq 3.5 --rdoq-level 2
--bframes 8 --no-strong-intra-smoothing --weightb --b-intra --rect --limit-modes

Sagittaire
16th February 2017, 23:29
I was more or less interested in whether the devs had already had some visual comparison done or if the comparisons are purely based on metrics. I'm more than happy to accept the opinion of the majority when it comes to visual quality, that's why I use presets :) But what that option actually does to --tune grain concerning keeping noise/grain is a bit unclear as I'm not that deep into the technical things.

Well, a codec by definition always uses a metric. Tune grain uses a "complexity" metric too. Every decision in every codec uses a metric. SSIM is by definition an HVS metric.

And you generally get good correlation between metric and eye in all serious benchmarks.

aymanalz
17th February 2017, 09:33
Using V2.3, with 2 pass encoding I am getting significantly higher speed for 2nd pass, than for first pass. Previously, they used to go at the same rate. I am guessing this is due to the Analysis and QP refinement enhancements? I don't use "fast first pass".

Jamaika
17th February 2017, 09:52
Not true, fortunately. :) His MSYS_MinGW-w64_GCC_630_x86-x64_Full.7z went out of the oven yesterday:
Buuuu, Windows 10 deletes the EXE files.
Trojan:Win32/Kandelo.B!cl

LigH
17th February 2017, 10:07
@ aymanalz:

I seem to remember that 1st pass statistics contain more reusable data than before, so the 2nd pass may be able to spare more calculations?

@ Jamaika:

Which AV possibly reports false alarms for which files? I'm still downloading (crappy rural connection); should be cross-checked with multi-engine AV web services like VirusTotal or malwr.

Jamaika
17th February 2017, 10:21
Well, I won't risk it. I'll use other software instead.:(

LigH
17th February 2017, 10:31
No warnings in MSSE.

MBAM scanning now... — No suspicious files found in this branch.

It would help us all a lot if you could show us a list which files your AV found to be suspicious.

BTW, you should always exclude the directory branch from resident AV scanners where binaries are being compiled to; incomplete binary files are easily mis-reported as suspicious.

aymanalz
17th February 2017, 11:17
@ aymanalz:

I seem to remember that 1st pass statistics contain more reusable data than before, so the 2nd pass may be able to spare more calculations?


That's my guess. The release notes say something about sharing info with the subsequent pass. If that is indeed the reason, then there is a very significant speed boost. I have only tested two clips so far, and the second pass was about 30% and 18% faster respectively in each.

Midzuki
17th February 2017, 12:26
Buuuu, Windows 10 deletes the EXE files.
Trojan:Win32/Kandelo.B!cl

Windows 10 is the root of all evil.
Just uninstall it, problem solved :sly:

LigH
17th February 2017, 12:35
x265 2.3+7-c15f8bce9f4b (https://www.mediafire.com/file/0sr5xqkkb3pb609/x265_2.3%2B7-c15f8bce9f4b.7z) [Windows][GCC 6.3.0][32+64 bit] 8bit+10bit+12bit

Luma/Chroma(fixed) offsets for HDR/WCG content, and some more Unicode support for filenames in Windows builds. New CLI parameters:

--capture-csp <string> Specify color primaries from bt709, p3d65, bt2020 for the capture device. Default bt709
--[no-]hdr-opt Add luma and chroma offsets for HDR/WCG content. Default disabled

Selur
17th February 2017, 12:52
'--capture-csp' <- shouldn't the default be 'undef' or something similar, in case there was no capture device?

Barough
17th February 2017, 14:30
x265 v2.3+7-c15f8bce9f4b (http://ge.tt/5zTMDri2) (MSYS/MinGW, GCC 6.3.0, 32 & 64bit 8/10/12bit Multilib EXE's)

x265 [info]: HEVC encoder version 2.3+7-c15f8bce9f4b
x265 [info]: build info [Windows][GCC 6.3.0][32 bit/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2

https://bitbucket.org/multicoreware/x265/commits/branch/default

NikosD
18th February 2017, 11:26
There is no doubt that in a week or so with the first official reviews of RyZen CPUs, x265 is going to play a significant role evaluating the performance of those CPUs in a very popular app - a SW HEVC encoder.

But on the other hand, RyZen CPUs are going to play a significant role evaluating x265 software too, regarding different kind of optimizations after years of continuous development.

What I mean is that we are going to see if x265 is more optimized in ILP (instruction level parallelism) taking advantage of vector instructions like AVX2, than scale out optimizations with multicore/ multithreaded CPUs.

If there was a poll/ bet which CPU is going to be faster Kabylake Core i7 7700K or RyZen 1800X using x265 at default clocks, I would put my money on RyZen 1800X

Atak_Snajpera
18th February 2017, 12:47
Something tells me that Ryzen may not have great performance in AVX2(FMA).

RyZen (128bit FADD + 128bit FMUL + 128bit FADD +128bit FMUL)
https://www.purepc.pl/image/news/2017/02/10_amd_zen_mnostwo_szczegolow_dotyczacych_nowej_architektury_12.jpg

SandyBridge (256bit FADD + 256bit FMUL)
http://images.anandtech.com/reviews/cpu/intel/Haswell/Architecture/snbexec.png

Haswell (FMA 256bit FADD + FMA 256 FMUL)
http://images.anandtech.com/reviews/cpu/intel/Haswell/Architecture/haswellexec.png

I think that history will repeat itself. AVX2 will probably again be 2 times slower than on Intel.
http://i.imgsafe.org/7222342d9c.png
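
The "2 times slower" guess follows from simple peak-throughput arithmetic if the unit counts listed above are taken at face value. A rough sketch for single-precision floating point only; video encoders lean mostly on integer SIMD, so this is at best a loose indication and says nothing directly about x265 speed:

```python
LANES_FP32 = {128: 4, 256: 8}  # single-precision lanes per vector width

# (units, vector width in bits, FLOPs per lane per op) as listed in the post.
designs = {
    "Ryzen":       [(2, 128, 1), (2, 128, 1)],  # 2x FADD + 2x FMUL
    "SandyBridge": [(1, 256, 1), (1, 256, 1)],  # 1x FADD + 1x FMUL
    "Haswell":     [(2, 256, 2)],               # 2x FMA (multiply + add = 2 FLOPs)
}


def peak_flops_per_cycle(pipes):
    """Sum the theoretical FP32 operations per cycle over all listed pipes."""
    return sum(units * LANES_FP32[width] * flops for units, width, flops in pipes)


for name, pipes in designs.items():
    print(name, peak_flops_per_cycle(pipes))
```

With these inputs Ryzen and Sandy Bridge both come out at 16 per core per cycle and Haswell at 32, which is where the factor of two comes from.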

NikosD
18th February 2017, 12:53
Nice tables, very old news.

AMD designed ZEN architecture with that trade-off.

To sacrifice SIMD AVX/AVX2 performance in order to gain much lower TDP, many more cores (double at least) and higher clocks.

We will all see if that bet is like bulldozer or an Intel's HEDT killer.