
View Full Version : Sharing Ryzen 1700 vs i7-6700 result



NikosD
12th March 2017, 21:56
Come on guys...We are all useful here.

Take it easy and not personal.

All the arguments have been put on the table.

Let's take a rest and wait for more and better RyZen results.

Sagittaire
12th March 2017, 21:58
Yeah paste again that graph because I've already forgotten how it looks like. Seriously you are trolling now. I'm out...

Well, I will post a 4K benchmark with an optimized x265 threading profile ...

You will see whether I am a troll ... :rolleyes:

cojj
12th March 2017, 22:44
Let's embrace intellectual conversation and stay away from slandering/taking things too personally.

From what I read above...
Sagittaire: And for 8C/16T at 100%, Ryzen at 3.6 Ghz will be on par with i7-6900K at 3.2 Ghz. => which means Broadwell is 12.5% more efficient than Zen
Atak_Snajpera: Haswell architecture in x265 encoding is 12% more efficient than Zen

From my understanding, Haswell=>Broadwell has very little performance gain so to me, it seems like your conclusions are the same (within the margin of error).
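The 12.5% figure follows directly from the clock ratio at equal throughput. A minimal sketch, assuming performance scales linearly with clock speed (the function name is illustrative and not part of any tool used in this thread):

```python
def per_clock_advantage(clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Percent more work per clock that CPU B does compared to CPU A,
    when both deliver the same throughput at the given clocks.

    Assumes throughput scales linearly with clock, which is a simplification.
    """
    return (clock_a_ghz / clock_b_ghz - 1.0) * 100.0


# Ryzen at 3.6 GHz on par with i7-6900K at 3.2 GHz (Sagittaire's estimate)
print(f"{per_clock_advantage(3.6, 3.2):.1f}%")  # 12.5%
```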


My Ryzen tests were done in clean-install of windows 10 with minimal processes running in the background (default windows crap only). I did not do any tuning as advised by other people (or benchmarking guides provided by AMD), it was simply out-of-box + slight o.c.

Sagittaire
12th March 2017, 22:54
My Ryzen tests were done in clean-install of windows 10 with minimal processes running in the background (default windows crap only). I did not do any tuning as advised by other people (or benchmarking guides provided by AMD), it was simply out-of-box + slight o.c.

If you want better threading performance, try the --pme command-line option (particularly good with high-quality profiles ... ;-)

The --pmode option will work too with the "veryslow" preset.

Try "--bframes 3 --frame-threads 8" for better threading.

I will post a 4K benchmark in a few minutes if you want to compare real performance between the i7-6700K and the R7 1700.

cojj
12th March 2017, 23:00
I will post 4K benchmark in few minute if you want compare real perf between i7-6700K and R7 1700

Looking forward to it :)
Could you still do 1080 benchmark too? That's my primary use-case at the moment.

NikosD
12th March 2017, 23:04
My Ryzen tests were done in clean-install of windows 10 with minimal processes running in the background (default windows crap only). I did not do any tuning as advised by other people (or benchmarking guides provided by AMD), it was simply out-of-box + slight o.c.

Your clock is 2.7% faster but your x265 result is more than 10% faster than the other RyZen.

So, he needs to retest.

BTW, can you put your clock to 3.0GHz and run again with 2 active cores (I'm not sure if you can disable 6 cores)

Can you upload in a server your FlopsCPU logs ?

Thanks.

Sagittaire
12th March 2017, 23:05
Looking forward to it :)
Could you still do 1080 benchmark too? That's my primary use-case at the moment.

It will be 1080p for x264 and 2160p for x265.

But you can try 1080p with these options:

--pme : works with all presets
--pmode : works with the veryslow and placebo presets (with other presets it disables reflist and makes the encode much slower)
--bframes 3 --frame-threads 8 : gives better threading across the GOP

Try combinations of these options for 1080p.
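One way to try every combination of these switches is to generate the command lines programmatically. This is only a sketch: the base command, file names and preset are placeholders that do not come from the thread, and `--frame-threads` is the spelling used in the x265 documentation.

```python
from itertools import product

# Optional switch groups suggested in the post above.
OPTIONAL_GROUPS = [
    ["--pme"],
    ["--pmode"],
    ["--bframes", "3", "--frame-threads", "8"],
]


def build_commands(base):
    """Return one argument list per on/off combination of the optional groups."""
    commands = []
    for mask in product([False, True], repeat=len(OPTIONAL_GROUPS)):
        args = list(base)
        for enabled, group in zip(mask, OPTIONAL_GROUPS):
            if enabled:
                args.extend(group)
        commands.append(args)
    return commands


# Hypothetical base command: file names and preset are placeholders.
BASE = ["x265", "--preset", "veryslow",
        "--input", "source_1080p.y4m", "--output", "out.hevc"]

for cmd in build_commands(BASE):
    print(" ".join(cmd))
```

This prints eight command lines, from the plain base command up to the one with all three groups enabled, which can then be timed one after another on the same short sample.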

Sagittaire
12th March 2017, 23:11
And here is the benchmark:
jfl1974.free.fr/Benchmark.zip

1080p for x264
2160p for x265

Try it and report your results on an 8C/16T CPU:
- speed for x264 and x265
- CPU load for x264 and x265

ShogoXT
12th March 2017, 23:15
I upgraded from an i7 920, so I don't mind taking risks too much, but I didn't want another quad core for the third time.

So far I had 6-7 fps on my DVD -> 1080p remaster project on my overclocked 920, but now it's around 20 fps. So I'm happy. I just hope games don't run too horribly.

I might wait to reformat until the next major Windows 10 update comes out. They tend to be a bit messy.

I will also re-benchmark and overclock more when I get my Noctua installed.

In the meantime, livestreaming 1080p 60fps with the faster profile in OBS while gaming works pretty well.

cojj
12th March 2017, 23:17
Your clock is 2.7% faster but your x265 result is more than 10% faster than the other RyZen.

So, he needs to retest.

BTW, can you put your clock to 3.0GHz and run again with 2 active cores (I'm not sure if you can disable 6 cores)

Can you upload in a server your FlopsCPU logs ?

Thanks.

I think the other guy already said he was running some stuff when running the benchmark. source below:

Actually I had ran that with discord open after streaming on youtube with Ryzen, playing World of Warcraft. The FPU program actually froze a couple times getting those results.

I will try to the 2core test when I can. It's currently queued up with tasks which I cannot cancel easily :(

Sagittaire
12th March 2017, 23:20
So im happy. I just hope games dont run too horribly.

Ryzen 7 will be between the i7-4770K and i7-4790K for games.

If you don't have a Titan X, it's a good CPU for gaming at 1080p.

You can't expect much better performance from an i7-7700K at 1080p if you have a GTX 1060 or RX 480 GPU.

CruNcher
13th March 2017, 00:16
Benchmarks are mostly done wrong, though, and purely FPS-based benchmarks especially so. Depending on the task being computed and the target to achieve, FPS is not always the correct way to benchmark at all.

Render times and render presentation times also play a big role in overall perception, and they play an even bigger role when we are talking about VR. Latency is not only FPS.

Conducting FPS-based benchmarks on realtime 3D engine workloads, and many others, is still foolish. No developer really judges anything based on the pure FPS counter that these "Professional Benchmark Reviews" mostly still show to the public.

NikosD
13th March 2017, 13:30
and here benchmark:
jfl1974.free.fr/Benchmark.zip

1080p for x264
2160p for x265

try and report your result on 8C/16T CPU:
-speed for x264 and x265
-CPU charge for x264 and x265

OK.

Here are the results of five systems:

Sandybridge Core i5 2400 6MB Cache,
Haswell Core i3 4170 3MB Cache,
Skylake Core i7 6700K 8MB Cache,
Kabylake Core i7 7700K 8MB Cache and
RyZen R7 1700 8MB + 8MB Cache *

For all the systems, I tried to find out the IPC of x264 & x265 workload using your settings, eliminating the memory bottleneck.

So, all systems share these features:

a) Win 10 x64 system
b) SMT/ Hyperthreading disabled
c) Turbo disabled (+XFR disabled for RyZen)
d) Underclocked to 3.0GHz
e) Only 2 cores active

* The Ryzen results are twofold:
one running the apps on one CCX (2+0) and the other running the apps on two CCXs (1+1).

And here comes the absolute surprise!

The 1+1 system is a little faster than 2+0 (!)

The RyZen results were sent to me by Rigaya.

One possible explanation is access to double the L3 cache: 16MB (1+1) vs 8MB (2+0).

I used also newer version of x264 (r2762) and x265 (v2.3+18 MS 2017 AVX/AVX2) than those versions included in the zip file to find out possible differences, on my personal systems (Sandybridge & Haswell)


Results x264: (r2744 - Default)


Skylake -2C/2T@3.0GHz -> 3.87 fps

Ryzen (1+1)-2C/2T@3.0GHz -> 3.49 fps

Haswell-2C/2T@3.0GHz -> 3.46 fps

Ryzen (2+0)-2C/2T@3.0GHz -> 3.42 fps

Sandy-2C/2T@3.0GHz -> 2.42 fps




Results x264: (r2762) newer version


Kabylake -2C/2T@3.0GHz -> 3.97 fps

Haswell-2C/2T@3.0GHz -> 3.37 fps

Sandy-2C/2T@3.0GHz -> 2.56 fps


Well, the results for the default r2744 version give better fps for Haswell and a much better gain of 43% over Sandybridge at the same clock.

Using the newer r2762 version of x264, the gain is smaller: 32%.
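The two percentages can be reproduced from the fps figures listed above. A minimal sketch (the helper name is illustrative):

```python
def gain_percent(new_fps: float, old_fps: float) -> float:
    """Relative speed-up of new_fps over old_fps, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


# x264 results at 2C/2T @ 3.0 GHz, taken from the tables above.
results = {
    "r2744": {"Haswell": 3.46, "Sandy": 2.42},
    "r2762": {"Haswell": 3.37, "Sandy": 2.56},
}

for build, fps in results.items():
    gain = gain_percent(fps["Haswell"], fps["Sandy"])
    print(f"{build}: Haswell over Sandy = {gain:.0f}%")
# r2744: Haswell over Sandy = 43%
# r2762: Haswell over Sandy = 32%
```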

I did an analysis using the --no-asm and --asm SIMD instructions, where SIMD means MMX2, SSE2Fast, SSSE3, SSE4.2, AVX, FMA3, AVX2, LZCNT, BMI2.

The results gave me a 10% faster Haswell using --no-asm and better gains for MMX2 and SSE2Fast than Sandybridge at the same configuration 2C/2T@3.0GHz

AVX2 adds only 6% on Haswell, but there seems to be a little inconsistency in the results of x264 sometimes, because they can give a lot different results when running the same test again.

The larger performance advantage is by far the MMX2 and SSE2Fast instruction sets.

MMX2 gives x3 the performance of --no-asm for Sandy and x3.2 for Haswell.

SSE2Fast gives another 50% for Sandy over MMX2 and 66% for Haswell.

The rest of SIMD instruction sets (SSSE3, SSE4.2, AVX, FMA3, AVX2, LZCNT, BMI2) give from 0% (FMA3, LZCNT, BMI2) on Haswell to 8% (SSSE3) on both Haswell and Sandy.

Using data from the same analysis of RyZen and Kabylake systems, it seems that they follow exactly the same pattern.

RyZen and Kabylake have both a x3.3 MMX2 gain over --no-asm and a 60% gain of SSE2Fast for RyZen over MMX2.
Kaby gains a little more on the same SIMD SSE2fast which is 70%.

SSSE3 is 8% for both and AVX2 is only 4% for Kabylake.

But AVX2 is a little slower than AVX on RyZen, and unfortunately LZCNT and BMI2 subtract even more performance from RyZen.

So, it's better to add --asm AVX, in order to disable higher SIMD sets, when using RyZen and x264 to maintain maximum performance.

From the results above I can see that Skylake is ~12% faster than Haswell due to a better SIMD architecture.

According to this table http://rigaya34589.blog135.fc2.com/blog-entry-916.html?sp it seems that Skylake has 50% faster integer add/sub and 100% faster integer mul/shifts than Haswell.

Compared to RyZen, some instructions are 4x faster, others 2x, a few less than 100% faster, and for a very few Skylake is a little slower than RyZen.

Generally speaking, Skylake's advantage over RyZen is larger in integer SIMD than in FP SIMD.

RyZen is very close to Haswell and the difference is inside the margin of error.



Now, using your version of x265 and the above four systems, I got these results:


Results x265: (v2.3+7)


Skylake-2C/2T@3.0GHz -> 0.86 fps

RyZen (1+1)-2C/2T@3.0GHz -> 0.71 fps

RyZen (2+0)-2C/2T@3.0GHz -> 0.70 fps

Haswell-2C/2T@3.0GHz -> 0.67 fps

Sandy-2C/2T@3.0GHz -> 0.44 fps

There is a good 52% gain for Haswell over Sandybridge, but less than x265 FHD benchmark of Atak that gave me a 71% gain.

Is it your version, your settings or the 4K sample ?
I don't know.

Skylake is faster than Haswell ~28% due to better architecture and faster AVX2 implementation (?), a lot better than x264 difference so I think that AVX2 implementation plays a significant role here.

Skylake is 95% (!) faster than Sandybridge at 3.0GHz.

But the real surprise here is RyZen, which manages to overtake Haswell.
It seems that 4K x265 encoding, although it has more AVX2 integer optimizations, gives RyZen better IPC relative to older Intel architectures (Sandybridge & Haswell) than 1080p x264 encoding does, but not relative to Skylake.

Or maybe AVX2-optimized x264 encoding is actually slower on RyZen than non-AVX2 settings.


I then tried optimized latest versions of x265 using MS VS 2017 for AVX and AVX2 sets from here http://msystem.waw.pl/x265/


Results x265: (v2.3+18 MS 2017 AVX/AVX2)

Haswell-2C/2T@3.0GHz -> 0.78 fps

Sandy-2C/2T@3.0GHz -> 0.48 fps


First of all, those VS 2017 optimized versions are 10% faster than yours for Sandybridge and 16% for Haswell.

Now the gain for Haswell goes to 62.5%, which is closer to the x265 FullHD benchmark results from Atak.


One last thing.
Inside the zip file there are your encoding results after running the benchmark which are ~100MB.
You could delete them and re-archive the benchmark.zip without them in order to save 100MB.

P.S

Full speed (CPU 100% utilization) at stock and overclocked settings of Skylake Core i7 6700K 4C/8T,
Kabylake Core i7 7700K 4C/8T and RyZen R7 1700 8C/16T using x264 & x265 apps.


x264 (r2744):


Ryzen R7 1700@3.85GHz r2762 (--asm AVX) 23.46 fps

Ryzen R7 1700@stock 19.07 fps

Kabylake 7700K@4.8GHz 15.36 fps

Skylake 6700K@4.7GHz 14.82 fps

Skylake 6700K@stock 13.27 fps



x265(2.3+7):


Ryzen R7 1700@stock 3.19 fps

Skylake 6700K@4.7GHz 3.17 fps

Skylake 6700K@stock 2.86 fps
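To compare the overclocked x264 rows on a per-core, per-clock basis, the fps can be divided by core count and clock speed. This is a rough sketch under stated assumptions: every core is assumed to hold the listed clock under load, the SMT benefit is lumped into the per-core figure, and the Ryzen row used a different x264 build (r2762 with --asm AVX), so the comparison is not like-for-like.

```python
def fps_per_core_ghz(fps: float, cores: int, clock_ghz: float) -> float:
    """Throughput normalised by physical core count and clock speed."""
    return fps / (cores * clock_ghz)


# (label, fps, physical cores, clock in GHz) from the overclocked x264 rows above.
rows = [
    ("Ryzen R7 1700 @3.85GHz", 23.46, 8, 3.85),
    ("Kabylake 7700K @4.8GHz", 15.36, 4, 4.8),
    ("Skylake 6700K @4.7GHz", 14.82, 4, 4.7),
]

for label, fps, cores, clock in rows:
    value = fps_per_core_ghz(fps, cores, clock)
    print(f"{label}: {value:.3f} fps per core per GHz")
```

The stock rows are left out because the clock actually sustained under load is not stated for them.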

Sagittaire
13th March 2017, 20:24
OK.

I tried your benchmark and I had some weird results using that x264 version included.
The two CPUs are Core i3-4170 & Core i5-2400.

Your version is not the latest, but a version before latest.

It's r2744 and the latest version is r2762 as you can see here:
http://download.videolan.org/x264/binaries/win64/


r2744 is a really recent version. r2762 will not change the speed results.


During testing the screen reported an fps ~3.5-4.5 fps IIRC but when finished the final line gave me 58.8 fps which is completely wrong obviously.
I then disabled HT and underclocked to 3.0GHz and gave me 9.44fps which is still huge and obviously wrong.
On the other hand, Sandybridge result at 3.0GHz with only 2 cores active gave me 1.84 fps which is very low I think.

It's simply because there are two different speeds: that of the ffmpeg frame server and that of the x264 encoder itself. The initial speed of the frame server is really high simply because the x264 encoder has an initial lookahead frame buffer.


Results x264: (r2762)
Haswell-2C/2T@3.0GHz -> 3.37 fps
Sandy-2C/2T@3.0GHz -> 2.56 fps

I was thinking that x264 is not too much AVX2 integer optimized, but I was wrong obviously, because I had seen a post of the developer a few years ago comparing Haswell vs Ivybridge with only 5% gain for Haswell at the same clock.

Yes, there are big AVX/AVX2 optimizations in x264. Certainly fewer than in x265, but really not negligible.


Are there any weird switches in x264 settings that are causing problems with benchmark tests ?

There is no problem. It's a really common command line for 1080p encoding: --preset slower --tune grain --crf 20



I then tried optimized latest versions of x265 using MS VS 2017 for AVX and AVX2 sets from here http://msystem.waw.pl/x265/

Results x265: (v2.3+18 MS 2017 AVX/AVX2)
Haswell-2C/2T@3.0GHz -> 0.78 fps
Sandy-2C/2T@3.0GHz -> 0.48 fps

I use the GCC version in the benchmark. Be careful here: the author reports that the AVX version will be faster on AVX CPUs:

All binaries do the same, so it is only about encoding speed. My recomendations are: for AVX2-CPU the fastest should be VS 2017 AVX2 version, for AVX-CPU – VS 2017 AVX version, for SSE4-CPU – VS 2017 none or GCC none version, for SSSE3-CPU – GCC SSSE3 version, for CPU without even SSSE3 – GCC none version. You can determine fastest version by comparing encoding time on the same short sample.


Anyway, it's a really interesting and complete report ... ;-)

NikosD
13th March 2017, 20:42
v2744 is really recent version. r2762 will not change result for speed.


I think it will.

If you look at the change log between the two versions, it is full of AVX2 optimizations.

So, for recent CPUs there will be difference.


it's simply because there are two different speed: ffmpeg frame server and x264 encoder itself. Initial speed for frame server is really high simply because x264 encoder have initial lookahead frame buffer.

OK.

So, where can I find the right x264 speed after running the benchmark ?

Because using your version and reading the last line of the CLI, the results are unreasonable.

Can you provide a screenshot for sure ?



yes there are big AVX/AVX2 optimisation in x264. Certainely less than in x265 but really not negligible.

Last version has surely a lot of AVX2 optimizations.

If I can find a way to measure your version according to my previous paragraph in this post, I will tell you about that too.


I use gcc version in the benchmark.

It's a little slower as you can see from Microsoft's Visual Studio 2017 AVX/AVX2 optimizations.

Sagittaire
13th March 2017, 20:50
OK.

So, where can I find the right x264 speed after running the benchmark ?

Because using your version and reading the last line of the CLI, the results are unreasonable.

Can you provide a screenshot for sure ?

At the end of the encode, the speeds of the frame server and of x264/x265 converge. You can find the final speed at the end of the encode in the x264/x265 log output (along with file size, PSNR, SSIM, etc.).

NikosD
13th March 2017, 20:53
Is there somewhere a log file that I didn't see ?

I'm not in front of my PC right now and I don't remember a log file.

Sagittaire
13th March 2017, 21:00
Is there somewhere a log file that I didn't see ?

I'm not in front of my PC right now and I don't remember a log file.

It's just at the bottom of the command-line window when the encode finishes. Don't worry, you reported the right information: ffmpeg has only x.x fps precision, while x264/x265 have x.xx precision. Moreover, the fps of the frame server and of x264/x265 will be strictly the same at the end of the encode.
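Since the trustworthy number is the encoder's own summary line rather than the frame server's running counter, it can be pulled out of captured console output. A minimal sketch, assuming the two summary formats quoted elsewhere in this thread (the function and variable names are illustrative):

```python
import re

# Matches both summary styles quoted in this thread:
#   x265: "encoded 1007 frames in 315.68s (3.19 fps), 15638.93 kb/s"
#   x264: "encoded 1649 frames, 19.07 fps, 8510.48 kb/s"
SUMMARY = re.compile(
    r"encoded\s+(\d+)\s+frames.*?([\d.]+)\s+fps\)?,\s+([\d.]+)\s+kb/s"
)


def parse_summary(line):
    """Return frames, fps and bitrate from an encoder summary line, or None."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    return {
        "frames": int(match.group(1)),
        "fps": float(match.group(2)),
        "kbps": float(match.group(3)),
    }


print(parse_summary("encoded 1007 frames in 315.68s (3.19 fps), 15638.93 kb/s"))
print(parse_summary("encoded 1649 frames, 19.07 fps, 8510.48 kb/s"))
```

Lines that do not contain an encoder summary, such as the frame server's progress output, return `None` and can simply be skipped.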

NikosD
13th March 2017, 23:02
I have updated my previous post, adding x264 r2744 results and Skylake results from a friend.

Sagittaire
14th March 2017, 13:35
@cojj

Can you test the benchmark on Ryzen?

cojj
15th March 2017, 01:54
@Sagittaire
Sorry for late reply - I'm very busy this week + my ryzen rig is doing a lot of heavy work.
I will try to get it done over the weekend.

On the side note, I'm going to France next month for business trip. I hope people are nice there :)

mandarinka
15th March 2017, 12:51
I already posted it here (https://forum.doom9.org/showpost.php?p=1800971&postcount=123), but apparently encoding performance can be different (lower) in Balanced power plan under Windows 10, compared to High Performance plan.

http://abload.de/img/ryzen_coreparking6lkdn.png

This difference (and maybe also the difference when HPET is on/off) might be worth testing if you can.

Sagittaire
15th March 2017, 13:45
I already posted it here (https://forum.doom9.org/showpost.php?p=1800971&postcount=123), but apparently encoding performance can be different (lower) in Balanced power plan under Windows 10, compared to High Performance plan.

http://abload.de/img/ryzen_coreparking6lkdn.png

This difference (and maybe also the difference when HPET is on/off) might be worth testing if you can.

Yes, the French hardware.fr preview discovered (cocorico!!!) this problem:
http://www.hardware.fr/articles/956-22/indices-performance.html

Anyway, if your CPU load is at 100%, as with x264, then it's not a problem.

However, with applications that load the CPU at less than 100%, you can get up to a 10% improvement (x265 or games, for example).

Atak_Snajpera
15th March 2017, 15:34
Updated Flops table
https://i.imgsafe.org/6da60ecb0f.png

Sagittaire
15th March 2017, 18:11
Updated Flops table
https://i.imgsafe.org/6da60ecb0f.png

and ... ???

It's just a synthetic test. In real-life applications, Ryzen outperforms Haswell, and by far. And in most cases it is on par with Broadwell-E.

Atak_Snajpera
15th March 2017, 19:11
It shows interesting data about RyZen.
Up to 8 threads RyZen 7 in integer calculation acts like SandyBridge+. When you put more load on SMTs then efficiency increases beyond SkyLake/KabyLake.
Similar story with SSE2. If you force all SMT's to work then efficiency goes above Haswell. RyZen is very uneven CPU for sure unlike SkyLake/KabyLake. Not to mention about CCX issues.

Sagittaire
15th March 2017, 20:11
It shows interesting data about RyZen.
Up to 8 threads RyZen 7 in integer calculation acts like SandyBridge+. When you put more load on SMTs then efficiency increases beyond SkyLake/KabyLake.
Similar story with SSE2. If you force all SMT's to work then efficiency goes above Haswell. RyZen is very uneven CPU for sure unlike SkyLake/KabyLake. Not to mention about CCX issues.

Well, it's not true:

1) The real measure of efficiency is not flops/Hz but flops/watt. With a calculation like that, all our phones would have Intel SoCs and not ARM SoCs.

2) Once more, in real applications like x264 (we are on doom9 after all), Ryzen outperforms Sandybridge, IvyBridge and Haswell, and by far.

http://www.hardware.fr/getgraphimg.php?id=464&n=1

And for x264 efficiency (fps/watt), the best CPU around is the R7 1700, and by far.

Atak_Snajpera
15th March 2017, 20:25
Once again you haven't understood my message. Nothing new here...

Sagittaire
15th March 2017, 20:34
Once again you haven't understood my message. Nothing new here...

Well, we are on doom9; I don't care about synthetic results in ALU/Hz and FPU/Hz. That is not even a correct measure of efficiency.

What matters is the practical results: better ALU/Hz or FPU/Hz doesn't seem to be really useful to Intel in x264 and x265 encoding, does it?

I will even generalize: this does not seem to be really useful in many applications.

Atak_Snajpera
15th March 2017, 20:40
Without this synthetic benchmark you would be still living in dream land regarding AVX2 performance. Hint: x265 likes AVX2 (256bit) alooooot.

By the way how is your x265 benchmark. Have you finally optimized it for RyZen with those crazy settings? ;)

Sagittaire
15th March 2017, 21:37
Without this synthetic benchmark you would be still living in dream land regarding AVX2 performance. Hint: x265 likes AVX2 (256bit) alooooot.


Certainly ... but the i7-6900K and Ryzen 7 1800X will be on par for x265 encoding. That's how it is, even if it's hard for you to understand.


By the way how is your x265 benchmark. Have you finally optimized it for RyZen with those crazy settings? ;)

Well, it's really simple: 2160p encoding with the --pme command-line option for 8C/16T.

Anyway, you can do a 1080p encode with 4C/8T to compare Ryzen (4+0 configuration) with an Intel 4C/8T CPU. It's really simple to do that too ... :devil:

ShogoXT
15th March 2017, 22:17
I like more data no matter what, even if it's a synthetic result.

There is news about Ryzen having a bug with FMA3 workloads. I wonder if that's why when I ran the fpu benchmarks it froze up a few times?

Also when I turn on the computer after it being off for a while, it doesn't even reach the post screen. I have to turn it off again to get it to boot. I suspect my overclock or the motherboard....

Sagittaire
15th March 2017, 22:53
and here benchmark:
jfl1974.free.fr/Benchmark.zip

1080p for x264
2160p for x265

try and report your result on 8C/16T CPU:
-speed for x264 and x265
-CPU charge for x264 and x265

And here are the results with the R7 1700 8C/16T @stock 3.0/3.2/3.7 GHz:
x265 4K with CPU load at 100%:
encoded 1007 frames in 315.68s (3.19 fps), 15638.93 kb/s

x264 2K with CPU load at 100%:
encoded 1649 frames, 19.07 fps, 8510.48 kb/s


Results with the i5 3550 4C/4T at 3.5 GHz:
x265 4K with CPU load at 100%:
1.14 fps

x264 2K with CPU load at 100%:
6.57 fps

NikosD
15th March 2017, 22:56
and here result with R7 1700 8C/16T:


Stock clocks ?

Or overclocked ?

The default/stock clock of R7 1700 8C/16T is 3.1GHz.

In x265 it has practically the same speed as the 6700K@4.7GHz, but in x264 RyZen is definitely faster.

Sagittaire
15th March 2017, 22:59
Stock clocks ?

Or overclocked ?

The default/stock clock of R7 1700 8C/16T is 3.1GHz.

at stock base/turbomin/turbomax 3.0/3.2/3.7 GHz

Sagittaire
15th March 2017, 23:10
Stock clocks ?

Or overclocked ?

The default/stock clock of R7 1700 8C/16T is 3.1GHz.

In x265 has exactly the same speed of 6700K@4.7GHz, but in x264 RyZen is definitely faster.

You can extrapolate the results easily:

Results with the R7 1700 8C/16T:
x265 4K with CPU load at 100%:
encoded 1007 frames in 315.68s (3.19 fps), 15638.93 kb/s

x264 2K with CPU load at 100%:
encoded 1649 frames, 19.07 fps, 8510.48 kb/s


Results with the R7 1700X 8C/16T (estimation for x265):
x265 4K with CPU load at 100%:
3.44 fps

x264 2K with CPU load at 100%:
encoded 1649 frames, 20.61 fps, 8510.48 kb/s


Results with the R7 1800X 8C/16T (estimation for x265):
x265 4K with CPU load at 100%:
3.61 fps

x264 2K with CPU load at 100%:
encoded 1649 frames, 21.62 fps, 8510.48 kb/s

Motenai Yoda
16th March 2017, 17:08
well it's not true:

1) the real efficiency is not flops/hz but flop/watt. With a calculation like that, all our phone would have intel soc and not arm soc.

[...]

and for x264 efficiency (fps/watt), the best CPU in the area is R7 1700 and by far.

Yep, but you have to take into account the real power consumption, not TDP. Intels never go over their TDP; AMDs do, a lot.

Well it's really simple: 2160p encoding with --pme command line for 8C/16T
why pme over pmode?

microchip8
16th March 2017, 20:40
yep but you have to take into account the real power consuption, nor tdp, intels never go over their tdp, amds a lot.


why pme over pmode?

I don't know where you get that from. The power sensors on my FX8350 CPU reports 124.5W when I run an encode using all cores. The FX8350 is a 125W CPU so what the sensor reports is accurate.

Sagittaire
16th March 2017, 21:26
why pme over pmode?

pmode disables the reflist option, and reflist is a really powerful option for optimizing speed with presets faster than veryslow. You should use pmode only with the veryslow or placebo presets.

Anyway, --preset veryslow --pme --pmode is certainly really good for optimizing encoding speed for a 1080p source.

evilr00t
16th March 2017, 23:42
pmode desactive reflist option and reflist is really powerfull option to optimisize speed for preset < veryslow. You can use pmode only with preset veryslow or placebo.

anyway --preset veryslow --pme --pmode is certainely really good to optimisize encoding speed for 1080p source.

Just... no.

CPU: E5-2695v3
1080p60 Dark Souls 3 capture:
Common flags: --preset veryslow --qg-size 8 --crf 24.5 --lambda-file (new file from x265 thread), --frames 500 --threads 28
x265 stock : 2.58 fps (100%)
x265 --pmode : 2.43 fps (94.2%, slower)
x265 --pme : 2.54 fps (98.4%, slower)
x265 --pmode --pme: 2.29 fps (88.8%, slower)

CPU use graph:
http://imgur.com/a/V18RK

pmode+pme can't even saturate this CPU, but as you can see, it slows down the encode. More work doesn't mean it runs faster, especially on SMT systems where going over 50% means you are slowing something else down.

Here's a retest without any custom settings:

Z:\>a:\avs4x265 -P z:\x265_main.exe --preset veryslow --frames 750 --pools 28 [--pme] [--pmode] --crf 24.5 -o NUL DS3.avs

stock : encoded 750 frames in 305.89s (2.45 fps), 6178.28 kb/s, Avg QP:32.15 - 100%
--pmode : encoded 750 frames in 335.43s (2.24 fps), 5985.25 kb/s, Avg QP:32.19 - 91.4%
--pme : encoded 750 frames in 311.13s (2.41 fps), 6178.28 kb/s, Avg QP:32.15 - 98.3%
--pmode --pme: encoded 750 frames in 354.57s (2.12 fps), 5985.25 kb/s, Avg QP:32.19 - 86.5%

x265 [info]: HEVC encoder version 2.3+17-6e348252e902
x265 [info]: build info [Windows][GCC 6.3.0][64 bit] 8bit
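The relative speeds in the retest can be recomputed from the frame counts and elapsed times in the log lines above. A minimal sketch (names are illustrative); because it uses the unrounded times rather than the rounded fps values, some results differ from the posted percentages by a few tenths of a percent:

```python
# (frames, elapsed seconds) taken from the retest log lines above.
runs = {
    "stock": (750, 305.89),
    "--pmode": (750, 335.43),
    "--pme": (750, 311.13),
    "--pmode --pme": (750, 354.57),
}


def relative_speed(runs, baseline="stock"):
    """Speed of each run as a percentage of the baseline run's fps."""
    base_frames, base_secs = runs[baseline]
    base_fps = base_frames / base_secs
    return {
        name: (frames / secs) / base_fps * 100.0
        for name, (frames, secs) in runs.items()
    }


for name, pct in relative_speed(runs).items():
    print(f"{name:15s} {pct:6.1f}% of stock")
```

Every variant comes out below 100%, which matches the conclusion of the post: on this 1080p source, neither switch speeds up the veryslow preset.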

mandarinka
17th March 2017, 15:15
I don't know where you get that from. The power sensors on my FX8350 CPU reports 124.5W when I run an encode using all cores. The FX8350 is a 125W CPU so what the sensor reports is accurate.

Indeed. If you measure the 12V CPU line power, you can confirm this. It doesn't go over TDP at default settings (unless you disable TDP limits in the BIOS and so on). The wattage/amperage on the power supply line will be somewhat higher than 125 W when you measure it, but that is because you are measuring before the voltage regulators that convert 12V to Vcore. That conversion creates extra power consumption (on Intel and AMD platforms alike), as evidenced by the VRMs generating substantial heat. The losses here can be 15-20 % added on top of the TDP of the CPU. And of course, losses in the motherboard VRM don't count towards the CPU's TDP, neither with Intel nor with AMD.
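As a worked example of that point, here is what a 125 W CPU would draw on the 12V line if the VRM losses are simply added on top of the CPU power. This is a sketch of one reading of the post; the function name is illustrative and real VRM efficiency varies with board and load.

```python
def twelve_volt_draw(cpu_watts: float, vrm_loss_fraction: float) -> float:
    """Power measured on the 12V CPU line when VRM losses,
    expressed as a fraction of CPU power, are added on top."""
    return cpu_watts * (1.0 + vrm_loss_fraction)


# 125 W CPU with the 15-20 % loss range mentioned in the post.
for loss in (0.15, 0.20):
    watts = twelve_volt_draw(125.0, loss)
    print(f"{loss:.0%} VRM loss: about {watts:.0f} W on the 12V line")
```

So a reading of roughly 144 to 150 W at the 12V connector is still consistent with the CPU itself staying at its 125 W limit.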

NikosD
21st March 2017, 12:50
Updated Flops table


Judging by that extremely optimized application for finding the maximum actual flops as close as possible to max theoretical flops that I posted here https://forum.doom9.org/showthread.php?p=1801309#post1801309, your table based on Intel optimized Flops.c might not be accurate.

The application is famous now as the "FMA3 bug" of RyZen, but TheStilt has already managed to run it on RyZen 1700 and posted the results here:
http://forum.hwbot.org/showpost.php?p=480934&postcount=34

According to the results of that app, the FMA3 implementation of RyZen is extremely efficient (~100% of theoretical flops) with Haswell optimized binary for 128bit SSE/SSE2 and 256bit AVX/FMA3 using FP32 and FP64, compiled by MS Visual Studio 2015.

Now, if you see the FP64 results, the FADD and FMUL tests have the same performance for both 128bit and 256bit (SSE2 vs AVX1) using that app.

BUT using FADD+FMUL (not FMA which is something different), RyZen almost doubles (~77%) the performance of FP64 compared to FADD, reaching the results of FMA3 - only ~28% difference.

So, my suggestion is to recompile flops.c using MS Studio 2017 for SSE2/AVX/AVX2-FMA3 instructions and run it again on RyZen.

I don't know if it will be faster than ICC in absolute numbers, but I think we will see different results in relative numbers between different instruction sets SSE2/AVX/AVX2-FMA3.

I will post my binaries of flops.c compiled by MS Studio 2013 and your GUI, in order to be tested by a RyZen user, although a more recent version of MS Studio like 2017 could make a difference.

NikosD
21st March 2017, 13:36
@All RyZen owners.

OK, so this is the GUI of Atak but with my compilations of a little old MS Visual Studio 2013.

You can get it here:
https://mega.nz/#!s9cFXA6a!q-co5cemP-ZaeLFdxcNgVGtgPPUVSczBOLUZRVLEXak

After running the app, please go to menu "Main" - > "Save screenshot" and upload the image of your results.

It should look something like that:
https://s11.postimg.org/wablgq6df/Core_i3_4170_3_0_GHz_Cache_3_0_GHz_HT_OFF.jpg

Sagittaire
22nd March 2017, 01:14
for R7 1700 ...

http://reho.st/self/976028f757a85dedaf36b369bc991f96df523975.jpg

burfadel
22nd March 2017, 02:16
for R7 1700 ...

http://reho.st/self/976028f757a85dedaf36b369bc991f96df523975.jpg

Interesting how the multi-core result on Ryzen is in all cases more than 8 times the single-core result. This suggests that the core was not fully saturated in the single-core test, so the SMT threads become beneficial when running the multi-core test.

If you scale back the results from multi-core to what single core should read, if the whole core was saturated:

x86: 64.13 MFLOPS
x87: 3.06 GFLOPS
SSE2: 4.91 GFLOPS
AVX: 4.99 GFLOPS
AVX2: 7.69 GFLOPS

It suggests something isn't quite right somewhere :)

Atak_Snajpera
22nd March 2017, 13:37
For comparison, binaries compiled by Intel Compiler 15
Ryzen@4GHz and RyZen@3.7GHz
http://i.imgur.com/a44Bmnn.jpg http://i.imgur.com/UHgQYiL.jpg

Intel compiler once again generates the fastest code for AMD :) This clearly debunks any conspiracy theories that Intel compiler favours intel cpus.

Sagittaire
22nd March 2017, 14:05
for comparision binaries compiled by Intel Compiler 15
Ryzen@4GHz and RyZen@3.7GHz
http://i.imgur.com/a44Bmnn.jpg http://i.imgur.com/UHgQYiL.jpg

Intel compiler once again generates the fastest code for AMD :) This clearly debunks any conspiracy theories that Intel compiler favours intel cpus.

http://img4.hostingpics.net/pics/904693kabini.png

Not so far from Jaguar or the i3-4170 ... :devil:

This benchmark doesn't seem to work really well: in practice, the R7-1800X is on par with the i7-6900K in all heavy applications (x264, x265, 3D calculation, etc.)

NikosD
22nd March 2017, 14:53
for R7 1700 ...


Intel compiler once again generates the fastest code for AMD :) This clearly debunks any conspiracy theories that Intel compiler favours intel cpus.

Not exactly the results that I was expecting from MS VC 2013.
I get similar gains with Sandy and Haswell like RyZen.

Probably MS VC 2013 compiler is too old to vectorize flops.c for SSE2/AVX/AVX2-FMA3.

Based on the fact that MS VC 2017 makes the fastest executable for x265, even faster than Intel's compiler and GCC, someone with MS Visual Studio 2017 should try to make a fast executable of flops.c in order for us to test any differences in relative numbers.

It seems that flops.c is better optimized by Intel's compiler and the autovectorizer.

Now, regarding that myth of Intel's compiler, it wasn't a myth.

Probably Intel after paying a multi-million $ penalty to AMD, after cheating and manipulating in various ways compilers, OEMs etc and grabbing a CPU share that is worth a lot more than the penalty itself, could now play fair (at least more than in the past)

But we don't compare in this example different compilers to find the faster one, but only to find specific differences in instructions sets.

The truth is that RyZen mimics a lot Intel's HW and can manage to run fast enough executables optimized for Intel, BUT that doesn't mean that it is optimized for RyZen.

The other guy with that Haswell optimized "FMA3 bug" executable had managed to optimize 128bit/256bit FP32 and FP64 flops app for Bulldozer and Piledriver compiled by MS Visual Studio 2015, having a lot of asm optimizations for those architectures.

I think we have to wait for a RyZen optimized FLOPS app by him.

burfadel
22nd March 2017, 14:54
http://img4.hostingpics.net/pics/904693kabini.png

not so far than Jaguar or i3-4170 ... :devil:

This benchmark seem dont work really well: in practice, R7-1800X is on par with i7-6900K for all heavy application (x264, x265, 3D-calculation ... etc etc)

Yeah, something is bung there. 8 cores should essentially give 8 times the performance of a single core; the SMT threads don't count because they only allow a parallel process to use otherwise unused core resources. If the core is saturated, they serve no purpose.

Your CPU scales correctly, the 4.2 is probably because there was some other process that momentarily ran during the single core test.

Atak_Snajpera
22nd March 2017, 16:01
Based on the fact that MS VC 2017 makes the fastest executable for x265, even faster than Intel's compiler and GCC, someone with MS Visual Studio 2017 should try to make a fast executable of flops.c in order for us to test any differences in relative numbers.
Why don't you check yourself?
https://www.visualstudio.com/downloads/