
Thoughts on Xeon Phi for x264?


Blue_MiSfit
16th November 2012, 07:54
Hello Folks!

I was reading about Xeon Phi on AnandTech the other day, and it's an interesting concept:

http://www.anandtech.com/show/6451/the-xeon-phi-at-work-at-tacc
I was wondering if any of our beloved x264 devs have considered whether or not this is a suitable architecture for x264. I'd imagine it suffers from at least some of the same pitfalls as GPUs when it comes to video compression?

Thoughts?

Dust Signs
16th November 2012, 08:16
Hi,

I'm not an x264 dev, but I suppose that the following (taken from the link you provided) is really bad for x264 in terms of its existing optimizations:
The SIMD unit does not support MMX, SSE or AVX: the Xeon Phi has its own vector format.

Best regards
Dust Signs

nm
16th November 2012, 13:31
The biggest pitfall is that Xeon Phi's 512-bit wide SIMD unit only supports 32-bit integer arithmetic, not 8- and 16-bit. Haswell (with 256-bit AVX2) will probably beat it in H.264 encoding performance and only needs simple modifications to current x264 code.
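To put nm's point in numbers, what matters is how many pixel-sized values one instruction handles. The sketch below assumes x264's pixel math works mostly on 8-bit samples and 16-bit intermediates, and that a 32-bit-only unit must widen every sample to 32 bits first. The helper name is hypothetical; it just divides register width by element width.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of integer elements that fit in one SIMD register."""
    return register_bits // element_bits


# First-generation Xeon Phi: 512-bit vectors, but only 32-bit integer lanes,
# so 8-bit pixels have to be widened to 32 bits before any arithmetic.
phi = lanes(512, 32)

# AVX2 (Haswell): 256-bit vectors with native 8- and 16-bit integer lanes.
avx2_8bit = lanes(256, 8)
avx2_16bit = lanes(256, 16)

print(f"Xeon Phi, 32-bit lanes: {phi}")
print(f"AVX2, 8-bit lanes:      {avx2_8bit}")
print(f"AVX2, 16-bit lanes:     {avx2_16bit}")
```

So despite twice the register width, the Phi unit would process no more values per instruction than AVX2 does on 16-bit data, and half as many as AVX2 on 8-bit data, before counting the extra widen/narrow instructions.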

Dark Shikari
16th November 2012, 20:53
Tests years ago showed Larrabee to be incredibly inefficient -- using significantly more power than a Core 2 (the best Intel CPU at the time) to achieve much lower performance with x264. Even if thread scaling across 50 cores was perfect, it would have still been ~1/3 slower for much greater power usage. This was with SIMD rewritten for Larrabee.

Larrabee had a slow issue rate (1 instruction per cycle), a slow clock rate, and not nearly enough cores to compensate for this. With ~3-4x slower issue rate than a Core chip and ~3-4x slower clock rate, you're already a full order of magnitude slower per core even without considering other factors. This is the classic problem non-GPU many-core systems (Tilera, Larrabee, etc.) seem to have -- getting dozens of cores on one chip seems to require making each core so slow as to be useless.
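The "order of magnitude" above comes from multiplying the two ratios. A quick sketch using only the rough 3-4x figures from the post (estimates, not measurements) and ignoring memory, cache, and every other factor:

```python
# Rough ratios quoted above: a Core chip issues ~3-4x more instructions per
# cycle and runs at a ~3-4x higher clock than a Larrabee core.
issue_ratio = (3, 4)
clock_ratio = (3, 4)

# Per-core slowdown is the product of the two ratios, so the low and high
# ends of the range come from multiplying the matching bounds.
low = issue_ratio[0] * clock_ratio[0]
high = issue_ratio[1] * clock_ratio[1]

print(f"Per-core slowdown: ~{low}x to ~{high}x")
```

In other words, by this crude model a many-core chip needs roughly 9 to 16 such cores just to match one Core core, before any threading overhead.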

Unless the Xeon Phi has massively improved on this, it will likely pale in comparison to modern CPUs, at least on x264.

Blue_MiSfit
17th November 2012, 06:56
That's exactly the kind of commentary I was looking for. It was also exactly what I had feared :) Thanks, folks!

iwod
17th November 2012, 12:34
Better to just wait for Haswell. Hopefully we'll see a double-digit performance gain.

Hiritsuki
6th January 2014, 19:19
The second-generation Xeon Phi will support AVX, so maybe x264 and x265 can run faster on it than on a usual CPU.

mandarinka
6th January 2014, 21:57
Not just that, the 14nm Phis will have Silvermont CPU cores (but with an extra AVX-512 unit), and are supposed to support every instruction that Haswell supports, except for TSX.

The performance of the thing will likely be very interesting (and ZOMG, integrated memory!), but per-thread performance will still not be that high, and there will be something like 60-80 cores, which means that you might have trouble utilizing them all (= 120 threads in x264). Compared to a 12-core Xeon with HT (36 threads), the threading penalty might become significant.
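The thread counts quoted here follow x264's automatic setting, which (when --threads is left on auto) uses about 1.5 frame threads per logical CPU. A minimal sketch of that arithmetic; the function name is hypothetical, and it ignores any upper limit a particular x264 build places on the thread count:

```python
def x264_auto_threads(cores: int, threads_per_core: int = 1) -> int:
    """Approximate x264's automatic frame-thread count (1.5x logical CPUs)."""
    logical_cpus = cores * threads_per_core
    # Integer arithmetic: multiply by 3, then divide by 2.
    return logical_cpus * 3 // 2


print(x264_auto_threads(80))     # 80 cores, counting one thread per core
print(x264_auto_threads(12, 2))  # 12-core Xeon with Hyper-Threading
print(x264_auto_threads(72, 4))  # 72 cores with 4-way threading
```

These reproduce the 120, 36, and 432 figures mentioned in this thread.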

benwaggoner
7th January 2014, 00:13
For high resolutions and slow, high quality encoding, I've been able to get 100% CPU saturation on 2x8 core (32 logical cores) Ivy Bridge systems. So we can scale that far at least.

For offline encoding with lots of fast memory and fast read speeds, frame-based parallelism keeps the theoretical limit of CPU scalability really high. For example, --preset veryslow with --rc-lookahead 120 can have a whole lot happening at the same time.

mandarinka
7th January 2014, 19:47
Actually, wait a moment. I think that even with the Silvermont uarch, they will keep the 4-way threading, so multiply the number of threads it exposes by 4 :) If the 72-core config is the maximum, that chip will expose 288 virtual cores (which leads to 432 software threads under x264's model).

At least their slide says so. (http://extrahardware.cnews.cz/galerie/pictures/novinky/ehw/2013/11listopad/slajdy-k-xeonum-phi-14nm-generace-knights-landing-%28vr-zone%29/slajdy_k_xeonum_phi_14nm_generace_knights_landing_vr-zone_02.png)

Well, I think that if this thing is going to be useful with x264, it is going to be with multiple (several) streams being encoded simultaneously, on relatively fast settings. Dunno if the clock (1.0-1.3 GHz, I'd guess) and IPC of the Silvermont cores (augmented by two AVX-512 units) are going to be enough to sustain several HD/4K encodes at once, even with fast settings.
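One way to picture the multi-stream approach: split the chip's logical CPUs evenly across N simultaneous encodes and give each x264 instance an explicit --threads value. This is a sketch under stated assumptions: the 288-logical-CPU figure is the speculative maximum from the slide, the helper name is hypothetical, and the even split ignores memory bandwidth and whether each encode can actually use its share.

```python
def per_stream_threads(logical_cpus: int, streams: int) -> list[int]:
    """Split logical CPUs as evenly as possible across concurrent encodes.

    Returns one thread count per stream (e.g. for each x264 instance's
    --threads option). The counts always sum to logical_cpus.
    """
    if streams <= 0 or streams > logical_cpus:
        raise ValueError("need 1 <= streams <= logical_cpus")
    base, extra = divmod(logical_cpus, streams)
    # The first `extra` streams each get one additional thread.
    return [base + 1 if i < extra else base for i in range(streams)]


# Hypothetical 72-core part with 4-way threading (288 logical CPUs), 8 streams.
for n, t in enumerate(per_stream_threads(288, 8), start=1):
    print(f"stream {n}: x264 --threads {t} ...")
```

This gives each encode one thread per logical CPU in its share; oversubscribing each share instead is equally plausible, and which works better would have to be measured.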

iwod
17th January 2014, 05:12
Sorry, Google didn't help. Where is all this 2nd-gen Xeon Phi information coming from? I don't see it on AnandTech.

Selur
17th January 2014, 08:38
@iwod: mandarinka links to: http://extrahardware.cnews.cz/

mandarinka
19th January 2014, 01:55
Here is an english site (source of the slides): http://vr-zone.com/articles/xeon-phi-knights-series-continues-landing-2015/64112.html

I don't know if the text is a correct/reasonable interpretation; I was too lazy to read it just now... but what is on the slides themselves is supposed to come directly from Intel.