
View Full Version : x265 HEVC Encoder



LigH
22nd August 2017, 13:32
Interesting... I wonder whether it is technically still as powerful as x265 (that is, whether GPGPU algorithms can be as complex as CPU algorithms), and also whether it is still allowed to be named something-with-x265. Just curious...

burfadel
22nd August 2017, 15:07
GPU Fork - x265 HEVC OpenCL or CUDA Encoder

gcc>=7 , -O2 -fopenacc

https://bitbucket.org/vovagubin/x265-hevc-opencl-or-cuda-encoder

OpenCL would make more sense; CUDA is proprietary and NVIDIA-only.

JohnLai
22nd August 2017, 17:44
GPU Fork - x265 HEVC OpenCL or CUDA Encoder

gcc>=7 , -O2 -fopenacc

https://bitbucket.org/vovagubin/x265-hevc-opencl-or-cuda-encoder


Hmmm... processes that could be offloaded...

Motion search + estimation? All of the big three GPU vendors have some sort of motion search/estimation code available.

Maybe lookahead (it didn't work well in x264), but somehow NVIDIA is able to implement it using CUDA cores in conjunction with its proprietary NVENC. The code isn't available from NVIDIA... hmmm... but they are able to do it fast...

Maybe SAO? Some form of low-complexity SAO like what NVIDIA used in Pascal?

Or offloading the adaptive quantization calculation, like NVIDIA did?


*Gotta hand it to the NVIDIA software engineers... they are leading in GPGPU usage for video acceleration with CUDA...

easyfab
22nd August 2017, 18:12
Or using FEI from Intel, if it comes to HEVC.
I don't know whether it would be interesting to combine with x264/x265.

from https://github.com/01org/intel-vaapi-driver/issues/228

"The main highlight of FEI is the possibility to split the encoding process into two phases, first is ENC and the second is PAK. ENC is the operation which performs all motion vector calculation and prediction. PAK is doing all transformations and entropy coding. Without having FEI, the whole ENC+PAK is a black box to middleware, but with FEI user can extract the output of ENC and feed PAK with a custom enhanced motion vectors and macroblock prediction modes."

x265_Project
22nd August 2017, 23:12
Interesting... I wonder ... if it is allowed to be still named something-with-x265.
No. Anyone can fork a GPL software project, but they can't copy a trademark.

x265_Project
22nd August 2017, 23:16
or using FEI from intel if it come to HEVC .
I don't know if this could be interesting to mix with x264/X265

from https://github.com/01org/intel-vaapi-driver/issues/228

"The main highlight of FEI is the possibility to split the encoding process into two phases, first is ENC and the second is PAK. ENC is the operation which performs all motion vector calculation and prediction. PAK is doing all transformations and entropy coding. Without having FEI, the whole ENC+PAK is a black box to middleware, but with FEI user can extract the output of ENC and feed PAK with a custom enhanced motion vectors and macroblock prediction modes."
Intel's Flexible Encoder Interface isn't the right way to go. They offer lower-level OpenCL libraries that we've looked at to do these functions. Keep in mind that if we replace a whole section of code with a hardware encoder's functionality, we end up with a very different thing.

x265_Project
22nd August 2017, 23:26
Hmmm......process that could be offloaded.....

Motion search + estimation? All big three gpu have some sort of motion search/estimation code available.

Maybe lookahead (didn't work well in x264), but somehow nvidia is able to implement it using CUDA cores in conjunction with its proprietary NVENC. The code ain't available from nvidia....hmmm....but being able to do it fast.....

Maybe SAO? Some form of low complexity SAO like what nvidia used in Pascal?

Or offloading adaptive quantization calculation like that nvidia did?


*Gotta hand it off to Nvidia software engineers.....they are leading.......in GPGPU usage for video acceleration part....with its CUDA....
The challenge is speed. If you offload small chunks of work to a GPU, the CPU won't have to do that work, so it can effectively speed up. But if you don't get the result of those tasks back from the GPU before they're needed, you won't accelerate.
GPUs are very good at work that can be highly parallelized, and not good at work that has serial dependencies. Video encoding has many serial dependencies. The block you're encoding right now makes reference to neighboring blocks, or blocks in other frames, all of which must be completely finished encoding before you can efficiently encode the current block.
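
To make the serial-dependency point concrete, here is a toy model (not x265 code). It assumes each block depends on its left neighbour and on the block above and to the right of it, which is the pattern used by HEVC wavefront processing, that every block takes one time unit, and that there are unlimited workers. The function names and grid size are illustrative only.

```python
def wavefront_finish_times(rows, cols):
    """Earliest finish time of every block in a rows x cols grid.

    Toy assumptions: each block costs 1 time unit, there are unlimited
    workers, and a block may start only after its left neighbour and the
    block above-right of it (clamped to the last column) have finished.
    """
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = finish[r][c - 1] if c > 0 else 0
            above_right = finish[r - 1][min(c + 1, cols - 1)] if r > 0 else 0
            finish[r][c] = 1 + max(left, above_right)
    return finish


def ideal_speedup(rows, cols):
    """Total work divided by the critical-path length under the toy model."""
    makespan = wavefront_finish_times(rows, cols)[-1][-1]
    return rows * cols / makespan


if __name__ == "__main__":
    # A 1080p frame with 64x64 blocks is roughly 17 rows x 30 columns.
    print(ideal_speedup(17, 30))
```

Even with unlimited workers, the model's speedup for that grid is only about 8x, because the dependency chain, not the amount of hardware, sets the limit.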

x265_Project
23rd August 2017, 06:11
Be sure to vote for x265 in the Streaming Media Reader's Choice poll... http://www.streamingmedia.com/ReadersChoice/2017/Vote.aspx

Ma
23rd August 2017, 08:56
I've made 10-bit "true placebo" 2500 kb/s test with Sintel movie (from 8-bit 4K png source).

Command line:
f:\speed\Sintel>ffmpeg -framerate 24 -start_number 00000001 -i %08d.png -pix_fmt yuv420p16 -vf "scale=1920:-4:flags=bicubic+accurate_rnd+full_chroma_int+full_chroma_inp:param0=-0.5:param1=0.25,setsar=1" -v warning -strict -1 -f yuv4mpegpipe - | x265 -D10 --bitrate 2500 -I480 --psnr --ssim -p9 --no-psy-rd --multi-pass-opt-distortion --rc-lookahead 120 --bframes 12 --ref 6 --subme 7 -F1 --y4m - w1.hevc --pass 1
[yuv4mpegpipe @ 00000000005603a0] Warning: generating non standard YUV stream. Mjpegtools will not work.
y4m [info]: 1920x816 fps 24/1 i420p16 sar 1:1 unknown frame count
raw [info]: output file: w1.hevc
x265 [info]: HEVC encoder version 2.5+11-d58761d8db4a
x265 [info]: build info [Windows][MSVC 1911][64 bit] 10bit+8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [warning]: --psnr used with psy on: results will be invalid!
x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 1 / wpp(13 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 4 inter / 4 intra
x265 [info]: ME / range / subpel / merge : star / 92 / 7 / 5
x265 [info]: Keyframe min / max / scenecut / bias: 24 / 480 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 120 / 12 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 6 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : ABR-2500 kbps / 0.60
x265 [info]: tools: rect amp rd=6 rdoq=2 psy-rdoq=1.00 tskip signhide tmvp
x265 [info]: tools: b-intra strong-intra-smoothing deblock sao stats-write
x265 [info]: frame I: 187, Avg QP:16.78 kb/s: 16314.61 PSNR Mean: Y:51.527 U:54.999 V:54.462 SSIM Mean: 0.994805 (22.844dB)
x265 [info]: frame P: 5475, Avg QP:18.42 kb/s: 6150.87 PSNR Mean: Y:50.783 U:54.985 V:54.451 SSIM Mean: 0.993361 (21.779dB)
x265 [info]: frame B: 15650, Avg QP:23.87 kb/s: 992.31 PSNR Mean: Y:50.160 U:56.112 V:55.537 SSIM Mean: 0.992487 (21.242dB)
x265 [info]: Weighted P-Frames: Y:10.6% UV:7.3%
x265 [info]: Weighted B-Frames: Y:8.8% UV:5.7%
x265 [info]: consecutive B-frames: 17.7% 9.1% 8.2% 42.3% 6.2% 9.2% 3.2% 2.5% 0.4% 0.2% 0.2% 0.2% 0.6%

encoded 21312 frames in 135210.11s (0.16 fps), 2451.98 kb/s, Avg QP:22.41, Global PSNR: 51.632, SSIM Mean Y: 0.9927318 (21.386 dB)

f:\speed\Sintel>ffmpeg -framerate 24 -start_number 00000001 -i %08d.png -pix_fmt yuv420p16 -vf "scale=1920:-4:flags=bicubic+accurate_rnd+full_chroma_int+full_chroma_inp:param0=-0.5:param1=0.25,setsar=1" -v warning -strict -1 -f yuv4mpegpipe - | x265 -D10 --bitrate 2500 -I480 --psnr --ssim -p9 --no-psy-rd --multi-pass-opt-distortion --rc-lookahead 120 --bframes 12 --ref 6 --subme 7 -F1 --y4m - w2.hevc --pass 2
[yuv4mpegpipe @ 00000000004403a0] Warning: generating non standard YUV stream. Mjpegtools will not work.
y4m [info]: 1920x816 fps 24/1 i420p16 sar 1:1 unknown frame count
raw [info]: output file: w2.hevc
x265 [info]: HEVC encoder version 2.5+11-d58761d8db4a
x265 [info]: build info [Windows][MSVC 1911][64 bit] 10bit+8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [warning]: --psnr used with psy on: results will be invalid!
x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 1 / wpp(13 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 4 inter / 4 intra
x265 [info]: ME / range / subpel / merge : star / 92 / 7 / 5
x265 [info]: Keyframe min / max / scenecut / bias: 24 / 480 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 120 / 12 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 6 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : ABR-2500 kbps / 0.60
x265 [info]: tools: rect amp rd=6 rdoq=2 psy-rdoq=1.00 tskip signhide tmvp
x265 [info]: tools: b-intra strong-intra-smoothing deblock sao stats-read
x265 [info]: frame I: 187, Avg QP:15.44 kb/s: 18757.12 PSNR Mean: Y:53.007 U:56.245 V:55.748 SSIM Mean: 0.996481 (24.535dB)
x265 [info]: frame P: 5475, Avg QP:19.54 kb/s: 6299.14 PSNR Mean: Y:50.557 U:55.238 V:54.725 SSIM Mean: 0.994567 (22.650dB)
x265 [info]: frame B: 15650, Avg QP:25.57 kb/s: 977.83 PSNR Mean: Y:49.518 U:56.082 V:55.533 SSIM Mean: 0.993497 (21.869dB)
x265 [info]: Weighted P-Frames: Y:3.3% UV:2.1%
x265 [info]: Weighted B-Frames: Y:1.0% UV:0.5%
x265 [info]: consecutive B-frames: 17.7% 9.1% 8.2% 42.3% 6.2% 9.2% 3.2% 2.5% 0.4% 0.2% 0.2% 0.2% 0.6%

encoded 21312 frames in 124205.99s (0.17 fps), 2500.86 kb/s, Avg QP:23.93, Global PSNR: 51.261, SSIM Mean Y: 0.9937978 (22.075 dB)

The result is watchable but encoding speed is a bit too slow.
Result movie -- www.msystem.waw.pl/x265/sintel2500.mkv
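
For a sense of scale, the figures in the two logs above can be turned into wall-clock time with a few lines of arithmetic. The frame count, the encode durations, and the 24 fps rate are taken from the logs; the helper names are made up for this sketch.

```python
FRAMES = 21312          # frames reported by both passes
SOURCE_FPS = 24         # from "-framerate 24" / "fps 24/1"
PASS_SECONDS = (135210.11, 124205.99)  # pass 1 and pass 2 encode times


def encode_fps(frames, seconds):
    """Average encoding speed in frames per second."""
    return frames / seconds


def slowdown_vs_realtime(frames, source_fps, seconds):
    """How many times longer the encode took than the clip's running time."""
    return seconds / (frames / source_fps)


if __name__ == "__main__":
    total = sum(PASS_SECONDS)
    for n, secs in enumerate(PASS_SECONDS, start=1):
        print(f"pass {n}: {encode_fps(FRAMES, secs):.2f} fps, {secs / 3600:.1f} h")
    print(f"both passes: {total / 86400:.1f} days, "
          f"{slowdown_vs_realtime(FRAMES, SOURCE_FPS, total):.0f}x slower than real time")
```

That works out to roughly three days of encoding for a clip just under 15 minutes long.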

Sagittaire
23rd August 2017, 13:58
AMD Threadripper 1950X in x265
https://cubeupload.com/im/bKv6yQ.png

Source -> https://youtu.be/TJiP1bKxLkU?t=3m43s

well, with 4K encoding with this command line:

ffmpeg\ffmpeg.exe -i Sample\Exodus_UHD_HDR_Exodus_draft.mp4 -an -f rawvideo - | x265\x265.exe --input-res 3840x2160 --fps 23.976 - -o Output\x265_2160p.265 --input-depth 10 --output-depth 10 --crf 24 --preset medium --tune grain


|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| CPU | x264 | x265 | LAVC | auto | MMX2 | SSE | SSE2 | SSE3 | SSE4 | AVX | AVX2 | All |
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| Core i9-7900X | 32.97 | 6.24 | 157 | 4.70 | 1.57 | 1.58 | 2.50 | 2.74 | 4.07 | 3.98 | 4.82 | N/A |
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| Threadripper 1950X | 37.59 | 6.50 | 136 | 4.65 | 2.02 | 2.00 | 2.95 | 3.10 | 3.89 | 4.10 | 4.26 | N/A |
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|



With the Core i9-7900X @ 4.5 GHz and the 1950X @ stock.

At stock, the Core i9-7900X would be at 27.97 fps for x264 and 5.29 fps for x265.

As I always say, encoding with a massive number of instances is a real problem for Rysen.

Anyway, the i9-7960X (16C/32T) or i9-7980XE (18C/36T) will certainly give much better results than the 1950X for x264 and x265 encoding. But that's not the case for the i9-7900X (10C/20T), even with a massive OC to 4.5 GHz.

Atak_Snajpera
23rd August 2017, 14:25
at stock, core i9-7900X will be at 27.97 fps for x264 and 5.29 fps for x265
You do realize that the stock clock on the Core i9-7900X in practice means 4 GHz? The problem with the 1950X is that during x265 encoding the clock drops to its 3.4 GHz base, while Intel can sustain a steady 4 GHz.

As I always say, massive instance encoding is real problem for Rysen.
Poor ryzen then... I thought it was designed to be a RIPPER OF THE THREADS. Oh well another marketing lie. Nothing new.

Sagittaire
23rd August 2017, 14:37
You do realize that stock clock on core i9-7900X in practice means 4GHz? The problem with 1950x is that during x265 encoding clock drops to base 3.4 GHz while intel can sustain steady 4GHz.

In fact, Rysen doesn't use turbo for x264/x265 encoding: the 180 W TDP is a big limitation there. The 1950X even throttles below its 3.4 GHz base frequency during x264 encoding.

http://www.hardware.fr/medias/photos_news/00/54/IMG0054462.png


Poor ryzen then... I thought it was designed to be a RIPPER OF THE THREADS. Oh well another marketing lie. Nothing new.

That's how it is. You have the same problem with 7-Zip or WinRAR compression.

Moreover, running 5 x265 instances at 1080p is really massive and pointless over-threading. Certainly 2 or 3 instances are sufficient to bring the CPU load to 100%, even on the 1950X.

Atak_Snajpera
23rd August 2017, 15:54
Moreover make 5x x265 instance in 1080p is really massive and useless threading usage. Certainely that 2x or 3x instance is suffisant to have CPU charge at 100% even for 1950X.
Well, the rules are the same for all CPUs in my benchmark. Besides, Intel with fewer cores has more work per core than Threadripper, so the speed penalty should be even higher there. I'm not going to "optimize" my benchmark to show one particular CPU in a better light.
Besides, 2 extra instances should not be a big problem. Look at older 2C/2T or 4C/4T CPUs: scaling is good despite running 5 x265 instances at the same time.

44.6 fps - Intel Core i9-7900X @ 3.3GHz ( 10C / 20T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
43.6 fps - AMD Threadripper 1950X @ 3.4GHz ( 16C / 32T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
33.9 fps - Intel Core i7-5960X @ 4.4GHz^ ( 8C / 16T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
25.5 fps - AMD Ryzen 7 1700 @ 3.7GHz^ ( 8C / 16T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
24.8 fps - Intel Core i7-6700K @ 4.8GHz^ ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
23.7 fps - Intel Core i7-7700K @ 4.8GHz^ ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
23.3 fps - Intel Core i7-6700K @ 4.7GHz^ ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
22.6 fps - Intel Core i7-6700K @ 4.5GHz^ ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
18.6 fps - Intel Core i7-6600K @ 4.5GHz^ ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
16.4 fps - Intel Xeon E5-2690 @ 2.9GHz ( 8C / 16T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX
15.8 fps - Intel i7-6770HQ @ 2.6GHz ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
15.8 fps - Intel Core i5-4690K @ 4.2GHz^ ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
15.6 fps - Intel Xeon E3 1231 v3 @ 3.4GHz ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
14.7 fps - Intel Xeon E5-2670 @ 2.6GHz ( 8C / 16T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX
14.7 fps - Intel Core i7-3930K @ 3.2GHz ( 6C / 12T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX
13.9 fps - Intel Core i5-7400 @ 3.0GHz ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
13.8 fps - Intel Core i5-6500 @ 3.2GHz ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
13.5 fps - AMD Ryzen 5 1500X @ 3.5GHz ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
13.3 fps - Intel Core i5-4570S @ 3.6GHz^ ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
12.1 fps - AMD FX-8320 Eight-Core @ 4.32GHz^ ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX XOP FMA4 FMA3 LZCNT BMI1
12.0 fps - Intel Core i5-4460 @ 3.2GHz ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
11.2 fps - Intel Core i7-3770K @ 3.5GHz ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX
9.7 fps - Intel Core i3-7100 @ 3.9GHz ( 2C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
8.8 fps - AMD Ryzen 3 1300X @ 3.5GHz ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
8.3 fps - Intel Core i5-2400 @ 3.7GHz^ ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX
7.8 fps - Intel i7-3612QM @ 2.1GHz ( 4C / 8T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX
7.6 fps - Intel Core i7-7500U @ 3.5GHz^ ( 2C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
6.1 fps - Intel Celeron G3900 @ 4.0GHz^ ( 2C / 2T ) MMX2 SSE2Fast SSSE3 SSE4.2 LZCNT
5.9 fps - AMD Athlon X4 760K Quad Core @ 4.5GHz^ ( 2C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX XOP FMA4 FMA3 LZCNT BMI1
5.6 fps - Intel Pentium G3258 @ 4.2GHz^ ( 2C / 2T ) MMX2 SSE2Fast SSSE3 SSE4.2 LZCNT
5.5 fps - Intel Xeon X5470 @ 3.33GHz ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.1 Cache64
4.8 fps - Intel Core i3-3220 @ 3.3GHz ( 2C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX
4.4 fps - Intel Core2 Quad Q8200 @ 2.8GHz^ ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.1 Cache64
4.2 fps - Intel Core i3-2100 @ 3.1GHz ( 2C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX
3.7 fps - Intel Core2 Quad Q8200 @ 2.33GHz ( 4C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.1 Cache64
3.6 fps - Intel Core i3-4005U @ 1.7GHz ( 2C / 4T ) MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
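
One rough way to read the list above is to normalise each result by cores times listed clock. This is only a sketch: the three rows are copied from the list, the listed clock may differ from the clock actually sustained under load (as discussed above), and the metric itself is an assumption of this example, not part of the benchmark.

```python
# (name, fps, cores, listed clock in GHz), copied from the list above
RESULTS = [
    ("Intel Core i9-7900X", 44.6, 10, 3.3),
    ("AMD Threadripper 1950X", 43.6, 16, 3.4),
    ("AMD Ryzen 7 1700", 25.5, 8, 3.7),
]


def fps_per_core_ghz(fps, cores, ghz):
    """Throughput divided by (cores x listed clock); a crude efficiency figure."""
    return fps / (cores * ghz)


if __name__ == "__main__":
    for name, fps, cores, ghz in RESULTS:
        print(f"{name}: {fps_per_core_ghz(fps, cores, ghz):.2f} fps per core-GHz")
```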

pradeeprama
23rd August 2017, 16:33
well, with 4K encoding with this command line:




|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| CPU | x264 | x265 | LAVC | auto | MMX2 | SSE | SSE2 | SSE3 | SSE4 | AVX | AVX2 | All |
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| Core i9-7900X | 32.97 | 6.24 | 157 | 4.70 | 1.57 | 1.58 | 2.50 | 2.74 | 4.07 | 3.98 | 4.82 | N/A |
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| Threadripper 1950X | 37.59 | 6.50 | 136 | 4.65 | 2.02 | 2.00 | 2.95 | 3.10 | 3.89 | 4.10 | 4.26 | N/A |
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|



With core i9-7900X @4.5 Ghz and 1950X @stock

at stock, core i9-7900X will be at 27.97 fps for x264 and 5.29 fps for x265

As I always say, massive instance encoding is real problem for Rysen.

Anyway certainely that i9-7960X 16C/32T or i9-7980XE 18C/36T will make really better result than 1950X for x264 and x265 encoding. But it's not the case for i9-7800X 10C/20T even with massive OC at 4.5 Ghz.

If you are trying to benchmark x265 with ffmpeg, I recommend that you use libx265 that is inside ffmpeg instead of using x265 application to avoid any IO problems that occur due to system-level pipes. When doing 4K encoding, I have faced some performance issues in the past.
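
A minimal sketch of that suggestion: build an ffmpeg command that encodes with the built-in libx265 wrapper instead of piping raw video to the x265 executable. The file names are placeholders, the settings mirror the command quoted above, and 10-bit output assumes an ffmpeg build whose libx265 supports it. The helper only builds the argument list; it does not run ffmpeg.

```python
def build_libx265_cmd(src, dst, crf=24, preset="medium", tune="grain"):
    """Return an ffmpeg argument list that encodes src to dst with libx265."""
    return [
        "ffmpeg",
        "-i", src,
        "-an",                      # drop audio, as in the piped command
        "-c:v", "libx265",          # encode inside ffmpeg, no pipe involved
        "-preset", preset,
        "-tune", tune,
        "-crf", str(crf),
        "-pix_fmt", "yuv420p10le",  # 10-bit 4:2:0 output
        dst,
    ]


if __name__ == "__main__":
    # Placeholder paths; pass the list to subprocess.run() to actually encode.
    print(" ".join(build_libx265_cmd("input.mp4", "output.mkv")))
```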

Sagittaire
23rd August 2017, 21:52
Well rules are the same for all cpus in my benchmark. Besides Intel with less cores has more things to do than Threadripper. The speed penalty should be even higher there. I'm not going to "optimize" my benchmark to show one particular cpu in better light.
Besides 2 extra instances should not be a big problem. Look at older CPUs 2C/2T or 4C/4T. Scalling is good despite running 5 x 265 at the same time.


I don't know. Anyway, the results are not the same.

If you get more than 20 fps with a 4C/8T i7 at 1080p, you are certainly using a faster preset for x265 than the default "medium".

Historically, speed has always been better on Intel CPUs with the fastest presets (for x264 and x265). That's perhaps the simplest explanation.

Sagittaire
23rd August 2017, 21:55
If you are trying to benchmark x265 with ffmpeg, I recommend that you use libx265 that is inside ffmpeg instead of using x265 application to avoid any IO problems that occur due to system-level pipes. When doing 4K encoding, I have faced some performance issues in the past.

No, I use ffmpeg just for frame serving (stream decoding always takes less than 5% CPU load). I want to use specific x264 and x265 builds to test compilers and use the best possible build.

Atak_Snajpera
24th August 2017, 10:45
If you have more than 20 fps with i7 4C/8T and 1080p, certainely that you use faster profil for x265 than default "medium" preset.
Unlike your benchmark, I decided to use x265 DEFAULT settings: CRF 28 and preset medium, instead of your --crf 24 --preset medium --tune grain.

Sagittaire
24th August 2017, 20:31
Unlike to your benchmark I decided to use x265 DEFAULT settings. So CRF 28 and preset medium instead of your --crf 24 --preset medium --tune grain.

crf 24 vs crf 28, or --tune grain, are not a big problem for the speed hierarchy. Moreover, --crf 24 --preset medium --tune grain is a far more realistic setting in real world encoding. crf 28 is a really low quality level, even for 4K.

Well, multiple intensive instances definitely seem to be a big problem for Rysen and x265.

Atak_Snajpera
24th August 2017, 20:59
Moreover --crf 24 --preset medium --tune grain is by far more realistic setting in real world encoding.
Have you done a survey, so that you know how people encode? I don't think so. You are just guessing, looking into your crystal ball. There is no such thing as a "far more realistic setting in real world encoding".
Everybody encodes using completely different settings. You basically can't pick some --tune grain and a magical --crf 24 (why 24 and not 23 or 22???) and announce to the world that this is some sort of gold standard.

well multiple and intensive instance seem definitely big problem for Rysen and x265.
BTW. It is Ryzen not Rysen.

Balthazar2k4
28th August 2017, 16:46
Sorry guys, I run my 1950X at 3.9 GHz on all cores all day long without issue. I have been a staunch Intel user for over 15 years and have owned several Extreme Edition processors, including the 980X, 5960X, and 6950X. I am very pleased with the 1950X. It might not have the IPC of Intel, but the additional cores allow me to do more while I am encoding.

RanmaCanada
2nd September 2017, 06:12
GPU Fork - x265 HEVC OpenCL or CUDA Encoder

gcc>=7 , -O2 -fopenacc

https://bitbucket.org/vovagubin/x265-hevc-opencl-or-cuda-encoder

has anyone actually compiled this and tried it out yet? Just curious as to how it compares to NVENC, or if it even works.

LigH
5th September 2017, 10:01
x265 2.5+14-2718cb5dd67f (https://www.mediafire.com/file/7imced5a87fy3nd/x265_2.5%2B14-2718cb5dd67f.7z) (merge with stable)

cli: Align color primaries names to ffmpeg; Re-evaluate vbv lookahead in the encode that uses --analysis-reuse-mode load

brumsky
5th September 2017, 18:46
has anyone actually compiled this and tried it out yet? Just curious as to how it compares to NVENC, or if it even works.

I just compiled it and ran it. It doesn't seem to be any faster than the normal x265. I didn't wait for it to complete to check the quality; it was just a very simple speed test. I'm running a 1080 Ti with the latest drivers.

It doesn't look like the help section of the tool has been updated yet. I can't find anything specific to enable CUDA or OpenCL support.

I think we need more info about how to use it at this point.

I did find the author added a new preset called ultraslow but it doesn't work for me.

x265_Project
9th September 2017, 09:57
You'll love this, x265 fans...
http://blog.beamr.com/2017/09/08/x265-beamr-5-epic-face-off/

Be sure to download the video files, and compare them side by side. Let us know whose video you prefer.

media2.beamrvideo.com/media/blog1709/fastest/4k/aerial/beamr.265
media2.beamrvideo.com/media/blog1709/fastest/4k/aerial/x.265
media2.beamrvideo.com/media/blog1709/fastest/4k/bar/beamr.265
media2.beamrvideo.com/media/blog1709/fastest/4k/bar/x.265
media2.beamrvideo.com/media/blog1709/fastest/4k/dinner/beamr.265
media2.beamrvideo.com/media/blog1709/fastest/4k/dinner/x.265
media2.beamrvideo.com/media/blog1709/fastest/4k/driving/beamr.265
media2.beamrvideo.com/media/blog1709/fastest/4k/driving/x.265
media2.beamrvideo.com/media/blog1709/fastest/4k/pierseaside/beamr.265
media2.beamrvideo.com/media/blog1709/fastest/4k/pierseaside/x.265
media2.beamrvideo.com/media/blog1709/fastest/4k/ritualdance/beamr.265
media2.beamrvideo.com/media/blog1709/fastest/4k/ritualdance/x.265
media2.beamrvideo.com/media/blog1709/fastest/4k/tango/beamr.265
media2.beamrvideo.com/media/blog1709/fastest/4k/tango/x.265
media2.beamrvideo.com/media/blog1709/fastest/4k/windnature/beamr.265
media2.beamrvideo.com/media/blog1709/fastest/4k/windnature/x.265
media2.beamrvideo.com/media/blog1709/medium/4k/aerial/beamr.265
media2.beamrvideo.com/media/blog1709/medium/4k/aerial/x.265
media2.beamrvideo.com/media/blog1709/medium/4k/bar/beamr.265
media2.beamrvideo.com/media/blog1709/medium/4k/bar/x.265
media2.beamrvideo.com/media/blog1709/medium/4k/dinner/beamr.265
media2.beamrvideo.com/media/blog1709/medium/4k/dinner/x.265
media2.beamrvideo.com/media/blog1709/medium/4k/driving/beamr.265
media2.beamrvideo.com/media/blog1709/medium/4k/driving/x.265
media2.beamrvideo.com/media/blog1709/medium/4k/pierseaside/beamr.265
media2.beamrvideo.com/media/blog1709/medium/4k/pierseaside/x.265
media2.beamrvideo.com/media/blog1709/medium/4k/ritualdance/beamr.265
media2.beamrvideo.com/media/blog1709/medium/4k/ritualdance/x.265
media2.beamrvideo.com/media/blog1709/medium/4k/tango/beamr.265
media2.beamrvideo.com/media/blog1709/medium/4k/tango/x.265
media2.beamrvideo.com/media/blog1709/medium/4k/windnature/x.265
media2.beamrvideo.com/media/blog1709/medium/4k/windnature/beamr.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/aerial/beamr.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/aerial/x.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/bar/beamr.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/bar/x.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/dinner/beamr.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/dinner/x.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/driving/beamr.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/driving/x.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/pierseaside/beamr.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/pierseaside/x.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/ritualdance/beamr.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/ritualdance/x.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/tango/beamr.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/tango/x.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/windnature/beamr.265
media2.beamrvideo.com/media/blog1709/veryslow/4k/windnature/x.265

Atak_Snajpera
9th September 2017, 10:40
Cool but where is download link to some trial version at least?

Selur
9th September 2017, 11:00
Took a quick look at some of the veryslow encodes.
(Repacked the streams into MP4 and used VapourSynth to compare them side by side and frame-alternating, using a 5K and a 4K monitor.)

Aerial:
The sky looks bad on x265's side. Better details in x265, especially with faster motion.
-> For me: Win for x265 on this one.

Bar Scene:
Dark areas are better in x265, light areas are better in Beamr. When more motion occurred, x265 looked better.
-> For me: No winner on this one.

Dinner Scene:
Less banding in x265, more details.
-> For me: x265 is the clear winner here.

PierSeaside:
The sky looks bad on x265's side, but x265 has more details in dark areas.
-> For me: Win for Beamr on this one.

Cu Selur

Ps.: Not sure whether I'll take a look at some of the other streams.

LigH
9th September 2017, 11:01
I wonder about one point: Beamr explains that their software is superior, among other things, because they use wave-front parallelism (explained in the summary, fact 5.), or briefly in 4.:

Beamr 5 supports full codec multithreading while x265 uses slices and tiles for parallelism.

That surprises me: IIRC, x265 supports slices and tiles optionally, but uses WPP by default as well.

But I believe that the deeper explanations of Beamr 5's advantages may give the x265 developers some inspiration to think about algorithmic speed-up potential.

One more claim from the summary, fact 4.:

x265 will begin encoding frames before all reference frames are finished, which limits motion estimation to only the parts of reference frames that are available.

The x265 developers will know whether to confirm or not.
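
The claim above is about frame-level parallelism: a block can only search the part of a reference frame that already exists. As a hedged illustration (a geometric toy model, not x265's actual scheduling), this computes how many block rows of a reference frame would have to be finished before a given row could search its full vertical range. The function name is invented; the 64-pixel block size, search range of 92, and 816-pixel height are borrowed from the encoder log earlier in the thread.

```python
import math


def reference_rows_needed(row, block_size, search_range, frame_height):
    """Rows of the reference frame that must be done before `row` can search.

    Toy model: the lowest pixel a block in `row` can reference is its own
    bottom edge plus the vertical search range, clamped to the frame height.
    """
    lowest_pixel = min(frame_height, (row + 1) * block_size + search_range)
    return math.ceil(lowest_pixel / block_size)


if __name__ == "__main__":
    for row in (0, 6, 12):
        print(row, reference_rows_needed(row, 64, 92, 816))
```

Real encoders also need extra margin for sub-pixel interpolation and in-loop filtering, which this sketch ignores.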

x265_Project
9th September 2017, 14:06
I wonder about one point: BEAMR explains that their software is superior i.a. because they use wave-front parallelism (explained in the summary, fact 5.), or brief in 4.:

That surprises me: IIRC, x265 supports slices and tiles optionally, but uses WPP per default as well.

You're correct. We support WPP by default, and we had support for WPP long before Vanguard did. They also claim that we don't have early termination heuristics. We have many early termination heuristics. When it comes to x265, it's clear they don't know what they're talking about. When it comes to determining whether the visual quality of their encoder is better than x265, I also don't think they know what they are talking about. I see much better detail retention in the x265 encodes, despite the fact that in most cases the x265 file is smaller. It's not even close.

x265_Project
9th September 2017, 14:10
But I believe that the deeper explanations of Beamr 5 advantages may give x265 developers some inspirations to think about algorithmic speed-up potential.

Keep in mind that they designed this test, and they selected the test parameters, including the test sequences to use, which machine to run on, how many threads to allocate, the rate control mode and bit rates, and which x265 presets to compare with.

WhatZit
9th September 2017, 15:32
When it comes to determining whether the visual quality of their encoder is better than x265, I also don't think they know what they are talking about.

Yes, it's very strange that they attempt to back up their claim by publishing samples and captures that show the exact opposite.

Too much truth in advertising, for once...

Ominously, this is a taste of how Multicoreware will soon have to defend its product against a litany of snake-oil claims once all the Johnny-Come-Lately HEVC developers like Beamr jump on the iOS 11 bandwagon.

The tangible danger is that mainstream consumers will ultimately prefer to sacrifice quality for speed, which means that lower quality, faster encoders could pose a significant threat.

Look at how many one-click instant AVC encoders there are which output absolute garbage, but do it quickly. There's a market for speed over quality. A big one.

Remember, Beamr based this charade on SPEED, with all due apology to quality. In that unfortunate sense alone, they won this test.

LigH
9th September 2017, 15:51
Knowing some of the efforts Multicoreware did to develop x265, I believe we can summarize already beforehand: Being both faster and better is a miracle... ;)

Speed is an objective metric, you can measure it.

Quality is a subjective metric, you have to ask people for their opinions; the more people participate in giving their opinions, the better the result should cover a majority of opinions.

Better quality is usually preserved through more effort, which requires more time; or it requires more space. For a fair speed-vs-quality comparison, at least the size should be as similar as possible. If there is already a remarkable size difference, then you don't need to care about the algorithmic effort taken for quality preservation anymore; the bitrate difference already makes the test unfair. (Just like comparing accelerating cars: there is no substitute for cylinder capacity, except more cylinder capacity.)
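
The "similar size" condition is easy to check before judging quality. A small sketch, assuming you have the two file sizes in bytes; the function names, the 5% tolerance, and the sample sizes are arbitrary choices for illustration, not values from the Beamr test.

```python
def size_gap_percent(size_a, size_b):
    """Size difference relative to the smaller file, in percent."""
    small, large = sorted((size_a, size_b))
    return (large - small) / small * 100.0


def is_fair_comparison(size_a, size_b, tolerance_percent=5.0):
    """True if the two encodes are close enough in size to compare quality."""
    return size_gap_percent(size_a, size_b) <= tolerance_percent


if __name__ == "__main__":
    # Hypothetical sizes in bytes; in practice use os.path.getsize() on the files.
    print(is_fair_comparison(10_000_000, 10_300_000))
    print(is_fair_comparison(10_000_000, 12_000_000))
```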

Atak_Snajpera
9th September 2017, 17:31
The tangible danger is that mainstream consumers will ultimately prefer to sacrifice quality for speed, which means that lower quality, faster encoders could pose a significant threat.
I already see this when people use crappy nVidia/Intel/AMD HEVC encoders instead of x264/x265.

NikosD
9th September 2017, 17:44
Intel's HW H.264 encoder has decent quality for its speed and it's perfect for easy transcoding, like a DVD rip to an H.264 file.

Easy, because the resolution and quality of MPEG-2 are low and the MKV DVD rip is huge.

Using Intel's HW H.264 encoder you get 1/10th of the size of the DVD rip in a few minutes with comparable (almost the same) quality.

No drawbacks at all.

stax76
9th September 2017, 18:17
@Atak_Snajpera

Did you actually compare (probably not with your GUI ;)) with sufficient bitrate like 10 Mb/s?

Atak_Snajpera
9th September 2017, 18:49
@Atak_Snajpera

Did you actually compare (probably not with your GUI ;)) with sufficient bitrate like 10 Mb/s?

10 Mbps? Why not 20 Mbps? With a 10 Mbps budget even Xvid looks decent.

stax76
9th September 2017, 18:55
Are you sure your storage and bandwidth budget is in balance with your CPU budget? Doesn't appear so.

x265_Project
10th September 2017, 07:32
Knowing some of the effort Multicoreware put into developing x265, I believe we can say in advance: being both faster and better is a miracle... ;)

Speed is an objective metric, you can measure it.

Quality is a subjective metric, you have to ask people for their opinions; the more people give their opinions, the better the result represents the majority view.

Better quality usually takes more effort, which requires more time, or it requires more space. For a fair speed-vs-quality comparison, the file sizes should at least be as similar as possible. If there is already a notable size difference, you no longer need to look at the algorithmic effort spent on preserving quality; the bitrate difference alone makes the test unfair. (Just like comparing how cars accelerate: nothing substitutes for cylinder capacity, except more cylinder capacity.)
Why didn't they compare speed when you give both encoders 16 threads or more? Why didn't they conduct this test on a Skylake, Skylake-X or Purley Xeon system, instead of a 4-year-old Haswell system? Do you think they didn't test our other performance presets before they decided to show a comparison against Ultrafast, Medium and Veryslow? Why didn't they test 1080p or smaller picture sizes? Do you think that these were the only test sequences (videos) they tried?

Obviously, a competitor can run many tests to find some combination of conditions that enables their product to compare most favorably, and cherry-pick these results. We can't do the same because they won't provide their encoder to us, or to 3rd parties for a fair comparison.

But in the end, even with a cherry-picked test design, when I compare the video, I see nothing but a big win for x265.

LigH
10th September 2017, 11:55
But in the end, even with a cherry-picked test design, when I compare the video, I see nothing but a big win for x265.

And it's hard to counter this test without a copy of their encoder to run the "flan case" tests on a large variety of machines and with a more equal selection of parameters. Even if you can find a machine very similar to their test environment, you could only run the freely available software.

Atak_Snajpera
10th September 2017, 12:13
And it's hard to counter this test without a copy of their encoder to run the "flan case" tests on a large variety of machines and with a more equal selection of parameters. Even if you can find a machine very similar to their test environment, you could only run the freely available software.

Exactly. It's very fishy that they do not provide the encoder for testing.
For me Beam "something" is just another rip-off company selling nothing but "fake news".
I'm 100% sure that their "superior" encoder would fall apart to tiny pieces in park_joy or crowd_run.

NikosD
10th September 2017, 12:28
@all

Don't be afraid of the competition.

The way the system works, it's probably the only way to make things better.

But it has to be real and fair competition in order to work.

Andouille
10th September 2017, 12:50
Why didn't they conduct this test on a Skylake, Skylake-X or Purley Xeon system, instead of a 4-year-old Haswell system?

Because the average user's computer is that old?

Atak_Snajpera
10th September 2017, 13:02
@all

Don't be afraid of the competition.

The way the system works, it's probably the only way to make things better.

But it has to be real and fair competition in order to work.

How can we talk about fair competition if their encoder is NOT available for testing? We have all seen bold claims over the last few years that some new "super-codec" beats x264 by 50%, for example.

LigH
10th September 2017, 15:51
Indeed, just remember V-Nova Perseus ... which turned out to be no codec on its own, just a high-frequency spectral band (noise) modeller, like HE-AAC or mp3Pro.

roo1234
10th September 2017, 23:42
Indeed, just remember V-Nova Perseus ... which turned out to be no codec on its own, just a high-frequency spectral band (noise) modeller, like HE-AAC or mp3Pro.

No further news or testing on this? It seemed promising for ultra low bandwidth.

x265_Project
11th September 2017, 12:36
Because the average user's computer is that old?
Actually, it was a Xeon E5 v2 (Ivy Bridge) processor, with no AVX2 support. No, I don't think it was because the average user's computer is that old. Beamr isn't trying to win any consumer business - they're shooting for commercial customers who run on servers. I think it's clear this was hand selected from among the possible options because it seems to favor their encoder the most.

x265_Project
11th September 2017, 14:32
My response to Beamr - http://x265.org/beamr-hevc-encoder-comparison/

LigH
11th September 2017, 20:03
Hmm ... from "Epic Face Off" to "Epic Burn"? :cool:

WhatZit
11th September 2017, 22:45
My response to Beamr - http://x265.org/beamr-hevc-encoder-comparison/

Beamr are even gunning for your UHDcode Pro Player, as well: http://beamr.com/h264-hevc-video-comparison-player/

Watch out they don't file a sniper patent, like McDonalds (https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2006068865&recNum=1&maxRec=&office=&prevFilter=&sortOption=&queryString=&tab=PCT+Biblio) did with sandwiches. That way, they can list 24 pending patents as (dubious) benefit no. 1 instead of only 23 :rolleyes:

x265_Project
12th September 2017, 05:09
Beamr are even gunning for your UHDcode Pro Player, as well: http://beamr.com/h264-hevc-video-comparison-player/

Yes, Vanguard copied our Pro Player years ago, before Beamr bought them.

_kermit
14th September 2017, 11:17
Good day,

This is my first post, although I have been following this thread for a long time now, so first things first:
I'd like to thank you for your work. It's amazing how much effort you put into such a great and free! solution. Really appreciated!

I'd like to re-encode some UHD material but while I found quite a bit about Quality Tuning, I'm lost when it comes to just re-encoding it, using less bandwidth, but keeping everything else the same.
In particular the HDR Parameters.

Examples:
Color range : Limited
Color primaries : BT.2020
Transfer characteristics : SMPTE ST 2084
Matrix coefficients : BT.2020 non-constant
Mastering display color primaries : R: x=0.680000 y=0.320000, G: x=0.265000 y=0.690000, B: x=0.150000 y=0.060000, White point: x=0.312680 y=0.329000
Mastering display luminance : min: 0.0050 cd/m2, max: 1000.0000 cd/m2
Maximum Content Light Level : 1000 cd/m2
Maximum Frame-Average Light Level : 96 cd/m2


Color range : Limited
Color primaries : BT.2020
Transfer characteristics : SMPTE ST 2084
Matrix coefficients : BT.2020 non-constant
Mastering display color primaries : R: x=0.680000 y=0.320000, G: x=0.265000 y=0.690000, B: x=0.150000 y=0.060000, White point: x=0.312700 y=0.329000
Mastering display luminance : min: 0.0000 cd/m2, max: 1000.0000 cd/m2


Color range : Limited
Color primaries : BT.2020
Transfer characteristics : SMPTE ST 2084
Matrix coefficients : BT.2020 non-constant
Mastering display color primaries : R: x=0.680000 y=0.320000, G: x=0.265000 y=0.690000, B: x=0.150000 y=0.060000, White point: x=0.312680 y=0.329000
Mastering display luminance : min: 0.0050 cd/m2, max: 1000.0000 cd/m2


They all have slightly different parameters.
How would I re-encode those while preserving all those values for each file?
I guess there is no "take those parameters from the source file and re-use them in the target file" option, so I have to provide them manually for every file?

Can you provide an example of how to re-encode an MKV (I use ffmpeg and a pipe), maybe with some good tuning tips included?

thanks!!
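
For reference, a minimal Python sketch of how the MediaInfo values above could be turned into x265 command-line options, one set per file. It assumes the documented x265 string format for --master-display (chromaticity in units of 0.00002, luminance in units of 0.0001 cd/m2, order G, B, R, WP, then L(max,min)) and the --max-cll "MaxCLL,MaxFALL" form. The function names master_display and hdr_args are made up for this illustration, and the values still have to be read from each source file by hand.

```python
import shlex

CHROMA_SCALE = 50000  # x265 expects chromaticity in units of 0.00002
LUMA_SCALE = 10000    # x265 expects luminance in units of 0.0001 cd/m2


def master_display(red, green, blue, white, lum_min, lum_max):
    """Build the --master-display string from (x, y) pairs and cd/m2 values."""
    def pair(xy):
        return "({},{})".format(round(xy[0] * CHROMA_SCALE),
                                round(xy[1] * CHROMA_SCALE))

    # Order is G, B, R, white point, then L(max,min).
    return "G{}B{}R{}WP{}L({},{})".format(
        pair(green), pair(blue), pair(red), pair(white),
        round(lum_max * LUMA_SCALE), round(lum_min * LUMA_SCALE))


def hdr_args(red, green, blue, white, lum_min, lum_max,
             max_cll=None, max_fall=None):
    """Return x265 options for BT.2020 / SMPTE ST 2084, limited range."""
    args = [
        "--colorprim", "bt2020",
        "--transfer", "smpte2084",
        "--colormatrix", "bt2020nc",
        "--range", "limited",
        "--master-display",
        master_display(red, green, blue, white, lum_min, lum_max),
    ]
    # Only signal content light levels when the source actually has them.
    if max_cll is not None and max_fall is not None:
        args += ["--max-cll", "{},{}".format(max_cll, max_fall)]
    return args


# Values copied from the first MediaInfo example above.
first_file = hdr_args(red=(0.680, 0.320), green=(0.265, 0.690),
                      blue=(0.150, 0.060), white=(0.31268, 0.329),
                      lum_min=0.0050, lum_max=1000.0,
                      max_cll=1000, max_fall=96)

# shlex.join (Python 3.8+) quotes the parentheses for a POSIX shell.
print(shlex.join(first_file))
```

The printed options would be appended to the x265 command that reads from the ffmpeg pipe. For the second and third examples, which list no light-level values, max_cll and max_fall are left out and no --max-cll option is emitted.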