x265 HEVC Encoder [Archive] - Page 142


Selur

28th September 2019, 07:19

thanks!

-QfG-

28th September 2019, 14:02

Does anyone have results comparing AQ1 and AQ4? I've made 2 test encodes and see no difference in the movie, but the file size is smaller with AQ4. Or I'm blind...

pistacho

28th September 2019, 17:57

Hi,

Possible bug:

Commit 21db162 (https://bitbucket.org/multicoreware/x265/commits/21db162c8622677c41a4fc77a14a59eb7326b46a) causes a slowdown even when aq-mode 4 is not used, and the output is identical.

Command line used:

"T:\TEST\x265\3108\x265_x64.exe" - --y4m --frames 1000 --crf 20.0 --preset "medium" --aq-mode 3 --keyint 240 --no-open-gop
--colorprim "bt709" --transfer "bt709" --colormatrix "bt709" --sar 1:1 --output "T:\TEST\encode.265"

dec [info]: Intel Quick Sync: API LEVEL 1.29, HW
dec [info]: 1920x1080, YV12, 24000/1001 fps, 1000 frames
y4m [info]: 1920x1080 fps 24000/1001 i420p8 sar 1:1 unknown frame count
raw [info]: output file: T:\TEST\encode.265
x265 [info]: HEVC encoder version 3.1+7-147fb92c5ed5
x265 [info]: build info [Windows][GCC 9.2.0][64 bit] 8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 8 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 3 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 240 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree : 3 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-20.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I: 11, Avg QP:16.20 kb/s: 6958.02
x265 [info]: frame P: 275, Avg QP:20.60 kb/s: 3482.34
x265 [info]: frame B: 714, Avg QP:25.57 kb/s: 356.63
x265 [info]: Weighted P-Frames: Y:10.9% UV:6.5%
x265 [info]: consecutive B-frames: 22.4% 4.5% 4.5% 38.1% 30.4%

encoded 1000 frames in 17.71s (56.47 fps), 1288.81 kb/s, Avg QP:24.10

dec [info]: Intel Quick Sync: API LEVEL 1.29, HW
dec [info]: 1920x1080, YV12, 24000/1001 fps, 1000 frames
y4m [info]: 1920x1080 fps 24000/1001 i420p8 sar 1:1 unknown frame count
raw [info]: output file: T:\TEST\encode.265
x265 [info]: HEVC encoder version 3.1+8-21db162c8622
x265 [info]: build info [Windows][GCC 9.2.0][64 bit] 8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 8 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 3 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 240 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree : 3 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-20.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I: 11, Avg QP:16.20 kb/s: 6958.02
x265 [info]: frame P: 275, Avg QP:20.60 kb/s: 3482.34
x265 [info]: frame B: 714, Avg QP:25.57 kb/s: 356.63
x265 [info]: Weighted P-Frames: Y:10.9% UV:6.5%
x265 [info]: consecutive B-frames: 22.4% 4.5% 4.5% 38.1% 30.4%

encoded 1000 frames in 19.56s (51.13 fps), 1288.81 kb/s, Avg QP:24.10

The outputs seem bit-identical, since the QPs and bitrates match, so the slowdown cannot be justified as "slower for better quality".
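Matching QP and bitrate statistics make bit-identical output likely but do not prove it. A minimal C++ sketch of a direct check, assuming the two encodes were written to separate files (the logs above write both runs to the same T:\TEST\encode.265, so one would have to be renamed first): compare the files byte for byte. One caveat: by default x265 also writes an informational SEI carrying the encoder version and build info, so streams from two differently versioned builds will normally differ in those bytes; encoding with --no-info leaves that message out. The function name filesIdentical is hypothetical.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Returns true only if both files can be opened and contain exactly the same bytes.
bool filesIdentical(const std::string& pathA, const std::string& pathB) {
    std::ifstream a(pathA, std::ios::binary);
    std::ifstream b(pathB, std::ios::binary);
    if (!a || !b) return false;  // a missing file never counts as identical
    // Read each file completely into memory (fine for short test encodes).
    std::vector<char> bytesA((std::istreambuf_iterator<char>(a)), std::istreambuf_iterator<char>());
    std::vector<char> bytesB((std::istreambuf_iterator<char>(b)), std::istreambuf_iterator<char>());
    return bytesA == bytesB;     // compares sizes first, then every byte
}
```

For example, filesIdentical("encode_3.1+7.265", "encode_3.1+8.265") with hypothetical file names. Loading whole files is acceptable for a 1000-frame test; a chunked comparison would suit full-length encodes better.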

-QfG-

28th September 2019, 18:03

AQ4 is the new feature of x265 version 3.2. AQ3 is still good for BT709 output.
But the question is: has anyone seen differences between AQ1 and AQ4 with YUV420P10 BT2020 HDR?

pistacho

28th September 2019, 18:09

I don't mean that. I'm talking about the new "AQ mode 4" function causing a slowdown even when it's not used.

pistacho

29th September 2019, 15:50

I found the cause:

Some code related to AQ mode 4 is always executed.

This patch restores the previous performance and doesn't break anything (I think).

In the file slicetype.cpp,

at line 481, replace

#define AQ_EDGE_BIAS 0.5
#define EDGE_INCLINATION 45
uint32_t numCuInHeight = (maxRow + param->maxCUSize - 1) / param->maxCUSize;
int maxHeight = numCuInHeight * param->maxCUSize;
intptr_t stride = curFrame->m_fencPic->m_stride;
pixel *edgePic = X265_MALLOC(pixel, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)));
pixel *gaussianPic = X265_MALLOC(pixel, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)));
pixel *thetaPic = X265_MALLOC(pixel, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)));
memset(edgePic, 0, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)) * sizeof(pixel));
memset(gaussianPic, 0, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)) * sizeof(pixel));
memset(thetaPic, 0, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)) * sizeof(pixel));
if (param->rc.aqMode == X265_AQ_EDGE)
edgeFilter(curFrame, edgePic, gaussianPic, thetaPic, stride, maxRow, maxCol);

int blockXY = 0, inclinedEdge = 0;

with

#define AQ_EDGE_BIAS 0.5
#define EDGE_INCLINATION 45

pixel *edgePic = NULL;
pixel *gaussianPic = NULL;
pixel *thetaPic = NULL;

if (param->rc.aqMode == X265_AQ_EDGE)
{
uint32_t numCuInHeight = (maxRow + param->maxCUSize - 1) / param->maxCUSize;
int maxHeight = numCuInHeight * param->maxCUSize;
intptr_t stride = curFrame->m_fencPic->m_stride;
edgePic = X265_MALLOC(pixel, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)));
gaussianPic = X265_MALLOC(pixel, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)));
thetaPic = X265_MALLOC(pixel, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)));
memset(edgePic, 0, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)) * sizeof(pixel));
memset(gaussianPic, 0, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)) * sizeof(pixel));
memset(thetaPic, 0, stride * (maxHeight + (curFrame->m_fencPic->m_lumaMarginY * 2)) * sizeof(pixel));
edgeFilter(curFrame, edgePic, gaussianPic, thetaPic, stride, maxRow, maxCol);
}

int blockXY = 0, inclinedEdge = 0;

and at line 510, replace

pixel *edgeImage = edgePic + curFrame->m_fencPic->m_lumaMarginY * stride + curFrame->m_fencPic->m_lumaMarginX;
pixel *edgeTheta = thetaPic + curFrame->m_fencPic->m_lumaMarginY * stride + curFrame->m_fencPic->m_lumaMarginX;

with

pixel *edgeImage = edgePic + curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride + curFrame->m_fencPic->m_lumaMarginX;
pixel *edgeTheta = thetaPic + curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride + curFrame->m_fencPic->m_lumaMarginX;

and at line 545, replace

X265_FREE(edgePic);
X265_FREE(gaussianPic);
X265_FREE(thetaPic);

with

if (param->rc.aqMode == X265_AQ_EDGE)
{
X265_FREE(edgePic);
X265_FREE(gaussianPic);
X265_FREE(thetaPic);
}

And here is a binary with this patch applied: x265 v3.2 patched x64 GCC 9.2.0 (http://www.mediafire.com/file/nhflci5lxcvr6ej/x265_v3.2%252B2_patched.rar/file)
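The idea behind the patch can be illustrated outside x265: the three scratch planes (edge, gaussian, theta) are only needed by the edge-based AQ path, so allocating and zeroing them on every frame is pure overhead for the other AQ modes. A minimal, self-contained C++ sketch of the same pattern, assuming 8-bit samples and using std::vector in place of X265_MALLOC/memset/X265_FREE. FrameGeometry, EdgeScratch, prepareEdgeScratch and kAqEdgeMode are hypothetical names, not x265 code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: assumed to mirror X265_AQ_EDGE ("--aq-mode 4").
constexpr int kAqEdgeMode = 4;

struct FrameGeometry {
    std::ptrdiff_t stride;  // samples per padded luma row
    int paddedHeight;       // maxHeight + 2 * vertical margin
};

// Scratch planes used only by the edge-based AQ path (8-bit samples assumed).
struct EdgeScratch {
    std::vector<uint8_t> edge, gaussian, theta;
};

// Allocates and zeroes the three planes only when edge AQ is active.
// For every other mode nothing is allocated or cleared.
// Returns the number of bytes allocated.
std::size_t prepareEdgeScratch(int aqMode, const FrameGeometry& g, EdgeScratch& s) {
    if (aqMode != kAqEdgeMode) {
        s = EdgeScratch{};  // leave all planes empty
        return 0;
    }
    const std::size_t planeSize = static_cast<std::size_t>(g.stride) * g.paddedHeight;
    s.edge.assign(planeSize, 0);      // assign() zero-fills: replaces malloc + memset
    s.gaussian.assign(planeSize, 0);
    s.theta.assign(planeSize, 0);
    return 3 * planeSize;
}
```

Because the vectors release their memory when the EdgeScratch object is destroyed, the guarded X265_FREE calls in the patch have no counterpart in this version.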

LigH

30th September 2019, 07:52

You should send your patch to the x265 Developer mailing list. They prefer it in "diff" format and in the mail body.

x265 3.2+3-fdd69a766881 (https://www.mediafire.com/file/vmt60ob57kxfemg/x265_3.2+3-fdd69a766881.7z/file) (MSYS2/MinGW, GCC 9.2.0)

pistacho

30th September 2019, 18:06

Diff patch sent to the mailing list.

Boulder

30th September 2019, 18:14

qtwigg

30th September 2019, 18:18

I have been having this issue: I give x265 a movie and x265 chooses how much it wants to encode. I give it a 45-minute show, it does 30 minutes. I give it a 1 hour 40 minute movie, it decides not to do the last 10 minutes. I say it "decides" because it doesn't throw an error. It literally finishes, muxes into a container and says "Done!" Why does it do this? I never had this problem before, but for the last 6 months or so I've been having it randomly, off and on. Any ideas?

RanmaCanada

30th September 2019, 18:35

qtwigg

30th September 2019, 18:59

That is usually an indication that there is corruption in your original file.

That is what one would think; however, the file plays fine (to the end), and without closing the GUI, re-applying my settings or anything, I just press Start again in the same GUI that previously encoded only part of the file, and it always works the second time.
I open the program, add my video, add my settings and AVS, and press Start.
It does not encode the whole video.
I just delete the files
and press Start again in the same window.
It encodes the whole video.
Weird, huh? Like I said, it happens randomly, not often.

redbtn

30th September 2019, 19:25

You have to ask the author of your GUI about it. This problem is not related to x265.

pistacho

30th September 2019, 19:46

I couldn't replicate that difference between the two versions on my 3700X. I used the VS 2019 AVX2 builds from http://www.msystem.waw.pl/x265/

With 3.1+7, encoded 2000 frames in 461.37s (4.33 fps), 3467.58 kb/s, Avg QP:20.97
With 3.1+8, encoded 2000 frames in 459.33s (4.35 fps), 3467.58 kb/s, Avg QP:20.97

On my system (Intel 9700K), the difference persists with these builds:

y4m [info]: 1920x1080 fps 24000/1001 i420p8 sar 1:1 unknown frame count
raw [info]: output file: T:\TEST\encode.265
x265 [info]: HEVC encoder version 3.1+7-147fb92c5ed5
x265 [info]: build info [Windows][MSVC 1921][64 bit] 8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 8 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 3 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 240 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree : 3 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-20.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I: 11, Avg QP:16.20 kb/s: 6958.02
x265 [info]: frame P: 275, Avg QP:20.60 kb/s: 3482.34
x265 [info]: frame B: 714, Avg QP:25.57 kb/s: 356.63
x265 [info]: Weighted P-Frames: Y:10.9% UV:6.5%
x265 [info]: consecutive B-frames: 22.4% 4.5% 4.5% 38.1% 30.4%

encoded 1000 frames in 17.26s (57.95 fps), 1288.81 kb/s, Avg QP:24.10

y4m [info]: 1920x1080 fps 24000/1001 i420p8 sar 1:1 unknown frame count
raw [info]: output file: T:\TEST\encode.265
x265 [info]: HEVC encoder version 3.1+8-21db162c8622
x265 [info]: build info [Windows][MSVC 1921][64 bit] 8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 8 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 3 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 240 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree : 3 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-20.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I: 11, Avg QP:16.20 kb/s: 6958.02
x265 [info]: frame P: 275, Avg QP:20.60 kb/s: 3482.34
x265 [info]: frame B: 714, Avg QP:25.57 kb/s: 356.63
x265 [info]: Weighted P-Frames: Y:10.9% UV:6.5%
x265 [info]: consecutive B-frames: 22.4% 4.5% 4.5% 38.1% 30.4%

encoded 1000 frames in 18.76s (53.31 fps), 1288.81 kb/s, Avg QP:24.10

LigH

1st October 2019, 15:17

Diff patch sent to the mailing list.

It arrived safely :D

ShortKatz

3rd October 2019, 12:06

Diff patch sent to the mailing list.

Did you also sign the Contributor License Agreement? See
https://bitbucket.org/multicoreware/x265/wiki/Contribute
They rejected all my patches until I sent them the signed Contributor License Agreement. It is also important to format the patch correctly; one of my earlier patches couldn't be applied because the formatting was wrong. And don't be surprised if it takes some time: my patches normally take several weeks to get applied.

pistacho

3rd October 2019, 14:20

My patch is a bug fix (a 10% slowdown for no reason in a stable branch). I don't care if they don't apply my patch as-is. I guess someone will fix it one way or another...

Selur

3rd October 2019, 17:13

Looks like the
https://bitbucket.org/multicoreware/x265/commits/cdd80b53c90d224fd9281ad13de3ca9a1b6e1d39
and
https://bitbucket.org/multicoreware/x265/commits/d4c4624fdc9af25738e000a9c9f10c31238d6bf1
commits are meant to fix the "slowdown even when AQ mode 4 is not used".

Cu Selur

filler56789

4th October 2019, 14:29

x265.exe 3.2+5-354901970679

--- Fix: AQ mode 4 commit (21db162) introduces slowdown even when AQ mode 4 is not used;

--- Adaptive Frame duplication
This patch does the following:
1. Replaces 2-3 near-identical frames with one frame and sets pic_struct based on frame doubling / tripling;
2. Add option "--frame-dup" and "--dup-threshold" to enable frame duplication and to set threshold for frame similarity (optional);

http://www.mediafire.com/file/ado1h89ndmsdybp/x265_3.2%252B5-354901970679.rar/file
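A rough sketch of the folding described in point 1 of the changelog, assuming the near-duplicate decision for each frame has already been made (for instance by a similarity threshold) and that runs are folded greedily from the first frame. The pic_struct values 0, 7 and 8 are the HEVC picture timing SEI codes for a progressive frame, frame doubling and frame tripling (H.265 Table D.2); everything else, including the function names, is illustrative and not taken from x265's implementation.

```cpp
#include <cstddef>
#include <vector>

// pic_struct values from the HEVC picture timing SEI (H.265 Table D.2).
enum PicStruct { kFrame = 0, kFrameDoubling = 7, kFrameTripling = 8 };

// similarToPrev[i] == true means input frame i was judged a near-duplicate of
// frame i-1 (element 0 is ignored). Each kept frame absorbs up to two following
// near-duplicates; returns one pic_struct value per *coded* frame.
std::vector<int> foldDuplicates(const std::vector<bool>& similarToPrev) {
    std::vector<int> picStructs;
    std::size_t i = 0;
    while (i < similarToPrev.size()) {
        std::size_t run = 1;  // the kept frame itself
        while (run < 3 && i + run < similarToPrev.size() && similarToPrev[i + run])
            ++run;            // absorb one more near-duplicate
        picStructs.push_back(run == 1 ? kFrame : run == 2 ? kFrameDoubling : kFrameTripling);
        i += run;             // continue after the folded run
    }
    return picStructs;
}

// Number of input frame periods the coded frames are meant to cover on display.
std::size_t displayedPeriods(const std::vector<int>& picStructs) {
    std::size_t periods = 0;
    for (int ps : picStructs)
        periods += ps == kFrameTripling ? 3 : ps == kFrameDoubling ? 2 : 1;
    return periods;
}
```

For seven input frames where frames 1, 3, 4 and 5 resemble their predecessors, foldDuplicates returns {7, 8, 0, 0}: four coded frames meant to cover seven frame periods. A player or muxer that ignores pic_struct would time the stream by the four coded frames instead, so it would play shorter than the source.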

benwaggoner

4th October 2019, 18:26

x265.exe 3.2+5-354901970679
--- Adaptive Frame duplication
This patch does the following:
1. Replaces 2-3 near-identical frames with one frame and sets pic_struct based on frame doubling / tripling;
2. Add option "--frame-dup" and "--dup-threshold" to enable frame duplication and to set threshold for frame similarity (optional);
I see the default value of dup-threshold is 70. It would be helpful to know whether higher numbers require more or less similarity, and roughly how much similarity is required for 70. I hope it is sub-psychovisual at least.

I could see this helping efficiency and encoding speed for stuff like a title card displayed for a couple of seconds.

fauxreaper

4th October 2019, 19:08

--frame-dup makes the output duration shorter than the input. Is it a decoder problem of not doubling/tripling the frame duration when needed? Or is it a muxing problem when using Matroska as the container?

nevcairiel

4th October 2019, 19:34

vpupkind

4th October 2019, 20:07

I wonder if that mode is really that beneficial. If it detects dupes, it could just code them as all-skip P or B frames with minimal bitstream overhead. Sure, that single frame is being saved, but in the grand scheme of things, for a small scene of a static shot, that seems insignificant. Never mind that repeat flags are likely to trip up decoders and/or muxers, as you basically generate a VFR bitstream.
An all-skip frame is way more expensive than changing a pic_struct value.
The reason for doing such a weird VFR is that the same mechanism used for 3:2 pulldown and 24->60p conversion is used here.

vpupkind

4th October 2019, 20:26

I see the default value of dup-threshold is 70. It would be helpful to know whether higher numbers require more or less similarity, and roughly how much similarity is required for 70. I hope it is sub-psychovisual at least.

I could see this helping efficiency and encoding speed for stuff like a title card displayed for a couple of seconds.

The threshold is the PSNR value between consecutive pictures.
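Taking that at face value, a minimal C++ sketch of such a test on 8-bit samples: compute the PSNR between consecutive frames and treat the pair as near-duplicates when it reaches the threshold. It assumes a peak value of 255, equal-size planes, and that a higher PSNR (more similar frames) is what must clear the threshold. Which planes x265 actually measures and how it compares them are not shown here, and psnr8 and isNearDuplicate are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in dB between two equal-size 8-bit planes (peak value 255).
// Identical planes give +infinity.
double psnr8(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
    assert(a.size() == b.size() && !a.empty());
    double sse = 0.0;  // sum of squared differences
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sse += d * d;
    }
    if (sse == 0.0) return std::numeric_limits<double>::infinity();
    const double mse = sse / static_cast<double>(a.size());
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}

// Hypothetical duplicate test: the current frame counts as a near-duplicate of
// the previous one when their PSNR reaches the threshold (70 is the default
// mentioned in this thread).
bool isNearDuplicate(const std::vector<uint8_t>& prev, const std::vector<uint8_t>& cur,
                     double thresholdDb = 70.0) {
    return psnr8(prev, cur) >= thresholdDb;
}
```

Under these assumptions a higher --dup-threshold demands more similarity. On a 1920x1080 plane, a single pixel changed by 116 levels still scores about 70.01 dB, while a change of 117 levels drops to about 69.93 dB.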

nevcairiel

4th October 2019, 22:25

An all-skip frame is way more expensive than changing a pic_struct value.

Sure, in relative terms. But in absolute terms, in a sea of actually coded and changing frames, how much of a difference are we talking here? 0.01%? Likely not even that.

The reason for doing such a weird VFR is that the same mechanism used for 3:2 pulldown and 24->60p conversion is used here.

A mechanism that's already rarely used in HEVC, and likely not well supported or intentionally ignored, because original 24p content is just better than stuttery 30p. :)

WhatZit

5th October 2019, 03:51

I could see this helping efficiency and encoding speed for stuff like a title card displayed for a couple of seconds.

I could see this being as big a pain-in-the-arse as traditional VFR. Luckily, it is disabled by default.

Still, one of Multicoreware's corporate clients probably asked for it (anime OTT?), so there it is.

LigH

5th October 2019, 15:54

x265 3.2+5-354901970679 (https://www.mediafire.com/file/7vq9ld7us08rp6y/x265_3.2%2B5-354901970679.7z/file)

mandarinka

5th October 2019, 17:02

I could see this being as big a pain-in-the-arse as traditional VFR. Luckily, it is disabled by default.

Still, one of Multicoreware's corporate clients probably asked for it (anime OTT?), so there it is.

Actually, it's not really a good idea for anime. Duplicate-removal filters like this can mishandle a very common scenario where the whole picture doesn't move at all except for mouth movement in a tiny area. It can be just a few pixels, but dropping that change and replacing it with the wrong duplicate is a terrible sort of artifact.

vpupkind

6th October 2019, 15:33

Actually, it's not really a good idea for anime. Duplicate-removal filters like this can mishandle a very common scenario where the whole picture doesn't move at all except for mouth movement in a tiny area. It can be just a few pixels, but dropping that change and replacing it with the wrong duplicate is a terrible sort of artifact.
I am unsure whether these two frames will have a PSNR of 70 dB; you will still have a significant absolute difference between a couple of co-located pixels, which should bring it below the threshold. I would be very interested in test results; I don't have a decent anime source.
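vpupkind's estimate can be put into numbers with the PSNR definition, assuming the measurement is made on a single 8-bit 1920x1080 luma plane (an assumption; the thread does not say which planes are used) and that a frame counts as a duplicate only when its PSNR reaches the threshold. The whole frame then shares one squared-error budget at 70 dB:

```latex
\begin{aligned}
\mathrm{PSNR} &= 10\log_{10}\frac{255^2}{\mathrm{MSE}}, \qquad
\mathrm{MSE} = \frac{1}{WH}\sum_{i=1}^{WH}(a_i - b_i)^2 \\
\mathrm{PSNR} \ge 70\ \mathrm{dB} &\iff \mathrm{MSE} \le \frac{255^2}{10^{7}} \approx 0.0065
  \iff \sum_i (a_i - b_i)^2 \le \frac{255^2}{10^{7}} \cdot 1920 \cdot 1080 \approx 13\,484 \\
\text{one pixel off by } 116:&\quad 116^2 = 13\,456 \le 13\,484 \;\Rightarrow\; \mathrm{PSNR} \approx 70.01\ \mathrm{dB} \\
20 \times 20 \text{ region off by } 50:&\quad 400 \cdot 50^2 = 10^{6} \;\Rightarrow\; \mathrm{PSNR} \approx 51.3\ \mathrm{dB} < 70\ \mathrm{dB}
\end{aligned}
```

Under these assumptions, a localized change such as the mouth movement described above would normally fall well below 70 dB; only very small changes (for example a dozen pixels off by about 30 levels) stay above it.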

Magik Mark

6th October 2019, 23:52

Hey Guys,

Are there any new switches that would speed up multi pass encoding?

RanmaCanada

7th October 2019, 05:45

Hey Guys,

Are there any new switches that would speed up multi pass encoding?

Yes the -Buy Ryzen 3900x switch :)

aymanalz

8th October 2019, 20:56

Yes the -Buy Ryzen 3900x switch :)

He has raised a valid and pertinent point. After all these years, x265 is still painfully slow on "normal" processors. You are right that the only way to encode faster is to get faster processors with more cores. I wish that weren't the case, and that by now the developers could have made it faster. Maybe it is mathematically/programmatically impossible to get any more speed improvements. That's a pity.

FranceBB

8th October 2019, 22:36

He has raised a valid and pertinent point. After all these years, x265 is still painfully slow on "normal" processors. You are right that the only way to encode faster is to get faster processors with more cores. I wish that weren't the case, and that by now the developers could have made it faster. Maybe it is mathematically/programmatically impossible to get any more speed improvements. That's a pity.

It was pretty much the same with the shift from MPEG-2 and then Xvid to H.264: the computational complexity was way higher, and the old single-core, single-thread CPUs weren't able to cope with the resources required. The fact that H.264 has been the de facto standard for several years kind of got us used to high encoding speeds. As a matter of fact, MPEG-2 encoders like x262 and MPEG-4 ASP encoders (Xvid) were either not parallelized at all or poorly parallelized, while x264 is able to max out a very high number of cores and threads, depending on the settings.
One thing it won't use properly, though, is a second CPU: for instance, I noticed that on a dual-socket configuration like a dual Xeon with a high number of cores and threads, x264 uses only one of the CPUs, which reduces the speed.
Anyway, x264 is generally so fast by now that it's fine, also because of modern SIMD optimizations like AVX2 (hand-written assembly) that were not available in the x262 and Xvid encoders.
x265, on the other hand, was developed with modern hardware in mind: not only does it use modern assembly optimizations (like x264), but it is also heavily parallelized, uses both CPUs in a dual-socket environment, and lets you use some additional settings if your CPU has so many cores that x265 can't max it out.

You mentioned the mathematical complexity, and indeed H.264 was based on an integer approximation of the Discrete Cosine Transform plus the Hadamard transform, which is very light and is applied to the DC coefficients of some blocks to compress what the DCT alone couldn't compress well enough. H.265 is indeed more demanding in terms of computational cost, as it uses integer approximations of both the Discrete Cosine Transform and the Discrete Sine Transform. But keep in mind that it could have been even more demanding, because years ago, before 2013, there were proposals to use the Karhunen-Loève transform, which is the heaviest transform that I know and is very demanding in terms of computational cost. This is because back in the day it seemed impossible to achieve a 40% reduction compared to H.264 with a linear-algebra-only approach. According to the results, the KLT did achieve better results than the DCT; however, the improvements were so small in some cases and the computational cost so high that they decided not to proceed with that approach, which led to the modern DCT/DST approach.
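The energy-compaction argument behind using the DST in H.265 can be shown with HEVC's 4x4 integer transform matrices. For 4x4 intra luma blocks HEVC uses a DST-VII approximation, because intra-prediction residuals tend to grow with distance from the reference samples. The sketch below applies both 1-D transforms to an illustrative ramp residual {1, 2, 3, 4} (a made-up input, not measured data) and reports how much energy lands in the first coefficient. It omits the scaling shifts and rounding of a real encoder, and the function names are hypothetical.

```cpp
#include <array>

using Vec4 = std::array<int, 4>;
using Mat4 = std::array<Vec4, 4>;

// HEVC 4x4 integer DCT-II approximation (rows are basis functions, scaled by ~128).
constexpr Mat4 kDct4 = {{{64, 64, 64, 64},
                         {83, 36, -36, -83},
                         {64, -64, -64, 64},
                         {36, -83, 83, -36}}};

// HEVC 4x4 integer DST-VII approximation, used for 4x4 intra luma residuals.
constexpr Mat4 kDst4 = {{{29, 55, 74, 84},
                         {74, 74, 0, -74},
                         {84, -29, -74, 55},
                         {55, -84, 74, -29}}};

// 1-D forward transform: coeff[k] = sum_n m[k][n] * r[n] (no rounding or shift).
Vec4 forward1D(const Mat4& m, const Vec4& r) {
    Vec4 c{};
    for (int k = 0; k < 4; ++k)
        for (int n = 0; n < 4; ++n)
            c[k] += m[k][n] * r[n];
    return c;
}

// Fraction of the total coefficient energy that lands in the first coefficient.
double firstCoeffEnergyShare(const Vec4& c) {
    double total = 0.0;
    for (int v : c) total += static_cast<double>(v) * v;
    return total == 0.0 ? 0.0 : static_cast<double>(c[0]) * c[0] / total;
}
```

With that ramp, the DST gives {697, -74, 24, -7}, about 98.8% of the energy in the first coefficient, while the DCT gives {640, -285, 0, -25}, about 83.3%. Fewer significant coefficients generally means fewer bits after quantization.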
If you take a look at the "future", you'll see 8K and H.266 VVC, which inherited the Discrete Cosine Transform and Discrete Sine Transform approach from H.265 HEVC but also uses an adaptive multiple transform (AMT) scheme for residual coding of both inter-coded and intra-coded blocks. This approach basically consists of a set of five DCT- and DST-based transforms, namely DCT-II, DCT-V, DCT-VIII, DST-I and DST-VII, and a Signal Dependent Transform (SDT) competes with the AMT output. The SDT approximates the optimal Karhunen-Loève transform (KLT), which is a signal-dependent transform, by estimating the current signal to code (transform block) from similar signals (i.e. a reference patch) already available at the decoder (already coded). This way, a lot of computational power is saved by not using the KLT directly, which is far too demanding in terms of computational cost.

Anyway, you can be sure of one thing: it will be even more demanding, but, you know, in a world in which we have configurations with Intel Xeon CPUs like this Intel Xeon Platinum 9282 56c/112th (https://www.intel.com/content/www/us/en/products/processors/xeon/scalable/platinum-processors/platinum-9282.html), this doesn't seem to be a problem. For instance, I myself encode with an Intel Xeon 28c/56th with 64 GB of RAM and a Quadro GPU at work, even though I've asked them several times to upgrade the CPU, as it's been like this ever since 2017 and it's now "old" for what I gotta do on a daily basis.

What do I have at home? Well, a crappy i7 4c/8th with 32 GB DDR4 and an NVIDIA RTX GPU, but it doesn't matter, since the PC I use at home is for general purposes: browsing (replying to you folks here on Doom9 :P), occasionally watching videos (although I do have my 4K Panasonic Blu-ray player for that), listening to music and studying (I'm in the middle of my master's degree at university while working as an encoder for a company).

In a nutshell: computational cost will always get higher and higher but CPUs will get better and better. :P

RanmaCanada

9th October 2019, 02:21

FranceBB hits the nail on the head. It's also why AV1 is a literal order of magnitude slower than x265. As you get more complex with your codecs and your compression, the processing power required is basically a bell curve. There are ways around this, like the SVT implementations of HEVC and AV1, but they are seriously garbage in comparison to a dedicated CPU encode. In time they might get better, but currently, no.

soresu

9th October 2019, 03:05

There are ways around this, like the SVT implementations of HEVC and AV1, but they are seriously garbage in comparison to a dedicated CPU encode. In time they might get better, but currently, no.
SVT is purely CPU encoding; there's no GPU, ASIC or other accelerator code in there, just a great parallel scaling framework that seemingly loses no quality as you pile on threads (per the BAV conference), plus oodles of AVX2 and AVX-512 SIMD code.

It's not a question of whether it 'might' get better, though: the SVT codecs are owned/controlled by Intel, with Netflix working on them too, so unless they get bored and shelve them, they will continue to be developed, because they are a perfect way to show off and benchmark their super-mega-core-a-paloosa CPUs.

The libaom encoder is more of a development platform/reference implementation of AV1 optimised into a working encoder, much like libvpx for VP8/VP9 from Google - I wouldn't ever expect it to reach the speed performance of the other implementations because they are also developing the next gen codec on an experimental branch at the moment.

NikosD

9th October 2019, 03:07

Anyway, you can be sure of one thing: it will be even more demanding, but, you know, in a world in which we have configurations with Intel Xeon CPUs like this Intel Xeon Platinum 9282 56c/112th (https://www.intel.com/content/www/us/en/products/processors/xeon/scalable/platinum-processors/platinum-9282.html), this doesn't seem to be a problem.
This is not a real-world CPU.
It practically doesn't exist: it has no listed price and has probably never been sold to anyone. Only rumors.
It exists just on paper.
For instance, I myself encode with an Intel Xeon 28c/56th with 64 GB of RAM and a Quadro GPU at work, even though I've asked them several times to upgrade the CPU, as it's been like this ever since 2017 and it's now "old" for what I gotta do on a daily basis.
Since your needs are that high, you should convince your boss at work to buy some serious processing power for half the money.
Try a 64C/128T EPYC CPU and you will get double the processing power for half the price of the Xeon 28C/56T.
Yes, it's that simple.

MeteorRain

9th October 2019, 17:55

excellentswordfight

10th October 2019, 10:30

One thing it won't use properly, though, is a second CPU: for instance, I noticed that on a dual-socket configuration like a dual Xeon with a high number of cores and threads, x264 uses only one of the CPUs, which reduces the speed.
Anyway, x264 is generally so fast by now that it's fine, also because of modern SIMD optimizations like AVX2 (hand-written assembly) that were not available in the x262 and Xvid encoders.
x265, on the other hand, was developed with modern hardware in mind: not only does it use modern assembly optimizations (like x264), but it is also heavily parallelized, uses both CPUs in a dual-socket environment, and lets you use some additional settings if your CPU has so many cores that x265 can't max it out.
I have not seen any multi-socket issues with x264; I think it handles it fine, maybe not as well as x265, but it can use the second socket if needed. You will see more load on one socket, but isn't that the logical way? Most of the time you won't saturate all threads in a multi-socket system, and it will of course prioritize one socket to minimize cross-socket communication.

And tbh, I don't think x265 is better parallelized than x264; at default settings it's actually worse for resolutions under 4K because of the large CU size. And both x264 and x265 have a hard time scaling beyond 24-ish threads for 1080p at slower settings. Above that, clock speed should be prioritized over thread count if you're not doing chunk encoding. And this is by no means a criticism of x264 or x265; the parallelization and thread scaling are already very impressive for this task. And I don't think we can expect this to get much better without sacrificing something else. To increase speed and to utilize render farms, chunk encoding will be the way forward. And for live/real-time content, hardware encoding will do the job.

Now regarding x265, it was released around 2013, the year Haswell was released. The i7-4770K comes with 4c8t at 3.5 GHz. Within 5 years, what do we get? A Core i9-9900K with 8c16t at 4 GHz, if you can afford it. From PassMark scores it's barely 2x the performance of the 4770K, and even if you take AVX2 into consideration it's not going to be 3x or 4x the performance. You used to get 3 fps, now it's 9 fps, which is still slow.
The 4770K has AVX2 too; if I'm not mistaken, it was introduced with Haswell. I own a 4790K and have some experience with the 9900, and I would say there is about a 2.5x performance difference for x265.

aymanalz

10th October 2019, 12:18

Things designed for the future are supposed to be used with future technologies.

May I ask what a "normal" processor is? When x264 was released, I was one of the pioneers using x264 as a daily driver. What was a normal processor back then? An Athlon 64 4000+ with 1 core/1 thread at 2.4 GHz was probably an HEDT(?) processor. A Sempron 2400+ with 1c1t at 1.66 GHz was probably a fairly normal processor. Does the latest x264 run faster on a Sempron 2400+? Probably not.

Within 5 years after that, around 2010, we got the Phenom II X6 1055T at a reasonable price, with 6c6t at 2.8 GHz, which is about 10x as fast as an Athlon 64 4000+. You used to get 3 fps from x264, now it's 30 fps, which sounds very reasonable.

Now regarding x265, it was released around 2013, the year Haswell was released. The i7-4770K comes with 4c8t at 3.5 GHz. Within 5 years, what do we get? A Core i9-9900K with 8c16t at 4 GHz, if you can afford it. From PassMark scores it's barely 2x the performance of the 4770K, and even if you take AVX2 into consideration it's not going to be 3x or 4x the performance. You used to get 3 fps, now it's 9 fps, which is still slow.

So, blame the CPU manufacturers, not developers.

Also, you probably assumed that the code can be optimized by a good margin.
...

But HEVC is no longer for the future, is it? Successors to HEVC are in development, so HEVC is the present.

The part about how processors haven't improved much is absolutely right, and I had that in mind as well. From the year 2000 to 2010, processors (low end, mid grade, high end, everything) became several times faster and sold for the same or even lower prices, probably due to the Intel-AMD competition. But incremental improvements have been much smaller in this decade, especially after Sandy Bridge or Haswell.

I could blame the CPU manufacturers, or blame the x265 developers for not anticipating that processing power per price will not keep increasing at the rate it used to, but I'm not really trying to assign blame; just making an observation that x265 is extremely slow on "normal" processors - by which I meant a reasonable home desktop without a gazillion cores and threads. (I'd say a quad core i7 or hexacore is the mainstream now.)

I wasn't assuming that the code can be optimized further; I was wondering out loud whether it could. I was lamenting that perhaps they have reached a point where further optimizations for speed just aren't possible, in which case only professional encoders or studios with 16+ core machines can use it at decent speeds. Not the casual home users.

LigH

10th October 2019, 12:34

CPUs are not developed with the one and only purpose to encode video.

If you want video encoding at top speed, use a dedicated video encoder chip... but as always, there are the usual conflicts between speed, accuracy/complexity, and other factors: "You can't have them all at maximum at the same time."

Rousseau

10th October 2019, 15:14

On UHD rips made with 3.2, the image looks darker when played in MPV than with rips made with 3.1. They look the same in MPC with no tone mapping. I made no change other than the encoder.

Boulder

10th October 2019, 16:52

Have you compared the metadata between the two encodes?

Barough

10th October 2019, 19:32

x265 v3.2+6-f46aa2bc1c341 (https://www.mediafire.com/file/7jca2pcj5eobv2b/x265-3.2+6-f46aa2bc1c34_Win_GCC920.7z/file) (32 & 64-bit 8/10/12bit Multilib Windows Binaries) (GCC 9.2.0)
https://bitbucket.org/multicoreware/x265/commits/branch/default

MeteorRain

10th October 2019, 21:42

And tbh, I don't think x265 is better parallelized than x264; at default settings it's actually worse for resolutions under 4K because of the large CU size. And both x264 and x265 have a hard time scaling beyond 24'ish threads for 1080p at slower settings.

The 4770K has AVX2 too; if I'm not mistaken it was introduced with Haswell. I own a 4790K and have some experience with the 9900, and I would say there is about a 2.5x performance difference for x265.

x264 used to suffer from reduced encoding quality at high thread counts. Later the threading was greatly improved, and people are no longer limited to the (more optimal) 8-12 threads; they can go beyond that without much loss of quality.

Haswell is the first generation that introduced AVX2. However, AVX2 performance is not consistent across generations. The manufacturer keeps improving instruction speed, and AVX2 on Haswell is slower than AVX2 on Skylake or later generations by some margin. That's why I said the AVX2 performance difference needs to be taken into consideration.

MeteorRain

10th October 2019, 22:02

But HEVC is no longer for the future, is it? Successors to HEVC are in development, so HEVC is the present.

The part about how processors haven't improved much is absolutely right, and I had that in mind as well. From the year 2000 to 2010, processors (low end, mid grade, high end, everything) became several times faster and sold for the same or even lower prices, probably due to the Intel-AMD competition. But incremental improvements have been much smaller in this decade, especially after Sandy Bridge or Haswell.

I could blame the CPU manufacturers, or blame the x265 developers for not anticipating that processing power per price will not keep increasing at the rate it used to, but I'm not really trying to assign blame; just making an observation that x265 is extremely slow on "normal" processors - by which I meant a reasonable home desktop without a gazillion cores and threads. (I'd say a quad core i7 or hexacore is the mainstream now.)

I wasn't assuming that the code can be optimized further; I was wondering out loud whether it could. I was lamenting that perhaps they have reached a point where further optimizations for speed just aren't possible, in which case only professional encoders or studios with 16+ core machines can use it at decent speeds. Not the casual home users.

Depending on your point of view, you can say HEVC is the present, or you can say it's not. For example, online streaming still uses AVC (some, like YouTube, may be using VP9, but still), we still buy Blu-rays instead of UHD Blu-rays, and the majority of content distributors are still using AVC. The use of HEVC would still be quite limited so far, even assuming HEVC could be encoded faster. I admit there are other factors like royalties, but still.

Casual home users can still use x265, just with lower settings. On one of my workstations, a 6-core Sandy Bridge, I can still get 4-5 FPS at 1080p with reasonable settings (10-bit, medium preset, LP tuning). Today a $199 6-core AMD is already twice as fast as that SNB, so about 10 FPS is what you get if you max it out. I personally feel this is an acceptable speed for casual home users. If someone needs to do extensive HEVC encoding at a decent speed, a higher-performance CPU is inevitable. (The Ryzen 3900X is only $499, which is still kinda in the affordable range.)

And I apologize for my incorrect assumption. Thanks for your input.

RanmaCanada

10th October 2019, 23:19

And we already know that the current highest-end EPYC 7742 (64 cores) can encode 8K faster than real time (79 fps). Now imagine what a 32-core Zen 2 Threadripper will be able to do. Heck, I wish someone here with a 3900X would run Sagitare's benchmark and post the results. We haven't had any of the new Zen 2 chips benched. Though I think he would need to update all the binaries to reflect the changes made in x265, as the benchmark is 2 years old.

Natty

10th October 2019, 23:38

On UHD rips made with 3.2, the image looks darker when played in MPV than with rips made with 3.1. They look the same in MPC with no tone mapping. I made no change other than the encoder.

I have observed the same issue with 3.2, but with HD. I somehow managed to keep it from getting darker or lighter through AviSynth.

Rousseau

11th October 2019, 02:56

Have you compared the metadata between the two encodes?

I'm getting some contradictory results now, so this may just be some mpv wonkiness. I did notice that 3.1 stable (3.1.2+4-dc2dcb5) concatenates the cll settings with the master-display settings, which might be an issue. Other versions have them separate.

3.1.2+4-dc2dcb5
master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)cll=0,0

3.2+5-354901970679
master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1) / cll=0,0
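To compare the two strings above field by field, here is a minimal Python sketch that parses them. It assumes the layout the x265 docs give for `--master-display` (G/B/R/WP chromaticity coordinates in units of 0.00002, L(max,min) luminance in units of 0.0001 cd/m²) and that the two `cll=` numbers are MaxCLL,MaxFALL as for `--max-cll`. The function names are made up for illustration, and this only checks the settings text, not what SEI actually ended up in the bitstream.

```python
import re

# Assumed layout of x265's --master-display string (per the x265 CLI docs):
# G(x,y)B(x,y)R(x,y)WP(x,y) are chromaticities in units of 0.00002,
# L(max,min) is luminance in units of 0.0001 cd/m^2.
MASTER_DISPLAY_RE = re.compile(
    r"G\((\d+),(\d+)\)B\((\d+),(\d+)\)R\((\d+),(\d+)\)"
    r"WP\((\d+),(\d+)\)L\((\d+),(\d+)\)"
)
# Hypothetical helper pattern: finds "cll=<max>,<avg>" whether or not
# a separator (" / ") precedes it.
CLL_RE = re.compile(r"cll=(\d+),(\d+)")


def parse_master_display(text: str) -> dict:
    """Convert the raw integers to CIE xy coordinates and nits."""
    m = MASTER_DISPLAY_RE.search(text)
    if m is None:
        raise ValueError(f"no master-display string in {text!r}")
    v = [int(g) for g in m.groups()]

    def xy(x: int, y: int) -> tuple:
        # Dividing by 50000 is the same as multiplying by 0.00002.
        return (x / 50000, y / 50000)

    return {
        "G": xy(v[0], v[1]),
        "B": xy(v[2], v[3]),
        "R": xy(v[4], v[5]),
        "WP": xy(v[6], v[7]),
        "max_nits": v[8] / 10000,  # 0.0001 cd/m^2 units -> cd/m^2
        "min_nits": v[9] / 10000,
    }


def parse_cll(text: str) -> tuple:
    """Return the two cll numbers (assumed MaxCLL, MaxFALL)."""
    m = CLL_RE.search(text)
    if m is None:
        raise ValueError(f"no cll field in {text!r}")
    return (int(m.group(1)), int(m.group(2)))


if __name__ == "__main__":
    v31 = "master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)cll=0,0"
    v32 = "master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1) / cll=0,0"
    for name, s in (("3.1.2+4", v31), ("3.2+5", v32)):
        print(name, parse_master_display(s), parse_cll(s))
```

Both strings parse to the same numbers: P3 primaries (R 0.680,0.320 / G 0.265,0.690 / B 0.150,0.060) with a D65 white point (0.3127,0.3290), 1000 / 0.0001 nits, and cll=0,0. So at least in the text, the difference looks like just the missing separator; whether the written SEI differs would need checking on the streams themselves.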

I've also noticed AQ2 seems to have problems in HDR encodes (3.1 & 3.2 builds I've tested), leaving a lot of blotchy grain/puddling in flat areas of some scenes. AQ1 does much better. It's not that visible when you look at it without tone mapping, but in MPV it's very obvious. I tried AQ3 & 4 with no improvement.

NikosD

11th October 2019, 06:08

Oh guys, please.

How many times do I have to post it ?

x265 is not optimized for speed/multi-threading and AMD CPUs.

64C/128T EPYC processor managed to encode faster than real-time 8K H.265 stream, using Beamr H.265 encoding software, which is optimized for speed/multi-threading and EPYC.

H.265 is not the same thing as x265.

https://www.tomshardware.com/news/amd-epyc-rome-8k-real-time-encoding,40400.html

Boulder

11th October 2019, 06:52

I'm getting some contradictory results now, so this may just be some mpv wonkiness. I did notice that 3.1 stable (3.1.2+4-dc2dcb5) concatenates the cll settings with the master-display settings, which might be an issue. Other versions have them separate.

3.1.2+4-dc2dcb5
master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)cll=0,0

3.2+5-354901970679
master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1) / cll=0,0

I've also noticed AQ2 seems to have problems in HDR encodes (3.1 & 3.2 builds I've tested), leaving a lot of blotchy grain/puddling in flat areas of some scenes. AQ1 does much better. It's not that visible when you look at it without tone mapping, but in MPV it's very obvious. I tried AQ3 & 4 with no improvement.
Did you try a higher aq-strength for mode 2? I think some people reported that HDR encoding requires that.
EDIT: I forgot that I was "there" as well: https://forum.doom9.org/showthread.php?t=175631 . I've not done any HDR encodes in a long time, but I would probably go for aq-mode 1 at the default strength.

I think you should report an issue for both of these cases at https://bitbucket.org/multicoreware/x265/issues.

I have a feeling that HDR encoding is not optimal (I also opened an issue about this, without a response), judging by how much you need to change aq-strength and CRF compared to SDR encodes.