View Full Version : x265 HEVC Encoder


aymanalz
28th July 2016, 00:49
By "material", I mean whether there is little or much action, random or regular motion, and short- or long-range motion. The fact that they are stored on Blu-ray is less relevant...

If you focus on Hollywood blockbusters, you should probably expect a lot of random, long-range motion, so a more exhaustive search may improve the quality/size ratio at the cost of some speed. Nature documentaries have different attributes...

Which is the more exhaustive search, umh or star?

Rogatti
28th July 2016, 01:10
What is everyone's opinion of x265? Is it worth the longer encoding time compared to x264?

X265 (anime 1.2GB)
-
http://i1292.photobucket.com/albums/b570/Bispo_Guerra/x265_zpsltqiur3d.png
-
-
X264 (anime 4.7GB)
-
http://i1292.photobucket.com/albums/b570/Bispo_Guerra/x264_zpsgclhgsvt.png

Khun_Doug
28th July 2016, 04:35
You surely asked a good question aymanalz, and one that I have pondered a lot lately. For the time being I have decided to stay with x264 until the speed of x265 gets some serious attention. Disk space is cheap: I added a 5TB USB drive for just over $100. With x264 I usually encode at CRF 20, but sometimes as low as CRF 15. For my HD media I usually go 2-pass with nothing short of 10,000 kbit/s. I did some test encodes with x265 and calculated that the encode time for a full movie would be in excess of 48 hours of solid machine time. The same film encodes in under 5 hours in x264. I don't care that the encoded film is 8 GB in x264 and may be less in x265. I don't have 48 hours or more of dedicated machine time to use for single encodes.

I can revisit using x265 seriously for my video library once the speed issue is resolved. In the meantime, I am enjoying testing and evaluating it. I feel there is great promise and progress forthcoming.

LigH
28th July 2016, 06:12
Which is the more exhaustive search, umh or star?

Hmm, now that I read the docs (http://x265.readthedocs.io/en/default/cli.html#cmdoption--me) again ... I believe we need explanations (and even diagrams!) from a person with more insight.

The description of "star" mode sounds as if it is quite adaptive and can be fast if it finds a good match already in the first star pattern search, but will take more time if the other two steps are required in case of a bad match. Speculations. :o

aymanalz
28th July 2016, 07:22
You surely asked a good question aymanalz, and one that I have pondered a lot lately. For the time being I have decided to stay with x264 until the speed of x265 gets some serious attention. Disk space is cheap: I added a 5TB USB drive for just over $100. With x264 I usually encode at CRF 20, but sometimes as low as CRF 15. For my HD media I usually go 2-pass with nothing short of 10,000 kbit/s. I did some test encodes with x265 and calculated that the encode time for a full movie would be in excess of 48 hours of solid machine time. The same film encodes in under 5 hours in x264. I don't care that the encoded film is 8 GB in x264 and may be less in x265. I don't have 48 hours or more of dedicated machine time to use for single encodes.

I can revisit using x265 seriously for my video library once the speed issue is resolved. In the meantime, I am enjoying testing and evaluating it. I feel there is great promise and progress forthcoming.

I said the same thing, when x265 was at version 1.7. My computer at the time was getting about 1.5 FPS! The developers then stated (on this thread, I believe) that they had speed improvements planned for the future, but that they had other priorities to address first. Version 1.8 brought quite a bit of speed improvement, and in my experience, version 2.0 has also made it faster. (Not sure if everybody else has experienced speed improvement with 2.0)

But you are right, they do need to further speed it up, if it is possible to do so. And I really hope it is possible, because as you state, x265 shows tremendous promise over previous codecs. I see that with every encode I make - the quality of output is far better than x264, at similar bitrates. But the encoding speed is still drearily slow for those of us who don't have professional level hardware. Which is why I'm also holding back on re-encoding a ton of HD home videos for now.

aymanalz
28th July 2016, 07:45
Hmm, now that I read the docs (http://x265.readthedocs.io/en/default/cli.html#cmdoption--me) again ... I believe we need explanations (and even diagrams!) from a person with more insight.

The description of "star" mode sounds as if it is quite adaptive and can be fast if it finds a good match already in the first star pattern search, but will take more time if the other two steps are required in case of a bad match. Speculations. :o

The manual states:

Generally, the higher the number the harder the ME method will try to find an optimal match.

That line suggests that star is more exhaustive, since the "number" is 3 for star and 2 for umh. But it is far from clear, and several posts here gave me the opposite impression.

I hope this gets clarified soon. Isn't it one of the more important determiners of quality?

LigH
28th July 2016, 09:46
It has an impact on the quality/size ratio, but it is not necessarily responsible for better quality in general. If you don't care much about the size, remember: if motion search doesn't find a good enough match for inter coding (motion vector + difference), intra coding (independent newly coded content) is used, which just requires more space; as long as the output size is not restricted directly, quality can still be fine.

Quality only gets worse if you have a limited average or maximum bitrate, which can only be met by coarser quantization when too many intra blocks take up too much of the overall bitrate. Good motion search matches would have saved bitrate, and finding them requires a more or less thorough search.
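
To make the inter/intra trade-off above concrete, here is a toy sketch of block matching. It is an exhaustive search with a made-up decision threshold, not x265's hex, umh or star search; the function names and the `intra_threshold` value are assumptions for illustration only.

```python
# Toy exhaustive block-matching search; NOT x265's hex/umh/star implementation.

def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, merange):
    """Try every displacement within +/- merange and return (best_sad, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-merange, merange + 1):
        for dx in range(-merange, merange + 1):
            # Skip candidates that would read outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


def choose_mode(best_sad, intra_threshold):
    """Hypothetical decision rule: fall back to intra when the best match is poor."""
    return "inter" if best_sad <= intra_threshold else "intra"


# Reference frame: a bright 2x2 square at (x=2, y=2).
# Current frame: the same square moved to (x=3, y=3).
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2):
    for x in range(2):
        ref[2 + y][2 + x] = 200
        cur[3 + y][3 + x] = 200

best_sad, best_mv = full_search(cur, ref, bx=3, by=3, size=2, merange=2)
mode = choose_mode(best_sad, intra_threshold=50)
print(best_sad, best_mv, mode)
```

With a search range of 2 the moved square is found and the block can be coded as a motion vector plus a tiny difference. With a range of 0 the search misses it, and under this toy rule the block would fall back to intra coding, which is where the extra bitrate goes.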

shinchiro
28th July 2016, 13:54
Since x265 has been under development for 3 years, is there any news about GPU acceleration? :)

LigH
28th July 2016, 14:05
I doubt x265 development will ever consider GPU support. The complexity of HEVC encoding can easily surpass the limits of most available GPU architecture.

nakTT
28th July 2016, 14:44
XhmikosR recently released an MSYS / MinGW / GCC 6.1.0 package (2016-07-08); so I built x265 binaries for you to cross-compare compilers and linking types of mostly the same generic options, just including the pentium4/generic flag pair Ma suggested for Win32 builds (at least I hope I did it right, please check).

x265 2.0+10-5a0e139e2938 (GCC 5.3.0) (https://www.mediafire.com/download/bwv0iiyye5qz0y1/x265_2.0+10-5a0e139e2938.GCC530.7z)
x265 2.0+10-5a0e139e2938 (GCC 6.1.0) (https://www.mediafire.com/download/srytnqt61y7l76u/x265_2.0+10-5a0e139e2938.GCC610.7z)
Thanks. It works fine.

I'm using the 64bit version of the GCC 6.1.0 build.

Based on my limited testing, it seems that the 8bit version is close to 75% faster than the 12bit version. Both using Very Slow preset.

Thanks for the executables, and I hope you will release new executables as the encoder gets updated. :thanks:

LigH
28th July 2016, 18:35
I will. Once in a while. Especially when important updates were published.

brumsky
28th July 2016, 19:10
X265 (anime 1.2GB)
-
http://i1292.photobucket.com/albums/b570/Bispo_Guerra/x265_zpsltqiur3d.png
-
-
X264 (anime 4.7GB)
-
http://i1292.photobucket.com/albums/b570/Bispo_Guerra/x264_zpsgclhgsvt.png


Thanks for the images. I don't argue that the quality of x265 has improved significantly from even a year ago.

I'm still on the fence with the encoding time... All three systems I have are only AVX, none of them have AVX2 which gave a decent speed boost.

brumsky
28th July 2016, 19:15
You surely asked a good question aymanalz, and one that I have pondered a lot lately. For the time being I have decided to stay with x264 until the speed of x265 gets some serious attention. Disk space is cheap: I added a 5TB USB drive for just over $100. With x264 I usually encode at CRF 20, but sometimes as low as CRF 15. For my HD media I usually go 2-pass with nothing short of 10,000 kbit/s. I did some test encodes with x265 and calculated that the encode time for a full movie would be in excess of 48 hours of solid machine time. The same film encodes in under 5 hours in x264. I don't care that the encoded film is 8 GB in x264 and may be less in x265. I don't have 48 hours or more of dedicated machine time to use for single encodes.

I can revisit using x265 seriously for my video library once the speed issue is resolved. In the meantime, I am enjoying testing and evaluating it. I feel there is great promise and progress forthcoming.

Disk space is pretty cheap. However, I do like the lower bitrate for streaming media.

gamebox
28th July 2016, 19:34
While changing some random setting in Media Player Classic I noticed something awkward. My Output filter is set to "Enhanced Video Renderer", and below that is a setting for "resizer" where "Bilinear" was selected by default. After I changed it to one of "Bicubic" settings offered, image sharpness in full-screen mode improved dramatically (I use a Full-HD display and mostly play content encoded below that resolution)!

Could that be the stupid cause for "unexplainable" image softness most people notice with HEVC? When playing H.264 many systems (mine included) use built-in GPU decoders through DXVA, where software resize options might not have influence at all.

Which software do you use for playing HEVC videos?

Barough
28th July 2016, 20:14
I mainly use MPC-BE with default settings. I haven't bothered tweaking the settings since I'm pleased with how it looks.

brumsky
28th July 2016, 20:45
I have dual E5-2670 v1 which only have AVX. Anyone here with an equivalent setup with AVX2? What speeds are you averaging given your settings of course.

I use a slightly modified version of littlepox's settings. I'm trying to keep the bitrate down a bit for archival. I average 4.5 - 5.5 FPS with these settings.

--profile main10 --output-depth 10 --ctu 32 --bframes 8 --rc-lookahead 80 --scenecut 40 --ref 5 --limit-refs 0 --me 1 --merange 25 --subme 3 --no-rect --no-amp --limit-modes --max-merge 4 --no-early-skip --b-intra
--no-sao --signhide --weightp --weightb --aq-mode 3 --aq-strength 0.8 --cutree --rd 4 --tu-intra-depth 3 --tu-inter-depth 3 --psy-rd 2.0 --psy-rdoq 2.0 --rdoq-level 2 --lookahead-slices 4 --qcomp 0.65
--no-strong-intra-smoothing --deblock -1:-1 --qg-size 32

I'm curious what a E5-2640 v3 might get on average compared to my setup. They are the same speed 2.6ghz...

aymanalz
28th July 2016, 21:11
It has an impact on the quality/size ratio, but it is not necessarily responsible for better quality in general. If you don't care much about the size, remember: if motion search doesn't find a good enough match for inter coding (motion vector + difference), intra coding (independent newly coded content) is used, which just requires more space; as long as the output size is not restricted directly, quality can still be fine.

Quality only gets worse if you have a limited average or maximum bitrate, which can only be met by coarser quantization when too many intra blocks take up too much of the overall bitrate. Good motion search matches would have saved bitrate, and finding them requires a more or less thorough search.

Right, when I said "quality", I meant quality at a certain bitrate. I should have specified that. When doing 2-pass encodes at a certain average bitrate, isn't the choice of motion estimation method one of the more important determiners? If I need to encode at a certain bitrate, what other settings would have a significant impact on the quality/size ratio? How important is the ME method, and what other factors are important?

(For me, it's mostly high definition movies and HD home videos, all in x264, which I want to re-encode at about 30% of original bitrate.)

divxmaster
29th July 2016, 10:43
30% of the original bitrate? You won't get that, but you may get a 30% reduction in bitrate. That is what I am seeing. I am currently reencoding ds9, and the latest x265 is excellent, but only with no-sao. Bitrate is about 60% of x264, and the quality is better. Admittedly some of that is because I have now run smdegrain over it.

Ok, now time for something weird. The ultimate x265 low bitrate test. As per previous posts, I have encoded sg1, 480p, low bitrate for mobile devices. I took a 1.36MB chunk of an episode, 1 min 7 seconds, and copied it onto a FLOPPY DISK. And guess what, it played with no problem! It buffered for 16 seconds and then played the remaining 51 seconds fine. In fact it finished reading the disk after it had played 30 seconds, so the floppy disk transfer rate was more than fast enough! So a great testament to how good x265 is at low bitrates. What about bluray from floppy disk? Let's see now...
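
For a sense of scale, the figures in that post can be turned into bitrates. This is only back-of-the-envelope arithmetic: it assumes 1 MB = 1,000,000 bytes and that the disk was read continuously during the 16 seconds of buffering plus the first 30 seconds of playback, neither of which the post states.

```python
def average_kbps(size_bytes, seconds):
    """Average bitrate in kbit/s for a given payload and duration."""
    return size_bytes * 8 / seconds / 1000


clip_bytes = 1.36 * 1000 * 1000   # assumption: decimal megabytes
clip_seconds = 67                 # 1 min 7 seconds, from the post
read_seconds = 16 + 30            # assumption: buffering + playback until the read finished

video_kbps = average_kbps(clip_bytes, clip_seconds)
read_kbps = average_kbps(clip_bytes, read_seconds)

print(f"clip average bitrate: {video_kbps:.1f} kbit/s")
print(f"implied floppy read rate: {read_kbps:.1f} kbit/s")
```

Under those assumptions the clip averages a bit over 160 kbit/s, and the floppy would have been delivering data at roughly one and a half times that rate.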

aymanalz
29th July 2016, 11:10
30% of the original bitrate? You won't get that, but you may get a 30% reduction in bitrate. That is what I am seeing. I am currently reencoding ds9, and the latest x265 is excellent, but only with no-sao. Bitrate is about 60% of x264, and the quality is better. Admittedly some of that is because I have now run smdegrain over it.



As I said, I'm using 2-pass encode, so I can choose the bitrate. I don't mind a slight degradation in quality, which is why I'm encoding at 30%-50% of original bitrate. But I'd like to know how I can get maximum possible quality, for the target bitrate I choose.

About SAO that you mention, that's an aspect I'd like to know as well. Are others too experiencing noticeable loss of quality due to SAO? The developers have stated that SAO was improved in 2.0.

divxmaster
29th July 2016, 11:51
Yes, the testing of the 'new' SAO in 2.0 is one thing I have to get around to doing. But as far as fixed bitrate goes, I wouldn't do that. The bitrate required is way too variable. In crf mode, I have some 576p that uses a bitrate of 450-500, but other 576p uses a bitrate of 1000-1200! And the 450-500 looks better in this instance.

burfadel
29th July 2016, 14:02
Output bitrate depends on the quality of the source. A poor quality source means you also encode the poor quality elements. Many of these are random, which increases encode complexity.

MeteorRain
29th July 2016, 15:13
Yuuki Asuna Mod
x265-Yuuki-2.0M+9-g457336f+14.7z (https://down.7086.in/x265-Yuuki-Asuna/x265-Yuuki-2.0M%2B9-g457336f%2B14.7z)
x265-Asuna-2.0+2-ge16e208+14.7z (https://down.7086.in/x265-Yuuki-Asuna/x265-Asuna-2.0%2B2-ge16e208%2B14.7z)

dipje
29th July 2016, 22:14
Since x265 has been under development for 3 years, is there any news about GPU acceleration? :)

I thought they had that pretty much from the start, but in the commercial license stuff, not the opensource x265 stuff.

Does x264 have decent gpu acceleration after all those years?? :). OpenCL lookaheads came eventually - very late - and contribute not much.. I don't expect much to change for x265.

Hardware-accelerated HEVC encoding pretty much already exists on recent GPUs, and there are command-line tools to use it, but you have to make do with the quality it provides; there is not much to improve upon.

JohnLai
30th July 2016, 03:50
I thought they had that pretty much from the start, but in the commercial license stuff, not the opensource x265 stuff.

Does x264 have decent gpu acceleration after all those years?? :). OpenCL lookaheads came eventually - very late - and contribute not much.. I don't expect much to change for x265.

Hardware-accelerated HEVC encoding pretty much already exists on recent GPUs, and there are command-line tools to use it, but you have to make do with the quality it provides; there is not much to improve upon.

Perhaps they could offload motion estimation section, after all, both Intel and Nvidia (no AMD) support external motion estimation mode only. Then again........it probably will be rejected in the name of 'open source' where there is no room for proprietary code or maybe there isn't any speed gain from it....or nobody is willing to work on it....pick one ~.~

aegisofrime
30th July 2016, 06:46
Perhaps they could offload motion estimation section, after all, both Intel and Nvidia (no AMD) support external motion estimation mode only. Then again........it probably will be rejected in the name of 'open source' where there is no room for proprietary code or maybe there isn't any speed gain from it....or nobody is willing to work on it....pick one ~.~

So is there no possible way to offload some of the processing to the fixed function units on Intel and AMD chips?

Thanks for the images. I don't argue that the quality of x265 has improved significantly from even a year ago.

I'm still on the fence with the encoding time... All three systems I have are only AVX, none of them have AVX2 which gave a decent speed boost.

I used to be in the same boat as you. But one thing that you have to understand is that the presets of x264 and x265 are not comparable. For me, x265's medium is roughly comparable in quality and speed to x264's slow, but with a smaller file size of course.

JohnLai
30th July 2016, 07:19
So is there no possible way to offload some of the processing to the fixed function units on Intel and AMD chips?


It is possible, an excerpt from NVENC documentation :

NVENC can be used as a hardware accelerator to perform motion search and generate motion vectors and mode information only. The resulting motion vectors or mode decisions can be used, for example, in motion compensated filtering or for supporting other codecs not fully supported by NVENC or simply as motion vector hints for a custom encoder.

Sample code from nvenc :

/**
 * Motion vector structure per CU for HEVC motion estimation.
 */
typedef struct _NV_ENC_HEVC_MV_DATA
{
    NV_ENC_MVECTOR mv[4];          /**< up to 4 vectors within a CU */
    uint8_t        cuType;         /**< 0 (I), 1 (P), 2 (Skip) */
    uint8_t        cuSize;         /**< 0: 8x8, 1: 16x16, 2: 32x32, 3: 64x64 */
    uint8_t        partitionMode;  /**< The CU partition mode
                                        0 (2Nx2N), 1 (2NxN), 2 (Nx2N), 3 (NxN),
                                        4 (2NxnU), 5 (2NxnD), 6 (nLx2N), 7 (nRx2N) */
    uint8_t        lastCUInCTB;    /**< Marker to separate CUs in the current CTB from CUs in the next CTB */
} NV_ENC_HEVC_MV_DATA;

/**
 * Creation parameters for output motion vector buffer for ME only mode.
 */
typedef struct _NV_ENC_CREATE_MV_BUFFER
{
    uint32_t          version;        /**< [in]: Struct version. Must be set to NV_ENC_CREATE_MV_BUFFER_VER */
    NV_ENC_OUTPUT_PTR mvBuffer;       /**< [out]: Pointer to the output motion vector buffer */
    uint32_t          reserved1[255]; /**< [in]: Reserved and should be set to 0 */
    void*             reserved2[63];  /**< [in]: Reserved and should be set to NULL */
} NV_ENC_CREATE_MV_BUFFER;

/** NV_ENC_CREATE_MV_BUFFER struct version */
#define NV_ENC_CREATE_MV_BUFFER_VER NVENCAPI_STRUCT_VERSION(1)


Of course, there are lotta lines of code before ME step, such as buffer creation + reference frame + buffer locking + buffer pointer location and finally the last step buffer destruction + release.

The questions are: how to port the code to x265, or can it be done in the first place? Who will do it for free? Are there any licensing issues?

burfadel
30th July 2016, 09:25
I have dual E5-2670 v1 which only have AVX. Anyone here with an equivalent setup with AVX2? What speeds are you averaging given your settings of course.

I use a slightly modified version of littlepox's settings. I'm trying to keep the bitrate down a bit for archival. I average 4.5 - 5.5 FPS with these settings.

--profile main10 --output-depth 10 --ctu 32 --bframes 8 --rc-lookahead 80 --scenecut 40 --ref 5 --limit-refs 0 --me 1 --merange 25 --subme 3 --no-rect --no-amp --limit-modes --max-merge 4 --no-early-skip --b-intra
--no-sao --signhide --weightp --weightb --aq-mode 3 --aq-strength 0.8 --cutree --rd 4 --tu-intra-depth 3 --tu-inter-depth 3 --psy-rd 2.0 --psy-rdoq 2.0 --rdoq-level 2 --lookahead-slices 4 --qcomp 0.65
--no-strong-intra-smoothing --deblock -1:-1 --qg-size 32

I'm curious what a E5-2640 v3 might get on average compared to my setup. They are the same speed 2.6ghz...

Those settings are all over the place. try the following:
--output-depth 10 --rd 4 --tu-intra-depth 3 --rdoq-level 2 --early-skip --fast-intra --b-intra --tskip --tskip-fast --limit-modes --aq-mode 2 --qg-size 16 --me star --merange 25 --max-merge 3 --weightb --bframes 6 --rc-lookahead 40 --ref 6 --psy-rdoq 1.38

A merange of 25 performs well, but it depends on what resolution you are encoding. I have found that the ideal number is the vertical resolution you are encoding to, divided by 1080, multiplied by 57.

So:
480
---- x 57
1080

Equals 25.33, which is why 25 seems to be optimal for 480P. If you are doing 720P, likewise:

720
---- x 57
1080

Equals 38. In effect, 25 and 38 take you to the equivalent point in the picture, since there are more pixels between the two points at 720P than at 480P. For b-frames, I found around 6 is ideal for normal content, but maybe 8 for animation. Setting this too high just leads to more encode time with little efficiency benefit. In the stats at the end of the encode, you can see the percentages for the consecutive b-frames. The first number is for 0, so if there are 7 numbers the seventh one relates to 6 consecutive b-frames. You will see the percentages can be quite low at the top end; if a count accounts for less than a few percent, it isn't worth the extra encode time. In that instance a P frame would be used instead of a B frame.
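
The merange rule of thumb described in this post (vertical resolution divided by 1080, multiplied by 57) is easy to put into a small helper. This is just a sketch of the poster's heuristic, not anything x265 does itself; the function name and the rounding to the nearest integer are assumptions.

```python
def scaled_merange(height, base_range=57, base_height=1080):
    """Scale the 1080p search range of 57 to another vertical resolution."""
    return round(height * base_range / base_height)


for height in (480, 720, 1080):
    print(f"{height}p -> --merange {scaled_merange(height)}")
```

For 480, 720 and 1080 lines this gives 25, 38 and 57, matching the numbers worked out by hand above.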

For example, for a few recent encodes (x265 2.0+11) I got:
x265 [info]: consecutive B-frames: 8.6% 3.0% 6.1% 37.1% 15.3% 23.4% 6.5%
x265 [info]: consecutive B-frames: 6.3% 1.8% 4.5% 35.4% 22.3% 21.4% 8.4%
x265 [info]: consecutive B-frames: 9.7% 5.2% 10.1% 33.7% 13.2% 21.0% 7.1%

The 6.5%, 8.4%, and 7.1% relate to runs of 6 consecutive b-frames. I found that, no matter what the source, if I selected 7 or higher for testing, the percentages for runs of 7 and above were very low. That is why I settled on 6 b-frames as optimal.
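
If you want to check those stats across many encodes, the log line can be parsed mechanically. A minimal sketch, assuming the line format shown above; the 3% cutoff is a hypothetical stand-in for "a few percent" and the function names are made up.

```python
def parse_consecutive_bframes(line):
    """Return the percentages from an x265 'consecutive B-frames' log line.

    Index 0 is the share of runs with 0 B-frames, index 1 with 1, and so on.
    """
    _, _, tail = line.partition("consecutive B-frames:")
    return [float(token.rstrip("%")) for token in tail.split()]


def highest_useful_count(shares, min_share=3.0):
    """Highest consecutive B-frame count whose share is at least min_share percent."""
    best = 0
    for count, share in enumerate(shares):
        if share >= min_share:
            best = count
    return best


line = "x265 [info]: consecutive B-frames: 8.6% 3.0% 6.1% 37.1% 15.3% 23.4% 6.5%"
shares = parse_consecutive_bframes(line)
print(shares)
print("highest count still above the cutoff:", highest_useful_count(shares))
```

For the first log line quoted above, the last entry (6 consecutive b-frames) is still at 6.5%, so a --bframes value of 6 is being used in a meaningful share of cases.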

The other options are a balance of speed and quality. Also note I set --tu-intra-depth 3 as this shows benefits for little performance cost; however, I did not set --tu-inter-depth 3, as it didn't show any noticeable quality or compression improvements and just slowed down the encode.
The settings --early-skip --fast-intra --tskip --tskip-fast --limit-modes are all performance related; they noticeably improve speed when all used together but really don't impact the quality of the output. Give the settings I listed a go exactly as written apart from the --merange calculation (no other changes) and see how the speed compares to quality.

EDIT:
Forgot to mention, the above is solely the options on the command line. That is, not using any other preset as a basis. The crf (not stated) should be set to the desired amount. I would suggest maybe something a little lower than the default but within what would give you a desirable end file size. You can use decimals, so you could set it to 21.2 if you wanted (for example).
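
As a sketch of what "solely the options on the command line" looks like in practice, the snippet below assembles one x265 invocation from the option string in this post, adds --crf, and optionally swaps in a resolution-specific --merange. The file names are hypothetical placeholders and the helper name is made up; no --preset is passed, as the post asks.

```python
import shlex

# Options quoted from the post above; nothing else is added except --crf and the file names.
SUGGESTED_OPTIONS = (
    "--output-depth 10 --rd 4 --tu-intra-depth 3 --rdoq-level 2 --early-skip "
    "--fast-intra --b-intra --tskip --tskip-fast --limit-modes --aq-mode 2 "
    "--qg-size 16 --me star --merange 25 --max-merge 3 --weightb --bframes 6 "
    "--rc-lookahead 40 --ref 6 --psy-rdoq 1.38"
)


def build_x265_command(source, output, crf, merange=None):
    """Return the argument list for one encode (file names are placeholders)."""
    options = shlex.split(SUGGESTED_OPTIONS)
    if merange is not None:
        # Replace the value that follows --merange with the requested one.
        options[options.index("--merange") + 1] = str(merange)
    return ["x265", "--input", source, "--crf", str(crf)] + options + ["--output", output]


command = build_x265_command("input.y4m", "output.hevc", crf=21.2, merange=38)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` directly.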

JohnLai
30th July 2016, 14:00
Quick question: why don't the x265 presets make use of b-adapt 1 (fast)?
All presets use either 0 (none) or 2 (full). Is there anything wrong with 1 (fast)?

Rogatti
30th July 2016, 20:41
Thanks for the images. I don't argue that the quality of x265 has improved significantly from even a year ago.

I'm still on the fence with the encoding time... All three systems I have are only AVX, none of them have AVX2 which gave a decent speed boost.

For "speed", certainly x264.
-
For cost-benefit (quality + size): x265.

divxmaster
31st July 2016, 03:43
As I said, I'm using 2-pass encode, so I can choose the bitrate. I don't mind a slight degradation in quality, which is why I'm encoding at 30%-50% of original bitrate. But I'd like to know how I can get maximum possible quality, for the target bitrate I choose.

About SAO that you mention, that's an aspect I'd like to know as well. Are others too experiencing noticeable loss of quality due to SAO? The developers have stated that SAO was improved in 2.0.

got around to testing sao in 2.0. same problem, still blurs way too much. and the no-sao test was *smaller* than the sao one! 1000 frames.

burfadel
31st July 2016, 08:12
got around to testing sao in 2.0. same problem, still blurs way too much. and the no-sao test was *smaller* than the sao one! 1000 frames.

I don't find it blurred with the settings I use...

divxmaster
31st July 2016, 21:33
I don't find it blurred with the settings I use...

Hmmm, odd, by mostly coincidence, my settings are virtually the same as yours, except I use rdoq 1.1, and qg-size 32. I presume you are using staxrip video comparison to compare? It may be due to input source quality, I am currently testing sg1, ntsc.

Leo 69
31st July 2016, 21:34
I'm seeing a significant bitrate reduction when encoding in 12-bit mode compared to 10-bit (around 20% less). The source is 8-bit.
Why?

brumsky
31st July 2016, 22:49
Those settings are all over the place. try the following:


A merange of 25 performs well, but it depends on what resolution you are encoding. I have found that the ideal number is the vertical resolution you are encoding to, divided by 1080, multiplied by 57.

So:
480
---- x 57
1080

Equals 25.33, which is why 25 seems to be optimal for 480P. If you are doing 720P, likewise:

720
---- x 57
1080

Equals 38. In effect, 25 and 38 take you to the equivalent point in the picture, since there are more pixels between the two points at 720P than at 480P. For b-frames, I found around 6 is ideal for normal content, but maybe 8 for animation. Setting this too high just leads to more encode time with little efficiency benefit. In the stats at the end of the encode, you can see the percentages for the consecutive b-frames. The first number is for 0, so if there are 7 numbers the seventh one relates to 6 consecutive b-frames. You will see the percentages can be quite low at the top end; if a count accounts for less than a few percent, it isn't worth the extra encode time. In that instance a P frame would be used instead of a B frame.

For example, for a few recent encodes (x265 2.0+11) I got:


The 6.5%, 8.4%, and 7.1% relate to runs of 6 consecutive b-frames. I found that, no matter what the source, if I selected 7 or higher for testing, the percentages for runs of 7 and above were very low. That is why I settled on 6 b-frames as optimal.

The other options are a balance of speed and quality. Also note I set --tu-intra-depth 3 as this shows benefits for little performance cost; however, I did not set --tu-inter-depth 3, as it didn't show any noticeable quality or compression improvements and just slowed down the encode.
The settings --early-skip --fast-intra --tskip --tskip-fast --limit-modes are all performance related; they noticeably improve speed when all used together but really don't impact the quality of the output. Give the settings I listed a go exactly as written apart from the --merange calculation (no other changes) and see how the speed compares to quality.

EDIT:
Forgot to mention, the above is solely the options on the command line. That is, not using any other preset as a basis. The crf (not stated) should be set to the desired amount. I would suggest maybe something a little lower than the default but within what would give you a desirable end file size. You can use decimals, so you could set it to 21.2 if you wanted (for example).


@burfadel,

Thanks for all of the info! I really appreciate the detailed response.

I ran a quick test of your settings compared to mine. Here are a couple of screenshots.

Your settings + medium profile: 6068 Kbps
https://s32.postimg.org/co84sapq9/287_Dexter_Season_5_t03_test_doom9_settings.png (https://postimg.org/image/co84sapq9/)



I took another look at my previous settings. I use those for older video, or for video that I'm trying to get through a little faster.


--crf 21 --profile main10 --output-depth 10 --ctu 32 --bframes 8 --rc-lookahead 80 --scenecut 40 --ref 5 --limit-refs 0 --me 3 --merange 27 --subme 5 --no-rect --no-amp --limit-modes --max-merge 4 --no-early-skip
--b-intra --no-sao --signhide --weightp --weightb --aq-mode 3 --aq-strength 0.9 --cutree --rd 4
--tu-intra-depth 3 --tu-inter-depth 3 --psy-rd 2 --psy-rdoq 2 --rdoq-level 2 --lookahead-slices 4 --qcomp 0.65 --no-strong-intra-smoothing --deblock -1:-1 --qg-size 32

My settings + Medium profile: 9773 Kbps
https://s32.postimg.org/d29grw9tt/287_Dexter_Season_5_t03_test_new.png (https://postimg.org/image/d29grw9tt/)


I admit your settings do offer a lower bitrate; I used crf 21 for both encodes. Your settings were almost twice as fast as mine. I did a couple of quick comparisons purely for speed by adjusting b-frames, tu-inter-depth, and me. I noticed about 0.2-0.3 more fps when I removed tu-inter-depth. The others were around 0.1.

Regarding merange: I read a post from x265Project about how they determined their default of 57, as it pertains to their profiles. It would take me some time to find it. They took the CTU size of 64 and subtracted 7 for various reasons. I don't recall all of the reasons that were stated. I essentially applied the same logic to a CTU size of 32: 32 - 7 = 25. :) I recently upped it to 27.
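
The CTU-based reasoning in this post boils down to one subtraction. The sketch below only encodes the rule as the poster remembers it (CTU size minus 7); the function name is made up, and the rule has not been checked against the x265 documentation here.

```python
def ctu_based_merange(ctu_size, margin=7):
    """Search range derived from the CTU size, per the recollection in the post."""
    return ctu_size - margin


for ctu in (64, 32):
    print(f"--ctu {ctu} -> --merange {ctu_based_merange(ctu)}")
```

This reproduces the default of 57 for a 64x64 CTU and the 25 used with --ctu 32.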

With that said I admit watching the two encodes side by side I could barely see a difference. It was only with the screenshots that I could easily identify the differences. I will need to reevaluate some of my settings from a speed perspective.

@Rogatti

I went back and played with x264 a bit, and I'm shocked to see the quality difference is far more noticeable than I ever remember. I know my tests weren't apples to apples, but I tried x265 medium vs x264 slow + slower. In both cases x265 quality was noticeably better, even when in motion.

My question, "is x265 worth it?", has been answered! :)

@littlepox

Why is ctu 32 in your recommended film tune?


@ anyone willing to help :)

Would anyone be willing to help me adjust my settings? I'd like to increase or maintain the quality of my settings while trying to remove the options that are slowing my encodes down with little to no benefit. The idea being I can increase/decrease crf to achieve the ideal bitrate/file size.

On a side note, I'd like to thank the community as a whole for being awesome. I've never been on a forum that is as willing to help and, most importantly, as nice about it as doom9. :D

burfadel
1st August 2016, 04:55
@burfadel,

Thanks for all of the info! I really appreciate the detailed response.

I ran a quick test of your settings compared to mine. Here are a couple of screenshots.

Your settings + medium profile: 6068 Kbps
https://s32.postimg.org/co84sapq9/287_Dexter_Season_5_t03_test_doom9_settings.png (https://postimg.org/image/co84sapq9/)



I took another look at my previous settings. I use those for older video, or for video that I'm trying to get through a little faster.


--crf 21 --profile main10 --output-depth 10 --ctu 32 --bframes 8 --rc-lookahead 80 --scenecut 40 --ref 5 --limit-refs 0 --me 3 --merange 27 --subme 5 --no-rect --no-amp --limit-modes --max-merge 4 --no-early-skip
--b-intra --no-sao --signhide --weightp --weightb --aq-mode 3 --aq-strength 0.9 --cutree --rd 4
--tu-intra-depth 3 --tu-inter-depth 3 --psy-rd 2 --psy-rdoq 2 --rdoq-level 2 --lookahead-slices 4 --qcomp 0.65 --no-strong-intra-smoothing --deblock -1:-1 --qg-size 32

My settings + Medium profile: 9773 Kbps
https://s32.postimg.org/d29grw9tt/287_Dexter_Season_5_t03_test_new.png (https://postimg.org/image/d29grw9tt/)


I admit your settings do offer a lower bit rate; I used crf 21 for both encodes. Your settings were almost twice as fast as mine. I did a couple of quick comparisons purely for speed by adjusting bframes, tu-inter-depth and me. I noticed about .2-.3 more fps when I removed tu-inter-depth. The others were .1ish.

As a result of the lower bitrate with my suggested settings, you can use an even lower CRF. Going from 6068 kbps to 9773 kbps is a big jump, that's 61 percent more bandwidth used, so your settings should look better. Higher bandwidth like that usually will. It is surprising how much more bandwidth your settings used though; regardless of whether you like the result, there's definitely a lot of bitrate being wasted somewhere in your settings. The key concept is that you don't have to stick to a particular CRF. If you want to improve the quality of the settings I suggested, simply lower the CRF. Normally you do compare CRF to CRF, but you have to take into account encode time and file size as well. I believe comparing quality at a given bitrate would be a far better metric for this comparison.

Another 'trick' to save a bit of bitrate, which in turn means you can lower the CRF slightly is to use:
--nr-intra 400 --nr-inter 400
or whatever decent size number you choose. Don't go too high or too low though, otherwise you defeat the purpose :). This cleans up a little of the low-level noise. It shouldn't be used by itself though, you need to combine it with reducing the CRF a bit. You save a bit of bitrate by setting NR, and in turn you can use this recovered bitrate to improve the quality of the encode.

Maybe you could try the encode again, using similar to what I listed before but with the addition of the NR and a lower CRF. Note also the lookahead range, I don't know why I wrote 40 before since I use 50! Seeing as you are using 1920x1080, I would suggest a higher ME range. Say 57, but let's for argument's sake say 40:
--crf 18 --output-depth 10 --rd 4 --tu-intra-depth 3 --rdoq-level 2 --early-skip --fast-intra --b-intra --tskip --tskip-fast --limit-modes --aq-mode 2 --qg-size 16 --me star --merange 40 --max-merge 3 --weightb --bframes 6 --rc-lookahead 50 --ref 6 --psy-rdoq 1.38 --nr-intra 400 --nr-inter 400

Why the use of --ctu 32 (faster but lower encode efficiency), --bframes 8 (slower) --limit-refs 0 (slower than default of 3), --subme 5 (much slower than default 2, only fractional gains over default 2), --no-early-skip (slower than early skip), --no-sao (why?) --aq-mode 3 (I've gone back to 2 after further testing) --aq-strength 0.9 (default of 1.0 is fine in mode 2), --no-strong-intra-smoothing, --qg-size 32 (I think 16 is better for detail).

Can you explain your reasons for the above settings :). Yes quality is the goal, but that needs to be balanced with speed and output size. Your output size is a little high, you should be able to achieve similar results with a lower CRF based on the settings I suggested.
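
When two long x265 command lines are being compared like this, it can help to list only the options that actually differ. The following is a minimal sketch, not part of any x265 tooling: `parse_opts` and `diff_opts` are hypothetical helper names, and the parser assumes every option starts with `--`, takes at most one value, and that a bare `--no-X` flag simply means X is switched off. An option that is absent from one side is shown as `None`, which in practice means the encoder's default or preset value applies.

```python
def parse_opts(cmdline):
    """Map each --option to its value (True/False for bare flags)."""
    tokens = cmdline.split()
    opts = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if not tok.startswith("--"):
            continue  # stray token that is not an option; ignore it
        name = tok[2:]
        # A following token that is not itself an option is this option's value.
        if i < len(tokens) and not tokens[i].startswith("--"):
            opts[name] = tokens[i]
            i += 1
        elif name.startswith("no-"):
            opts[name[3:]] = False  # assumption: --no-X disables X
        else:
            opts[name] = True
    return opts


def diff_opts(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Shortened excerpts of the two option sets discussed above.
mine = parse_opts("--crf 21 --ctu 32 --bframes 8 --no-early-skip --no-sao --deblock -1:-1")
suggested = parse_opts("--crf 18 --bframes 6 --early-skip --qg-size 16")

for name, (old, new) in diff_opts(mine, suggested).items():
    print(f"{name}: {old} -> {new}")
```

Pasting the full command lines in place of the shortened excerpts gives the complete list of differences to test one at a time.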

brumsky
1st August 2016, 18:12
As a result of the lower bitrate with my suggested settings, you can use an even lower CRF. Going from 6068 kbps to 9773 kbps is a big jump, that's 61 percent more bandwidth used, so your settings should look better. Higher bandwidth like that usually will. It is surprising how much more bandwidth your settings used though; regardless of whether you like the result, there's definitely a lot of bitrate being wasted somewhere in your settings. The key concept is that you don't have to stick to a particular CRF. If you want to improve the quality of the settings I suggested, simply lower the CRF. Normally you do compare CRF to CRF, but you have to take into account encode time and file size as well. I believe comparing quality at a given bitrate would be a far better metric for this comparison.

Another 'trick' to save a bit of bitrate, which in turn means you can lower the CRF slightly is to use:
--nr-intra 400 --nr-inter 400
or whatever decent size number you choose. Don't go too high or too low though, otherwise you defeat the purpose :). This cleans up a little of the low-level noise. It shouldn't be used by itself though, you need to combine it with reducing the CRF a bit. You save a bit of bitrate by setting NR, and in turn you can use this recovered bitrate to improve the quality of the encode.

Maybe you could try the encode again, using similar to what I listed before but with the addition of the NR and a lower CRF. Note also the lookahead range, I don't know why I wrote 40 before since I use 50! Seeing as you are using 1920x1080, I would suggest a higher ME range. Say 57, but let's for argument's sake say 40:


Why the use of --ctu 32 (faster but lower encode efficiency), --bframes 8 (slower) --limit-refs 0 (slower than default of 3), --subme 5 (much slower than default 2, only fractional gains over default 2), --no-early-skip (slower than early skip), --no-sao (why?) --aq-mode 3 (I've gone back to 2 after further testing) --aq-strength 0.9 (default of 1.0 is fine in mode 2), --no-strong-intra-smoothing, --qg-size 32 (I think 16 is better for detail).

Can you explain your reasons for the above settings :). Yes quality is the goal, but that needs to be balanced with speed and output size. Your output size is a little high, you should be able to achieve similar results with a lower CRF based on the settings I suggested.

Several of my settings come from littlepox's most recent film tune.

--ctu 32: comes from various posts on Doom9. The short version is that ctu 64 causes the bitrate to increase to compensate for the increased compression. I've also read that there is increased quality. It is also in littlepox's suggested film tune.

--bframes 8: Looking for increased compression with minimal increase in encoding time. Littlepox again.

--limit-refs 0: looking for increased compression.

--no-early-skip: Fear of decreased compression. I have changed to --early-skip for increased performance.

--subme 5: Better motion est. I've been bouncing back and forth between 3 & 5 though. >=3 includes chroma residual cost...

--no-sao: I've heard it called the blur all objects option. haha

--no-strong-intra-smoothing: blurs

--aq-mode 3: pulled from littlepox's tune film. I've been back and forth on this one as well.

--qg-size 32: I've changed to 16 as well. I did some testing and 32 showed only a minimal saving compared to 16. 16 does appear to be sharper.

--merange 25: x265 docs state the following.

The default is derived from the default CTU size (64) minus the luma interpolation half-length (4) minus maximum subpel distance (2) minus one extra pixel just in case the hex search method is used. If the search range were any larger than this, another CTU row of latency would be required for reference frames.

64 - 4 - 2 - 1 = 57

I applied the same logic to a CTU of 32.

32 - 4 - 2 - 1 = 25

I made the change to avoid the additional CTU row of latency.
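
The arithmetic above can be written as a small helper. This is only a sketch of the rule quoted from the x265 docs; the function and constant names are made up for illustration, and applying the formula to CTU sizes other than 64 is the same extrapolation made in this post, not something the docs state.

```python
LUMA_INTERP_HALF_LENGTH = 4  # luma interpolation half-length, per the quoted docs
MAX_SUBPEL_DISTANCE = 2      # maximum subpel distance, per the quoted docs
HEX_SEARCH_MARGIN = 1        # one extra pixel in case the hex search is used


def max_merange_without_extra_row(ctu_size):
    """Largest search range that avoids another CTU row of latency."""
    return (ctu_size - LUMA_INTERP_HALF_LENGTH
            - MAX_SUBPEL_DISTANCE - HEX_SEARCH_MARGIN)


for ctu in (64, 32):
    print(f"--ctu {ctu} --merange {max_merange_without_extra_row(ctu)}")
```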

Your settings have given me a lot to think about and test. I spent several hours testing and tweaking my settings compared to yours. I'm currently using these.

--crf 21 --profile main10 --output-depth 10 --ctu 32 --bframes 6 --rc-lookahead 40 --scenecut 40 --ref 5 --limit-refs 0 --me 3 --merange 26 --subme 3 --no-rect --no-amp
--limit-modes --max-merge 3 --early-skip --b-intra --no-sao --signhide --weightp --weightb --aq-mode 2 --aq-strength 1 --cutree --rd 4 --tu-intra-depth 3 --tu-inter-depth 1
--psy-rd 2 --psy-rdoq 1.5 --rdoq-level 2 --lookahead-slices 4 --qcomp 0.65 --no-strong-intra-smoothing --deblock -1:-1 --qg-size 16

I plan on tweaking limit-refs next.

Take your settings and add --ctu 32 --merange 25 and give it a go. I did and found ctu 32 gives a slightly smaller file size. My 60 second test clip is 182 MB.

Using your settings it comes out to 46.5 MB.

Add --ctu 32, 44.8 MB. Nothing crazy I know but it encodes about 1 - 1.5 fps faster for a 3.5% smaller file.

gamebox
1st August 2016, 19:35
@brumsky: Can you repeat the last test you talked about, but adding: --rect, --amp, --no-early-skip, and --tu-inter-depth 3 options when encoding with --ctu 64?

It defies logic that a video encoded with a limiting option like --ctu 32 results in a smaller file. All these options I suggested replacing should enable the encoder to analyze bigger CUs more thoroughly, and reuse more material from them.

brumsky
1st August 2016, 20:36
@brumsky: Can you repeat the last test you talked about, but adding: --rect, --amp, --no-early-skip, and --tu-inter-depth 3 options when encoding with --ctu 64?

It defies logic that a video encoded with a limiting option like --ctu 32 results in a smaller file. All these options I suggested replacing should enable the encoder to analyze bigger CUs more thoroughly, and reuse more material from them.

I took the original settings burfadel suggested and added the options you suggested.

--crf 21 --output-depth 10 --rd 4 --tu-intra-depth 3 --tu-inter-depth 3 --rdoq-level 2 --rect --amp --no-early-skip
--fast-intra --b-intra --tskip --tskip-fast --limit-modes --aq-mode 2 --qg-size 16 --me star --merange 25 --max-merge 3 --weightb --bframes 6 --rc-lookahead 40 --ref 6 --psy-rdoq 1.38

Result: 187MB test clip -> 35.7MB - 4553 Kbps
https://s32.postimg.org/jcgxndvch/287_Dexter_Season_5_t03_test_new_rect_amp_ctu.png (https://postimg.org/image/jcgxndvch/)



--crf 21 --output-depth 10 --rd 4 --ctu 32 --tu-intra-depth 3 --tu-inter-depth 3 --rdoq-level 2 --rect --amp --no-early-skip --fast-intra
--b-intra --tskip --tskip-fast --limit-modes --aq-mode 2 --qg-size 16 --me star --merange 25 --max-merge 3 --weightb --bframes 6 --rc-lookahead 40 --ref 6 --psy-rdoq 1.38

Result: 187MB test clip -> 44.1MB - 5737 Kbps
https://s32.postimg.org/slj3xi48h/287_Dexter_Season_5_t03_test_new.png (https://postimg.org/image/slj3xi48h/)

CTU 64 is about 19.1% smaller. Each encode was about 50-60% slower than before. If you compare the pics I actually see more detail in the ctu 32 pic. Look around the mouth and chin, it is noticeably more blurred in the ctu 64 pic.
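
For anyone repeating these comparisons, the size differences can be checked with a couple of lines. This is a minimal sketch using the file sizes reported in this thread; `percent_smaller` is a hypothetical helper name. With these inputs it gives roughly 19% for ctu 64 versus ctu 32 (both with rect/amp) and roughly 3.7% for burfadel's original settings with and without --ctu 32.

```python
def percent_smaller(reference_mb, candidate_mb):
    """How much smaller the candidate is, as a percentage of the reference."""
    return (reference_mb - candidate_mb) / reference_mb * 100.0


# (label, reference size in MB, candidate size in MB), sizes as posted above
comparisons = [
    ("ctu 32 + rect/amp -> ctu 64 + rect/amp", 44.1, 35.7),
    ("burfadel's settings -> same plus --ctu 32", 46.5, 44.8),
]

for label, ref, cand in comparisons:
    print(f"{label}: {percent_smaller(ref, cand):.1f}% smaller")
```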

I was never trying to say that ctu 64 can never be smaller. Just that compared to burfadel's original settings adding ctu 32 was faster and smaller.

Burfadel's original settings -> 46.5 MB - 6068 Kbps

I'm admittedly trading size for speed.

For comparison here are my current settings with CRF 22. I'm trying to get closer to the size and bitrate of the ctu 64 with amp rect settings.

--crf 22 --profile main10 --output-depth 10 --ctu 32 --bframes 6 --rc-lookahead 40 --scenecut 40 --ref 5 --limit-refs 3 --me 3
--merange 26 --subme 3 --no-rect --no-amp --limit-modes --max-merge 3 --early-skip --b-intra --no-sao --signhide --weightp --weightb --aq-mode 2 --aq-strength 1 --cutree
--rd 4 --tu-intra-depth 3 --tu-inter-depth 1 --psy-rd 2 --psy-rdoq 1.38 --rdoq-level 2 --qcomp 0.65 --no-strong-intra-smoothing --deblock -1:-1 --qg-size 16


Results: 187MB -> 37.7 MB - 4835 Kbps
https://s31.postimg.org/idl87j0h3/287_Dexter_Season_5_t03_test_new.png (https://postimg.org/image/idl87j0h3/)


To my eye my settings look a little better than both encodes and I averaged closer to 7 fps...

I shouldn't use the term "my settings" as those are largely burfadel's suggestions with a few minor tweaks: --no-sao --no-strong-intra-smoothing --ctu 32

gamebox
1st August 2016, 23:08
Hey, brumsky, thanks a lot :)

That test meant a lot to me, as I could verify that --ctu 64 option can indeed give noticeably more efficient encoding, however - it also brings an unexpectedly big speed penalty. My logic tells me that difference in speed should get significantly reduced with further x265 optimizations, as biggest CUs are present in low detail areas, not many get created because of their large size, and I expect them to "crumble down" to smaller blocks fast.

Also, could there be some sort of bug in x265 logic when using --ctu 64, since your first encode has extremely reduced details, and big reduction in bitrate - as if you encoded using completely different settings? I wouldn't expect to have that many largest CUs in a frame you showed.

--amp is probably the most useless option quality wise of the ones I recommended. It is also, highly probably, slowing down process the most.
--rect is more useful for quality and brings less speed penalty.
--no-early-skip influenced speed dramatically in my tests, but likewise had large influence on quality too.
--tu-inter/intra-depth 3 are the options I recently added to my encodes, as they did prove to increase quality. I lost about 10-15% speed.
I encode using slightly modified slower profile.

brumsky
2nd August 2016, 00:01
Hey, brumsky, thanks a lot :)

That test meant a lot to me, as I could verify that --ctu 64 option can indeed give noticeably more efficient encoding, however - it also brings an unexpectedly big speed penalty. My logic tells me that difference in speed should get significantly reduced with further x265 optimizations, as biggest CUs are present in low detail areas, not many get created because of their large size, and I expect them to "crumble down" to smaller blocks fast.

Also, could there be some sort of bug in x265 logic when using --ctu 64, since your first encode has extremely reduced details, and big reduction in bitrate - as if you encoded using completely different settings? I wouldn't expect to have that many largest CUs in a frame you showed.

--amp is probably the most useless option quality wise of the ones I recommended. It is also, highly probably, slowing down process the most.
--rect is more useful for quality and brings less speed penalty.
--no-early-skip influenced speed dramatically in my tests, but likewise had large influence on quality too.
--tu-inter/intra-depth 3 are the options I recently added to my encodes, as they did prove to increase quality. I lost about 10-15% speed.
I encode using slightly modified slower profile.

I copied burfadel's settings exactly and only changed the ones you mentioned. rect, amp, ctu 64, no-early-skip. I stopped using rect and amp months ago because of the speed penalty.

burfadel had --tskip --tskip-fast --fast-intra, those could be responsible for the decreased visual quality. Although, they were in both the ctu 64 and 32 tests - yet the 32 looked better to me. I don't use those.

These were my old settings.

--profile main10 --output-depth 10 --ctu 32 --bframes 8 --rc-lookahead 80 --scenecut 40 --ref 5 --limit-refs 0 --me 1 --merange 25 --subme 3 --no-rect --no-amp --limit-modes
--max-merge 4 --no-early-skip --b-intra
--no-sao --signhide --weightp --weightb --aq-mode 3 --aq-strength 0.8 --cutree --rd 4 --tu-intra-depth 3 --tu-inter-depth 3
--psy-rd 2.0 --psy-rdoq 2.0 --rdoq-level 2 --lookahead-slices 4 --qcomp 0.65
--no-strong-intra-smoothing --deblock -1:-1 --qg-size 32

Using those settings with my test clip, I'd average about 1.8 - 2.2 fps. With burfadel's modified settings, I average 3x faster encodes - >6.5 fps average.

I can't tell a difference between my old settings and the new ones. I use staxrip's video comparison and they are indistinguishable to me. I'd imagine a trained pro could pick out the changes but I can't.

Try burfadel's recommendation, --tu-inter-depth 1 --tu-intra-depth 3. I couldn't tell a difference and gained .5 - .75 fps.

Also, I tested limit-refs a bit. 3 is obviously the fastest, with little to no discernible difference - to my eyes. 2 was slower than 1, 0 was the slowest of course. My guess is limiting by depth (1) is faster than limiting by CU (2). I'm sticking with 3 for now, I may consider testing 1 further from a quality perspective.

Try these settings and let me know what you think. Change the crf to meet your ideal bitrate. Based on my testing I'd rather up crf than go with ctu 64...

--crf 19.75 --profile main10 --output-depth 10 --ctu 32 --bframes 6 --rc-lookahead 40 --scenecut 40 --ref 5 --limit-refs 3 --me 3
--merange 26 --subme 3 --no-rect --no-amp --limit-modes --max-merge 3 --early-skip --b-intra --no-sao --signhide --weightp --weightb --aq-mode 2 --aq-strength 1 --cutree --rd 4 --tu-intra-depth 3 --tu-inter-depth 1 --psy-rd 2
--psy-rdoq 1.38 --rdoq-level 2 --qcomp 0.65 --no-strong-intra-smoothing --deblock -1:-1 --qg-size 16

burfadel
2nd August 2016, 04:29
Yeah the settings --tskip --tskip-fast --fast-intra are purely a speed consideration. If you want to balance out a slightly lower bitrate for comparison you could use --crf 21.7 for example :). Also try a small amount of inter and intra noise reduction, like 400, and make up the lower output size with a lower crf. Since I put 21.7 above, try 21.2. It's about trying to maximise efficiency.

burfadel
2nd August 2016, 06:33
I just did some further testing. When I originally did my testing with --tskip, --tskip-fast, and --fast-intra they were beneficial speed wise. However, x265 has undergone improvements since then and I am no longer seeing the speed increase. Things like recursion skip etc were added. Of course, a lot of that could be dependent on the source material.

Don't forget that any setting that changes the output can also affect the speed of the encode. For instance, even if --fast-intra does the processing of that particular area faster, because the output has changed it affects other areas of the encode. Now that there have been changes in other areas such as --limit-modes, recursion skip, limit references etc., I found in some cases it was actually faster without the speed settings! It's all about synergy of settings. This synergy also applies to the noise reduction argument, keeping in mind the noise reduction in x265 is very mild. I don't suggest using it by itself, for benefit you need to use it in conjunction with reducing the CRF. If you are using a CRF of 16 or something it probably wouldn't be worth it, but at a higher CRF it is. In the last lot of testing I did, without 400 noise reduction on intra and inter I got almost the same file size at 22 as I did with both inter and intra NR on (400) at a CRF of 21.2. As I said, you don't have to go 400, you could test with 200 on both inter and intra, and a CRF of say, 21.6, or whatever the bitrate equivalency is, and work out the balance that best suits. I do believe though that you can achieve a higher output quality once you take into account the ability to use a lower CRF for a given bitrate.
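
Finding "whatever the bitrate equivalency is" amounts to searching for the CRF that lands on a target size. The sketch below shows one way to do that with bisection. It is illustrative only: `match_crf` is a hypothetical helper, `size_at` stands for "encode the test clip at this CRF and return the size", and the search assumes output size shrinks steadily as CRF rises and that the size at `hi` already fits the target. The `toy_size` function is a made-up model used only to exercise the search, not a measurement of x265.

```python
def match_crf(size_at, target_mb, lo=16.0, hi=28.0, step=0.1):
    """Bisect for a CRF, within `step`, whose output size fits target_mb.

    Assumes size_at(crf) decreases as crf increases and that
    size_at(hi) <= target_mb.
    """
    while hi - lo > step:
        mid = (lo + hi) / 2
        if size_at(mid) > target_mb:
            lo = mid  # output too big: a higher CRF is needed
        else:
            hi = mid  # output fits: try a lower CRF for more quality
    return round(hi, 1)


def toy_size(crf):
    """Made-up stand-in for a real test encode: 40 MB at CRF 22,
    halving for every 6 CRF steps. Replace with real measurements."""
    return 40.0 * 2 ** (-(crf - 22.0) / 6.0)


print(match_crf(toy_size, target_mb=40.0))
```

In real use each call to `size_at` is a full test encode, so a short clip and a coarse `step` keep the number of encodes manageable (about seven for the defaults above).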

I also tested without --early-skip, I only used one clip for this, but the speed drop was 21 percent. That is quite a lot so I don't recommend it, I only tried it out to see exactly how much slower it actually is.

If you are okay with the speed loss of --no-early-skip for quality, then you probably wouldn't mind --rd 5 and using --rd-refine. The picture quality is much nicer when you do a direct comparison, it retains picture texture very well, but is slow.

So, my new settings:
--crf 21.4 --output-depth 10 --rd 4 --tu-intra-depth 3 --rdoq-level 2 --early-skip --b-intra --limit-modes --aq-mode 2 --qg-size 16 --nr-intra 400 --nr-inter 400 --me star --merange 26 --max-merge 3 --weightb --bframes 6 --rc-lookahead 50 --ref 6 --psy-rdoq 1.38
(the decimal CRF is lowered from 22 from using 400 on the noise reduction).

If you want to try something that isn't practical speed wise, but gives a nicer picture for roughly the same file size (it retains flat areas well):
--crf 21.4 --output-depth 10 --rd 5 --rd-refine --tu-intra-depth 3 --rdoq-level 2 --early-skip --b-intra --limit-modes --aq-mode 2 --qg-size 16 --nr-intra 400 --nr-inter 400 --me star --merange 26 --max-merge 3 --weightb --bframes 6 --rc-lookahead 50 --ref 6 --psy-rdoq 1.38

If the development team could work out a fast method of achieving --rd 5 and --rd-refine without being too much slower than --rd 4, it would probably solve all these problems with flat textures etc that people are mentioning. The fast method I mean is like an early-skip pass or something. Even if it worked only half as well as --rd 5 and --rd-refine with say, a 10 percent speed loss, it would be an option I'd recommend straight away:). You need to compare the actual picture and not the metrics, I think you'll be surprised. First try comparing with the settings I have above, and then again with yours (but the second time use --rd 5 and --rd-refine). The reason for the dual testing is because it retains detail even on large flat areas much better, you really need to test it with the default settings for ctu and smoothing, that is not use --ctu 32 and --no-strong-intra-smoothing.

divxmaster
2nd August 2016, 08:33
@burfadel,
what are you using to compare videos, are you using stackhorz/stackvert or are you using staxrip video comparison tool?

Cheers
Divxmaster

burfadel
2nd August 2016, 09:05
I use the Staxrip video comparison tool, but I also just play it in Windows Media Player and compare them that way as well as it includes motion. I use WMP because if I used MPC-HC with the custom madVR settings I have it would throw out any comparison.

gamebox
2nd August 2016, 10:21
brumsky: I haven't encoded using main10 so far. My target playback hardware are some future STBs, TVs, built-in GPU decoders - namely, the cheapest decoding hardware available, and I'm concerned about compatibility. Besides, I don't have a powerful hardware, so speed matters to an extent, and sometimes the quality of my sources is not the best as well.

I use these:
--bframes 9, as I hadn't noticed significant reduction in speed over 6, so kept it for maximum coding efficiency
--ref 6
--rect, as it improved quality slightly without hurting speed too much, unlike amp which brings slight improvement (if any) at a very high cost
--no-sao, --no-strong-intra-smoothing, --deblock -2 (-3)
--early-skip brought obvious quality loss despite big gains in speed, so I discarded it in some of my previous tests
--max-merge 2, --limit-modes, --limit-refs 3 --no-weightb, all chosen for speed

--qg-size 16 was accumulating artifacts in blocks containing "important" details to my eyes, so I discarded it to allow the encoder to reduce quality of largest CUs as well. However, I also use --no-cutree, as my encodes are of a kind where quality of the background is not that important, unlike areas of intense movement and changes in foreground. So, even options that reduce encoding efficiency, by discarding some of the textures in such areas, can seem "optimal" for me, as motion estimation tends to blur retained visual material. Precise motion estimation algorithms avoiding harsh and "blocky" look, and well defined edges - especially ones with lower contrast, mean most to me. Most encoders (x264 included) tend to "dissolve" less pronounced edges, so they almost become like gradients. Only well defined objects keep their outlines, while everything else becomes smoothed.

For comparing images I use "old school methods". Save frames for comparison as BMP using MPC (preferably B-frames), then open both in full screen in separate image viewer windows, and alternate between them.

K.i.N.G
3rd August 2016, 16:48
got around to testing sao in 2.0. same problem, still blurs way too much. and the no-sao test was *smaller* than the sao one! 1000 frames.

Same here...

kuchikirukia
4th August 2016, 06:46
--bframes 9, as I hadn't noticed significant reduction in speed over 6, so kept it for maximum coding efficiency

My experience with x264 is that b-frame usage falls off a cliff after 6 with live action. I find there's generally less than 1% compression gain to be found between that and 16. With anime, you can find a couple percent more up to 10 b-frames. If x265's b-frame calculations are similar in nature, you're looking at adding encoding time for pretty much no gain. With x264, b-frames aren't massively expensive, but they're not trivial, either.
So you might want to take a look at your logs to see what the b-frame usage actually is, and do some runs to see what the penalty is. I haven't done any work with x265 so I don't know if the increased computational complexity overshadows the b-frame computation or if it adds to them.

LigH
4th August 2016, 07:22
Long range B-frames also decrease compatibility with consumer players. The more consecutive B-frames may exist between I or P frames, the more frames the decoder will have to handle until a GOP is done, especially when B-frame pyramid and references among B-frames are used, that may go beyond hardware limits.

gamebox
4th August 2016, 10:19
@ kuchikirukia, LigH:

In my logs, indeed, B-frame usage falls off sharply after 6 or 7. Over 6 I generally see only 1-2 %. I might reconsider that option soon.

I'm currently struggling with increased encoding time after I added --tu-intra/inter-depth 3. Quality did increase considerably, but encoding time seems to have suffered more than I previously estimated in tests. I'll try lower depths and different combinations, aiming to preserve most of the quality gain. --no-rskip also proved useful, but increased encoding time as well. I hope to offset the slowdown with new RAM planned for my system - 1866MHz DDR3. I'm temporarily using 1333MHz modules from previous configuration. CPU is AMD FX-8320, octo-core, AVX capable.

burfadel
4th August 2016, 10:54
I did some more tests, seems it is not only a nicer picture without SAO, but it is faster as others have found. This faster is by several percent and repeatable, so it's not a margin of error thing! SAO is probably not a bad thing in theory, just the internal smoothing amount is probably too high?

I did some further testing out of interest, regarding a couple of key things that directly affect output quality.
--ipratio 1.35
--pbratio 1.25
--bframe-bias 35

The ipratio (between I frames and P frames) by default is 1.40. I tried lowering this slightly to 1.35 meaning higher quality P frames. This of course affects the I frame and B frame bitrate as well, and you also have to remember because the I to P frame ratio is lower, the P to B amount also changes because the ratio of the P to B frame is based off the P frame's ratio to I.

I ended up with a smaller file as a result on the clip I used!... yes, when I thought about it, because of it affecting everything else this can happen.

I then thought I'd try upping the quality of the B frames a bit. Default is 1.30, I tried 1.25. I then thought about the b-frame usage beyond 6 frames, this is easily adjusted using the --bframe-bias option. Yes, 35 is probably fairly high, but I did manage to be able to select a much higher number of bframes, I tested 9, and still got double digit b-frames at the 9th consecutive, whereas by default 6 seems to be the limit. Even if you still had 6, the b-frame usage would be higher. Of course, b-frames are lower quality, but if you adjust the ratios and work it out properly :). By that I mean much less extreme changes. If the file size in the end is smaller than an encoded clip without these changes, and the drop is significant enough, try dropping the CRF by 0.1.

So I didn't go into fully testing with these settings, but I believe small tweaking of these could help with some of the smoothing issues. The current ratios just seem to be pulled from x264, and there's nothing to say those ratios were perfected either. That said, because this was pulled from x264, it's probably much more off in x265.

Now there's some testing for you to do! Try small variation for a start, and things like 1.20 for the ipratio probably wouldn't be helpful. I am referring more to say, between 1.34 and 1.42. Likewise with the pbratio. If the pbratio is adjusted then a slight b-frame bias can be applied since the b-frames will be higher quality.
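
To get a feel for how small these ratio changes are, it can help to express them as approximate QP offsets. The sketch below assumes x264-style rate control, where a ratio maps to an offset of 6 * log2(ratio) because the quantizer step doubles every 6 QP. Whether x265 applies the ratios exactly this way in CRF mode is an assumption here, so treat the numbers as a rough mental model rather than what the encoder will report.

```python
import math


def approx_qp_offset(ratio):
    """Approximate QP difference implied by a frame-type ratio,
    assuming the quantizer step doubles every 6 QP."""
    return 6.0 * math.log2(ratio)


# Defaults and the tweaked values mentioned in this post.
for name, default, tweaked in (("ipratio", 1.40, 1.35), ("pbratio", 1.30, 1.25)):
    d, t = approx_qp_offset(default), approx_qp_offset(tweaked)
    print(f"{name}: {default} -> ~{d:.2f} QP, {tweaked} -> ~{t:.2f} QP "
          f"(change of ~{d - t:.2f} QP)")
```

Under this assumption, moving either ratio by 0.05 shifts the relative quantizer by roughly a third of a QP, which fits the advice to stay within a narrow range such as 1.34 to 1.42.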

So the fine tuning fun isn't over yet!