x265 HEVC Encoder [Archive] - Page 83

burfadel

4th August 2016, 12:35

To my liking, probably ipratio 1.38 and pbratio 1.28, along with the other settings, including --no-sao.

Do an encode with your normal settings, and one with the two options above. Let me know what you think :).

foxyshadis

5th August 2016, 09:35

SAO really seems more designed for low bitrate than high, it's tremendously useful when every bit counts, where banding is terrible, and entirely useless when you want grain. Even at medium bitrates (where you accept some degradation) it's very questionable, especially for 10+bit. My completely unsubstantiated guess is that x265's thresholds for SAO are just way too high, and since there are no knobs to turn, you have to either accept it or disable it.

It should be noted that SAO doesn't really get a lot of love, outside of big speedups and occasional bug fixes. The last noticeable change was 5 months ago (using the CU's QP instead of the whole slice's), which was relatively small, and before that, the last real algorithmic change was probably in the original HM that x265 was based on. Don't get me wrong, fixing incorrect output is important, but since development of SAO pretty much ended when HEVC was standardized, there's probably room for some improvement, or at least some tweaking. I wish I had time for that.

mandarinka

5th August 2016, 12:49

I hope to offset the slowdown with new RAM planned for my system - 1866MHz DDR3. I'm temporarily using 1333MHz modules from previous configuration. CPU is AMD FX-8320, octo-core, AVX capable.

Memory frequency is not likely to help encoding speed much, if at all. You might get a small boost if the latency (in nanoseconds) goes down, but encoding is not RAM-bound; it is an almost purely CPU-bound type of task. The only way up is a faster CPU or an overclock.

I have to say that the point of using x265 is, IMHO, getting better quality than from x264, and to fully realize that goal you have to turn most of the knobs into slow territory. So IMHO chasing speed might be counterproductive...
You go into HEVC to get some compression/quality boost over x264, but then lose half of that benefit (if not more) by chasing speed? :)

burfadel

5th August 2016, 13:40

SAO really seems more designed for low bitrate than high, it's tremendously useful when every bit counts, where banding is terrible, and entirely useless when you want grain. Even at medium bitrates (where you accept some degradation) it's very questionable, especially for 10+bit. My completely unsubstantiated guess is that x265's thresholds for SAO are just way too high, and since there are no knobs to turn, you have to either accept it or disable it.

It should be noted that SAO doesn't really get a lot of love, outside of big speedups and occasional bug fixes. The last noticeable change was 5 months ago (using the CU's QP instead of the whole slice's), which was relatively small, and before that, the last real algorithmic change was probably in the original HM that x265 was based on. Don't get me wrong, fixing incorrect output is important, but since development of SAO pretty much ended when HEVC was standardized, there's probably room for some improvement, or at least some tweaking. I wish I had time for that.

I think the issue is that it's probably set too strongly as well. Maybe the code is also too complex, or it could be incorporated to some extent into other functions and optimised using AVX etc. for improved speed?

Motenai Yoda

6th August 2016, 01:56

My completely unsubstantiated guess is that x265's thresholds for SAO are just way too high, and since there are no knobs to turn, you have to either accept it or disable it.

Maybe it could be tuned together with the deblock ones.

benwaggoner

7th August 2016, 18:27

@ kuchikirukia, Ligh:
In my logs, indeed, B-frame usage falls off sharply after 6 or 7. Over 6 I generally see only 1-2 %. I might reconsider that option soon.
High bitrates and higher grain retention tend to reduce the number of b-frames. For really low bitrates where grain isn't an issue, I see longer series of consecutive b-frames. For ultra low bitrates, I've tried 16, and seen >1% usage for that 16th. It obviously wasn't good looking video, but it was better looking video for that bitrate than with fewer b-frames.

Of course the use cases of "as good as possible within X bitrate" and "as low a bitrate as possible with X quality" are quite different, and require different tunings. x265 has certainly been evolving so that stock settings are better at both cases (night and day compared to the fall 2014 builds!).

The first web video I ever did, back in late 1997 (I think), was something like 192x144p10 30 Kbps beta RealVideo. We have certainly come a LONG way from there. It was for the Peter Jacobson golf tournament, and the shaft of the club would vanish whenever it was swung - strong bias against high frequency diagonals! 320x240p24 @ 30 Kbps is leaps and bounds beyond what was possible then. Although people's expectations have grown proportionally - analog 480i isn't the "broadcast quality" Big Rock Candy Mountain.

benwaggoner

7th August 2016, 18:40

CPU is AMD FX-8320, octo-core, AVX capable.
AVX2 yields big improvements for x265. I think we're now seeing a >50% improvement going from Sandy Bridge to Haswell, due to AVX2 and also microarchitectural improvements.

For 1080p and below, my 4-core Skylake 6700 outperforms my 16-core Sandy Bridge workstation. That's AVX2 and microarchitecture for you. And it's nearly twice as fast as my 4-core Haswell "portable workstation" laptop which has AVX2 and the same cores.

I'm not sure how the fastest Skylake compares to the new Broadwell-based i7's, which are available with lots of cores. But per-core performance is really important. And for ultimate quality encodes, you can gain some quality by reducing frame parallelism (although it's not nearly as big a deal as six months ago), which also makes single-core performance relatively more important.

mandarinka

8th August 2016, 00:37

For 1080p and below, my 4-core Skylake 6700 outperforms my 16-core Sandy Bridge workstation. That's AVX2 and microarchitecture for you.

You should probably stress more that it's because (a) the 16-core would not nearly be utilizing all cores well, or at all, and (b) the Skylake runs at a higher frequency. Because, you know, it's the internet; people are going to misinterpret :)
The per-GHz and per-core performance of course hasn't improved 4x. I'd roughly guess the IPC improved 25% for x264 and maybe up to 40% for x265 (not that sure there).

http://www.anandtech.com/bench/product/1554?vs=287
Note that while both chips have the same base clock, i7-6700 probably gets much higher real frequency due to turbo, which will be activating more aggressive bins than on the old 32nm and quite hot-running i7-2600K.

aymanalz

8th August 2016, 07:49

AVX2 yield big improvements for x265. I think we're now seeing a >50% improvement going from Sandy Bridge to Haswell, due to AVX2 and also microarchitectural improvements.

For 1080p and below, my 4-core Skylake 6700 outperforms my 16-core Sandy Bridge workstation. That's AVX2 and microarchitecture for you. And it's nearly twice as fast as my 4-core Haswell "portable workstation" laptop which has AVX2 and the same cores.

I have experienced the huge leap in performance with Haswell, as compared to Ivy bridge, on my two 4-core laptops. That is probably attributable to AVX2. But what explains the twice-as-fast performance increase of your Skylake over Haswell? Is it, as the poster above postulates, due to much higher frequency on the Skylake? Was the Skylake a desktop CPU, as opposed to the "Mobile workstation's" Haswell? That would obviously make a big difference. Or are there architectural improvements in Skylake that produce this result?

gamebox

8th August 2016, 09:40

@benwaggoner
Ever since I did my first video encode (back in 2001) I tend to choose "borderline" bitrates for each technology. I chase that thin line where bitrate is "just sufficient" for "everyday quality" video and picture is not far from obviously falling apart. For HEVC, I've found that bitrate to be approximately 1,2 Mbps for 480p video, 2 Mbps for 720p.

FX-8320 was my only choice for the moment. At 90eur secondhand (30 for mobo) it was far cheaper than anything comparable a monopolist rival makes. AVX (its resulting speedup, to be precise) was also an "enabling" feature of this processor, making the difference between video encoding that was just (barely) "possible" and encoding whose speed comes closer to meeting real-life needs. I've left behind two AMD non-AVX quads which proved useless for (quality) HEVC encoding, despite my optimism. One of the mobos in those systems supports early Bulldozers, and I have already drawn up a plan to buy another cheap AVX CPU to split the encoding load. Electricity consumption is an issue, but a secondary one, as most of my video encoding takes place in the off-peak electricity interval, and I plan all my systems with that amount of utilization in mind (if I used my current CPU 24/7, I wouldn't need another system).

kuchikirukia

8th August 2016, 11:22

High bitrates and higher grain retention tend to reduce the number of b-frames. For really low bitrates where grain isn't an issue, I see longer series of consecutive b-frames. For ultra low bitrates, I've tried 16, and seen >1% usage for that 16th.

Yup, but that 1% doesn't correspond to 1% greater compression, so while it's good to keep in mind that lower quality might mean more b-frames and to run a test to see if it might be worth it to push it out 1 or 2 more, going to 16 is generally going to be pretty far down the list of sensible places to spend CPU time.
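
To judge how much the later B-frame slots are actually used, the "consecutive B-frames" line that x265 prints at the end of an encode can be read programmatically. The following is a minimal sketch, not part of x265: the helper names are made up for illustration, the sample line is copied from a log posted later in this thread, and it assumes that entry i of the line is the share of mini-GOPs containing exactly i consecutive B-frames.

```python
import re

def parse_consecutive_bframes(log_line):
    """Return the percentages from an x265 'consecutive B-frames' log line.

    Assumption: index i of the result is the share of mini-GOPs that
    contain exactly i consecutive B-frames (index 0 = none).
    """
    _, _, tail = log_line.partition("consecutive B-frames:")
    return [float(p) for p in re.findall(r"(\d+(?:\.\d+)?)%", tail)]

def mean_run_length(shares):
    """Average number of consecutive B-frames, weighted by each share."""
    total = sum(shares)
    return sum(run * share for run, share in enumerate(shares)) / total

# Sample line taken from an x265 log posted in this thread (--bframes 6).
sample = ("x265 [info]: consecutive B-frames: "
          "13.7% 5.6% 5.6% 23.9% 9.6% 21.4% 20.2%")

shares = parse_consecutive_bframes(sample)
for run, share in enumerate(shares):
    print(f"{run} B-frames: {share:5.1f}%")
print(f"mean run length: {mean_run_length(shares):.2f}")
```

For that sample the mean comes out at roughly 3.55 B-frames per run, which agrees with the ratio of B-frames to I+P frames in the same log (170292 / 47917). A long tail of near-zero percentages in the last slots would be the sign that a higher --bframes value is buying very little.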

mandarinka

8th August 2016, 12:12

AVX (it's resulting speedup, to be precise) was also an "enabling" feature of this processor, making difference between video encoding that was just (barely) "possible" and encoding whose speed comes closer to meeting real-life needs. I've left behind two AMD non-AVX quads which proved useless for (quality) HEVC encoding, despite my optimism. One of mobos in those systems supports early bulldozers, and I already drew a plan to buy another cheap AVX CPU to split encoding load.

You don't really need AVX, which is mainly an instruction set for floating-point data. The pitfall with K10 ("non-AVX") AMD chips is not the lack of AVX, but the lack of SSSE3 and SSE4.1/SSE4.2, which x265 absolutely needs, because that is the target its hand-written assembly SIMD uses.

K10 chips like the Phenoms still only had SSE2 as their SIMD instruction extension, and x265 only has a small fraction of its SIMD covered for the SSE2 target. For that reason, the speed of x265 on K10 is cut to about 1/3 of its theoretical possibilities. This also affects other chips that lack SSE4, like 65nm Core 2 Duos/Quads and Celerons/Pentiums before Sandy Bridge.

gamebox

8th August 2016, 17:14

Thanks, Mandarinka, I didn't know that :)

I liked AMD K10 core's performance in x264, so I thought four of those, working together, would create a fairly decent data crunching "machinery" for x265. Truth is that, with my settings, an overclocked K10 quad working at 3,2GHz (RAM at nearly 1500 MHz), with nearly 100% CPU load, takes as much as 70 hours to encode a 1,5 hour long SD video! About 5 times longer than FX-8320, although encoding SD content occupies about 70% of that CPU.
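
The figures above can be turned into frames per second with a little arithmetic. This is only a back-of-the-envelope sketch: the source frame rate is not stated in the post, so 25 fps is an assumption, and the function name is made up for illustration.

```python
def encode_fps(video_hours, encode_hours, source_fps=25.0):
    """Encoder throughput in frames per second.

    source_fps is an assumption (25 fps); it is not given in the post.
    """
    frames = video_hours * 3600 * source_fps
    return frames / (encode_hours * 3600)

# K10 quad: 70 hours for a 1.5 hour SD video.
k10 = encode_fps(1.5, 70)
# FX-8320: described as about 5 times faster, so roughly 70 / 5 = 14 hours.
fx = encode_fps(1.5, 70 / 5)

print(f"K10 quad: {k10:.2f} fps")
print(f"FX-8320:  {fx:.2f} fps")
```

Under that assumption the K10 manages a bit over half a frame per second, and the FX-8320 somewhere between 2 and 3 fps.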

brumsky

8th August 2016, 17:21

@ kuchikirukia, Ligh:

In my logs, indeed, B-frame usage falls off sharply after 6 or 7. Over 6 I generally see only 1-2 %. I might reconsider that option soon.

I'm currently struggling with increased encoding time after I added --tu-intra/inter-depth 3. Quality did increase considerably, but encoding time seems to have suffered more than I previously estimated in tests. I'll try lower depths and different combinations, aiming to preserve most of the quality gain. --no-rskip also proved useful, but increased encoding time as well. I hope to offset the slowdown with new RAM planned for my system - 1866MHz DDR3. I'm temporarily using 1333MHz modules from previous configuration. CPU is AMD FX-8320, octo-core, AVX capable.

--tu-inter-depth adds a decent amount of encoding time. Also, I'd suggest allowing rskip. You get better grain retention with psy-rd settings compared to most other changes. I'd suggest increasing psy-rd and psy-rdoq levels.

On another note:

I've been toying with --early-skip --limit-modes and --rect --amp. If I'm reading the x265 docs correctly, early skip only kicks in when no motion or not enough motion is detected. Once it detects enough motion it processes normally. I added rect and only noticed about a 0.5 fps decrease; rect + amp gives a total decrease of about 1.5 fps when early-skip and limit-modes are used together.

The idea is to use early skip so you don't waste time when there is little or no motion, and then to let the encoder spend more time where motion is detected by using rect and/or amp.

Also, note that rect implies an additional tu-inter-depth automatically. I believe that means using --tu-inter-depth 1 with early-skip will prevent a lot of wasted time when no motion is detected. Then when motion is detected, rect kicks in and gives you an effective --tu-inter-depth 2 only in the spots that need it. Limit-modes just allows rect and amp to be more selective in its tests.

Here are my current settings.

--crf 20 --profile main10 --output-depth 10 --ctu 32 --bframes 6 --rc-lookahead 40 --scenecut 40 --ref 5 --limit-refs 3 --me 3 --merange 26 --subme 3 --rect --no-amp
--limit-modes --max-merge 3 --early-skip --b-intra --no-sao --signhide --weightp --weightb --aq-mode 2 --aq-strength 1 --cutree --rd 4 --tu-intra-depth 3
--tu-inter-depth 1 --psy-rd 1 --psy-rdoq 1.28 --rdoq-level 2 --qcomp 0.65 --no-strong-intra-smoothing --deblock -1:-1 --qg-size 16

burfadel

8th August 2016, 17:44

Try adding:
--ipratio 1.38 --pbratio 1.28

It might seem a slight adjustment, but it does seem to be nicer. Adjusting these too much won't be beneficial.

x265_Project

8th August 2016, 18:00

I have experienced the huge leap in performance with Haswell, as compared to Ivy bridge, on my two 4-core laptops. That is probably attributable to AVX2. But what explains the twice-as-fast performance increase of your Skylake over Haswell? Is it, as the poster above postulates, due to much higher frequency on the Skylake? Was the Skylake a desktop CPU, as opposed to the "Mobile workstation's" Haswell? That would obviously make a big difference. Or are there architectural improvements in Skylake that produce this result?

A key improvement in Skylake is the internal ring bus, which has double the memory bandwidth of the previous CPU generation (Haswell/Broadwell). The internal ring moves data inside the CPU, from cache memory to logic units in each core, and back again. Our performance profiling shows that x265 performance is often constrained by the memory bandwidth of Haswell generation cores which is about 20 GB/sec. Skylake cores have about 40 GB/sec bandwidth, removing this bottleneck.

Skylake CPUs have a number of other architectural improvements that they reviewed in a presentation (SPCS001) at the Intel Developer Forum last year. See http://www.overclock.net/t/1570069/idf-pdf-skylake-microarchitecture-details for copies of some key slides...

Segment optimization
• Dedicated server and client IP configurations
Improved front-end
• Higher capacity, improved Branch Predictor
• Wider Instruction supply with deeper buffers
• Faster prefetch
Deeper Out-of-Order buffers
• Extract more instruction parallelism
Improved execution units
• Shorter latencies
• More units
• Power down when not in use
More load/store bandwidth
• Prefetcher improvements
• Deeper store buffer, fill buffer and write-back buffer
• Improved page miss handling
• Better L2 cache miss bandwidth
• New instructions for better cache management
Improved Hyper-Threading
• Wider retirement

aymanalz

9th August 2016, 11:01

A key improvement in Skylake is the internal ring bus, which has double the memory bandwidth of the previous CPU generation (Haswell/Broadwell). The internal ring moves data inside the CPU, from cache memory to logic units in each core, and back again. Our performance profiling shows that x265 performance is often constrained by the memory bandwidth of Haswell generation cores which is about 20 GB/sec. Skylake cores have about 40 GB/sec bandwidth, removing this bottleneck.

Ah I see, thanks for the info. I may have erred in buying a Haswell machine recently, instead of going for Skylake.

Could you tell us which is a more exhaustive motion search, umh or star?

mandarinka

9th August 2016, 14:31

Ah I see, thanks for the info. I may have erred in buying a Haswell machine recently, instead of going for Skylake.

Local improvements like that don't have nearly as huge an impact on the overall performance of the chip - just look at the benchmarks.

IIRC according to reviews, Core i7-6700K was beating i7-4790K (comparable chip, both have base clock of 4,0 GHz) by just 10 % in video encoding. Which was a relatively good result, because outside encoding, the per-MHz/per-core improvements were lower, just 2-5% IIRC.

So basically Skylake brings improvements, but if you have Haswell already, you don't need to cry - the cores are close to each other: http://www.anandtech.com/bench/product/1260?vs=1543

benwaggoner

9th August 2016, 16:05

Local improvements like that don't have nearly as huge impact in the overall performance of the chip - just look at the benchmarks.

IIRC according to reviews, Core i7-6700K was beating i7-4790K (comparable chip, both have base clock of 4,0 GHz) by just 10 % in video encoding. Which was a relatively good result, because outside encoding, the per-MHz/per-core improvements were lower, just 2-5% IIRC.

So basically Skylake brings improvements, but if you have Haswell already, you don't need to cry - the cores are close to each other: http://www.anandtech.com/bench/product/1260?vs=1543

One other big difference is you don't get nearly as much thermal throttling when heavily making use of AVX2 like with earlier microarchitectures. Since x265 uses AVX2 so extensively, you almost always got a significant clock speed drop pre-Skylake.

I suspect x265 is probably an extreme case for per-core performance gains with Skylake. Which has been generally true for x264/x265 for most new microarchitecture revisions, honestly.


pradeeprama

9th August 2016, 16:37

One other big difference is you don't get nearly as much thermal throttling when heavily making use of AVX2 like with earlier microarchitectures. Since x265 uses AVX2 so extensively, you almost always got a significant clock speed drop pre-Skylake.

I suspect x265 is probably an extreme case for per-core performance gains with Skylake. Which has been generally true for x264/x265 for most new microarchitecture revisions, honestly.


Yes, you are right. We are seeing performance improvements of approximately 2X when encoding 4K videos on a quad-core Skylake system compared to a quad-core Haswell system. The bandwidth improvements inside the cache and Skylake's support for DDR4 are hugely beneficial to x265, which really stresses the system's performance knobs.

mandarinka

9th August 2016, 17:56

Those are the fast and faster+ presets, I assume?

x265_Project

9th August 2016, 19:01

Those are the fast and faster+ presets, I assume?

For 4K encoding on a quad-core desktop, we see a memory bandwidth bottleneck on Haswell processors with our fastest presets (ultrafast, superfast). Under this condition, Skylake can outperform Haswell by more than 2x.

mandarinka

10th August 2016, 00:05

I guess that makes sense. My "10%" (with older versions of x265 though) figure was for normal slow encoding where stuff is just CPU execution-bound. Since that is what matters to me...

gamebox

10th August 2016, 16:24

@brumsky:
Yesterday I decided to put --ctu 32 back into use, at least for SD. Quantizers did increase slightly (when comparing stats files), but I had started having issues with 64x64 blocks containing letters or anything sharp - they leave big areas, often geometrically shaped (square), filled with mosquito noise. Apart from that, areas with less detail (like human skin) get encoded with rather obvious alternating smooth (big) and sharp (smaller) blocks - though that difference in sharpness could be AQ-related as well. CTUs of 32 helped me a lot with parallelism and saturated all my 8 cores, so a 2-hour 480p video gets encoded at 3.5-4 fps instead of 2 fps.

I'll follow your advice and test --early-skip again in some time. I did most of my image quality tests "the fast and dirty way" - i.e. setting bitrate in 2-pass mode, doing full fast first pass on both videos, and encoding just a hundred or so frames in second passes, so I get some material for comparison as soon as possible. I expect more "final" and "trusty" results by comparing in CRF mode.

brumsky

10th August 2016, 18:01

@x265_project:

Can you go into more detail regarding --early-skip and how it interacts with rect and amp, if at all?

I'd like to know if my logic below is applicable or not...

--tu-inter-depth adds a decent amount of encoding time. Also, I'd suggest allowing rskip. You get better grain retention with psy-rd settings compared to most other changes. I'd suggest increasing psy-rd and psy-rdoq levels.

On another note:

I've been toying with --early-skip --limit-modes and --rect --amp. If I'm reading the x265 docs correctly, early skip only kicks in when no motion or not enough motion is detected. Once it detects enough motion it processes normally. I added rect and only noticed about a .5 fps decrease, rect + amp sees a total decrease of about 1.5 fps when early-skip and limit-modes are used together.

The idea being use early skip so you don't waste time when there isn't enough or no motion at all. Then allow the encoder to spend more time when motion is detected by using rect and/or amp.

Also, note that rect implies an additional tu-inter-depth automatically. I believe that means using --tu-inter-depth 1 with early-skip will prevent a lot of wasted time when no motion is detected. Then when motion is detected, rect kicks in and gives you an effective --tu-inter-depth 2 only in the spots that need it. Limit-modes just allows rect and amp to be more selective in its tests.

Here are my current settings.

--crf 20 --profile main10 --output-depth 10 --ctu 32 --bframes 6 --rc-lookahead 40 --scenecut 40 --ref 5 --limit-refs 3 --me 3 --merange 26 --subme 3 --rect --no-amp
--limit-modes --max-merge 3 --early-skip --b-intra --no-sao --signhide --weightp --weightb --aq-mode 2 --aq-strength 1 --cutree --rd 4 --tu-intra-depth 3
--tu-inter-depth 1 --psy-rd 1 --psy-rdoq 1.28 --rdoq-level 2 --qcomp 0.65 --no-strong-intra-smoothing --deblock -1:-1 --qg-size 16

benwaggoner

10th August 2016, 18:35

@x265_project:

Can you go into more detail regarding --early-skip and how it interacts with rect and amp, if at all?

I'd like to know if my logic below is applicable or not...
In general, features in the presets are turned on based on whether their quality/performance is right for a given speed/quality tradeoff, based on a whole lot of automated testing. In general it probably makes sense to follow those ladders when tuning in speed/quality. For example, some of the skip modes are turned off at pretty fast presets (quality hit is big relative to speed increase) and others only at the slowest presets (smaller quality gain relative to larger performance hit).

benwaggoner

10th August 2016, 18:39

@brumsky:
Yesterday I've decided to put --ctu 32 back to use again, at least for SD. Quantizers did increase slightly (when comparing stats files), but I started having issues with 64x64 blocks containing letters or anything sharp - they leave big areas, often geometrically shaped (square), filled with mosquito noise. Apart from that, areas with less details (like human skin) get encoded with rather obvious alternating smooth (big) and sharp (smaller) blocks - though that difference in sharpness could be AQ-related as well. CTUs of 32 helped me a lot with parallelism and saturated all my 8 cores, so 2hrs long 480p video gets encoded with 3.5-4 fps instead of 2 fps.

I'll follow your advice and test --early-skip again in some time. I did most of my image quality tests "the fast and dirty way" - i.e. setting bitrate in 2-pass mode, doing full fast first pass on both videos, and encoding just a hundred or so frames in second passes, so I get some material for comparison as soon as possible. I expect more "final" and "trusty" results by comparing in CRF mode.
What's your full command line? Are you using --qg-size? Did you increment the --tu-*-depth values by one? If you increase CU size, you'll need to correspondingly increase depth to make sure your minimum block size remains the same. That can help a lot in getting high detail regions to use smaller blocks.

With the new improved --rskip, the performance isn't as bad when doing my tu depth, and the results are a lot better.
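
The relationship between CU size and TU depth mentioned above can be sketched numerically. This is a simplified model and an assumption on my part, not x265 code: it treats a depth of 1 as "no extra split", lets each further level halve the transform block once, and clamps to the 32x32 maximum and 4x4 minimum transform sizes of HEVC. The function name is made up for illustration, and real encodes can also split the CU itself below the CTU size.

```python
MAX_TU = 32  # largest HEVC transform block (luma samples per side)
MIN_TU = 4   # smallest HEVC transform block

def smallest_tu(cu_size, tu_depth):
    """Smallest transform block reachable from one CU (simplified model).

    Assumption: tu_depth == 1 means no extra split below the CU, and each
    additional level permits one more halving of the block side.
    """
    if tu_depth < 1:
        raise ValueError("tu_depth must be at least 1")
    return max(MIN_TU, min(MAX_TU, cu_size >> (tu_depth - 1)))

for cu in (64, 32):
    sizes = [smallest_tu(cu, depth) for depth in (1, 2, 3, 4)]
    print(f"CU {cu}: smallest TU per depth 1..4 = {sizes}")
```

In this model a 64x64 CU needs one more depth level than a 32x32 CU to reach the same smallest transform block, which is the "increment the depth by one" rule of thumb from the post.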

brumsky

10th August 2016, 19:31

Hey Ben,

Thanks for the quick reply. Since I am running an old sandy bridge CPU I'm just trying to find the best balance between speed and quality. I guess I'm hoping that early skip is only used when no motion or very little motion is detected vs the majority of the time. If so areas that aren't in motion wouldn't be worth the encoding time.

In general, features in the presets are turned on based on whether their quality/performance is right for a given speed/quality tradeoff, based on a whole lot of automated testing. In general it probably makes sense to follow those ladders when tuning in speed/quality. For example, some of the skip modes are turned off at pretty fast presets (quality hit is big relative to speed increase) and others only at the slowest presets (smaller quality gain relative to larger performance hit).

brumsky

10th August 2016, 19:35

On a different note.

How does everyone else handle video with intentional noise, for example BSG or The Walking Dead? I've found those to have very high bitrates even with the below settings. I've seen it spike to 22000 Kbps! I'd like to target between 2000 - 4000 on average but I don't want to force it across the board by using 2 pass. If a scene needs a little extra that's fine but 10,000 to 20,000 Kbps is too high for my liking...

--crf 20 --profile main10 --output-depth 10 --ctu 32 --bframes 6 --rc-lookahead 40 --scenecut 40 --ref 5 --limit-refs 3 --me 3 --merange 26 --subme 3 --rect --no-amp
--limit-modes --max-merge 3 --early-skip --b-intra --no-sao --signhide --weightp --weightb --aq-mode 2 --aq-strength 1 --cutree --rd 4 --tu-intra-depth 3
--tu-inter-depth 1 --psy-rd 1 --psy-rdoq 1.28 --rdoq-level 2 --qcomp 0.65 --no-strong-intra-smoothing --deblock -1:-1 --qg-size 16

gamebox

11th August 2016, 00:02

@benwaggoner

I encode using "Simple x264 Launcher" by Mulder. 64-bit 8-bit encoder, slower preset, and this commandline (some switches are redundant):

--rd 6 --no-weightb --ref 6 --me star --subme 7 --merange 24 --b-intra --keyint 300 --min-keyint 50 --bframes 9 --aq-mode 2 --psy-rd 2.0 --psy-rdoq 1.0 --deblock=-2 --no-cutree --limit-refs 3 --limit-modes --rect --no-amp --aq-strength 2 --no-sao --no-strong-intra-smoothing --max-merge 2 --no-rskip --no-slow-firstpass --tu-inter-depth 3 --tu-intra-depth 3

I eliminated qg-size switch, as I wanted the encoder to alter quantizers in all types of blocks. Areas with background scenery are not that important in this case - in a typical Hollywood movie, or a documentary, background/static blocks would matter and I would use --cutree, and probably --qg-size 16 too. My previous encodes (if I remember well) were at tu-intra/inter-depth 2, enforced by preset.

burfadel

11th August 2016, 05:57

Don't forget to try:
--ipratio 1.38
--pbratio 1.28

Since you are using --rd 6, also try --rd-refine

I think to make rd 5/6 worth it you need to combine it with --rd-refine.

youli

11th August 2016, 06:28

x265 log:
x265 [info]: HEVC encoder version 2.0+10-5a0e139e2938
x265 [info]: build info [Windows][GCC 5.3.0][64 bit] 10bit
x265 [info]: Main 10 profile, Level-5 (High tier)
x265 [info]: Thread pool created using 8 threads
x265 [info]: frame threads / pool features : 3 / wpp(68 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : umh / 25 / 7 / 2
x265 [info]: Keyframe min / max / scenecut : 23 / 250 / 40
x265 [info]: Lookahead / bframes / badapt : 40 / 6 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 0 / 0
x265 [info]: References / ref-limit cu / depth : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 3 / 0.5 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-23.0 / 0.80
x265 [info]: VBV/HRD buffer / max-rate / init : 100000 / 100000 / 0.900
x265 [info]: tools: rd=3 psy-rd=1.50 rdoq=1 psy-rdoq=2.50 early-skip tmvp
x265 [info]: tools: fast-intra lslices=6

x265 [info]: frame I: 2296, Avg QP:22.68 kb/s: 36715.43
x265 [info]: frame P: 45621, Avg QP:23.68 kb/s: 26433.02
x265 [info]: frame B: 170292, Avg QP:24.34 kb/s: 17242.93
x265 [info]: consecutive B-frames: 13.7% 5.6% 5.6% 23.9% 9.6% 21.4% 20.2%

encoded 218209 frames in 111542.06s (1.96 fps), 19369.20 kb/s, Avg QP:24.19
Encoding finished 10.08.2016 21:08:00,73
Settings:
--crf 23 --preset ultrafast --level-idc 5 --high-tier --me umh --subme 7 --scenecut 40
--aq-mode 3 --aq-strength 0.5 --no-sao --no-deblock --rd 3 --psy-rd 1.5
--b-adapt 2 --ctu 32 --min-cu-size 8 --rc-lookahead 40 --bframes 6 --merange 25
--ipratio 1.1 --pbratio 1.0 --qcomp 0.8 --rdoq-level 1 --psy-rdoq 2.5
--lookahead-slices 6 --qpstep 1 --no-strong-intra-smoothing --no-rskip

Source BD3D (left view) and x265 OverUnder:
http://s019.radikal.ru/i642/1608/80/9cd78a657fb1t.jpg (http://s019.radikal.ru/i642/1608/80/9cd78a657fb1.png) http://s019.radikal.ru/i643/1608/22/ef076ea1fb87t.jpg (http://s019.radikal.ru/i643/1608/22/ef076ea1fb87.png)
http://s020.radikal.ru/i722/1608/1c/019b8591e6e3t.jpg (http://s020.radikal.ru/i722/1608/1c/019b8591e6e3.png) http://s012.radikal.ru/i319/1608/d2/4b42916fbbc4t.jpg (http://s012.radikal.ru/i319/1608/d2/4b42916fbbc4.png)
http://s019.radikal.ru/i618/1608/df/59b00b489d20t.jpg (http://s019.radikal.ru/i618/1608/df/59b00b489d20.png) http://i053.radikal.ru/1608/06/d520ba9c69f8t.jpg (http://i053.radikal.ru/1608/06/d520ba9c69f8.png)
http://s019.radikal.ru/i635/1608/9f/83df1ea811c3t.jpg (http://s019.radikal.ru/i635/1608/9f/83df1ea811c3.png) http://i069.radikal.ru/1608/8a/b48141b93c1bt.jpg (http://i069.radikal.ru/1608/8a/b48141b93c1b.png)
http://s020.radikal.ru/i723/1608/a1/a296b21f95d0t.jpg (http://s020.radikal.ru/i723/1608/a1/a296b21f95d0.png) http://s018.radikal.ru/i505/1608/a2/aff566e08019t.jpg (http://s018.radikal.ru/i505/1608/a2/aff566e08019.png)
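
The summary lines of the log above can be cross-checked against each other. The sketch below is only an illustration with made-up helper names: it parses the per-frame-type lines and the final "encoded" line (both copied from the log) and assumes that the overall bitrate is the frame-count-weighted average of the per-type bitrates, which the numbers appear to confirm.

```python
import re

# Summary lines copied from the log above.
LOG = """\
x265 [info]: frame I: 2296, Avg QP:22.68 kb/s: 36715.43
x265 [info]: frame P: 45621, Avg QP:23.68 kb/s: 26433.02
x265 [info]: frame B: 170292, Avg QP:24.34 kb/s: 17242.93
encoded 218209 frames in 111542.06s (1.96 fps), 19369.20 kb/s, Avg QP:24.19
"""

FRAME_RE = re.compile(
    r"frame ([IPB]):\s*(\d+), Avg QP:\s*([\d.]+)\s+kb/s:\s*([\d.]+)")
TOTAL_RE = re.compile(
    r"encoded (\d+) frames in ([\d.]+)s \(([\d.]+) fps\), ([\d.]+) kb/s")

def summarize(log):
    """Recompute frame total, fps and weighted bitrate from the log text."""
    per_type = {m.group(1): (int(m.group(2)), float(m.group(4)))
                for m in FRAME_RE.finditer(log)}
    total_frames = sum(count for count, _ in per_type.values())
    weighted_kbps = sum(c * kbps for c, kbps in per_type.values()) / total_frames
    reported = TOTAL_RE.search(log)
    seconds = float(reported.group(2))
    return {
        "frames": total_frames,
        "fps": total_frames / seconds,
        "kbps": weighted_kbps,
        "reported_frames": int(reported.group(1)),
        "reported_kbps": float(reported.group(4)),
    }

result = summarize(LOG)
print(result)
```

The B-frames make up about 78% of the frames here, so their bitrate dominates the overall average.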

RainyDog

11th August 2016, 08:39

--crf 23 --preset ultrafast --level-idc 5 --high-tier --me umh --subme 7 --scenecut 40
--aq-mode 3 --aq-strength 0.5 --no-sao --no-deblock --rd 3 --psy-rd 1.5
--b-adapt 2 --ctu 32 --min-cu-size 8 --rc-lookahead 40 --bframes 6 --merange 25
--ipratio 1.1 --pbratio 1.0 --qcomp 0.8 --rdoq-level 1 --psy-rdoq 2.5
--lookahead-slices 6 --qpstep 1 --no-strong-intra-smoothing --no-rskip

Impressive results based on the comparisons. I usually encode at CRF 23 with qcomp 0.8 too as I find that higher CRF values (22-25) combined with higher qcomp gives better results than lower CRF (say 19-22) and qcomp 0.6 with most content.

But note that --psy-rdoq doesn't work with --rd 3. You need RD level 4 and above for psy rdoq to kick in.

burfadel

11th August 2016, 08:59

I believe changing ipratio and pbratio too much isn't really beneficial when using CRF mode. That's why I recommended 1.38 and 1.28 respectively, as the results were seemingly 'better' (cautiously using that word on here!); choosing anything much lower than that actually made things worse. Because it does affect everything in the encode, the small changes of 0.02 do make a difference.
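
To put the size of a 0.02 change in perspective, the ratios can be expressed as approximate QP offsets. This is a rough sketch under two assumptions that are not stated in the post: that a ratio maps to a QP offset of 6 * log2(ratio), as in x264-style rate control, and that the x265 defaults are 1.40 for ipratio and 1.30 for pbratio. The function name is made up for illustration.

```python
import math

def qp_offset(ratio):
    """Approximate QP offset implied by a quantizer ratio.

    Assumption: offset = 6 * log2(ratio), since QP steps of 6 double the
    quantizer step size.
    """
    return 6.0 * math.log2(ratio)

# (assumed default, value suggested in this thread)
settings = {
    "ipratio": (1.40, 1.38),
    "pbratio": (1.30, 1.28),
}

for name, (default, suggested) in settings.items():
    delta = qp_offset(suggested) - qp_offset(default)
    print(f"{name}: {default} -> {qp_offset(default):.2f} QP, "
          f"{suggested} -> {qp_offset(suggested):.2f} QP, "
          f"change {delta:+.2f}")
```

Under these assumptions each 0.02 step shifts the offset by a little over a tenth of a QP, which is small but applies to every frame of the affected type.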

RainyDog

11th August 2016, 09:39

On a different note.

How does everyone else handle video with intentional noise, for example BSG or The Walking Dead? I've found those to have very high bitrates even with the below settings. I've seen it spike to 22000 Kbps! I'd like to target between 2000 - 4000 on average but I don't want to force it across the board by using 2 pass. If a scene needs a little extra that's fine, but 10,000 to 20,000 Kbps is too high for my liking...

--crf 20 --profile main10 --output-depth 10 --ctu 32 --bframes 6 --rc-lookahead 40 --scenecut 40 --ref 5 --limit-refs 3 --me 3 --merange 26 --subme 3 --rect --no-amp
--limit-modes --max-merge 3 --early-skip --b-intra --no-sao --signhide --weightp --weightb --aq-mode 2 --aq-strength 1 --cutree --rd 4 --tu-intra-depth 3
--tu-inter-depth 1 --psy-rd 1 --psy-rdoq 1.28 --rdoq-level 2 --qcomp 0.65 --no-strong-intra-smoothing --deblock -1:-1 --qg-size 16

Increase CRF. Not all content needs the same CRF value for 'equal' quality, far from it.

I'm not afraid to go up to CRF 25 or even 26 for extremely grainy content, though this is with qcomp 0.8 so in your case probably 23 or 24 at default qcomp. But even then you'll still end up with overall bitrates of 6-7mbps on occasions unless you do some pre denoising. Though personally I've always found the results of denoising undesirable whenever I've tried it, be it externally or just using x264/5's built in --nr commands.

What I do is test a couple of 5 min segments from each film first to find the suitable CRF value. Which can be anywhere from CRF 20-22 for bitrates of 2-3mbps or CRF 25-26 for bitrates of up to 6-7mbps. The average target I'm happy with is CRF 23 with qcomp 0.8 at about 4mbps so that's where I always start. If tests come in around that then away we go with the full encode. But if CRF 23 is giving me too low (say 1-1.5mbps or less) or too high (7mbps+) a bitrate then I'll adjust along the CRF scale from there.

It's obviously more effort and time but worth it for me. I also quite enjoy doing it too so... :)
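
The test-and-adjust routine described above could be sketched roughly like this. The bitrate thresholds are taken from the figures in the post; the function name and the fixed one-step adjustment are assumptions for illustration only, not anyone's actual tool.

```python
def next_crf(crf: float, test_kbps: float,
             low_kbps: float = 1500.0, high_kbps: float = 7000.0,
             step: float = 1.0) -> float:
    """Suggest the CRF to try next after encoding a short test segment."""
    if test_kbps > high_kbps:
        return crf + step   # bitrate too high: raise CRF (fewer bits)
    if test_kbps < low_kbps:
        return crf - step   # bitrate very low: lower CRF (more bits)
    return crf              # inside the acceptable window: keep it

print(next_crf(23, 9000))   # very grainy test segment
print(next_crf(23, 4000))   # on target
print(next_crf(23, 1000))   # very clean test segment
```

In practice you would repeat the test encode after each adjustment and judge the result by eye as well, since bitrate alone says nothing about how the grain holds up.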

youli

11th August 2016, 13:51

But note that --psy-rdoq doesn't work with --rd 3. You need RD level 4 and above for psy rdoq to kick in.

I'm not sure...

--psy-rd needs --rd 3 or above to work (http://x265.readthedocs.io/en/default/cli.html#cmdoption--psy-rd)

and

--psy-rdoq needs --rdoq-level 1 or 2 to work (http://x265.readthedocs.io/en/default/cli.html#cmdoption--psy-rdoq)

I think rd and rdoq are independent methods. Or not?

youli

11th August 2016, 15:48

Source BD3D (left view) and x265 OverUnder:
http://s017.radikal.ru/i412/1608/fa/3f426035ee29t.jpg (http://s017.radikal.ru/i412/1608/fa/3f426035ee29.png) http://s019.radikal.ru/i628/1608/54/a848aa509ceft.jpg (http://s019.radikal.ru/i628/1608/54/a848aa509cef.png)
http://s019.radikal.ru/i608/1608/c9/413e05b5a5d7t.jpg (http://s019.radikal.ru/i608/1608/c9/413e05b5a5d7.png) http://s017.radikal.ru/i430/1608/64/2b9894ea5133t.jpg (http://s017.radikal.ru/i430/1608/64/2b9894ea5133.png)
http://s008.radikal.ru/i304/1608/e2/518734f53638t.jpg (http://s008.radikal.ru/i304/1608/e2/518734f53638.png) http://s020.radikal.ru/i700/1608/96/8629436f2d56t.jpg (http://s020.radikal.ru/i700/1608/96/8629436f2d56.png)
http://s014.radikal.ru/i327/1608/32/c909fb5cf3f9t.jpg (http://s014.radikal.ru/i327/1608/32/c909fb5cf3f9.png) http://s018.radikal.ru/i520/1608/42/c05f5a4658f2t.jpg (http://s018.radikal.ru/i520/1608/42/c05f5a4658f2.png)
http://s013.radikal.ru/i325/1608/53/945196b7b48et.jpg (http://s013.radikal.ru/i325/1608/53/945196b7b48e.png) http://s018.radikal.ru/i505/1608/d2/ff31ab6a803ft.jpg (http://s018.radikal.ru/i505/1608/d2/ff31ab6a803f.png)
http://s019.radikal.ru/i641/1608/5c/b4103d9816f2t.jpg (http://s019.radikal.ru/i641/1608/5c/b4103d9816f2.png) http://s013.radikal.ru/i324/1608/d1/6daa59a8c2f1t.jpg (http://s013.radikal.ru/i324/1608/d1/6daa59a8c2f1.png)

Settings: same as my sample above (http://forum.doom9.org/showthread.php?p=1776813#post1776813), just --psy-rdoq up from 2.5 to 3.0.

x265 log:
x265 [info]: HEVC encoder version 2.0+10-5a0e139e2938
x265 [info]: build info [Windows][GCC 5.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [warning]: Specifying a decoder level with constant rate factor rate-control requires
x265 [warning]: enabling VBV with vbv-bufsize=100000kb vbv-maxrate=100000kbps. VBV outputs are non-deterministic!
x265 [info]: Main 10 profile, Level-5 (High tier)
x265 [info]: Thread pool created using 8 threads
x265 [info]: frame threads / pool features : 3 / wpp(68 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : umh / 25 / 7 / 2
x265 [info]: Keyframe min / max / scenecut : 23 / 250 / 40
x265 [info]: Lookahead / bframes / badapt : 40 / 6 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 0 / 0
x265 [info]: References / ref-limit cu / depth : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 3 / 0.5 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-23.0 / 0.80
x265 [info]: VBV/HRD buffer / max-rate / init : 100000 / 100000 / 0.900
x265 [info]: tools: rd=3 psy-rd=1.50 rdoq=1 psy-rdoq=3.00 early-skip tmvp
x265 [info]: tools: fast-intra lslices=6

x265 [info]: frame I: 1478, Avg QP:22.63 kb/s: 46945.52
x265 [info]: frame P: 26825, Avg QP:23.66 kb/s: 23953.38
x265 [info]: frame B: 89359, Avg QP:24.56 kb/s: 11553.90
x265 [info]: consecutive B-frames: 13.5% 4.1% 8.4% 35.4% 11.0% 20.4% 7.0%

encoded 117662 frames in 47406.91s (2.48 fps), 14825.35 kb/s, Avg QP:24.33
Encoding finished 11.08.2016 11:29:31,81
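
As a sanity check on the log above, the per-frame-type kb/s figures can be weighted by frame count to recover the overall bitrate. This is a small sketch using only the numbers printed in the log; it assumes each per-type figure is expressed at the full frame rate, which the arithmetic seems to bear out.

```python
# (frame count, kb/s) per frame type, copied from the x265 log
frames = {
    "I": (1478, 46945.52),
    "P": (26825, 23953.38),
    "B": (89359, 11553.90),
}

total_frames = sum(n for n, _ in frames.values())
overall_kbps = sum(n * kbps for n, kbps in frames.values()) / total_frames

for kind, (n, kbps) in frames.items():
    share_of_bits = n * kbps / (overall_kbps * total_frames)
    print(f"{kind}: {n / total_frames:6.1%} of frames, {share_of_bits:6.1%} of bits")
print(f"overall: {overall_kbps:.2f} kb/s over {total_frames} frames")
```

The weighted average lands very close to the 14825.35 kb/s reported on the summary line, and the frame total matches the 117662 encoded frames.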

brumsky

11th August 2016, 18:18

Source BD3D (left view) and x265 OverUnder:
http://s017.radikal.ru/i412/1608/fa/3f426035ee29t.jpg (http://s017.radikal.ru/i412/1608/fa/3f426035ee29.png) http://s019.radikal.ru/i628/1608/54/a848aa509ceft.jpg (http://s019.radikal.ru/i628/1608/54/a848aa509cef.png)
http://s019.radikal.ru/i608/1608/c9/413e05b5a5d7t.jpg (http://s019.radikal.ru/i608/1608/c9/413e05b5a5d7.png) http://s017.radikal.ru/i430/1608/64/2b9894ea5133t.jpg (http://s017.radikal.ru/i430/1608/64/2b9894ea5133.png)
http://s008.radikal.ru/i304/1608/e2/518734f53638t.jpg (http://s008.radikal.ru/i304/1608/e2/518734f53638.png) http://s020.radikal.ru/i700/1608/96/8629436f2d56t.jpg (http://s020.radikal.ru/i700/1608/96/8629436f2d56.png)
http://s014.radikal.ru/i327/1608/32/c909fb5cf3f9t.jpg (http://s014.radikal.ru/i327/1608/32/c909fb5cf3f9.png) http://s018.radikal.ru/i520/1608/42/c05f5a4658f2t.jpg (http://s018.radikal.ru/i520/1608/42/c05f5a4658f2.png)
http://s013.radikal.ru/i325/1608/53/945196b7b48et.jpg (http://s013.radikal.ru/i325/1608/53/945196b7b48e.png) http://s018.radikal.ru/i505/1608/d2/ff31ab6a803ft.jpg (http://s018.radikal.ru/i505/1608/d2/ff31ab6a803f.png)
http://s019.radikal.ru/i641/1608/5c/b4103d9816f2t.jpg (http://s019.radikal.ru/i641/1608/5c/b4103d9816f2.png) http://s013.radikal.ru/i324/1608/d1/6daa59a8c2f1t.jpg (http://s013.radikal.ru/i324/1608/d1/6daa59a8c2f1.png)

Settings: same as my sample above (http://forum.doom9.org/showthread.php?p=1776813#post1776813), just --psy-rdoq up from 2.5 to 3.0.

x265 log:
x265 [info]: HEVC encoder version 2.0+10-5a0e139e2938
x265 [info]: build info [Windows][GCC 5.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [warning]: Specifying a decoder level with constant rate factor rate-control requires
x265 [warning]: enabling VBV with vbv-bufsize=100000kb vbv-maxrate=100000kbps. VBV outputs are non-deterministic!
x265 [info]: Main 10 profile, Level-5 (High tier)
x265 [info]: Thread pool created using 8 threads
x265 [info]: frame threads / pool features : 3 / wpp(68 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : umh / 25 / 7 / 2
x265 [info]: Keyframe min / max / scenecut : 23 / 250 / 40
x265 [info]: Lookahead / bframes / badapt : 40 / 6 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 0 / 0
x265 [info]: References / ref-limit cu / depth : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 3 / 0.5 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-23.0 / 0.80
x265 [info]: VBV/HRD buffer / max-rate / init : 100000 / 100000 / 0.900
x265 [info]: tools: rd=3 psy-rd=1.50 rdoq=1 psy-rdoq=3.00 early-skip tmvp
x265 [info]: tools: fast-intra lslices=6

x265 [info]: frame I: 1478, Avg QP:22.63 kb/s: 46945.52
x265 [info]: frame P: 26825, Avg QP:23.66 kb/s: 23953.38
x265 [info]: frame B: 89359, Avg QP:24.56 kb/s: 11553.90
x265 [info]: consecutive B-frames: 13.5% 4.1% 8.4% 35.4% 11.0% 20.4% 7.0%

encoded 117662 frames in 47406.91s (2.48 fps), 14825.35 kb/s, Avg QP:24.33
Encoding finished 11.08.2016 11:29:31,81

What's the point of using VBV with such high values? With a value of 100000 Kbps it'll almost never kick in. Plus, when it does, it uses an extreme denoise to bring the bitrate down and blurs the shit out of it. I tried this with several test clips, and it looks like shit when it kicks in. My VBV value was significantly less than yours on purpose: 4096 - 4608. My test clips are intentionally noisy.

Your encodes are not even coming close to the limit you have set.

They do look great by the way!

Also, do you think subme 7 is necessary? The Placebo preset only goes up to 5. I've also read that the star motion search (--me star) is optimized more than umh...

brumsky

11th August 2016, 18:21

Increase CRF. Not all content needs the same CRF value for 'equal' quality, far from it.

I'm not afraid to go up to CRF 25 or even 26 for extremely grainy content, though this is with qcomp 0.8 so in your case probably 23 or 24 at default qcomp. But even then you'll still end up with overall bitrates of 6-7mbps on occasions unless you do some pre denoising. Though personally I've always found the results of denoising undesirable whenever I've tried it, be it externally or just using x264/5's built in --nr commands.

What I do is test a couple of 5 min segments from each film first to find the suitable CRF value. Which can be anywhere from CRF 20-22 for bitrates of 2-3mbps or CRF 25-26 for bitrates of up to 6-7mbps. The average target I'm happy with is CRF 23 with qcomp 0.8 at about 4mbps so that's where I always start. If tests come in around that then away we go with the full encode. But if CRF 23 is giving me too low (say 1-1.5mbps or less) or too high (7mbps+) a bitrate then I'll adjust along the CRF scale from there.

It's obviously more effort and time but worth it for me. I also quite enjoy doing it too so... :)

Thanks Rainydog, that's what I've been doing. Just thought there might be another way.

I did mess with VBV and the results look like shit. Seriously, blurry and very ugly! I'd much rather let the bitrate spike than enable VBV haha!

RainyDog

11th August 2016, 19:08

I'm not sure...

--psy-rd needs --rd 3 or above to work (http://x265.readthedocs.io/en/default/cli.html#cmdoption--psy-rd)

and

--psy-rdoq needs --rdoq-level 1 or 2 to work (http://x265.readthedocs.io/en/default/cli.html#cmdoption--psy-rdoq)

I think rd and rdoq are independent methods. Or not?

Sorry, I wasn't entirely clear. It's RD=4 or above that you need to set for --psy-rdoq to be active. At RD=3 or below psy-rdoq is disabled even if you set a value for it.

RainyDog

11th August 2016, 19:12

Thanks Rainydog, that's what I've been doing. Just thought there might be another way.

I did mess with VBV and the results look like shit. Seriously, blurry and very ugly! I'd much rather let the bitrate spike than enable VBV haha!

Well I set a vbv maxrate and buffer size too brumsky but can't say I've ever noticed any nasty looking frames because of it. I've seen bitrates spike way higher than the source at 1080p with qcomp 0.8 otherwise.

youli

11th August 2016, 20:02

@brumsky
The encoder enables VBV automatically if level-idc and tier have been specified. I have nothing to do with this.
UMH is a very good choice for x264, so maybe for x265 too :)
STAR works slowly, just like ESA or TESA in x264.
For x264 I always use subme 11, for x265 only 7... why not? It probably compensates for "no-weightp" and "no-weightb".

Sorry, I wasn't entirely clear. It's RD=4 or above that you need to set for --psy-rdoq to be active. At RD=3 or below psy-rdoq is disabled even if you set a value for it.
Where can I read about it? Thanks.

LigH

11th August 2016, 20:31

@ brumsky:

What's the point of a space wasting fullquote (including all images)? Please try to reduce the quoted part to the relevant part...

But the warning starts one line above. CRF combined with a decoder level would require a VBV limitation which may not be present in the command line. So it appears to be not specifically optimized for consumer players.

Motenai Yoda

11th August 2016, 21:48

Sorry, I wasn't entirely clear. It's RD=4 or above that you need to set for --psy-rdoq to be active. At RD=3 or below psy-rdoq is disabled even if you set a value for it.

For the umpteenth time: NO, rd 4 is the SAME as rd 3.
rdoq is disabled (rdoq-level 0) on any preset lower than slow and enabled (rdoq-level 2) on slow and higher.
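
The rule as described in this exchange can be written down as a small lookup. This is only a sketch of the claims made in the thread and the linked docs (psy-rdoq depends on rdoq-level, not on --rd); the helper names are hypothetical and nothing here is taken from the x265 source.

```python
# x265 presets from fastest to slowest
PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
           "medium", "slow", "slower", "veryslow", "placebo"]

def default_rdoq_level(preset: str) -> int:
    """Preset default as stated above: 0 below 'slow', 2 from 'slow' up."""
    return 2 if PRESETS.index(preset) >= PRESETS.index("slow") else 0

def psy_rdoq_active(preset: str, psy_rdoq: float, rdoq_level=None) -> bool:
    """True if psy-rdoq can have an effect.

    Assumption: it only needs rdoq-level 1 or 2, independent of --rd.
    An explicit --rdoq-level overrides the preset default.
    """
    level = default_rdoq_level(preset) if rdoq_level is None else rdoq_level
    return psy_rdoq > 0 and level > 0

print(psy_rdoq_active("ultrafast", 2.5))                # preset default only
print(psy_rdoq_active("ultrafast", 2.5, rdoq_level=1))  # with --rdoq-level 1
```

Read this way, a command line with --preset ultrafast plus an explicit --rdoq-level 1 would still have psy-rdoq in effect, because the explicit option overrides the preset default.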

burfadel

12th August 2016, 05:44

Does it work off the base profile people select, or as a result of another setting? With just the settings I listed, in fact even with much fewer settings it works. Based on what you just said it would suggest x265 is broken for allowing it and having it work effectively!

brumsky

12th August 2016, 16:01

Well I set a vbv maxrate and buffer size too brumsky but can't say I've ever noticed any nasty looking frames because of it. I've seen bitrates spike way higher than the source at 1080p with qcomp 0.8 otherwise.

That would make sense given how high you have VBV set. Try just for shits and giggles setting it to something like 10000.

This is from the x265 docs.

Note that when VBV is enabled (with a valid --vbv-bufsize), VBV emergency denoising is turned on. This will turn on aggressive denoising at the frame level when frame QP > QP_MAX_SPEC (51), drastically reducing bitrate and allowing ratecontrol to assign lower QPs for the following frames. The visual effect is blurring, but removes significant blocking/displacement artifacts.

qtwigg

12th August 2016, 18:30

Hello everyone.

Has anyone been experiencing off colours?
Seems to have been happening since 2.0

Problem 1
Source
https://is04.ezphotoshare.com/2016/08/09/cgeqe.png
Encode
https://is04.ezphotoshare.com/2016/08/09/cXV1D.png

Problem 2
Source
https://cdn.discordapp.com/attachments/213375923263635456/213547910661406720/Source_2.png
Encode
https://cdn.discordapp.com/attachments/213375923263635456/213548512997015552/aq3_deblock.mkv.0000.png
Other methods tried
Encode
https://cdn.discordapp.com/attachments/213375923263635456/213549774794653706/Main422-intra.mkv.0000.png
Encode
https://cdn.discordapp.com/attachments/213375923263635456/213550171194261504/Merange.mkv.0000.png
Encode
https://cdn.discordapp.com/attachments/213375923263635456/213563397797707776/main444-10.mkv.0000.png

Problem 3
Source
https://is03.ezphotoshare.com/2016/08/11/i1z2n.png
Encode
https://is04.ezphotoshare.com/2016/08/11/i10A1.png

The following info pertains to problem 2 only, prob 1 and 3 have very much similar settings, but we focused on problem 2 mainly.
This is the basic Command line we used :
--ctu 32 --max-tu-size 16 --weightb --lookahead-slices 0 --preset veryslow --crf 19 --profile main10 --rd 5 --limit-refs 3 --no-rect --no-amp --no-limit-modes --aq-mode 3 --subme 5 --me 3 --bframes 8 --rc-lookahead 80 --ref 6 --no-strong-intra-smoothing --no-constrained-intra --high-tier --rdoq-level 1 --psy-rd 0.8 --psy-rdoq 7.0 --no-sao --deblock -1:-1 --qcomp 0.7 --cbqpoffs -3 --crqpoffs -3

The variations I've tried go like this. I used the template of settings above and then varied each setting one at a time:

aq-mode 2,3 (found 3 to be better for overall detail)
psy-rdoq 2,5,7
psy-rd 0.5,0.8,1.1
cb and cr offsets 0:0,-3:-3,-4:-4,-6:-6
merange 57,80,120 (slight improvement in 80 for the color thing)
profiles main10,main422-10-intra,main422-10,main444-10,main444-10-intra (some of it was better in 422 chroma subsampling)
deblock -1:-1,-2:-2,-3:-3

Thinking maybe it was the AviSynth Filter's fault I tried DirectShowSource, FFMS2 and LAV but the problem persisted still.

The main issue: In certain regions of the encodes the greens are stronger than the source and the reds and blues are lighter than the source. This makes certain dark shades brighter, affecting even the details in some regions.

:)

LigH

12th August 2016, 19:37

The usual "first guess without looking" will probably be a wrong colorimetry (Rec.601 vs. Rec.709)... not sure if they should be explicitly flagged with CLI options.

BTW, showing such huge images inline hurts the readability of your post, thumbnails with links to the originals would have been sufficient.
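
To illustrate what a Rec.601 vs. Rec.709 mix-up does in general, here is a minimal sketch that converts an RGB colour to YCbCr with one set of coefficients and back with the other. It uses normalized 0..1 values and ignores range and rounding; whether this is what happened in the screenshots above is not known.

```python
# Luma coefficients (Kr, Kb) for the two standards
BT601 = (0.299, 0.114)
BT709 = (0.2126, 0.0722)

def rgb_to_ycbcr(r, g, b, coeffs):
    """Normalized R'G'B' (0..1) to Y'CbCr using the given (Kr, Kb)."""
    kr, kb = coeffs
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr, coeffs):
    """Inverse of rgb_to_ycbcr for the same (Kr, Kb)."""
    kr, kb = coeffs
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return r, g, b

green = (0.1, 0.6, 0.1)                   # an arbitrary greenish test colour
ycc = rgb_to_ycbcr(*green, BT709)         # stored with the BT.709 matrix
matched = ycbcr_to_rgb(*ycc, BT709)       # decoded correctly
mismatched = ycbcr_to_rgb(*ycc, BT601)    # decoded with the wrong matrix

print("matched:   ", [round(c, 3) for c in matched])
print("mismatched:", [round(c, 3) for c in mismatched])
```

With matching matrices the colour round-trips unchanged; with the wrong matrix on the decoding side the channels shift, which is why the flagging (or the player's guess when nothing is flagged) matters when comparing screenshots.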

littlepox

12th August 2016, 20:15

How did you convert these video to RGB screenshots?

Jamaika

12th August 2016, 20:21

@littlepox normally
ffmpeg.exe -i "image%%003d.png" -f yuv4mpegpipe -s 1920x1080 -c:v wrapped_avframe -pix_fmt yuv420p - | x265 ...
ffmpeg.exe -i input.mts -f image2 -c:v png -ss 00:00:00.xxx -frames:v 1 -pix_fmt rgb24 "image001.png"
ffmpeg.exe -i input.mts -f image2 -c:v png -ss 00:00:00.xxx -t 00:00:00.yyy -frames:v 100 -pix_fmt rgb24 "image%%003d.png"

The main issue: In certain regions of the encodes the greens are stronger than the source and the reds and blues are lighter than the source. This makes certain dark shades brighter, affecting even the details in some regions.
Sources are slightly brighter.
In example 3 it is difficult to tell, because the frames don't match. There is a considerable loss of sharpness on the headscarf. The colormatrix doesn't affect how the images look on the PC because it is flagged as "undef". Unfortunately that is a mistaken approach. Change it to bt709 because that is the standard. Otherwise it looks more yellow.

Edit:
When converting from rgb24 to yuv420p with ffmpeg, the color_range always changes, so the picture looks faded. To align the gamma, use -vf scale=out_color_matrix=bt709:out_range=full:flags=lanczos+accurate_rnd+full_chroma_int+full_chroma_inp
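
The "faded" look from a range mismatch comes down to simple scaling. Below is a minimal sketch of the standard 8-bit luma mapping between limited range (16..235) and full range (0..255); the function names are illustrative and chroma (16..240) is left out for brevity.

```python
def limited_to_full(y: float) -> float:
    """Map 8-bit limited-range luma (16..235) to full range (0..255)."""
    return (y - 16) * 255.0 / 219.0

def full_to_limited(y: float) -> float:
    """Map 8-bit full-range luma (0..255) to limited range (16..235)."""
    return 16 + y * 219.0 / 255.0

# If limited-range data is shown as if it were full range, black stays at 16
# and white at 235, so the picture looks washed out.
for y in (16, 126, 235):
    print(f"limited {y:3d} -> full {limited_to_full(y):6.1f}")
```

If the screenshots and the encode disagree about which range is in use, the comparison will show a brightness and contrast difference that has nothing to do with the encoder settings.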